Live Demo
Thai NLP running entirely in your browser via WebAssembly — no server required. Segment text, explore POS & NE tags, split sentences, or test phonetic matching.
Each sentence ends at a new line, at ! or ?, at . followed by a space, or at one of the Thai markers ฯ ๚ ๛. Thai text with no markers will appear as a single sentence.
Splits on: newline · ! ? . · Thai markers ฯ ๚ ๛ · Plain Thai prose without punctuation stays as one sentence.
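The splitting rules above can be approximated in a few lines. This is a minimal sketch, not the demo's actual implementation: the function name `split_sentences` is hypothetical, and treating a run such as "?!" as a single terminator is an assumption made here for tidier output.

```python
import re

# Boundary pattern built from the rules listed above:
#   - a newline
#   - a run of "!" or "?" (collapsing runs is an assumption of this sketch)
#   - "." only when followed by whitespace, so "3.14" is not split
#   - a Thai marker: ฯ (U+0E2F), ๚ (U+0E5A), ๛ (U+0E5B)
_BOUNDARY = re.compile(r"\n|[!?]+|\.(?=\s)|[\u0e2f\u0e5a\u0e5b]")


def split_sentences(text):
    """Split text into sentences, keeping each terminator with its sentence."""
    sentences = []
    start = 0
    for match in _BOUNDARY.finditer(text):
        piece = text[start:match.end()].strip()
        if piece:
            sentences.append(piece)
        start = match.end()
    # Whatever follows the last boundary is the final sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    for sentence in split_sentences("Hello world. How are you? Fine!"):
        print(sentence)
```

Because no rule fires on unpunctuated Thai prose, such input comes back as a one-element list, matching the behaviour described above.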
Phonetic Soundex
12 consonant groups · 4-char code · most widely used
Text Normalizer
Collapses duplicate tone marks and composes nikhahit + sara aa into sara am (อำ).
Before / After view · highlights: removed / collapsed · composed
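The two normalizer rules described above (collapsing duplicate tone marks and composing nikhahit + sara aa into sara am) can be sketched as follows. The function name `normalize_thai` is hypothetical, and two choices are assumptions of this sketch rather than documented behaviour: "duplicate" is read as a run of the same tone mark, and only a nikhahit directly followed by sara aa is composed.

```python
import re

NIKHAHIT = "\u0e4d"  # ◌ํ
SARA_AA = "\u0e32"   # า
SARA_AM = "\u0e33"   # ำ

# Thai tone marks: mai ek, mai tho, mai tri, mai chattawa (U+0E48..U+0E4B).
# The backreference matches a run of the SAME mark repeated two or more times.
_DUPLICATE_TONE = re.compile(r"([\u0e48-\u0e4b])\1+")


def normalize_thai(text):
    """Collapse repeated tone marks, then compose nikhahit + sara aa."""
    text = _DUPLICATE_TONE.sub(r"\1", text)
    return text.replace(NIKHAHIT + SARA_AA, SARA_AM)


if __name__ == "__main__":
    decomposed = "\u0e19" + NIKHAHIT + SARA_AA  # น + ◌ํ + า
    print(normalize_thai(decomposed) == "\u0e19" + SARA_AM)
```

Unicode's standard normalization forms do not perform this composition, since sara am has only a compatibility decomposition, which is why the sketch uses an explicit replacement.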
Token kinds
- Thai · Named entity
- Latin
- Number
- Punctuation
- Emoji
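As a rough illustration of the token kinds listed above, characters can be bucketed by Unicode properties and grouped into runs. This is only a sketch under stated assumptions: `char_kind` and `coarse_tokens` are hypothetical names, the emoji ranges are approximate, and a Thai run is returned whole because real Thai word segmentation and named-entity tagging need a dictionary or model that is not reproduced here.

```python
import unicodedata
from itertools import groupby


def char_kind(ch):
    """Return a coarse kind label for a single character."""
    cp = ord(ch)
    if ch.isspace():
        return "space"
    if ch.isdigit():  # also true for Thai digits U+0E50..U+0E59
        return "number"
    if 0x0E00 <= cp <= 0x0E7F:  # Thai block
        return "thai"
    if unicodedata.name(ch, "").startswith("LATIN"):
        return "latin"
    if unicodedata.category(ch).startswith("P"):
        return "punctuation"
    # Approximate emoji ranges; a full check needs the Unicode emoji data files.
    if 0x1F300 <= cp <= 0x1FAFF or 0x2600 <= cp <= 0x27BF:
        return "emoji"
    return "other"


def coarse_tokens(text):
    """Group consecutive characters of the same kind; whitespace is dropped."""
    tokens = []
    for kind, chars in groupby(text, key=char_kind):
        if kind != "space":
            tokens.append(("".join(chars), kind))
    return tokens


if __name__ == "__main__":
    for token, kind in coarse_tokens("ราคา 250 บาท OK!"):
        print(kind, token)
```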
FTS mode extras
- POS — 13 ORCHID-derived categories
- NE — Person · Place · Org
- Stop — built-in stopword list
- Roman — RTGS romanization toggle
- Synonyms — number normalization
Soundex algorithms
- LK82 — 12 groups · 4-char code
- Udom83 — 14 groups · finer sibilants
- MetaSound — 3 chars per syllable
- Used for fuzzy / phonetic FTS search