The Fastest Thai NLP Library for Production
kham is an open-source Thai NLP engine written in Rust, with zero external dependencies, a no_std core, and a complete pipeline from raw text to structured tokens.
Complete Thai NLP pipeline
Every module is available in Rust, Python, WebAssembly, and C FFI — one library, all platforms.
Word Segmentation
Maximal matching over a 62,102-word DAWG dictionary. 33–34 MiB/s on Apple M-series. F1 1.000 on 228 curated test cases.
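To make the idea of dictionary-driven segmentation concrete, here is a minimal sketch of greedy longest-match segmentation. It is a simplified illustration and not kham's implementation: it uses a `HashSet` in place of a DAWG, and the function name `segment` and the `max_word_chars` parameter are hypothetical. kham's actual maximal-matching strategy may resolve ambiguities differently.

```rust
use std::collections::HashSet;

/// Greedy longest-match segmentation over a word set (illustrative only).
/// Characters that start no dictionary word are emitted as one-character tokens.
fn segment(text: &str, dict: &HashSet<&str>, max_word_chars: usize) -> Vec<String> {
    // Work on Unicode scalar values so Thai vowels and tone marks are not split mid-byte.
    let chars: Vec<char> = text.chars().collect();
    let mut tokens: Vec<String> = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let limit = (i + max_word_chars).min(chars.len());
        // Fallback when nothing matches: consume a single character.
        let mut end = i + 1;
        // Try the longest candidate first and shrink until the dictionary has it.
        for j in ((i + 1)..=limit).rev() {
            let candidate: String = chars[i..j].iter().collect();
            if dict.contains(candidate.as_str()) {
                end = j;
                break;
            }
        }
        tokens.push(chars[i..end].iter().collect::<String>());
        i = end;
    }
    tokens
}
```

With a toy dictionary containing กิน, ข้าว, and ข้าวผัด, the input กินข้าวผัด splits into กิน and ข้าวผัด, because the longer entry wins over ข้าว. A DAWG gives the same lookups with shared prefixes and suffixes, which is what keeps a 62,102-word dictionary compact.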
POS Tagging
13-category part-of-speech tagger derived from the ORCHID tagset. Integrated into the FTS pipeline for lexeme-level filtering.
Named Entity Recognition
Person, Place, and Organization tags via a built-in gazetteer. NE tokens emit colocated lexemes in FTS for entity-aware search.
Spell Correction
Levenshtein ≤ 2 candidates re-ranked by lk82 phonetic similarity and TNC corpus frequency. Single-word and full-text modes.
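The candidate-generation step can be sketched as follows. This is an assumption-laden illustration rather than kham's code: `levenshtein` and `suggest` are hypothetical names, the lexicon is a plain slice of (word, frequency) pairs, and the phonetic re-ranking stage is left out, so ties are broken by frequency alone.

```rust
/// Levenshtein distance over Unicode scalar values (not bytes),
/// using two rolling rows of the dynamic-programming table.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut curr = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            // Minimum of substitution, deletion, and insertion.
            let v = (prev[j] + cost).min(prev[j + 1] + 1).min(curr[j] + 1);
            curr.push(v);
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Keep lexicon entries within `max_dist` edits of `word`,
/// ordered by distance, then by descending frequency.
fn suggest<'a>(word: &str, lexicon: &[(&'a str, u32)], max_dist: usize) -> Vec<&'a str> {
    let mut hits: Vec<(usize, u32, &'a str)> = lexicon
        .iter()
        .map(|&(w, f)| (levenshtein(word, w), f, w))
        .filter(|&(d, _, _)| d <= max_dist)
        .collect();
    hits.sort_by(|x, y| x.0.cmp(&y.0).then(y.1.cmp(&x.1)));
    hits.into_iter().map(|(_, _, w)| w).collect()
}
```

Calling `suggest(word, &lexicon, 2)` mirrors the "Levenshtein ≤ 2" threshold described above. Computing the distance over `char` values matters for Thai, where a single tone mark is one edit but several UTF-8 bytes.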
Phonetic Encoding
Three Thai soundex systems: lk82 (general), udom83 (finer sibilants), MetaSound (per-syllable). Cross-language Thai–English phonetics.
RTGS Romanization
Royal Thai General System of Transcription. Sentence-level romanization: Thai/Named tokens transliterated, others pass through unchanged.
Keyword Extraction
TF × inverse-corpus-frequency scoring, stopwords excluded. Unigram and n-gram (bigram/trigram) keyphrase extraction.
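As a rough illustration of TF × inverse-corpus-frequency scoring, the sketch below ranks the tokens of one document. The exact formula kham uses is not stated here, so the add-one smoothed logarithm is an assumption, as are the function name `rank_keywords` and its parameters.

```rust
use std::collections::{HashMap, HashSet};

/// Score each non-stopword token as tf * ln((N + 1) / (cf + 1)), where
/// N is the corpus size in tokens and cf is the token's corpus frequency.
/// The smoothing is an assumption made for this sketch.
fn rank_keywords(
    tokens: &[&str],
    corpus_freq: &HashMap<&str, u64>,
    corpus_total: u64,
    stopwords: &HashSet<&str>,
) -> Vec<(String, f64)> {
    // Term frequency within the document, skipping stopwords.
    let mut tf: HashMap<&str, u64> = HashMap::new();
    for &t in tokens {
        if !stopwords.contains(t) {
            *tf.entry(t).or_insert(0) += 1;
        }
    }
    let mut scored: Vec<(String, f64)> = tf
        .into_iter()
        .map(|(t, count)| {
            let cf = corpus_freq.get(t).copied().unwrap_or(0);
            // Rare corpus words get a larger weight; unseen words stay finite.
            let icf = ((corpus_total as f64 + 1.0) / (cf as f64 + 1.0)).ln();
            (t.to_string(), count as f64 * icf)
        })
        .collect();
    // Highest score first; ties broken alphabetically for stable output.
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    scored
}
```

The same scoring extends to bigrams and trigrams by counting adjacent token windows instead of single tokens.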
Number Normalization
Thai digit strings to ASCII equivalents, Thai word-to-number conversion, and baht text conversion for financial documents.
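The digit-mapping part is simple enough to show in full. The sketch below relies only on the Unicode fact that Thai digits ๐ to ๙ occupy U+0E50 through U+0E59; the function name `thai_digits_to_ascii` is hypothetical and is not taken from kham's API. Word-to-number and baht text conversion are not covered.

```rust
/// Replace Thai digits (U+0E50..=U+0E59) with ASCII 0-9.
/// All other characters pass through unchanged.
fn thai_digits_to_ascii(text: &str) -> String {
    text.chars()
        .map(|c| match c {
            '\u{0E50}'..='\u{0E59}' => {
                // The offset from U+0E50 is the digit's numeric value.
                char::from_digit(c as u32 - 0x0E50, 10).unwrap_or(c)
            }
            _ => c,
        })
        .collect()
}
```

For example, `thai_digits_to_ascii("ราคา ๔๕๐ บาท")` leaves the Thai words intact and rewrites only the digits.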
kham vs PyThaiNLP
Feature comparison for developers choosing a Thai NLP library.
| Feature | kham | PyThaiNLP |
|---|---|---|
| Segmentation speed | 33–34 MiB/s | ~2 MiB/s (newmm) |
| F1 score | 1.000 (228 test cases) | ≈ 0.94 (on kham test set) |
| Zero dependencies | Yes — no_std core | No — requires Python + torchnlp etc. |
| WebAssembly | Yes — 300 KB binary | No |
| PostgreSQL FTS | Native parser extension | No |
| SQLite FTS5 | Loadable extension | No |
| Spell correction | Yes — Levenshtein + phonetic | Yes — edit distance only |
| Phonetic encoding | lk82, udom83, MetaSound | lk82 only |
| Offline / embedded | Yes — no network, no server | Partial |
| License | MIT OR Apache-2.0 | Apache-2.0 |
PyThaiNLP figures are taken from public documentation and benchmarks. To run the comparison yourself: python scripts/compare_pythainlp.py
Get started in 60 seconds
Pick your platform — the API is consistent across all targets.
Rust (Cargo.toml)
[dependencies]
kham-core = "0.8"
Python
pip install kham
WebAssembly (npm)
npm install kham-wasm
PostgreSQL (Docker)
docker run --rm -e POSTGRES_PASSWORD=secret \
  -p 5432:5432 nickmsft/kham-pg:latest
One library, every platform
kham-core compiles to native Rust, WASM, Python wheels, a C header, and two database extensions — from a single codebase.
Try Thai NLP in your browser
Powered by kham-wasm — no server, no install required.
Ready to add Thai NLP to your project?
Free, open-source, MIT OR Apache-2.0. No API keys, no rate limits, no server required.