v0.8.2 · Release notes →

The Fastest Thai NLP Library for Production

kham is an open-source Thai NLP engine written in Rust — zero external dependencies, no_std core, and a complete pipeline from raw text to structured tokens.

33 MiB/s: segmentation throughput
F1 1.000: accuracy on 228 test cases
62k words: built-in dictionary
6 targets: Rust · Python · WASM · C · PG · SQLite

kham vs PyThaiNLP

Feature comparison for developers choosing a Thai NLP library.

Feature | kham | PyThaiNLP
Segmentation speed | 33–34 MiB/s | ~2 MiB/s (newmm)
F1 accuracy | 1.000 (228 test cases) | ≈ 0.94 (on kham test set)
Zero dependencies | Yes (no_std core) | No (requires Python + torchnlp etc.)
WebAssembly | Yes (300 KB binary) | No
PostgreSQL FTS | Native parser extension | No
SQLite FTS5 | Loadable extension | No
Spell correction | Yes (Levenshtein + phonetic) | Yes (edit distance only)
Phonetic encoding | lk82, udom83, MetaSound | lk82 only
Offline / embedded | Yes (no network, no server) | Partial
License | MIT OR Apache-2.0 | Apache-2.0

PyThaiNLP figures are taken from public documentation and benchmarks. To run the comparison yourself: python scripts/compare_pythainlp.py
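For readers who want to sanity-check an F1 figure like the ones in the comparison, here is a minimal sketch of one common way to score word segmentation: convert each token list into character spans and compare the predicted spans with the gold spans. This is an illustration only. The function names `spans` and `segmentation_f1` are hypothetical, and the page does not say whether kham's test suite or scripts/compare_pythainlp.py scores segmentation this way.

```python
def spans(tokens):
    """Convert a token list into a set of (start, end) character spans."""
    out, pos = set(), 0
    for tok in tokens:
        out.add((pos, pos + len(tok)))
        pos += len(tok)
    return out


def segmentation_f1(predicted, gold):
    """Span-level F1 between a predicted and a gold segmentation.

    Hypothetical helper for illustration; not part of kham or PyThaiNLP.
    """
    pred, ref = spans(predicted), spans(gold)
    correct = len(pred & ref)  # spans that match exactly on both ends
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(ref)
    return 2 * precision * recall / (precision + recall)


gold = ["ฉัน", "กิน", "ข้าว"]
predicted = ["ฉัน", "กินข้าว"]  # the last two words were merged
score = segmentation_f1(predicted, gold)
print(f"F1 = {score:.3f}")
```

In this example only one of the two predicted spans matches a gold span, so precision is 1/2, recall is 1/3, and F1 is 0.4. A score of 1.000 means every predicted word boundary matched the reference on every test case.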

Get started in 60 seconds

Pick your platform — the API is consistent across all targets.

Cargo.toml:
[dependencies]
kham-core = "0.8"

pip:
pip install kham

npm:
npm install kham-wasm

Docker:
docker run --rm -e POSTGRES_PASSWORD=secret \
  -p 5432:5432 nickmsft/kham-pg:latest

Try Thai NLP in your browser

Powered by kham-wasm — no server, no install required.

Ready to add Thai NLP to your project?

Free, open-source, MIT OR Apache-2.0. No API keys, no rate limits, no server required.