Getting Started
kham v0.8.2 — pick your target and be up and running in minutes.
📦
Pre-built binaries available
Download ready-to-use binaries from the GitHub Releases page — no Rust toolchain required.
| Package | Platforms | File or image |
|---|---|---|
| kham-cli | macOS arm64 / x86_64 · Linux x86_64 · Windows x86_64 | kham-cli-<version>-<target>.tar.gz / .zip |
| kham-sqlite | macOS · Linux · Windows · Android (arm64-v8a, armeabi-v7a, x86_64, x86) | libkham_sqlite.dylib / .so / .dll |
| kham-pg | Linux x86_64 & aarch64 · PostgreSQL 14–18 | kham-pg-<version>-pg<N>-<arch>-unknown-linux-gnu.tar.gz |
| kham-pg Docker | linux/amd64 & arm64 · PostgreSQL 14–18 | nickmsft/kham-pg:<version>-pg<N> |
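When scripting a download, the kham-pg naming patterns in the table can be filled in programmatically. The sketch below is only an illustration: the helper names `pg_archive_name` and `pg_docker_tag` are hypothetical and not part of kham. Only the two patterns, the PostgreSQL 14 to 18 range, and the two Linux architectures are taken from the table. Whether `<version>` carries a leading `v` is not stated here, so check the Releases page before relying on the result.

```python
SUPPORTED_PG = range(14, 19)            # PostgreSQL 14 through 18, per the table
SUPPORTED_ARCH = ("x86_64", "aarch64")  # Linux architectures, per the table


def _check_pg(pg_major: int) -> None:
    """Reject PostgreSQL major versions the table does not list."""
    if pg_major not in SUPPORTED_PG:
        raise ValueError(f"unsupported PostgreSQL major version: {pg_major}")


def pg_archive_name(version: str, pg_major: int, arch: str) -> str:
    """Fill in kham-pg-<version>-pg<N>-<arch>-unknown-linux-gnu.tar.gz."""
    _check_pg(pg_major)
    if arch not in SUPPORTED_ARCH:
        raise ValueError(f"unsupported architecture: {arch}")
    return f"kham-pg-{version}-pg{pg_major}-{arch}-unknown-linux-gnu.tar.gz"


def pg_docker_tag(version: str, pg_major: int) -> str:
    """Fill in nickmsft/kham-pg:<version>-pg<N>."""
    _check_pg(pg_major)
    return f"nickmsft/kham-pg:{version}-pg{pg_major}"


if __name__ == "__main__":
    print(pg_archive_name("0.8.2", 16, "x86_64"))
    print(pg_docker_tag("0.8.2", 16))
```

Validating the PostgreSQL version and architecture up front means a typo fails with a clear error rather than a 404 from the download step.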
🦀
Rust
Native crate — zero-copy, no_std core
Install
[dependencies]
kham-core = "0.8"
Usage
use kham_core::Tokenizer;
fn main() {
    let tok = Tokenizer::new();
    // Segment into &str slices (zero-copy)
    let tokens = tok.segment("กินข้าวกับปลา");
    println!("{:?}", tokens);
    // ["กิน", "ข้าว", "กับ", "ปลา"]
    // Rich tokens with kind + byte/char spans
    for t in tok.segment("Hello กรุงเทพ 2024") {
        println!("{:?} kind={:?} chars={}..{}", t.text, t.kind, t.char_span.start, t.char_span.end);
    }
}
🐍
Python
PyO3 bindings — segment() and segment_tokens()
Install
pip install kham
Usage
import kham
# Segment into a list of strings
tokens = kham.segment("กินข้าวกับปลา")
print(tokens)
# ['กิน', 'ข้าว', 'กับ', 'ปลา']
# Rich Token objects
for tok in kham.segment_tokens("Hello กรุงเทพ 2024"):
    print(tok.text, tok.kind, tok.char_start, tok.char_end)
🌐
WebAssembly / npm
Runs in browser and Node.js — no server needed
Install
npm install kham-wasm
Usage
import init, { segment, segment_tokens } from 'kham-wasm';
// Initialise once (fetches the .wasm file)
await init();
const words = segment("กินข้าวกับปลา");
console.log(words);
// ["กิน", "ข้าว", "กับ", "ปลา"]
const tokens = segment_tokens("Hello กรุงเทพ 2024");
tokens.forEach(t => console.log(t.text, t.kind, t.char_start, t.char_end));
💻
CLI
Command-line Thai segmenter — segment, spell-check, extract keywords, romanize
Install
cargo install kham-cli
Usage
# Segment Thai text
kham "กินข้าวกับปลา"
# กิน|ข้าว|กับ|ปลา
# Show token kind and confidence score
kham --kind --confidence "กินข้าวกับปลา"
# กิน:Thai:conf=0.92|ข้าว:Thai:conf=0.98|กับ:Thai:conf=1.00|ปลา:Thai:conf=0.99
# FTS mode: kind, part of speech (POS), named entity (NE), stop-word flag, and synonyms per token
kham --fts "กินข้าวกับปลา"
# Spell-check a word
kham --spell "กีนข้าว"
# Extract keywords and keyphrases
kham --keywords "นักวิทยาศาสตร์ค้นพบดาวเคราะห์ใหม่ในระบบสุริยะ"
# Romanize Thai to RTGS Latin
kham --romanize "กินข้าวกับปลา"
# kin khao kap pla
🐘
PostgreSQL FTS
Full-text search parser extension for PostgreSQL 14+
Install
# Build and install (requires pg_config in PATH)
cargo build -p kham-pg --release
make -C kham-pg install
Usage
-- Load extension
LOAD 'kham';
-- Create a text search configuration
CREATE TEXT SEARCH CONFIGURATION kham_cfg (PARSER = kham);
ALTER TEXT SEARCH CONFIGURATION kham_cfg
    ADD MAPPING FOR thai WITH simple;
-- Index and search
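CREATE TABLE docs (id SERIAL, body TEXT, tsv TSVECTOR);
-- Sample row so the UPDATE and the search below have data to work on
INSERT INTO docs (body) VALUES ('กินข้าวกับปลา');
UPDATE docs SET tsv = to_tsvector('kham_cfg', body);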
CREATE INDEX docs_tsv ON docs USING GIN(tsv);
SELECT id, ts_headline('kham_cfg', body, query) AS snippet
FROM docs, to_tsquery('kham_cfg', 'ข้าว') query
WHERE tsv @@ query;
🗃️
SQLite FTS5
Loadable tokenizer extension for SQLite FTS5
Install
# Build (requires SQLite headers)
cargo build -p kham-sqlite --release
# macOS: run brew install sqlite first (the system sqlite3 disables load_extension)
Usage
.load ./target/release/libkham_sqlite
CREATE VIRTUAL TABLE docs USING fts5(
    body,
    tokenize = 'kham'
);
INSERT INTO docs VALUES ('กินข้าวกับปลา');
INSERT INTO docs VALUES ('กรุงเทพมหานครเป็นเมืองหลวง');
SELECT * FROM docs WHERE docs MATCH 'ปลา';
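The same steps can also be driven from a script instead of the sqlite3 shell. The sketch below uses Python's standard `sqlite3` module and is an illustration, not part of kham: the helper names are hypothetical, and the extension path simply mirrors the `.load` line above. The caveat noted for macOS applies here too, since some Python builds ship without extension loading, so the code checks for it before trying.

```python
import sqlite3


def can_load_extensions(conn: sqlite3.Connection) -> bool:
    """Return True if this Python build exposes SQLite extension loading."""
    return hasattr(conn, "enable_load_extension")


def open_with_kham(db_path: str, ext_path: str) -> sqlite3.Connection:
    """Open a database and load the tokenizer extension from ext_path."""
    conn = sqlite3.connect(db_path)
    if not can_load_extensions(conn):
        conn.close()
        raise RuntimeError("this sqlite3 build cannot load extensions")
    conn.enable_load_extension(True)
    try:
        conn.load_extension(ext_path)
    except sqlite3.OperationalError:
        conn.close()
        raise
    # Switch loading back off once the extension is in place.
    conn.enable_load_extension(False)
    return conn


def demo(ext_path: str = "./target/release/libkham_sqlite") -> list:
    """Repeat the shell session above; assumes the extension has been built."""
    conn = open_with_kham(":memory:", ext_path)
    conn.execute("CREATE VIRTUAL TABLE docs USING fts5(body, tokenize = 'kham')")
    conn.execute("INSERT INTO docs VALUES (?)", ("กินข้าวกับปลา",))
    conn.execute("INSERT INTO docs VALUES (?)", ("กรุงเทพมหานครเป็นเมืองหลวง",))
    rows = conn.execute("SELECT * FROM docs WHERE docs MATCH ?", ("ปลา",)).fetchall()
    conn.close()
    return rows
```

Call `demo()` after building the extension. If it raises `RuntimeError`, the Python interpreter was built without loadable-extension support and a different build is needed.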