🌐
WebAssembly / npm
kham-wasm brings Thai word segmentation to the browser and Node.js with zero server dependencies. The WASM binary is ~300 KB and includes the full dictionary.
Install
npm install kham-wasm
# or: yarn add kham-wasm / pnpm add kham-wasm
Browser — ES module
Import init and await it once to load the .wasm file. After that, segment() and segment_tokens() can be called directly, as often as needed.
import init, { segment, segment_tokens } from 'kham-wasm';
// Load the WebAssembly module (fetches kham_wasm_bg.wasm)
await init();
// Segment into strings
const words = segment("กินข้าวกับปลา");
console.log(words);
// ["กิน", "ข้าว", "กับ", "ปลา"]
// Rich tokens
const tokens = segment_tokens("Hello กรุงเทพ 2024");
tokens.forEach(t => {
  console.log(t.text, t.kind, t.char_start, t.char_end);
});
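One common use of the word list is to insert zero-width spaces (U+200B) between words so the browser can wrap Thai text at word boundaries. The following is a minimal sketch, assuming that segment() returns the words in order and that they concatenate back to the input. The Segmenter type, addBreakHints, and fakeSegment are hypothetical names introduced for illustration and are not part of kham-wasm.

```typescript
// Hypothetical signature matching how segment() is called above.
type Segmenter = (text: string) => string[];

// U+200B ZERO WIDTH SPACE: invisible, but a valid line-break opportunity.
const ZWSP = '\u200B';

// Join the segmented words with zero-width spaces.
function addBreakHints(text: string, segment: Segmenter): string {
  return segment(text).join(ZWSP);
}

// Stand-in segmenter for illustration only; in an app, pass the real
// segment function after init() has resolved.
const fakeSegment: Segmenter = () => ['กิน', 'ข้าว', 'กับ', 'ปลา'];

const hinted = addBreakHints('กินข้าวกับปลา', fakeSegment);
console.log(hinted.split(ZWSP).length); // 4
```

Passing the segmenter as a parameter keeps the helper independent of when the WASM module finishes loading.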
Lazy loading (recommended)
Load the WASM in the background so it does not block the initial page render.
// Cache the in-flight promise so concurrent callers share one init() call.
let khamPromise: Promise<typeof import('kham-wasm')> | null = null;
function getKham() {
  if (!khamPromise) {
    khamPromise = import('kham-wasm')
      .then(async (mod) => {
        await mod.default(); // init: loads the .wasm file
        return mod;
      })
      .catch((err) => {
        khamPromise = null; // allow a retry after a failed load
        throw err;
      });
  }
  return khamPromise;
}
// Preload silently in the background
getKham().catch(console.error);
async function onSegment(text: string) {
  const { segment_tokens } = await getKham();
  return segment_tokens(text);
}
Token fields
// Token object returned by segment_tokens()
tok.text // string — the token text
tok.kind // string — "Thai" | "Latin" | "Number" | "Punctuation" | "Emoji" | "Whitespace" | "Unknown"
tok.char_start // number — Unicode scalar-value start (JS string index for BMP text)
tok.char_end // number — Unicode scalar-value end
tok.byte_start // number — UTF-8 byte start offset
tok.byte_end // number — UTF-8 byte end offset
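Because char_start and char_end count Unicode scalar values, they match JavaScript string indices only when the text stays within the BMP. Characters outside it, such as most emoji, occupy two UTF-16 code units. The sketch below shows one way to slice the original string safely from either pair of offsets. TokenSpan, scalarToUtf16, sliceByChars, and sliceByBytes are hypothetical helpers, not part of kham-wasm, and the sample token is built by hand rather than produced by the library.

```typescript
// Hypothetical shape mirroring the offset fields listed above.
interface TokenSpan {
  char_start: number;
  char_end: number;
  byte_start: number;
  byte_end: number;
}

// Convert an offset counted in Unicode scalar values to a UTF-16 index.
function scalarToUtf16(text: string, scalarOffset: number): number {
  let utf16 = 0;
  let scalars = 0;
  for (const ch of text) {          // string iteration walks code points
    if (scalars === scalarOffset) break;
    utf16 += ch.length;             // 1 unit for BMP, 2 for astral characters
    scalars += 1;
  }
  return utf16;
}

// Slice using the scalar-value offsets.
function sliceByChars(text: string, tok: TokenSpan): string {
  return text.slice(
    scalarToUtf16(text, tok.char_start),
    scalarToUtf16(text, tok.char_end),
  );
}

// Slice using the UTF-8 byte offsets.
function sliceByBytes(text: string, tok: TokenSpan): string {
  const bytes = new TextEncoder().encode(text);
  return new TextDecoder().decode(bytes.subarray(tok.byte_start, tok.byte_end));
}

// Hand-built example: the emoji is 1 scalar, 2 UTF-16 units, 4 UTF-8 bytes.
const sample = '😀ไทย';
const thaiSpan: TokenSpan = { char_start: 1, char_end: 4, byte_start: 4, byte_end: 13 };
console.log(sliceByChars(sample, thaiSpan)); // "ไทย"
console.log(sliceByBytes(sample, thaiSpan)); // "ไทย"
```

For text known to be BMP-only, a plain text.slice(tok.char_start, tok.char_end) gives the same result without the conversion pass.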
Build from source
# Requires Rust + wasm-pack
curl https://rustwasm.github.io/wasm-pack/installer/init.sh -sSf | sh
git clone https://github.com/preedep/kham
cd kham
wasm-pack build kham-wasm --target web --release
# Output: kham-wasm/pkg/