
WebAssembly / npm

kham-wasm brings Thai word segmentation to the browser and Node.js with zero server dependencies. The WASM binary is ~300 KB and includes the full dictionary.

Install

npm install kham-wasm
# or: yarn add kham-wasm  /  pnpm add kham-wasm

Browser — ES module

Import init and call it once to load the .wasm file. After it resolves, segment() and segment_tokens() can be called synchronously as often as needed.

import init, { segment, segment_tokens } from 'kham-wasm';

// Load the WebAssembly module (fetches kham_wasm_bg.wasm)
await init();

// Segment into strings
const words = segment("กินข้าวกับปลา");
console.log(words);
// ["กิน", "ข้าว", "กับ", "ปลา"]

// Rich tokens
const tokens = segment_tokens("Hello กรุงเทพ 2024");
tokens.forEach(t => {
  console.log(t.text, t.kind, t.char_start, t.char_end);
});
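A common next step is to drop whitespace and punctuation before indexing or counting words. The sketch below assumes tokens behave like a plain array of objects with the fields used above. The Token interface and the wordsOnly helper are hypothetical and not part of kham-wasm, and the sample array is hand-written for illustration, not real segmenter output.

```typescript
// Assumed token shape, limited to the fields this example reads.
interface Token {
  text: string;
  kind: string;
  char_start: number;
  char_end: number;
}

// Hypothetical helper: keep only tokens that carry content.
function wordsOnly(tokens: Token[]): string[] {
  return tokens
    .filter((t) => t.kind !== 'Whitespace' && t.kind !== 'Punctuation')
    .map((t) => t.text);
}

// Hand-written sample standing in for a segment_tokens() result.
const sampleTokens: Token[] = [
  { text: 'Hello', kind: 'Latin', char_start: 0, char_end: 5 },
  { text: ' ', kind: 'Whitespace', char_start: 5, char_end: 6 },
  { text: 'กรุงเทพ', kind: 'Thai', char_start: 6, char_end: 13 },
  { text: ' ', kind: 'Whitespace', char_start: 13, char_end: 14 },
  { text: '2024', kind: 'Number', char_start: 14, char_end: 18 },
];

console.log(wordsOnly(sampleTokens));
```

In an application, the array passed to wordsOnly would come from segment_tokens(text) after init() has resolved.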

Lazy loading (recommended)

Load the WASM in the background so it does not block the initial page render.

let kham: typeof import('kham-wasm') | null = null;

async function getKham() {
  if (kham) return kham;
  const mod = await import('kham-wasm');
  await mod.default();       // init
  kham = mod;
  return kham;
}

// Preload silently in the background
getKham().catch(console.error);

async function onSegment(text: string) {
  const { segment_tokens } = await getKham();
  return segment_tokens(text);
}
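getKham() caches the module only after loading has finished, so a background preload and an early user action may each start their own import and init. One way to avoid that is to cache the in-flight promise instead. The once helper below is a hypothetical utility, not part of kham-wasm, and the loader is a stand-in so the sketch runs on its own.

```typescript
// Wrap an async loader so it runs at most once, even under concurrent calls.
function once<T>(load: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | null = null;
  return () => {
    if (!pending) {
      pending = load().catch((err) => {
        pending = null; // allow a retry after a failed load
        throw err;
      });
    }
    return pending;
  };
}

// Stand-in loader. With kham-wasm it could look like:
//   async () => { const mod = await import('kham-wasm'); await mod.default(); return mod; }
let loads = 0;
const getModule = once(async () => {
  loads += 1;
  return { ready: true };
});
```

Every caller that arrives while the load is still running receives the same promise, so the loader body executes once. Clearing the cache on failure lets a later call retry.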

Token fields

// Token object returned by segment_tokens()
tok.text        // string — the token text
tok.kind        // string — "Thai" | "Latin" | "Number" | "Punctuation" | "Emoji" | "Whitespace" | "Unknown"
tok.char_start  // number — Unicode scalar-value start (JS string index for BMP text)
tok.char_end    // number — Unicode scalar-value end
tok.byte_start  // number — UTF-8 byte start offset
tok.byte_end    // number — UTF-8 byte end offset
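Because char_start and char_end count Unicode scalar values, they match JavaScript string indices only while the text stays inside the BMP. A character outside the BMP, such as most emoji, takes two UTF-16 code units in a JS string, so every later offset shifts. The sketch below converts a scalar-value offset into a UTF-16 index before slicing. Both helpers are hypothetical and not part of kham-wasm.

```typescript
// Convert a Unicode scalar-value offset into a UTF-16 code-unit index,
// which is what String.prototype.slice expects.
function scalarToUtf16Index(text: string, scalarOffset: number): number {
  let units = 0;
  let scalars = 0;
  for (const ch of text) {      // string iteration walks code points
    if (scalars === scalarOffset) break;
    units += ch.length;         // 1 for BMP characters, 2 for astral ones
    scalars += 1;
  }
  return units;
}

// Slice `text` using scalar-value offsets such as char_start / char_end.
function sliceByScalars(text: string, start: number, end: number): string {
  return text.slice(
    scalarToUtf16Index(text, start),
    scalarToUtf16Index(text, end),
  );
}

const mixed = '😀กิน';
console.log(sliceByScalars(mixed, 1, 4)); // "กิน"
```

Here the emoji occupies scalar offset 0 but UTF-16 indices 0 and 1, so mixed.slice(1, 4) would cut through the surrogate pair, while the converted indices recover the Thai word intact.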

Build from source

# Requires Rust + wasm-pack
curl https://rustwasm.github.io/wasm-pack/installer/init.sh -sSf | sh

git clone https://github.com/preedep/kham
cd kham
wasm-pack build kham-wasm --target web --release
# Output: kham-wasm/pkg/
