Skip to main content

Getting Started

kham v0.8.2 — pick your target and be up and running in minutes.

📦

Pre-built binaries available

Download ready-to-use binaries from the GitHub Releases page — no Rust toolchain required.

Package Platforms File
kham-cli macOS arm64 / x86_64 · Linux x86_64 · Windows x86_64 kham-cli-<version>-<target>.tar.gz / .zip
kham-sqlite macOS · Linux · Windows · Android (arm64-v8a, armeabi-v7a, x86_64, x86) libkham_sqlite.dylib / .so / .dll
kham-pg Linux x86_64 & aarch64 · PostgreSQL 14–18 kham-pg-<version>-pg<N>-<arch>-unknown-linux-gnu.tar.gz
kham-pg Docker linux/amd64 & arm64 · PostgreSQL 14–18 nickmsft/kham-pg:<version>-pg<N>
🦀

Rust

Native crate — zero-copy, no_std core

Install

[dependencies]
kham-core = "0.8"

Usage

use kham_core::Tokenizer;

fn main() {
    let tok = Tokenizer::new();

    // Segment into &str slices (zero-copy)
    let tokens = tok.segment("กินข้าวกับปลา");
    println!("{:?}", tokens);
    // ["กิน", "ข้าว", "กับ", "ปลา"]

    // Rich tokens with kind + byte/char spans
    for t in tok.segment("Hello กรุงเทพ 2024") {
        println!("{:?}  kind={:?}  chars={}..{}", t.text, t.kind, t.char_span.start, t.char_span.end);
    }
}
Full integration guide →
🐍

Python

PyO3 bindings — segment() and segment_tokens()

Install

pip install kham

Usage

import kham

# Segment into a list of strings
tokens = kham.segment("กินข้าวกับปลา")
print(tokens)
# ['กิน', 'ข้าว', 'กับ', 'ปลา']

# Rich Token objects
for tok in kham.segment_tokens("Hello กรุงเทพ 2024"):
    print(tok.text, tok.kind, tok.char_start, tok.char_end)
Full integration guide →
🌐

WebAssembly / npm

Runs in browser and Node.js — no server needed

Install

npm install kham-wasm

Usage

import init, { segment, segment_tokens } from 'kham-wasm';

// Initialise once (fetches the .wasm file)
await init();

const words = segment("กินข้าวกับปลา");
console.log(words);
// ["กิน", "ข้าว", "กับ", "ปลา"]

const tokens = segment_tokens("Hello กรุงเทพ 2024");
tokens.forEach(t => console.log(t.text, t.kind, t.char_start, t.char_end));
Full integration guide →
💻

CLI

Command-line Thai segmenter — segment, spell-check, extract keywords, romanize

Install

cargo install kham-cli

Usage

# Segment Thai text
kham "กินข้าวกับปลา"
# กิน|ข้าว|กับ|ปลา

# Show token kind and confidence score
kham --kind --confidence "กินข้าวกับปลา"
# กิน:Thai:conf=0.92|ข้าว:Thai:conf=0.98|กับ:Thai:conf=1.00|ปลา:Thai:conf=0.99

# FTS mode — kind, POS, NE, stop, syn per token
kham --fts "กินข้าวกับปลา"

# Spell-check a word
kham --spell "กีนข้าว"

# Extract keywords and keyphrases
kham --keywords "นักวิทยาศาสตร์ค้นพบดาวเคราะห์ใหม่ในระบบสุริยะ"

# Romanize Thai to RTGS Latin
kham --romanize "กินข้าวกับปลา"
# kin khao kap pla
Full integration guide →
🐘

PostgreSQL FTS

Full-text search parser extension for PostgreSQL 14+

Install

# Build and install (requires pg_config in PATH)
cargo build -p kham-pg --release
make -C kham-pg install

Usage

-- Load extension
LOAD 'kham';

-- Create a text search configuration
CREATE TEXT SEARCH CONFIGURATION kham_cfg (PARSER = kham);
ALTER TEXT SEARCH CONFIGURATION kham_cfg
  ADD MAPPING FOR thai WITH simple;

-- Index and search
CREATE TABLE docs (id SERIAL, body TEXT, tsv TSVECTOR);
UPDATE docs SET tsv = to_tsvector('kham_cfg', body);
CREATE INDEX docs_tsv ON docs USING GIN(tsv);

SELECT id, ts_headline('kham_cfg', body, query) AS snippet
FROM docs, to_tsquery('kham_cfg', 'ข้าว') query
WHERE tsv @@ query;
Full integration guide →
🗃️

SQLite FTS5

Loadable tokenizer extension for SQLite FTS5

Install

# Build (requires SQLite headers)
cargo build -p kham-sqlite --release
# macOS: brew install sqlite  (system sqlite3 disables load_extension)

Usage

.load ./target/release/libkham_sqlite

CREATE VIRTUAL TABLE docs USING fts5(
  body,
  tokenize = 'kham'
);

INSERT INTO docs VALUES ('กินข้าวกับปลา');
INSERT INTO docs VALUES ('กรุงเทพมหานครเป็นเมืองหลวง');

SELECT * FROM docs WHERE docs MATCH 'ปลา';
Full integration guide →