Thai Full-Text Search
that Actually Works
Thai text has no spaces. Standard FTS engines produce garbage results. kham replaces the parser with a proper Thai segmenter — phonetic matching, stopwords, and POS filtering included.
The problem with standard FTS for Thai
Thai text has no whitespace between words. Standard FTS parsers — including PostgreSQL's built-in parser and SQLite's default tokenizer — split text on whitespace. On Thai text, this means the entire sentence becomes a single un-searchable token.
How kham solves it
kham replaces the parser with a proper Thai segmenter. Each word is a separate, searchable token, enriched with phonetic codes and POS tags.
'4800':4 'pla':4 'pos_noun':4
Search features
Every feature works in both PostgreSQL and SQLite.
Phonetic matching
Thai soundex codes (lk82 / udom83 / MetaSound) stored alongside each word — near-homophones and misspellings match automatically.
Stopword suppression
600+ Thai stopwords suppressed from the index. Common words like กับ, และ, ใน, ที่ do not pollute result ranking.
POS lexeme filtering
Part-of-speech tags are stored as colocated lexemes — filter results to nouns, verbs, or named entities in one tsquery.
RTGS romanization
Latin-script queries match Thai documents via RTGS romanization. Search "pla" and find ปลา automatically.
Number normalization
Thai digit strings (๑๒๓) indexed alongside ASCII equivalents (123) — cross-script numeric queries always match.
High throughput
PostgreSQL: 15M+ ops/s. SQLite FTS5: 4–6 MiB/s bulk index throughput. Both benchmarked on Apple M-series.
PostgreSQL — native text search parser
kham-pg is a PostgreSQL text-search parser extension. It integrates with to_tsvector, ts_headline, and ts_rank natively.
Supports PostgreSQL 14–18 with pre-built Docker images for fastest setup.
# PostgreSQL 17 with kham_pg pre-installed — no Rust toolchain needed
# Available tags: v0.8.2-pg14 v0.8.2-pg16 v0.8.2-pg17 v0.8.2-pg18
docker run --rm \
-e POSTGRES_PASSWORD=secret \
-p 5432:5432 \
nickmsft/kham-pg:v0.8.2-pg17
# Connect and use immediately
psql -h localhost -U postgres -c "
CREATE EXTENSION kham_pg;
SELECT to_tsvector('kham', 'กินข้าวกับปลา');
" -- 1. Create the extension
CREATE EXTENSION kham_pg;
-- 2. Create a table with a tsvector column
CREATE TABLE articles (
id SERIAL PRIMARY KEY,
title TEXT,
body TEXT,
tsv TSVECTOR
);
-- 3. Populate the index
UPDATE articles
SET tsv = to_tsvector('kham', coalesce(title,'') || ' ' || coalesce(body,''));
-- 4. GIN index for fast queries
CREATE INDEX articles_tsv_idx ON articles USING GIN(tsv);
-- 5. Search — phonetic match "pla" finds ปลา
SELECT id, ts_headline('kham', body, query, 'StartSel=<b>, StopSel=</b>') AS snippet
FROM articles, to_tsquery('kham', 'pla') AS query
WHERE tsv @@ query
ORDER BY ts_rank(tsv, query) DESC
LIMIT 10; SQLite — offline-first Thai search
kham-sqlite is a loadable FTS5 tokenizer extension. Ideal for mobile apps, desktop apps, and edge deployments where there is no database server. Runs on Android, iOS, macOS, Linux, and Windows.
-- 1. Load the extension
.load ./target/release/libkham_sqlite
-- 2. Create an FTS5 virtual table
CREATE VIRTUAL TABLE docs USING fts5(
title,
body,
tokenize = 'kham soundex lk82 stopwords on'
);
-- 3. Insert documents
INSERT INTO docs VALUES
('อาหารไทย', 'กินข้าวกับปลาและผัก'),
('เมืองหลวง', 'กรุงเทพมหานครเป็นเมืองหลวง');
-- 4. Search — BM25 ranked results
SELECT title, bm25(docs) AS score
FROM docs
WHERE docs MATCH 'ปลา'
ORDER BY score;
-- 5. Snippet highlighting
SELECT title, snippet(docs, 1, '<b>', '</b>', '…', 10)
FROM docs
WHERE docs MATCH 'กรุงเทพ'; Performance
Add Thai search to your app today
Free, open-source, MIT OR Apache-2.0. Docker images for zero-setup PostgreSQL — no Rust toolchain required.