§ nôm · 2026 · apache 2.0 · pip install nom-vn
Nôm
v0.2 · public

Built for Vietnamese.

Open-source Python toolkit for building Vietnamese AI applications.

Every team building Vietnamese AI re-implements OCR, text utilities, and retrieval. Nôm packages them as one library: local-first, your LLM, your hardware. One pip install, and you focus on the product.

§ 01 · what's inside

One library. Three things you'll reach for.

Nôm wraps Vietnamese-specific NLP and a local-first RAG pipeline into one package. Use a single utility, the full RAG, or `nom serve` for the chat web app. Backed by your LLM — Ollama by default; OpenAI, Anthropic, llama.cpp also supported.

nom.text · Apache 2.0
diacritics · normalize · tokenize · pip install nom-vn

NFC normalization, diacritic restoration (zero-dependency rules, an Apache-licensed T5 model, or any LLM), and word segmentation. The small primitives every Vietnamese pipeline needs, with the benchmarks to pick the right one.
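To make the first primitive concrete, here is a minimal sketch that uses only Python's standard `unicodedata` module. It is not Nôm's own API; the helper names `to_nfc` and `strip_diacritics` are made up for illustration. It shows why NFC matters (the same Vietnamese text can arrive as different code point sequences) and what the unaccented input to diacritic restoration looks like.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Compose base letters and combining marks into precomposed characters."""
    return unicodedata.normalize("NFC", text)


def strip_diacritics(text: str) -> str:
    """Remove tone and vowel marks, producing the kind of unaccented text
    that a diacritic restorer would take as input."""
    decomposed = unicodedata.normalize("NFD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # "đ" / "Đ" have no canonical decomposition, so they are mapped by hand.
    return no_marks.replace("đ", "d").replace("Đ", "D")


composed = to_nfc("Tiếng Việt")
decomposed = unicodedata.normalize("NFD", composed)

# Same text on screen, different code points underneath.
print(composed == decomposed)          # False
print(to_nfc(decomposed) == composed)  # True
print(strip_diacritics(composed))      # Tieng Viet
```

Normalizing once at ingestion keeps string comparison, deduplication, and lexical search from silently treating the two encodings as different words.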

nom.doc + nom.rag · Apache 2.0
documents → answers · pip install "nom-vn[doc,embeddings,llm]"

PDF, DOCX, XLSX, PPTX, HTML, scanned images. Wired into a Vietnamese-aware RAG with hybrid retrieval, cross-encoder reranking, and clickable citations. Bring any LLM.
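Hybrid retrieval means merging a lexical ranking (for example BM25) with a dense embedding ranking. One common way to merge them is reciprocal rank fusion, sketched below in plain Python. Whether Nôm fuses rankings this way is an assumption, not something this page states; the function name, the constant `k = 60`, and the document IDs are all illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one list.

    Each document scores 1 / (k + rank) for every list it appears in
    (rank starts at 1), and the scores are summed across lists.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings from a lexical and a dense retriever.
lexical_hits = ["contract_07", "contract_12", "contract_03"]
dense_hits = ["contract_12", "contract_03", "contract_07"]

fused = reciprocal_rank_fusion([lexical_hits, dense_hits])
print(fused)
```

Rank-based fusion avoids having to calibrate BM25 scores against cosine similarities, which live on different scales. A cross-encoder reranker would then rescore only the top few fused candidates.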

nom serve · Apache 2.0
chat web app · pip install "nom-vn[chat]"

FastAPI server + React UI baked into the wheel. One command, opens on localhost:8080. Drop in PDFs, ask in Vietnamese, get answers with citations grounded in your documents.

§ 02 · why nôm

A thousand years of Vietnamese script.

Chữ Nôm was the script Vietnamese people used to write their own language for over a thousand years — until the Latin-based Quốc Ngữ replaced it in the 20th century. Truyện Kiều by Nguyễn Du, the foundational work of Vietnamese literature, was written in Nôm. So were Nguyễn Trãi's poems and Hồ Xuân Hương's verse. The script carries a literary tradition older than most living languages have on paper.

Naming a 2026 toolkit after a 13th-century script isn't nostalgia. It's a thesis: Vietnam has always written its own language with its own instruments. Nôm is the next instrument in that tradition — open source, runs locally, doesn't depend on a foreign cloud.

Released under Apache 2.0. Reproducible benchmarks. Models we publish ship with their training recipe. Runs on the hardware you already own.

“Nôm is our script, for our language, by our hand.” The spirit of reviving a script, applied to a toolkit.

§ 03 · measurement

Measured in numbers, not words.

Nôm doesn't replace your LLM — it adds the Vietnamese-specific layer (text normalization, diacritics, tokenization, retrieval, OCR) on top. Every default ships with a measured number from a script in benchmarks/ that runs on a clean clone.

no numbers · no estimates · no placeholders

Numbers live with the code, not on this page — that way they don't drift. The receipts: per-task pages in docs/tasks/, the consolidated docs/benchmark.md, and reproducible scripts in benchmarks/. We don't ship a number we haven't measured.

§ 04 · what people use nôm for

What Nôm does well.

contracts

Internal contract Q&A

Drop 200 contract PDFs onto a company server. Ask: "How many contracts have penalty clauses above 10%?" Get answers with contract numbers and pages. Nothing leaves your network.

official docs

Official document summarization & extraction

Extract the document number, issue date, issuing body, and key content. Vietnamese OCR with diacritics, accurate even on faded faxes and old scans.

assistant

Internal assistant on a company server

Deploy on a single GPU box. Plug into internal docs, calendar, ticketing. Security level: 'never leaves the LAN.'

rag

RAG for Vietnamese documents

Tokenizer that understands tone marks, compound words, EN/VN code-switching. High-quality embeddings for Vietnamese — not English run through translation.

§ 05 · quickstart

Three lines to a Vietnamese RAG.

Real, working code — pip install nom-vn. The docs cover every backend, install extra, and recipe.

pip install nom-vn                # text + chunking + retrieve + rag
pip install "nom-vn[doc]"         # + PDF / Office / OCR parsers
pip install "nom-vn[embeddings]"  # + sentence-transformers
pip install "nom-vn[llm]"         # + Ollama / OpenAI adapters
pip install "nom-vn[chat]"        # + FastAPI / React UI (everything above)
pip install "nom-vn[all]"         # the lot

# Then, for the chat web app:
nom serve   # opens http://localhost:8080

§ 06 · install & community

Install. Cite. Join in.

Apache 2.0: use it, fork it, ship it. If you publish work that uses Nôm, the citation block below is the canonical form.

Track progress, file issues, contribute:

# Install
pip install nom-vn          # core
pip install "nom-vn[all]"   # everything

# Cite
@software{nom2026,
  title  = {Nôm: an open Python toolkit for Vietnamese AI applications},
  author = {Nguyen, Viet-Anh and {Neural Research Lab}},
  year   = {2026},
  url    = {https://nrl.ai/nom},
  note   = {Apache 2.0}
}