Internal contract Q&A
Drop 200 contract PDFs onto a company server. Ask: "How many contracts have penalty clauses above 10%?" Get answers with contract numbers and pages. Nothing leaves your network.
Open-source Python toolkit for building Vietnamese AI applications.
Every team building Vietnamese AI re-implements OCR, text utilities, retrieval. Nôm packages them as one library — local-first, your LLM, your hardware. One pip install — you focus on the product.
Nôm wraps Vietnamese-specific NLP and a local-first RAG pipeline into one package. Use a single utility, the full RAG, or `nom serve` for the chat web app. Backed by your LLM — Ollama by default; OpenAI, Anthropic, llama.cpp also supported.
NFC normalization, diacritic restoration (zero-dep rule, Apache-licensed T5 model, or any LLM), word segmentation. The small primitives every Vietnamese pipeline needs — and the receipts to pick the right one.
PDF, DOCX, XLSX, PPTX, HTML, scanned images. Wired into a Vietnamese-aware RAG with hybrid retrieval, cross-encoder reranking, and clickable citations. Bring any LLM.
FastAPI server + React UI baked into the wheel. One command, opens on localhost:8080. Drop in PDFs, ask in Vietnamese, get answers with citations grounded in your documents.
Chữ Nôm was the script Vietnamese people used to write their own language for over a thousand years — until the Latin-based Quốc Ngữ replaced it in the 20th century. Truyện Kiều by Nguyễn Du, the foundational work of Vietnamese literature, was written in Nôm. So were Nguyễn Trãi's poems and Hồ Xuân Hương's verse. The script carries a literary tradition older than most living languages have on paper.
Naming a 2026 toolkit after a 13th-century script isn't nostalgia. It's a thesis: Vietnam has always written its own language with its own instruments. Nôm is the next instrument in that tradition — open source, runs locally, doesn't depend on a foreign cloud.
Released under Apache 2.0. Reproducible benchmarks. Models we publish ship with their training recipe. Runs on the hardware you already own.
“Nôm is our script, for our language, by our hand.” — the spirit of reviving a script, applied to a toolkit.
Nôm doesn't replace your LLM — it adds the Vietnamese-specific layer (text normalization, diacritics, tokenization, retrieval, OCR) on top. Every default ships with a measured number from a script in benchmarks/ that runs on a clean clone.
Numbers live with the code, not on this page — that way they don't drift. The receipts: per-task pages in docs/tasks/, the consolidated docs/benchmark.md, and reproducible scripts in benchmarks/. We don't ship a number we haven't measured.
Drop 200 contract PDFs onto a company server. Ask: "How many contracts have penalty clauses above 10%?" Get answers with contract numbers and pages. Nothing leaves your network.
Document number, issue date, issuing body, key content. Vietnamese OCR with diacritics — accurate even on faded faxes and old scans.
Deploy on a single GPU box. Plug into internal docs, calendar, ticketing. Security level: 'never leaves the LAN.'
Tokenizer that understands tone marks, compound words, EN/VN code-switching. High-quality embeddings for Vietnamese — not English run through translation.
Real, working code — pip install nom-vn. The docs cover every backend, install extra, and recipe.
pip install nom-vn # text + chunking + retrieve + rag
pip install "nom-vn[doc]" # + PDF / Office / OCR parsers
pip install "nom-vn[embeddings]" # + sentence-transformers
pip install "nom-vn[llm]" # + Ollama / OpenAI adapters
pip install "nom-vn[chat]" # + FastAPI / React UI (everything above)
pip install "nom-vn[all]" # the lot
# Then, for the chat web app:
nom serve # opens http://localhost:8080Apache 2.0 — use it, fork it, ship it. If you publish work that uses Nôm, the citation block on the right is the canonical form.
Track progress, file issues, contribute:
# Install
pip install nom-vn # core
pip install "nom-vn[all]" # everything
# Cite
@software{nom2026,
title = {Nôm: an open Python toolkit for Vietnamese AI applications},
author = {Nguyen, Viet-Anh and {Neural Research Lab}},
year = {2026},
url = {https://nrl.ai/nom},
note = {Apache 2.0}
}