A bitemporal, paraconsistent quad store with a full evidence substrate. Postgres 16 + Rust + Go TUI. Optional Lean 4 sidecar for shape validation, derivations, and machine-checkable certificates.
donto is a database for claims that may be wrong.
It stores what was said, who said it, when it was said, what it was based on, what contradicts it, what remains unresolved, and what has been formally certified. Traditional databases assume clean facts. donto is for the messy interval between evidence and knowledge.
claim = (subject, predicate, object, context,
valid_time, transaction_time, polarity, maturity)
+ evidence chain
+ confidence
+ shape annotations
+ arguments (supports / rebuts / undercuts)
+ proof obligations
+ certificate
+ predicate alignment (equivalents, inverses, sub-properties)
Paraconsistent. Two sources disagree about Alice's birth year. Both rows live forever. Contradictions are evidence, not errors.
Bitemporal. Every statement tracks when the fact was true in the
world (valid_time) and when the system learned it (tx_time).
Retraction closes tx_time — nothing is ever deleted.
Full evidence chain. Every claim traces back through extraction run → document revision → source document → agent, with span-level anchoring to exact character offsets in the source text.
Epistemic maturity ladder. Claims climb from raw (Level 0) through registry-curated (1), evidenced (2), validated (3), to certified (4). The system tells you exactly why each claim hasn't reached the next level.
Predicate alignment. LLM extractors freely mint predicates —
bornIn, wasBornIn, birthplaceOf all mean the same thing. The
predicate alignment layer converges them without constraining
extraction: equivalents, inverses, sub-properties, and close matches
are registered with confidence scores. Queries expand through the
closure automatically.
Lean 4 verification. 62 kernel-checked theorems prove model invariants — paraconsistency, snapshot monotonicity, scope semantics, correction identity preservation. The proofs hold for every possible input, not just test cases.
| Component | Count |
|---|---|
| SQL migrations | 56 |
| Tables | 60+ |
| SQL functions | 141 |
| HTTP endpoints | 44 |
| Rust test files | 57 |
| Lean modules | 21 |
| Lean theorems | 62 |
| Ingestion formats | 8 |
| Registered predicates | 1,300+ |
git clone https://github.com/thomasdavis/donto
cd donto
# Bring up Postgres 16 and apply all migrations
./scripts/pg-up.sh
cargo run -p donto-cli --quiet -- migrate
# Start the HTTP sidecar (optional — for apps that talk over HTTP)
cargo run -p dontosrv -- --bind 127.0.0.1:7878
# Run tests (requires running Postgres)
DONTO_TEST_DSN=postgres://donto:donto@127.0.0.1:55432/donto \
cargo test --workspace
# Launch the TUI dashboard (requires Go 1.21+)
cd apps/donto-tui && go run .
# Build the Lean verification layer (optional, requires elan / Lean 4)
cd packages/lean && lake build# Import a genealogy SQLite database (research.db) into donto
cargo run -p donto-migrate -- \
--dsn postgres://donto:donto@127.0.0.1:55432/donto \
genealogy /path/to/research.db \
--root ctx:genealogy/research-db
# Optional second pass for full provenance (documents, chunks, costs)
cargo run -p donto-migrate -- \
--dsn postgres://donto:donto@127.0.0.1:55432/donto \
genealogy-relink /path/to/research.db \
--root ctx:genealogy/research-db-- Two sources disagree about a birth year.
SELECT donto_assert('ex:alice', 'ex:birthYear', NULL,
'{"v":1899,"dt":"xsd:integer"}'::jsonb,
'ctx:census1900', 'asserted', 0, NULL, NULL, NULL);
SELECT donto_assert('ex:alice', 'ex:birthYear', NULL,
'{"v":1925,"dt":"xsd:integer"}'::jsonb,
'ctx:hospital1925', 'asserted', 0, NULL, NULL, NULL);
-- Both are visible. donto never picks a winner.
SELECT * FROM donto_match('ex:alice', 'ex:birthYear',
NULL, NULL, NULL, 'asserted', 0, NULL, NULL);
-- Two rows: 1899 from census, 1925 from hospital.
-- Correct the record without losing history.
SELECT donto_correct(
(SELECT statement_id FROM donto_match('ex:alice', 'ex:birthYear',
NULL, NULL, '{"include":["ctx:census1900"]}'::jsonb,
'asserted', 0, NULL, NULL) LIMIT 1),
NULL, NULL, NULL,
'{"v":1898,"dt":"xsd:integer"}'::jsonb, NULL, NULL);
-- Time travel: what did we believe last month?
SELECT * FROM donto_match('ex:alice', 'ex:birthYear',
NULL, NULL, NULL, 'asserted', 0,
'2026-03-01T00:00:00Z'::timestamptz, NULL);LLM extractors mint predicates freely — the same relationship gets
called bornIn, wasBornIn, birthplaceOf, placeOfBirth. Without
alignment, queries only find exact matches and the knowledge graph
fragments.
The predicate alignment layer (PAL) solves this without constraining extraction. Extractors keep minting whatever feels natural; PAL converges them after the fact.
| Relation | Meaning | Example |
|---|---|---|
exact_equivalent |
Same meaning, interchangeable | bornIn ↔ wasBornIn |
inverse_equivalent |
Same meaning, swaps subject/object | parentOf ↔ childOf |
sub_property_of |
Source is a narrower case of target | birthPlace → associatedPlace |
close_match |
Fuzzy, human-review tier | workedAt ≈ employedBy |
decomposition |
Decomposes into an event frame | workedAt → EmploymentEvent roles |
not_equivalent |
Explicit negative: do not align | birthDate ✗ deathDate |
-- bornIn and wasBornIn mean the same thing
SELECT donto_register_alignment(
'ex:bornIn', 'ex:wasBornIn', 'exact_equivalent',
1.0, -- confidence
NULL, NULL, -- valid_time bounds (optional)
NULL, -- run_id (optional)
NULL, -- provenance (optional)
'human' -- actor
);
-- parentOf is the inverse of childOf
SELECT donto_register_alignment(
'ex:parentOf', 'ex:childOf', 'inverse_equivalent',
1.0, NULL, NULL, NULL, NULL, 'human'
);
-- Rebuild the closure after registering alignments
SELECT donto_rebuild_predicate_closure();After rebuilding the closure, donto_match() expands predicates
automatically. A query for wasBornIn also returns statements asserted
with bornIn:
-- Returns rows for both bornIn and wasBornIn
SELECT * FROM donto_match('ex:alice', 'ex:wasBornIn',
NULL, NULL, NULL, 'asserted', 0, NULL, NULL);
-- Explicit expansion control with matched_via and confidence
SELECT * FROM donto_match_aligned(
'ex:alice', 'ex:wasBornIn', NULL, NULL, NULL,
'asserted', 0, NULL, NULL,
true, -- expand_predicates
0.8 -- min_alignment_confidence
);
-- Returns: matched_via='exact_equivalent', alignment_confidence=1.0
-- Strict mode: no expansion, exact predicate only
SELECT * FROM donto_match_strict('ex:alice', 'ex:bornIn',
NULL, NULL, NULL, 'asserted', 0, NULL, NULL);-- Default: expansion enabled (confidence ≥ 0.8)
MATCH ?s ex:wasBornIn ?o
PROJECT ?s, ?o
-- Strict: exact predicate match only
MATCH ?s ex:bornIn ?o PREDICATES STRICT
PROJECT ?s, ?o
-- Custom confidence floor
MATCH ?s ex:wasBornIn ?o PREDICATES EXPAND_ABOVE 90
PROJECT ?s, ?o
-- Lexical similarity between two predicates
SELECT donto_predicate_lexical_similarity('ex:bornIn', 'ex:wasBornIn');
-- → 0.72
-- Suggest alignments for a predicate based on trigram similarity
SELECT * FROM donto_suggest_alignments('ex:bornIn', 0.5, 10);
-- → (ex:wasBornIn, 0.72, "Was born in"), ...
-- Auto-align a batch of predicates above a confidence threshold
SELECT donto_auto_align_batch(
ARRAY['ex:bornIn', 'ex:diedIn', 'ex:livedAt'],
0.7, -- min_similarity
'sweep' -- actor
);
-- Embedding-based candidate lookup (for LLM extraction pipelines)
SELECT * FROM donto_extraction_predicate_candidates(
embedding, -- float4[] from your model
'text-embedding-3-small',
'genealogy', -- domain filter (optional)
NULL, NULL, 30 -- subject_type, object_type, limit
);Rich metadata for each predicate — label, gloss, domain/range hints, examples, and optional embeddings for semantic search:
SELECT donto_upsert_descriptor(
'ex:bornIn',
'Born in', -- label
'Place where a person was born', -- gloss
'Person', -- subject_type (domain)
'Place', -- object_type (range)
'genealogy', -- domain
'ex:alice', -- example_subject
'ex:london', -- example_object
'Alice was born in London', -- source_sentence
'many_to_one', -- cardinality
NULL, NULL, NULL -- embedding_model, embedding, metadata
);For callers that want fully-canonicalized data, the shadow table pre-computes the canonical predicate for each statement:
-- Materialize a single statement's canonical form
SELECT donto_materialize_shadow('statement-uuid');
-- Batch rebuild for a context (or all statements)
SELECT donto_rebuild_shadows('ctx:genealogy/research-db', 10000);Complex n-ary relations decompose into event frames instead of losing information in a single triple:
-- Instead of: (Marie, workedAt, Sorbonne)
-- Emit a frame with roles:
SELECT donto_decompose_to_frame(
'ex:marie-curie', 'ex:workedAt', 'ex:sorbonne',
'ctx:biography', 'ex:EmploymentEvent',
'{"ex:startDate":"1906","ex:endDate":"1934","ex:role":"professor of physics"}'::jsonb,
'1906-01-01'::date, '1934-07-04'::date,
'extraction'
);donto doesn't just store claims. It stores the full lifecycle of how a claim was produced, evaluated, challenged, and certified.
Document (PDF, web page, record)
→ Revision (text extraction, OCR, parser version)
→ Spans (character offsets, sentences, regions)
→ Mentions (entity references, typed)
→ Coreference clusters (resolved entities)
→ Tables (rows, columns, cells with headers)
→ Content regions (figures, charts, code blocks)
→ Extraction run (model, version, prompt, chunking)
→ Extraction chunks (per-chunk provenance)
→ Statements (claims with typed literals)
→ Confidence scores (per-statement overlay)
→ Evidence links (statement ↔ span/run/document)
→ Shape annotations (pass/warn/violate)
→ Arguments (supports/rebuts/undercuts/qualifies)
→ Proof obligations (needs-coref, needs-source-support, ...)
→ Certificates (Lean-verifiable proofs)
Every layer is queryable, traceable, and correctable.
Ask donto everything it knows about a single claim:
SELECT donto_claim_card('statement-uuid');Returns the statement, its evidence links, arguments, proof obligations, shape annotations, reactions, and maturity blockers — everything needed to understand and evaluate the claim.
Every statement progresses through stages independently of maturity: observed → extracted → typed → anchored → confidence-rated → predicate-registered → shape-checked → source-supported → obligations-clear → argued → certified.
-- See lifecycle coverage for a context
SELECT * FROM donto_lifecycle_summary('ctx:genealogy/research-db');Ask why a claim hasn't been promoted:
SELECT * FROM donto_why_not_higher('statement-uuid');
-- predicate_not_registered: Predicate ex:foo is not registered
-- no_shape_report: No shape validation has been run
-- no_span_anchor: Not anchored to a specific source span
-- open_obligations: 2 open proof obligation(s)
-- active_rebuttals: Open attacks/contradictions existSQL functions — for applications with a Postgres connection:
SELECT * FROM donto_match('ex:alice', 'ex:knows', NULL, NULL,
'{"include":["ctx:wikipedia"]}'::jsonb,
'asserted', 0, NULL, NULL);SPARQL 1.1 subset — for RDF tooling:
PREFIX ex: <http://example.org/>
SELECT ?x ?y WHERE { ?x ex:knows ?y . } LIMIT 10DontoQL — native language with full feature access:
PRESET latest
MATCH ?x ex:knows ?y, ?y ex:name ?n
FILTER ?n != "Mallory"
POLARITY asserted
MATURITY >= 1
PREDICATES EXPAND
PROJECT ?x, ?n
Eight parsers, all piped through the same assertion path:
donto ingest dump.nq # N-Quads
donto ingest data.ttl # Turtle
donto ingest graph.trig # TriG
donto ingest data.rdf # RDF/XML
donto ingest data.jsonld # JSON-LD
donto ingest export.json --format property-graph # Neo4j / AGE
donto ingest stream.jsonl --format jsonl # LLM output
donto ingest data.csv --mapping mapping.json # CSVA terminal UI (Go / Charm Bubbletea) for browsing and monitoring donto in real time. Tabs: dashboard stats, firehose (live LISTEN/NOTIFY stream), explorer, contexts, claim cards, and charts.
cd apps/donto-tui && go run .
# Flags:
# --dsn Postgres DSN (default: $DONTO_DSN or localhost:55432)
# --poll Dashboard refresh interval (default: 5s)
# --srv dontosrv URL for health checks (default: http://127.0.0.1:7878)
# --install-triggers Install LISTEN/NOTIFY triggers on connectdontosrv exposes 44 endpoints for applications that don't have a
direct Postgres connection. Axum-based, stateless, horizontally
scalable.
| Category | Endpoints |
|---|---|
| Query | /sparql, /dontoql, /search, /history/:subject, /statement/:id, /claim/:id |
| Browse | /subjects, /contexts, /predicates |
| Write | /assert, /assert/batch, /retract, /contexts/ensure |
| Evidence | /documents/register, /documents/revision, /evidence/link/span, /evidence/:stmt |
| Arguments | /arguments/assert, /arguments/:stmt, /arguments/frontier |
| Shapes | /shapes/validate |
| Rules | /rules/derive |
| Certificates | /certificates/attach, /certificates/verify/:stmt |
| Obligations | /obligations/emit, /obligations/resolve, /obligations/open, /obligations/summary |
| Agents | /agents/register, /agents/bind |
| Reactions | /react, /reactions/:id |
| Alignment | /alignment/register, /alignment/retract, /alignment/rebuild-closure, /alignment/runs/start, /alignment/runs/complete |
| Descriptors | /descriptors/upsert, /descriptors/nearest |
| Shadows | /shadow/materialize, /shadow/rebuild |
| System | /health, /version, /dir |
A TypeScript client (packages/client-ts) mirrors the HTTP surface
for Next.js and other JS/TS applications.
donto is domain-agnostic. The same schema handles any domain where sources disagree, evidence evolves, or claims need verification:
- Scientific literature — benchmark results, measurements, derived comparisons, proof obligations for vague claims
- Investigative research — contradictory records, hypothesis branches, temporal evidence chains
- Legal / compliance — contract clauses, regulatory requirements, section-level anchoring, audit trails
- Medical records — lab results, medication histories, temporal expressions, confidence-rated diagnoses
- LLM extraction pipelines — structured output → mentions → candidates → promoted claims with full provenance
- Intelligence analysis — multi-source fusion, competing hypotheses, confidence tiers
Ontology seeds ship with the database (migration 0044): 1,300+ predicates across schema.org, ML/AI, physics, geography, and events. Domain-specific predicates are implicitly registered in permissive contexts, then converged through the predicate alignment layer.
apps/
donto-cli/ CLI: migrate, ingest, query, match, retract
dontosrv/ HTTP sidecar (44 endpoints)
donto-tui/ Go/Charm TUI: dashboard, firehose, explorer, charts
docs/ Astro Starlight documentation site
packages/
donto-client/ Typed Rust wrapper + test suite
donto-query/ DontoQL + SPARQL parser and evaluator
donto-ingest/ 8 ingestion format parsers
donto-migrate/ External store migrators (genealogy SQLite)
pg_donto/ pgrx Postgres extension
sql/migrations/ 56 idempotent SQL migrations (source of truth)
sql/fixtures/ Example data for smoke tests
sql/scripts/ Epistemic sweep and batch operations
lean/ 21 Lean 4 modules, 62 theorems
client-ts/ TypeScript client (@donto/client)
tsconfig/ Shared TypeScript config
PRD.md Design specification (principles + maturity ladder)
CLAUDE.md Working contract for AI/human contributors
turbo.json Turborepo pipeline config
PRD.md— design specification. Read §3 (principles) and §2 (maturity ladder) before contributing.apps/docs— Starlight documentation site with migration reference, schema gap audit, and guides.ANTHROPOLOGY_README.md— research philosophy and domain context.
donto is early and moving fast. The data model, evidence substrate, predicate alignment layer, and Lean verification layer are solid. Current focus areas:
- Predicate alignment at scale — embedding-based alignment, auto-suggest, canonical shadow materialization
- AI extraction pipeline — LLM-powered observation → interpretation → judgment chain with full provenance
- TUI polish — charts, claim card viewer, context explorer
- Source-support verification — automated checking of whether a source span actually supports its claim
- More Lean certificates — proof-carrying shapes, derivation trees, and certificate verifiers
Performance is not yet a goal (PRD §25). Current focus: correctness, PRD coverage, test depth.
See CLAUDE.md for the working contract (non-negotiables,
SQL idioms, testing patterns). Read the PRD before changing core types.
Dual licensed under Apache 2.0 and MIT.