
donto

A bitemporal, paraconsistent quad store with a full evidence substrate. Postgres 16 + Rust + Go TUI. Optional Lean 4 sidecar for shape validation, derivations, and machine-checkable certificates.

donto is a database for claims that may be wrong.

It stores what was said, who said it, when it was said, what it was based on, what contradicts it, what remains unresolved, and what has been formally certified. Traditional databases assume clean facts. donto is for the messy interval between evidence and knowledge.

claim = (subject, predicate, object, context,
         valid_time, transaction_time, polarity, maturity)
       + evidence chain
       + confidence
       + shape annotations
       + arguments (supports / rebuts / undercuts)
       + proof obligations
       + certificate
       + predicate alignment (equivalents, inverses, sub-properties)

What makes donto different

Paraconsistent. Two sources disagree about Alice's birth year. Both rows live forever. Contradictions are evidence, not errors.

Bitemporal. Every statement tracks when the fact was true in the world (valid_time) and when the system learned it (tx_time). Retraction closes tx_time — nothing is ever deleted.

Full evidence chain. Every claim traces back through extraction run → document revision → source document → agent, with span-level anchoring to exact character offsets in the source text.

Epistemic maturity ladder. Claims climb from raw (Level 0) through registry-curated (1), evidenced (2), validated (3), to certified (4). The system tells you exactly why each claim hasn't reached the next level.

Predicate alignment. LLM extractors freely mint predicates — bornIn, wasBornIn, birthplaceOf all mean the same thing. The predicate alignment layer converges them without constraining extraction: equivalents, inverses, sub-properties, and close matches are registered with confidence scores. Queries expand through the closure automatically.

Lean 4 verification. 62 kernel-checked theorems prove model invariants — paraconsistency, snapshot monotonicity, scope semantics, correction identity preservation. The proofs hold for every possible input, not just test cases.
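The paraconsistent, bitemporal storage model above can be sketched in a few lines. This is a hypothetical in-memory illustration of the semantics (contradictions coexist, retraction closes tx_time, time travel filters on it), not donto's actual implementation:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Claim:
    subject: str
    predicate: str
    object: object
    context: str
    tx_from: datetime                 # when the system learned it
    tx_to: Optional[datetime] = None  # closed on retraction, never deleted

class Store:
    def __init__(self):
        self.rows: list[Claim] = []

    def assert_claim(self, s, p, o, ctx):
        # Paraconsistent: no uniqueness check on (s, p) -- contradictory
        # rows simply coexist.
        self.rows.append(Claim(s, p, o, ctx, datetime.now(timezone.utc)))

    def retract(self, claim: Claim):
        # Retraction closes tx_time; the row itself is kept forever.
        claim.tx_to = datetime.now(timezone.utc)

    def as_of(self, t: datetime):
        # Time travel: what did we believe at time t?
        return [c for c in self.rows
                if c.tx_from <= t and (c.tx_to is None or t < c.tx_to)]

store = Store()
store.assert_claim("ex:alice", "ex:birthYear", 1899, "ctx:census1900")
store.assert_claim("ex:alice", "ex:birthYear", 1925, "ctx:hospital1925")
# Both contradictory rows are visible; neither wins.
assert len(store.as_of(datetime.now(timezone.utc))) == 2
```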


At a glance

Component Count
SQL migrations 56
Tables 60+
SQL functions 141
HTTP endpoints 44
Rust test files 57
Lean modules 21
Lean theorems 62
Ingestion formats 8
Registered predicates 1,300+

Quick start

git clone https://github.com/thomasdavis/donto
cd donto

# Bring up Postgres 16 and apply all migrations
./scripts/pg-up.sh
cargo run -p donto-cli --quiet -- migrate

# Start the HTTP sidecar (optional — for apps that talk over HTTP)
cargo run -p dontosrv -- --bind 127.0.0.1:7878

# Run tests (requires running Postgres)
DONTO_TEST_DSN=postgres://donto:donto@127.0.0.1:55432/donto \
  cargo test --workspace

# Launch the TUI dashboard (requires Go 1.21+)
cd apps/donto-tui && go run .

# Build the Lean verification layer (optional, requires elan / Lean 4)
cd packages/lean && lake build

Migrate a genealogy database

# Import a genealogy SQLite database (research.db) into donto
cargo run -p donto-migrate -- \
  --dsn postgres://donto:donto@127.0.0.1:55432/donto \
  genealogy /path/to/research.db \
  --root ctx:genealogy/research-db

# Optional second pass for full provenance (documents, chunks, costs)
cargo run -p donto-migrate -- \
  --dsn postgres://donto:donto@127.0.0.1:55432/donto \
  genealogy-relink /path/to/research.db \
  --root ctx:genealogy/research-db

A quick example

-- Two sources disagree about a birth year.
SELECT donto_assert('ex:alice', 'ex:birthYear', NULL,
    '{"v":1899,"dt":"xsd:integer"}'::jsonb,
    'ctx:census1900', 'asserted', 0, NULL, NULL, NULL);

SELECT donto_assert('ex:alice', 'ex:birthYear', NULL,
    '{"v":1925,"dt":"xsd:integer"}'::jsonb,
    'ctx:hospital1925', 'asserted', 0, NULL, NULL, NULL);

-- Both are visible. donto never picks a winner.
SELECT * FROM donto_match('ex:alice', 'ex:birthYear',
    NULL, NULL, NULL, 'asserted', 0, NULL, NULL);
-- Two rows: 1899 from census, 1925 from hospital.

-- Correct the record without losing history.
SELECT donto_correct(
    (SELECT statement_id FROM donto_match('ex:alice', 'ex:birthYear',
     NULL, NULL, '{"include":["ctx:census1900"]}'::jsonb,
     'asserted', 0, NULL, NULL) LIMIT 1),
    NULL, NULL, NULL,
    '{"v":1898,"dt":"xsd:integer"}'::jsonb, NULL, NULL);

-- Time travel: what did we believe last month?
SELECT * FROM donto_match('ex:alice', 'ex:birthYear',
    NULL, NULL, NULL, 'asserted', 0,
    '2026-03-01T00:00:00Z'::timestamptz, NULL);

Predicate alignment

LLM extractors mint predicates freely — the same relationship gets called bornIn, wasBornIn, birthplaceOf, placeOfBirth. Without alignment, queries only find exact matches and the knowledge graph fragments.

The predicate alignment layer (PAL) solves this without constraining extraction. Extractors keep minting whatever feels natural; PAL converges them after the fact.

Alignment relations

Relation            Meaning                                Example
exact_equivalent    Same meaning, interchangeable          bornIn ↔ wasBornIn
inverse_equivalent  Same meaning, swaps subject/object     parentOf ↔ childOf
sub_property_of     Source is a narrower case of target    birthPlace → associatedPlace
close_match         Fuzzy, human-review tier               workedAt ≈ employedBy
decomposition       Decomposes into an event frame         workedAt → EmploymentEvent roles
not_equivalent      Explicit negative: do not align        birthDate ≠ deathDate

Register an alignment

-- bornIn and wasBornIn mean the same thing
SELECT donto_register_alignment(
    'ex:bornIn', 'ex:wasBornIn', 'exact_equivalent',
    1.0,        -- confidence
    NULL, NULL, -- valid_time bounds (optional)
    NULL,       -- run_id (optional)
    NULL,       -- provenance (optional)
    'human'     -- actor
);

-- parentOf is the inverse of childOf
SELECT donto_register_alignment(
    'ex:parentOf', 'ex:childOf', 'inverse_equivalent',
    1.0, NULL, NULL, NULL, NULL, 'human'
);

-- Rebuild the closure after registering alignments
SELECT donto_rebuild_predicate_closure();
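Conceptually, rebuilding the closure groups predicates into equivalence classes so that a query for any member expands to the whole class. A rough sketch of that idea as a union-find over exact_equivalent pairs (illustrative only — the SQL implementation also handles inverses, sub-properties, and confidence thresholds):

```python
# Union-find over exact_equivalent alignments (path halving on find).
parent: dict[str, str] = {}

def find(p: str) -> str:
    parent.setdefault(p, p)
    while parent[p] != p:
        parent[p] = parent[parent[p]]  # path halving
        p = parent[p]
    return p

def union(a: str, b: str) -> None:
    ra, rb = find(a), find(b)
    if ra != rb:
        parent[ra] = rb

# Registered alignments:
union("ex:bornIn", "ex:wasBornIn")
union("ex:wasBornIn", "ex:birthplaceOf")

# After the "rebuild", querying any member expands to the whole class:
cls = {p for p in parent if find(p) == find("ex:bornIn")}
assert cls == {"ex:bornIn", "ex:wasBornIn", "ex:birthplaceOf"}
```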

Querying with alignment

After rebuilding the closure, donto_match() expands predicates automatically. A query for wasBornIn also returns statements asserted with bornIn:

-- Returns rows for both bornIn and wasBornIn
SELECT * FROM donto_match('ex:alice', 'ex:wasBornIn',
    NULL, NULL, NULL, 'asserted', 0, NULL, NULL);

-- Explicit expansion control with matched_via and confidence
SELECT * FROM donto_match_aligned(
    'ex:alice', 'ex:wasBornIn', NULL, NULL, NULL,
    'asserted', 0, NULL, NULL,
    true,   -- expand_predicates
    0.8     -- min_alignment_confidence
);
-- Returns: matched_via='exact_equivalent', alignment_confidence=1.0

-- Strict mode: no expansion, exact predicate only
SELECT * FROM donto_match_strict('ex:alice', 'ex:bornIn',
    NULL, NULL, NULL, 'asserted', 0, NULL, NULL);

DontoQL predicate modes

-- Default: expansion enabled (confidence ≥ 0.8)
MATCH ?s ex:wasBornIn ?o
PROJECT ?s, ?o

-- Strict: exact predicate match only
MATCH ?s ex:bornIn ?o PREDICATES STRICT
PROJECT ?s, ?o

-- Custom confidence floor
MATCH ?s ex:wasBornIn ?o PREDICATES EXPAND_ABOVE 90
PROJECT ?s, ?o

Finding and suggesting alignments

-- Lexical similarity between two predicates
SELECT donto_predicate_lexical_similarity('ex:bornIn', 'ex:wasBornIn');
-- → 0.72

-- Suggest alignments for a predicate based on trigram similarity
SELECT * FROM donto_suggest_alignments('ex:bornIn', 0.5, 10);
-- → (ex:wasBornIn, 0.72, "Was born in"), ...

-- Auto-align a batch of predicates above a confidence threshold
SELECT donto_auto_align_batch(
    ARRAY['ex:bornIn', 'ex:diedIn', 'ex:livedAt'],
    0.7,     -- min_similarity
    'sweep'  -- actor
);

-- Embedding-based candidate lookup (for LLM extraction pipelines)
SELECT * FROM donto_extraction_predicate_candidates(
    embedding,      -- float4[] from your model
    'text-embedding-3-small',
    'genealogy',    -- domain filter (optional)
    NULL, NULL, 30  -- subject_type, object_type, limit
);

Predicate descriptors

Rich metadata for each predicate — label, gloss, domain/range hints, examples, and optional embeddings for semantic search:

SELECT donto_upsert_descriptor(
    'ex:bornIn',
    'Born in',                        -- label
    'Place where a person was born',  -- gloss
    'Person',                         -- subject_type (domain)
    'Place',                          -- object_type (range)
    'genealogy',                      -- domain
    'ex:alice',                       -- example_subject
    'ex:london',                      -- example_object
    'Alice was born in London',       -- source_sentence
    'many_to_one',                    -- cardinality
    NULL, NULL, NULL                  -- embedding_model, embedding, metadata
);

Canonical shadows

For callers that want fully canonicalized data, the shadow table pre-computes the canonical predicate for each statement:

-- Materialize a single statement's canonical form
SELECT donto_materialize_shadow('statement-uuid');

-- Batch rebuild for a context (or all statements)
SELECT donto_rebuild_shadows('ctx:genealogy/research-db', 10000);

Event frames

Complex n-ary relations decompose into event frames instead of losing information in a single triple:

-- Instead of: (Marie, workedAt, Sorbonne)
-- Emit a frame with roles:
SELECT donto_decompose_to_frame(
    'ex:marie-curie', 'ex:workedAt', 'ex:sorbonne',
    'ctx:biography', 'ex:EmploymentEvent',
    '{"ex:startDate":"1906","ex:endDate":"1934","ex:role":"professor of physics"}'::jsonb,
    '1906-01-01'::date, '1934-07-04'::date,
    'extraction'
);
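The point of decomposition is that the frame keeps every qualifier as first-class data while the original triple remains recoverable as a projection. A sketch using a hypothetical dict layout (donto actually stores frames as statements):

```python
# A single triple flattens the relation and drops its qualifiers:
triple = ("ex:marie-curie", "ex:workedAt", "ex:sorbonne")

# An event frame keeps every role as first-class data:
frame = {
    "type": "ex:EmploymentEvent",
    "ex:agent": "ex:marie-curie",
    "ex:organization": "ex:sorbonne",
    "ex:startDate": "1906",
    "ex:endDate": "1934",
    "ex:role": "professor of physics",
}

# The original triple is recoverable as a projection of the frame,
# but not vice versa -- the dates and role would be lost.
assert (frame["ex:agent"], "ex:workedAt", frame["ex:organization"]) == triple
```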

Evidence substrate

donto doesn't just store claims. It stores the full lifecycle of how a claim was produced, evaluated, challenged, and certified.

Document (PDF, web page, record)
  → Revision (text extraction, OCR, parser version)
    → Spans (character offsets, sentences, regions)
      → Mentions (entity references, typed)
        → Coreference clusters (resolved entities)
    → Tables (rows, columns, cells with headers)
    → Content regions (figures, charts, code blocks)
  → Extraction run (model, version, prompt, chunking)
    → Extraction chunks (per-chunk provenance)
    → Statements (claims with typed literals)
      → Confidence scores (per-statement overlay)
      → Evidence links (statement ↔ span/run/document)
      → Shape annotations (pass/warn/violate)
      → Arguments (supports/rebuts/undercuts/qualifies)
      → Proof obligations (needs-coref, needs-source-support, ...)
      → Certificates (Lean-verifiable proofs)

Every layer is queryable, traceable, and correctable.

Claim card

Ask donto everything it knows about a single claim:

SELECT donto_claim_card('statement-uuid');

Returns the statement, its evidence links, arguments, proof obligations, shape annotations, reactions, and maturity blockers — everything needed to understand and evaluate the claim.

Claim lifecycle

Every statement progresses through stages independently of maturity: observed → extracted → typed → anchored → confidence-rated → predicate-registered → shape-checked → source-supported → obligations-clear → argued → certified.

-- See lifecycle coverage for a context
SELECT * FROM donto_lifecycle_summary('ctx:genealogy/research-db');

Why not higher?

Ask why a claim hasn't been promoted:

SELECT * FROM donto_why_not_higher('statement-uuid');
-- predicate_not_registered: Predicate ex:foo is not registered
-- no_shape_report: No shape validation has been run
-- no_span_anchor: Not anchored to a specific source span
-- open_obligations: 2 open proof obligation(s)
-- active_rebuttals: Open attacks/contradictions exist
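The promotion check behind this output can be pictured as a list of blocker rules, each a predicate over the claim's current state; the claim climbs only when none fire. A hypothetical sketch (field names are illustrative, not donto's schema):

```python
# Each blocker inspects the claim's state; promotion requires an empty list.
claim = {
    "predicate_registered": True,
    "shape_report": None,          # no shape validation has been run
    "span_anchor": "doc:census1900#chars=120-158",
    "open_obligations": 2,
    "active_rebuttals": 0,
}

blockers = {
    "predicate_not_registered": lambda c: not c["predicate_registered"],
    "no_shape_report":          lambda c: c["shape_report"] is None,
    "no_span_anchor":           lambda c: c["span_anchor"] is None,
    "open_obligations":         lambda c: c["open_obligations"] > 0,
    "active_rebuttals":         lambda c: c["active_rebuttals"] > 0,
}

reasons = [name for name, fires in blockers.items() if fires(claim)]
assert reasons == ["no_shape_report", "open_obligations"]
```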

Three query surfaces

SQL functions — for applications with a Postgres connection:

SELECT * FROM donto_match('ex:alice', 'ex:knows', NULL, NULL,
    '{"include":["ctx:wikipedia"]}'::jsonb,
    'asserted', 0, NULL, NULL);

SPARQL 1.1 subset — for RDF tooling:

PREFIX ex: <http://example.org/>
SELECT ?x ?y WHERE { ?x ex:knows ?y . } LIMIT 10

DontoQL — native language with full feature access:

PRESET latest
MATCH ?x ex:knows ?y, ?y ex:name ?n
FILTER ?n != "Mallory"
POLARITY asserted
MATURITY >= 1
PREDICATES EXPAND
PROJECT ?x, ?n

Ingestion

Eight parsers, all piped through the same assertion path:

donto ingest dump.nq                                    # N-Quads
donto ingest data.ttl                                   # Turtle
donto ingest graph.trig                                 # TriG
donto ingest data.rdf                                   # RDF/XML
donto ingest data.jsonld                                # JSON-LD
donto ingest export.json --format property-graph         # Neo4j / AGE
donto ingest stream.jsonl --format jsonl                 # LLM output
donto ingest data.csv --mapping mapping.json             # CSV

TUI dashboard

A terminal UI (Go / Charm's Bubble Tea) for browsing and monitoring donto in real time. Tabs: dashboard stats, firehose (live LISTEN/NOTIFY stream), explorer, contexts, claim cards, and charts.

cd apps/donto-tui && go run .

# Flags:
#   --dsn          Postgres DSN (default: $DONTO_DSN or localhost:55432)
#   --poll         Dashboard refresh interval (default: 5s)
#   --srv          dontosrv URL for health checks (default: http://127.0.0.1:7878)
#   --install-triggers  Install LISTEN/NOTIFY triggers on connect

HTTP sidecar (dontosrv)

dontosrv exposes 44 endpoints for applications that don't have a direct Postgres connection. Axum-based, stateless, horizontally scalable.

Category Endpoints
Query /sparql, /dontoql, /search, /history/:subject, /statement/:id, /claim/:id
Browse /subjects, /contexts, /predicates
Write /assert, /assert/batch, /retract, /contexts/ensure
Evidence /documents/register, /documents/revision, /evidence/link/span, /evidence/:stmt
Arguments /arguments/assert, /arguments/:stmt, /arguments/frontier
Shapes /shapes/validate
Rules /rules/derive
Certificates /certificates/attach, /certificates/verify/:stmt
Obligations /obligations/emit, /obligations/resolve, /obligations/open, /obligations/summary
Agents /agents/register, /agents/bind
Reactions /react, /reactions/:id
Alignment /alignment/register, /alignment/retract, /alignment/rebuild-closure, /alignment/runs/start, /alignment/runs/complete
Descriptors /descriptors/upsert, /descriptors/nearest
Shadows /shadow/materialize, /shadow/rebuild
System /health, /version, /dir

A TypeScript client (packages/client-ts) mirrors the HTTP surface for Next.js and other JS/TS applications.


Use cases

donto is domain-agnostic. The same schema handles any domain where sources disagree, evidence evolves, or claims need verification:

  • Scientific literature — benchmark results, measurements, derived comparisons, proof obligations for vague claims
  • Investigative research — contradictory records, hypothesis branches, temporal evidence chains
  • Legal / compliance — contract clauses, regulatory requirements, section-level anchoring, audit trails
  • Medical records — lab results, medication histories, temporal expressions, confidence-rated diagnoses
  • LLM extraction pipelines — structured output → mentions → candidates → promoted claims with full provenance
  • Intelligence analysis — multi-source fusion, competing hypotheses, confidence tiers

Ontology seeds ship with the database (migration 0044): 1,300+ predicates across schema.org, ML/AI, physics, geography, and events. Domain-specific predicates are implicitly registered in permissive contexts, then converged through the predicate alignment layer.


Project layout

apps/
  donto-cli/                 CLI: migrate, ingest, query, match, retract
  dontosrv/                  HTTP sidecar (44 endpoints)
  donto-tui/                 Go/Charm TUI: dashboard, firehose, explorer, charts
  docs/                      Astro Starlight documentation site

packages/
  donto-client/              Typed Rust wrapper + test suite
  donto-query/               DontoQL + SPARQL parser and evaluator
  donto-ingest/              8 ingestion format parsers
  donto-migrate/             External store migrators (genealogy SQLite)
  pg_donto/                  pgrx Postgres extension
  sql/migrations/            56 idempotent SQL migrations (source of truth)
  sql/fixtures/              Example data for smoke tests
  sql/scripts/               Epistemic sweep and batch operations
  lean/                      21 Lean 4 modules, 62 theorems
  client-ts/                 TypeScript client (@donto/client)
  tsconfig/                  Shared TypeScript config

PRD.md                       Design specification (principles + maturity ladder)
CLAUDE.md                    Working contract for AI/human contributors
turbo.json                   Turborepo pipeline config

Documentation

  • PRD.md — design specification. Read §3 (principles) and §2 (maturity ladder) before contributing.
  • apps/docs — Starlight documentation site with migration reference, schema gap audit, and guides.
  • ANTHROPOLOGY_README.md — research philosophy and domain context.

Status

donto is early and moving fast. The data model, evidence substrate, predicate alignment layer, and Lean verification layer are solid. Current focus areas:

  • Predicate alignment at scale — embedding-based alignment, auto-suggest, canonical shadow materialization
  • AI extraction pipeline — LLM-powered observation → interpretation → judgment chain with full provenance
  • TUI polish — charts, claim card viewer, context explorer
  • Source-support verification — automated checking of whether a source span actually supports its claim
  • More Lean certificates — proof-carrying shapes, derivation trees, and certificate verifiers

Performance is not yet a goal (PRD §25). Current focus: correctness, PRD coverage, test depth.


Contributing

See CLAUDE.md for the working contract (non-negotiables, SQL idioms, testing patterns). Read the PRD before changing core types.

License

Dual licensed under Apache 2.0 and MIT.
