Porter

A streaming-first Arrow server for DuckDB — Flight SQL and WebSocket, simple and built for motion.

🧭 Overview

Porter is a DuckDB-backed Arrow server with two transport protocols:

Flight SQL — gRPC-based Arrow Flight SQL
WebSocket — HTTP-based Arrow streaming

SQL goes in. Arrow streams out. Everything else is detail.

Both transports share the same execution engine, ensuring identical query semantics.

Summary Benchmark Results

| Metric | WebSocket | Flight SQL (gRPC) |
|---|---|---|
| Ops | 12 | 12 |
| Success | 12 | 12 |
| Errors | 0 | 0 |
| Rows/sec | 130,712,427 | 121,704,008 |
| Throughput | 1014.32 MB/s | 928.53 MB/s |
| Latency p50 | 26 ms | 17 ms |
| Latency p95 | 41 ms | 60 ms |
| Latency p99 | 41 ms | 60 ms |

See the Benchmark Report for details.
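
With only 12 operations per transport, p95 and p99 land on the same sample, which is why both rows show identical values. The following is a minimal Go sketch of that effect. It assumes a nearest-rank percentile (the benchmark's actual method is not stated here), and the latencies are made up for illustration rather than taken from the benchmark.

```go
package main

import (
	"fmt"
	"sort"
)

// percentile returns the nearest-rank percentile (p in 1..100) of samples.
// It returns 0 for an empty slice.
func percentile(samples []int, p int) int {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]int(nil), samples...)
	sort.Ints(sorted)
	// rank = ceil(p/100 * n), computed with integer arithmetic.
	rank := (p*len(sorted) + 99) / 100
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

func main() {
	// Twelve illustrative latencies in milliseconds (not benchmark data).
	latencies := []int{12, 14, 15, 16, 17, 17, 19, 22, 25, 31, 38, 60}
	fmt.Println(percentile(latencies, 50), percentile(latencies, 95), percentile(latencies, 99))
}
```

For n = 12, both ceil(0.95 × 12) and ceil(0.99 × 12) equal 12, so the two percentiles select the slowest sample.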

⚡ Key Characteristics

Streaming-first execution model (Arrow RecordBatch streams)
Dual transport support: Flight SQL + WebSocket
Bulk Ingest — Arrow RecordBatch → DuckDB with transactional semantics
Shared execution engine for semantic parity
Native DuckDB execution via ADBC
Full prepared statement lifecycle with parameter binding
TTL-based handle management with background GC
Live status surface with pipeline flow, pressure, and backpressure visibility

🏗️ Architecture

           +-------------------+
           |   Flight Client   |  <-- ADBC / Flight SQL
           +-------------------+
                     |
               gRPC / Flight
                     |
           +-------------------+
           |   Porter Server   |
           |-------------------|
           | Shared Engine     |  <-- BuildStream()
           +-------------------+
                     |
           +-------------------+
           |     DuckDB        |
           |    (via ADBC)     |
           +-------------------+
                     |
          +---------------------+
          | Arrow RecordBatches |
          +---------------------+

The server is intentionally thin: routing, lifecycle, and streaming glue only. DuckDB does the heavy lifting.

🚀 Getting Started

You have three ways to run Porter:

Docker (fastest path)
go install (clean local toolchain)
Build from source (full control)

🐳 Option 1 — Run with Docker

docker build -t porter .
docker run -p 32010:32010 -p 8080:8080 porter --ws

Run with a persistent database:

docker run -p 32010:32010 -p 8080:8080 -v $(pwd)/data:/data porter --db /data/porter.duckdb --ws

Defaults:

Flight SQL: 0.0.0.0:32010
WebSocket: 0.0.0.0:8080 (when --ws enabled)
Status: 0.0.0.0:9091 (enabled by default)
Database: in-memory (:memory:)

Prerequisites

Install dbc and required ADBC drivers:

curl -LsSf https://dbc.columnar.tech/install.sh | sh
dbc install duckdb
dbc install flightsql

⚙️ Option 2 — Install via `go install`

1. Install Porter

go install github.com/TFMV/porter/cmd/porter@latest

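This installs the porter binary into $GOBIN, or into $GOPATH/bin (by default $HOME/go/bin) if GOBIN is unset. Make sure that directory is on your PATH.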

🛠 Option 3 — Build from Source

1. Clone

git clone https://github.com/TFMV/porter.git
cd porter

2. Run

go run ./cmd/porter serve

💻 CLI Usage

porter --help

Quick Start

porter              # Start Flight SQL server on :32010
porter serve        # Same as above

With WebSocket

porter --ws                        # Flight SQL + WebSocket
porter serve --ws                   # Same as above
porter serve --ws --ws-port 9090   # Custom WebSocket port
porter serve --status-port 9191    # Custom status surface
porter serve --ducklake --ducklake-catalog-type duckdb --ducklake-catalog-dsn ./metadata.ducklake
porter serve --ducklake --ducklake-catalog-type sqlite --ducklake-catalog-dsn ./catalog.sqlite --ducklake-data-path ./ducklake-data

Full Flags

| Flag | Description | Default |
|---|---|---|
| `--db` | DuckDB file path | `:memory:` |
| `--port` | Flight SQL port | `32010` |
| `--ws` | Enable WebSocket | `false` |
| `--ws-port` | WebSocket port | `8080` |
| `--status` | Enable live status surface | `true` |
| `--status-port` | Status server port | `9091` |
| `--ducklake` | Enable DuckLake during server startup | `false` |
| `--ducklake-catalog-type` | DuckLake metadata backend: `duckdb`, `sqlite`, `postgres`, `mysql` | `duckdb` |
| `--ducklake-catalog-dsn` | DuckLake metadata DSN or file path | `metadata.ducklake` |
| `--ducklake-data-path` | DuckLake Parquet/object storage path | empty |
| `--ducklake-name` | Attached DuckLake catalog name | `my_ducklake` |

Execute a query

porter query "SELECT 1 AS value"

REPL

porter repl

Load Parquet

porter load data.parquet

Inspect schema

porter schema table_name

Environment variables

PORTER_DB
PORTER_PORT
PORTER_WS
PORTER_WS_PORT
PORTER_STATUS
PORTER_STATUS_PORT
PORTER_DUCKLAKE
PORTER_DUCKLAKE_CATALOG_TYPE
PORTER_DUCKLAKE_CATALOG_DSN
PORTER_DUCKLAKE_DATA_PATH
PORTER_DUCKLAKE_NAME
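
As an illustration of how these variables could map onto the flag defaults, here is a minimal Go sketch. It assumes each variable mirrors the flag of the same name and that an unset or unparsable value falls back to the documented default. The `config` struct and `loadConfig` function are hypothetical, and Porter's real precedence rules between flags and environment are not described in this README.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// config is a hypothetical subset of Porter's settings.
type config struct {
	DB         string
	Port       int
	WS         bool
	WSPort     int
	Status     bool
	StatusPort int
}

// loadConfig resolves settings through lookup (for example os.LookupEnv),
// falling back to the documented defaults when a value is missing or invalid.
func loadConfig(lookup func(string) (string, bool)) config {
	str := func(key, def string) string {
		if v, ok := lookup(key); ok && v != "" {
			return v
		}
		return def
	}
	num := func(key string, def int) int {
		if n, err := strconv.Atoi(str(key, "")); err == nil {
			return n
		}
		return def
	}
	boolean := func(key string, def bool) bool {
		if b, err := strconv.ParseBool(str(key, "")); err == nil {
			return b
		}
		return def
	}
	return config{
		DB:         str("PORTER_DB", ":memory:"),
		Port:       num("PORTER_PORT", 32010),
		WS:         boolean("PORTER_WS", false),
		WSPort:     num("PORTER_WS_PORT", 8080),
		Status:     boolean("PORTER_STATUS", true),
		StatusPort: num("PORTER_STATUS_PORT", 9091),
	}
}

func main() {
	cfg := loadConfig(os.LookupEnv)
	fmt.Printf("%+v\n", cfg)
}
```

Injecting the lookup function keeps the resolver testable without mutating the process environment.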

DuckLake Startup

When --ducklake is enabled, Porter initializes DuckLake during server startup and leaves the existing Flight SQL/Arrow execution path unchanged. DuckLake is treated as database configuration, not as a separate query mode.

Supported catalog backends:

duckdb
sqlite
postgres
mysql

Examples:

porter serve --ducklake \
  --ducklake-catalog-type duckdb \
  --ducklake-catalog-dsn ./metadata.ducklake

porter serve --ducklake \
  --ducklake-catalog-type sqlite \
  --ducklake-catalog-dsn ./catalog.sqlite \
  --ducklake-data-path ./ducklake-data

porter serve --ducklake \
  --ducklake-catalog-type postgres \
  --ducklake-catalog-dsn postgres://user:pass@host/db \
  --ducklake-data-path s3://bucket/prefix \
  --ducklake-name my_ducklake

Startup initialization:

INSTALL ducklake;
LOAD ducklake;
ATTACH 'ducklake:<catalog>' AS my_ducklake (DATA_PATH '...');
USE my_ducklake;

Per-connection initialization:

LOAD ducklake;
LOAD <catalog-extension>;
USE my_ducklake;
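
The startup sequence listed above can be sketched as a small statement builder in Go. This is an illustration only: `startupSQL` is a hypothetical helper, and the way the catalog type and DSN are combined into the `ducklake:` target (a bare path for `duckdb`, a `<type>:` prefix otherwise) is an assumption about how Porter fills in `<catalog>`, not a confirmed detail.

```go
package main

import (
	"fmt"
	"strings"
)

// quote wraps s in single quotes, doubling any embedded single quote.
func quote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", "''") + "'"
}

// startupSQL builds the DuckLake startup statements for one configuration.
// The catalog name is emitted as a bare identifier, so it must be a valid
// SQL identifier.
func startupSQL(catalogType, dsn, dataPath, name string) ([]string, error) {
	var target string
	switch catalogType {
	case "duckdb":
		target = "ducklake:" + dsn
	case "sqlite", "postgres", "mysql":
		// Assumed mapping: prefix the DSN with the backend name.
		target = "ducklake:" + catalogType + ":" + dsn
	default:
		return nil, fmt.Errorf("unsupported catalog type %q", catalogType)
	}

	attach := fmt.Sprintf("ATTACH %s AS %s", quote(target), name)
	if dataPath != "" {
		attach += fmt.Sprintf(" (DATA_PATH %s)", quote(dataPath))
	}
	return []string{"INSTALL ducklake", "LOAD ducklake", attach, "USE " + name}, nil
}

func main() {
	stmts, err := startupSQL("sqlite", "./catalog.sqlite", "./ducklake-data", "my_ducklake")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, s := range stmts {
		fmt.Println(s + ";")
	}
}
```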

DuckLake inspection and maintenance functions are available through the existing SQL path, for example:

FROM ducklake_snapshots('my_ducklake');
SELECT * FROM ducklake_table_info('my_ducklake');
SELECT * FROM my_table AT (VERSION => 2);
CALL ducklake_merge_adjacent_files('my_ducklake');
CALL ducklake_expire_snapshots('my_ducklake', dry_run => true);
CALL ducklake_cleanup_old_files('my_ducklake', dry_run => true, cleanup_all => true);

Live Status Surface

Porter exposes a dedicated status server that shows a live cross-section of the pipeline:

/status — live instrument panel UI
/status/live — current JSON snapshot
/status/stream — SSE stream of snapshots
/status/history — rolling snapshot history
/status/health — deterministic health status

The flow view tracks:

ingress -> transport -> execution -> egress
rows/sec and MB/sec per stage
queue depth and pressure buildup
p50/p95/p99 latency divergence
live structured activity feed
comparison of the WebSocket, Flight SQL, and ingest paths

🌐 Wire Contract

Flight SQL

| Operation | Behavior |
|---|---|
| SQL Query | Raw SQL → FlightInfo → DoGet stream |
| Prepared Statements | Handle-based execution with binding |
| Schema Introspection | Lightweight probe execution |
| ExecuteUpdate | DDL/DML via DoPutCommandStatementUpdate |

WebSocket

Send JSON query request:

{"query": "SELECT * FROM table"}

Receive:

Schema message: {"type": "schema", "fields": ["col1", "col2"]}
Binary IPC frames containing Arrow RecordBatches
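
The two JSON messages above can be modeled in Go with the standard library alone. This sketch covers only the text frames; opening the WebSocket and decoding the binary Arrow IPC frames need third-party packages and are left out. The type and function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// queryRequest mirrors the JSON query request shown above.
type queryRequest struct {
	Query string `json:"query"`
}

// schemaMessage mirrors the schema message sent before the binary frames.
type schemaMessage struct {
	Type   string   `json:"type"`
	Fields []string `json:"fields"`
}

// encodeQuery serializes a SQL string into the request frame.
func encodeQuery(sql string) ([]byte, error) {
	return json.Marshal(queryRequest{Query: sql})
}

// decodeSchema parses a text frame and checks that it is a schema message.
func decodeSchema(raw []byte) (schemaMessage, error) {
	var msg schemaMessage
	if err := json.Unmarshal(raw, &msg); err != nil {
		return schemaMessage{}, err
	}
	if msg.Type != "schema" {
		return schemaMessage{}, fmt.Errorf("unexpected message type %q", msg.Type)
	}
	return msg, nil
}

func main() {
	req, err := encodeQuery("SELECT * FROM table")
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	fmt.Println(string(req))

	msg, err := decodeSchema([]byte(`{"type": "schema", "fields": ["col1", "col2"]}`))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(msg.Fields)
}
```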

📥 Bulk Ingest

Porter supports high-throughput Arrow RecordBatch ingestion via Flight SQL's DoPut:

// Engine interface
IngestStream(ctx, table, reader, opts) (int64, error)

Features:

| Feature | Description |
|---|---|
| Transactional | One stream = one DB transaction |
| Schema validation | Incoming Arrow schema must match target table |
| Backpressure | Configurable `MaxUncommittedBytes` (default 64MB) |
| Table locking | Per-table mutex prevents concurrent writes to same table |
| Auto-commit | Automatically commits on successful ingest, rolls back on failure |

IngestOptions:

| Option | Description |
|---|---|
| `Catalog` | Target catalog name |
| `DBSchema` | Target schema name |
| `Temporary` | Create as temporary table |
| `IngestMode` | Append, replace, or create |
| `MaxUncommittedBytes` | Memory limit before fail-fast (default 64MB) |

Flow:

Client → DoPut (Arrow RecordBatch stream) → Engine.IngestStream → SegmentWriter → Commit → DuckDB

The SegmentWriter accumulates RecordBatches in memory, then atomically publishes them on commit. If MaxUncommittedBytes is exceeded, ingestion fails fast with rollback.
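
The accumulate-then-publish behavior, the byte budget, and the per-table lock can be sketched together in a few lines of Go. This is a simplified model, not Porter's SegmentWriter: batches are plain byte slices instead of Arrow RecordBatches, the function returns bytes rather than rows, and `tableLocks`, `ingest`, and `errBudgetExceeded` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errBudgetExceeded = errors.New("uncommitted bytes exceed limit")

// tableLocks hands out one mutex per table name.
type tableLocks struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func (t *tableLocks) lockFor(table string) *sync.Mutex {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.locks == nil {
		t.locks = make(map[string]*sync.Mutex)
	}
	l, ok := t.locks[table]
	if !ok {
		l = &sync.Mutex{}
		t.locks[table] = l
	}
	return l
}

// ingest buffers every batch in memory and publishes them in one step.
// If the buffered size would exceed maxBytes, nothing is published and the
// buffered batches are discarded, which models fail-fast with rollback.
func ingest(locks *tableLocks, table string, batches [][]byte, maxBytes int, publish func([][]byte)) (int, error) {
	l := locks.lockFor(table)
	l.Lock() // serialize writers to the same table
	defer l.Unlock()

	var pending [][]byte
	size := 0
	for _, b := range batches {
		if size+len(b) > maxBytes {
			return 0, errBudgetExceeded
		}
		pending = append(pending, b)
		size += len(b)
	}
	publish(pending) // commit: all batches become visible together
	return size, nil
}

func main() {
	locks := &tableLocks{}
	batches := [][]byte{make([]byte, 16), make([]byte, 16)}
	n, err := ingest(locks, "events", batches, 64, func(b [][]byte) {
		fmt.Println("published batches:", len(b))
	})
	fmt.Println(n, err)
}
```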

🌊 Streaming Core

Both transports use the same execution primitive:

BuildStream(ctx, sql, params) (*arrow.Schema, <-chan StreamChunk, error)

DuckDB → Arrow RecordReader → Channel → StreamChunk

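Backpressure comes from the channel boundary: the producer blocks on send until the consumer is ready to receive, so a slow client throttles the stream instead of letting results pile up in memory.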

🛣️ Roadmap

🤝 Contributing

If you've ever looked at a data system and thought:

"Why is this so complicated?"

You're in the right place.

Build it smaller. Make it clearer. Keep it moving.
