OAuth: design third mode exchange — RFC 8693 token exchange for spec-compliant ClickHouse forwarding #102

@BorisTyshkevich

Executive summary

altinity-mcp currently has two OAuth modes (pkg/config/config.go:111). This issue proposes a third, exchange, that closes the spec gap in forward without removing it:

| Mode | Inbound auth | CH credentials | Spec | Status |
|---|---|---|---|---|
| gating | MCP-minted HS256, aud=MCP | static config | | shipped, used by otel-mcp.demo.altinity.cloud |
| forward | upstream IdP JWT, passed through | inbound JWT verbatim | ✗ × 2 | shipped, deviation documented |
| exchange | upstream IdP JWT, validated locally | MCP-minted RS256, aud=clickhouse | | proposed by this issue |

One-paragraph takeaway: exchange is forward plus spec compliance, at the operational cost of an RSA keypair and a JWKS endpoint reachable from ClickHouse. The practical security gain for the current otel-mcp deployment is marginal (single Auth0 client, single CH downstream), but it future-proofs altinity-mcp for multi-tenant Auth0 setups, additional downstream sinks, audit-attribution requirements, and ecosystem MCP clients running in untrusted contexts.

This issue is the design, not the implementation. The intent is to file a follow-up PR that uses this issue as the spec.


1. What RFC 8693 actually defines

RFC 8693 is "OAuth 2.0 Token Exchange" — a grant type for one-token-for-another swaps with narrower privileges. The mechanism is just an OAuth grant; semantically it's privilege narrowing.

Key concepts:

  • Grant type: urn:ietf:params:oauth:grant-type:token-exchange.
  • Subject token (RFC 8693 §2.1): the token you have. Typically the user's existing access token.
  • Actor token (RFC 8693 §2.1): optional. When one party acts on behalf of another, the actor token (usually a client credential) identifies the acting party. The actor is recorded in the act claim of the resulting token (§4.1).
  • Privilege-narrowing primitives: audience, resource, scope, requested_token_type parameters let the holder request a token with strictly less authority than the subject token.
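  • act claim (§4.1): records the delegation chain as a JSON object of claims that identify the actor. RFC 8693's own examples identify the actor by sub; this design uses {"iss": …, "client_id": …}. Lets a downstream resource server attribute "user X delegated through service Y".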
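
To make the parameters above concrete, here is a minimal sketch of the form body an RFC 8693 token-exchange request carries. It only illustrates the wire format: the proposed exchange mode mints tokens internally and does not call a token endpoint. buildExchangeForm is a hypothetical helper, and using the generic jwt token type for the subject token is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
)

// Grant-type and token-type URIs registered by RFC 8693.
const (
	grantTypeTokenExchange = "urn:ietf:params:oauth:grant-type:token-exchange"
	tokenTypeJWT           = "urn:ietf:params:oauth:token-type:jwt"
)

// buildExchangeForm assembles the form body of a token-exchange request.
// audience and scope narrow the requested token; empty values are omitted.
// Hypothetical helper for illustration, not part of altinity-mcp.
func buildExchangeForm(subjectToken, audience, scope string) url.Values {
	form := url.Values{}
	form.Set("grant_type", grantTypeTokenExchange)
	form.Set("subject_token", subjectToken)
	form.Set("subject_token_type", tokenTypeJWT)
	form.Set("requested_token_type", tokenTypeJWT)
	if audience != "" {
		form.Set("audience", audience)
	}
	if scope != "" {
		form.Set("scope", scope)
	}
	return form
}

func main() {
	form := buildExchangeForm("<upstream JWT>", "clickhouse", "")
	// Encode() yields the application/x-www-form-urlencoded request body.
	fmt.Println(form.Encode())
}
```

The audience parameter is what narrows the resulting token to a single downstream, which is the property the rest of this design relies on.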

Three classic deployment patterns:

  1. Cross-domain identity translation — gateway exchanges a token from IdP-A for one valid at IdP-B. Not our case.
  2. Service-to-service privilege narrowing: a service receives a broad token and exchanges it for a narrowly scoped downstream token. This is our case: MCP receives a broad upstream IdP token and exchanges it for an aud=clickhouse token.
  3. Impersonation / delegation between user agents — A acts on behalf of B with B's consent. Not our case; we never act on behalf of a different user than the subject token's.

The MCP spec (2025-11-25 §Access Token Privilege Restriction) doesn't reference RFC 8693 by name, but its two MUSTs (validate + don't pass-through) are exactly what RFC 8693's pattern (2) accomplishes.


2. Threat model — why the spec demands this

Inputs

  • The Auth0 ID token MCP receives in forward mode has aud=fAkf9qpOo0HBI2lA8Nc2R1fOqXdJEshx (our static client ID — see deploy/otel/mcp-values.yaml in the sibling deployment repo).
  • Auth0's JWKS is public; any service that trusts Auth0's JWKS with the matching audience accepts the token.

Leak surface today (passthrough)

A leaked Auth0 token is usable against anything that trusts Auth0 with the matching audience — not just ClickHouse. Today that's only us; the moment a future deployment adds a second downstream sink trusting the same Auth0 client (a Grafana proxy, an S3 sidecar, a separate analytics service), the blast radius widens silently. The token also carries the full identity assertion (email, email_verified, hd, iss, sub) — every relying service sees the full set whether it needs it or not.

Leak surface with exchange

MCP mints a JWT whose aud names ClickHouse only, with a ~5–10-minute TTL and only the claims CH needs (typically email or sub). A leaked exchange token is bounded to ClickHouse and to a single short window. The actor (act.client_id = the DCR client_id) is recorded so a leak's provenance is traceable to the specific MCP client (claude.ai vs codex vs an attacker-DCR'd client).

Honest assessment for the current otel-mcp deployment

Marginal. Closed loop, single Auth0 client, single CH downstream, no live forward-mode users (otel uses gating). The threat model bites harder in any of these scenarios:

  • Auth0 tenant grows to multiple downstream services trusting the same client.
  • MCP integrates additional downstream sinks beyond ClickHouse (Grafana, S3, Redis, etc. — each can have its own aud).
  • Compliance / audit demands explicit "MCP minted this for user X via DCR client Y" trails.
  • MCP runs untrusted tools that might exfiltrate the Authorization header in a tool response.

3. How forward mode works today (existing-code map)

Findings from a Phase-1 exploration pass over the OAuth code surface:

| Concern | File:line | What happens today |
|---|---|---|
| Inbound validation skipped | pkg/server/server_client.go:230 | only validates in gating mode |
| Token passthrough to CH | pkg/server/server_client.go:141 | headers["Authorization"] = "Bearer " + token |
| Forward refresh wraps upstream refresh in JWE | cmd/altinity-mcp/oauth_server.go:1429 | mintForwardRefreshToken |
| Forward refresh exchange | cmd/altinity-mcp/oauth_server.go:1716 | handleOAuthTokenRefreshForward |
| Auth-code response returns upstream token verbatim | cmd/altinity-mcp/oauth_server.go:1543 | response["access_token"] = bearerToken |

Forward-mode-only configuration knobs (pkg/config/config.go:101-206):

  • UpstreamOfflineAccess — request offline_access scope upstream + JWE-wrap upstream refresh tokens.
  • ClickHouseHeaderName — header to forward token in. Defaults to Authorization (Bearer); custom name sends raw token.
  • ClaimsToHeaders — projects validated JWT claims into ClickHouse HTTP headers (e.g. {"email": "X-ClickHouse-Email"}). Becomes redundant in exchange mode (claims travel inside the JWT itself).
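
As an illustration of what a ClaimsToHeaders-style projection does, here is a minimal sketch. projectClaims is a hypothetical stand-in, not the code in pkg/server/server_client.go, and skipping non-string claims is an assumption.

```go
package main

import (
	"fmt"
	"net/http"
)

// projectClaims copies selected string claims into HTTP headers.
// mapping is claim name -> header name; missing, empty, or non-string
// claims are skipped. Hypothetical helper for illustration only.
func projectClaims(claims map[string]any, mapping map[string]string) http.Header {
	h := http.Header{}
	for claim, header := range mapping {
		if v, ok := claims[claim].(string); ok && v != "" {
			h.Set(header, v)
		}
	}
	return h
}

func main() {
	claims := map[string]any{"email": "user@example.com", "email_verified": true}
	h := projectClaims(claims, map[string]string{"email": "X-ClickHouse-Email"})
	fmt.Println(h.Get("X-ClickHouse-Email"))
}
```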

4. ClickHouse-side reality — what <token_processors> actually accepts

Verified via pkg/server/oauth_e2e_test.go:47-280 (the embedded-CH test wires <token_processors> against a mock OIDC provider), every example in helm/altinity-mcp/values_examples/*.yaml, and docs/oauth_authorization.md:439-660.

Findings:

  • Only <type>openid</type> and <type>azure</type> are documented in any example or test.
  • HMAC validators are NOT supported. Every documented example uses configuration_endpoint or the equivalent OIDC discovery URL. This forces RS256 (or another asymmetric algorithm advertised in the discovery doc) and rules out the simpler "shared HMAC secret" design.
  • The Antalya processor specifically requires either userinfo_endpoint OR token_introspection_endpoint in the discovery doc (pkg/server/oauth_e2e_test.go:47-51). MCP must serve a userinfo endpoint for CH to consult after JWT signature validation.
  • token_cache_lifetime (60s default) caches successful validations — relevant to multi-replica key-rotation scenarios.
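
Given these findings, the discovery document MCP serves would need at least a JWKS location and a userinfo endpoint. The sketch below builds such a document using standard OpenID Provider Metadata field names. Which fields ClickHouse actually reads is an assumption, and buildDiscovery is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// discoveryDoc holds a small subset of OpenID Provider Metadata fields.
// Assumption: this subset is enough for the ClickHouse openid processor.
type discoveryDoc struct {
	Issuer           string   `json:"issuer"`
	JWKSURI          string   `json:"jwks_uri"`
	UserinfoEndpoint string   `json:"userinfo_endpoint"`
	SigningAlgs      []string `json:"id_token_signing_alg_values_supported"`
}

// buildDiscovery derives absolute endpoint URLs from the issuer and the
// configured paths. Hypothetical helper for illustration only.
func buildDiscovery(issuer, jwksPath, userinfoPath string) discoveryDoc {
	return discoveryDoc{
		Issuer:           issuer,
		JWKSURI:          issuer + jwksPath,
		UserinfoEndpoint: issuer + userinfoPath,
		SigningAlgs:      []string{"RS256"},
	}
}

func main() {
	doc := buildDiscovery(
		"https://otel-mcp.demo.altinity.cloud",
		"/.well-known/mcp-exchange/jwks.json",
		"/oauth/exchange/userinfo",
	)
	out, err := json.MarshalIndent(doc, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```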

CH config delta (operator action required when migrating)

Before (Auth0):

<token_processors>
  <auth0>
    <type>openid</type>
    <configuration_endpoint>https://altinity.auth0.com/.well-known/openid-configuration</configuration_endpoint>
    <token_cache_lifetime>60</token_cache_lifetime>
    <username_claim>email</username_claim>
  </auth0>
</token_processors>

After (exchange mode):

<token_processors>
  <altinity_mcp_exchange>
    <type>openid</type>
    <configuration_endpoint>https://otel-mcp.demo.altinity.cloud/.well-known/mcp-exchange/openid-configuration</configuration_endpoint>
    <token_cache_lifetime>60</token_cache_lifetime>
    <username_claim>email</username_claim>
  </altinity_mcp_exchange>
</token_processors>
<user_directories>
  <token>
    <processor>altinity_mcp_exchange</processor>
    <common_roles><default_role /></common_roles>
  </token>
</user_directories>

Plus a worked example shipped alongside the implementation as helm/altinity-mcp/values_examples/mcp-oauth-exchange.yaml.


5. Proposed design — the exchange mode

5.1. New mode value

OAuth.Mode accepts a new string "exchange" alongside "gating" and "forward". Existing modes keep their semantics; this is purely additive. Default for new deployments stays gating (most common case).

5.2. Inbound side

Identical to forward mode's current inbound side, plus mandatory local validation. The MCP client presents an upstream IdP token in Authorization: Bearer; MCP runs the existing parseAndVerifyExternalJWT (pkg/server/server_auth_oauth.go:267), which verifies the signature against the upstream JWKS and enforces issuer / audience / identity policy.

5.3. Outbound side

When MCP invokes ClickHouse in exchange mode, it mints a fresh CH-bound JWT:

Header:   {"alg": "RS256", "typ": "JWT", "kid": "mcp-exchange-v1"}
Payload:  {
  "iss":   "https://otel-mcp.demo.altinity.cloud",
  "aud":   "<configured oauth.exchange.clickhouse_audience>",
  "sub":   "<from validated upstream claims>",
  "email": "<from validated upstream claims>",
  "email_verified": true,
  "act":   {"iss": "altinity-mcp", "client_id": "<DCR client_id>"},
  "exp":   <min(upstream_exp, now + token_ttl_seconds)>,
  "iat":   <now>,
  "jti":   <random>
}
Signed:   RSA-2048 private key held by MCP.

The token is forwarded to CH in Authorization: Bearer. CH validates against MCP's published JWKS.

The act claim (RFC 8693 §4.1) is the mechanism for recording that the token represents "user X delegated through MCP DCR client Y". CH log shows real attribution.
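
A minimal sketch of the minting step, using only the Go standard library so that the RS256 mechanics are visible. All function names are hypothetical, the real implementation may well use a JWT library instead, and email_verified is hard-coded to true only to mirror the payload above.

```go
package main

import (
	"crypto"
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// newDevKey generates an ephemeral RSA-2048 key for the demo.
func newDevKey() (*rsa.PrivateKey, error) {
	return rsa.GenerateKey(rand.Reader, 2048)
}

// exchangeExpiry returns min(upstreamExp, now+ttl), all in Unix seconds.
func exchangeExpiry(upstreamExp, now, ttl int64) int64 {
	if exp := now + ttl; exp < upstreamExp {
		return exp
	}
	return upstreamExp
}

// signRS256 serializes header and claims as a compact JWS signed with RS256.
func signRS256(key *rsa.PrivateKey, kid string, claims map[string]any) (string, error) {
	header, err := json.Marshal(map[string]string{"alg": "RS256", "typ": "JWT", "kid": kid})
	if err != nil {
		return "", err
	}
	payload, err := json.Marshal(claims)
	if err != nil {
		return "", err
	}
	enc := base64.RawURLEncoding
	input := enc.EncodeToString(header) + "." + enc.EncodeToString(payload)
	digest := sha256.Sum256([]byte(input))
	sig, err := rsa.SignPKCS1v15(rand.Reader, key, crypto.SHA256, digest[:])
	if err != nil {
		return "", err
	}
	return input + "." + enc.EncodeToString(sig), nil
}

// mintExchangeToken builds the claim set sketched in this section and signs it.
// Hypothetical signature; times are Unix seconds.
func mintExchangeToken(key *rsa.PrivateKey, issuer, audience, sub, email, clientID string, upstreamExp, now, ttl int64) (string, error) {
	jti := make([]byte, 16)
	if _, err := rand.Read(jti); err != nil {
		return "", err
	}
	claims := map[string]any{
		"iss":            issuer,
		"aud":            audience,
		"sub":            sub,
		"email":          email,
		"email_verified": true,
		"act":            map[string]string{"iss": "altinity-mcp", "client_id": clientID},
		"exp":            exchangeExpiry(upstreamExp, now, ttl),
		"iat":            now,
		"jti":            base64.RawURLEncoding.EncodeToString(jti),
	}
	return signRS256(key, "mcp-exchange-v1", claims)
}

func main() {
	key, err := newDevKey()
	if err != nil {
		panic(err)
	}
	tok, err := mintExchangeToken(key, "https://otel-mcp.demo.altinity.cloud", "clickhouse",
		"auth0|123", "user@example.com", "dcr-client-1", 1700003600, 1700000000, 600)
	if err != nil {
		panic(err)
	}
	fmt.Println(tok)
}
```

Capping exp at the upstream token's expiry keeps the exchange token from outliving the identity assertion it was derived from.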

5.4. New endpoints

Three new HTTP routes when mode=exchange:

| Path (default) | Purpose | Notes |
|---|---|---|
| /.well-known/mcp-exchange/openid-configuration | OIDC-style discovery doc | satisfies CH's configuration_endpoint |
| /.well-known/mcp-exchange/jwks.json | Public JWK set | one key, kid mcp-exchange-v1 |
| /oauth/exchange/userinfo | Translates exchange JWT → claims JSON | what CH calls after JWT verify |

All three configurable via oauth.exchange.{discovery,jwks,userinfo}_path.
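
For the JWKS route, a minimal sketch of deriving the published key set from the RSA public key. The type and function names are hypothetical; the JSON member names (kty, use, alg, kid, n, e) are the standard JWK ones.

```go
package main

import (
	"crypto/rand"
	"crypto/rsa"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"math/big"
)

// jwk carries only public RSA parameters, so private material cannot be
// serialized by accident.
type jwk struct {
	Kty string `json:"kty"`
	Use string `json:"use"`
	Alg string `json:"alg"`
	Kid string `json:"kid"`
	N   string `json:"n"`
	E   string `json:"e"`
}

type jwkSet struct {
	Keys []jwk `json:"keys"`
}

// newDevKey generates an ephemeral RSA-2048 key for the demo.
func newDevKey() (*rsa.PrivateKey, error) {
	return rsa.GenerateKey(rand.Reader, 2048)
}

// publicJWKS encodes the modulus and exponent as unpadded base64url
// big-endian integers, which is the JWK representation of an RSA key.
func publicJWKS(pub *rsa.PublicKey, kid string) jwkSet {
	enc := base64.RawURLEncoding
	return jwkSet{Keys: []jwk{{
		Kty: "RSA",
		Use: "sig",
		Alg: "RS256",
		Kid: kid,
		N:   enc.EncodeToString(pub.N.Bytes()),
		E:   enc.EncodeToString(big.NewInt(int64(pub.E)).Bytes()),
	}}}
}

func main() {
	key, err := newDevKey()
	if err != nil {
		panic(err)
	}
	out, err := json.MarshalIndent(publicJWKS(&key.PublicKey, "mcp-exchange-v1"), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```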

The userinfo endpoint is non-trivial: CH POSTs the exchange JWT to it and expects a JSON document of the form {"sub": …, "email": …, …}. We re-validate the inbound exchange JWT against the same public key (on a single replica this means MCP verifies a token it signed itself), extract the claims, and return them. The call is idempotent and short-circuited by CH's token_cache_lifetime.
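
The core of such a userinfo handler is the re-validation step. Below is a minimal standard-library sketch that checks the alg, the RS256 signature, and exp, then returns the claims. verifyExchangeToken and signDemo are hypothetical names, signDemo exists only to feed the demo, and iss / aud checks are left out here although a real handler would likely want them.

```go
package main

import (
	"crypto"
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// newDevKey generates an ephemeral RSA-2048 key for the demo.
func newDevKey() (*rsa.PrivateKey, error) {
	return rsa.GenerateKey(rand.Reader, 2048)
}

// signDemo produces an RS256 compact JWS so the verifier has input.
func signDemo(key *rsa.PrivateKey, claims map[string]any) (string, error) {
	enc := base64.RawURLEncoding
	payload, err := json.Marshal(claims)
	if err != nil {
		return "", err
	}
	input := enc.EncodeToString([]byte(`{"alg":"RS256","typ":"JWT"}`)) + "." + enc.EncodeToString(payload)
	digest := sha256.Sum256([]byte(input))
	sig, err := rsa.SignPKCS1v15(rand.Reader, key, crypto.SHA256, digest[:])
	if err != nil {
		return "", err
	}
	return input + "." + enc.EncodeToString(sig), nil
}

// verifyExchangeToken checks alg, signature, and expiry (now is Unix
// seconds) and returns the decoded claims on success.
func verifyExchangeToken(token string, pub *rsa.PublicKey, now int64) (map[string]any, error) {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return nil, errors.New("malformed token")
	}
	enc := base64.RawURLEncoding
	headerJSON, err := enc.DecodeString(parts[0])
	if err != nil {
		return nil, errors.New("malformed header")
	}
	var header struct {
		Alg string `json:"alg"`
	}
	if err := json.Unmarshal(headerJSON, &header); err != nil || header.Alg != "RS256" {
		return nil, errors.New("unsupported alg")
	}
	sig, err := enc.DecodeString(parts[2])
	if err != nil {
		return nil, errors.New("malformed signature")
	}
	digest := sha256.Sum256([]byte(parts[0] + "." + parts[1]))
	if err := rsa.VerifyPKCS1v15(pub, crypto.SHA256, digest[:], sig); err != nil {
		return nil, errors.New("bad signature")
	}
	payload, err := enc.DecodeString(parts[1])
	if err != nil {
		return nil, errors.New("malformed payload")
	}
	var claims map[string]any
	if err := json.Unmarshal(payload, &claims); err != nil {
		return nil, errors.New("malformed claims")
	}
	if exp, ok := claims["exp"].(float64); !ok || int64(exp) <= now {
		return nil, errors.New("token expired")
	}
	return claims, nil
}

func main() {
	key, err := newDevKey()
	if err != nil {
		panic(err)
	}
	tok, err := signDemo(key, map[string]any{"sub": "auth0|123", "email": "user@example.com", "exp": 1600})
	if err != nil {
		panic(err)
	}
	claims, err := verifyExchangeToken(tok, &key.PublicKey, 1000)
	if err != nil {
		panic(err)
	}
	fmt.Println(claims["email"])
}
```

A handler built on this would return 401 on any error and the claims as JSON otherwise.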

5.5. RSA keypair lifecycle

Three options for sourcing the keypair, ordered by operator preference:

  1. oauth.exchange.private_key_pem_file — path to PEM on disk (mounted from K8s Secret in production).
  2. oauth.exchange.private_key_pem — inline PEM in config.
  3. oauth.exchange.auto_generate: true — RSA-2048 generated in-memory at startup. Single-replica only. Keypair is ephemeral; rotates every pod restart.

Default: auto-generate with a loud WARN at startup if no PEM is configured. Logs the public-key SPKI SHA-256 fingerprint for audit.

Multi-replica pitfall: each replica auto-generates its own keypair, so the JWKS it serves contains only its own key. When CH has fetched the JWKS from one replica and then receives a token minted by another, validation fails with AUTHENTICATION_FAILED as soon as CH's cache rotates. Production deployments must use the file/inline path. The test plan below includes a deliberate test that documents this failure mode.
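
A minimal sketch of the PEM-loading and fingerprint-logging pieces of this lifecycle, standard library only. The function names are hypothetical. Accepting both PKCS#1 and PKCS#8 encodings is an assumption that seems prudent because different OpenSSL versions emit different formats for the same command.

```go
package main

import (
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"crypto/x509"
	"encoding/hex"
	"encoding/pem"
	"errors"
	"fmt"
)

// parseRSAPrivateKeyPEM accepts PKCS#1 ("RSA PRIVATE KEY") and
// PKCS#8 ("PRIVATE KEY") PEM blocks.
func parseRSAPrivateKeyPEM(data []byte) (*rsa.PrivateKey, error) {
	block, _ := pem.Decode(data)
	if block == nil {
		return nil, errors.New("no PEM block found")
	}
	if key, err := x509.ParsePKCS1PrivateKey(block.Bytes); err == nil {
		return key, nil
	}
	parsed, err := x509.ParsePKCS8PrivateKey(block.Bytes)
	if err != nil {
		return nil, fmt.Errorf("parse private key: %w", err)
	}
	key, ok := parsed.(*rsa.PrivateKey)
	if !ok {
		return nil, errors.New("PEM does not contain an RSA key")
	}
	return key, nil
}

// spkiFingerprint returns the hex SHA-256 of the DER-encoded
// SubjectPublicKeyInfo, suitable for a startup log line.
func spkiFingerprint(pub *rsa.PublicKey) (string, error) {
	der, err := x509.MarshalPKIXPublicKey(pub)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(der)
	return hex.EncodeToString(sum[:]), nil
}

// generateDevKeyPEM creates an ephemeral RSA-2048 key and returns it as
// PKCS#1 PEM, standing in for the auto_generate path.
func generateDevKeyPEM() ([]byte, error) {
	key, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{
		Type:  "RSA PRIVATE KEY",
		Bytes: x509.MarshalPKCS1PrivateKey(key),
	}), nil
}

func main() {
	pemBytes, err := generateDevKeyPEM()
	if err != nil {
		panic(err)
	}
	key, err := parseRSAPrivateKeyPEM(pemBytes)
	if err != nil {
		panic(err)
	}
	fp, err := spkiFingerprint(&key.PublicKey)
	if err != nil {
		panic(err)
	}
	fmt.Println("exchange key SPKI SHA-256:", fp)
}
```

Comparing the logged fingerprint across replicas is a quick way to spot the multi-replica pitfall: replicas sharing a PEM log the same value, auto-generated ones do not.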

5.6. Configuration shape

New nested struct under OAuthConfig:

oauth:
  mode: exchange
  exchange:
    private_key_pem: ""               # or:
    private_key_pem_file: /etc/secrets/exchange.pem
    auto_generate: false              # default false; true only for dev
    kid: mcp-exchange-v1
    clickhouse_audience: https://clickhouse.internal:8123
    token_ttl_seconds: 600            # 10 min, capped by upstream exp
    jwks_path: /.well-known/mcp-exchange/jwks.json
    discovery_path: /.well-known/mcp-exchange/openid-configuration
    userinfo_path: /oauth/exchange/userinfo

The exchange key is intentionally separate from signing_secret. Different cryptosystem (RSA vs HMAC), different threat model (CH replicas hold the public JWK; signing_secret never leaves MCP). HKDF-deriving an RSA key from signing_secret would re-bind the two and is the trap to avoid.
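
One way the nested struct and its startup validation could look in Go. The field names, yaml tags, and the Validate rules are all assumptions. In particular, this sketch takes the stricter reading in which a missing key source is an error unless auto_generate is set explicitly, which is still an open question for the team.

```go
package main

import (
	"errors"
	"fmt"
)

// ExchangeConfig mirrors the YAML shape above. Hypothetical field names;
// the yaml tags are inert here because no YAML library is imported.
type ExchangeConfig struct {
	PrivateKeyPEM      string `yaml:"private_key_pem"`
	PrivateKeyPEMFile  string `yaml:"private_key_pem_file"`
	AutoGenerate       bool   `yaml:"auto_generate"`
	Kid                string `yaml:"kid"`
	ClickHouseAudience string `yaml:"clickhouse_audience"`
	TokenTTLSeconds    int    `yaml:"token_ttl_seconds"`
	JWKSPath           string `yaml:"jwks_path"`
	DiscoveryPath      string `yaml:"discovery_path"`
	UserinfoPath       string `yaml:"userinfo_path"`
}

// Validate rejects ambiguous or incomplete key configuration.
// The rules are an assumed policy, not a confirmed design decision.
func (c ExchangeConfig) Validate() error {
	if c.PrivateKeyPEM != "" && c.PrivateKeyPEMFile != "" {
		return errors.New("set only one of private_key_pem and private_key_pem_file")
	}
	hasPEM := c.PrivateKeyPEM != "" || c.PrivateKeyPEMFile != ""
	if !hasPEM && !c.AutoGenerate {
		return errors.New("no key source: configure a PEM or set auto_generate")
	}
	if c.ClickHouseAudience == "" {
		return errors.New("clickhouse_audience is required")
	}
	if c.TokenTTLSeconds <= 0 {
		return errors.New("token_ttl_seconds must be positive")
	}
	return nil
}

func main() {
	cfg := ExchangeConfig{
		PrivateKeyPEMFile:  "/etc/secrets/exchange.pem",
		Kid:                "mcp-exchange-v1",
		ClickHouseAudience: "https://clickhouse.internal:8123",
		TokenTTLSeconds:    600,
	}
	fmt.Println("validation error:", cfg.Validate())
}
```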


6. Migration & coexistence

6.1. Strategy

forward stays as a deprecated mode. exchange is added as the spec-compliant alternative. Operators choose at deployment time. No silent migration.

6.2. Operator migration path (forward → exchange)

  1. Generate keypair: openssl genrsa -out exchange.pem 2048.
  2. Mount via K8s Secret, set oauth.exchange.private_key_pem_file.
  3. Set oauth.exchange.clickhouse_audience to the CH server URL.
  4. Set oauth.mode: exchange.
  5. Deploy MCP. Verify JWKS endpoint reachable from CH cluster network.
  6. Update CH <token_processors> configuration_endpoint to MCP's discovery URL (XML diff in §4 above).
  7. Restart CH (it caches <token_processors> config at startup).
  8. Verify with a query that returns the user's email (e.g. SELECT currentUser()).

Order matters: MCP must be reachable before CH restarts; otherwise CH boots without the processor and rejects every token.

6.3. Deprecation timeline for forward

Open question for the team: do we remove forward after one chart version (e.g. 1.6.0 deprecated, 1.7.0 removed) or keep it indefinitely with a loud startup warning? Recommendation: keep indefinitely. Cost of carrying the code is small and removing it forces operators with different threat models off the platform.


7. Testing strategy

7.1. Unit tests (~10 tests in new pkg/server/server_auth_exchange_test.go)

Each test asserts one property:

  • MintExchangeToken produces an RS256-signed JWT decodable by the public JWK with the expected claim set.
  • Exchange JWT exp is bounded by min(upstream_exp, now + token_ttl_seconds).
  • MintExchangeToken errors when no key has been loaded.
  • JWKS endpoint serves only the public key (no private exponent or prime factors).
  • Discovery endpoint includes the userinfo_endpoint field Antalya requires.
  • Userinfo endpoint validates the inbound exchange JWT and 401s on forged tokens.
  • LoadExchangeKey(auto_generate=true) produces different moduli on successive calls (catches accidental caching).
  • PEM-file load round-trip.

7.2. End-to-end integration test (new pkg/server/oauth_exchange_e2e_test.go)

Mirrors pkg/server/oauth_e2e_test.go structure:

  1. Boot mock upstream IdP.
  2. Start MCP server in exchange mode with auto-generated keypair.
  3. Boot Antalya ClickHouse via internal/testutil/embeddedch.Setup with <token_processors> configuration_endpoint pointing at the MCP server's discovery URL.
  4. Issue an upstream JWT for user@example.com.
  5. Run SELECT currentUser() through MCP. Assert the result is user@example.com — proving the exchange tokens arrived at CH and were validated against MCP's JWKS.
  6. Sub-test: assert the outbound Authorization header CH receives is NOT byte-equal to the inbound MCP-side header, AND decodes against MCP's public key (proves passthrough is gone).
  7. Sub-test: kill MCP, restart with auto-generate again (new keypair). Replay the query. Assert CH rejects with AUTHENTICATION_FAILED because its cached JWKS no longer matches. This is the multi-replica pitfall in concrete form — it documents WHY a persistent key file is required.

7.3. Modified existing tests

  • TestOAuthMCPAuthInjectorForwardModePassesOpaqueBearerToken (oauth_server_test.go) — gets a new sibling …ExchangeModeRejectsOpaqueBearerToken. In exchange mode, opaque (non-JWT) bearers fail validation; the gating-style validation runs unconditionally.
  • Forward-mode tests stay green — no behaviour change in forward.

8. Deferred / explicitly out of scope

  • Public RFC 8693 endpoint (/oauth/token with grant_type=urn:ietf:params:oauth:grant-type:token-exchange). Internal-only first; expose later if a use case appears. Cost: more attack surface (any MCP-bound token could mint CH tokens externally), state-machine for grant validation, more tests.
  • Multi-key rotation tooling. First implementation ships a single kid. Rotation = restart with new key + accept a brief 401 window while CH cache (token_cache_lifetime, default 60s) clears. Multi-key publishing (two JWKs in JWKS, drop old kid after cache rotates) is a follow-up.
  • Removing ClaimsToHeaders mapping (pkg/server/server_client.go:147-184). Becomes redundant once claims live inside the exchange JWT itself; CH reads them via username_claim in the processor config. Mark deprecated, remove in a later release.
  • Ed25519 (JWS alg EdDSA) alternative to RS256. Faster + smaller; CH's Antalya processor is expected to accept whichever algorithms the discovery doc's id_token_signing_alg_values_supported declares, but cross-version support is uneven. Stick with RS256 for safety; revisit if CPU profiling shows JWT signing as a hotspot (unlikely: once per request, not per tool call).
  • JWE encryption of exchange tokens. Tokens are short-lived and travel on an internal network; JWE adds key-management complexity for negligible gain. Revisit only if MCP↔CH crosses a trust boundary.
  • Per-tool audience scoping (e.g., aud=clickhouse-readonly for read-only tools, aud=clickhouse-write for write tools). RFC 8693 supports it via the audience parameter. Useful only after Tools config grows finer-grained access control.
  • Removing forward mode entirely. Keep it; cheap to maintain; some deployments may have legitimate threat models where passthrough is acceptable.

9. Open questions for the team

  1. Is the marginal-gain analysis above acceptable as a reason to ship? If not, what additional constraint (multi-tenant, audit, tool sandbox) would the doc need to capture before we start coding?
  2. Should auto_generate be allowed at all, or hard-fail in all non-explicitly-configured cases? Current direction: auto-generate with a loud warn.
  3. Is the userinfo endpoint design (re-validating the same JWT against the same key) acceptable, or should we use a server-side claims cache keyed by jti?
  4. Are we comfortable with a token_ttl_seconds default of 600s (10 min)? OAuth 2.1 prefers shorter; CH cache is 60s by default so anything >60s lengthens the leak window without improving ergonomics.
  5. Deprecation timeline for forward: keep indefinitely, or remove after a chart-version cycle?

10. References


11. Definition of done

This issue is the design. It is "done" when:

  • The team has read it and either approved it or replaced it with a different approach.
  • The follow-up implementation PR (separate issue / branch) ships:
    • The new exchange mode behind OAuth.Mode = "exchange".
    • The three new endpoints + RSA keypair lifecycle.
    • Unit tests + e2e test described in §7.
    • Updated docs/oauth_authorization.md with the new mode.
    • A helm/altinity-mcp/values_examples/mcp-oauth-exchange.yaml worked example.
  • A user runs the CH <token_processors> migration path in §6.2 against a real deployment and the SELECT currentUser() query returns the upstream user's email.
