mcp-datahub mcp server for datahub
v1.x · part of txn2

project / mcp-datahub

org / txn2

est. 2025

pkg.go.dev

apache 2.0 · read-only by default

mcp-datahub mcp · go metadata catalog for ai.

mcp-datahub is an MCP server that gives AI assistants safe, structured access to DataHub metadata catalogs. Search across datasets, dashboards, pipelines, and queries. Read schemas, walk upstream and downstream lineage at table or column grain, fetch glossary terms, and resolve domains and data products. Read-only by default; write operations like tagging, linking, and incident updates are opt-in. Speaks to any DataHub instance, v1.3 or later.

Run it standalone, or import it as a Go library and compose your own MCP server with custom auth, tenant isolation, and audit logging through middleware and interceptors. Multi-server out of the box. Apache 2.0, part of the txn2 mcp data platform alongside mcp-trino and mcp-s3.

§ 01 / install · run

Two ways to use it.

Run mcp-datahub as a standalone server and wire it to Claude, Cursor, or any MCP client. Or import the Go packages and compose a custom MCP server with your own auth, tenancy, and audit logic. Same toolkit, two surfaces.

SRV-001 · go · server docs

standalone server

One binary. Wire to any MCP client over stdio.

Install with go install or Docker, or download a release binary. Point it at a DataHub GraphQL endpoint with DATAHUB_URL and DATAHUB_TOKEN. Read-only mode is the default; flip DATAHUB_WRITE_ENABLED=true when you mean it. Add more instances through DATAHUB_ADDITIONAL_SERVERS and let the AI pick.
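A minimal Go sketch of how those environment settings could be read, with writes staying off unless DATAHUB_WRITE_ENABLED parses as true. The Settings type and loadSettings function are hypothetical illustrations; mcp-datahub's own configuration code, and how it parses DATAHUB_ADDITIONAL_SERVERS, may differ.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Settings is a hypothetical struct mirroring the documented variables.
type Settings struct {
	URL          string // DATAHUB_URL: GraphQL endpoint
	Token        string // DATAHUB_TOKEN: access token
	WriteEnabled bool   // DATAHUB_WRITE_ENABLED: false unless set to a true value
}

// loadSettings (hypothetical) reads settings through a lookup function so it
// can be exercised without touching the real process environment.
func loadSettings(lookup func(string) (string, bool)) (Settings, error) {
	var s Settings
	url, ok := lookup("DATAHUB_URL")
	if !ok || url == "" {
		return s, fmt.Errorf("DATAHUB_URL is required")
	}
	s.URL = url
	s.Token, _ = lookup("DATAHUB_TOKEN")

	// Read-only is the default: an unset or empty variable leaves
	// WriteEnabled false; only a value that parses as true turns writes on.
	if raw, ok := lookup("DATAHUB_WRITE_ENABLED"); ok && raw != "" {
		enabled, err := strconv.ParseBool(raw)
		if err != nil {
			return s, fmt.Errorf("DATAHUB_WRITE_ENABLED: %w", err)
		}
		s.WriteEnabled = enabled
	}
	return s, nil
}

func main() {
	s, err := loadSettings(os.LookupEnv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("endpoint=%s write_enabled=%v\n", s.URL, s.WriteEnabled)
}
```

Passing os.LookupEnv in production and a map-backed function in tests keeps the parsing logic testable, and rejecting an unparseable flag is safer than silently treating it as read-only or writable.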

~ / mcp-datahub
$ go install github.com/txn2/mcp-datahub/cmd/mcp-datahub@latest

$ claude mcp add datahub \
    -e DATAHUB_URL=https://datahub.example.com/api/graphql \
    -e DATAHUB_TOKEN=$TOKEN \
    -- mcp-datahub
  added: datahub (read-only, 9 read tools)

$ claude
# ask: which tables feed the orders dashboard?
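To illustrate how several DataHub instances can sit behind one install, here is a minimal registry sketch: named connections, a primary used when no name is given, a sorted list the AI could choose from, and a write flag per connection. Registry, Connection, and their methods are hypothetical; they do not parse DATAHUB_ADDITIONAL_SERVERS, whose format is not described here.

```go
package main

import (
	"fmt"
	"sort"
)

// Connection is a hypothetical descriptor for one DataHub instance.
type Connection struct {
	Name         string
	URL          string
	WriteEnabled bool // writes can be toggled per connection
}

// Registry (hypothetical) holds the primary instance plus any additional ones.
type Registry struct {
	primary string
	conns   map[string]Connection
}

// NewRegistry rejects duplicate names so every connection is addressable.
func NewRegistry(primary Connection, additional ...Connection) (*Registry, error) {
	r := &Registry{primary: primary.Name, conns: map[string]Connection{primary.Name: primary}}
	for _, c := range additional {
		if _, dup := r.conns[c.Name]; dup {
			return nil, fmt.Errorf("duplicate connection name %q", c.Name)
		}
		r.conns[c.Name] = c
	}
	return r, nil
}

// List returns connection names in sorted order, the kind of answer a
// list-connections tool could hand back to the AI.
func (r *Registry) List() []string {
	names := make([]string, 0, len(r.conns))
	for n := range r.conns {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// Get resolves a connection by name; an empty name means the primary.
func (r *Registry) Get(name string) (Connection, error) {
	if name == "" {
		name = r.primary
	}
	c, ok := r.conns[name]
	if !ok {
		return Connection{}, fmt.Errorf("unknown connection %q", name)
	}
	return c, nil
}

func main() {
	reg, err := NewRegistry(
		Connection{Name: "prod", URL: "https://datahub.example.com/api/graphql"},
		Connection{Name: "staging", URL: "https://staging.example.com/api/graphql"},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(reg.List()) // [prod staging]
	c, _ := reg.Get("")
	fmt.Println(c.Name) // prod
}
```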

LIB-002 · go · library docs

go library

Compose a custom MCP server. Bring your own auth.

client.New returns a configured DataHub client. tools.NewToolkit packages the catalog operations as MCP tools. Register them on your own MCPServer, then attach middleware with Use, plus interceptors and result transformers for redaction or audit logging. No forking required.

~ / library
$ go get github.com/txn2/mcp-datahub

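// main.go (fragment: ctx, srv, and the imports are assumed to be in scope)
cfg := multiserver.FromEnv()
dh, err := client.New(ctx, cfg.Primary())
if err != nil {
    log.Fatalf("datahub client: %v", err) // don't register tools against a failed client
}

kit := tools.NewToolkit(dh, tools.Config{
    WriteEnabled: false, // keep the write tools blocked (the read-only default)
    DefaultLimit: 20,    // default result limit for calls that set none
})
kit.RegisterAll(srv) // add every toolkit tool to your MCP server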

§ 02 / what it does

DataHub as tools. Not a black box.

mcp-datahub exposes DataHub as a small, deliberate set of MCP tools with strict input validation, sane limits, and clear error contracts. There are twelve in all: nine read tools, each mapped to one catalog operation, and three write tools that cover thirty-five operations through a discriminator parameter.

  1. 001
    search, get, browse

    datahub_search (keyword or semantic mode), datahub_get_entity (lookup by URN), datahub_get_schema (field definitions), and datahub_browse (tags, domains, and data products) are among the nine read tools, all of which are always safe to expose.

    tools
  2. 002
    read-only by default

    DATAHUB_WRITE_ENABLED=false is the default. The three write tools (datahub_create, datahub_update, datahub_delete) are blocked unless explicitly enabled, and can be toggled per connection.

    safety
  3. 003
    multi-server

    Configure several DataHub instances via DATAHUB_ADDITIONAL_SERVERS. Query production, staging, and a local stack from the same MCP install. datahub_list_connections shows the AI which instances are configured so it can pick one.

    runtime
  4. 004
    column-level lineage

    datahub_get_lineage walks upstream and downstream graphs for any entity. Set level=column for column-grain lineage. The AI can trace where a value came from before quoting it.

    graph
  5. 005
    crud through three tools

    Three write tools cover thirty-five operations through a "what" discriminator parameter: tag entities, link documents, manage glossary terms, edit data products, file incidents, attach owners. Updates are idempotent; destructive deletes are flagged.

    write
  6. 006
    middleware & interceptors

    Wrap tool execution in middleware registered with Use. Block or rewrite requests with interceptors. Redact, audit, or transform results after execution. Compose enterprise concerns without forking the toolkit.

    compose
  7. ···
    part of the txn2 mcp data platform

    Sister projects: mcp-trino for federated SQL, mcp-s3 for object storage, mcp-data-platform as the catalog.

    + ecosystem

// open source

mcp-datahub is one of several open source components by Craig Johnston, sponsored by Deasil Works, Inc. and Plexara. Released under the Apache 2.0 license. Built to give AI assistants a safe, composable bridge to data catalogs.