Available Tools

mcp-datahub provides 12 MCP tools for interacting with DataHub (9 read + 3 write), covering 35 write operations through the CRUD discriminator pattern.

DataHub Version Compatibility

Minimum: DataHub 1.3.x. Full feature set: DataHub 1.4.x.

All read tools and most write operations work with DataHub 1.3.x+. Only document operations require DataHub 1.4.x+:

| Operation | Tool | Requires |
| --- | --- | --- |
| what=document | datahub_create | DataHub 1.4.x+ |
| what=document_contents | datahub_update | DataHub 1.4.x+ |
| what=document_status | datahub_update | DataHub 1.4.x+ |
| what=document_related_entities | datahub_update | DataHub 1.4.x+ |
| what=document_sub_type | datahub_update | DataHub 1.4.x+ |
| what=document | datahub_delete | DataHub 1.4.x+ |

For read queries, the client handles version differences gracefully: when a feature is unavailable on the connected server, it returns empty results rather than an error.
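
The compatibility rules above can be checked on the caller's side before issuing a write. The following is a minimal sketch, not part of mcp-datahub: `parse_version` and `is_supported` are hypothetical helpers, and the server version string is assumed to be known to the caller.

```python
# Operations the compatibility table lists as requiring DataHub 1.4.x+.
# Keys are (tool, what) pairs copied from that table.
REQUIRES_1_4 = {
    ("datahub_create", "document"),
    ("datahub_update", "document_contents"),
    ("datahub_update", "document_status"),
    ("datahub_update", "document_related_entities"),
    ("datahub_update", "document_sub_type"),
    ("datahub_delete", "document"),
}


def parse_version(version):
    """Return (major, minor) from a string such as '1.3.2' or 'v1.4.0'."""
    major, minor, *_ = version.lstrip("v").split(".")
    return int(major), int(minor)


def is_supported(tool, what, server_version):
    """Hypothetical helper: True if the operation should work on this server."""
    version = parse_version(server_version)
    if version < (1, 3):
        return False  # below the documented minimum
    if (tool, what) in REQUIRES_1_4:
        return version >= (1, 4)
    return True
```

A check like this is only needed for writes; per the note above, read queries against an older server return empty results instead of failing.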

Tool Annotations

All tools include MCP tool annotations that describe their behavior to AI clients:

| Hint | Read Tools | datahub_create | datahub_update | datahub_delete | Description |
| --- | --- | --- | --- | --- | --- |
| ReadOnlyHint | true | false | false | false | Whether the tool only reads data |
| DestructiveHint | (default) | false | false | true | Whether the tool may destructively update |
| IdempotentHint | true | false | true | true | Whether repeated calls produce the same result |
| OpenWorldHint | true | true | true | true | Whether the tool interacts with external entities |

OpenWorldHint is true for all tools because every tool communicates with an external DataHub instance.

These annotations help MCP clients make informed decisions about tool invocation (e.g., auto-approving read-only tools). Library users can override annotations per-tool or per-toolkit; see the Tools API Reference.
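
As an illustration of how a client might act on these hints, here is a small sketch. The hint values are copied from the table; the `approval_policy` function and its three policy names are assumptions made for this example, not behavior defined by mcp-datahub or by any particular MCP client.

```python
# Hint values copied from the annotation table. The "read_tools" entry stands
# for every read tool; DestructiveHint is omitted there because the table
# lists it as "(default)".
ANNOTATIONS = {
    "read_tools": {"ReadOnlyHint": True, "IdempotentHint": True,
                   "OpenWorldHint": True},
    "datahub_create": {"ReadOnlyHint": False, "DestructiveHint": False,
                       "IdempotentHint": False, "OpenWorldHint": True},
    "datahub_update": {"ReadOnlyHint": False, "DestructiveHint": False,
                       "IdempotentHint": True, "OpenWorldHint": True},
    "datahub_delete": {"ReadOnlyHint": False, "DestructiveHint": True,
                       "IdempotentHint": True, "OpenWorldHint": True},
}


def approval_policy(hints):
    """Hypothetical client policy: how much confirmation does a call need?"""
    if hints.get("ReadOnlyHint", False):
        return "auto"     # read-only: run without asking
    # Conservative choice: a missing DestructiveHint on a writing tool is
    # treated as destructive.
    if hints.get("DestructiveHint", True):
        return "confirm"  # may destroy data: always ask explicitly
    return "prompt"       # writes without destroying: lighter confirmation
```

With the table's values, read tools map to `auto`, `datahub_create` and `datahub_update` to `prompt`, and `datahub_delete` to `confirm`.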

Multi-Server Support

All tools accept an optional connection parameter to target a specific DataHub server in multi-server environments. Use datahub_list_connections to discover available connections.

Parameter Type Required Description
connection string No Named connection to use (see datahub_list_connections)

datahub_list_connections

List all configured DataHub server connections.

Parameters: None

Example Response:

{
  "connections": [
    {
      "name": "prod",
      "url": "https://prod.datahub.example.com",
      "is_default": true
    },
    {
      "name": "staging",
      "url": "https://staging.datahub.example.com",
      "is_default": false
    }
  ],
  "count": 2
}

Use Cases:

  • Discover available connections before querying
  • Verify multi-server configuration
  • Check which connection is the default
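
A caller can use this response to pick a connection and attach it to later tool calls. The sketch below works on the example response shown above; `default_connection` and `with_connection` are hypothetical helper names.

```python
import json

# The example response from above, as a client would receive it.
RESPONSE = """
{
  "connections": [
    {"name": "prod", "url": "https://prod.datahub.example.com", "is_default": true},
    {"name": "staging", "url": "https://staging.datahub.example.com", "is_default": false}
  ],
  "count": 2
}
"""


def default_connection(response):
    """Return the name of the default connection, or None if none is marked."""
    for conn in response.get("connections", []):
        if conn.get("is_default"):
            return conn["name"]
    return None


def with_connection(arguments, connection=None):
    """Return a copy of the tool arguments with `connection` set when given."""
    merged = dict(arguments)
    if connection is not None:
        merged["connection"] = connection
    return merged


connections = json.loads(RESPONSE)
print(default_connection(connections))
print(with_connection({"query": "customer"}, "staging"))
```

Because `connection` is optional on every tool, omitting it leaves the arguments unchanged.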

datahub_search

Search for datasets, dashboards, pipelines, and other assets.

Parameters:

Parameter Type Required Description
query string Yes Search query string
entity_type string No Filter by entity type (DATASET, DASHBOARD, etc.)
limit integer No Maximum results (default: 10, max: 100)
offset integer No Pagination offset (default: 0)
connection string No Named connection to use

Example Request:

{
  "query": "customer",
  "entity_type": "DATASET",
  "limit": 5
}

Example Response:

{
  "entities": [
    {
      "urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,prod.sales.customers,PROD)",
      "type": "DATASET",
      "name": "customers",
      "platform": "snowflake",
      "description": "Customer master data including contact information",
      "owners": ["Data Team"],
      "tags": ["pii", "customer-data"],
      "domain": "Sales"
    },
    {
      "urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,prod.sales.customer_orders,PROD)",
      "type": "DATASET",
      "name": "customer_orders",
      "platform": "snowflake",
      "description": "Order history by customer"
    }
  ],
  "total": 42,
  "offset": 0,
  "limit": 5
}

Common Use Cases:

  • Find datasets by name or description
  • Search within a specific domain
  • Discover dashboards related to a topic
  • Find entities by tag

Entity Type Values:

Value Description
DATASET Tables, views, files
DASHBOARD BI dashboards
CHART Individual visualizations
DATA_FLOW Pipelines
DATA_JOB Pipeline tasks
GLOSSARY_TERM Glossary terms
DOMAIN Domains
DATA_PRODUCT Data products
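
The `limit`, `offset`, and `total` fields are enough to page through a large result set. The following is a sketch under stated assumptions: `call_tool` is a hypothetical callable standing in for whatever MCP client invokes the tool, and the search tool is assumed to be registered as `datahub_search`.

```python
MAX_LIMIT = 100  # documented maximum for `limit`


def search_all(call_tool, query, entity_type=None, page_size=50):
    """Yield every entity matching `query`, one page at a time.

    `call_tool(tool_name, arguments)` is a hypothetical function that
    returns the tool's JSON response as a dict.
    """
    limit = min(page_size, MAX_LIMIT)
    offset = 0
    while True:
        arguments = {"query": query, "limit": limit, "offset": offset}
        if entity_type:
            arguments["entity_type"] = entity_type
        page = call_tool("datahub_search", arguments)
        entities = page.get("entities", [])
        yield from entities
        offset += len(entities)
        # Stop on an empty page or once `total` results have been seen.
        if not entities or offset >= page.get("total", 0):
            break
```

Advancing by the number of entities actually returned, rather than by `limit`, keeps the loop correct if the server returns a short page.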

datahub_get_entity

Get detailed metadata for an entity by URN.

Parameters:

Parameter Type Required Description
urn string Yes DataHub URN of the entity
connection string No Named connection to use

Example Request:

{
  "urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,prod.sales.customers,PROD)"
}

Example Response:

{
  "urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,prod.sales.customers,PROD)",
  "type": "DATASET",
  "name": "customers",
  "qualifiedName": "prod.sales.customers",
  "description": "Customer master data including contact information and preferences",
  "platform": "snowflake",
  "owners": [
    {
      "urn": "urn:li:corpuser:[email protected]",
      "name": "Jane Smith",
      "type": "DATAOWNER"
    }
  ],
  "tags": [
    {"name": "pii", "urn": "urn:li:tag:pii"},
    {"name": "customer-data", "urn": "urn:li:tag:customer-data"}
  ],
  "glossaryTerms": [
    {"name": "Customer", "urn": "urn:li:glossaryTerm:Customer"},
    {"name": "PII", "urn": "urn:li:glossaryTerm:Classification.PII"}
  ],
  "domain": {
    "name": "Sales",
    "urn": "urn:li:domain:sales"
  },
  "created": "2023-06-15T10:30:00Z",
  "lastModified": "2024-01-10T14:22:00Z",
  "properties": {
    "customProperties": {
      "retention_days": "365",
      "data_classification": "confidential"
    }
  }
}

Common Use Cases:

  • Get full details about a search result
  • Find owners for a dataset
  • Check tags and glossary terms
  • Get custom properties

datahub_get_schema

Get schema fields for a dataset with descriptions.

Parameters:

Parameter Type Required Description
urn string Yes Dataset URN
connection string No Named connection to use

Example Request:

{
  "urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,prod.sales.customers,PROD)"
}

Example Response:

{
  "urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,prod.sales.customers,PROD)",
  "name": "customers",
  "fields": [
    {
      "fieldPath": "customer_id",
      "type": "NUMBER",
      "nativeType": "INT64",
      "description": "Unique customer identifier",
      "nullable": false,
      "isPrimaryKey": true
    },
    {
      "fieldPath": "email",
      "type": "STRING",
      "nativeType": "VARCHAR(255)",
      "description": "Customer email address",
      "nullable": true,
      "glossaryTerms": [
        {"name": "PII", "urn": "urn:li:glossaryTerm:Classification.PII"},
        {"name": "Email", "urn": "urn:li:glossaryTerm:ContactInfo.Email"}
      ]
    },
    {
      "fieldPath": "created_at",
      "type": "TIMESTAMP",
      "nativeType": "TIMESTAMP_NTZ",
      "description": "Account creation timestamp",
      "nullable": false
    },
    {
      "fieldPath": "address.street",
      "type": "STRING",
      "nativeType": "VARCHAR(500)",
      "description": "Street address",
      "nullable": true
    },
    {
      "fieldPath": "address.city",
      "type": "STRING",
      "nativeType": "VARCHAR(100)",
      "description": "City name",
      "nullable": true
    }
  ],
  "primaryKeys": ["customer_id"],
  "foreignKeys": []
}

Field Properties:

Property Description
fieldPath Full path including nested fields
type Normalized type (STRING, NUMBER, etc.)
nativeType Platform-specific type
description Field description
nullable Whether field can be null
isPrimaryKey Whether field is a primary key
glossaryTerms Associated glossary terms
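
Two common ways to consume this response are finding fields that carry a given glossary term and grouping nested fields under their parent. The sketch below assumes simple dot-separated `fieldPath` values as in the example response; both function names are hypothetical.

```python
def fields_with_term(schema, term_name):
    """Return the fieldPath of every field tagged with the glossary term."""
    return [
        field["fieldPath"]
        for field in schema.get("fields", [])
        if any(term.get("name") == term_name
               for term in field.get("glossaryTerms", []))
    ]


def group_nested(schema):
    """Group dot-separated field paths under their top-level field name."""
    groups = {}
    for field in schema.get("fields", []):
        top, _, rest = field["fieldPath"].partition(".")
        children = groups.setdefault(top, [])
        if rest:
            children.append(rest)
    return groups


# Abbreviated version of the example response above.
schema = {
    "fields": [
        {"fieldPath": "customer_id", "type": "NUMBER"},
        {"fieldPath": "email", "type": "STRING",
         "glossaryTerms": [{"name": "PII"}, {"name": "Email"}]},
        {"fieldPath": "address.street", "type": "STRING"},
        {"fieldPath": "address.city", "type": "STRING"},
    ]
}
print(fields_with_term(schema, "PII"))
print(group_nested(schema))
```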

datahub_get_lineage

Get upstream and downstream lineage for an entity. Supports both dataset-level and column-level lineage via the level parameter.

Parameters:

Parameter Type Required Description
urn string Yes Entity URN
level string No Granularity: dataset or column (default: dataset)
direction string No UPSTREAM, DOWNSTREAM, or BOTH (default: BOTH, dataset level only)
depth integer No Maximum traversal depth (default: 3, max: 5, dataset level only)
connection string No Named connection to use

Example Request:

{
  "urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,prod.analytics.customer_metrics,PROD)",
  "direction": "BOTH",
  "depth": 2
}

Example Response:

{
  "urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,prod.analytics.customer_metrics,PROD)",
  "upstream": [
    {
      "urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,prod.sales.customers,PROD)",
      "name": "customers",
      "type": "DATASET",
      "platform": "snowflake",
      "degree": 1
    },
    {
      "urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,prod.sales.orders,PROD)",
      "name": "orders",
      "type": "DATASET",
      "platform": "snowflake",
      "degree": 1
    },
    {
      "urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,prod.raw.customer_events,PROD)",
      "name": "customer_events",
      "type": "DATASET",
      "platform": "snowflake",
      "degree": 2
    }
  ],
  "downstream": [
    {
      "urn": "urn:li:dashboard:(looker,customer_360)",
      "name": "Customer 360 Dashboard",
      "type": "DASHBOARD",
      "platform": "looker",
      "degree": 1
    },
    {
      "urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,prod.ml.churn_features,PROD)",
      "name": "churn_features",
      "type": "DATASET",
      "platform": "snowflake",
      "degree": 1
    }
  ]
}

Common Use Cases:

  • Impact analysis before schema changes
  • Root cause analysis for data issues
  • Understanding data flow
  • Discovering related datasets
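
For impact analysis, the `degree` field separates direct dependencies from transitive ones. A minimal sketch over the response shape shown above (the helper names are hypothetical):

```python
from collections import defaultdict


def names_by_degree(nodes):
    """Group lineage node names by their hop distance from the queried entity."""
    grouped = defaultdict(list)
    for node in nodes:
        grouped[node["degree"]].append(node["name"])
    return dict(grouped)


def downstream_of_type(lineage, entity_type):
    """Return URNs of downstream entities with the given type."""
    return [
        node["urn"]
        for node in lineage.get("downstream", [])
        if node.get("type") == entity_type
    ]


# Abbreviated version of the example response above.
lineage = {
    "upstream": [
        {"name": "customers", "type": "DATASET", "degree": 1},
        {"name": "orders", "type": "DATASET", "degree": 1},
        {"name": "customer_events", "type": "DATASET", "degree": 2},
    ],
    "downstream": [
        {"urn": "urn:li:dashboard:(looker,customer_360)",
         "name": "Customer 360 Dashboard", "type": "DASHBOARD", "degree": 1},
    ],
}
print(names_by_degree(lineage["upstream"]))
print(downstream_of_type(lineage, "DASHBOARD"))
```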

Column-Level Lineage (level=column)

When level=column is specified, the tool returns fine-grained column-level lineage mappings instead of dataset-level lineage. The direction and depth parameters are ignored in this mode.

Example Request:

{
  "urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,prod.analytics.customer_metrics,PROD)",
  "level": "column"
}

Example Response:

{
  "dataset_urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,prod.analytics.customer_metrics,PROD)",
  "mappings": [
    {
      "downstream_column": "customer_id",
      "upstream_dataset": "urn:li:dataset:(urn:li:dataPlatform:snowflake,prod.sales.customers,PROD)",
      "upstream_column": "id",
      "transform": "IDENTITY"
    },
    {
      "downstream_column": "total_orders",
      "upstream_dataset": "urn:li:dataset:(urn:li:dataPlatform:snowflake,prod.sales.orders,PROD)",
      "upstream_column": "order_count",
      "transform": "AGGREGATE",
      "confidence_score": 0.95
    },
    {
      "downstream_column": "last_order_date",
      "upstream_dataset": "urn:li:dataset:(urn:li:dataPlatform:snowflake,prod.sales.orders,PROD)",
      "upstream_column": "order_date",
      "transform": "AGGREGATE"
    }
  ]
}

Mapping Properties:

Property Description
downstream_column Column name in the target dataset
upstream_dataset URN of the source dataset
upstream_column Column name in the source dataset
transform Transformation type (IDENTITY, AGGREGATE, etc.)
query Optional SQL query that defines the transformation
confidence_score Optional confidence score (0-1) for inferred lineage
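
The mappings can be indexed by downstream column to answer "where does this column come from?". This sketch uses only the properties listed above; `upstream_sources` is a hypothetical helper name.

```python
def upstream_sources(column_lineage):
    """Map each downstream column to its (upstream_dataset, upstream_column) pairs."""
    sources = {}
    for mapping in column_lineage.get("mappings", []):
        sources.setdefault(mapping["downstream_column"], []).append(
            (mapping["upstream_dataset"], mapping["upstream_column"])
        )
    return sources


# Abbreviated version of the example response above, with shortened URNs.
column_lineage = {
    "mappings": [
        {"downstream_column": "customer_id", "upstream_dataset": "customers",
         "upstream_column": "id", "transform": "IDENTITY"},
        {"downstream_column": "total_orders", "upstream_dataset": "orders",
         "upstream_column": "order_count", "transform": "AGGREGATE"},
    ]
}
print(upstream_sources(column_lineage))
```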

datahub_get_queries

Get SQL queries associated with a dataset.

Parameters:

Parameter Type Required Description
urn string Yes Dataset URN
connection string No Named connection to use

Example Response:

{
  "urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,prod.sales.customers,PROD)",
  "queries": [
    {
      "query": "SELECT customer_id, email, created_at FROM prod.sales.customers WHERE created_at > DATEADD(day, -30, CURRENT_DATE())",
      "createdAt": "2024-01-10T09:15:00Z",
      "user": "[email protected]"
    },
    {
      "query": "SELECT COUNT(*) as total_customers FROM prod.sales.customers",
      "createdAt": "2024-01-09T14:30:00Z",
      "user": "dashboard_service"
    }
  ],
  "count": 2
}

datahub_get_glossary_term

Get glossary term definition and related assets.

Parameters:

Parameter Type Required Description
urn string Yes Glossary term URN
connection string No Named connection to use

Example Request:

{
  "urn": "urn:li:glossaryTerm:Classification.PII"
}

Example Response:

{
  "urn": "urn:li:glossaryTerm:Classification.PII",
  "name": "PII",
  "description": "Personally Identifiable Information - data that can identify an individual",
  "definition": "PII includes names, email addresses, phone numbers, social security numbers, and other data that can be used to identify a specific person.",
  "termSource": "INTERNAL",
  "parentNode": {
    "name": "Classification",
    "urn": "urn:li:glossaryNode:Classification"
  },
  "relatedTerms": [
    {"name": "Sensitive Data", "urn": "urn:li:glossaryTerm:Classification.Sensitive"},
    {"name": "PHI", "urn": "urn:li:glossaryTerm:Classification.PHI"}
  ],
  "owners": [
    {"name": "Data Governance Team", "type": "DATAOWNER"}
  ]
}

datahub_browse

Browse the catalog to list tags, domains, or data products.

Parameters:

Parameter Type Required Description
what string Yes What to browse: tags, domains, or data_products
filter string No Optional filter string (tags only)
connection string No Named connection to use

Example Request (tags):

{
  "what": "tags"
}

Example Response (tags):

{
  "tags": [
    {"name": "pii", "urn": "urn:li:tag:pii", "description": "Contains personally identifiable information"},
    {"name": "deprecated", "urn": "urn:li:tag:deprecated", "description": "This asset is deprecated"},
    {"name": "certified", "urn": "urn:li:tag:certified", "description": "Quality certified dataset"}
  ]
}

Example Request (domains):

{
  "what": "domains"
}

Example Response (domains):

{
  "domains": [
    {
      "urn": "urn:li:domain:sales",
      "name": "Sales",
      "description": "Sales and revenue data",
      "entityCount": 45
    },
    {
      "urn": "urn:li:domain:marketing",
      "name": "Marketing",
      "description": "Marketing campaigns and analytics",
      "entityCount": 32
    }
  ]
}

Example Request (data_products):

{
  "what": "data_products"
}

Example Response (data_products):

{
  "data_products": [
    {
      "urn": "urn:li:dataProduct:customer-360",
      "name": "Customer 360",
      "description": "Unified view of customer data across all touchpoints",
      "domain": "Sales"
    },
    {
      "urn": "urn:li:dataProduct:revenue-analytics",
      "name": "Revenue Analytics",
      "description": "Revenue metrics and forecasting data",
      "domain": "Finance"
    }
  ]
}

datahub_get_data_product

Get detailed information about a data product.

Parameters:

Parameter Type Required Description
urn string Yes Data product URN
connection string No Named connection to use

Example Response:

{
  "urn": "urn:li:dataProduct:customer-360",
  "name": "Customer 360",
  "description": "Unified view of customer data across all touchpoints",
  "domain": {
    "name": "Sales",
    "urn": "urn:li:domain:sales"
  },
  "owners": [
    {"name": "Customer Data Team", "type": "DATAOWNER"}
  ],
  "assets": [
    {
      "urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,prod.customer360.profile,PROD)",
      "name": "profile",
      "type": "DATASET"
    },
    {
      "urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,prod.customer360.interactions,PROD)",
      "name": "interactions",
      "type": "DATASET"
    }
  ],
  "properties": {
    "sla": "99.9%",
    "refresh_frequency": "hourly"
  }
}

Write Tools

Write tools require DATAHUB_WRITE_ENABLED=true. They follow the CRUD discriminator pattern: three tools (datahub_create, datahub_update, datahub_delete) cover 35 operations, selected through the what parameter.


datahub_create

Create a new entity or resource. Returns the URN of the created entity.

Parameters:

Parameter Type Required Description
what string Yes Entity type: tag, domain, glossary_term, data_product, document, application, query, incident, structured_property, data_contract
name string Varies Name or title (required for most types)
description string No Description or content
parent_node string No Parent glossary node URN (glossary_term)
domain_urn string Varies Domain URN (required for data_product)
value string No SQL statement (query)
language string No Query language, default SQL (query)
dataset_urns string[] No Associated dataset URNs (query, data_contract)
entity_urns string[] No Affected entity URNs (incident)
incident_type string No Incident type (incident)
priority string No Priority: LOW, MEDIUM, HIGH, CRITICAL (incident)
qualified_name string Varies Fully qualified name (required for structured_property)
value_type string Varies Value type: string, number, date, urn (required for structured_property)
entity_types string[] Varies Applicable entity types (required for structured_property)
cardinality string No SINGLE or MULTIPLE (structured_property)
status string No Publication status: PUBLISHED or UNPUBLISHED (document)
sub_type string No Document sub-type (document)
related_assets string[] No Related asset URNs (document)
global_context bool No Show in global search (document)
schema_assertions string[] No Schema assertion URNs (data_contract)
freshness_assertions string[] No Freshness assertion URNs (data_contract)
data_quality_assertions string[] No Data quality assertion URNs (data_contract)
connection string No Named connection to use
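
Because one tool covers ten entity types, it can help to validate arguments before sending them. The sketch below encodes only the requirements stated explicitly in the table (`domain_urn` for data_product; `qualified_name`, `value_type`, and `entity_types` for structured_property). `build_create_args` is a hypothetical helper, and the server may enforce further rules.

```python
CREATE_TYPES = {
    "tag", "domain", "glossary_term", "data_product", "document",
    "application", "query", "incident", "structured_property", "data_contract",
}

# Only the per-type requirements the parameter table states explicitly.
REQUIRED_BY_TYPE = {
    "data_product": ("domain_urn",),
    "structured_property": ("qualified_name", "value_type", "entity_types"),
}


def build_create_args(what, **params):
    """Return datahub_create arguments, raising ValueError on obvious mistakes."""
    if what not in CREATE_TYPES:
        raise ValueError("unsupported what: %r" % what)
    missing = [name for name in REQUIRED_BY_TYPE.get(what, ())
               if not params.get(name)]
    if missing:
        raise ValueError("%s requires: %s" % (what, ", ".join(missing)))
    # Drop parameters left as None so optional fields are simply omitted.
    arguments = {"what": what}
    arguments.update({k: v for k, v in params.items() if v is not None})
    return arguments


print(build_create_args("tag", name="pii", description="Contains PII"))
```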

datahub_update

Update metadata on an existing entity.

Parameters:

Parameter Type Required Description
what string Yes What to update (see table below)
action string Varies add/remove (required for tag, glossary_term, link, owner); set/remove (domain, structured_properties, default: set); not used for other what values
urn string Yes Entity URN
value string No New value (description, status, label, message)
target_urn string No Target URN for add/remove (tag, glossary term, owner, domain)
url string No URL for link operations
field_path string No Schema field path (column_description)
name string No Updated name (query, incident, structured_property)
description string No Updated description
ownership_type string No Ownership type, e.g. TECHNICAL_OWNER (owner add only)
properties object[] No Structured property values to set (structured_properties)
property_urns string[] No Property URNs to remove (structured_properties)
language string No Query language (query only)
dataset_urns string[] No Dataset URNs (query, data_contract)
incident_type string No Incident type (incident only)
priority string No Priority: LOW, MEDIUM, HIGH, CRITICAL (incident only)
state string Varies Incident state: ACTIVE, RESOLVED (required for incident_status)
title string No Document title (document_contents)
text string No Document text (document_contents)
entity_urns string[] No Related entity URNs (document_related_entities)
schema_assertions string[] No Schema assertion URNs (data_contract)
freshness_assertions string[] No Freshness assertion URNs (data_contract)
data_quality_assertions string[] No Data quality assertion URNs (data_contract)
connection string No Named connection to use

what values and required action:

| what | action | Description |
| --- | --- | --- |
| description | (not used) | Set entity description |
| column_description | (not used) | Set schema field description |
| tag | required: add/remove | Add or remove a tag |
| glossary_term | required: add/remove | Add or remove a glossary term |
| link | required: add/remove | Add or remove a link |
| owner | required: add/remove | Add or remove an owner |
| domain | set/remove (default: set) | Set or remove domain assignment |
| structured_properties | set/remove (default: set) | Set or remove structured property values |
| structured_property | (not used) | Update a structured property definition |
| incident_status | (not used) | Update incident status (requires state) |
| incident | (not used) | Update incident details |
| query | (not used) | Update query properties |
| document_contents | (not used) | Update document title/text |
| document_status | (not used) | Update document status |
| document_related_entities | (not used) | Update document related entities |
| document_sub_type | (not used) | Update document sub-type |
| data_contract | (not used) | Upsert a data contract |
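
The action rules in this table fall into three groups, which a caller can check before invoking datahub_update. This is a sketch of that check; `resolve_action` is a hypothetical helper and mirrors only what the table states.

```python
ADD_REMOVE = {"tag", "glossary_term", "link", "owner"}
SET_REMOVE = {"domain", "structured_properties"}
NO_ACTION = {
    "description", "column_description", "structured_property",
    "incident_status", "incident", "query", "document_contents",
    "document_status", "document_related_entities", "document_sub_type",
    "data_contract",
}


def resolve_action(what, action=None):
    """Return the action to send for `what`, or None when action is not used."""
    if what in ADD_REMOVE:
        if action not in ("add", "remove"):
            raise ValueError("%s requires action add or remove" % what)
        return action
    if what in SET_REMOVE:
        action = action or "set"  # documented default
        if action not in ("set", "remove"):
            raise ValueError("%s accepts action set or remove" % what)
        return action
    if what in NO_ACTION:
        return None  # the table marks action as not used for these values
    raise ValueError("unsupported what: %r" % what)


print(resolve_action("tag", "add"))
print(resolve_action("domain"))
```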

datahub_delete

Delete an entity or resource. This is destructive and cannot be undone.

Parameters:

Parameter Type Required Description
what string Yes Entity type: query, tag, domain, glossary_entity, data_product, application, document, structured_property
urn string Yes URN of the entity to delete
connection string No Named connection to use

Error Responses

All tools may return error responses:

{
  "error": true,
  "message": "Entity not found: urn:li:dataset:..."
}

Common Errors:

| Error | Cause | Solution |
| --- | --- | --- |
| unauthorized | Invalid or expired token | Generate new token |
| entity not found | URN does not exist | Verify URN is correct |
| connection refused | Cannot reach DataHub | Check DATAHUB_URL |
| rate limit exceeded | Too many requests | Reduce request rate |
| invalid parameter | Bad parameter value | Check parameter format |
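
A caller can turn this error shape into an exception that carries the suggested fix. The sketch below assumes the error message contains the phrase from the Error column, compared case-insensitively; that matching rule, `DataHubToolError`, and `check_response` are all assumptions of this example.

```python
# (phrase, solution) pairs copied from the Common Errors table.
REMEDIATION = (
    ("unauthorized", "Generate new token"),
    ("entity not found", "Verify URN is correct"),
    ("connection refused", "Check DATAHUB_URL"),
    ("rate limit exceeded", "Reduce request rate"),
    ("invalid parameter", "Check parameter format"),
)


class DataHubToolError(Exception):
    """Hypothetical exception wrapping a tool error response."""

    def __init__(self, message, hint=None):
        super().__init__(message)
        self.hint = hint


def check_response(response):
    """Return the response unchanged, or raise if it is an error response."""
    if response.get("error") is True:
        message = str(response.get("message", ""))
        lowered = message.lower()
        # Assumption: the message includes the table's phrase somewhere.
        hint = next((fix for phrase, fix in REMEDIATION if phrase in lowered), None)
        raise DataHubToolError(message, hint)
    return response
```

Messages that match none of the listed phrases still raise, with `hint` left as `None`.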