Configuration¶
All configuration is done through environment variables.
Required Variables¶
| Variable | Description |
|---|---|
DATAHUB_URL |
DataHub GMS URL (e.g., https://datahub.company.com) |
DATAHUB_TOKEN |
Personal access token from DataHub |
Optional Variables¶
| Variable | Description | Default |
|---|---|---|
DATAHUB_TIMEOUT |
Request timeout in seconds | 30 |
DATAHUB_RETRY_MAX |
Maximum retry attempts | 3 |
DATAHUB_DEFAULT_LIMIT |
Default search result limit | 10 |
DATAHUB_MAX_LIMIT |
Maximum allowed limit | 100 |
DATAHUB_MAX_LINEAGE_DEPTH |
Maximum lineage traversal depth | 5 |
DATAHUB_CONNECTION_NAME |
Display name for primary connection | datahub |
DATAHUB_ADDITIONAL_SERVERS |
JSON map of additional servers | (empty) |
DATAHUB_WRITE_ENABLED |
Enable write operations (true or 1) |
false |
DATAHUB_DEBUG |
Enable debug logging (1 or true) |
false |
Extensions Variables¶
Optional middleware can be enabled via environment variables:
| Variable | Description | Default |
|---|---|---|
MCP_DATAHUB_EXT_LOGGING |
Enable structured logging of tool calls | false |
MCP_DATAHUB_EXT_METRICS |
Enable metrics collection | false |
MCP_DATAHUB_EXT_METADATA |
Enable metadata enrichment on results | false |
MCP_DATAHUB_EXT_ERRORS |
Enable error hint enrichment | true |
Example Configuration¶
# Required
export DATAHUB_URL=https://datahub.company.com
export DATAHUB_TOKEN=your_personal_access_token
# Optional tuning
export DATAHUB_TIMEOUT=60
export DATAHUB_DEFAULT_LIMIT=20
export DATAHUB_MAX_LIMIT=50
# Debug logging (for troubleshooting)
export DATAHUB_DEBUG=1
Multi-Server Configuration¶
Connect to multiple DataHub instances simultaneously. Useful for:
- Production and staging environments
- Multi-tenant deployments
- Cross-environment metadata comparison
Setting Up Multiple Servers¶
# Primary server configuration
export DATAHUB_URL=https://prod.datahub.example.com/api/graphql
export DATAHUB_TOKEN=prod-token
export DATAHUB_CONNECTION_NAME=prod # Optional: customize display name
# Additional servers as JSON
export DATAHUB_ADDITIONAL_SERVERS='{
"staging": {
"url": "https://staging.datahub.example.com/api/graphql",
"token": "staging-token"
},
"dev": {
"url": "https://dev.datahub.example.com/api/graphql"
}
}'
Additional Server Options¶
Each additional server can override these settings (inherits from primary if not specified):
| Field | Description |
|---|---|
url |
DataHub GMS URL (required) |
token |
Access token (inherits from primary) |
timeout |
Request timeout in seconds |
retry_max |
Maximum retry attempts |
default_limit |
Default search limit |
max_limit |
Maximum allowed limit |
max_lineage_depth |
Maximum lineage depth |
write_enabled |
Enable write operations (nil = inherit from primary) |
Using Multiple Servers¶
- Use
datahub_list_connectionsto see available connections - Pass the
connectionparameter to any tool to target a specific server - If
connectionis omitted, the default (primary) server is used
Getting a DataHub Token¶
- Log into DataHub
- Go to Settings > Access Tokens
- Generate a new token with appropriate permissions
- Copy the token value
Config File¶
As an alternative to environment variables, configure via YAML or JSON:
datahub:
url: https://datahub.example.com
token: "${DATAHUB_TOKEN}"
timeout: "30s"
connection_name: prod
write_enabled: true
toolkit:
default_limit: 20
max_limit: 50
descriptions:
datahub_search: "Custom search description for your deployment"
extensions:
logging: true
errors: true
Load with extensions.LoadConfig("config.yaml") when using as a library. Environment variables override file values for sensitive fields. Token values support $VAR / ${VAR} expansion.
See the configuration reference for all options.
Security Considerations¶
- Never commit tokens to version control
- Use environment variables or secret management
- Tokens should have minimal required permissions
- Rotate tokens periodically