Comprehensive monitoring of AI-powered applications with Datadog
Core Functionality: The "Furnish Hub" AI assistant handles a range of customer inquiries.
Technical Stack: a Python backend (main.py) that calls the OpenAI API, instrumented with Datadog's ddtrace library, plus a browser frontend monitored with Datadog RUM.
Key Observability Challenges Demonstrated:
The Non-Deterministic Nature: Unlike traditional software, LLMs are inherently probabilistic. The same prompt can generate different responses, requiring statistical performance baselines rather than exact-match testing (see the testing sketch after this list).
The "Black Box" Problem: Multi-billion parameter LLMs have opaque internal reasoning, making semantic failures (hallucinations, bias propagation, off-topic responses) more critical than technical exceptions.
Vast Input Space: Natural language inputs create an effectively infinite, unstructured space that opens doors to unique vulnerabilities like prompt injection attacks.
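To make the first challenge concrete, here is a minimal sketch of statistical rather than exact-match testing. The call_assistant stub, the keyword check, and the 90% threshold are all hypothetical stand-ins for the real Furnish Hub client and evaluation criteria:

import random

def call_assistant(prompt: str) -> str:
    # Stand-in for the real Furnish Hub client; non-determinism is
    # simulated with random phrasing so the sketch is self-contained.
    return random.choice([
        "You can return items within 30 days of delivery.",
        "Our 30 day return window starts at delivery.",
        "Returns are accepted for 30 days after purchase.",
    ])

def mentions_return_policy(response: str) -> bool:
    # Crude keyword check; production evaluations would use an LLM
    # judge or embedding similarity instead.
    text = response.lower()
    return "return" in text and "30 day" in text

def test_return_policy_pass_rate(n: int = 20, threshold: float = 0.9) -> None:
    # Sample the same prompt many times and assert a pass *rate*,
    # not an exact string match on any single response.
    responses = [call_assistant("What is your return policy?") for _ in range(n)]
    pass_rate = sum(mentions_return_policy(r) for r in responses) / n
    assert pass_rate >= threshold, f"pass rate {pass_rate:.0%} below baseline"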
LLM observability forces convergence between Data Science, DevOps, and Security teams:
An observability platform becomes the common ground, providing shared language and unified data views that enable collaborative, cross-functional AI application management.
Shift from reactive troubleshooting to proactive optimization through performance baselines and deviation monitoring.
Granular insights for identifying inefficient prompt patterns and balancing model complexity with performance requirements.
Connecting technical metrics to user outcomes (feedback scores, task completion rates) to focus optimization efforts.
Comprehensive audit trails, systematic detection of harmful outputs, and protection against security risks like prompt injection.
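To illustrate the feedback-to-metrics loop described above, here is a hedged sketch of attaching a user feedback score to the exact interaction that produced it, using the ddtrace LLM Observability SDK. The "user_feedback" label is invented for this demo, and the signatures of export_span and submit_evaluation should be verified against current Datadog docs:

from ddtrace.llmobs import LLMObs

def record_user_feedback(score: int) -> None:
    # Export the context of the active LLM Observability span so the
    # feedback lands on the interaction the user actually rated.
    span_context = LLMObs.export_span(span=None)  # None = current span
    LLMObs.submit_evaluation(
        span_context,
        label="user_feedback",  # hypothetical metric name for this demo
        metric_type="score",    # numeric metric, e.g. a 1-5 rating
        value=score,
    )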
Complete LLM chain visualization with detailed metadata, input-output data, errors, latency, and token usage.
Pre-built dashboards for immediate operational metrics across all major LLM providers (OpenAI, Anthropic, Amazon Bedrock, Google Vertex AI).
Automatic quality checks (failure to answer, off-topic responses, negative sentiment) plus custom evaluation capabilities.
Built-in PII detection and redaction using Datadog Sensitive Data Scanner, plus prompt injection detection.
Semantic clustering to identify systemic issues and performance drifts by grouping similar low-quality interactions.
LLM traces integrated with traditional APM traces, enabling complete request flow visibility from browser clicks through backend services to LLM calls.
Full integration of observability's three pillars, allowing immediate correlation between LLM traces, application logs, and infrastructure metrics.
Unlike standalone LLM tools that create observability silos, Datadog treats LLMs as first-class citizens within the broader application architecture.
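As a sketch of what chain instrumentation can look like in this app, the snippet below uses the SDK's workflow and task decorators. The function names and logic are invented for illustration, and the decorator API should be checked against your ddtrace version:

from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import task, workflow

@task
def retrieve_order(order_id: str) -> dict:
    # Placeholder lookup; appears as a child "task" span in the trace.
    return {"order_id": order_id, "status": "shipped"}

@workflow
def answer_order_question(order_id: str, question: str) -> str:
    order = retrieve_order(order_id)
    # A real implementation would call the OpenAI API here; that call is
    # auto-instrumented and shows up as a nested LLM span with token
    # counts, latency, and input/output data.
    answer = f"Order {order['order_id']} is currently {order['status']}."
    LLMObs.annotate(input_data=question, output_data=answer)
    return answer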
API Keys Required: a Datadog API key (DD_API_KEY) and an OpenAI API key (OPENAI_API_KEY).
The Magic Command:
export OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"
DD_LLMOBS_ENABLED=1 \
DD_API_KEY="<YOUR_DATADOG_API_KEY>" \
DD_LLMOBS_ML_APP="furnish-hub-support-ai" \
DD_SITE="datadoghq.com" \
DD_LLMOBS_AGENTLESS_ENABLED=1 \
ddtrace-run python main.py
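If you would rather enable LLM Observability in code than through environment variables, the SDK exposes an equivalent entry point; the keyword arguments below mirror the DD_LLMOBS_* variables above, but verify them against your ddtrace version:

from ddtrace.llmobs import LLMObs

# In-code equivalent of the environment variables in the command above.
LLMObs.enable(
    ml_app="furnish-hub-support-ai",
    api_key="<YOUR_DATADOG_API_KEY>",
    site="datadoghq.com",
    agentless_enabled=True,
)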
from ddtrace import tracer

# Tag the active APM span with business context so traces can be
# filtered by customer, session, and plan tier in Datadog.
span = tracer.current_span()
if span:
    # user_id and session_id come from the application's request context.
    span.set_tag("customer.id", user_id)
    span.set_tag("session.id", session_id)
    span.set_tag("user.subscription_tier", "premium")
The journey from experimental AI to enterprise-grade AI requires operational excellence. LLM observability transforms the opaque "black box" of neural networks into a transparent, manageable "glass box."
For organizations deploying AI at scale, comprehensive LLM observability is not just a best practice; it is the foundation upon which the future of AI operations will be built.
Add parameters to the URL to override configuration:
?dd_client_token=pub_xxx&dd_app_id=xxx&dd_site=datadoghq.com
Examples:
?dd_client_token=pub_xxx&dd_app_id=xxx - Override tokens
?dd_site=datadoghq.eu - Use EU site
?dd_env=staging&dd_version=2.0.0 - Set environment
Values are automatically saved to localStorage when using URL parameters.
If no overrides are provided, the app falls back to safe, publicly exposable demo values.