
Deploying Azure OpenAI Monitoring

This guide explains how to deploy and configure Azure OpenAI monitoring using Datadog.

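Before you begin, make sure you have:

  1. Terraform 1.0 or later
  2. A Datadog API key and application key with permission to manage dashboards and monitors
  3. The Azure CLI, installed and logged in (az login)
  4. Access to the Azure subscription that hosts the Azure OpenAI resource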
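To confirm the prerequisites before running terraform init, a check along these lines may help. It is a minimal sketch that assumes a bash shell and a sort command that supports -V (as GNU coreutils does). The function names check_prereqs and version_ge are illustrative and are not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: checks the tooling listed in the prerequisites.

# Succeeds when version $1 is greater than or equal to version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

check_prereqs() {
  local cmd tf_version

  # Both CLIs must be on PATH.
  for cmd in terraform az; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing required command: $cmd" >&2
      return 1
    fi
  done

  # The first line of `terraform version` looks like "Terraform v1.6.0".
  tf_version="$(terraform version | sed -n '1s/^Terraform v//p')"
  if ! version_ge "$tf_version" 1.0; then
    echo "Terraform '$tf_version' found; 1.0 or later is required" >&2
    return 1
  fi

  # `az account show` fails when the CLI is not logged in.
  if ! az account show >/dev/null 2>&1; then
    echo "Azure CLI is not logged in; run 'az login'" >&2
    return 1
  fi
}
```

Source the file and run check_prereqs. It prints the first problem it finds and returns a non-zero status. Otherwise it returns zero silently.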
Initialize Terraform from the infrastructure directory:
cd infrastructure
terraform init

Create a terraform.tfvars file in the infrastructure directory with your configuration:

environment = "production"
datadog_api_key = "your-datadog-api-key"
datadog_app_key = "your-datadog-app-key"
# Optional: Customize alert thresholds
error_rate_threshold = 5.0 # percentage
latency_threshold_ms = 1000 # milliseconds
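Because terraform.tfvars holds the Datadog keys in plain text, avoid committing it to the repository. One option is to remove the two key lines from the file and supply the keys through environment variables instead. Terraform reads an environment variable named TF_VAR_<name> as the value of input variable <name>. Values in terraform.tfvars take precedence over TF_VAR_ variables, so the key lines must actually be removed from the file. The sketch below assumes the keys are already available in your shell as DD_API_KEY and DD_APP_KEY (for example, loaded from a secrets manager). The function name load_datadog_keys is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exposes the Datadog keys to Terraform without
# writing them to terraform.tfvars. Assumes DD_API_KEY and DD_APP_KEY
# are already set in the current shell.
load_datadog_keys() {
  if [ -z "${DD_API_KEY:-}" ] || [ -z "${DD_APP_KEY:-}" ]; then
    echo "DD_API_KEY and DD_APP_KEY must both be set" >&2
    return 1
  fi
  # Terraform maps TF_VAR_<name> to the input variable <name>.
  export TF_VAR_datadog_api_key="$DD_API_KEY"
  export TF_VAR_datadog_app_key="$DD_APP_KEY"
}
```

Call load_datadog_keys in the same shell before terraform plan. Because the function exports the variables, they remain set for later commands in that shell.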
Preview the changes to the monitoring module:
terraform plan -target=module.azure_openai_monitoring
Apply the changes:
terraform apply -target=module.azure_openai_monitoring
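When plan and apply run as two separate commands, apply computes a fresh plan. That plan can differ from the one you reviewed if anything changed in between. Saving the plan with -out and then applying that file applies exactly what you reviewed. A sketch, assuming bash; the plan file name and function names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical wrappers for the deploy step: plan once, apply that plan.
MODULE_TARGET="module.azure_openai_monitoring"
PLAN_FILE="azure-openai-monitoring.tfplan"  # hypothetical file name

plan_monitoring() {
  # Write the targeted plan to a file so it can be reviewed and applied as-is.
  terraform plan -target="$MODULE_TARGET" -out="$PLAN_FILE"
}

apply_monitoring() {
  # A saved plan already carries the -target selection, and applying it
  # does not prompt for confirmation again.
  terraform apply "$PLAN_FILE"
}
```

Delete the plan file after applying it. Saved plans can contain sensitive values, such as the Datadog keys passed in as variables.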
After the apply completes, verify the deployment:

  1. Log in to Datadog.
  2. Navigate to “Dashboards” and look for “[ENV] Azure OpenAI - Vibecode-AI - Overview”.
  3. Open the “Monitors” section and confirm that the newly created alerts are listed.
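You can also confirm the monitors from the command line. The Datadog API's get-all-monitors endpoint (GET /api/v1/monitor) accepts a name filter. This sketch makes several assumptions: bash and curl are available, the keys are in DD_API_KEY and DD_APP_KEY, and a DD_SITE variable is set for accounts outside the default US1 site (for example, datadoghq.eu). The default filter "Azure OpenAI" is only a guess at how the monitors are named, so adjust it to match yours. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lists Datadog monitors whose names match a filter.
list_openai_monitors() {
  local site="${DD_SITE:-datadoghq.com}"
  local name_filter="${1:-Azure OpenAI}"  # assumed name fragment

  if [ -z "${DD_API_KEY:-}" ] || [ -z "${DD_APP_KEY:-}" ]; then
    echo "DD_API_KEY and DD_APP_KEY must both be set" >&2
    return 1
  fi

  # --get turns the --data-urlencode pair into a URL query string.
  curl --silent --show-error --fail \
    --get "https://api.${site}/api/v1/monitor" \
    --data-urlencode "name=${name_filter}" \
    -H "DD-API-KEY: ${DD_API_KEY}" \
    -H "DD-APPLICATION-KEY: ${DD_APP_KEY}"
}
```

The endpoint returns a JSON array of monitors. An empty array means that no monitor name matched the filter.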

To change the alert thresholds, edit the module block in infrastructure/monitoring/azure_openai.tf:

module "azure_openai_monitoring" {
# ... existing configuration ...
# Alert thresholds
error_rate_threshold = 5.0 # 5% error rate
latency_threshold_ms = 1000 # 1 second
# ... rest of the configuration ...
}

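Alerts are sent to a Slack channel chosen by environment: #alerts-ai for production and #alerts-dev for every other environment. To change the channels, edit this line in infrastructure/monitoring/azure_openai.tf: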

slack_channel = var.environment == "production" ? "#alerts-ai" : "#alerts-dev"

The dashboard includes the following widgets:

  1. API Requests: Shows request volume by status code
  2. Error Rate: Displays the percentage of failed requests
  3. Token Usage: Tracks token consumption by type

Two main alerts are configured:

  1. High Error Rate: Triggers when error rate exceeds the threshold
  2. High Latency: Triggers when response time exceeds the threshold
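Here is a worked example of the high error rate condition, taking "exceeds" to mean strictly greater than. With the default threshold of 5.0, 12 failed requests out of 200 gives a 6% error rate, which would trigger the alert. By contrast, 10 out of 200 is exactly 5%, which would not. The helper below reproduces that arithmetic locally on counts you supply yourself. It is an illustration only and is not how the Datadog monitor evaluates its query. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when failed/total, as a percentage, is
# strictly greater than the threshold. Compares failed*100 with
# threshold*total to avoid floating-point division.
# Usage: error_rate_exceeds <failed> <total> <threshold_percent>
error_rate_exceeds() {
  awk -v failed="$1" -v total="$2" -v threshold="$3" \
    'BEGIN { exit !(total > 0 && failed * 100 > threshold * total) }'
}
```

For example, error_rate_exceeds 12 200 5.0 succeeds, while error_rate_exceeds 10 200 5.0 fails.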

If metrics are not appearing in Datadog:

  1. Verify that the Azure integration is configured in Datadog and can read the subscription that hosts the Azure OpenAI resource
  2. Check the Datadog agent logs for connection issues:
    kubectl logs -l app=datadog-agent -n datadog
  3. Make sure the service name set in the module matches the one your application reports
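To narrow down the first two checks, you can test two things directly. First, confirm that Datadog accepts the API key by calling its key validation endpoint (GET /api/v1/validate), which returns {"valid":true} for a working key. Second, read the Agent's status report by running agent status inside one of the Agent pods. This sketch assumes bash, curl, kubectl access to the cluster, and the DD_API_KEY and DD_SITE variables. It also reuses the app=datadog-agent label and datadog namespace from the logs command. The function names are hypothetical. If the pod runs several containers, you may need to add -c with the Agent container's name to the kubectl exec call.

```shell
#!/usr/bin/env bash
# Hypothetical troubleshooting helpers for missing metrics.

validate_datadog_key() {
  local site="${DD_SITE:-datadoghq.com}"
  if [ -z "${DD_API_KEY:-}" ]; then
    echo "DD_API_KEY is not set" >&2
    return 1
  fi
  # --fail makes curl return non-zero on HTTP errors such as 403 (bad key).
  curl --silent --show-error --fail \
    -H "DD-API-KEY: ${DD_API_KEY}" \
    "https://api.${site}/api/v1/validate"
}

agent_status() {
  local pod
  # Pick the first pod carrying the same label used in the logs command.
  pod="$(kubectl get pods -n datadog -l app=datadog-agent \
    -o jsonpath='{.items[0].metadata.name}')" || return 1
  if [ -z "$pod" ]; then
    echo "no pod with label app=datadog-agent in namespace datadog" >&2
    return 1
  fi
  # Prints the Agent's status report, including its connection to Datadog.
  kubectl exec -n datadog "$pod" -- agent status
}
```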

If you’re getting false positive alerts:

  1. Adjust the threshold values in the module configuration
  2. Modify the evaluation window for alerts
  3. Add additional filters to the alert queries
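To try a different threshold before editing any file, you can pass it on the command line. Values given with -var take precedence over terraform.tfvars, so the plan shows exactly which monitors would change. This assumes, as the terraform.tfvars example suggests, that error_rate_threshold and latency_threshold_ms are root input variables passed into the module. If the module block sets literal values instead, the override has no effect. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: previews alert threshold changes without editing files.
# Usage: preview_thresholds <error_rate_percent> <latency_ms>
preview_thresholds() {
  if [ "$#" -ne 2 ]; then
    echo "usage: preview_thresholds <error_rate_percent> <latency_ms>" >&2
    return 1
  fi
  # Command-line -var values override terraform.tfvars for this run only.
  terraform plan -target=module.azure_openai_monitoring \
    -var "error_rate_threshold=$1" \
    -var "latency_threshold_ms=$2"
}
```

For example, preview_thresholds 10 2000 shows the effect of a 10% error rate threshold and a 2-second latency threshold. Once you are satisfied with the result, copy the values into terraform.tfvars so they persist.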

To update the monitoring configuration:

  1. Make your changes in the module
  2. Run terraform plan to review changes
  3. Apply with terraform apply
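Two local checks can catch problems before you run terraform plan. terraform fmt -check reports files that need reformatting, and terraform validate catches configuration errors, without calling the Datadog or Azure APIs. A sketch, assuming bash and an already initialized working directory; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: local checks first, then a targeted plan.
review_changes() {
  # Non-zero exit if any .tf file (including subdirectories) needs reformatting.
  terraform fmt -check -recursive || return 1
  # Checks syntax and internal consistency; needs a prior `terraform init`.
  terraform validate || return 1
  terraform plan -target=module.azure_openai_monitoring
}
```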

To completely remove the monitoring:

terraform destroy -target=module.azure_openai_monitoring
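To see what the teardown would delete without changing anything, you can first run a destroy-mode plan. A sketch, assuming bash; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: shows what the teardown would delete, changing nothing.
preview_teardown() {
  # -destroy plans removal instead of creation; -target limits it to this module.
  terraform plan -destroy -target=module.azure_openai_monitoring
}
```

If the plan lists only the monitoring module's resources, run the destroy command above.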

For issues with the monitoring setup, contact the AI Engineering team or open an issue in the repository.