
PostgreSQL GenAI Monitoring Demo Guide

This document provides guidance for demonstrating PostgreSQL with pgvector for GenAI applications on Azure. It is a "friction log": a catalog of common pitfalls encountered when building and monitoring PostgreSQL-backed GenAI applications, with solutions and monitoring strategies to help developers avoid them.

Pitfall: Improper connection pool sizing leads to connection exhaustion or underutilization.

Solution:

// Optimal connection pooling for vector operations
const connection = await createAzurePostgresConnection({
  host: 'your-server.postgres.database.azure.com',
  database: 'your-database',
  username: 'your-username',
  password: 'your-password',
  // Configure based on workload patterns
  maxPoolSize: 20,      // Set higher for query-intensive workloads
  minPoolSize: 5,       // Keep some connections warm
  idleTimeoutMs: 30000  // Recycle idle connections after 30 seconds
});

Monitoring Approach:

  • Track azure_postgres.pool.total, azure_postgres.pool.idle, and azure_postgres.pool.active metrics
  • Create alerts when active connections approach max (>80% utilization)
  • Monitor connection request failures
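The 80% utilization rule can also be checked directly in application code. This sketch assumes a node-postgres style pool, which exposes `totalCount` and `idleCount` counters; adapt the property names to your client:

```javascript
// Compute pool utilization and flag when active connections
// approach the configured maximum (>80% by default).
function poolUtilization(activeCount, maxPoolSize, threshold = 0.8) {
  const utilization = activeCount / maxPoolSize;
  return {
    utilization,
    nearExhaustion: utilization > threshold
  };
}

// With node-postgres style counters (totalCount includes idle connections):
function checkPool(pool, maxPoolSize) {
  const active = pool.totalCount - pool.idleCount;
  return poolUtilization(active, maxPoolSize);
}
```

Calling this on a timer and emitting the result as a gauge gives you the same signal as the platform metrics, but from the application's point of view.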

Pitfall: Missing or improperly configured pgvector extension in Azure PostgreSQL.

Solution:

-- Check if extension is installed
SELECT * FROM pg_extension WHERE extname = 'vector';
-- Install extension (requires appropriate permissions)
CREATE EXTENSION IF NOT EXISTS vector;
-- Verify the declared dimensions of a vector column
SELECT format_type(atttypid, atttypmod)
FROM pg_attribute
WHERE attrelid = 'items'::regclass AND attname = 'embedding';
-- Dimensions are fixed per column at creation time; changing them
-- means altering (or dropping and recreating) the column, e.g.:
-- ALTER TABLE items ALTER COLUMN embedding TYPE vector(1536);

Monitoring Approach:

  • Add startup validation in your application
  • Create CI/CD checks to verify extension presence before deployment
  • Track vector index size metrics to detect dimension-related issues
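The startup validation could look like the sketch below: query `pg_extension` and the column's formatted type, then compare the parsed dimension count against what your embedding model produces. The `pool` object, table name `items`, and column name `embedding` are assumptions to adapt:

```javascript
// Parse the dimension count out of a formatted type like 'vector(1536)'.
function parseVectorDimensions(formattedType) {
  const match = /^vector\((\d+)\)$/.exec(formattedType);
  return match ? Number(match[1]) : null;
}

// Startup validation: fail fast if pgvector is missing or the column's
// dimensions don't match what the embedding model emits.
async function validateVectorSetup(pool, expectedDims) {
  const ext = await pool.query(
    "SELECT 1 FROM pg_extension WHERE extname = 'vector'"
  );
  if (ext.rowCount === 0) throw new Error('pgvector extension not installed');

  const col = await pool.query(
    `SELECT format_type(atttypid, atttypmod) AS t
       FROM pg_attribute
      WHERE attrelid = 'items'::regclass AND attname = 'embedding'`
  );
  const dims = parseVectorDimensions(col.rows[0].t);
  if (dims !== expectedDims) {
    throw new Error(`expected vector(${expectedDims}), found ${col.rows[0].t}`);
  }
}
```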

Pitfall: Authentication failures when using Azure Managed Identity with PostgreSQL.

Solution:

// Use Azure identity for authentication
const credential = new DefaultAzureCredential();
const token = await credential.getToken('https://ossrdbms-aad.database.windows.net/.default');
// Use token as the password in the connection
const pool = new Pool({
  host: 'your-server.postgres.database.azure.com',
  database: 'your-database',
  user: 'username@servername', // Single Server format; Flexible Server uses just the principal name
  password: token.token,
  ssl: {
    rejectUnauthorized: true
  }
});

Monitoring Approach:

  • Track authentication failures with azure_postgres.authentication.failures
  • Monitor token refresh operations
  • Set alerts for repeated authentication failures
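Because Azure AD access tokens expire, the pool password has to be refreshed periodically; monitoring token refreshes means instrumenting that path. A sketch, where the refresh margin and metric names are assumptions, and `metrics` is any client exposing `increment`:

```javascript
// Decide whether a cached token should be refreshed, leaving a safety
// margin before the actual expiry. expiresOnTimestamp is in milliseconds,
// matching @azure/identity AccessToken objects.
function shouldRefreshToken(expiresOnTimestamp, nowMs, marginMs = 5 * 60 * 1000) {
  return nowMs >= expiresOnTimestamp - marginMs;
}

// Wrap token acquisition so refreshes and failures are counted for alerting.
async function getFreshToken(credential, cached, metrics, nowMs = Date.now()) {
  if (cached && !shouldRefreshToken(cached.expiresOnTimestamp, nowMs)) {
    return cached;
  }
  try {
    const token = await credential.getToken(
      'https://ossrdbms-aad.database.windows.net/.default'
    );
    metrics.increment('azure_postgres.token.refresh');
    return token;
  } catch (err) {
    metrics.increment('azure_postgres.authentication.failures');
    throw err;
  }
}
```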

Pitfall: Vector searches take too long, especially as database grows.

Solution:

-- Create appropriate index based on dataset size
-- For smaller datasets (<1M vectors) or faster indexing:
CREATE INDEX ON items USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- For larger datasets and better query performance:
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- Set search parameters at query time
SET hnsw.ef_search = 100;  -- Higher for better recall, lower for speed
SET ivfflat.probes = 10;   -- ivfflat equivalent: probe more lists for better recall

Monitoring Approach:

  • Track database.vector.search.latency.avg and database.vector.search.latency.p95
  • Monitor index scan vs. sequential scan rates
  • Create dashboards showing correlation between dataset size and query time
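One way to monitor index scan vs. sequential scan on the hot query itself is to periodically run `EXPLAIN (FORMAT JSON)` and walk the resulting plan tree. The walker below is a sketch; the node shape (`Node Type`, `Plans`) matches PostgreSQL's JSON EXPLAIN output:

```javascript
// Recursively check whether a JSON EXPLAIN plan node (or any of its
// children) uses the given node type, e.g. 'Index Scan' or 'Seq Scan'.
function planUsesNodeType(plan, nodeType) {
  if (plan['Node Type'] === nodeType) return true;
  return (plan['Plans'] || []).some((child) => planUsesNodeType(child, nodeType));
}
```

In practice you would run something like `EXPLAIN (FORMAT JSON) SELECT id FROM items ORDER BY embedding <=> $1 LIMIT 10`, feed `result.rows[0]['QUERY PLAN'][0].Plan` to this function, and emit a gauge when the planner has fallen back to a sequential scan.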

2. Memory Pressure During Vector Operations


Pitfall: Vector operations consume excessive memory, causing slowdowns or failures.

Solution:

-- Configure work_mem for vector operations (per session)
SET work_mem = '64MB';

// Process vectors in batches in application code
const batchSize = 100;
for (let i = 0; i < vectors.length; i += batchSize) {
  const batch = vectors.slice(i, i + batchSize);
  await processVectorBatch(batch);
}

Monitoring Approach:

  • Monitor azure.postgresql_servers.memory_percent
  • Track query memory utilization with custom metrics
  • Set up alerts for memory pressure during vector operations

Pitfall: Default index settings perform poorly for specific workloads.

Solution:

// Benchmark different index configurations
const results = await benchmarkVectorIndexes([
  { type: 'ivfflat', params: { lists: 100 } },
  { type: 'ivfflat', params: { lists: 200 } },
  { type: 'hnsw', params: { m: 16, ef_construction: 64 } },
  { type: 'hnsw', params: { m: 32, ef_construction: 128 } }
]);
// Use benchmarking data to optimize index settings
console.log(`Best index configuration: ${results.bestConfiguration}`);
console.log(`Latency improvement: ${results.improvementFactor}x`);

Monitoring Approach:

  • Run periodic benchmarks with different configurations
  • Track recall rate for production queries
  • Monitor index size and build time metrics
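Recall for production queries can be estimated by replaying a sample of queries exactly (for pgvector, a query with no usable index performs an exact scan; temporarily disabling index scans for the baseline run is one option) and comparing against the indexed results. The comparison itself is simple:

```javascript
// recall@k: fraction of the exact top-k result IDs that the
// approximate (indexed) search also returned.
function recallAtK(exactIds, approxIds) {
  const approx = new Set(approxIds);
  const hits = exactIds.filter((id) => approx.has(id)).length;
  return hits / exactIds.length;
}
```

Averaged over a query sample and emitted as a gauge, this is the number you tune `ef_search` or `probes` against.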

Pitfall: Standard PostgreSQL monitoring doesn’t capture vector-specific performance data.

Solution:

-- Create function to expose vector metrics
CREATE OR REPLACE FUNCTION get_vector_metrics() RETURNS TABLE (
  metric_name text,
  metric_value numeric
) AS $$
BEGIN
  RETURN QUERY
  SELECT 'vector_count', COUNT(*)::numeric FROM items WHERE embedding IS NOT NULL
  UNION ALL
  SELECT 'vector_index_size', pg_relation_size('idx_vector_hnsw')::numeric
  UNION ALL
  -- Parentheses keep the ORDER BY/LIMIT scoped to this branch only
  (SELECT 'avg_vector_search_time', mean_exec_time::numeric
   FROM pg_stat_statements
   WHERE query LIKE '%<=>%' AND calls > 10
   ORDER BY mean_exec_time DESC
   LIMIT 1);
END;
$$ LANGUAGE plpgsql;
-- Grant execute permissions to monitoring user
GRANT EXECUTE ON FUNCTION get_vector_metrics() TO datadog;

Monitoring Approach:

  • Create custom Datadog integration to collect these metrics
  • Build vector-specific dashboards using the dashboard template
  • Set alerts on vector operation performance degradation

2. Correlating AI Performance with Database Metrics


Pitfall: Difficult to connect end-user AI experience with database performance.

Solution:

// Instrument RAG queries with end-to-end timing
async function performRagQuery(query, options) {
  const startTime = Date.now();
  // Generate embedding
  const embeddingStart = Date.now();
  const embedding = await generateEmbedding(query);
  const embeddingTime = Date.now() - embeddingStart;
  // Vector search
  const searchStart = Date.now();
  const similarDocs = await vectorDb.search(embedding, options);
  const searchTime = Date.now() - searchStart;
  // LLM response generation
  const llmStart = Date.now();
  const llmResponse = await generateResponse(query, similarDocs);
  const llmTime = Date.now() - llmStart;
  // Track all timings
  const totalTime = Date.now() - startTime;
  // Send to Datadog
  metrics.gauge('genai.rag.latency.total', totalTime);
  metrics.gauge('genai.rag.latency.embedding', embeddingTime);
  metrics.gauge('genai.rag.latency.vector_search', searchTime);
  metrics.gauge('genai.rag.latency.llm_response', llmTime);
  return {
    response: llmResponse,
    metrics: {
      totalTime,
      embeddingTime,
      searchTime,
      llmTime
    }
  };
}

Monitoring Approach:

  • Create end-to-end RAG performance dashboard
  • Monitor each component separately to identify bottlenecks
  • Track correlation between vector search time and overall response time

Pitfall: Generic alerts don’t account for AI-specific performance patterns.

Solution:

// Vector search latency alert (Datadog monitor definition)
{
  "name": "Vector Search Latency Alert",
  "type": "metric alert",
  "query": "avg(last_5m):avg:database.vector.search.latency.p95{env:production} > 1000",
  "message": "Vector search latency p95 exceeds 1000ms",
  "tags": ["team:ai", "service:vector-db"],
  "priority": "P2",
  "options": {
    "thresholds": {
      "critical": 1000,
      "warning": 500
    },
    "notify_no_data": false,
    "notify_audit": false,
    "include_tags": true
  }
}

Monitoring Approach:

  • Set different thresholds for different vector operations
  • Create alerts based on relative performance degradation, not just absolute values
  • Include business impact context in alert messages
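Relative degradation alerts (the second bullet) compare current latency to a rolling baseline rather than a fixed threshold. A minimal sketch; the warning and critical multipliers are assumptions to tune for your workload:

```javascript
// Classify current latency against a baseline: 'critical' when it is
// more than 3x the baseline, 'warning' above 1.5x, otherwise 'ok'.
function classifyDegradation(currentMs, baselineMs, warnFactor = 1.5, critFactor = 3) {
  const ratio = currentMs / baselineMs;
  if (ratio > critFactor) return 'critical';
  if (ratio > warnFactor) return 'warning';
  return 'ok';
}
```

The same logic can be expressed in Datadog as an anomaly or change monitor; doing it in code is useful when the baseline lives in your own benchmark store.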

Pitfall: Changing vector dimensions requires complete reindexing and downtime.

Solution:

// Use migration script for zero-downtime dimension change
const migrationOptions = {
  sourceTable: 'rag_chunks',
  sourceColumn: 'embedding',
  targetTable: 'rag_chunks_new',
  targetColumn: 'embedding',
  dimensions: 1536, // New dimensions
  batchSize: 100,
  withIndex: true,
  indexType: 'hnsw',
  monitor: true
};
// Run migration with monitoring
const result = await migrateVectorData(migrationOptions);
console.log(`Migration success: ${result.success}`);
console.log(`Processed ${result.processedRows} of ${result.totalRows} rows`);
console.log(`Migration time: ${result.duration} seconds`);

Monitoring Approach:

  • Track migration progress with custom metrics
  • Monitor both source and target tables during migration
  • Create alerts for migration failures or slowdowns

Pitfall: Rebuilding vector indexes causes performance degradation.

Solution:

-- Optimize index building
SET maintenance_work_mem = '1GB';
SET max_parallel_maintenance_workers = 4;
-- Use CONCURRENTLY option when possible
CREATE INDEX CONCURRENTLY idx_vector_hnsw
ON items USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

Monitoring Approach:

  • Schedule index rebuilds during low-traffic periods
  • Monitor index creation time and resource usage
  • Track query performance before, during, and after index creation

Pitfall: Vector performance issues aren’t detected until production.

Solution:

# GitHub Actions workflow for testing vector performance
name: Test Vector Performance
on:
  pull_request:
    paths:
      - 'src/lib/vector-db/**'
      - 'prisma/schema.prisma'
jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '20'
      - name: Install dependencies
        run: npm ci
      - name: Run vector benchmark
        run: node scripts/benchmark-vector-search.js --queries 50
      - name: Compare to baseline
        run: node scripts/compare-benchmark-results.js

Monitoring Approach:

  • Store benchmark results as baselines
  • Track performance trends over time
  • Alert on significant performance regressions
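The baseline comparison step in the workflow above could be implemented along these lines; the 20% regression tolerance is an assumption:

```javascript
// Fail the build when current p95 latency regresses more than
// `tolerance` (fractional) over the stored baseline.
function checkRegression(baselineP95Ms, currentP95Ms, tolerance = 0.2) {
  const allowed = baselineP95Ms * (1 + tolerance);
  return {
    regressed: currentP95Ms > allowed,
    allowedMs: allowed,
    deltaPct: ((currentP95Ms - baselineP95Ms) / baselineP95Ms) * 100
  };
}
```

In CI, the script would load both benchmark result files, call this, and exit non-zero when `regressed` is true so the pull request check fails.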

Pitfall: Manual dashboard and alert setup leads to inconsistencies.

Solution:

// Deploy Datadog resources programmatically
async function deployMonitoring(environment) {
  // Deploy dashboards
  await deployDashboard('monitoring/dashboards/genai-vector-performance.json', {
    environment,
    service: 'vibecode-vector'
  });
  // Deploy monitors
  await deployMonitors('monitoring/alerts', {
    environment,
    service: 'vibecode-vector'
  });
  // Verify deployment
  const status = await verifyMonitoringDeployment(environment);
  console.log(`Monitoring deployment status: ${status.success ? 'Success' : 'Failed'}`);
}

Monitoring Approach:

  • Version control all monitoring resources
  • Automate deployment with CI/CD
  • Test monitoring effectiveness with synthetic issues

1. Correlating Database Performance with User Satisfaction


Pitfall: Difficult to connect database metrics to actual user experience.

Solution:

// Instrument frontend for user experience metrics
function trackRagQueryUserExperience(queryId, ragResult) {
  // Track user interaction with results
  sendToDatadog({
    'genai.rag.user_experience': {
      'query_id': queryId,
      'response_time': ragResult.metrics.totalTime,
      'vector_search_time': ragResult.metrics.searchTime,
      'result_count': ragResult.similarDocs.length,
      'user_rating': getUserRatingFromUI() // 1-5 star rating
    }
  });
}

Monitoring Approach:

  • Create dashboards showing correlation between backend performance and user ratings
  • Set up alerts based on user experience degradation
  • Track changes in user behavior following database changes
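Correlating backend latency with user ratings (the first bullet) can start with a plain Pearson coefficient over paired samples; a strongly negative value suggests ratings drop as latency rises. A minimal sketch:

```javascript
// Pearson correlation coefficient between two equal-length samples,
// e.g. per-query vector search latency vs. the user's 1-5 rating.
function pearson(xs, ys) {
  const n = xs.length;
  const mean = (a) => a.reduce((s, v) => s + v, 0) / n;
  const mx = mean(xs);
  const my = mean(ys);
  let num = 0, dx2 = 0, dy2 = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - mx;
    const dy = ys[i] - my;
    num += dx * dy;
    dx2 += dx * dx;
    dy2 += dy * dy;
  }
  return num / Math.sqrt(dx2 * dy2);
}
```

Correlation is not causation, of course; treat this as a signal to investigate, not a verdict.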

Pitfall: Aggregate metrics hide individual poor-performing queries.

Solution:

// Track individual query performance
async function monitorIndividualQueries() {
  const client = await pool.connect();
  try {
    const slowQueries = await client.query(`
      SELECT query, mean_exec_time, calls
      FROM pg_stat_statements
      WHERE query LIKE '%<=>%' AND mean_exec_time > 500
      ORDER BY mean_exec_time DESC
      LIMIT 10
    `);
    // Send to Datadog
    for (const q of slowQueries.rows) {
      dog.gauge('database.vector.slow_queries', 1, [
        `query_hash:${calculateQueryHash(q.query)}`,
        `exec_time:${q.mean_exec_time}`,
        `calls:${q.calls}`
      ]);
    }
  } finally {
    client.release();
  }
}

Monitoring Approach:

  • Create top list of slowest vector queries
  • Track specific vector query patterns
  • Set up automatic detection of new slow query patterns

The Big Picture: End-to-End GenAI Monitoring

  1. Data Collection Layers:

    • Infrastructure metrics (Azure VM, PostgreSQL)
    • Database metrics (connections, queries, vector operations)
    • Application metrics (embedding generation, vector searches)
    • End-user experience metrics (response time, relevance ratings)
  2. Correlation and Analysis:

    • Trace end-to-end request flow from user to database and back
    • Identify bottlenecks and optimization opportunities
    • Perform anomaly detection for GenAI-specific patterns
  3. Visualization and Alerting:

    • Multi-level dashboards (executive, operations, developer)
    • Context-aware alerts with actionable information
    • Automated remediation for common issues
// Complete RAG monitoring implementation
class RagMonitoring {
  constructor(options) {
    this.options = options;
    this.dog = options.datadogClient;
  }

  async trackEndToEndQuery(query, userId, sessionId) {
    const traceId = uuidv4();
    const startTime = Date.now();
    try {
      // Start span for entire operation
      this.dog.startSpan('rag.query', {
        traceId,
        tags: { userId, sessionId, query_length: query.length }
      });
      // Embedding generation
      const embeddingStart = Date.now();
      this.dog.startSpan('rag.embedding', { traceId });
      const embedding = await this.generateEmbedding(query);
      this.dog.finishSpan('rag.embedding', {
        duration: Date.now() - embeddingStart,
        dimensions: embedding.length
      });
      // Vector search
      const searchStart = Date.now();
      this.dog.startSpan('rag.vector_search', { traceId });
      const searchResults = await this.vectorSearch(embedding);
      const searchDuration = Date.now() - searchStart;
      this.dog.finishSpan('rag.vector_search', {
        duration: searchDuration,
        results: searchResults.length,
        avg_relevance: this.calculateAvgRelevance(searchResults)
      });
      // Document processing
      const processingStart = Date.now();
      this.dog.startSpan('rag.document_processing', { traceId });
      const processedContext = this.processDocuments(searchResults);
      this.dog.finishSpan('rag.document_processing', {
        duration: Date.now() - processingStart,
        context_length: processedContext.length
      });
      // LLM response generation
      const llmStart = Date.now();
      this.dog.startSpan('rag.llm_response', { traceId });
      const response = await this.generateResponse(query, processedContext);
      this.dog.finishSpan('rag.llm_response', {
        duration: Date.now() - llmStart,
        response_length: response.length
      });
      // End main span
      const totalDuration = Date.now() - startTime;
      this.dog.finishSpan('rag.query', {
        duration: totalDuration,
        success: true
      });
      // Record user feedback when available
      this.trackUserFeedback(traceId, response);
      return {
        response,
        metrics: {
          traceId,
          totalDuration,
          vectorSearchDuration: searchDuration
        }
      };
    } catch (error) {
      // Record error
      this.dog.finishSpan('rag.query', {
        duration: Date.now() - startTime,
        success: false,
        error: error.message
      });
      throw error;
    }
  }
  // Additional methods for monitoring user feedback, etc.
}

Monitoring GenAI applications on Azure PostgreSQL requires:

  1. Understanding the unique challenges of vector operations
  2. Implementing specialized metrics for GenAI workloads
  3. Creating comprehensive dashboards that show the full picture
  4. Setting up smart alerts that catch real issues without false positives
  5. Correlating database performance with user experience

With these principles in place, you’ll have robust observability for your AI-powered applications and can quickly identify and resolve issues before they impact users.