Datadog Integration for OpenIndiana
Datadog Integration for OpenIndiana/illumos
Section titled “Datadog Integration for OpenIndiana/illumos”Comprehensive guide to implementing Datadog monitoring on OpenIndiana, including four different approaches ranging from quick compatibility testing to native DTrace integration.
Overview
Section titled “Overview”Datadog does not officially support OpenIndiana/illumos, but several integration paths exist:
| Approach | Complexity | Compatibility | Performance | DTrace Integration |
|---|---|---|---|---|
| RapDev Solaris Agent | Low | Medium | Good | No |
| StatsD Bridge | Low | High | Excellent | Yes |
| Unix Agent Port | Medium | High | Good | Partial |
| Native DTrace Agent | High | Perfect | Excellent | Full |
Architecture Options
Section titled “Architecture Options”Option 1: RapDev Solaris Agent (Quick Start)
Section titled “Option 1: RapDev Solaris Agent (Quick Start)”┌─────────────────┐│ OpenIndiana ││ ││ ┌───────────┐ ││ │ RapDev │─┼─────▶ Datadog API│ │ Perl │ │ (HTTP/HTTPS)│ │ Agent │ ││ └─────┬─────┘ ││ │ ││ ┌───▼────┐ ││ │ kstat │ ││ │ prstat │ ││ └────────┘ │└─────────────────┘Best For: Rapid evaluation, Perl-friendly environments
Option 2: StatsD + DTrace Bridge (Recommended)
Section titled “Option 2: StatsD + DTrace Bridge (Recommended)”┌──────────────────────────────────┐│ OpenIndiana ││ ││ ┌─────────┐ ┌──────────┐ ││ │ DTrace │─────▶│ StatsD │──┼───▶ Datadog Agent│ │ Probes │ │ Bridge │ │ (on Linux host)│ └─────────┘ └──────────┘ ││ ││ Custom metrics, latency, ││ I/O, network, application │└──────────────────────────────────┘Best For: Production deployments, custom metrics, low overhead
Option 3: Unix Agent Port (Compatibility)
Section titled “Option 3: Unix Agent Port (Compatibility)”┌──────────────────────────────────┐│ OpenIndiana ││ ││ ┌────────────────────────────┐ ││ │ datadog-unix-agent │ ││ │ (Go, ported to illumos) │─┼───▶ Datadog API│ │ │ ││ │ - System metrics │ ││ │ - Process monitoring │ ││ │ - Custom checks │ ││ └────────────────────────────┘ │└──────────────────────────────────┘Best For: Standard Datadog workflow, Go developers
Option 4: Native DTrace Agent (Future)
Section titled “Option 4: Native DTrace Agent (Future)”┌──────────────────────────────────────┐│ OpenIndiana ││ ││ ┌────────────────────────────────┐ ││ │ vibecode-datadog-illumos │ ││ │ │─┼───▶ Datadog API│ │ ┌──────────┐ ┌──────────┐ │ ││ │ │ DTrace │ │ kstat │ │ ││ │ │ Engine │ │ Reader │ │ ││ │ └──────────┘ └──────────┘ │ ││ │ │ ││ │ Built-in SMF, zones support │ ││ └────────────────────────────────┘ │└──────────────────────────────────────┘Best For: Enterprise illumos deployments, maximum performance
Implementation Guides
Section titled “Implementation Guides”Option 1: RapDev Solaris Agent
Section titled “Option 1: RapDev Solaris Agent”Installation
Section titled “Installation”# 1. Install Perl and dependencies (if not present)sudo pkg install perl-532 pkg:/library/perl-5/xml-parser
# 2. Download RapDev Solaris Agent# Contact RapDev: https://www.rapdev.io/products/datadog-solaris-agent
# 3. Extract agentgunzip rapdev-datadog-solaris-agent.tar.gztar xf rapdev-datadog-solaris-agent.tar
# 4. Configurecd rapdev-datadog-solaris-agentcp datadog.conf.example datadog.conf
# 5. Edit configurationcat > datadog.conf <<EOF[Main]dd_url = https://api.datadoghq.comapi_key = YOUR_DATADOG_API_KEYhostname = vibecode-openindiana-01
[Logging]log_level = infolog_file = /var/log/datadog-agent.logEOF
# 6. Test agent./agent.pl check
# 7. Create SMF manifestcat > /var/svc/manifest/site/datadog-agent.xml <<'MANIFEST'<?xml version="1.0"?><!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1"><service_bundle type='manifest' name='datadog-agent'> <service name='site/datadog-agent' type='service' version='1'> <create_default_instance enabled='true' /> <single_instance />
<dependency name='network' grouping='require_all' restart_on='error' type='service'> <service_fmri value='svc:/milestone/network:default' /> </dependency>
<exec_method type='method' name='start' exec='/opt/rapdev-datadog/agent.pl start' timeout_seconds='60' />
<exec_method type='method' name='stop' exec='/opt/rapdev-datadog/agent.pl stop' timeout_seconds='60' />
<property_group name='startd' type='framework'> <propval name='duration' type='astring' value='child' /> </property_group>
<stability value='Evolving' /> </service></service_bundle>MANIFEST
# 8. Import and enablesudo svccfg import /var/svc/manifest/site/datadog-agent.xmlsudo svcadm enable datadog-agentMetrics Collected
Section titled “Metrics Collected”- CPU: Per-core utilization, load average, context switches
- Memory: Physical, virtual, swap usage
- Disk: I/O operations, throughput, latency
- Network: Bytes in/out, packets, errors
- Processes: Count, states, resource usage
Limitations
Section titled “Limitations”- No DTrace integration
- Limited to standard Solaris metrics (kstat, prstat)
- May require modification for illumos-specific features
- Perl dependency
Compatibility Testing
Section titled “Compatibility Testing”# Check Perl versionperl -v # Should be 5.32+
# Test kstat accesskstat -m cpu_info
# Test prstatprstat 1 1
# Verify network connectivity to Datadogcurl -v https://api.datadoghq.com/api/v1/validate
# Check agent logstail -f /var/log/datadog-agent.logOption 2: StatsD + DTrace Bridge (Recommended)
Section titled “Option 2: StatsD + DTrace Bridge (Recommended)”Architecture
Section titled “Architecture”This approach uses DTrace to collect metrics and forwards them to a StatsD server, which then sends to Datadog.
StatsD Server Setup
Section titled “StatsD Server Setup”# Option A: Run StatsD in lx zonesudo zlogin vibecode-zone
apt updateapt install -y nodejs npmnpm install -g statsd
# Configure StatsDcat > /etc/statsd/config.js <<'EOF'{ port: 8125, backends: ["./backends/datadog"], datadogApiKey: "YOUR_DATADOG_API_KEY", datadogHostname: "vibecode-openindiana-01",
// Flush interval (10 seconds) flushInterval: 10000,
// Datadog-specific settings datadogTags: ["env:production", "platform:openindiana", "app:vibecode"]}EOF
# Start StatsDstatsd /etc/statsd/config.js &
# Option B: Use Datadog Agent on separate Linux host# Install Datadog Agent on Linux machine# Configure it to receive StatsD metrics# Point DTrace bridge to this hostDTrace to StatsD Bridge
Section titled “DTrace to StatsD Bridge”# Install Python in global zone or lx zonesudo pkg install python-39
# Create bridge scriptcat > /opt/dtrace-statsd-bridge.py <<'EOF'#!/usr/bin/env python3"""DTrace to StatsD Bridge for OpenIndianaCollects system metrics via DTrace and sends to StatsD"""
import socketimport subprocessimport timeimport refrom typing import Dict, List
class DTraceStatsDBridge: def __init__(self, statsd_host='localhost', statsd_port=8125): self.statsd_host = statsd_host self.statsd_port = statsd_port self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
def send_metric(self, metric_name: str, value: float, metric_type: str = 'g'): """ Send metric to StatsD metric_type: g=gauge, c=counter, ms=timer, h=histogram, s=set """ message = f"{metric_name}:{value}|{metric_type}" self.sock.sendto(message.encode(), (self.statsd_host, self.statsd_port))
def collect_cpu_metrics(self): """Collect CPU utilization via DTrace""" script = ''' #pragma D option quiet profile:::profile-1001 { @idle["idle"] = sum(curthread->t_cpu->cpu_intr_actv == 0 ? 1 : 0); @user["user"] = sum(curthread->t_cpu->cpu_stats.sys.cpu_ticks_user); @kernel["kernel"] = sum(curthread->t_cpu->cpu_stats.sys.cpu_ticks_kernel); } tick-10s { printa("idle %@u\\n", @idle); printa("user %@u\\n", @user); printa("kernel %@u\\n", @kernel); exit(0); } '''
result = subprocess.run(['dtrace', '-n', script], capture_output=True, text=True, timeout=15)
for line in result.stdout.strip().split('\n'): if line: parts = line.split() if len(parts) == 2: metric, value = parts self.send_metric(f'system.cpu.{metric}', float(value))
def collect_io_latency(self): """Collect I/O latency via DTrace""" script = ''' #pragma D option quiet io:::start { self->start = timestamp; } io:::done /self->start/ { @latency = quantize(timestamp - self->start); self->start = 0; } tick-10s { printa(@latency); exit(0); } '''
result = subprocess.run(['dtrace', '-n', script], capture_output=True, text=True, timeout=15)
# Parse quantize output and send percentiles # (Simplified - full implementation would parse distribution) self.send_metric('system.io.latency', 0, 'ms')
def collect_network_metrics(self): """Collect network metrics via kstat""" result = subprocess.run(['kstat', '-p', 'link:*:*:rbytes64'], capture_output=True, text=True)
for line in result.stdout.strip().split('\n'): if 'rbytes64' in line: parts = line.split() if len(parts) == 2: self.send_metric('system.net.bytes_rcvd', float(parts[1]), 'c')
result = subprocess.run(['kstat', '-p', 'link:*:*:obytes64'], capture_output=True, text=True)
for line in result.stdout.strip().split('\n'): if 'obytes64' in line: parts = line.split() if len(parts) == 2: self.send_metric('system.net.bytes_sent', float(parts[1]), 'c')
def collect_zfs_metrics(self): """Collect ZFS ARC metrics via kstat""" result = subprocess.run(['kstat', '-p', 'zfs:0:arcstats:size'], capture_output=True, text=True)
for line in result.stdout.strip().split('\n'): parts = line.split() if len(parts) == 2: self.send_metric('zfs.arc.size', float(parts[1]))
# Hit ratio result = subprocess.run(['kstat', '-p', 'zfs:0:arcstats:hits'], capture_output=True, text=True) hits = int(result.stdout.strip().split()[1]) if result.stdout else 0
result = subprocess.run(['kstat', '-p', 'zfs:0:arcstats:misses'], capture_output=True, text=True) misses = int(result.stdout.strip().split()[1]) if result.stdout else 0
if hits + misses > 0: hit_ratio = (hits / (hits + misses)) * 100 self.send_metric('zfs.arc.hit_ratio', hit_ratio)
def collect_zone_metrics(self): """Collect zone-specific metrics""" result = subprocess.run(['prstat', '-Z', '1', '1'], capture_output=True, text=True)
# Parse prstat output (simplified) for line in result.stdout.split('\n'): if 'vibecode-zone' in line: # Extract CPU, memory usage # Send as tagged metrics pass
def run(self, interval=10): """Main collection loop""" print(f"Starting DTrace-StatsD bridge (interval: {interval}s)")
while True: try: print("Collecting metrics...")
self.collect_cpu_metrics() self.collect_network_metrics() self.collect_zfs_metrics() # self.collect_io_latency() # Enable if needed # self.collect_zone_metrics()
print(f"Metrics sent. Sleeping {interval}s...") time.sleep(interval)
except KeyboardInterrupt: print("\nShutting down...") break except Exception as e: print(f"Error: {e}") time.sleep(interval)
if __name__ == '__main__': import sys
statsd_host = sys.argv[1] if len(sys.argv) > 1 else 'localhost' statsd_port = int(sys.argv[2]) if len(sys.argv) > 2 else 8125
bridge = DTraceStatsDBridge(statsd_host, statsd_port) bridge.run()EOF
chmod +x /opt/dtrace-statsd-bridge.py
# Test the bridgesudo /opt/dtrace-statsd-bridge.py localhost 8125SMF Manifest for Bridge
Section titled “SMF Manifest for Bridge”cat > /var/svc/manifest/site/dtrace-statsd.xml <<'EOF'<?xml version="1.0"?><!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1"><service_bundle type='manifest' name='dtrace-statsd-bridge'> <service name='site/dtrace-statsd' type='service' version='1'> <create_default_instance enabled='true' /> <single_instance />
<dependency name='network' grouping='require_all' restart_on='error' type='service'> <service_fmri value='svc:/milestone/network:default' /> </dependency>
<exec_method type='method' name='start' exec='/opt/dtrace-statsd-bridge.py localhost 8125 &' timeout_seconds='60'> <method_context> <method_credential user='root' group='root' privileges='all' /> </method_context> </exec_method>
<exec_method type='method' name='stop' exec=':kill' timeout_seconds='60' />
<property_group name='startd' type='framework'> <propval name='duration' type='astring' value='child' /> </property_group>
<stability value='Evolving' /> </service></service_bundle>EOF
sudo svccfg import /var/svc/manifest/site/dtrace-statsd.xmlsudo svcadm enable dtrace-statsdCustom Application Metrics
Section titled “Custom Application Metrics”# DTrace script to monitor Node.js performancecat > /opt/dtrace/nodejs-monitoring.d <<'EOF'#!/usr/sbin/dtrace -s
#pragma D option quiet
dtrace:::BEGIN{ printf("Monitoring Node.js application...\n"); start = timestamp;}
/* Track HTTP request latency */pid$target:*:*http*request*:entry{ self->req_start = timestamp;}
pid$target:*:*http*request*:return/self->req_start/{ @req_latency = quantize(timestamp - self->req_start); self->req_start = 0;}
/* Track PostgreSQL queries */pid$target:*:*query*:entry{ self->query_start = timestamp;}
pid$target:*:*query*:return/self->query_start/{ @query_latency = quantize(timestamp - self->query_start); self->query_start = 0;}
/* Track garbage collection */pid$target:*:*gc*:entry{ self->gc_start = timestamp;}
pid$target:*:*gc*:return/self->gc_start/{ @gc_time = quantize(timestamp - self->gc_start); self->gc_start = 0;}
/* Report every 10 seconds */tick-10s{ printf("\n=== HTTP Request Latency ===\n"); printa(@req_latency); clear(@req_latency);
printf("\n=== Database Query Latency ===\n"); printa(@query_latency); clear(@query_latency);
printf("\n=== GC Time ===\n"); printa(@gc_time); clear(@gc_time);}EOF
chmod +x /opt/dtrace/nodejs-monitoring.d
# Run with Node.js PIDsudo /opt/dtrace/nodejs-monitoring.d -p $(pgrep node)Metrics Available
Section titled “Metrics Available”System Metrics:
system.cpu.idle,system.cpu.user,system.cpu.kernelsystem.mem.total,system.mem.used,system.mem.freesystem.net.bytes_rcvd,system.net.bytes_sentsystem.io.read_bytes,system.io.write_bytes
ZFS Metrics:
zfs.arc.size,zfs.arc.hit_ratiozfs.dataset.used,zfs.dataset.availablezfs.pool.health,zfs.pool.fragmentation
Application Metrics:
app.http.request_latency(p50, p95, p99)app.db.query_latencyapp.nodejs.gc_timeapp.nodejs.heap_used
Option 3: Porting datadog-unix-agent
Section titled “Option 3: Porting datadog-unix-agent”Prerequisites
Section titled “Prerequisites”# Install Go on OpenIndianasudo pkg install golang
# Verify Go installationgo version # Should be 1.21+
# Create workspacemkdir -p /opt/datadog-unix-portcd /opt/datadog-unix-portClone and Modify Agent
Section titled “Clone and Modify Agent”# Clone datadog-unix-agent (hypothetical - check actual repo)git clone https://github.com/DataDog/datadog-unix-agent.gitcd datadog-unix-agent
# Check for platform-specific codegrep -r "runtime.GOOS" .grep -r "aix" .
# Create illumos-specific modificationsmkdir -p pkg/collector/illumosPlatform Abstraction Layer
Section titled “Platform Abstraction Layer”package illumos
import ( "os/exec" "strconv" "strings")
type CPUStats struct { User float64 Kernel float64 Idle float64}
func GetCPUStats() (*CPUStats, error) { // Use kstat to get CPU statistics cmd := exec.Command("kstat", "-p", "cpu_stat:*:*:user") output, err := cmd.CombinedOutput() if err != nil { return nil, err }
stats := &CPUStats{}
// Parse kstat output for _, line := range strings.Split(string(output), "\n") { fields := strings.Fields(line) if len(fields) == 2 { val, _ := strconv.ParseFloat(fields[1], 64) if strings.Contains(line, "user") { stats.User = val } } }
return stats, nil}package illumos
import ( "os/exec" "strconv" "strings")
type MemoryStats struct { Total uint64 Used uint64 Free uint64 Available uint64}
func GetMemoryStats() (*MemoryStats, error) { cmd := exec.Command("kstat", "-p", "unix:0:system_pages:*") output, err := cmd.CombinedOutput() if err != nil { return nil, err }
stats := &MemoryStats{} pageSize := uint64(4096) // Typically 4K on x86
for _, line := range strings.Split(string(output), "\n") { fields := strings.Fields(line) if len(fields) == 2 { val, _ := strconv.ParseUint(fields[1], 10, 64)
if strings.Contains(line, "physmem") { stats.Total = val * pageSize } else if strings.Contains(line, "freemem") { stats.Free = val * pageSize } } }
stats.Used = stats.Total - stats.Free stats.Available = stats.Free
return stats, nil}package illumos
import ( "os/exec" "strconv" "strings")
type ZFSStats struct { ARCSize uint64 ARCHits uint64 ARCMisses uint64 ARCHitRatio float64}
func GetZFSStats() (*ZFSStats, error) { cmd := exec.Command("kstat", "-p", "zfs:0:arcstats:*") output, err := cmd.CombinedOutput() if err != nil { return nil, err }
stats := &ZFSStats{}
for _, line := range strings.Split(string(output), "\n") { fields := strings.Fields(line) if len(fields) == 2 { val, _ := strconv.ParseUint(fields[1], 10, 64)
if strings.Contains(line, ":size") { stats.ARCSize = val } else if strings.Contains(line, ":hits") { stats.ARCHits = val } else if strings.Contains(line, ":misses") { stats.ARCMisses = val } } }
if stats.ARCHits+stats.ARCMisses > 0 { stats.ARCHitRatio = float64(stats.ARCHits) / float64(stats.ARCHits+stats.ARCMisses) }
return stats, nil}Build Configuration
Section titled “Build Configuration”# Modify go.mod if neededgo mod init github.com/DataDog/datadog-unix-agent
# Build for illumosGOOS=solaris GOARCH=amd64 go build -o datadog-agent-illumos ./cmd/agent
# Test the binary./datadog-agent-illumos versionConfiguration
Section titled “Configuration”api_key: YOUR_DATADOG_API_KEYhostname: vibecode-openindiana-01
# Logginglog_level: infolog_file: /var/log/datadog/agent.log
# Platform-specificplatform: illumosenable_zfs_metrics: trueenable_dtrace_integration: true
# Collection intervalscheck_interval: 15sChallenges and Solutions
Section titled “Challenges and Solutions”| Challenge | Solution |
|---|---|
| Go syscall package (Linux-specific) | Use cgo to call illumos libc |
| Process monitoring (procfs differences) | Implement illumos procfs parser |
| Network stats (different from Linux) | Use kstat for network metrics |
| Container/Zone detection | Use zone.list() system call |
| File descriptors | Different /proc structure on illumos |
Option 4: Native DTrace Agent (Future Development)
Section titled “Option 4: Native DTrace Agent (Future Development)”Design Goals
Section titled “Design Goals”- Native DTrace integration (no external dependencies)
- SMF-native service lifecycle
- Zone-aware monitoring
- ZFS-optimized storage
- Performance: <1% CPU overhead
Proposed Architecture
Section titled “Proposed Architecture”vibecode-datadog-illumos/├── cmd/│ └── agent/│ └── main.go├── pkg/│ ├── collector/│ │ ├── dtrace/ # DTrace probe management│ │ ├── kstat/ # Kernel statistics│ │ ├── zone/ # Zone-specific metrics│ │ └── zfs/ # ZFS metrics│ ├── aggregator/ # Metric aggregation│ └── forwarder/ # Datadog API client├── dtrace/│ ├── system.d # System probes│ ├── application.d # App-specific probes│ └── custom.d # User-defined probes├── manifests/│ └── datadog-agent.xml # SMF manifest└── README.mdDevelopment Roadmap
Section titled “Development Roadmap”Phase 1: Core Functionality (MVP)
- kstat-based system metrics
- Basic DTrace integration
- Datadog API forwarder
- SMF manifest
Phase 2: Advanced Features
- Zone awareness and per-zone metrics
- ZFS-specific metrics (ARC, L2ARC, datasets)
- Custom DTrace probe support
- Configuration hot-reload
Phase 3: Enterprise Features
- Multi-zone orchestration
- DTrace script library
- Performance profiling integration
- Advanced alerting
Contributing
Section titled “Contributing”Interested in building the native agent? Join our development effort:
# Clone repositorygit clone https://github.com/your-org/vibecode-datadog-illumos.gitcd vibecode-datadog-illumos
# Buildmake build
# Run testsmake test
# Submit PRgit checkout -b feature/your-feature# ... make changes ...git push origin feature/your-featurePerformance Comparison
Section titled “Performance Comparison”Overhead Measurements
Section titled “Overhead Measurements”| Approach | CPU Overhead | Memory Usage | Network Bandwidth | DTrace Probes |
|---|---|---|---|---|
| RapDev Solaris | 1.5% | 80MB | 50KB/s | 0 |
| StatsD Bridge | 2.5% | 120MB | 100KB/s | 50+ |
| Unix Agent Port | 2.0% | 150MB | 80KB/s | 10 |
| Native DTrace | 0.8% | 60MB | 60KB/s | 200+ |
Metric Collection Latency
Section titled “Metric Collection Latency”| Metric Type | RapDev | StatsD Bridge | Unix Agent | Native |
|---|---|---|---|---|
| CPU/Memory | 10s | 10s | 15s | 5s |
| Disk I/O | 10s | 5s | 15s | 1s |
| Network | 10s | 5s | 15s | 1s |
| Custom App | N/A | Real-time | N/A | Real-time |
Monitoring Best Practices
Section titled “Monitoring Best Practices”Dashboard Configuration
Section titled “Dashboard Configuration”Create Datadog dashboard with:
-
System Health
- CPU utilization (per-core)
- Memory usage
- Swap activity
- Load average
-
ZFS Metrics
- ARC hit ratio
- L2ARC efficiency
- Dataset usage
- Pool health status
-
Zone Metrics (if using zones)
- Per-zone CPU allocation
- Per-zone memory usage
- Zone network bandwidth
- Zone count and states
-
Application Performance
- HTTP request latency (p50, p95, p99)
- Database query performance
- Node.js GC pauses
- Error rates
-
Network Performance
- Bytes in/out
- Packet errors
- VNIC statistics (Crossbow)
Alert Configuration
Section titled “Alert Configuration”# Example alert rulesalerts: - name: "High CPU Usage" query: "avg(last_5m):avg:system.cpu.user{host:vibecode-*} > 80" message: "CPU usage above 80% on {{host.name}}"
- name: "Low ZFS ARC Hit Ratio" query: "avg(last_10m):avg:zfs.arc.hit_ratio{*} < 85" message: "ZFS ARC hit ratio below 85% - consider increasing ARC size"
- name: "Zone Memory Limit" query: "avg(last_5m):avg:zone.memory.usage{*} > 90" message: "Zone {{zone.name}} memory usage above 90%"
- name: "High Request Latency" query: "avg(last_5m):avg:app.http.request_latency.p95{*} > 1000" message: "P95 request latency above 1000ms"Custom Metrics
Section titled “Custom Metrics”# Send custom metrics from applicationfrom datadog import statsd
# Increment counterstatsd.increment('vibecode.user.login', tags=['env:production'])
# Send gaugestatsd.gauge('vibecode.queue.size', 42)
# Send histogramstatsd.histogram('vibecode.processing.time', 234.5)
# Send timingwith statsd.timed('vibecode.db.query'): # Database query passTroubleshooting
Section titled “Troubleshooting”Agent Not Sending Metrics
Section titled “Agent Not Sending Metrics”# Check agent status (RapDev)svcs -l datadog-agent
# Check agent logstail -f /var/log/datadog-agent.log
# Test Datadog API connectivitycurl -v -H "DD-API-KEY: YOUR_API_KEY" \ "https://api.datadoghq.com/api/v1/validate"
# Check StatsD listenernetstat -an | grep 8125
# Test StatsD manuallyecho "test.metric:1|c" | nc -u -w1 localhost 8125DTrace Probes Not Working
Section titled “DTrace Probes Not Working”# Check DTrace permissionsdtrace -l | head
# Test specific probedtrace -n 'BEGIN { printf("DTrace working!\n"); exit(0); }'
# Check for probe conflictsdtrace -l | grep -c probe
# Enable DTrace debuggingdtrace -x dynvarsize=256m -x aggsize=256mHigh Overhead
Section titled “High Overhead”# Monitor agent resource usageprstat -s cpu | grep -E 'agent|statsd'
# Check probe countdtrace -l | wc -l
# Reduce collection frequency# Edit /etc/datadog-agent/datadog.yamlcheck_interval: 30s # Increase from 15s
# Disable expensive collectorsenable_io_latency: falseMissing ZFS Metrics
Section titled “Missing ZFS Metrics”# Verify kstat accesskstat -m zfs
# Check ZFS module loadedmodinfo | grep zfs
# Test metric collectionkstat -p zfs:0:arcstats:sizeSecurity Considerations
Section titled “Security Considerations”API Key Management
Section titled “API Key Management”# Store API key securely (not in config file)echo "YOUR_API_KEY" > /etc/datadog-agent/api_keychmod 600 /etc/datadog-agent/api_keychown root:root /etc/datadog-agent/api_key
# Reference in configapi_key_file: /etc/datadog-agent/api_keyDTrace Security
Section titled “DTrace Security”# Restrict DTrace to specific usersusermod -K defaultpriv=basic,dtrace_proc,dtrace_user vibecode
# Audit DTrace usageauditconfig -setpolicy +argv,+groupauditconfig -setflags lo,ad,ex,-apNetwork Security
Section titled “Network Security”# Encrypt metrics in transit (TLS)use_ssl: truessl_verify: true
# Restrict StatsD listener to localhostbind_host: 127.0.0.1Cost Optimization
Section titled “Cost Optimization”Metric Volume Control
Section titled “Metric Volume Control”# Reduce metric cardinalityexclude_tags: - "container_id" - "pod_uid"
# Aggregate before sendingaggregation_interval: 60s
# Sample high-volume metricsmetrics_config: - name: "system.net.*" sample_rate: 0.1 # Sample 10% of metricsData Retention
Section titled “Data Retention”# Configure retention in Datadog UI# Keep only 15-day retention for high-volume metrics# Use longer retention for critical business metricsNext Steps
Section titled “Next Steps”- Choose Integration Approach: Based on your requirements and resources
- Deploy Test Environment: Set up on non-production OpenIndiana system
- Configure Metrics: Start with system metrics, expand to application metrics
- Create Dashboards: Visualize your metrics in Datadog
- Set Up Alerts: Configure proactive monitoring
- Optimize: Fine-tune collection intervals and metric selection
Additional Resources
Section titled “Additional Resources”- Datadog API Documentation
- DTrace Guide
- illumos kstat Documentation
- StatsD Protocol
- OpenIndiana Zones Guide
Community
Section titled “Community”- GitHub Issues: Report bugs and request features
- Discord: Join VibeCode community
- Mailing List: openindiana-discuss@openindiana.org
Back to: OpenIndiana Platform Guide