Info

License: Free - No license required. Available in all tiers.

CPU Probe

The CPU probe monitors processor performance across all major operating systems, providing comprehensive metrics for usage, load, and system-level CPU statistics.

Quick Start

Basic Configuration

probes:
  - name: cpu
    type: cpu
    params:
      interval: 30  # Collection interval in seconds (default: 30)

Minimal Configuration

probes:
  - name: cpu
    type: cpu
    params: {}

The CPU probe requires no mandatory parameters and works out-of-the-box with default settings.

Supported Platforms

  • Windows: Windows Server 2012+ / Windows 10+
  • Linux: All modern distributions (Ubuntu, RHEL, CentOS, Debian, etc.)
  • macOS: macOS 10.13+ (with graceful degradation)
  • BSD: FreeBSD, OpenBSD, NetBSD

Platform-specific metrics are automatically detected and collected based on the operating system.

macOS Platform Notes

On macOS, the gopsutil library has limited support for detailed CPU time metrics (cpu.Times()). The CPU probe implements graceful degradation:

  • If detailed CPU times are unavailable: the probe continues with load average metrics (cpu_load1, cpu_load5, cpu_load15)
  • Always available: CPU usage percentage (cpu_usage_total, cpu_core_usage)
  • Behavior: the probe logs a warning for each unavailable metric but remains active

This ensures the probe stays functional even when platform limitations exist, providing at minimum load average and usage percentage metrics.
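The fallback behavior described above can be sketched roughly as follows. This is an illustration in Python, not the agent's actual implementation: `collect_cpu_times` and `collect` are hypothetical names, and only the metric keys come from this page.

```python
import logging
import os

log = logging.getLogger("probe.cpu")


def collect_cpu_times():
    """Hypothetical stand-in for a detailed CPU-times collector.

    Raises NotImplementedError to simulate a platform where the
    detailed counters are unavailable.
    """
    raise NotImplementedError("detailed CPU times not supported here")


def collect(times_collector=collect_cpu_times):
    """Collect whatever is available; never fail because one source is missing."""
    metrics = {}
    try:
        metrics.update(times_collector())
    except (NotImplementedError, OSError) as exc:
        # Degrade: log a warning and carry on with the remaining sources.
        log.warning("detailed CPU times unavailable: %s", exc)
    try:
        load1, load5, load15 = os.getloadavg()  # exists on Unix-like systems only
        metrics.update(cpu_load1=load1, cpu_load5=load5, cpu_load15=load15)
    except (AttributeError, OSError) as exc:
        log.warning("load averages unavailable: %s", exc)
    return metrics
```

Calling `collect()` with the default collector returns only the load averages (where the platform provides them), while passing a working collector merges its values in.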

Key Metrics Summary

Cross-Platform Metrics

Metric | Description | Available On
cpu_usage_total | Total CPU usage percentage (0-100%) | All platforms
cpu_core_usage | Per-core CPU usage percentage | All platforms
cpu_user | User-mode CPU time | All platforms
cpu_system | System-mode CPU time | All platforms
cpu_irq | Hardware interrupt time | All platforms
cpu_softirq | Software interrupt time | All platforms

Unix/Linux/macOS Specific

Metric | Description
cpu_idle | CPU idle time (seconds)
cpu_nice | CPU nice priority time (seconds)
cpu_iowait | CPU I/O wait time (seconds)
cpu_steal | CPU steal time for VMs (seconds)
cpu_load1 | Load average (1 minute)
cpu_load5 | Load average (5 minutes)
cpu_load15 | Load average (15 minutes)
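The time metrics above are reported in seconds. Assuming they are cumulative counters (as the per-mode CPU times in /proc/stat are), a percentage for a given period comes from differencing two samples. The sketch below shows that arithmetic; the function name and the sample values are made up for illustration, and the result is only meaningful if the supplied counters together cover all CPU time.

```python
def cpu_time_shares(prev, curr):
    """Return each counter's share (0-100) of the CPU time elapsed between two samples.

    prev and curr map metric names to cumulative seconds (an assumption).
    """
    deltas = {name: curr[name] - prev[name] for name in curr if name in prev}
    total = sum(deltas.values())
    if total <= 0:
        # No time elapsed (or a counter reset): report zeros rather than divide by zero.
        return {name: 0.0 for name in deltas}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


# Illustrative samples taken one collection interval apart.
prev = {"cpu_user": 100.0, "cpu_system": 50.0, "cpu_idle": 800.0, "cpu_iowait": 50.0}
curr = {"cpu_user": 106.0, "cpu_system": 52.0, "cpu_idle": 820.0, "cpu_iowait": 52.0}
shares = cpu_time_shares(prev, curr)
print(round(shares["cpu_iowait"], 1))  # 6.7
```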

Windows Specific

Metric | Description
cpu_dpc_rate | Deferred Procedure Calls per second
cpu_dpc_queued | DPCs queued per second
cpu_interrupts | Hardware interrupts per second
cpu_queue_length | Processor queue length

Configuration Parameters

Parameter | Type | Default | Description
interval | integer | 30 | Collection interval in seconds
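As a rough illustration of how the documented default applies, the following Python sketch resolves `interval` from a probe's `params` mapping. `resolve_interval` is a hypothetical helper, and the positivity check is an assumption, since this page states no bounds for the parameter.

```python
DEFAULT_INTERVAL = 30  # seconds, per the parameter table


def resolve_interval(params):
    """Return the collection interval in seconds, applying the documented default."""
    value = (params or {}).get("interval", DEFAULT_INTERVAL)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"interval must be an integer, got {value!r}")
    if value <= 0:
        # Assumption: the page does not state bounds; a non-positive interval is treated as invalid.
        raise ValueError("interval must be positive")
    return value
```

With this helper, `params: {}` resolves to 30 and `params: {interval: 10}` resolves to 10.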

Example Configurations

High-frequency monitoring (every 10 seconds):

probes:
  - name: cpu
    type: cpu
    params:
      interval: 10

Standard monitoring (every minute):

probes:
  - name: cpu
    type: cpu
    params:
      interval: 60

Monitoring Tool Integration

PRTG Network Monitor

Access CPU metrics in PRTG JSON format:

# All CPU metrics
curl http://localhost:8080/api/{agentkey}/prtg/metrics

# Configure PRTG HTTP Advanced Sensor:
# - URL: http://agent-host:8080/api/{agentkey}/prtg/metrics
# - Method: POST
# - Request body: {"probe": "cpu"}

PRTG Channels Available:

  • CPU Total Usage (%)
  • CPU Core 0-N Usage (%)
  • CPU User Time (% or seconds)
  • CPU System Time (% or seconds)
  • CPU Load Average (Linux/Unix)
  • DPC Rate & Interrupts (Windows)
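For scripted checks outside PRTG, the channels can be read from the JSON response. The sketch below assumes the endpoint returns the standard PRTG custom-sensor layout (a top-level `prtg` object with a `result` list of `channel`/`value` entries); the sample payload is invented for illustration and the agent's actual response may contain different fields.

```python
import json


def prtg_channels(payload):
    """Map channel names to numeric values from a PRTG-style JSON document."""
    doc = json.loads(payload)
    return {entry["channel"]: float(entry["value"]) for entry in doc["prtg"]["result"]}


# Invented sample in the PRTG custom-sensor layout (assumption, not captured agent output).
sample = json.dumps({
    "prtg": {
        "result": [
            {"channel": "CPU Total Usage", "value": 42.5, "unit": "Percent", "float": 1},
            {"channel": "CPU Core 0 Usage", "value": 40.0, "unit": "Percent", "float": 1},
        ]
    }
})

channels = prtg_channels(sample)
print(channels["CPU Total Usage"])  # 42.5
```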

Nagios

Access CPU metrics in Nagios format:

# All CPU metrics with performance data
curl "http://localhost:8080/api/{agentkey}/nagios/metrics?probe=cpu"

# Example output:
# OK - CPU monitoring active | cpu_usage_total=42.5%;80;90 cpu_load1=1.23;;;

Nagios Performance Data:

  • cpu_usage_total: Total CPU usage, with an 80% warning and a 90% critical threshold
  • cpu_load1, cpu_load5, cpu_load15: Load averages (Unix)
  • cpu_queue_length: Processor queue (Windows)
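The example output follows the usual Nagios plugin convention: status text, a `|` separator, then space-separated `label=value[UOM];warn;crit` items. A minimal Python sketch for pulling the values out of such a line is shown here. It handles only plain numeric thresholds and unquoted labels, and `parse_perfdata` is a hypothetical helper, not part of the agent.

```python
import re

# Numeric value followed by an optional unit of measure, e.g. "42.5%".
VALUE_RE = re.compile(r"^(?P<value>-?\d+(?:\.\d+)?)(?P<uom>[^\d;]*)$")


def _to_float(text):
    """Convert a threshold field to float; empty fields become None."""
    return float(text) if text else None


def parse_perfdata(line):
    """Parse 'STATUS - text | perfdata' into {label: {value, uom, warn, crit}}."""
    _, _, perf = line.partition("|")
    metrics = {}
    for item in perf.split():
        label, _, rest = item.partition("=")
        fields = rest.split(";")
        match = VALUE_RE.match(fields[0])
        if not match:
            continue  # skip items that are not simple numeric perfdata
        fields += [""] * (3 - len(fields))  # pad missing warn/crit fields
        metrics[label] = {
            "value": float(match.group("value")),
            "uom": match.group("uom"),
            "warn": _to_float(fields[1]),
            "crit": _to_float(fields[2]),
        }
    return metrics


line = "OK - CPU monitoring active | cpu_usage_total=42.5%;80;90 cpu_load1=1.23;;;"
parsed = parse_perfdata(line)
print(parsed["cpu_usage_total"]["warn"])  # 80.0
```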

Web Interface

View CPU metrics in the built-in dashboard:

http://localhost:8080/web/{agentkey}/dashboard

Features:

  • Real-time CPU usage visualization
  • Per-core CPU usage breakdown
  • Load average trends (Unix/Linux)
  • System-wide CPU statistics

Use Cases

Performance Monitoring

Monitor CPU usage to identify:

  • High CPU consumers
  • CPU bottlenecks
  • Per-core imbalances
  • System vs. user time distribution

Capacity Planning

Track CPU trends over time:

  • Peak usage patterns
  • Average load levels
  • Core utilization distribution
  • Growth trends

VM Performance Analysis

Monitor virtualized environments:

  • CPU steal time (hypervisor overhead)
  • Queue length (scheduling delays)
  • Per-core allocation effectiveness

Troubleshooting

Diagnose system issues:

  • High I/O wait (storage bottleneck)
  • Excessive interrupts (hardware issues)
  • High DPC rate (Windows driver issues)
  • Load average spikes (Unix/Linux)

Troubleshooting

No Metrics Collected

Check probe status:

# View agent logs with CPU probe debugging
./agent run --verbose --debug-modules probe.cpu

Verify probe is enabled:

# Check configuration
grep -A5 "name: cpu" agent-config.yaml

Windows: PDH Counter Errors

Symptom: Error messages about Performance Data Helper (PDH) counters

Solution:

  1. Verify that the Performance Counter service is running:

    Get-Service | Where-Object {$_.Name -eq "PerfHost"}

  2. Rebuild Performance Counters:

    lodctr /R

  3. Check Windows Event Log for PDH errors

Unix/Linux: Permission Denied

Symptom: Cannot read /proc/stat or system files

Solution: Run the agent with appropriate permissions:

# Option 1: Run as root
sudo ./agent run

# Option 2: Grant capabilities (Linux)
sudo setcap cap_sys_ptrace=eip ./agent

High CPU Usage from Agent

Symptom: Agent itself consuming significant CPU

Solution:

  1. Increase collection interval:

    - name: cpu
      type: cpu
      params:
        interval: 60  # Collect every minute instead of every 30 seconds

  2. Check for other resource-intensive probes

  3. Review system load and available resources

Per-Core Metrics Missing

Windows: Ensure all CPU cores are enabled in BIOS/firmware

Unix/Linux: Verify /proc/cpuinfo shows all cores:

grep processor /proc/cpuinfo
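
The same check can be scripted. The following Python sketch counts the `processor` entries in /proc/cpuinfo-style text and, when run directly on Linux, compares that with what the operating system reports. `count_cpuinfo_processors` is a hypothetical helper name, and the field layout assumed here is the common `processor : N` form, which not every architecture uses.

```python
import os


def count_cpuinfo_processors(cpuinfo_text):
    """Count 'processor : N' entries in /proc/cpuinfo-style text."""
    return sum(
        1
        for line in cpuinfo_text.splitlines()
        if line.split(":", 1)[0].strip() == "processor"
    )


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as fh:
        listed = count_cpuinfo_processors(fh.read())
    # A mismatch suggests offline or hidden cores worth investigating.
    print(f"/proc/cpuinfo lists {listed} processors; os.cpu_count() reports {os.cpu_count()}")
```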

Performance Considerations

Collection Overhead

The CPU probe has minimal overhead:

  • Windows: ~10ms per collection (PDH counters)
  • Unix/Linux: ~50ms per collection (gopsutil library)
  • macOS: ~30ms per collection (system calls)

Memory Usage

Typical memory footprint per collection:

  • Base probe: ~500 KB
  • Per-core metrics: ~50 KB per core
  • Example: 16-core system = ~1.3 MB total
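The 16-core figure follows from the two approximate numbers above (500 KB + 16 × 50 KB = 1300 KB). A small Python sketch of the same estimate for other core counts, using only those rough values, could look like this. The function name is illustrative.

```python
BASE_KB = 500      # approximate base footprint quoted above
PER_CORE_KB = 50   # approximate per-core footprint quoted above


def estimated_footprint_kb(cores):
    """Rough per-collection memory estimate in KB for a given core count."""
    return BASE_KB + PER_CORE_KB * cores


print(estimated_footprint_kb(16) / 1000)  # 1.3 (MB)
```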

Recommended collection intervals:

Use Case | Interval | Reason
Real-time monitoring | 10-15s | Catch short-lived spikes
Standard monitoring | 30-60s | Balance accuracy and overhead
Long-term trending | 120-300s | Reduce storage and overhead
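To make the trade-off concrete, the sketch below computes how many samples per day each interval produces and what fraction of wall-clock time is spent collecting. It assumes the ~50ms Unix/Linux figure quoted earlier is the duration of one collection; the helper names are illustrative.

```python
SECONDS_PER_DAY = 86_400


def samples_per_day(interval_s):
    """Number of collections in 24 hours at the given interval."""
    return SECONDS_PER_DAY // interval_s


def collection_duty_cycle(overhead_ms, interval_s):
    """Fraction of wall-clock time (0.0-1.0) spent collecting."""
    return (overhead_ms / 1000.0) / interval_s


for interval in (10, 30, 60, 300):
    share = collection_duty_cycle(50, interval)  # 50 ms: the approximate Unix/Linux figure
    print(f"{interval:>3}s: {samples_per_day(interval)} samples/day, {share:.4%} of time collecting")
```

At a 10-second interval this works out to 8640 samples per day and about 0.5% of the time spent collecting, versus 288 samples per day at 300 seconds.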

Advanced Configuration

Multi-Instance Monitoring

Run several CPU probe instances on the same system, each with its own collection interval:

probes:
  - name: cpu_realtime
    type: cpu
    params:
      interval: 10

  - name: cpu_trending
    type: cpu
    params:
      interval: 300

Note: This will create duplicate metrics. Use unique probe names for different collection intervals.
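A configuration can be checked for name clashes before it is deployed. The following Python sketch works on the configuration after it has been loaded into a dictionary (for example with a YAML parser of your choice); `duplicate_probe_names` is a hypothetical helper, not an agent feature.

```python
from collections import Counter


def duplicate_probe_names(config):
    """Return the sorted list of probe names that appear more than once."""
    names = [probe.get("name", "") for probe in config.get("probes", [])]
    return sorted(name for name, count in Counter(names).items() if count > 1)


# The multi-instance example above, expressed as an already-parsed dictionary.
config = {
    "probes": [
        {"name": "cpu_realtime", "type": "cpu", "params": {"interval": 10}},
        {"name": "cpu_trending", "type": "cpu", "params": {"interval": 300}},
    ]
}

print(duplicate_probe_names(config))  # []
```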

Integration with Other Probes

Correlate CPU metrics with other system metrics:

probes:
  - name: cpu
    type: cpu
    params:
      interval: 30

  - name: memory
    type: memory
    params:
      interval: 30

  - name: logicaldisk
    type: logicaldisk
    params:
      interval: 60

This provides comprehensive system monitoring with compatible collection intervals: 60 is a multiple of 30, so each disk sample lines up with a CPU and memory sample.

Authentication

The CPU probe requires no authentication as it collects local system metrics only.

Requirements

Windows

  • Windows Server 2012+ or Windows 10+
  • Performance Counter service enabled
  • No special permissions required (runs as service account)

Linux/Unix/macOS

  • Read access to /proc/stat (Linux)
  • Read access to /proc/loadavg (Linux)
  • System information APIs (macOS, BSD)
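
On Linux, the read-access requirements can be verified for the account that runs the agent with a short script such as the sketch below. The two paths come from the list above; `check_readable` is a hypothetical helper, and the script must be run as the same user as the agent for the result to be meaningful.

```python
import os

# Linux paths named in the requirements list above.
REQUIRED_FILES = ("/proc/stat", "/proc/loadavg")


def check_readable(paths=REQUIRED_FILES):
    """Map each path to True if the current user can read it, else False."""
    return {path: os.access(path, os.R_OK) for path in paths}


if __name__ == "__main__":
    for path, ok in check_readable().items():
        print(f"{path}: {'readable' if ok else 'NOT readable'}")
```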

Network

  • No network access required (local metrics only)
  • HTTP strategy required for remote access to metrics