CBIT Test Reference

Complete reference for all 24 Continuous Built-In Tests (CBIT).

CBITs run periodically during system operation to monitor health, detect degradation, and raise alerts on emerging issues. This enables proactive maintenance and helps prevent failures.

System Tests

cbit_bsp_version

Purpose: Continuously monitor BSP version information consistency.

What it monitors: - BSP release file integrity - Version information stability - Unexpected changes to BSP metadata

Configuration:

[cbit_bsp_version]
enabled = true
expected_version = "1.0.0"
expected_model = "DevBoard"
interval_secs = 3600  # Check hourly

Monitoring behavior: - Runs every interval_secs - Logs version drift - Detects tampering with BSP files

Alert conditions: - BSP file disappears - Version changes unexpectedly - Model information modified


cbit_checksum

Purpose: Continuously verify file integrity via checksums.

What it monitors: - Critical file modification - Unauthorized changes - File corruption

Configuration:

[cbit_checksum]
enabled = true

[[file]]
path = "/etc/fstab"
checksum = "38f46022c28fe35e"

[[file]]
path = "/boot/config.txt"
checksum = "7f2e91ac44d8b9c1"

Monitoring behavior: - Recalculates checksums periodically - Compares to baseline - Logs all mismatches

Alert conditions: - Any file checksum differs from expected - File no longer accessible - Permissions prevent reading

Maintenance: - Regenerate checksums after legitimate updates - Use bit-learn after system maintenance
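
The baseline comparison described above can be sketched in Python. This is an illustration under stated assumptions, not the tool's implementation: this reference does not say which hash cbit_checksum uses, so SHA-256 truncated to 16 hex characters is assumed only because it matches the length of the sample values, and file_checksum and verify_files are hypothetical names.

```python
import hashlib


def file_checksum(path, length=16):
    """Return the first `length` hex digits of the file's SHA-256 digest.

    Assumption: truncated SHA-256; the algorithm really used by
    cbit_checksum is not documented in this reference.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()[:length]


def verify_files(baseline):
    """Check every entry of `baseline` (path -> expected checksum).

    Returns a dict mapping each path to "ok", "mismatch", or "unreadable".
    """
    results = {}
    for path, expected in baseline.items():
        try:
            actual = file_checksum(path, len(expected))
        except OSError:
            # Covers missing files and permission errors alike.
            results[path] = "unreadable"
            continue
        results[path] = "ok" if actual == expected else "mismatch"
    return results
```

The three result values map onto the alert conditions listed above: a differing checksum, a file that is no longer accessible, and a file that cannot be read.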


cbit_dmesg

Purpose: Continuously monitor kernel messages for errors.

What it monitors: - New kernel errors/warnings - Driver failures - Hardware issues appearing during runtime

Configuration:

[cbit_dmesg]
enabled = true
error_patterns = ["error", "fail", "critical", "panic", "oops", "bug"]
warning_patterns = ["warn", "warning"]
ignore_patterns = ["acpi PNP0C14:01"]  # Benign messages
check_interval_secs = 300

Monitoring behavior: - Scans new dmesg entries since last check - Filters against patterns - Tracks error frequency

Alert conditions: - Critical error patterns detected - Repeated warnings - New driver failures

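Common issues: - May require elevated privileges (for example when kernel.dmesg_restrict is enabled) - Log rotation or kernel ring buffer wraparound between checks can cause missed messages - Noisy hardware generates false positives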
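
The pattern filtering can be illustrated with a short Python sketch. It assumes that patterns are case-insensitive substrings and that ignore_patterns take precedence over the other two lists; the tool's actual matching rules are not specified here, and classify_line and scan_new_lines are hypothetical names.

```python
def classify_line(line, error_patterns, warning_patterns, ignore_patterns):
    """Classify one kernel log line as "ignore", "error", "warning", or None.

    Assumption: case-insensitive substring matching, ignore list first.
    """
    lowered = line.lower()
    if any(p.lower() in lowered for p in ignore_patterns):
        return "ignore"
    if any(p.lower() in lowered for p in error_patterns):
        return "error"
    if any(p.lower() in lowered for p in warning_patterns):
        return "warning"
    return None


def scan_new_lines(lines, last_seen, **patterns):
    """Classify only the lines after index `last_seen`.

    Returns (counts, new_last_seen) so the caller can resume next time.
    """
    counts = {"error": 0, "warning": 0}
    for line in lines[last_seen:]:
        kind = classify_line(line, **patterns)
        if kind in counts:
            counts[kind] += 1
    return counts, len(lines)
```

Under the substring assumption, short patterns such as "bug" also match words like "debug", which is one way noisy output turns into false positives; ignore_patterns is the place to suppress those.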


Hardware Tests

cbit_can

Purpose: Monitor CAN bus interface health continuously.

What it monitors: - Interface availability - Link state changes - Interface statistics (errors, drops)

Configuration:

[cbit_can]
enabled = true

[[can]]
interface = "can0"
check_interval_secs = 60

[[can]]
interface = "can1"
check_interval_secs = 60

Monitoring behavior: - Verifies interface exists - Checks operational state - Monitors error counters

Alert conditions: - Interface disappears - State changes unexpectedly - Error rate exceeds threshold

Best practices: - Monitor during active CAN traffic - Correlate with application logs - Check bus termination if errors occur


cbit_cpu_cores

Purpose: Continuously verify CPU core availability.

What it monitors: - Core count stability - Core hotplug events - CPU failures

Configuration:

[cbit_cpu_cores]
enabled = true
expected_cores = 8
alert_on_mismatch = true

Monitoring behavior: - Counts online cores - Detects core changes - Logs core count history

Alert conditions: - Core count drops below expected - Cores go offline unexpectedly

Use cases: - Detect CPU failures - Monitor thermal throttling impacts - Validate power management behavior
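
One way to count online cores on Linux is to read the kernel's online-CPU list from sysfs, as sketched below. Whether cbit_cpu_cores uses this source is an assumption, and parse_cpu_list and check_cores are hypothetical names. The sketch treats "at least expected_cores online" as healthy.

```python
def parse_cpu_list(text):
    """Count the CPUs in a kernel CPU list string such as "0-3,6-7"."""
    count = 0
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            count += int(last) - int(first) + 1
        else:
            count += 1
    return count


def check_cores(expected_cores, online_path="/sys/devices/system/cpu/online"):
    """Return (online_count, ok) using the Linux sysfs online-CPU list."""
    with open(online_path) as handle:
        online = parse_cpu_list(handle.read())
    return online, online >= expected_cores
```

The online_path parameter exists only so the function can be pointed at a test file instead of the real sysfs entry.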


cbit_disk_health

Purpose: Continuously monitor disk SMART health data.

What it monitors: - SMART attribute changes - Pre-fail warnings - Wear indicators (SSD)

Configuration:

[cbit_disk_health]
enabled = true

[[disk]]
device = "/dev/sda"
check_interval_secs = 3600  # Hourly

[[disk]]
device = "/dev/nvme0n1"
check_interval_secs = 3600

Monitoring behavior: - Reads SMART attributes - Tracks attribute trends - Detects degradation

Alert conditions: - SMART status changes to FAILING - Reallocated sector count increases - Temperature exceeds threshold - Wear leveling concerns (SSD)

Best practices: - Don't check too frequently (polling can wake idle disks and adds I/O overhead) - Monitor trends over time - Act on pre-fail warnings immediately


cbit_ethernet

Purpose: Monitor network interface health and status.

What it monitors: - Link state (up/down) - Link speed changes - Interface errors and drops

Configuration:

[cbit_ethernet]
enabled = true

[[interface]]
name = "eth0"
expected_speed = 1000
expected_state = "up"
check_interval_secs = 60

[[interface]]
name = "eth1"
expected_speed = 1000
expected_state = "up"
check_interval_secs = 60

Monitoring behavior: - Checks interface state - Validates speed negotiation - Monitors statistics

Alert conditions: - Link goes down unexpectedly - Speed negotiates lower than expected - Error/drop counters increase rapidly

Troubleshooting: - Check physical cable - Verify switch port configuration - Monitor for duplex mismatches
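
The state and speed checks can be sketched against the Linux sysfs files operstate and speed under /sys/class/net. Whether cbit_ethernet reads these files is an assumption; read_link and check_link are hypothetical names, and sysfs_root is a parameter only to make the sketch testable.

```python
from pathlib import Path


def read_link(name, sysfs_root="/sys/class/net"):
    """Return (operstate, speed_mbps) for an interface from Linux sysfs.

    speed_mbps is None when the speed file is missing, unreadable, or
    reports a non-positive value (the kernel uses -1 for unknown speed).
    """
    base = Path(sysfs_root) / name
    state = (base / "operstate").read_text().strip()
    try:
        speed = int((base / "speed").read_text().strip())
    except (OSError, ValueError):
        speed = None
    if speed is not None and speed <= 0:
        speed = None
    return state, speed


def check_link(name, expected_state, expected_speed, sysfs_root="/sys/class/net"):
    """Return a list of human-readable problems; empty means healthy."""
    try:
        state, speed = read_link(name, sysfs_root)
    except OSError:
        return [f"{name}: interface not found"]
    problems = []
    if state != expected_state:
        problems.append(f"{name}: state is {state}, expected {expected_state}")
    if speed is not None and speed < expected_speed:
        problems.append(f"{name}: speed is {speed} Mb/s, expected {expected_speed}")
    return problems
```

A call such as check_link("eth0", "up", 1000) mirrors the first [[interface]] entry in the configuration above.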


cbit_gpio

Purpose: Continuously monitor GPIO pin states.

What it monitors: - Pin state changes - Unexpected transitions - Hardware signal integrity

Configuration:

[cbit_gpio]
enabled = true

[[gpio]]
pin = 17
expected_state = "high"
check_interval_secs = 5

[[gpio]]
pin = 27
expected_state = "low"
check_interval_secs = 5

Monitoring behavior: - Reads pin states - Detects transitions - Logs state history

Alert conditions: - Pin state differs from expected - Rapid state changes (bouncing) - Pin becomes inaccessible

Use cases: - Monitor hardware interlocks - Detect sensor failures - Validate control signals


cbit_gpu_loading

Purpose: Monitor GPU utilization continuously.

What it monitors: - GPU usage percentage - Sustained high utilization - GPU availability

Configuration:

[cbit_gpu_loading]
enabled = true
threshold = 95
sustained_threshold_secs = 300  # Alert if >95% for 5min
check_interval_secs = 30

Monitoring behavior: - Queries GPU utilization - Tracks utilization history - Detects sustained overload

Alert conditions: - Utilization exceeds threshold - Sustained high usage - GPU becomes unresponsive

Performance tuning: - Adjust threshold for workload - Monitor temperature correlation - Check for runaway processes
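
The interaction between threshold and sustained_threshold_secs can be sketched as a small state tracker. This is a minimal sketch that assumes the alert fires only once utilization has stayed above the threshold for the whole sustained period; SustainedThreshold is a hypothetical name, not part of the tool.

```python
class SustainedThreshold:
    """Track how long a value has stayed above a threshold."""

    def __init__(self, threshold, sustained_secs):
        self.threshold = threshold
        self.sustained_secs = sustained_secs
        self._above_since = None  # time of the first sample above threshold

    def update(self, value, now):
        """Feed one sample taken at time `now` (seconds); True means alert."""
        if value <= self.threshold:
            # Any sample at or below the threshold resets the timer.
            self._above_since = None
            return False
        if self._above_since is None:
            self._above_since = now
        return now - self._above_since >= self.sustained_secs
```

With threshold = 95 and sustained_threshold_secs = 300, samples of 96% at 0 s and 97% at 150 s do not alert, while a third sample above 95% at 300 s does.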


cbit_temperature

Purpose: Continuously monitor system temperatures.

What it monitors: - CPU/GPU/disk temperatures - Thermal zone trends - Cooling system effectiveness

Configuration:

[cbit_temperature]
enabled = true

[[thermal_zone]]
label = "Core 0"
threshold = 85.0
critical_threshold = 95.0
check_interval_secs = 60

[[thermal_zone]]
label = "GPU"
threshold = 80.0
critical_threshold = 90.0
check_interval_secs = 60

Monitoring behavior: - Reads thermal sensors - Tracks temperature trends - Calculates moving averages

Alert conditions: - Temperature exceeds threshold - Rapid temperature increase - Critical threshold reached

Thermal management: - Monitor ambient temperature impact - Check fan operation - Verify thermal paste/pads - Clean dust from heatsinks
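
The moving average and the two thresholds could combine as in the sketch below. The window size, the decision to compare the average against threshold and the raw sample against critical_threshold, and the name ZoneMonitor are all assumptions made for illustration.

```python
from collections import deque


class ZoneMonitor:
    """Smooth temperature samples and classify them against two limits."""

    def __init__(self, threshold, critical_threshold, window=5):
        self.threshold = threshold
        self.critical_threshold = critical_threshold
        self._samples = deque(maxlen=window)  # keeps only the newest samples

    def add(self, celsius):
        """Record a sample and return (moving_average, level)."""
        self._samples.append(celsius)
        average = sum(self._samples) / len(self._samples)
        # Assumption: the critical limit applies to the raw sample so a
        # sudden spike is not hidden by averaging.
        if celsius >= self.critical_threshold:
            level = "critical"
        elif average >= self.threshold:
            level = "warning"
        else:
            level = "ok"
        return average, level
```

Averaging keeps a single noisy reading from raising a warning, while the raw comparison preserves an immediate reaction at the critical limit.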


Resource Tests

cbit_cpu_usage

Purpose: Monitor CPU utilization continuously.

What it monitors: - Overall CPU usage - Per-core utilization (if configured) - Sustained high load

Configuration:

[cbit_cpu_usage]
enabled = true
threshold = 90
sustained_threshold_secs = 600  # Alert if >90% for 10min
check_interval_secs = 30

Monitoring behavior: - Samples CPU usage - Calculates averages - Detects load spikes

Alert conditions: - Usage exceeds threshold - Sustained high CPU load - Single core bottleneck

Performance analysis: - Identify CPU-bound processes - Check for runaway tasks - Monitor system load average - Correlate with application behavior


cbit_disk_usage

Purpose: Monitor filesystem space utilization.

What it monitors: - Disk space consumption - Growth rate - Capacity planning needs

Configuration:

[cbit_disk_usage]
enabled = true

[[disk]]
disk = "/dev/sda1"
threshold = 80
critical_threshold = 95
check_interval_secs = 300

[[disk]]
disk = "/dev/shm"
threshold = 70
check_interval_secs = 60

Monitoring behavior: - Measures filesystem usage - Tracks usage trends - Predicts time-to-full

Alert conditions: - Usage exceeds threshold - Rapid growth detected - Critical threshold reached

Disk management: - Identify large files/directories - Clean up logs and temporary files - Monitor log rotation - Check for space leaks (for example deleted files still held open by a process)
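
Usage measurement and a simple time-to-full prediction can be sketched as follows. The linear extrapolation between the first and last sample is an assumption; the prediction method used by cbit_disk_usage is not described here. Note that shutil.disk_usage expects a path on a mounted filesystem, not a device node, and that usage_percent and seconds_until_full are hypothetical names.

```python
import shutil


def usage_percent(path):
    """Return the used percentage of the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def seconds_until_full(samples, capacity_bytes):
    """Estimate time-to-full from (timestamp_secs, used_bytes) samples.

    Uses the straight line between the first and last sample. Returns None
    when usage is flat or shrinking, or when there are too few samples.
    """
    if len(samples) < 2:
        return None
    (t0, used0), (t1, used1) = samples[0], samples[-1]
    if t1 <= t0 or used1 <= used0:
        return None
    rate = (used1 - used0) / (t1 - t0)  # bytes per second
    return (capacity_bytes - used1) / rate
```

For example, growth from 100 to 200 bytes over 10 seconds on a 1000-byte volume gives a rate of 10 bytes per second and 80 seconds until full.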


cbit_memory_usage

Purpose: Monitor RAM and swap utilization.

What it monitors: - Physical memory usage - Swap usage - Available memory - Memory pressure

Configuration:

[cbit_memory_usage]
enabled = true
threshold = 90
swap_threshold = 50
min_available_mb = 512
check_interval_secs = 60

Monitoring behavior: - Samples memory statistics - Tracks usage trends - Monitors swap activity

Alert conditions: - Memory usage exceeds threshold - Swap usage increasing - Available memory too low - OOM killer activity

Memory analysis: - Identify memory leaks - Check for runaway processes - Monitor cache vs used memory - Analyze swap usage patterns
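
The three limits in the configuration can be evaluated from /proc/meminfo, as in this sketch. It assumes that usage is computed from MemTotal and MemAvailable (which requires a kernel that exposes MemAvailable) and that /proc/meminfo is the data source; parse_meminfo and check_memory are hypothetical names.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of integer kB values."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def check_memory(info, threshold=90, swap_threshold=50, min_available_mb=512):
    """Return a list of alert strings for a parsed meminfo dict."""
    alerts = []
    total, available = info["MemTotal"], info["MemAvailable"]
    used_pct = 100.0 * (total - available) / total
    if used_pct > threshold:
        alerts.append(f"memory usage {used_pct:.1f}% exceeds {threshold}%")
    if available / 1024 < min_available_mb:
        alerts.append(f"available memory below {min_available_mb} MB")
    swap_total = info.get("SwapTotal", 0)
    if swap_total:  # skip the swap check on systems without swap
        swap_pct = 100.0 * (swap_total - info.get("SwapFree", 0)) / swap_total
        if swap_pct > swap_threshold:
            alerts.append(f"swap usage {swap_pct:.1f}% exceeds {swap_threshold}%")
    return alerts
```

Using MemAvailable rather than MemFree keeps reclaimable page cache from being counted as used memory, which relates to the "cache vs used memory" point above.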


cbit_power_consumption

Purpose: Monitor system power consumption and voltage.

What it monitors: - Power draw (watts) - Voltage levels - Power trends - Efficiency

Configuration:

[cbit_power_consumption]
enabled = true
min_voltage_mv = 3000
max_voltage_mv = 20000
max_power_uw = 150000000  # 150W
check_interval_secs = 60

Monitoring behavior: - Reads power sensors - Calculates consumption - Tracks power trends

Alert conditions: - Voltage out of range - Power exceeds limit - Unexpected power spikes - Voltage instability

Power management: - Correlate with workload - Check PSU capacity - Monitor efficiency - Detect failing PSU
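
The limits are expressed in millivolts and microwatts, so a range check involves some unit arithmetic. The sketch below shows one reading being compared to the configured values; check_power is a hypothetical name and the alert wording is illustrative only.

```python
def check_power(voltage_mv, power_uw, min_voltage_mv=3000,
                max_voltage_mv=20000, max_power_uw=150_000_000):
    """Return alert strings for one voltage (mV) and power (uW) reading."""
    alerts = []
    if not min_voltage_mv <= voltage_mv <= max_voltage_mv:
        # 1 V = 1000 mV
        alerts.append(f"voltage {voltage_mv / 1000:.2f} V out of range")
    if power_uw > max_power_uw:
        # 1 W = 1,000,000 uW, so 150000000 uW is 150 W
        alerts.append(f"power {power_uw / 1_000_000:.1f} W exceeds "
                      f"{max_power_uw / 1_000_000:.1f} W")
    return alerts
```

A reading of 12000 mV at 90000000 uW (12 V, 90 W) produces no alerts with the defaults above.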


Security Tests

cbit_ethernet_status

Purpose: Monitor network connectivity and reachability.

What it monitors: - Interface operational status - Network connectivity - Link quality metrics

Configuration:

[cbit_ethernet_status]
enabled = true

[[interface]]
name = "eth0"
check_connectivity = true
ping_target = "8.8.8.8"
check_interval_secs = 60

Monitoring behavior: - Checks interface status - Tests connectivity (optional) - Monitors link statistics

Alert conditions: - Interface down - Connectivity lost - Excessive packet loss - High latency


cbit_firewall_configuration

Purpose: Continuously verify firewall configuration.

What it monitors: - Firewall enabled state - Rule count/integrity - Configuration changes

Configuration:

[cbit_firewall_configuration]
enabled = true
expected_firewall = "ufw"
expected_state = "active"
check_interval_secs = 300

Monitoring behavior: - Checks firewall status - Validates configuration - Detects rule changes

Alert conditions: - Firewall disabled - Unexpected rule changes - Configuration drift


cbit_pci_whitelist

Purpose: Continuously monitor PCI device changes.

What it monitors: - PCI device hotplug - Unauthorized hardware - Device removal

Configuration:

[cbit_pci_whitelist]
enabled = true
check_interval_secs = 60

[[device]]
vendor_id = "8086"
device_id = "15d7"
manufacturer = "Intel Corporation"
device_description = "Ethernet Connection"

Monitoring behavior: - Scans PCI bus - Compares to whitelist - Detects changes

Alert conditions: - Unknown device appears - Whitelisted device removed - Device ID mismatch


cbit_permissions

Purpose: Monitor file permission changes.

What it monitors: - File permission drift - Ownership changes - Security policy compliance

Configuration:

[cbit_permissions]
enabled = true

[[file]]
path = "/etc/shadow"
mode = "0o640"
owner = "root"
group = "shadow"

[[file]]
path = "/etc/ssh/sshd_config"
mode = "0o644"
owner = "root"
group = "root"

Monitoring behavior: - Checks permissions periodically - Validates ownership - Detects changes

Alert conditions: - Permissions changed - Ownership modified - Security weakening detected


cbit_permissions_verification

Purpose: Comprehensive permission verification across system files.

What it monitors: - Bulk permission checks - Security policy compliance - Configuration file integrity

Configuration:

[cbit_permissions_verification]
enabled = true
check_interval_secs = 3600

[[file]]
path = "/etc/passwd"
mode = "0o644"

[[file]]
path = "/etc/shadow"
mode = "0o640"

[[file]]
path = "/root"
mode = "0o700"

Monitoring behavior: - Scans multiple files - Batch permission verification - Generates compliance report

Alert conditions: - Any permission mismatch - Insecure permissions detected - Critical file exposure


cbit_selinux_apparmor_status

Purpose: Monitor security framework status.

What it monitors: - SELinux/AppArmor state - Enforcing mode - Security policy active

Configuration:

[cbit_selinux_apparmor_status]
enabled = true
expected_system = "apparmor"
expected_status = "enabled"
check_interval_secs = 300

Monitoring behavior: - Checks security system status - Validates mode - Detects policy changes

Alert conditions: - Security framework disabled - Mode changed to permissive - Policy violations detected


cbit_serial_ports

Purpose: Monitor serial port availability and health.

What it monitors: - Port existence - Device accessibility - Port configuration

Configuration:

[cbit_serial_ports]
enabled = true

[[port]]
device = "/dev/ttyS0"
check_interval_secs = 60

[[port]]
device = "/dev/ttyUSB0"
check_interval_secs = 60

Monitoring behavior: - Checks port existence - Validates accessibility - Monitors port statistics

Alert conditions: - Port disappears - Access denied - Device errors


cbit_ssh_configuration

Purpose: Monitor SSH daemon configuration.

What it monitors: - sshd_config integrity - Security settings - Configuration changes

Configuration:

[cbit_ssh_configuration]
enabled = true
check_interval_secs = 3600

[[setting]]
key = "PermitRootLogin"
value = "no"

[[setting]]
key = "PasswordAuthentication"
value = "no"

[[setting]]
key = "Port"
value = "22"

Monitoring behavior: - Parses config file - Validates settings - Detects modifications

Alert conditions: - Setting changed - Security weakened - Unexpected configuration
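
Parsing and validation can be sketched as below. The sketch relies on two documented sshd_config behaviors: keywords are case-insensitive, and the first value obtained for a keyword is the one used. It ignores Include directives, the Keyword=value form, and anything after a Match line; parse_sshd_config and check_settings are hypothetical names.

```python
def parse_sshd_config(text):
    """Parse sshd_config text into {lowercased keyword: value}."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.lower().startswith("match "):
            break  # settings after a Match line are conditional
        parts = line.split(None, 1)
        if len(parts) == 2:
            # setdefault keeps the first occurrence, as sshd does.
            settings.setdefault(parts[0].lower(), parts[1].strip())
    return settings


def check_settings(text, expected):
    """Return {key: (expected, actual)} for every setting that differs."""
    actual = parse_sshd_config(text)
    return {
        key: (want, actual.get(key.lower()))
        for key, want in expected.items()
        if actual.get(key.lower()) != want
    }
```

A keyword that is absent from the file is reported with an actual value of None even though sshd would apply its built-in default (Port defaults to 22, for instance), so a file-based check like this one can flag settings that are effectively compliant.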


cbit_syslog_analysis

Purpose: Continuously analyze system logs.

What it monitors: - New error messages - Warning patterns - Anomalous log activity

Configuration:

[cbit_syslog_analysis]
enabled = true
lines_to_check = 100  # Recent lines
error_patterns = ["error", "critical", "fail"]
warning_patterns = ["warn", "warning"]
ignore_patterns = ["chronyd", "systemd-logind"]
check_interval_secs = 300

Monitoring behavior: - Tails system log - Searches for patterns - Tracks error frequency

Alert conditions: - Error patterns detected - High error rate - Critical messages


cbit_usb_whitelist

Purpose: Monitor USB device connections.

What it monitors: - USB device hotplug - Unauthorized devices - Device removal

Configuration:

[cbit_usb_whitelist]
enabled = true
check_interval_secs = 60

[[device]]
device_name = "USB Hub"
vendor_id = "1d6b"
device_id = "0002"

[[device]]
device_name = "Security Key"
vendor_id = "1050"
device_id = "0407"

Monitoring behavior: - Scans USB bus - Compares to whitelist - Detects changes

Alert conditions: - Unknown device connected - Whitelisted device removed - Unauthorized USB storage
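
The scan-and-compare step can be sketched with the idVendor and idProduct files that Linux exposes under /sys/bus/usb/devices. Whether cbit_usb_whitelist reads sysfs is an assumption, and read_usb_ids and compare_devices are hypothetical names.

```python
from pathlib import Path


def read_usb_ids(sysfs_root="/sys/bus/usb/devices"):
    """Return a list of (vendor_id, product_id) hex strings from sysfs."""
    ids = []
    for vendor_file in Path(sysfs_root).glob("*/idVendor"):
        product_file = vendor_file.with_name("idProduct")
        try:
            ids.append((vendor_file.read_text().strip(),
                        product_file.read_text().strip()))
        except OSError:
            continue  # device was unplugged during the scan
    return ids


def compare_devices(present, whitelist):
    """Compare (vendor_id, device_id) pairs on the bus to a whitelist.

    Returns (unknown, missing): IDs present but not whitelisted, and
    whitelisted IDs that are not present. Comparison is case-insensitive.
    """
    seen = {(v.lower(), d.lower()) for v, d in present}
    allowed = {(v.lower(), d.lower()) for v, d in whitelist}
    return sorted(seen - allowed), sorted(allowed - seen)
```

The two returned lists correspond to the "unknown device connected" and "whitelisted device removed" alert conditions.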


Running CBIT Tests

Continuous Monitoring

# Run all CBIT tests continuously
bit-manager -c

# Background monitoring with logging
bit-manager -c > /var/log/bit-cbit.log 2>&1 &

Selective Monitoring

# Monitor hardware only
bit-manager -t cbit_cpu_cores -t cbit_disk_health -t cbit_temperature -c

# Security monitoring
bit-manager -t cbit_usb_whitelist -t cbit_pci_whitelist -t cbit_firewall_configuration -c

# Resource monitoring
bit-manager -t cbit_cpu_usage -t cbit_memory_usage -t cbit_disk_usage -c

Systemd Integration

Create a systemd service for continuous CBIT monitoring:

[Unit]
Description=BIT Continuous Monitoring
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/bit-manager -c
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

Next Steps