CBIT Test Reference
Complete reference for the 24 Continuous Built-In Tests (CBIT) documented on this page.
CBIT tests run periodically during system operation to monitor health, detect degradation, and alert to emerging issues. They enable proactive maintenance and prevent failures.
System Tests
cbit_bsp_version
Purpose: Continuously monitor BSP version information consistency.
What it monitors:
- BSP release file integrity
- Version information stability
- Unexpected changes to BSP metadata
Configuration:
[cbit_bsp_version]
enabled = true
expected_version = "1.0.0"
expected_model = "DevBoard"
interval_secs = 3600 # Check hourly
Monitoring behavior:
- Runs every interval_secs
- Logs version drift
- Detects tampering with BSP files
Alert conditions:
- BSP file disappears
- Version changes unexpectedly
- Model information modified
cbit_checksum
Purpose: Continuously verify file integrity via checksums.
What it monitors:
- Critical file modification
- Unauthorized changes
- File corruption
Configuration:
[cbit_checksum]
enabled = true
[[file]]
path = "/etc/fstab"
checksum = "38f46022c28fe35e"
[[file]]
path = "/boot/config.txt"
checksum = "7f2e91ac44d8b9c1"
Monitoring behavior:
- Recalculates checksums periodically
- Compares to baseline
- Logs all mismatches
Alert conditions:
- Any file checksum differs from expected
- File no longer accessible
- Permissions prevent reading
Maintenance:
- Regenerate checksums after legitimate updates
- Use bit-learn after system maintenance
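The recalculate-and-compare loop described above can be sketched as follows. The checksum algorithm is not stated in this reference, so the sketch assumes, purely for illustration, the first 16 hex digits of SHA-256 (chosen only to match the length of the configured values). The function names `file_checksum` and `verify_files` are hypothetical.

```python
import hashlib


def file_checksum(path, hex_digits=16):
    """Return the first `hex_digits` hex characters of the file's SHA-256.

    Truncated SHA-256 is an assumption for this sketch, not the tool's
    documented algorithm.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()[:hex_digits]


def verify_files(baseline):
    """Map each path in {path: expected_checksum} to a status string."""
    results = {}
    for path, expected in baseline.items():
        try:
            actual = file_checksum(path, len(expected))
        except OSError:
            # Covers both a missing file and a permission error.
            results[path] = "unreadable"
            continue
        results[path] = "ok" if actual == expected.lower() else "mismatch"
    return results
```

A mismatch after a legitimate update is expected under this scheme, which is why the baseline has to be regenerated after maintenance.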
cbit_dmesg
Purpose: Continuously monitor kernel messages for errors.
What it monitors:
- New kernel errors/warnings
- Driver failures
- Hardware issues appearing during runtime
Configuration:
[cbit_dmesg]
enabled = true
error_patterns = ["error", "fail", "critical", "panic", "oops", "bug"]
warning_patterns = ["warn", "warning"]
ignore_patterns = ["acpi PNP0C14:01"] # Benign messages
check_interval_secs = 300
Monitoring behavior:
- Scans new dmesg entries since last check
- Filters against patterns
- Tracks error frequency
Alert conditions:
- Critical error patterns detected
- Repeated warnings
- New driver failures
Common issues:
- Requires elevated privileges
- Log rotation can cause missed messages
- Noisy hardware generates false positives
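To make the pattern filtering concrete, here is a minimal sketch of how one kernel log line might be classified. It assumes case-insensitive substring matching with ignore patterns taking precedence; the actual matching rules are not specified in this reference, and `classify_line` is a hypothetical name.

```python
def classify_line(line, error_patterns, warning_patterns, ignore_patterns):
    """Return "ignore", "error", "warning", or "ok" for one log line.

    Assumes case-insensitive substring matching; ignore patterns win
    over error patterns, which win over warning patterns.
    """
    lowered = line.lower()
    if any(p.lower() in lowered for p in ignore_patterns):
        return "ignore"
    if any(p.lower() in lowered for p in error_patterns):
        return "error"
    if any(p.lower() in lowered for p in warning_patterns):
        return "warning"
    return "ok"


errors = ["error", "fail", "critical", "panic", "oops", "bug"]
warnings = ["warn", "warning"]
ignored = ["acpi PNP0C14:01"]

print(classify_line("EXT4-fs error (device sda1)", errors, warnings, ignored))
print(classify_line("driver: debug enabled", errors, warnings, ignored))
```

Under the substring assumption, the second line is classified as an error because "debug" contains "bug". Short patterns like this are one source of false positives and are candidates for ignore_patterns.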
Hardware Tests
cbit_can
Purpose: Monitor CAN bus interface health continuously.
What it monitors:
- Interface availability
- Link state changes
- Interface statistics (errors, drops)
Configuration:
[cbit_can]
enabled = true
[[can]]
interface = "can0"
check_interval_secs = 60
[[can]]
interface = "can1"
check_interval_secs = 60
Monitoring behavior:
- Verifies interface exists
- Checks operational state
- Monitors error counters
Alert conditions:
- Interface disappears
- State changes unexpectedly
- Error rate exceeds threshold
Best practices:
- Monitor during active CAN traffic
- Correlate with application logs
- Check bus termination if errors appear
cbit_cpu_cores
Purpose: Continuously verify CPU core availability.
What it monitors:
- Core count stability
- Core hotplug events
- CPU failures
Configuration:
Monitoring behavior:
- Counts online cores
- Detects core changes
- Logs core count history
Alert conditions:
- Core count drops below expected
- Cores offline unexpectedly
Use cases:
- Detect CPU failures
- Monitor thermal throttling impacts
- Validate power management behavior
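As an illustration of counting online cores, the sketch below parses the CPU-list format that Linux exposes in /sys/devices/system/cpu/online (for example "0-3,6"). Whether the test reads this file is an assumption, and the function names are hypothetical.

```python
def count_online_cores(cpu_list):
    """Count CPUs in a kernel CPU-list string such as "0-3,6"."""
    count = 0
    for part in cpu_list.strip().split(","):
        if not part:
            continue
        if "-" in part:
            # A range like "0-3" is inclusive on both ends.
            first, last = part.split("-")
            count += int(last) - int(first) + 1
        else:
            count += 1
    return count


def read_online_cores(path="/sys/devices/system/cpu/online"):
    """Read the sysfs file and return the number of online cores."""
    with open(path) as handle:
        return count_online_cores(handle.read())
```

A monitor built on this would compare the result against an expected core count and alert when it drops.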
cbit_disk_health
Purpose: Continuously monitor disk SMART health data.
What it monitors:
- SMART attribute changes
- Pre-fail warnings
- Wear indicators (SSD)
Configuration:
[cbit_disk_health]
enabled = true
[[disk]]
device = "/dev/sda"
check_interval_secs = 3600 # Hourly
[[disk]]
device = "/dev/nvme0n1"
check_interval_secs = 3600
Monitoring behavior:
- Reads SMART attributes
- Tracks attribute trends
- Detects degradation
Alert conditions:
- SMART status changes to FAILING
- Reallocated sector count increases
- Temperature exceeds threshold
- Wear leveling concerns (SSD)
Best practices:
- Don't check too frequently (disk wear)
- Monitor trends over time
- Act on pre-fail warnings immediately
cbit_ethernet
Purpose: Monitor network interface health and status.
What it monitors:
- Link state (up/down)
- Link speed changes
- Interface errors and drops
Configuration:
[cbit_ethernet]
enabled = true
[[interface]]
name = "eth0"
expected_speed = 1000
expected_state = "up"
check_interval_secs = 60
[[interface]]
name = "eth1"
expected_speed = 1000
expected_state = "up"
check_interval_secs = 60
Monitoring behavior:
- Checks interface state
- Validates speed negotiation
- Monitors statistics
Alert conditions:
- Link goes down unexpectedly
- Speed negotiates lower than expected
- Error/drop counters increase rapidly
Troubleshooting:
- Check physical cable
- Verify switch port configuration
- Monitor for duplex mismatches
cbit_gpio
Purpose: Continuously monitor GPIO pin states.
What it monitors:
- Pin state changes
- Unexpected transitions
- Hardware signal integrity
Configuration:
[cbit_gpio]
enabled = true
[[gpio]]
pin = 17
expected_state = "high"
check_interval_secs = 5
[[gpio]]
pin = 27
expected_state = "low"
check_interval_secs = 5
Monitoring behavior:
- Reads pin states
- Detects transitions
- Logs state history
Alert conditions:
- Pin state differs from expected
- Rapid state changes (bouncing)
- Pin becomes inaccessible
Use cases:
- Monitor hardware interlocks
- Detect sensor failures
- Validate control signals
cbit_gpu_loading
Purpose: Monitor GPU utilization continuously.
What it monitors:
- GPU usage percentage
- Sustained high utilization
- GPU availability
Configuration:
[cbit_gpu_loading]
enabled = true
threshold = 95
sustained_threshold_secs = 300 # Alert if >95% for 5min
check_interval_secs = 30
Monitoring behavior:
- Queries GPU utilization
- Tracks utilization history
- Detects sustained overload
Alert conditions:
- Utilization exceeds threshold
- Sustained high usage
- GPU becomes unresponsive
Performance tuning:
- Adjust threshold for workload
- Monitor temperature correlation
- Check for runaway processes
cbit_temperature
Purpose: Continuously monitor system temperatures.
What it monitors:
- CPU/GPU/disk temperatures
- Thermal zone trends
- Cooling system effectiveness
Configuration:
[cbit_temperature]
enabled = true
[[thermal_zone]]
label = "Core 0"
threshold = 85.0
critical_threshold = 95.0
check_interval_secs = 60
[[thermal_zone]]
label = "GPU"
threshold = 80.0
critical_threshold = 90.0
check_interval_secs = 60
Monitoring behavior:
- Reads thermal sensors
- Tracks temperature trends
- Calculates moving averages
Alert conditions:
- Temperature exceeds threshold
- Rapid temperature increase
- Critical threshold reached
Thermal management:
- Monitor ambient temperature impact
- Check fan operation
- Verify thermal paste/pads
- Clean dust from heatsinks
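The interaction between moving averages and the two thresholds can be sketched as below. This is an illustration under assumptions: the window size is a hypothetical parameter, the warning level is evaluated on the moving average, and the critical level is evaluated on the latest reading. The reference does not say which value each threshold is compared against.

```python
from collections import deque


class ThermalZoneMonitor:
    """Track recent readings for one thermal zone (hypothetical helper)."""

    def __init__(self, threshold, critical_threshold, window=5):
        self.threshold = threshold
        self.critical_threshold = critical_threshold
        # deque with maxlen drops the oldest sample automatically.
        self.samples = deque(maxlen=window)

    def update(self, celsius):
        """Record a reading and return (moving_average, level)."""
        self.samples.append(celsius)
        average = sum(self.samples) / len(self.samples)
        if celsius >= self.critical_threshold:
            level = "critical"  # react immediately to a single hot reading
        elif average >= self.threshold:
            level = "warning"   # smoothed value filters out short spikes
        else:
            level = "ok"
        return average, level


zone = ThermalZoneMonitor(threshold=85.0, critical_threshold=95.0, window=3)
for reading in (80.0, 90.0, 96.0):
    print(zone.update(reading))
```

Averaging the warning level trades a little latency for fewer alerts on brief spikes, while the critical check stays unsmoothed.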
Resource Tests
cbit_cpu_usage
Purpose: Monitor CPU utilization continuously.
What it monitors:
- Overall CPU usage
- Per-core utilization (if configured)
- Sustained high load
Configuration:
[cbit_cpu_usage]
enabled = true
threshold = 90
sustained_threshold_secs = 600 # Alert if >90% for 10min
check_interval_secs = 30
Monitoring behavior:
- Samples CPU usage
- Calculates averages
- Detects load spikes
Alert conditions:
- Usage exceeds threshold
- Sustained high CPU load
- Single core bottleneck
Performance analysis:
- Identify CPU-bound processes
- Check for runaway tasks
- Monitor system load average
- Correlate with application behavior
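The relationship between threshold and sustained_threshold_secs can be sketched as a small state machine: remember when usage first went above the threshold and alert only once it has stayed there long enough. The class below is a hypothetical illustration, not the tool's implementation. It assumes any sample at or below the threshold resets the timer.

```python
class SustainedThreshold:
    """Alert only when usage stays above a threshold for a minimum time."""

    def __init__(self, threshold, sustained_secs):
        self.threshold = threshold
        self.sustained_secs = sustained_secs
        self.exceeded_since = None  # time of the first sample above threshold

    def update(self, usage_percent, now):
        """Feed one sample; return True once the overload is sustained."""
        if usage_percent <= self.threshold:
            self.exceeded_since = None  # a dip below the threshold resets the timer
            return False
        if self.exceeded_since is None:
            self.exceeded_since = now
        return now - self.exceeded_since >= self.sustained_secs


cpu = SustainedThreshold(threshold=90, sustained_secs=600)
for timestamp, usage in [(0, 95), (300, 96), (600, 97), (630, 50)]:
    print(timestamp, cpu.update(usage, timestamp))
```

The same pattern would apply to cbit_gpu_loading, which uses the same pair of settings.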
cbit_disk_usage
Purpose: Monitor filesystem space utilization.
What it monitors:
- Disk space consumption
- Growth rate
- Capacity planning needs
Configuration:
[cbit_disk_usage]
enabled = true
[[disk]]
disk = "/dev/sda1"
threshold = 80
critical_threshold = 95
check_interval_secs = 300
[[disk]]
disk = "/dev/shm"
threshold = 70
check_interval_secs = 60
Monitoring behavior:
- Measures filesystem usage
- Tracks usage trends
- Predicts time-to-full
Alert conditions:
- Usage exceeds threshold
- Rapid growth detected
- Critical threshold reached
Disk management:
- Identify large files/directories
- Clean up logs and temporary files
- Monitor log rotation
- Check for disk leaks
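One simple way to predict time-to-full is linear extrapolation from the oldest and newest usage samples. The sketch below assumes that approach; the prediction method the test actually uses is not documented here, and `seconds_until_full` is a hypothetical name.

```python
def seconds_until_full(samples, capacity_bytes):
    """Estimate seconds until the filesystem is full.

    `samples` is a list of (timestamp_secs, used_bytes) pairs, oldest first.
    Returns None with fewer than two samples or when usage is not growing.
    """
    if len(samples) < 2:
        return None
    (t0, used0), (t1, used1) = samples[0], samples[-1]
    if t1 <= t0 or used1 <= used0:
        return None
    growth_per_sec = (used1 - used0) / (t1 - t0)
    return (capacity_bytes - used1) / growth_per_sec


# 50 bytes of growth over 100 s leaves 100 bytes to fill at 0.5 bytes/s.
print(seconds_until_full([(0, 50), (100, 100)], capacity_bytes=200))
```

Linear extrapolation reacts strongly to short bursts such as a large log write, so a longer sampling window gives steadier estimates.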
cbit_memory_usage
Purpose: Monitor RAM and swap utilization.
What it monitors:
- Physical memory usage
- Swap usage
- Available memory
- Memory pressure
Configuration:
[cbit_memory_usage]
enabled = true
threshold = 90
swap_threshold = 50
min_available_mb = 512
check_interval_secs = 60
Monitoring behavior:
- Samples memory statistics
- Tracks usage trends
- Monitors swap activity
Alert conditions:
- Memory usage exceeds threshold
- Swap usage increasing
- Available memory too low
- OOM killer activity
Memory analysis:
- Identify memory leaks
- Check for runaway processes
- Monitor cache vs used memory
- Analyze swap usage patterns
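The three configured limits can be illustrated against the fields of /proc/meminfo. This sketch assumes usage is computed as MemTotal minus MemAvailable, which counts reclaimable cache as free. How the test defines "used" is not stated in this reference, and both function names are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into {field: integer value in kB}."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def check_memory(info, threshold=90, swap_threshold=50, min_available_mb=512):
    """Return a list of alert strings; an empty list means healthy."""
    alerts = []
    used_pct = 100 * (info["MemTotal"] - info["MemAvailable"]) / info["MemTotal"]
    if used_pct > threshold:
        alerts.append("memory usage above threshold")
    if info["MemAvailable"] / 1024 < min_available_mb:
        alerts.append("available memory too low")
    swap_total = info.get("SwapTotal", 0)
    if swap_total:  # skip the swap check on systems without swap
        swap_pct = 100 * (swap_total - info.get("SwapFree", 0)) / swap_total
        if swap_pct > swap_threshold:
            alerts.append("swap usage above threshold")
    return alerts
```

On a live system the input would come from reading /proc/meminfo, for example `check_memory(parse_meminfo(open("/proc/meminfo").read()))`.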
cbit_power_consumption
Purpose: Monitor system power consumption and voltage.
What it monitors:
- Power draw (watts)
- Voltage levels
- Power trends
- Efficiency
Configuration:
[cbit_power_consumption]
enabled = true
min_voltage_mv = 3000
max_voltage_mv = 20000
max_power_uw = 150000000 # 150W
check_interval_secs = 60
Monitoring behavior:
- Reads power sensors
- Calculates consumption
- Tracks power trends
Alert conditions:
- Voltage out of range
- Power exceeds limit
- Unexpected power spikes
- Voltage instability
Power management:
- Correlate with workload
- Check PSU capacity
- Monitor efficiency
- Detect failing PSU
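The limits above are expressed in millivolts and microwatts, which is easy to misread. The sketch below applies the configured defaults and converts to volts and watts for the alert text; `check_power` is a hypothetical helper, and how the readings are obtained from sensors is outside its scope.

```python
def check_power(voltage_mv, power_uw,
                min_voltage_mv=3000, max_voltage_mv=20000,
                max_power_uw=150_000_000):
    """Return alert strings for one voltage/power reading."""
    alerts = []
    if not min_voltage_mv <= voltage_mv <= max_voltage_mv:
        # 1 V = 1000 mV
        alerts.append(f"voltage out of range: {voltage_mv / 1000:.2f} V")
    if power_uw > max_power_uw:
        # 1 W = 1,000,000 uW, so 150000000 uW is 150 W
        alerts.append(f"power exceeds limit: {power_uw / 1_000_000:.1f} W")
    return alerts


print(check_power(voltage_mv=12000, power_uw=45_000_000))
print(check_power(voltage_mv=2500, power_uw=160_000_000))
```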
Security Tests
cbit_ethernet_status
Purpose: Monitor network connectivity and reachability.
What it monitors:
- Interface operational status
- Network connectivity
- Link quality metrics
Configuration:
[cbit_ethernet_status]
enabled = true
[[interface]]
name = "eth0"
check_connectivity = true
ping_target = "8.8.8.8"
check_interval_secs = 60
Monitoring behavior:
- Checks interface status
- Tests connectivity (optional)
- Monitors link statistics
Alert conditions:
- Interface down
- Connectivity lost
- Packet loss excessive
- Latency high
cbit_firewall_configuration
Purpose: Continuously verify firewall configuration.
What it monitors:
- Firewall enabled state
- Rule count/integrity
- Configuration changes
Configuration:
[cbit_firewall_configuration]
enabled = true
expected_firewall = "ufw"
expected_state = "active"
check_interval_secs = 300
Monitoring behavior:
- Checks firewall status
- Validates configuration
- Detects rule changes
Alert conditions:
- Firewall disabled
- Unexpected rule changes
- Configuration drift
cbit_pci_whitelist
Purpose: Continuously monitor PCI device changes.
What it monitors:
- PCI device hotplug
- Unauthorized hardware
- Device removal
Configuration:
[cbit_pci_whitelist]
enabled = true
check_interval_secs = 60
[[device]]
vendor_id = "8086"
device_id = "15d7"
manufacturer = "Intel Corporation"
device_description = "Ethernet Connection"
Monitoring behavior:
- Scans PCI bus
- Compares to whitelist
- Detects changes
Alert conditions:
- Unknown device appears
- Whitelisted device removed
- Device ID mismatch
cbit_permissions
Purpose: Monitor file permission changes.
What it monitors:
- File permission drift
- Ownership changes
- Security policy compliance
Configuration:
[cbit_permissions]
enabled = true
[[file]]
path = "/etc/shadow"
mode = "0o640"
owner = "root"
group = "shadow"
[[file]]
path = "/etc/ssh/sshd_config"
mode = "0o644"
owner = "root"
group = "root"
Monitoring behavior:
- Checks permissions periodically
- Validates ownership
- Detects changes
Alert conditions:
- Permissions changed
- Ownership modified
- Security weakening detected
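A permission check of this kind can be sketched with the standard library. The example parses the "0o640" string form used in the configuration above and compares it with the file's permission bits; `check_permissions` is a hypothetical name, and the message wording is illustrative.

```python
import grp
import os
import pwd
import stat


def check_permissions(path, mode, owner=None, group=None):
    """Compare a file's mode and ownership with expected values.

    `mode` is a string such as "0o640". Returns a list of mismatch
    descriptions; an empty list means the file is compliant.
    """
    problems = []
    info = os.stat(path)
    actual_mode = stat.S_IMODE(info.st_mode)  # permission bits only
    expected_mode = int(mode, 8)              # base 8 accepts the "0o" prefix
    if actual_mode != expected_mode:
        problems.append(f"mode is {oct(actual_mode)}, expected {oct(expected_mode)}")
    if owner is not None and pwd.getpwuid(info.st_uid).pw_name != owner:
        problems.append("owner mismatch")
    if group is not None and grp.getgrgid(info.st_gid).gr_name != group:
        problems.append("group mismatch")
    return problems
```

For example, `check_permissions("/etc/shadow", "0o640", "root", "shadow")` returns an empty list on a system that matches the configuration shown above.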
cbit_permissions_verification
Purpose: Comprehensive permission verification across system files.
What it monitors:
- Bulk permission checks
- Security policy compliance
- Configuration file integrity
Configuration:
[cbit_permissions_verification]
enabled = true
check_interval_secs = 3600
[[file]]
path = "/etc/passwd"
mode = "0o644"
[[file]]
path = "/etc/shadow"
mode = "0o640"
[[file]]
path = "/root"
mode = "0o700"
Monitoring behavior:
- Scans multiple files
- Batch permission verification
- Generates compliance report
Alert conditions:
- Any permission mismatch
- Insecure permissions detected
- Critical file exposure
cbit_selinux_apparmor_status
Purpose: Monitor security framework status.
What it monitors:
- SELinux/AppArmor state
- Enforcing mode
- Security policy active
Configuration:
[cbit_selinux_apparmor_status]
enabled = true
expected_system = "apparmor"
expected_status = "enabled"
check_interval_secs = 300
Monitoring behavior:
- Checks security system status
- Validates mode
- Detects policy changes
Alert conditions:
- Security framework disabled
- Mode changed to permissive
- Policy violations detected
cbit_serial_ports
Purpose: Monitor serial port availability and health.
What it monitors:
- Port existence
- Device accessibility
- Port configuration
Configuration:
[cbit_serial_ports]
enabled = true
[[port]]
device = "/dev/ttyS0"
check_interval_secs = 60
[[port]]
device = "/dev/ttyUSB0"
check_interval_secs = 60
Monitoring behavior:
- Checks port existence
- Validates accessibility
- Monitors port statistics
Alert conditions:
- Port disappears
- Access denied
- Device errors
cbit_ssh_configuration
Purpose: Monitor SSH daemon configuration.
What it monitors:
- sshd_config integrity
- Security settings
- Configuration changes
Configuration:
[cbit_ssh_configuration]
enabled = true
check_interval_secs = 3600
[[setting]]
key = "PermitRootLogin"
value = "no"
[[setting]]
key = "PasswordAuthentication"
value = "no"
[[setting]]
key = "Port"
value = "22"
Monitoring behavior:
- Parses config file
- Validates settings
- Detects modifications
Alert conditions:
- Setting changed
- Security weakened
- Unexpected configuration
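The parse-and-validate step might look like the following sketch. It relies on two documented sshd_config rules: keywords are case-insensitive, and the first value obtained for a keyword is the one used. It deliberately ignores Match blocks, Include directives, and the `key=value` form, so it is a simplification and not a description of how the test parses the file. Function names are hypothetical.

```python
def parse_sshd_config(text):
    """Return {lowercased keyword: value}, keeping the first occurrence."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        key = parts[0].lower()
        value = parts[1].strip() if len(parts) > 1 else ""
        settings.setdefault(key, value)  # later duplicates are ignored
    return settings


def find_deviations(text, expected):
    """Return {key: actual value or None} for every setting that differs."""
    actual = parse_sshd_config(text)
    deviations = {}
    for key, wanted in expected.items():
        found = actual.get(key.lower())
        if found != wanted:
            deviations[key] = found
    return deviations


sample = "PermitRootLogin yes\nPasswordAuthentication no\nport 22\n"
expected = {"PermitRootLogin": "no", "PasswordAuthentication": "no", "Port": "22"}
print(find_deviations(sample, expected))
```

A value of None means the keyword is absent from the file. In that case sshd falls back to its built-in default, which may or may not match the expected value.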
cbit_syslog_analysis
Purpose: Continuously analyze system logs.
What it monitors:
- New error messages
- Warning patterns
- Anomalous log activity
Configuration:
[cbit_syslog_analysis]
enabled = true
lines_to_check = 100 # Recent lines
error_patterns = ["error", "critical", "fail"]
warning_patterns = ["warn", "warning"]
ignore_patterns = ["chronyd", "systemd-logind"]
check_interval_secs = 300
Monitoring behavior:
- Tails system log
- Searches for patterns
- Tracks error frequency
Alert conditions:
- Error patterns detected
- High error rate
- Critical messages
cbit_usb_whitelist
Purpose: Monitor USB device connections.
What it monitors:
- USB device hotplug
- Unauthorized devices
- Device removal
Configuration:
[cbit_usb_whitelist]
enabled = true
check_interval_secs = 60
[[device]]
device_name = "USB Hub"
vendor_id = "1d6b"
device_id = "0002"
[[device]]
device_name = "Security Key"
vendor_id = "1050"
device_id = "0407"
Monitoring behavior:
- Scans USB bus
- Compares to whitelist
- Detects changes
Alert conditions:
- Unknown device connected
- Whitelisted device removed
- Unauthorized USB storage
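The whitelist comparison reduces to two set differences over (vendor_id, device_id) pairs. The sketch below assumes the IDs of the attached devices have already been collected by some other means; `compare_to_whitelist` is a hypothetical name, and the same idea would apply to cbit_pci_whitelist.

```python
def compare_to_whitelist(present, whitelist):
    """Compare attached devices with the whitelist.

    Both arguments are iterables of (vendor_id, device_id) hex strings.
    Returns the devices that are attached but not allowed ("unknown")
    and the ones that are allowed but not attached ("missing").
    """
    present_ids = {(v.lower(), d.lower()) for v, d in present}
    allowed_ids = {(v.lower(), d.lower()) for v, d in whitelist}
    return {
        "unknown": sorted(present_ids - allowed_ids),
        "missing": sorted(allowed_ids - present_ids),
    }


whitelist = [("1d6b", "0002"), ("1050", "0407")]
attached = [("1D6B", "0002"), ("0781", "5567")]
print(compare_to_whitelist(attached, whitelist))
```

Lowercasing both sides avoids false alerts when one source reports hex IDs in upper case.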
Running CBIT Tests
Continuous Monitoring
# Run all CBIT tests continuously
bit-manager -c
# Background monitoring with logging
bit-manager -c > /var/log/bit-cbit.log 2>&1 &
Selective Monitoring
# Monitor hardware only
bit-manager -t cbit_cpu_cores -t cbit_disk_health -t cbit_temperature -c
# Security monitoring
bit-manager -t cbit_usb_whitelist -t cbit_pci_whitelist -t cbit_firewall_configuration -c
# Resource monitoring
bit-manager -t cbit_cpu_usage -t cbit_memory_usage -t cbit_disk_usage -c
Systemd Integration
Create a systemd service for continuous CBIT monitoring:
[Unit]
Description=BIT Continuous Monitoring
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/bit-manager -c
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
Next Steps
- PBIT Tests - Power-on validation tests
- FBIT Tests - Factory validation tests
- Test Overview - Complete test catalog
- Running Tests - Execution modes