VoIP Agent

The GVA VoIP Agent (gva-voip-agent) is an AI-powered voice assistant that can answer calls and respond using natural language. It integrates with Ollama for LLM inference, Piper for text-to-speech, and Vosk for speech recognition—all running locally without cloud dependencies.

Overview

graph LR
    subgraph "VoIP Agent"
        SIP[SIP Client]
        RTP[RTP Audio]
        STT[Vosk STT]
        LLM[Ollama LLM]
        TTS[Piper TTS]
        MCP[MCP Tools]
    end
    subgraph "External"
        SRV[VoIP Server]
        OLL[Ollama Server]
        MCPS[MCP Servers]
    end
    SRV <-->|SIP| SIP
    SRV <-->|RTP| RTP
    RTP --> STT
    STT --> LLM
    LLM --> TTS
    TTS --> RTP
    LLM <--> MCP
    MCP <-->|stdio| MCPS
    LLM <-->|HTTP| OLL
    style LLM fill:#9C27B0
    style STT fill:#4CAF50
    style TTS fill:#2196F3

Features

Offline AI

All AI processing runs locally:

  • Ollama - Local LLM server (Gemma, Llama, Mistral, etc.)
  • Piper - Neural TTS with natural voices
  • Vosk - Offline speech recognition

MCP Tool Support

The agent supports Model Context Protocol (MCP) tools for extended capabilities:

  • Weather queries
  • File system access
  • Web fetching
  • Custom tool servers

Built-in Military Tools

Native C++ tools optimized for military use:

Tool                 Description
mgrs_to_latlon       Convert MGRS to latitude/longitude
latlon_to_mgrs       Convert lat/lon to MGRS
calculate_bearing    Calculate bearing in degrees/mils
calculate_distance   Distance in meters/km/nautical miles
format_dtg           Format Date-Time Group (DTG)

SRTP Encryption

All VoIP communications are secured with SRTP encryption (RFC 3711 / RFC 4568) using AES-256-CM-HMAC-SHA1-80—suitable for OFFICIAL-SENSITIVE classification. Encryption is always enabled and cannot be disabled.

Standalone Mode

Run without VoIP—uses local microphone and speakers:

./build/bin/gva-voip-agent --standalone --model=gemma2

Prerequisites

Ollama

Install and start Ollama with a model:

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model
ollama pull gemma2

# Start server (runs on localhost:11434)
ollama serve

Piper TTS

Install Piper and download voice models:

# Install via pipx
pipx install piper-tts

# Download voice model
mkdir -p ~/.local/share/piper
cd ~/.local/share/piper
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json

Vosk STT

Download a Vosk model:

# Download model
mkdir -p ~/.local/share/vosk
cd ~/.local/share/vosk
wget https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zip
mv vosk-model-small-en-us-0.15 en-us

Command Line Options

gva-voip-agent [options]

Mode:
  --standalone            Use local mic/speaker instead of VoIP

SIP Options (VoIP mode):
  -u, --user=<username>   SIP username (default: agent.gemma4)
  -s, --server=<host>     SIP server address (default: 127.0.0.1)
  -p, --port=<port>       SIP server port (default: 5060)
  --local-port=<port>     Local SIP port (default: 5062)
  --rtp-port=<port>       Local RTP port (default: 10002)
  --display-name=<name>   SIP display name

LLM Options:
  -m, --model=<name>      Ollama model name (default: gemma2)
  --ollama-url=<url>      Ollama server URL (default: http://localhost:11434)
  --system-prompt=<text>  Custom system prompt

TTS/STT Options:
  --voice=<path>          Piper voice model path
  --stt-model=<path>      Vosk STT model path
  --speaking-rate=<rate>  TTS speaking rate (0.5-2.0, default: 1.0)

Behavior:
  --no-auto-answer        Don't auto-answer calls
  --greeting=<text>       Custom greeting message
  --silence-timeout=<ms>  Silence timeout (default: 3000ms)
  --headless              Run without GUI

MCP Options:
  --mcp-weather           Enable MCP weather server
  --mcp-fetch             Enable MCP fetch server

Example Usage

VoIP Mode (Answer Calls)

# Basic setup - answers calls from VoIP server
./build/bin/gva-voip-agent \
    --user=agent.ai \
    --server=192.168.1.10 \
    --model=gemma2

# With custom greeting
./build/bin/gva-voip-agent \
    --user=assistant \
    --server=192.168.1.10 \
    --model=llama3.2 \
    --greeting="Hello, this is the vehicle AI assistant. How can I help?"

Standalone Mode (Local Audio)

# Direct mic/speaker interaction
./build/bin/gva-voip-agent \
    --standalone \
    --model=gemma2

# With custom system prompt
./build/bin/gva-voip-agent \
    --standalone \
    --model=gemma2 \
    --system-prompt="You are a military vehicle assistant. Be concise and professional."

With MCP Tools

# Enable weather and web fetch tools
./build/bin/gva-voip-agent \
    --standalone \
    --model=gemma2 \
    --mcp-weather \
    --mcp-fetch

Conversation Flow

stateDiagram-v2
    [*] --> Idle: Start
    Idle --> IncomingCall: Receive INVITE
    IncomingCall --> InCall: Answer
    InCall --> Listening: Say Greeting
    Listening --> Processing: User Speaks
    Processing --> Speaking: LLM Response
    Speaking --> Listening: TTS Complete
    Listening --> HangingUp: "Goodbye" / Timeout
    HangingUp --> Idle: Call Ended
    Processing --> ToolCall: Tool Required
    ToolCall --> Processing: Tool Result

Built-in Tools

The agent registers these tools with the LLM:

Weather

User: "What's the weather in London?"
Agent: "The current weather in London is partly cloudy, 18 degrees Celsius,
        with 65% humidity and light winds from the southwest."

Time

User: "What time is it?"
Agent: "The current time is Sunday, April 13, 2026 at 4:30 PM."

Military Coordinates

User: "Convert 51.5074 north, 0.1278 west to MGRS"
Agent: "The MGRS coordinate is 30U XC 99287 15350."

User: "What's the bearing from my position to coordinates 52.0, -1.0?"
Agent: "The bearing is 342 degrees, or 6080 mils, at a distance of 85 kilometers."

End Call

User: "Goodbye"
Agent: "Goodbye. Ending the call now."
[Call terminates]

MCP Tool Integration

Adding Custom MCP Servers

Edit the agent configuration to add MCP tool servers:

// In your code
McpServerConfig myServer;
myServer.name = "my-tools";
myServer.command = "npx";
myServer.args = {"-y", "@myorg/mcp-tools"};
agent->addMcpServer(myServer);

Available MCP Servers

Server       Package                                    Description
Weather      @modelcontextprotocol/server-weather       Weather forecasts
Fetch        @anthropic-ai/fetch-mcp                    HTTP requests
Filesystem   @modelcontextprotocol/server-filesystem    File operations
Memory       @modelcontextprotocol/server-memory        Persistent memory

Auto-Reconnect

MCP servers automatically restart on crash with exponential backoff:

McpServerConfig config;
config.name = "weather";
config.maxRestarts = 3;        // Max restart attempts
config.restartDelayMs = 1000;  // Initial delay (doubles each retry)

Tool Call Timeouts

Tool calls have configurable timeouts to prevent hanging:

mcp->setDefaultTimeout(30000);  // 30 second default
mcp->callTool("slow_tool", args, callback, 60000);  // 60 second override

Voice Models

Voice                 Language     Quality   Size
en_US-lessac-medium   US English   Good      65 MB
en_GB-alba-medium     UK English   Good      63 MB
en_AU-danny-low       AU English   Fair      15 MB

Downloading Voices

# US Female (Amy)
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/amy/medium/en_US-amy-medium.onnx

# UK Male (Alan)
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_GB/alan/medium/en_GB-alan-medium.onnx

Troubleshooting

Ollama Not Responding

# Check if Ollama is running
curl http://localhost:11434/api/tags

# Restart Ollama
systemctl restart ollama
# or
ollama serve

No Speech Recognition

  1. Verify Vosk model exists: ls ~/.local/share/vosk/
  2. Check microphone permissions
  3. Test with: arecord -d 3 test.wav && aplay test.wav

TTS Not Working

  1. Check Piper installation: piper --help
  2. Verify voice model: ls ~/.local/share/piper/
  3. Test: echo "Hello" | piper --model ~/.local/share/piper/en_US-lessac-medium.onnx --output_file test.wav

MCP Server Offline

If MCP servers are unavailable, the agent responds with:

"MCP server is offline. The requested tool is not available."

Enable auto-reconnect to handle transient failures.

See Also