VoIP Agent¶
The GVA VoIP Agent (gva-voip-agent) is an AI-powered voice assistant that can answer
calls and respond using natural language. It integrates with Ollama for LLM inference,
Piper for text-to-speech, and Vosk for speech recognition—all running locally without
cloud dependencies.
Overview¶
Features¶
Offline AI¶
All AI processing runs locally:
- Ollama - Local LLM server (Gemma, Llama, Mistral, etc.)
- Piper - Neural TTS with natural voices
- Vosk - Offline speech recognition
MCP Tool Support¶
The agent supports Model Context Protocol (MCP) tools for extended capabilities:
- Weather queries
- File system access
- Web fetching
- Custom tool servers
Built-in Military Tools¶
Native C++ tools optimised for military use:
| Tool | Description |
|---|---|
| `mgrs_to_latlon` | Convert MGRS to latitude/longitude |
| `latlon_to_mgrs` | Convert lat/lon to MGRS |
| `calculate_bearing` | Calculate bearing in degrees/mils |
| `calculate_distance` | Distance in metres/km/nautical miles |
| `format_dtg` | Format Date-Time Group (DTG) |
SRTP Encryption¶
All VoIP communications are secured with SRTP encryption (RFC 3711 / RFC 4568) using AES-256-CM-HMAC-SHA1-80—suitable for OFFICIAL-SENSITIVE classification. Encryption is always enabled and cannot be disabled.
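For context, SDES (RFC 4568) exchanges the SRTP crypto suite and master key through an `a=crypto` attribute in the SDP offer. A crypto line has roughly this shape (the tag number and key placeholder below are illustrative, not values taken from the agent's actual signalling):

```
a=crypto:1 AES_256_CM_HMAC_SHA1_80 inline:<base64-encoded master key and salt>
```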
Standalone Mode¶
Run without VoIP, using the local microphone and speakers instead.
Prerequisites¶
Ollama¶
Install and start Ollama with a model:
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model
ollama pull gemma2

# Start server (runs on localhost:11434)
ollama serve
```
Piper TTS¶
Install Piper and download voice models:
```bash
# Install via pipx
pipx install piper-tts

# Download voice model
mkdir -p ~/.local/share/piper
cd ~/.local/share/piper
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json
```
Vosk STT¶
Download a Vosk model:
```bash
# Download model
mkdir -p ~/.local/share/vosk
cd ~/.local/share/vosk
wget https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zip
mv vosk-model-small-en-us-0.15 en-us
```
Command Line Options¶
```text
gva-voip-agent [options]

Mode:
  --standalone            Use local mic/speaker instead of VoIP

SIP Options (VoIP mode):
  -u, --user=<username>   SIP username (default: agent.gemma4)
  -s, --server=<host>     SIP server address (default: 127.0.0.1)
  -p, --port=<port>       SIP server port (default: 5060)
  --local-port=<port>     Local SIP port (default: 5062)
  --rtp-port=<port>       Local RTP port (default: 10002)
  --display-name=<name>   SIP display name

LLM Options:
  -m, --model=<name>      Ollama model name (default: gemma2)
  --ollama-url=<url>      Ollama server URL (default: http://localhost:11434)
  --system-prompt=<text>  Custom system prompt

TTS/STT Options:
  --voice=<path>          Piper voice model path
  --stt-model=<path>      Vosk STT model path
  --speaking-rate=<rate>  TTS speaking rate (0.5-2.0, default: 1.0)

Behavior:
  --no-auto-answer        Don't auto-answer calls
  --greeting=<text>       Custom greeting message
  --silence-timeout=<ms>  Silence timeout (default: 3000ms)
  --headless              Run without GUI

MCP Options:
  --mcp-weather           Enable MCP weather server
  --mcp-fetch             Enable MCP fetch server
```
Example Usage¶
VoIP Mode (Answer Calls)¶
```bash
# Basic setup - answers calls from VoIP server
./build/bin/gva-voip-agent \
    --user=agent.ai \
    --server=192.168.1.10 \
    --model=gemma2

# With custom greeting
./build/bin/gva-voip-agent \
    --user=assistant \
    --server=192.168.1.10 \
    --model=llama3.2 \
    --greeting="Hello, this is the vehicle AI assistant. How can I help?"
```
Standalone Mode (Local Audio)¶
```bash
# Direct mic/speaker interaction
./build/bin/gva-voip-agent \
    --standalone \
    --model=gemma2

# With custom system prompt
./build/bin/gva-voip-agent \
    --standalone \
    --model=gemma2 \
    --system-prompt="You are a military vehicle assistant. Be concise and professional."
```
With MCP Tools¶
```bash
# Enable weather and web fetch tools
./build/bin/gva-voip-agent \
    --standalone \
    --model=gemma2 \
    --mcp-weather \
    --mcp-fetch
```
Conversation Flow¶
Built-in Tools¶
The agent registers these tools with the LLM:
Weather¶
```text
User:  "What's the weather in London?"
Agent: "The current weather in London is partly cloudy, 18 degrees Celsius,
        with 65% humidity and light winds from the southwest."
```
Time¶
Military Coordinates¶
```text
User:  "Convert 51.5074 north, 0.1278 west to MGRS"
Agent: "The MGRS coordinate is 30U XC 99287 15350."

User:  "What's the bearing from my position to coordinates 52.0, -1.0?"
Agent: "The bearing is 342 degrees, or 6080 mils, at a distance of 85 kilometers."
```
End Call¶
MCP Tool Integration¶
Adding Custom MCP Servers¶
Edit the agent configuration to add MCP tool servers:
```cpp
// In your code
McpServerConfig myServer;
myServer.name = "my-tools";
myServer.command = "npx";
myServer.args = {"-y", "@myorg/mcp-tools"};
agent->addMcpServer(myServer);
```
Available MCP Servers¶
| Server | Package | Description |
|---|---|---|
| Weather | `@modelcontextprotocol/server-weather` | Weather forecasts |
| Fetch | `@anthropic-ai/fetch-mcp` | HTTP requests |
| Filesystem | `@modelcontextprotocol/server-filesystem` | File operations |
| Memory | `@modelcontextprotocol/server-memory` | Persistent memory |
Auto-Reconnect¶
MCP servers automatically restart on crash with exponential backoff:
```cpp
McpServerConfig config;
config.name = "weather";
config.maxRestarts = 3;        // Max restart attempts
config.restartDelayMs = 1000;  // Initial delay (doubles each retry)
```
Tool Call Timeouts¶
Tool calls have configurable timeouts to prevent hanging:
```cpp
mcp->setDefaultTimeout(30000);                       // 30 second default
mcp->callTool("slow_tool", args, callback, 60000);   // 60 second override
```
Voice Models¶
Recommended Piper Voices¶
| Voice | Language | Quality | Size |
|---|---|---|---|
| en_US-lessac-medium | US English | Good | 65 MB |
| en_GB-alba-medium | UK English | Good | 63 MB |
| en_AU-danny-low | AU English | Fair | 15 MB |
Downloading Voices¶
```bash
# US Female (Amy)
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/amy/medium/en_US-amy-medium.onnx

# UK Male (Alan)
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_GB/alan/medium/en_GB-alan-medium.onnx
```
Troubleshooting¶
Ollama Not Responding¶
```bash
# Check if Ollama is running
curl http://localhost:11434/api/tags

# Restart Ollama
systemctl restart ollama
# or
ollama serve
```
No Speech Recognition¶
- Verify the Vosk model exists: `ls ~/.local/share/vosk/`
- Check microphone permissions
- Test with: `arecord -d 3 test.wav && aplay test.wav`
TTS Not Working¶
- Check the Piper installation: `piper --help`
- Verify the voice model: `ls ~/.local/share/piper/`
- Test: `echo "Hello" | piper --model ~/.local/share/piper/en_US-lessac-medium.onnx --output_file test.wav`
MCP Server Offline¶
If an MCP server is unavailable, the agent responds with a fallback message rather than calling the missing tool. Enable auto-reconnect to handle transient failures.