VoIP Agent
The GVA VoIP Agent (gva-voip-agent) is an AI-powered voice assistant that answers
calls and responds in natural language. It integrates with Ollama for LLM inference,
Piper for text-to-speech, and Vosk for speech recognition, all running locally with
no cloud dependencies.
Overview
Features
Offline AI
All AI processing runs locally:
- Ollama - Local LLM server (Gemma, Llama, Mistral, etc.)
- Piper - Neural TTS with natural voices
- Vosk - Offline speech recognition
MCP Tool Support
The agent supports Model Context Protocol (MCP) tools for extended capabilities:
- Weather queries
- File system access
- Web fetching
- Custom tool servers
Built-in Military Tools
Native C++ tools optimised for military use, served by gva-mcp-server:
| Tool | Description |
|---|---|
| mgrs_to_latlon | Convert MGRS to latitude/longitude |
| latlon_to_mgrs | Convert latitude/longitude to MGRS |
| calculate_bearing | Calculate bearing in degrees or mils |
| calculate_distance | Calculate distance in meters, kilometers, or nautical miles |
| format_dtg | Format a Date-Time Group (DTG) |
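The exact layout format_dtg emits is not given here. A common military form is DDHHMMZ MON YY: day, hour, and minute, the zone letter Z for UTC, a three-letter month, and a two-digit year. The sketch below produces that form from a Unix timestamp; formatDtg is a hypothetical name, the layout is an assumption rather than the tool's documented output, and gmtime_r assumes a POSIX system.

```cpp
#include <array>
#include <cstddef>
#include <cstdio>
#include <ctime>
#include <string>

// Hypothetical helper: formats a Unix timestamp as a Date-Time Group in the
// assumed "DDHHMMZ MON YY" layout, always in UTC (zone letter Z).
std::string formatDtg(std::time_t t) {
    static const std::array<const char*, 12> kMonths = {
        "JAN", "FEB", "MAR", "APR", "MAY", "JUN",
        "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"};
    std::tm utc{};
    gmtime_r(&t, &utc);  // POSIX: thread-safe conversion to UTC fields
    char buf[32];
    std::snprintf(buf, sizeof buf, "%02d%02d%02dZ %s %02d",
                  utc.tm_mday, utc.tm_hour, utc.tm_min,
                  kMonths[static_cast<std::size_t>(utc.tm_mon)],
                  (utc.tm_year + 1900) % 100);
    return buf;
}
```

For example, formatDtg(1700000000), which is 2023-11-14 22:13:20 UTC, yields "142213Z NOV 23". The zone letter is fixed at Z because the input is converted to UTC; emitting other zone letters would need the local offset as an extra input.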
SRTP Encryption
All VoIP media is protected with SRTP (RFC 3711), keyed through SDP security descriptions (RFC 4568), using the AES-256-CM-HMAC-SHA1-80 crypto suite. This configuration is suitable for OFFICIAL-SENSITIVE classification. Encryption is always enabled and cannot be disabled.
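With SDP security descriptions (SDES, RFC 4568), the SRTP master key travels in the SDP offer/answer as an a=crypto attribute. For the AES_256_CM_HMAC_SHA1_80 suite (registered by RFC 6188), the inline key parameter is the base64 encoding of a 32-byte master key followed by a 14-byte master salt. The sketch below builds such a line; makeCryptoAttribute and base64Encode are hypothetical helpers, not the agent's code, and real keys must come from a cryptographically secure random source. Because SDES carries keys in clear within the SDP, RFC 4568 expects the signalling path itself to be protected (for example, SIP over TLS).

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Standard base64 (RFC 4648) with '=' padding.
std::string base64Encode(const std::vector<std::uint8_t>& in) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve((in.size() + 2) / 3 * 4);
    std::size_t i = 0;
    // Each full 3-byte group becomes four characters.
    for (; i + 3 <= in.size(); i += 3) {
        const std::uint32_t v = (std::uint32_t(in[i]) << 16) |
                                (std::uint32_t(in[i + 1]) << 8) | in[i + 2];
        out += kAlphabet[(v >> 18) & 0x3F];
        out += kAlphabet[(v >> 12) & 0x3F];
        out += kAlphabet[(v >> 6) & 0x3F];
        out += kAlphabet[v & 0x3F];
    }
    // A 1- or 2-byte tail is padded with '='.
    const std::size_t rest = in.size() - i;
    if (rest > 0) {
        std::uint32_t v = std::uint32_t(in[i]) << 16;
        if (rest == 2) v |= std::uint32_t(in[i + 1]) << 8;
        out += kAlphabet[(v >> 18) & 0x3F];
        out += kAlphabet[(v >> 12) & 0x3F];
        out += (rest == 2) ? kAlphabet[(v >> 6) & 0x3F] : '=';
        out += '=';
    }
    return out;
}

// Hypothetical helper: builds an SDES crypto attribute for
// AES_256_CM_HMAC_SHA1_80 from a 32-byte master key and a 14-byte salt.
std::string makeCryptoAttribute(int tag,
                                const std::vector<std::uint8_t>& masterKey,
                                const std::vector<std::uint8_t>& masterSalt) {
    if (masterKey.size() != 32 || masterSalt.size() != 14)
        throw std::invalid_argument("need a 32-byte key and a 14-byte salt");
    std::vector<std::uint8_t> keySalt(masterKey);
    keySalt.insert(keySalt.end(), masterSalt.begin(), masterSalt.end());
    return "a=crypto:" + std::to_string(tag) +
           " AES_256_CM_HMAC_SHA1_80 inline:" + base64Encode(keySalt);
}
```

Both peers must agree on the suite; the numeric tag lets an answerer pick one of several offered a=crypto lines.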
Standalone Mode
Run without VoIP, using the local microphone and speakers.
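Prerequisites
Ollama
Install and start Ollama, then pull a model:
# Install Ollama (on Linux with systemd, the script also installs and starts an ollama service)
curl -fsSL https://ollama.com/install.sh | sh
# Start the server if it is not already running (listens on localhost:11434)
ollama serve &
# Pull a model (requires the server to be running)
ollama pull gemma2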
Piper TTS
Install Piper and download voice models:
# Install via pipx
pipx install piper-tts
# Download voice model
mkdir -p ~/.local/share/piper
cd ~/.local/share/piper
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json
Vosk STT
Download a Vosk model:
# Download model
mkdir -p ~/.local/share/vosk
cd ~/.local/share/vosk
wget https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zip
mv vosk-model-small-en-us-0.15 en-us
Command Line Options
gva-voip-agent [options]
Mode:
--standalone Use local mic/speaker instead of VoIP
SIP Options (VoIP mode):
-u, --user=<username> SIP username (default: agent.gemma4)
-s, --server=<host> SIP server address (default: 127.0.0.1)
-p, --port=<port> SIP server port (default: 5060)
--local-port=<port> Local SIP port (default: 5062)
--rtp-port=<port> Local RTP port (default: 10002)
--display-name=<name> SIP display name
LLM Options:
-m, --model=<name> Ollama model name (default: gemma2)
--ollama-url=<url> Ollama server URL (default: http://localhost:11434)
--system-prompt=<text> Custom system prompt
TTS/STT Options:
--voice=<path> Piper voice model path
--stt-model=<path> Vosk STT model path
--speaking-rate=<rate> TTS speaking rate (0.5-2.0, default: 1.0)
Behavior:
--no-auto-answer Don't auto-answer calls
--greeting=<text> Custom greeting message
--silence-timeout=<ms> Silence timeout (default: 3000ms)
--headless Run without GUI
MCP Options:
--mcp-endpoint=<url> MCP HTTP endpoint (default: http://127.0.0.1:7077/mcp)
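The documented defaults can also be read as one configuration record. The sketch below is a hypothetical options struct with a minimal --name=value parser; it is not the agent's real parser, it skips short options (-u, -s, -p, -m) and several flags, and rejecting an out-of-range --speaking-rate rather than clamping it is an assumption.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical struct mirroring the documented defaults (not the agent's real type).
struct AgentOptions {
    bool standalone = false;
    std::string user = "agent.gemma4";
    std::string server = "127.0.0.1";
    int port = 5060;
    int localPort = 5062;
    int rtpPort = 10002;
    std::string model = "gemma2";
    std::string ollamaUrl = "http://localhost:11434";
    double speakingRate = 1.0;
    int silenceTimeoutMs = 3000;
    std::string mcpEndpoint = "http://127.0.0.1:7077/mcp";
};

// Parses "--name=value" long options plus the --standalone flag.
AgentOptions parseOptions(const std::vector<std::string>& args) {
    AgentOptions o;
    for (const std::string& arg : args) {
        if (arg == "--standalone") { o.standalone = true; continue; }
        const auto eq = arg.find('=');
        if (arg.rfind("--", 0) != 0 || eq == std::string::npos)
            throw std::invalid_argument("unsupported option: " + arg);
        const std::string key = arg.substr(2, eq - 2);
        const std::string val = arg.substr(eq + 1);
        if (key == "user") o.user = val;
        else if (key == "server") o.server = val;
        else if (key == "port") o.port = std::stoi(val);
        else if (key == "local-port") o.localPort = std::stoi(val);
        else if (key == "rtp-port") o.rtpPort = std::stoi(val);
        else if (key == "model") o.model = val;
        else if (key == "ollama-url") o.ollamaUrl = val;
        else if (key == "silence-timeout") o.silenceTimeoutMs = std::stoi(val);
        else if (key == "mcp-endpoint") o.mcpEndpoint = val;
        else if (key == "speaking-rate") {
            o.speakingRate = std::stod(val);
            // Documented range is 0.5-2.0; rejecting instead of clamping is assumed.
            if (o.speakingRate < 0.5 || o.speakingRate > 2.0)
                throw std::out_of_range("speaking-rate must be within 0.5-2.0");
        } else {
            throw std::invalid_argument("unknown option: --" + key);
        }
    }
    return o;
}
```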
Example Usage
VoIP Mode (Answer Calls)
# Basic setup - answers calls from VoIP server
./build/bin/gva-voip-agent \
--user=agent.ai \
--server=192.168.1.10 \
--model=gemma2
# With custom greeting
./build/bin/gva-voip-agent \
--user=assistant \
--server=192.168.1.10 \
--model=llama3.2 \
--greeting="Hello, this is the vehicle AI assistant. How can I help?"
Standalone Mode (Local Audio)
# Direct mic/speaker interaction
./build/bin/gva-voip-agent \
--standalone \
--model=gemma2
# With custom system prompt
./build/bin/gva-voip-agent \
--standalone \
--model=gemma2 \
--system-prompt="You are a military vehicle assistant. Be concise and professional."
With MCP Tools
gva-voip-agent consumes every non-local tool (weather, time, MGRS,
bearing/distance, DTG, bash, http_get, fs, sqlite, postgres, …) from
gva-mcp-server over HTTP. Start the server first, then point the agent at it.
# Terminal 1 — the tool host (or use the systemd unit gva-mcp-server.service)
./build/bin/gva-mcp-server \
--config etc/gva-mcp-server-ollama.json \
--no-stdio --http --port 7077
# Terminal 2 — the agent. Default endpoint already matches; flag shown for
# clarity / when overriding host/port.
./build/bin/gva-voip-agent \
--standalone \
--model=gemma2 \
--mcp-endpoint=http://127.0.0.1:7077/mcp
The only inline tool in the agent is end_call (it needs the live SIP
session). Everything else is discovered via tools/list at startup and
routed through mcp::McpHttpClient::callTool.
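Per the MCP specification, each tools/list entry carries a name, a description, and a JSON Schema inputSchema, while Ollama's /api/chat expects each tool as a {"type":"function","function":{...}} object whose parameters field holds the schema. The sketch below shows that translation, assuming the schema is passed through as an already-serialised JSON string. McpToolInfo, jsonQuote, and toOllamaToolJson are hypothetical names; this is not the signature of OllamaClient::registerTool.

```cpp
#include <cstdio>
#include <string>

// Hypothetical view of one entry from an MCP tools/list result.
struct McpToolInfo {
    std::string name;
    std::string description;
    std::string inputSchemaJson;  // JSON Schema object, already serialised
};

// Escapes a string as a JSON string literal (RFC 8259).
std::string jsonQuote(const std::string& s) {
    std::string out = "\"";
    for (unsigned char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n"; break;
            case '\r': out += "\\r"; break;
            case '\t': out += "\\t"; break;
            default:
                if (c < 0x20) {  // remaining control characters become \u00XX
                    char buf[7];
                    std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out + "\"";
}

// Wraps the MCP tool in the function-tool shape used by Ollama's /api/chat.
std::string toOllamaToolJson(const McpToolInfo& t) {
    return "{\"type\":\"function\",\"function\":{\"name\":" + jsonQuote(t.name) +
           ",\"description\":" + jsonQuote(t.description) +
           ",\"parameters\":" + t.inputSchemaJson + "}}";
}
```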
Conversation Flow
Built-in Tools
The agent registers these tools with the LLM:
Weather
User: "What's the weather in London?"
Agent: "The current weather in London is partly cloudy, 18 degrees Celsius,
with 65% humidity and light winds from the southwest."
Time
Military Coordinates
User: "Convert 51.5074 north, 0.1278 west to MGRS"
Agent: "The MGRS coordinate is 30U XC 99287 15350."
User: "What's the bearing from my position to coordinates 52.0, -1.0?"
Agent: "The bearing is 342 degrees, or 6080 mils, at a distance of 85 kilometers."
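The algorithms behind calculate_bearing and calculate_distance are not described here. A common approach, sketched below under a spherical-Earth assumption (mean radius 6371 km), is the haversine formula for great-circle distance plus the standard initial-bearing formula, with degrees converted to NATO mils (6400 per circle). All function names are hypothetical; an ellipsoidal method such as Vincenty's is more accurate.

```cpp
#include <cmath>

constexpr double kPi = 3.14159265358979323846;
constexpr double kEarthRadiusKm = 6371.0;  // mean radius, spherical model

inline double toRad(double deg) { return deg * kPi / 180.0; }

// Great-circle distance in kilometers (haversine formula).
double distanceKm(double lat1, double lon1, double lat2, double lon2) {
    const double dLat = toRad(lat2 - lat1);
    const double dLon = toRad(lon2 - lon1);
    const double a = std::sin(dLat / 2) * std::sin(dLat / 2) +
                     std::cos(toRad(lat1)) * std::cos(toRad(lat2)) *
                     std::sin(dLon / 2) * std::sin(dLon / 2);
    return 2.0 * kEarthRadiusKm * std::atan2(std::sqrt(a), std::sqrt(1.0 - a));
}

// Initial (forward) bearing from point 1 to point 2, in degrees [0, 360).
double initialBearingDeg(double lat1, double lon1, double lat2, double lon2) {
    const double p1 = toRad(lat1), p2 = toRad(lat2);
    const double dLon = toRad(lon2 - lon1);
    const double y = std::sin(dLon) * std::cos(p2);
    const double x = std::cos(p1) * std::sin(p2) -
                     std::sin(p1) * std::cos(p2) * std::cos(dLon);
    const double deg = std::atan2(y, x) * 180.0 / kPi;  // range (-180, 180]
    return std::fmod(deg + 360.0, 360.0);
}

// NATO mils: 6400 mils per full circle.
double degreesToMils(double deg) { return deg * 6400.0 / 360.0; }

// One international nautical mile is exactly 1.852 km.
double kmToNauticalMiles(double km) { return km / 1.852; }
```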
End Call
MCP Tool Integration
Architecture
All non-local tools live in gva-mcp-server and are consumed over HTTP. The
agent owns a single mcp::McpHttpClient pointed at --mcp-endpoint and:
- Connects and runs initialize + tools/list at start-up.
- Registers each returned tool with OllamaClient::registerTool.
- Forwards every tool dispatch (other than end_call) to m_mcpHttp->callTool(name, args, …) and returns the joined content[].text blocks to the LLM.
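An MCP tools/call result carries a content[] array whose entries each have a type (text, image, and so on), with the payload in a text field for text blocks. A sketch of the joining step, assuming the blocks are already parsed and that joined blocks are separated by newlines (the separator the agent actually uses is not stated); ContentBlock and joinTextBlocks are hypothetical names:

```cpp
#include <string>
#include <vector>

// Hypothetical in-memory form of one entry in a tools/call result's content[].
struct ContentBlock {
    std::string type;  // "text", "image", "resource", ...
    std::string text;  // populated only when type == "text"
};

// Joins the text blocks into one string for the LLM, skipping non-text
// blocks. The newline default separator is an assumption.
std::string joinTextBlocks(const std::vector<ContentBlock>& content,
                           const std::string& sep = "\n") {
    std::string out;
    bool first = true;
    for (const ContentBlock& block : content) {
        if (block.type != "text") continue;
        if (!first) out += sep;
        out += block.text;
        first = false;
    }
    return out;
}
```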
No subprocess management, no per-server config in the agent, no duplicated tool
definitions. Add a tool in src/qt6/mcp-lib/ or
src/qt6/gva-mcp-server/GvaTools.cpp;
restart the server; the agent picks it up on its next connection.
See docs/dev/integrations/MCP_AND_VOIP_AGENT_TOOLS.md in the repo
for the full breakdown.
Degraded Mode
If gva-mcp-server is unreachable, end_call still works, and any other tool
call returns "Vehicle services are offline. The requested tool is not
available." to the LLM. No restart of the agent is required when the server
comes back up.
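The routing and degraded-mode behaviour can be sketched as a small dispatcher. ToolDispatcher, the std::function stand-ins for the MCP client and the SIP hang-up, and the "Call ended." reply are hypothetical; only the offline message text comes from this page. Retrying the server on every call is an assumed strategy that yields the no-restart recovery described.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <utility>

// Fallback text from the Degraded Mode section.
const std::string kOfflineMessage =
    "Vehicle services are offline. The requested tool is not available.";

// Hypothetical dispatcher. `McpCall` stands in for the MCP HTTP client and
// returns std::nullopt when gva-mcp-server cannot be reached.
class ToolDispatcher {
public:
    using McpCall = std::function<std::optional<std::string>(
        const std::string& name, const std::string& argsJson)>;

    ToolDispatcher(McpCall mcpCall, std::function<void()> hangUp)
        : mcpCall_(std::move(mcpCall)), hangUp_(std::move(hangUp)) {}

    std::string dispatch(const std::string& name, const std::string& argsJson) {
        if (name == "end_call") {  // the only tool handled inside the agent
            hangUp_();
            return "Call ended.";
        }
        // Every other tool goes to the server on each call, so a server that
        // comes back is used again without restarting the agent.
        if (auto result = mcpCall_(name, argsJson)) return *result;
        return kOfflineMessage;  // degrade instead of failing the turn
    }

private:
    McpCall mcpCall_;
    std::function<void()> hangUp_;
};
```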
Voice Models
Recommended Piper Voices
| Voice | Language | Quality | Size |
|---|---|---|---|
| en_US-lessac-medium | US English | Good | 65 MB |
| en_GB-alba-medium | UK English | Good | 63 MB |
| en_AU-danny-low | AU English | Fair | 15 MB |
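Downloading Voices
# Each voice needs both the .onnx model and its matching .onnx.json config
cd ~/.local/share/piper
# US Female (Amy)
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/amy/medium/en_US-amy-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/amy/medium/en_US-amy-medium.onnx.json
# UK Male (Alan)
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_GB/alan/medium/en_GB-alan-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_GB/alan/medium/en_GB-alan-medium.onnx.json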
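Piper looks for a voice's configuration at the model path with .json appended (en_US-amy-medium.onnx.json for en_US-amy-medium.onnx) unless a config is passed explicitly, so a model downloaded without its .onnx.json will not load. A hypothetical pre-flight check, assuming voices live in ~/.local/share/piper as in the prerequisites; piperConfigFor and missingVoiceFiles are illustrative names:

```cpp
#include <filesystem>
#include <initializer_list>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Piper's default config location: the model path with ".json" appended.
fs::path piperConfigFor(const fs::path& model) {
    return fs::path(model.string() + ".json");
}

// Hypothetical check: returns the files a voice still needs in `dir`.
std::vector<fs::path> missingVoiceFiles(const fs::path& dir,
                                        const std::string& voice) {
    const fs::path model = dir / (voice + ".onnx");
    std::vector<fs::path> missing;
    for (const fs::path& p : {model, piperConfigFor(model)})
        if (!fs::exists(p)) missing.push_back(p);
    return missing;
}
```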
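Troubleshooting
Ollama Not Responding
# Check if Ollama is running
curl http://localhost:11434/api/tags
# Restart Ollama (systemd service created by the Linux install script)
sudo systemctl restart ollama
# or, without systemd, run the server in the foreground
ollama serve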
No Speech Recognition
- Verify the Vosk model exists:
  ls ~/.local/share/vosk/
- Check microphone permissions.
- Test recording and playback:
  arecord -d 3 test.wav && aplay test.wav
TTS Not Working
- Check the Piper installation:
  piper --help
- Verify the voice model:
  ls ~/.local/share/piper/
- Test synthesis:
  echo "Hello" | piper --model ~/.local/share/piper/en_US-lessac-medium.onnx --output_file test.wav
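MCP Server Offline
If gva-mcp-server is unavailable, every tool call except end_call returns this message to the LLM:
Vehicle services are offline. The requested tool is not available.
Enable auto-reconnect to handle transient failures.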