VoIP Agent

The GVA VoIP Agent (gva-voip-agent) is an AI-powered voice assistant that can answer calls and respond using natural language. It integrates with Ollama for LLM inference, Piper for text-to-speech, and Vosk for speech recognition—all running locally without cloud dependencies.

Overview

graph LR
    subgraph "VoIP Agent"
        SIP[SIP Client]
        RTP[RTP Audio]
        STT[Vosk STT]
        LLM[Ollama LLM]
        TTS[Piper TTS]
        MCP[MCP Tools]
    end
    subgraph "External"
        SRV[VoIP Server]
        OLL[Ollama Server]
        MCPS[MCP Servers]
    end
    SRV <-->|SIP| SIP
    SRV <-->|RTP| RTP
    RTP --> STT
    STT --> LLM
    LLM --> TTS
    TTS --> RTP
    LLM <--> MCP
    MCP <-->|stdio| MCPS
    LLM <-->|HTTP| OLL
    style LLM fill:#9C27B0
    style STT fill:#4CAF50
    style TTS fill:#2196F3

Features

Offline AI

All AI processing runs locally:

  • Ollama - Local LLM server (Gemma, Llama, Mistral, etc.)
  • Piper - Neural TTS with natural voices
  • Vosk - Offline speech recognition

MCP Tool Support

The agent supports Model Context Protocol (MCP) tools for extended capabilities:

  • Weather queries
  • File system access
  • Web fetching
  • Custom tool servers

Built-in Military Tools

Native C++ tools optimized for military use:

Tool                 Description
mgrs_to_latlon       Convert MGRS to latitude/longitude
latlon_to_mgrs       Convert lat/lon to MGRS
calculate_bearing    Calculate bearing in degrees/mils
calculate_distance   Distance in meters/km/nautical miles
format_dtg           Format Date-Time Group (DTG)

SRTP Encryption

All VoIP communications are secured with SRTP encryption (RFC 3711 / RFC 4568) using AES-256-CM-HMAC-SHA1-80—suitable for OFFICIAL-SENSITIVE classification. Encryption is always enabled and cannot be disabled.

Standalone Mode

Run without VoIP—uses local microphone and speakers:

./build/bin/gva-voip-agent --standalone --model=gemma2

Prerequisites

Ollama

Install and start Ollama with a model:

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model
ollama pull gemma2

# Start server (runs on localhost:11434)
ollama serve

Piper TTS

Install Piper and download voice models:

# Install via pipx
pipx install piper-tts

# Download voice model
mkdir -p ~/.local/share/piper
cd ~/.local/share/piper
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json

Vosk STT

Download a Vosk model:

# Download model
mkdir -p ~/.local/share/vosk
cd ~/.local/share/vosk
wget https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zip
mv vosk-model-small-en-us-0.15 en-us

Command Line Options

gva-voip-agent [options]

Mode:
  --standalone            Use local mic/speaker instead of VoIP

SIP Options (VoIP mode):
  -u, --user=<username>   SIP username (default: agent.gemma4)
  -s, --server=<host>     SIP server address (default: 127.0.0.1)
  -p, --port=<port>       SIP server port (default: 5060)
  --local-port=<port>     Local SIP port (default: 5062)
  --rtp-port=<port>       Local RTP port (default: 10002)
  --display-name=<name>   SIP display name

LLM Options:
  -m, --model=<name>      Ollama model name (default: gemma2)
  --ollama-url=<url>      Ollama server URL (default: http://localhost:11434)
  --system-prompt=<text>  Custom system prompt

TTS/STT Options:
  --voice=<path>          Piper voice model path
  --stt-model=<path>      Vosk STT model path
  --speaking-rate=<rate>  TTS speaking rate (0.5-2.0, default: 1.0)

Behavior:
  --no-auto-answer        Don't auto-answer calls
  --greeting=<text>       Custom greeting message
  --silence-timeout=<ms>  Silence timeout (default: 3000ms)
  --headless              Run without GUI

MCP Options:
  --mcp-weather           Enable MCP weather server
  --mcp-fetch             Enable MCP fetch server

Example Usage

VoIP Mode (Answer Calls)

# Basic setup - answers calls from VoIP server
./build/bin/gva-voip-agent \
    --user=agent.ai \
    --server=192.168.1.10 \
    --model=gemma2

# With custom greeting
./build/bin/gva-voip-agent \
    --user=assistant \
    --server=192.168.1.10 \
    --model=llama3.2 \
    --greeting="Hello, this is the vehicle AI assistant. How can I help?"

Standalone Mode (Local Audio)

# Direct mic/speaker interaction
./build/bin/gva-voip-agent \
    --standalone \
    --model=gemma2

# With custom system prompt
./build/bin/gva-voip-agent \
    --standalone \
    --model=gemma2 \
    --system-prompt="You are a military vehicle assistant. Be concise and professional."

With MCP Tools

# Enable weather and web fetch tools
./build/bin/gva-voip-agent \
    --standalone \
    --model=gemma2 \
    --mcp-weather \
    --mcp-fetch

Conversation Flow

stateDiagram-v2
    [*] --> Idle: Start
    Idle --> IncomingCall: Receive INVITE
    IncomingCall --> InCall: Answer
    InCall --> Listening: Say Greeting
    Listening --> Processing: User Speaks
    Processing --> Speaking: LLM Response
    Speaking --> Listening: TTS Complete
    Listening --> HangingUp: "Goodbye" / Timeout
    HangingUp --> Idle: Call Ended
    Processing --> ToolCall: Tool Required
    ToolCall --> Processing: Tool Result

Built-in Tools

The agent registers these tools with the LLM:

Weather

User: "What's the weather in London?"
Agent: "The current weather in London is partly cloudy, 18 degrees Celsius,
        with 65% humidity and light winds from the southwest."

Time

User: "What time is it?"
Agent: "The current time is Sunday, April 13, 2026 at 4:30 PM."

Military Coordinates

User: "Convert 51.5074 north, 0.1278 west to MGRS"
Agent: "The MGRS coordinate is 30U XC 99287 15350."

User: "What's the bearing from my position to coordinates 52.0, -1.0?"
Agent: "The bearing is 342 degrees, or 6080 mils, at a distance of 85 kilometers."

End Call

User: "Goodbye"
Agent: "Goodbye. Ending the call now."
[Call terminates]

MCP Tool Integration

Adding Custom MCP Servers

Edit the agent configuration to add MCP tool servers:

// In your code
McpServerConfig myServer;
myServer.name = "my-tools";
myServer.command = "npx";
myServer.args = {"-y", "@myorg/mcp-tools"};
agent->addMcpServer(myServer);

Available MCP Servers

Server       Package                                    Description
Weather      @modelcontextprotocol/server-weather       Weather forecasts
Fetch        @anthropic-ai/fetch-mcp                    HTTP requests
Filesystem   @modelcontextprotocol/server-filesystem    File operations
Memory       @modelcontextprotocol/server-memory        Persistent memory

Auto-Reconnect

MCP servers automatically restart on crash with exponential backoff:

McpServerConfig config;
config.name = "weather";
config.maxRestarts = 3;        // Max restart attempts
config.restartDelayMs = 1000;  // Initial delay (doubles each retry)

Tool Call Timeouts

Tool calls have configurable timeouts to prevent hanging:

mcp->setDefaultTimeout(30000);  // 30 second default
mcp->callTool("slow_tool", args, callback, 60000);  // 60 second override

Voice Models

Voice                 Language     Quality   Size
en_US-lessac-medium   US English   Good      65 MB
en_GB-alba-medium     UK English   Good      63 MB
en_AU-danny-low       AU English   Fair      15 MB

Downloading Voices

# US Female (Amy)
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/amy/medium/en_US-amy-medium.onnx

# UK Male (Alan)
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_GB/alan/medium/en_GB-alan-medium.onnx

Troubleshooting

Ollama Not Responding

# Check if Ollama is running
curl http://localhost:11434/api/tags

# Restart Ollama
systemctl restart ollama
# or
ollama serve

No Speech Recognition

  1. Verify Vosk model exists: ls ~/.local/share/vosk/
  2. Check microphone permissions
  3. Test with: arecord -d 3 test.wav && aplay test.wav

TTS Not Working

  1. Check Piper installation: piper --help
  2. Verify voice model: ls ~/.local/share/piper/
  3. Test: echo "Hello" | piper --model ~/.local/share/piper/en_US-lessac-medium.onnx --output_file test.wav

MCP Server Offline

If MCP servers are unavailable, the agent responds with:

"MCP server is offline. The requested tool is not available."

Enable auto-reconnect to handle transient failures.

See Also