VoIP Agent

The GVA VoIP Agent (gva-voip-agent) is an AI-powered voice assistant that can answer calls and respond using natural language. It integrates with Ollama for LLM inference, Piper for text-to-speech, and Vosk for speech recognition—all running locally without cloud dependencies.

Overview

graph LR
    subgraph "VoIP Agent"
        SIP[SIP Client]
        RTP[RTP Audio]
        STT[Vosk STT]
        LLM[Ollama LLM]
        TTS[Piper TTS]
        MCP[MCP Tools]
    end
    subgraph "External"
        SRV[VoIP Server]
        OLL[Ollama Server]
        MCPS[gva-mcp-server]
    end
    SRV <-->|SIP| SIP
    SRV <-->|RTP| RTP
    RTP --> STT
    STT --> LLM
    LLM --> TTS
    TTS --> RTP
    LLM <--> MCP
    MCP <-->|HTTP| MCPS
    LLM <-->|HTTP| OLL
    style LLM fill:#9C27B0
    style STT fill:#4CAF50
    style TTS fill:#2196F3

Features

Offline AI

All AI processing runs locally:

  • Ollama - Local LLM server (Gemma, Llama, Mistral, etc.)
  • Piper - Neural TTS with natural voices
  • Vosk - Offline speech recognition

MCP Tool Support

The agent supports Model Context Protocol (MCP) tools for extended capabilities:

  • Weather queries
  • File system access
  • Web fetching
  • Custom tool servers

Built-in Military Tools

Native C++ tools optimised for military use, served to the agent by gva-mcp-server:

Tool                Description
mgrs_to_latlon      Convert MGRS to latitude/longitude
latlon_to_mgrs      Convert latitude/longitude to MGRS
calculate_bearing   Calculate bearing in degrees or mils
calculate_distance  Calculate distance in meters, kilometers, or nautical miles
format_dtg          Format a Date-Time Group (DTG)

SRTP Encryption

All VoIP audio is secured with SRTP (RFC 3711), keyed through SDP Security Descriptions (RFC 4568) and using the AES-256-CM-HMAC-SHA1-80 crypto suite (defined for SRTP in RFC 6188). This is suitable for OFFICIAL-SENSITIVE classification. Encryption is always enabled and cannot be disabled.

Standalone Mode

Run without VoIP—uses local microphone and speakers:

./build/bin/gva-voip-agent --standalone --model=gemma2

Prerequisites

Ollama

Install and start Ollama with a model:

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model
ollama pull gemma2

# Start server (runs on localhost:11434)
ollama serve

Piper TTS

Install Piper and download voice models:

# Install via pipx
pipx install piper-tts

# Download voice model
mkdir -p ~/.local/share/piper
cd ~/.local/share/piper
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json

Vosk STT

Download a Vosk model:

# Download model
mkdir -p ~/.local/share/vosk
cd ~/.local/share/vosk
wget https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zip
mv vosk-model-small-en-us-0.15 en-us

Command Line Options

gva-voip-agent [options]

Mode:
  --standalone            Use local mic/speaker instead of VoIP

SIP Options (VoIP mode):
  -u, --user=<username>   SIP username (default: agent.gemma4)
  -s, --server=<host>     SIP server address (default: 127.0.0.1)
  -p, --port=<port>       SIP server port (default: 5060)
  --local-port=<port>     Local SIP port (default: 5062)
  --rtp-port=<port>       Local RTP port (default: 10002)
  --display-name=<name>   SIP display name

LLM Options:
  -m, --model=<name>      Ollama model name (default: gemma2)
  --ollama-url=<url>      Ollama server URL (default: http://localhost:11434)
  --system-prompt=<text>  Custom system prompt

TTS/STT Options:
  --voice=<path>          Piper voice model path
  --stt-model=<path>      Vosk STT model path
  --speaking-rate=<rate>  TTS speaking rate (0.5-2.0, default: 1.0)

Behavior:
  --no-auto-answer        Don't auto-answer calls
  --greeting=<text>       Custom greeting message
  --silence-timeout=<ms>  Silence timeout (default: 3000ms)
  --headless              Run without GUI

MCP Options:
  --mcp-endpoint=<url>    MCP HTTP endpoint (default: http://127.0.0.1:7077/mcp)

Example Usage

VoIP Mode (Answer Calls)

# Basic setup - answers calls from VoIP server
./build/bin/gva-voip-agent \
    --user=agent.ai \
    --server=192.168.1.10 \
    --model=gemma2

# With custom greeting
./build/bin/gva-voip-agent \
    --user=assistant \
    --server=192.168.1.10 \
    --model=llama3.2 \
    --greeting="Hello, this is the vehicle AI assistant. How can I help?"

Standalone Mode (Local Audio)

# Direct mic/speaker interaction
./build/bin/gva-voip-agent \
    --standalone \
    --model=gemma2

# With custom system prompt
./build/bin/gva-voip-agent \
    --standalone \
    --model=gemma2 \
    --system-prompt="You are a military vehicle assistant. Be concise and professional."

With MCP Tools

gva-voip-agent consumes every non-local tool (weather, time, MGRS, bearing/distance, DTG, bash, http_get, fs, sqlite, postgres, …) from gva-mcp-server over HTTP. Start the server first, then point the agent at it.

# Terminal 1 — the tool host (or use the systemd unit gva-mcp-server.service)
./build/bin/gva-mcp-server \
    --config etc/gva-mcp-server-ollama.json \
    --no-stdio --http --port 7077

# Terminal 2 — the agent. Default endpoint already matches; flag shown for
# clarity / when overriding host/port.
./build/bin/gva-voip-agent \
    --standalone \
    --model=gemma2 \
    --mcp-endpoint=http://127.0.0.1:7077/mcp

The only inline tool in the agent is end_call (it needs the live SIP session). Everything else is discovered via tools/list at startup and routed through mcp::McpHttpClient::callTool.

Conversation Flow

stateDiagram-v2
    [*] --> Idle: Start
    Idle --> IncomingCall: Receive INVITE
    IncomingCall --> InCall: Answer
    InCall --> Listening: Say Greeting
    Listening --> Processing: User Speaks
    Processing --> Speaking: LLM Response
    Speaking --> Listening: TTS Complete
    Listening --> HangingUp: "Goodbye" / Timeout
    HangingUp --> Idle: Call Ended
    Processing --> ToolCall: Tool Required
    ToolCall --> Processing: Tool Result

Built-in Tools

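The agent registers these tools with the LLM. Only end_call is implemented inside the agent; the others are discovered from gva-mcp-server at start-up: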

Weather

User: "What's the weather in London?"
Agent: "The current weather in London is partly cloudy, 18 degrees Celsius,
        with 65% humidity and light winds from the southwest."

Time

User: "What time is it?"
Agent: "The current time is Monday, April 13, 2026 at 4:30 PM."

Military Coordinates

User: "Convert 51.5074 north, 0.1278 west to MGRS"
Agent: "The MGRS coordinate is 30U XC 99316 10163."

User: "What's the bearing from my position to coordinates 52.0, -1.0?"
Agent: "The bearing is 342 degrees, or 6080 mils, at a distance of 85 kilometers."

End Call

User: "Goodbye"
Agent: "Goodbye. Ending the call now."
[Call terminates]

MCP Tool Integration

Architecture

All non-local tools live in gva-mcp-server and are consumed over HTTP. The agent owns a single mcp::McpHttpClient pointed at --mcp-endpoint and:

  1. Connects and runs initialize + tools/list at start-up.
  2. Registers each returned tool with OllamaClient::registerTool.
  3. Forwards every tool dispatch (other than end_call) to m_mcpHttp->callTool(name, args, …) and returns the joined content[].text blocks to the LLM.

No subprocess management, no per-server config in the agent, no duplicated tool definitions. Add a tool in src/qt6/mcp-lib/ or src/qt6/gva-mcp-server/GvaTools.cpp; restart the server; the agent picks it up on its next connection.

See docs/dev/integrations/MCP_AND_VOIP_AGENT_TOOLS.md in the repo for the full breakdown.

Degraded mode

If gva-mcp-server is unreachable, end_call still works, and any other tool call returns "Vehicle services are offline. The requested tool is not available." to the LLM. No restart of the agent is required when the server comes back up.

Voice Models

Voice                Language    Quality  Size
en_US-lessac-medium  US English  Good     65 MB
en_GB-alba-medium    UK English  Good     63 MB
en_US-danny-low      US English  Fair     15 MB

Downloading Voices

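# Piper needs each .onnx model plus its matching .onnx.json config
cd ~/.local/share/piper

# US Female (Amy)
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/amy/medium/en_US-amy-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/amy/medium/en_US-amy-medium.onnx.json

# UK Male (Alan)
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_GB/alan/medium/en_GB-alan-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_GB/alan/medium/en_GB-alan-medium.onnx.json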

Troubleshooting

Ollama Not Responding

# Check if Ollama is running
curl http://localhost:11434/api/tags

# Restart Ollama
systemctl restart ollama
# or
ollama serve

No Speech Recognition

  1. Verify Vosk model exists: ls ~/.local/share/vosk/
  2. Check microphone permissions
  3. Test with: arecord -d 3 test.wav && aplay test.wav

TTS Not Working

  1. Check Piper installation: piper --help
  2. Verify voice model: ls ~/.local/share/piper/
  3. Test: echo "Hello" | piper --model ~/.local/share/piper/en_US-lessac-medium.onnx --output_file test.wav

MCP Server Offline

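If gva-mcp-server is unreachable, end_call still works and every other tool call returns this message to the LLM:

"Vehicle services are offline. The requested tool is not available."

Start or restart gva-mcp-server (for example systemctl restart gva-mcp-server when it runs as a service) and confirm it is listening on the address given by --mcp-endpoint. The agent picks the tools up again without being restarted.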

See Also