A voice chat assistant built on Twilio telephony infrastructure and a Gemini-based conversational interface that supports Swiss German, Swiss French and Swiss Italian.
Caller → Twilio PSTN → Twilio Media Stream (WebSocket)
↕ mulaw 8kHz
FastAPI server (this app)
↕ PCM 16kHz / 24kHz
Gemini Live API (STT + LLM + TTS)
- Twilio handles the phone number and streams raw audio over WebSocket
- Gemini 2.0 Flash Live is a single model doing STT + conversation + TTS natively
- FastAPI glues them together and handles format conversion (mulaw ↔ PCM)
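The format conversion mentioned above can be illustrated with a minimal, dependency-free sketch. It is not the contents of `audio.py`; the function names are hypothetical, the codec follows the standard G.711 mu-law tables, and the 2x upsampler is a naive linear interpolation with no filtering, assumed here only to show the 8 kHz to 16 kHz direction.

```python
import struct

BIAS = 0x84   # 132, the standard G.711 mu-law bias
CLIP = 32635  # largest magnitude that still fits after adding the bias


def mulaw_decode_byte(b: int) -> int:
    """Decode one mu-law byte to a signed 16-bit linear sample."""
    b = ~b & 0xFF  # mu-law bytes are transmitted bit-inverted
    sign = b & 0x80
    exponent = (b >> 4) & 0x07
    mantissa = b & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


def mulaw_encode_sample(sample: int) -> int:
    """Encode one signed 16-bit linear sample as a mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent, mask = 7, 0x4000
    # Find the segment (exponent) from the position of the highest set bit.
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def mulaw_to_pcm16(data: bytes) -> bytes:
    """Convert mu-law bytes to little-endian 16-bit PCM at the same rate."""
    samples = [mulaw_decode_byte(b) for b in data]
    return struct.pack(f"<{len(samples)}h", *samples)


def upsample_2x(pcm: bytes) -> bytes:
    """Naive 8 kHz -> 16 kHz: insert the midpoint between neighbouring samples."""
    n = len(pcm) // 2
    samples = struct.unpack(f"<{n}h", pcm[: n * 2])
    out = []
    for i, s in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < n else s
        out.append(s)
        out.append((s + nxt) // 2)
    return struct.pack(f"<{len(out)}h", *out)
```

A 20 ms Twilio frame (160 mu-law bytes) would become 640 bytes of 16 kHz PCM after `upsample_2x(mulaw_to_pcm16(frame))`. A production bridge would likely use a proper resampler with low-pass filtering, especially for the 24 kHz to 8 kHz return path.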
voice-assistant-demo/
├── src/voice_assistant/
│ ├── app.py # FastAPI app – HTTP + WebSocket endpoints
│ ├── call_handler.py # Per-call audio bridge (Twilio ↔ Gemini)
│ ├── gemini_session.py # Gemini Live API session wrapper
│ ├── audio.py # mulaw/PCM format conversion utilities
│ ├── config.py # Settings loaded from .env
│ └── repl.py # Interactive text REPL for local testing
├── scripts/
│ └── provision_twilio.py # Buy / configure Twilio phone number
├── pyproject.toml
├── .env.example
└── start.sh # Dev launcher (ngrok + server)
| Tool | Purpose |
|---|---|
| uv | Python + dependency management |
| just | Task runner |
| ngrok | Expose localhost to Twilio |
| Twilio account | Phone number + webhooks |
| Google AI Studio key or Google Cloud project | Gemini Live API |
just init-env # copies .env.example → .env (skips if .env exists)
# Edit .env and fill in your credentials
Minimum .env for Google AI Studio (easiest):
TWILIO_ACCOUNT_SID=ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TWILIO_AUTH_TOKEN=your_auth_token
TWILIO_PHONE_NUMBER=+41xxxxxxxxx
GOOGLE_API_KEY=AIzaSyxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
DEFAULT_LANGUAGE=de-CH
For Vertex AI instead of AI Studio, leave GOOGLE_API_KEY empty and set:
GOOGLE_CLOUD_PROJECT=your-gcp-project-id
# Then authenticate: gcloud auth application-default login
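The choice between AI Studio and Vertex AI described above could be modelled as in the following sketch. This is an illustration under assumptions, not the actual `config.py`: the `Settings` class, the `load_settings` function and the `use_vertex_ai` property are hypothetical names, and the `de-CH` fallback simply mirrors the example `.env`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container mirroring the variables in .env."""

    twilio_account_sid: str
    twilio_auth_token: str
    twilio_phone_number: str
    google_api_key: Optional[str]
    google_cloud_project: Optional[str]
    default_language: str

    @property
    def use_vertex_ai(self) -> bool:
        # An empty GOOGLE_API_KEY means "use Vertex AI with GOOGLE_CLOUD_PROJECT".
        return not self.google_api_key


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from a mapping of environment variables."""
    api_key = env.get("GOOGLE_API_KEY") or None
    project = env.get("GOOGLE_CLOUD_PROJECT") or None
    if not api_key and not project:
        raise ValueError("Set GOOGLE_API_KEY or GOOGLE_CLOUD_PROJECT")
    return Settings(
        # Twilio values default to "" so that text-only use still loads.
        twilio_account_sid=env.get("TWILIO_ACCOUNT_SID", ""),
        twilio_auth_token=env.get("TWILIO_AUTH_TOKEN", ""),
        twilio_phone_number=env.get("TWILIO_PHONE_NUMBER", ""),
        google_api_key=api_key,
        google_cloud_project=project,
        default_language=env.get("DEFAULT_LANGUAGE", "de-CH"),
    )
```

Passing the environment as a mapping keeps the function easy to exercise in tests without touching the real process environment.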
just sync
uv resolves and installs everything; no manual venv creation is needed.
# List numbers already on your account
just twilio-list
# Buy a new Swiss number (run after `just dev` so PUBLIC_URL is in .env)
just twilio-buy # defaults to country=CH
just twilio-buy country=US # fallback if no CH numbers are available
# Or update the webhook on an existing number
just twilio-set-webhook +41XXXXXXXXX
Note: Twilio may not always have local Swiss (+41) numbers available. If none are found, try `country=US` for testing, or buy a number manually in the Twilio Console and use `twilio-set-webhook`.
just repl # text chat with the agent (default language)
just repl --lang fr-CH # test in French
just repl --verbose # enable debug logging
The REPL connects to Gemini with the same system prompt and tools as the
voice flow, which is useful for iterating on agent behaviour without needing a
phone call. It only requires GOOGLE_API_KEY in .env.
just dev # starts ngrok + FastAPI server (full dev flow)
start.sh (called by `just dev`) will:
- Start ngrok and auto-detect the public HTTPS URL
- Write `PUBLIC_URL` into `.env`
- Print the webhook URL to configure in Twilio
- Start the FastAPI server via `uv run`
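The first two steps of that flow can be sketched in Python for illustration. `start.sh` itself is a shell script and may work differently; the helper names here are hypothetical, and the tunnel listing is assumed to have the shape returned by the ngrok agent's local API (a `tunnels` list whose entries carry `public_url`).

```python
from typing import Optional


def pick_https_url(tunnels_response: dict) -> Optional[str]:
    """Return the first HTTPS public URL from an ngrok-style tunnel listing."""
    for tunnel in tunnels_response.get("tunnels", []):
        url = tunnel.get("public_url", "")
        if url.startswith("https://"):
            return url
    return None


def upsert_env_var(env_text: str, key: str, value: str) -> str:
    """Replace KEY=... in .env text if present, otherwise append it."""
    lines = env_text.splitlines()
    new_line = f"{key}={value}"
    for i, line in enumerate(lines):
        if line.startswith(f"{key}="):
            lines[i] = new_line
            break
    else:
        # No existing entry was found, so add a new one at the end.
        lines.append(new_line)
    return "\n".join(lines) + "\n"
```

Working on the file contents as a string keeps the update idempotent: running it twice with the same URL leaves `.env` unchanged.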
To start without ngrok (when PUBLIC_URL is already set in .env):
just serve
Call your Twilio phone number. Gemini will greet you in Swiss German (or whichever
language you set as DEFAULT_LANGUAGE) and you can have a conversation.
| Code | Language | Gemini voice |
|---|---|---|
| de-CH | Swiss German | Leda |
| fr-CH | Swiss French | Aoede |
| it-CH | Swiss Italian | Zephyr |
Change DEFAULT_LANGUAGE in .env to switch the default.
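The language table can be expressed as a small lookup, sketched below. The voice names come from the table; the `VOICES` constant, the `voice_for` helper and its fallback behaviour for unknown codes are assumptions made for illustration.

```python
from typing import Tuple

# Language code -> Gemini voice name, as listed in the table above.
VOICES = {
    "de-CH": "Leda",
    "fr-CH": "Aoede",
    "it-CH": "Zephyr",
}


def voice_for(language: str, default_language: str = "de-CH") -> Tuple[str, str]:
    """Return (language, voice), falling back to the default for unknown codes."""
    if language in VOICES:
        return language, VOICES[language]
    return default_language, VOICES[default_language]
```

For example, `voice_for("fr-CH")` yields `("fr-CH", "Aoede")`, while an unsupported code such as `"en-US"` falls back to the default language and its voice.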
Future work: detect the caller's preferred language via a DTMF IVR menu and
pass it as a custom parameter in the TwiML.
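One way this future work might look is sketched below: a pressed digit is mapped to a language code and embedded as a `<Parameter>` inside the `<Stream>` element of the TwiML. The digit mapping follows the planned IVR menu (1 German, 2 French, 3 Italian); the function name and the parameter name `language` are hypothetical, and the TwiML is built with the standard library rather than the Twilio helper library.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# DTMF digit -> language code (assumed mapping for the planned IVR menu).
DIGIT_TO_LANGUAGE = {"1": "de-CH", "2": "fr-CH", "3": "it-CH"}


def build_stream_twiml(
    public_url: str,
    digit: Optional[str] = None,
    default_language: str = "de-CH",
) -> str:
    """Return TwiML that connects the call to the media-stream WebSocket."""
    language = DIGIT_TO_LANGUAGE.get(digit or "", default_language)
    # Twilio Media Streams expect a wss:// URL; derive it from PUBLIC_URL.
    ws_url = public_url.replace("https://", "wss://", 1).rstrip("/") + "/ws/media-stream"

    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    stream = ET.SubElement(connect, "Stream", url=ws_url)
    # Custom parameters arrive in the stream's "start" message on the server.
    ET.SubElement(stream, "Parameter", name="language", value=language)
    return ET.tostring(response, encoding="unicode")
```

The WebSocket handler could then read the chosen language from the `customParameters` field of the stream's `start` event and open the Gemini session with the matching voice.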
| Method | Path | Description |
|---|---|---|
| GET | /health | Health check |
| POST | /voice | Twilio webhook – returns TwiML |
| WS | /ws/media-stream | Twilio Media Stream WebSocket |
| GET | /docs | Swagger UI |
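On the `/ws/media-stream` endpoint, Twilio exchanges JSON text messages whose `media` events carry base64-encoded mu-law audio. The sketch below shows how such messages might be unpacked and built; the helper names are hypothetical and this is not the code in `call_handler.py`.

```python
import base64
import json
from typing import Optional


def extract_inbound_audio(raw_message: str) -> Optional[bytes]:
    """Return the mu-law bytes of a "media" event, or None for other events."""
    message = json.loads(raw_message)
    if message.get("event") != "media":
        return None  # e.g. "connected", "start" or "stop"
    return base64.b64decode(message["media"]["payload"])


def build_outbound_media(stream_sid: str, mulaw_audio: bytes) -> str:
    """Build the JSON message that plays mu-law audio back to the caller."""
    return json.dumps(
        {
            "event": "media",
            "streamSid": stream_sid,
            "media": {"payload": base64.b64encode(mulaw_audio).decode("ascii")},
        }
    )
```

In a bridge like the one described here, inbound bytes would be converted to PCM before being sent to Gemini, and Gemini's audio would be converted back to 8 kHz mu-law before being wrapped with `build_outbound_media`.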
- IVR language selection menu (press 1 for German, 2 for French, 3 for Italian)
- RAG: upload product PDFs → vector store → Gemini function calling
- Calendar integration: check free slots and book appointments via Google Calendar API
- Call recording and transcript logging
- Interruption / barge-in handling
- Docker / Cloud Run deployment
I want to create a customer support hotline solution.
- There should be a phone number that I can call (e.g. Twilio).
- A friendly voice answers in Swiss German, Swiss French or Swiss Italian (e.g. Google Gemini).
- I can ask questions and the voice answers politely.
- Later I want to add agent functionality: upload product documents to craft answers from, and attach calendars to help schedule appointments with a salesperson.
Response: see SOLUTION