Voice, Chat & Messaging
One continuity layer across the chat web app, real-time voice, Wing Mode, Listen-Only Mode, Telegram text, and Telegram voice.
One System, Different Surfaces
Viventium should not feel like a different assistant every time you switch surfaces.
The same cognitive system is meant to show up across:
- the chat web app for longer visible reasoning
- real-time voice calls for spoken conversation
- Wing Mode inside a live voice call when you want a quieter companion
- Listen-Only Mode when you want transcription without replies
- Telegram for mobile text, voice notes, voice replies, reminders, and worker callbacks
- scheduled delivery when a result should come to you later
Chat Web App
The chat web app is Viventium's main desktop surface.
Use it when you need:
- longer reasoning
- visible structure
- connected accounts
- uploaded files and tool calls
- project review
- drafts, plans, and artifacts you want to inspect closely
- background-agent follow-through after the first answer
Chat is where the full shape of the system is easiest to see: memory, tools, agents, connected workspaces, scheduling, and worker handoffs all meet in one visible conversation.
Voice Gateway
Voice matters because a lot of important thinking happens faster out loud than through typing.
Viventium's voice surface is built around the Voice Gateway so spoken turns can use the standard Viventium agent pipeline. The goal is not a separate "voice bot." The goal is the same assistant, same continuity, and same follow-through in a real-time spoken surface.
What users should notice:
- Natural interruption - you can interrupt mid-sentence, as you would in a real conversation
- Shared continuity - voice can use the same memory, background agents, and context story as chat
- Provider choice - you explicitly choose between local and hosted speech routes
- Speech-safe output - the voice layer strips or cleans raw URLs, markdown links, citations, code fences, tables, unknown tags, and punctuation fragments before they become spoken audio
- Provider-aware expression - expressive markers are kept only for providers that support them
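To make the speech-safe cleanup step concrete, here is a minimal sketch of such a filter. It assumes chat replies arrive as markdown text. The function name `to_speech_safe` and every regular expression below are hypothetical illustrations, not Viventium's actual cleanup rules.

```python
import re

# Illustrative patterns only; these are assumptions, not Viventium's real rules.
CODE_FENCE = re.compile(r"`{3}.*?`{3}", re.DOTALL)  # fenced code blocks
MD_LINK = re.compile(r"\[([^\]]+)\]\([^)\s]+\)")    # [label](url)
BARE_URL = re.compile(r"https?://\S+")              # raw URLs
CITATION = re.compile(r"\[\d+(?:,\s*\d+)*\]")       # [1] or [2, 3]
TAG = re.compile(r"</?[A-Za-z][^>]*>")              # <tag> and </tag>
MARKUP_DEBRIS = re.compile(r"[*#`|]+")              # emphasis, headings, stray pipes


def to_speech_safe(text: str) -> str:
    """Reduce chat-style markdown to plain text a TTS engine can read aloud."""
    text = CODE_FENCE.sub(" ", text)  # code is not read aloud
    # Drop markdown table rows entirely (lines that start with a pipe).
    text = "\n".join(
        line for line in text.splitlines() if not line.lstrip().startswith("|")
    )
    text = MD_LINK.sub(r"\1", text)  # keep the link label, drop its URL
    text = BARE_URL.sub("", text)
    text = CITATION.sub("", text)
    text = TAG.sub("", text)  # unknown tags vanish, their inner text stays
    text = MARKUP_DEBRIS.sub("", text)
    text = re.sub(r"\s+([,.;:!?])", r"\1", text)  # no space before punctuation
    return re.sub(r"\s+", " ", text).strip()
```

The order matters in this sketch: links are unwrapped before bare URLs are removed, so a link's label survives even though its URL does not.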
Voice can run fully local when both speech-to-text and text-to-speech use local routes such as Whisper.cpp and Chatterbox. Hosted providers such as Cartesia, xAI, ElevenLabs, OpenAI, or AssemblyAI are explicit provider choices; when selected, the relevant audio or text goes to that provider.
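The local-versus-hosted rule above can be expressed as a small check. This is a sketch under assumptions: the provider identifiers and the `VoiceRoute` type are hypothetical, and real configuration keys may differ. It encodes only what this page states: a route is fully local when both halves are local, speech-to-text providers receive audio, and text-to-speech providers receive text.

```python
from dataclasses import dataclass

# Hypothetical identifiers based on the provider names on this page;
# real configuration keys may differ.
LOCAL_STT = {"whisper.cpp"}
LOCAL_TTS = {"chatterbox"}
HOSTED = {"cartesia", "xai", "elevenlabs", "openai", "assemblyai"}


@dataclass(frozen=True)
class VoiceRoute:
    stt: str  # speech-to-text provider: receives your audio
    tts: str  # text-to-speech provider: receives the reply text

    def is_fully_local(self) -> bool:
        # Fully local only when BOTH halves of the route stay on the machine.
        return self.stt in LOCAL_STT and self.tts in LOCAL_TTS

    def outbound_data(self) -> list[str]:
        """List which hosted provider would receive which kind of data."""
        sent = []
        if self.stt in HOSTED:
            sent.append(f"{self.stt}: audio")
        if self.tts in HOSTED:
            sent.append(f"{self.tts}: text")
        return sent
```

For example, `VoiceRoute(stt="assemblyai", tts="chatterbox")` is not fully local, because the hosted speech-to-text half receives your audio.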
For builders: the current implementation uses a real-time media layer under the Voice Gateway, while the chat web app is built on Viventium's web-chat fork. Those implementation names are useful when debugging or contributing, but the user-facing promise is the Viventium voice and chat experience.
Wing Mode
Wing Mode is a live-call companion mode.
It is for situations where you want Viventium present in the voice call, but not constantly talking. The assistant defaults to silence unless it is clearly addressed, genuinely useful, or there is an urgent reason to speak.
In user terms: Wing Mode is the difference between "answer every sound" and "be a thoughtful partner in the room."
It is not a generic always-on background microphone. It is a call-session state for the voice surface, and it is mutually exclusive with Listen-Only Mode.
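One way to model Wing Mode as a call-session state is sketched below, assuming a single mode field per call. `CallMode`, `CallSession`, and `should_speak` are hypothetical names. The three boolean signals stand in for the "clearly addressed, genuinely useful, or urgent" judgement, which this page does not describe in detail.

```python
from enum import Enum


class CallMode(Enum):
    NORMAL = "normal"
    WING = "wing"
    LISTEN_ONLY = "listen_only"


class CallSession:
    """Per-call state. One field holds the mode, so Wing Mode and
    Listen-Only Mode can never be active at the same time."""

    def __init__(self) -> None:
        self.mode = CallMode.NORMAL

    def toggle(self, mode: CallMode) -> None:
        # Turning a mode on replaces whatever was active;
        # turning the active mode off returns to NORMAL.
        self.mode = CallMode.NORMAL if self.mode is mode else mode


def should_speak(mode: CallMode, *, addressed: bool, useful: bool, urgent: bool) -> bool:
    """Hypothetical speak/stay-silent gate; how the signals are detected
    is out of scope for this sketch."""
    if mode is CallMode.LISTEN_ONLY:
        return False  # capture only, never reply
    if mode is CallMode.WING:
        return addressed or useful or urgent  # otherwise default to silence
    return True  # sketch assumption: a normal call replies to every turn
```

Storing one enum value instead of two independent flags makes the mutual exclusivity structural: there is no way to represent "both on".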
Listen-Only Mode
Listen-Only Mode is for transcription and capture without assistant participation.
When it is active, Viventium transcribes and saves the ambient voice record but does not:
- reply in the moment
- call tools
- activate background agents
- write immediate memory
- inject the transcript into recall or prompt history as an ordinary chat turn
That is useful for meetings, brainstorming, or context gathering where you want a record first and assistance later.
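The capture-only behavior can be sketched as an early return in the turn handler. This assumes a simple per-utterance entry point. `handle_utterance`, `TurnOutcome`, and the step names are hypothetical placeholders, not real Viventium hooks. The point is that in Listen-Only Mode the transcript lands in a separate ambient record and none of the normal steps run.

```python
from dataclasses import dataclass, field


@dataclass
class TurnOutcome:
    """What the pipeline did with one transcribed utterance."""
    captured: bool = False
    steps: list[str] = field(default_factory=list)


# Placeholder step names for the normal pipeline; not real Viventium hooks.
NORMAL_STEPS = ["append_to_prompt_history", "reply", "call_tools",
                "activate_background_agents", "write_memory"]


def handle_utterance(text: str, listen_only: bool, ambient_record: list[str]) -> TurnOutcome:
    if listen_only:
        # Capture only: the transcript goes to a separate ambient record,
        # and no reply, tool, agent, memory, or history step runs.
        ambient_record.append(text)
        return TurnOutcome(captured=True)
    # Normal turn: a real pipeline would run tools, agents, and memory
    # conditionally; this sketch only lists them.
    return TurnOutcome(steps=list(NORMAL_STEPS))
```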
Expressive Speech
Different voice providers support different kinds of expression.
- Cartesia Sonic-3 can preserve supported emotion controls and the documented [laughter] marker, so expressive speech can come through when the selected voice route supports it.
- xAI voice uses its own speech-style controls, such as pauses or vocal delivery cues, and should not be mixed with Cartesia emotion tags.
- Local Chatterbox keeps voice local, but does not use Cartesia-style emotion tags.
- Fallback providers strip unsupported markup so users hear clean speech instead of raw control text.
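Provider-aware handling of expressive markers can be sketched as a per-provider allowlist. The assumptions are loud here. Only the Cartesia `[laughter]` marker comes from this page. The `[pause]` marker for xAI is a placeholder, not xAI's documented syntax. The provider keys and `prepare_for_provider` are hypothetical. Any bracketed marker a provider does not support is removed, and an unknown or fallback provider keeps none.

```python
import re

# Hypothetical allowlists: only "[laughter]" for Cartesia comes from this page;
# "[pause]" for xAI is an illustrative placeholder, not xAI's real syntax.
SUPPORTED_MARKERS = {
    "cartesia-sonic-3": {"laughter"},
    "xai": {"pause"},
    "chatterbox": set(),  # local route: no Cartesia-style tags
}
MARKER = re.compile(r"\[([a-z_]+)\]")  # bracketed lowercase marker names


def prepare_for_provider(text: str, provider: str) -> str:
    # Unknown or fallback providers get an empty allowlist, so every marker is stripped.
    allowed = SUPPORTED_MARKERS.get(provider, set())

    def keep_or_drop(match: re.Match) -> str:
        return match.group(0) if match.group(1) in allowed else ""

    cleaned = MARKER.sub(keep_or_drop, text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()  # tidy the gap a marker leaves
```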
The important user benefit is simple: Viventium tries to make spoken output sound intentional without letting provider-specific markup leak into the conversation.
Telegram
Telegram is the mobile continuity surface.
It is for:
- quick text check-ins
- voice notes from your phone
- voice replies back to you
- scheduled briefings and reminders
- follow-up while away from your desk
- GlassHive worker callbacks and approval moments
- background-agent follow-through on mobile
Telegram should not feel like a separate lightweight bot. It should feel like the same Viventium meeting you on the surface you already have open.
Which Surface Fits Which Job
| Surface | Best for |
|---|---|
| Chat web app | long-form reasoning, plans, drafts, project review, files, tool calls |
| Voice | fast thinking out loud, live decisions, walking through ideas |
| Wing Mode | quiet companion presence inside a live voice call |
| Listen-Only Mode | transcription and capture without response or tool use |
| Telegram | mobile continuity, voice notes, voice replies, reminders, briefings |
The Real Promise
The important promise is not "many channels."
The important promise is:
- one memory story
- one background-intelligence story
- one project and follow-through story
- one system that can meet you where you already are
Keep Reading
- Core Services - The full service map
- Connected Workspaces - What these surfaces can pull from
- Scheduling - How proactive delivery fits the same system
- Background Agents - What deeper help can happen behind the scenes
- Architecture Overview - Where voice, chat, and messaging fit in the full system