Voice, Chat & Messaging

One continuity layer across the chat web app, real-time voice, Wing Mode, Listen-Only Mode, Telegram text, and Telegram voice.

One System, Different Surfaces

Viventium should not feel like a different assistant every time you switch surfaces.

The same cognitive system is meant to show up across:

the chat web app for longer visible reasoning
real-time voice calls for spoken conversation
Wing Mode inside a live voice call when you want a quieter companion
Listen-Only Mode when you want transcription without replies
Telegram for mobile text, voice notes, voice replies, reminders, and worker callbacks
scheduled delivery when a result should come to you later

Chat Web App

The chat web app is Viventium's main desktop surface.

Use it when you need:

longer reasoning
visible structure
connected accounts
uploaded files and tool calls
project review
drafts, plans, and artifacts you want to inspect closely
background-agent follow-through after the first answer

Chat is where the full shape of the system is easiest to see: memory, tools, agents, connected workspaces, scheduling, and worker handoffs all meet in one visible conversation.

Voice Gateway

Voice matters because a lot of important thinking happens faster out loud than through typing.

Viventium's voice surface is built around the Voice Gateway so spoken turns can use the standard Viventium agent pipeline. The goal is not a separate "voice bot." The goal is the same assistant, same continuity, and same follow-through in a real-time spoken surface.

What users should notice:

Natural interruption - you can interrupt mid-sentence like a real conversation
Shared continuity - voice can use the same memory, background agents, and context story as chat
Provider choice - local and hosted speech routes are explicit choices
Speech-safe output - the voice layer strips or cleans raw URLs, markdown links, citations, code fences, tables, unknown tags, and punctuation fragments before they become spoken audio
Provider-aware expression - expressive markers are kept only for providers that support them
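
As a rough illustration of the speech-safe output idea, the sketch below cleans a text reply before it is handed to a text-to-speech engine. It is not Viventium's actual voice layer: the function name `to_speech_safe` and every regular expression here are assumptions chosen for the example, and the sketch covers only fenced code, markdown links, bare URLs, numeric citations, angle-bracket tags, and pipe-style table rows.

```python
import re

# Hypothetical helper, not the real Viventium implementation.
FENCE = "`" * 3  # a markdown code fence, built indirectly to keep this example readable


def to_speech_safe(text: str) -> str:
    """Return a version of `text` that is safer to read aloud."""
    # Drop fenced code blocks entirely; code is rarely useful as audio.
    text = re.sub(FENCE + r".*?" + FENCE, " ", text, flags=re.DOTALL)
    # Keep the label of a markdown link and drop its target.
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)
    # Remove bare URLs.
    text = re.sub(r"https?://\S+", " ", text)
    # Remove numeric citation markers such as [1] or [2, 3].
    text = re.sub(r"\[\d+(?:\s*,\s*\d+)*\]", " ", text)
    # Remove angle-bracket tags, whatever their name.
    text = re.sub(r"</?[A-Za-z][^>]*>", " ", text)
    # Drop markdown table rows (lines that start with a pipe).
    lines = [ln for ln in text.splitlines() if not ln.lstrip().startswith("|")]
    text = "\n".join(lines)
    # Collapse whitespace, then remove stray spaces left before punctuation.
    text = re.sub(r"\s+", " ", text).strip()
    return re.sub(r"\s+([.,!?;:])", r"\1", text)
```

For example, `to_speech_safe("See [the docs](https://example.com/x) for details [1].")` returns `"See the docs for details."`. A production cleaner would need more rules (for instance, leftover punctuation fragments), so treat this as a starting shape only.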

Voice privacy and provider choice

Voice can run fully local when both speech-to-text and text-to-speech use local routes such as Whisper.cpp and Chatterbox. Hosted providers such as Cartesia, xAI, ElevenLabs, OpenAI, or AssemblyAI are explicit provider choices; when selected, the relevant audio or text goes to that provider.
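
The local-versus-hosted rule above can be sketched as a small check. The route identifiers, the `VoiceRoute` type, and both helper functions are hypothetical names invented for this example; only the provider names come from the text, and real configuration keys may look different.

```python
from dataclasses import dataclass

# Assumed identifiers for the routes named in the text above.
LOCAL_ROUTES = frozenset({"whisper.cpp", "chatterbox"})
HOSTED_ROUTES = frozenset({"cartesia", "xai", "elevenlabs", "openai", "assemblyai"})


@dataclass(frozen=True)
class VoiceRoute:
    stt: str  # speech-to-text route
    tts: str  # text-to-speech route


def is_fully_local(route: VoiceRoute) -> bool:
    """True only when both directions of speech stay on local routes."""
    return route.stt in LOCAL_ROUTES and route.tts in LOCAL_ROUTES


def hosted_providers(route: VoiceRoute) -> list[str]:
    """Hosted providers that would receive audio or text for this route."""
    return sorted({p for p in (route.stt, route.tts) if p in HOSTED_ROUTES})
```

With these assumptions, `VoiceRoute("whisper.cpp", "chatterbox")` is fully local, while `VoiceRoute("whisper.cpp", "cartesia")` is not, and `hosted_providers` reports `["cartesia"]` for it.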

For builders: the current implementation uses a real-time media layer under the Voice Gateway, while the chat web app is built on Viventium's web-chat fork. Those implementation names are useful when debugging or contributing, but the user-facing promise is the Viventium voice and chat experience.

Wing Mode

Wing Mode is a live-call companion mode.

It is for situations where you want Viventium present in the voice call, but not constantly talking. The assistant defaults to silence unless it is clearly addressed, genuinely useful, or there is an urgent reason to speak.

In user terms: Wing Mode is the difference between "answer every sound" and "be a thoughtful partner in the room."

It is not a generic always-on background microphone. It is a call-session state for the voice surface, and it is mutually exclusive with Listen-Only Mode.
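
One way to picture Wing Mode as a call-session state is a single mode value per call, which makes it mutually exclusive with Listen-Only Mode by construction. The sketch below is an assumption-laden illustration: `CallMode`, `Turn`, and `should_speak` are hypothetical names, and how a real system decides that a turn is "addressed", "useful", or "urgent" is deliberately left out.

```python
from dataclasses import dataclass
from enum import Enum


class CallMode(Enum):
    # A call holds exactly one mode, so Wing Mode and Listen-Only Mode
    # can never be active at the same time.
    NORMAL = "normal"
    WING = "wing"
    LISTEN_ONLY = "listen_only"


@dataclass(frozen=True)
class Turn:
    # Hypothetical signals; how they are detected is out of scope here.
    addressed: bool = False  # the assistant was clearly spoken to
    useful: bool = False     # a reply would be genuinely useful
    urgent: bool = False     # there is an urgent reason to speak


def should_speak(mode: CallMode, turn: Turn) -> bool:
    """Decide whether the assistant replies aloud to this turn."""
    if mode is CallMode.LISTEN_ONLY:
        return False
    if mode is CallMode.WING:
        # Wing Mode defaults to silence unless one of the signals fires.
        return turn.addressed or turn.useful or turn.urgent
    return True
```

Under these assumptions, `should_speak(CallMode.WING, Turn())` is `False`, while `should_speak(CallMode.WING, Turn(addressed=True))` is `True`.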

Listen-Only Mode

Listen-Only Mode is for transcription and capture without assistant participation.

When it is active, Viventium transcribes and saves the ambient voice record but does not:

reply in the moment
call tools
activate background agents
write immediate memory
inject the transcript into recall or prompt history as a regular chat turn

That is useful for meetings, brainstorming, or context gathering where you want a record first and assistance later.
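
The Listen-Only rules above amount to a gate: transcripts are captured, and a fixed set of actions is blocked. The following minimal sketch assumes hypothetical names throughout (`CallSession`, `allowed`, `on_transcript`, and the action strings); it shows the shape of the gate rather than how Viventium stores transcripts.

```python
from dataclasses import dataclass, field

# Actions the text says do not happen while Listen-Only Mode is active.
# The string labels are invented for this example.
BLOCKED_IN_LISTEN_ONLY = frozenset({
    "reply",
    "call_tools",
    "activate_background_agents",
    "write_memory",
    "append_to_prompt_history",
})


@dataclass
class CallSession:
    listen_only: bool = False
    ambient_record: list[str] = field(default_factory=list)
    prompt_history: list[str] = field(default_factory=list)

    def allowed(self, action: str) -> bool:
        """Return False for blocked actions while Listen-Only Mode is on."""
        return not (self.listen_only and action in BLOCKED_IN_LISTEN_ONLY)

    def on_transcript(self, text: str) -> None:
        """Save the transcript; only normal mode treats it as a chat turn."""
        if self.listen_only:
            self.ambient_record.append(text)
        else:
            self.prompt_history.append(text)
```

In this sketch a listen-only session keeps `prompt_history` empty no matter how much is transcribed, which mirrors the "record first, assistance later" intent.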

Expressive Speech

Different voice providers support different kinds of expression.

Cartesia Sonic-3 can preserve supported emotion controls and the documented [laughter] marker, so expressive speech can come through when the selected voice route supports it.
xAI voice uses its own speech-style controls, such as pauses or vocal delivery cues, and should not be mixed with Cartesia emotion tags.
Local Chatterbox keeps voice local, but does not use Cartesia-style emotion tags.
Fallback providers strip unsupported markup so users hear clean speech instead of raw control text.

The important user benefit is simple: Viventium tries to make spoken output sound intentional without letting provider-specific markup leak into the conversation.
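
Provider-aware handling of expressive markers can be sketched as an allowlist per provider. Everything named here is an assumption for illustration: the provider keys, the `SUPPORTED_MARKERS` table, and `render_for_provider` are hypothetical, and the only marker taken from the text is `[laughter]` for Cartesia Sonic-3. The sketch treats any single bracketed word as a marker, which is a simplification.

```python
import re

# A marker is assumed to be one bracketed word, such as [laughter].
MARKER = re.compile(r"\[(\w+)\]")

# Hypothetical allowlist; providers missing from it keep no markers.
SUPPORTED_MARKERS = {
    "cartesia-sonic-3": {"laughter"},
    "chatterbox": set(),
}


def render_for_provider(text: str, provider: str) -> str:
    """Keep markers the provider supports and strip the rest."""
    allowed = SUPPORTED_MARKERS.get(provider, set())

    def keep_or_drop(match: re.Match) -> str:
        return match.group(0) if match.group(1) in allowed else ""

    cleaned = MARKER.sub(keep_or_drop, text)
    # Collapse whitespace, then remove stray spaces left before punctuation.
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return re.sub(r"\s+([.,!?;:])", r"\1", cleaned)
```

With these assumptions, `"That is great [laughter] really"` passes through unchanged for `"cartesia-sonic-3"` and becomes `"That is great really"` for `"chatterbox"` or any unlisted fallback provider.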

Telegram

Telegram is the mobile continuity surface.

It is for:

quick text check-ins
voice notes from your phone
voice replies back to you
scheduled briefings and reminders
follow-up while away from your desk
GlassHive worker callbacks and approval moments
background-agent follow-through on mobile

Telegram should not feel like a separate lightweight bot. It should feel like the same Viventium, meeting you on the surface you already have open.

Which Surface Fits Which Job

Surface	Best for
Chat web app	long-form reasoning, plans, drafts, project review, files, tool calls
Voice	fast thinking out loud, live decisions, walking through ideas
Wing Mode	quiet companion presence inside a live voice call
Listen-Only Mode	transcription and capture without response or tool use
Telegram	mobile continuity, voice notes, voice replies, reminders, briefings

The Real Promise

The important promise is not "many channels."

The important promise is:

one memory story
one background-intelligence story
one project and follow-through story
one system that can meet you where you already are

Keep Reading

Core Services - The full service map
Connected Workspaces - What these surfaces can pull from
Scheduling - How proactive delivery fits the same system
Background Agents - What deeper help can happen behind the scenes
Architecture Overview - Where voice, chat, and messaging fit in the full system
