# GeminiVRM Architecture

## Goals
GeminiVRM keeps the original browser-first avatar experience from ChatVRM while replacing the response stack with Gemini Live native audio.
The current architecture optimizes for:
- low-latency playback from streamed PCM chunks
- simple local-first setup with no required backend
- compatibility with VRM lip sync and expression playback
- an optional YouTube Live relay that feeds broadcast comments into the existing chat flow
## Runtime Flow
- The user sends a prompt from the main page.
- `src/pages/index.tsx` starts streaming playback mode on the active model.
- `src/features/chat/geminiLiveChat.ts` opens a Gemini Live session, sends the active turn through `sendRealtimeInput`, and forwards transcript updates plus audio chunks.
- `src/features/lipSync/lipSync.ts` validates PCM metadata, queues chunk playback, and keeps the analyser fed for mouth movement.
- `src/features/vrmViewer/model.ts` bridges the audio stream into the VRM runtime.
- `src/features/emoteController/*` updates expression, eye, blink, and lip sync state each frame.
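The transcript side of this pipeline can be sketched as a fold over the streamed server events: Gemini Live delivers output transcription in small text fragments and flags the end of a turn, so the client concatenates fragments until the turn completes. The event shape and function name below are illustrative assumptions, not the actual `geminiLiveChat.ts` types.

```typescript
// Illustrative event shape: a streamed message may carry a transcript
// fragment, a turn-complete flag, neither, or both. Not the real API types.
interface TranscriptEvent {
  text?: string;          // incremental transcript fragment, if any
  turnComplete?: boolean; // set on the final event of a turn
}

// Fold a stream of events into the list of completed turn transcripts.
function assembleTranscripts(events: TranscriptEvent[]): string[] {
  const turns: string[] = [];
  let current = "";
  for (const ev of events) {
    if (ev.text) current += ev.text;
    if (ev.turnComplete) {
      if (current) turns.push(current);
      current = "";
    }
  }
  if (current) turns.push(current); // keep a trailing partial turn
  return turns;
}
```

Keeping this step pure makes it easy to test independently of a live session.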
## Optional YouTube Relay Flow
- The user opens **Settings**, then enters the optional **Streaming** subpage and the **YouTube relay** panel.
- `src/features/youtube/googleOAuth.ts` restores or refreshes browser-side Google auth for YouTube access.
- `src/features/youtube/youTubeLiveClient.ts` lists broadcasts, resolves live chat metadata, and polls incoming comments.
- `src/pages/index.tsx` pushes queued YouTube comments into the same chat flow that Gemini uses for normal turns, with optional auto-reply.
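The polling step ultimately hits the YouTube Data API v3 `liveChatMessages.list` endpoint. A minimal sketch of the request construction follows; the helper name and option handling are assumptions for illustration, while the endpoint and query parameters are the documented ones.

```typescript
// Real YouTube Data API v3 endpoint for live chat messages.
const LIVE_CHAT_ENDPOINT =
  "https://www.googleapis.com/youtube/v3/liveChat/messages";

// Build the polling URL for one page of live chat messages.
// (Hypothetical helper name; not the actual youTubeLiveClient.ts API.)
function buildLiveChatPollUrl(liveChatId: string, pageToken?: string): string {
  const params = new URLSearchParams({
    liveChatId,
    part: "snippet,authorDetails", // message text + author display info
  });
  if (pageToken) params.set("pageToken", pageToken); // continue from last poll
  return `${LIVE_CHAT_ENDPOINT}?${params.toString()}`;
}
```

Each response includes `nextPageToken` and `pollingIntervalMillis`; a well-behaved relay feeds the token into the next request and waits at least the given interval, sending the OAuth access token in the `Authorization: Bearer` header.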
## Key Files
- `src/pages/index.tsx` - user input, streaming state, and chat flow
- `src/features/chat/geminiLiveChat.ts` - Gemini Live connection lifecycle, realtime text input formatting for `gemini-3.1-flash-live-preview`, chunk forwarding, and transcript assembly
- `src/features/chat/geminiLiveConfig.ts` - default model and voice preset configuration
- `src/features/lipSync/lipSync.ts` - audio scheduling, PCM validation, analyser updates, and autoplay safety handling
- `src/features/vrmViewer/model.ts` - VRM model audio bridge and streaming hooks
- `src/features/youtube/googleOAuth.ts` - browser-side Google OAuth client bootstrapping and saved session restore
- `src/features/youtube/youTubeLiveClient.ts` - broadcast discovery, live chat polling, and relay API helpers
- `src/components/*` - UI for the viewer, settings, chat input, and assistant status
- `src/components/settings.tsx` - Settings modal navigation, including the optional **Streaming** subpage entry
- `src/components/youtubeLiveControlDeck.tsx` - YouTube relay panel, broadcast selection, relay controls, and comment preview
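The defaults in `src/features/chat/geminiLiveConfig.ts` presumably look something like the sketch below. The constant names, voice preset, and sample rate are assumptions for illustration; only the model alias comes from the file descriptions above.

```typescript
// Illustrative shape of geminiLiveConfig.ts defaults -- names and values
// other than the model alias are assumptions, not the actual file contents.
export const DEFAULT_LIVE_MODEL = "gemini-3.1-flash-live-preview";
export const DEFAULT_VOICE_PRESET = "Aoede"; // hypothetical prebuilt-voice name
export const DEFAULT_OUTPUT_SAMPLE_RATE = 24000; // assumed PCM output rate in Hz
```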
## Streaming Notes
Earlier revisions waited for a full turn to complete, converted the whole response into WAV, decoded it, and only then started playback. The current path instead plays PCM chunks as they arrive, which reduces perceived latency and better matches the Gemini Live model.
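The chunked path can be sketched with two small helpers, assuming 16-bit little-endian mono PCM (the names here are illustrative, not the real `lipSync.ts` API): each chunk is converted to `Float32Array` samples for an `AudioBuffer`, and a running playhead decides when the next chunk's `AudioBufferSourceNode.start()` call should fire so chunks play back-to-back.

```typescript
// Convert one chunk of 16-bit little-endian PCM to Float32 samples in [-1, 1).
function pcm16ToFloat32(bytes: Uint8Array): Float32Array {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const out = new Float32Array(bytes.byteLength / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getInt16(i * 2, true) / 32768;
  }
  return out;
}

// Advance the playhead: the chunk is scheduled at max(playhead, now) --
// i.e. source.start(startAt) in Web Audio terms -- and the new playhead
// is the time at which this chunk finishes playing.
function nextPlayhead(
  playhead: number,
  now: number,
  samples: number,
  sampleRate: number
): number {
  const startAt = Math.max(playhead, now);
  return startAt + samples / sampleRate;
}
```

Scheduling against a persistent playhead, rather than starting each chunk "now", is what keeps the back-to-back chunks gapless even when they arrive in bursts.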
Safety guards now cover:
- `onmessage` callback failures
- unsupported or missing PCM metadata
- partial PCM frames at chunk boundaries
- browser autoplay failures when `AudioContext.resume()` is blocked
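The partial-frame guard can be illustrated as follows, assuming 16-bit PCM: a network chunk that ends mid-sample carries its odd trailing byte over to the next chunk instead of decoding it as noise. The helper name is hypothetical.

```typescript
// Merge the leftover byte(s) from the previous chunk with the new chunk and
// split off only whole 16-bit frames; anything left over becomes the new carry.
function splitWholeFrames(
  carry: Uint8Array,
  chunk: Uint8Array
): { frames: Uint8Array; carry: Uint8Array } {
  const merged = new Uint8Array(carry.length + chunk.length);
  merged.set(carry);
  merged.set(chunk, carry.length);
  const usable = merged.length - (merged.length % 2); // whole frames only
  return {
    frames: merged.subarray(0, usable),
    carry: merged.slice(usable), // at most one byte waits for the next chunk
  };
}
```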
## Asset Model
- `public/Kiyoka.vrm` - bundled default avatar
- `public/bg-d.png` - default background
- `public/idle_loop.vrma` - idle animation asset
## Limitations
- The Gemini API key is currently provided directly in the browser, so it is visible to anyone with access to the page; public deployments would need a proxying backend.
- Playback is low-latency, but still depends on browser audio scheduling and network conditions.
- The default preview model alias may not be enabled for every Gemini account.
- The optional YouTube relay depends on a Google OAuth web client ID and the selected broadcast exposing a live chat.
## Documentation Surface
The public docs are served with VitePress under /docs/, while the main application remains a Next.js static export.
- local authoring uses `npm run dev:all`
- Pages builds bundle the exported app plus VitePress output into `.next-pages/`
- the app exposes a **Docs** shortcut so the runtime and docs stay connected