Features — Buzz Captions

Core

Real-time transcription

Your audio streams directly to AI speech engines over an encrypted WebSocket connection. Captions appear in under a second — not after you finish speaking.

WebSocket binary streaming — continuous, not push-to-talk

Deepgram Nova-3 for English and major Western languages

Sarvam saaras:v3 for 23 Indic languages (Hindi, Tamil, Bengali, and more)

Gladia for multilingual code-switching sessions (up to 10 language hints)

Per-segment confidence scores for quality feedback

Flip for other person — one tap rotates captions 180° so the person across from you can read them face-on. Solo sessions only.
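Captions that appear before you finish speaking are usually built from interim guesses that a final result later replaces. The sketch below shows one way a client might keep that display state. The method `on_result`, its parameters, and the 0.6 confidence threshold are hypothetical names and values chosen for illustration, not the app's actual API or settings.

```python
from dataclasses import dataclass, field


@dataclass
class CaptionBuffer:
    """Holds finalized segments plus the most recent interim guess."""

    low_confidence: float = 0.6  # hypothetical threshold, not a product setting
    finals: list = field(default_factory=list)  # (text, confidence) pairs
    interim: str = ""

    def on_result(self, text: str, is_final: bool, confidence: float = 1.0) -> None:
        if is_final:
            # A final result replaces whatever interim text was on screen.
            self.finals.append((text, confidence))
            self.interim = ""
        else:
            # Each interim result overwrites the previous interim guess.
            self.interim = text

    def render(self) -> str:
        # Low-confidence final segments are marked so the reader can double-check them.
        parts = [t if c >= self.low_confidence else f"{t} (?)" for t, c in self.finals]
        if self.interim:
            parts.append(self.interim)
        return " ".join(parts)
```

Feeding the buffer "hello wor" (interim) and then "hello world" (final) leaves only the final text on screen, which is why captions can update mid-sentence without duplicating words.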

STREAM SPECS

Protocol: WebSocket (binary frames)

Audio encoding: PCM / linear16

Max session duration: 4 hours

Segment output: Interim + final results

Primary STT (Western): Deepgram Nova-3

Primary STT (Indic): Sarvam saaras:v3

Fallback: Gladia → Google STT
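To make the "PCM / linear16 over binary frames" spec concrete, here is a minimal sketch of how audio samples could be packed into 16-bit PCM and split into fixed-size chunks, one chunk per binary WebSocket frame. The 16 kHz sample rate, the 20 ms frame length, and little-endian byte order are assumptions for illustration; the page does not state them.

```python
import struct


def float_to_linear16(samples):
    """Convert floats in [-1.0, 1.0] to little-endian signed 16-bit PCM bytes."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def frames(pcm: bytes, sample_rate: int = 16000, frame_ms: int = 20):
    """Yield fixed-duration chunks of PCM bytes (assumed rate and frame length)."""
    bytes_per_frame = sample_rate * frame_ms // 1000 * 2  # 2 bytes per sample
    for i in range(0, len(pcm), bytes_per_frame):
        yield pcm[i:i + bytes_per_frame]
```

Each yielded chunk would be sent as one binary frame; the last chunk may be shorter than the others.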


Translation

Live translation, every session

Enable translation before you start and captions are translated in real time to your chosen language. No waiting — both the original and translation appear together.

Sarvam AI for Indic ↔ Indic and Indic → English translation

DeepL for Western languages: EN, ES, FR, DE, JA, ZH, KO, AR, PT, HI

Google Translate as universal fallback for all other pairs

Re-translate any completed session to a different target language

Available on Basic (→ English only) and Pro (→ all languages) plans

TRANSLATION ROUTING

Indic → Indic / English: Sarvam AI

Western → DeepL targets: DeepL

All other pairs: Google Translate

DeepL targets: EN, ES, FR, DE, JA, ZH, KO, AR, PT, HI

Sarvam targets: 23 Indic + English

Basic plan: → English only

Pro plan: → Any language
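The routing rules above can be read as a small decision function. This is a sketch under stated assumptions: the `INDIC` set is an illustrative three-language subset of the 23 Indic languages, any non-Indic source is treated as eligible for DeepL, and plans other than Basic and Pro are assumed to have no translation. The function names are hypothetical.

```python
# Illustrative subset only; the page lists 23 Indic languages without naming them all.
INDIC = {"hi", "ta", "bn"}
DEEPL_TARGETS = {"en", "es", "fr", "de", "ja", "zh", "ko", "ar", "pt", "hi"}


def pick_translator(source: str, target: str) -> str:
    """Return the provider for a language pair, checking rules in table order."""
    if source in INDIC and (target in INDIC or target == "en"):
        return "sarvam"
    # Assumption: "Western" is approximated as "any non-Indic source".
    if source not in INDIC and target in DEEPL_TARGETS:
        return "deepl"
    return "google"


def target_allowed(plan: str, target: str) -> bool:
    """Basic translates to English only; Pro translates to any language."""
    if plan == "pro":
        return True
    if plan == "basic":
        return target == "en"
    return False  # assumption: other plans do not include translation
```

For example, `pick_translator("en", "ta")` falls through to `"google"`, because the Sarvam rule covers only Indic sources and Tamil is not a DeepL target.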

Diarization

Know who said what, when

Speaker diarization automatically separates different voices and assigns color-coded labels. Rename any speaker to their real name — labels are saved with the session.

Powered by Deepgram for all Western and English sessions

Enabled by default — toggle off if not needed

Custom speaker names persist in transcript and PDF export

Always on for non-Indic voice note uploads, which are processed by Deepgram

Not available for Indic languages, which are transcribed by Sarvam (Sarvam limitation)

SPEAKER 1 · Alice

I think we should prioritize the redesign this quarter.

SPEAKER 2 · Ben

Agreed — but we need to align on scope first.

SPEAKER 3

Can we schedule a dedicated call for that?
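The labels in the sample above combine a numbered speaker with an optional custom name. A minimal sketch of that rendering follows, assuming segments arrive as `(speaker_index, text)` pairs and renamed speakers are kept in a plain dictionary; both shapes are hypothetical.

```python
def speaker_label(index: int, names: dict) -> str:
    """Return 'SPEAKER n · Name' when a custom name exists, else 'SPEAKER n'."""
    base = f"SPEAKER {index}"
    name = names.get(index)
    return f"{base} · {name}" if name else base


def render_transcript(segments, names) -> str:
    """Render (speaker_index, text) pairs as label and text on alternating lines."""
    lines = []
    for speaker, text in segments:
        lines.append(speaker_label(speaker, names))
        lines.append(text)
    return "\n".join(lines)
```

With `names = {1: "Alice", 2: "Ben"}`, speaker 3 keeps the default label until someone renames it.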

Import

Transcribe any audio file

Upload any audio or voice message and get a complete, speaker-diarized transcript in seconds. Works with WhatsApp, Telegram, iMessage, and any recording app.

Formats: MP3, M4A, AAC, OGG, WAV, WebM, FLAC, CAF, MP4

Max file size: 50 MB

Speaker diarization always enabled for uploaded non-Indic audio (not available for Indic languages)

Optional translation at upload time

Saved to session history like any live session

SUPPORTED FORMATS

MP3, M4A, AAC, OGG, WAV, WebM, FLAC, CAF, MP4

SOURCES OPTIMIZED FOR

WhatsApp, Telegram, iMessage, Voice Memos, Recordings

Max size: 50 MB

Powered by: Deepgram (non-Indic) / Sarvam (Indic)
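A client could check a file against these limits before uploading. The sketch below assumes the format is judged by file extension and that "50 MB" means 50 × 1024 × 1024 bytes; neither detail is stated on the page, and `check_upload` is a hypothetical helper.

```python
from pathlib import Path

ALLOWED = {".mp3", ".m4a", ".aac", ".ogg", ".wav", ".webm", ".flac", ".caf", ".mp4"}
MAX_BYTES = 50 * 1024 * 1024  # assumption: 50 MB interpreted as 50 MiB


def check_upload(filename: str, size_bytes: int) -> list:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED:
        problems.append(f"unsupported format: {ext or 'none'}")
    if size_bytes > MAX_BYTES:
        problems.append("file exceeds 50 MB")
    return problems
```

Returning all problems at once lets the app show a single message instead of failing one check at a time.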

Export

Your transcript, your format

Share transcripts instantly or export them as polished PDFs with speaker labels, timestamps, and full multi-script font support.

Share as plain text or export as PDF — Pro plan only

PDF includes title, date, duration, word count, speaker count, language

Multi-script fonts: Devanagari, Arabic, Thai, Tamil, Telugu, Sinhala, CJK, and more

Colored speaker indicators in diarized PDFs

EXPORT FORMATS

Plain text share: Pro

PDF export: Pro

PDF INCLUDES

Speaker labels, Timestamps, Word count, Duration, Translation, Multi-script fonts
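The summary fields in an export (word count, speaker count, duration) can be derived from the transcript segments. This is a rough sketch that assumes each segment is a `(speaker, start_seconds, end_seconds, text)` tuple, a hypothetical shape. Counting words by whitespace is a simplification that undercounts scripts written without spaces, such as Thai or CJK.

```python
def summarize(segments) -> dict:
    """Compute export metadata from (speaker, start_s, end_s, text) tuples."""
    words = sum(len(text.split()) for _, _, _, text in segments)
    speakers = len({speaker for speaker, *_ in segments})
    duration = max((end for _, _, end, _ in segments), default=0.0)
    return {"word_count": words, "speaker_count": speakers, "duration_s": duration}


def timestamp(seconds: float) -> str:
    """Format a second offset as HH:MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"
```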

Collaboration

Live group sessions, up to 10 people

Start a group session and invite others with a QR code or a 6-character join code. Each participant has their own audio stream — everyone's voice is transcribed separately.

Up to 10 participants per session

Join via QR code or 6-character alphanumeric code

Independent audio streams — no crosstalk issues

Full session history and transcript saved for the host

BZ · 4R7K

Share this code or scan QR to join

3 / 10 joined · Live
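A 6-character alphanumeric join code like the one shown above could be produced as follows. This is a sketch only: the uppercase-plus-digits alphabet, the 2 + 4 display split, and the helper names are assumptions based on the sample code, not a description of how the app generates codes.

```python
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits  # assumption: uppercase letters and digits
MAX_PARTICIPANTS = 10


def new_join_code(length: int = 6) -> str:
    """Generate a random join code using a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def display(code: str) -> str:
    """Split the code 2 + 4 for readability, mirroring the 'BZ · 4R7K' sample."""
    return f"{code[:2]} · {code[2:]}"


def can_join(current_participants: int) -> bool:
    """A session accepts new participants until it reaches the 10-person cap."""
    return current_participants < MAX_PARTICIPANTS
```

Using `secrets` rather than `random` makes codes harder to guess, which matters when a code is the only thing needed to join a live session.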

Privacy

Private by design, not just policy

Your audio is never recorded or stored. Transcripts are encrypted at rest. We don't sell your data or use your content to train AI models.

Audio passes through server memory only — never written to disk

Transcript text encrypted at rest with AES-256

All connections over HTTPS/TLS

Firebase App Check (AppAttest / PlayIntegrity) prevents unauthorized API access

Auto-delete sessions by plan tier (7 / 30 / 90 days)

Full account & data deletion on request

SECURITY AT A GLANCE

Audio storage: Never stored

Transcript encryption: AES-256 at rest

Transport: HTTPS/TLS always

API security: Firebase App Check

Data sold: Never

AI training on your data: Never

Payment data stored: Never (Apple/Google/RC)

History

Every session saved and editable

Your sessions are organized chronologically. Edit any segment, fix a mistranscription, or rename a speaker. Your edits are saved immediately.

Searchable history

Find any session by date or scroll through your full history. History loads 20 sessions per page using cursor-based pagination.
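Cursor-based pagination returns each page together with a marker for where the next page starts, instead of a page number. A minimal in-memory sketch, assuming sessions are dictionaries with hypothetical `id` and `started_at` fields and are already sorted newest first:

```python
PAGE_SIZE = 20


def page(sessions, cursor=None, page_size=PAGE_SIZE):
    """Return (items, next_cursor) for a newest-first list of sessions.

    cursor is the (started_at, id) of the last item already seen, or None
    for the first page. next_cursor is None when no further page exists.
    """
    if cursor is None:
        remaining = sessions
    else:
        # Keep only sessions strictly older than the cursor position.
        remaining = [s for s in sessions if (s["started_at"], s["id"]) < cursor]
    items = remaining[:page_size]
    has_more = len(remaining) > page_size
    next_cursor = (items[-1]["started_at"], items[-1]["id"]) if has_more else None
    return items, next_cursor
```

Because the cursor points at a specific session rather than an offset, new sessions added at the top do not shift or duplicate items on later pages.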

Inline transcript editing

Tap any segment to correct a word or phrase. Edits are saved server-side so they persist across devices.

Retention by plan

Free keeps sessions for 7 days. Basic for 30. Pro for 90. Sessions older than your limit are automatically deleted.
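The retention rule can be expressed as a lookup from plan to days. In this sketch the retention window is assumed to start at session creation time, and the plan keys and function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = {"free": 7, "basic": 30, "pro": 90}


def expires_at(created: datetime, plan: str) -> datetime:
    """Return when a session becomes eligible for automatic deletion."""
    return created + timedelta(days=RETENTION_DAYS[plan])


def is_expired(created: datetime, plan: str, now: datetime) -> bool:
    """True once the session has reached its plan's retention limit."""
    return now >= expires_at(created, plan)
```

For a session created on 1 January 2024 (UTC), this gives deletion dates of 8 January on Free, 31 January on Basic, and 31 March on Pro.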

Powerful captions.
No compromise.

Ready to try it?