Podcast Benchmark Report

This page documents the bilingual (English and Japanese) chart workflow for the natural Japanese podcast benchmark and reports the headline results from the latest set of measured runs.

[Figure: English podcast benchmark chart]

Scope

  • Topics: Today's Meal, AGI, AI Consciousness
  • Conversation length: 6 turns per run
  • Repetitions: 3 runs per mode
  • Reported chart metrics: first audio delay and handoff silence
  • Supporting card metrics: per-topic variance for the same two latency measures

Current Results

| Metric | Streaming | Batch | Gain | Relative |
| --- | --- | --- | --- | --- |
| First audio mean | 1.22s | 10.91s | 9.70s faster | 8.94x faster / 88.9% lower |
| Handoff mean | 2.30s | 11.91s | 9.61s faster | 5.18x shorter / 80.7% lower |
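
The Gain and Relative columns can be recomputed from the two means. The sketch below is illustrative only: `compareLatency` and `LatencyComparison` are hypothetical names, not part of the report builder. Because it starts from the rounded means shown in the table, the last digit can differ from the published figures (for example 9.69s instead of 9.70s), which were presumably derived from unrounded values.

```typescript
// Hypothetical helper: derives the Gain and Relative columns from two means.
interface LatencyComparison {
  gainSec: number;      // batch minus streaming, in seconds
  speedup: number;      // batch divided by streaming ("Nx faster")
  reductionPct: number; // how much lower streaming is than batch, in percent
}

function compareLatency(streamingSec: number, batchSec: number): LatencyComparison {
  if (streamingSec <= 0 || batchSec <= 0) {
    throw new RangeError("latency means must be positive");
  }
  return {
    gainSec: batchSec - streamingSec,
    speedup: batchSec / streamingSec,
    reductionPct: (1 - streamingSec / batchSec) * 100,
  };
}

// Rounded means taken from the table above.
const firstAudio = compareLatency(1.22, 10.91);
const handoff = compareLatency(2.30, 11.91);
```

Formatting `firstAudio.speedup` and `handoff.reductionPct` with `toFixed` gives values that agree with the table to within rounding.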

Per-topic means

| Topic | First audio (streaming / batch) | Handoff (streaming / batch) |
| --- | --- | --- |
| Today's Meal | 3.13s / 13.62s | 4.23s / 14.54s |
| AGI | 0.11s / 9.72s | 1.45s / 10.67s |
| AI Consciousness | 0.41s / 9.41s | 1.23s / 10.52s |
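
As a quick consistency check, the headline means can be compared against the unweighted average of the per-topic means. This is a sketch under an assumption: it treats every topic as equally weighted, which this page does not state explicitly. The `mean` helper and the object shape are hypothetical.

```typescript
// Per-topic means copied from the table above, in seconds.
const topicMeans = [
  { topic: "Today's Meal", firstAudioS: 3.13, firstAudioB: 13.62, handoffS: 4.23, handoffB: 14.54 },
  { topic: "AGI", firstAudioS: 0.11, firstAudioB: 9.72, handoffS: 1.45, handoffB: 10.67 },
  { topic: "AI Consciousness", firstAudioS: 0.41, firstAudioB: 9.41, handoffS: 1.23, handoffB: 10.52 },
];

// Arithmetic mean of a non-empty list.
function mean(values: number[]): number {
  if (values.length === 0) {
    throw new RangeError("cannot average an empty list");
  }
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Assumption: each topic carries equal weight in the headline figure.
const overall = {
  firstAudioStreaming: mean(topicMeans.map((t) => t.firstAudioS)),
  firstAudioBatch: mean(topicMeans.map((t) => t.firstAudioB)),
  handoffStreaming: mean(topicMeans.map((t) => t.handoffS)),
  handoffBatch: mean(topicMeans.map((t) => t.handoffB)),
};
```

Each recomputed value lands within 0.01s of the corresponding headline mean, so the two tables are consistent up to rounding.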

Per-topic variance

| Topic | First audio variance, streaming / batch (s²) | Handoff variance, streaming / batch (s²) |
| --- | --- | --- |
| Today's Meal | 0.60 / 1.29 | 0.98 / 1.12 |
| AGI | 0.04 / 0.04 | 0.57 / 0.21 |
| AI Consciousness | 0.33 / 1.29 | 0.24 / 1.35 |
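
For readers who want to reproduce a variance figure from raw run latencies, here is a minimal sketch. This page does not say whether the report uses the sample (n - 1) or population (n) divisor, so the hypothetical `variance` helper supports both. The input values are made up for illustration and are not benchmark data.

```typescript
// Variance of a list of latencies in seconds; the result is in s².
// sample = true divides by n - 1, sample = false divides by n.
// Which divisor the report builder uses is not stated here (assumption).
function variance(values: number[], sample: boolean = true): number {
  const n = values.length;
  const divisor = sample ? n - 1 : n;
  if (divisor <= 0) {
    throw new RangeError("not enough values to compute a variance");
  }
  const avg = values.reduce((sum, v) => sum + v, 0) / n;
  const squaredDeviations = values.reduce((sum, v) => sum + (v - avg) * (v - avg), 0);
  return squaredDeviations / divisor;
}

// Illustrative latencies only (three runs, matching the repetition count).
const hypotheticalRuns = [1.0, 2.0, 3.0];
const sampleVariance = variance(hypotheticalRuns);            // 1
const populationVariance = variance(hypotheticalRuns, false); // 2 / 3
```

With only three runs per mode, the two divisors differ by a factor of 1.5, so it matters which one is used when comparing against the table.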

Rebuild

```bash
npm run report:podcast-benchmark
npm run verify:podcast-benchmark-layout
```

The report builder writes:

  • .tmp-benchmark-podcast-topics-natural/report/podcast-benchmark-overview.json
  • .tmp-benchmark-podcast-topics-natural/report/podcast-benchmark-overview.en.svg
  • .tmp-benchmark-podcast-topics-natural/report/podcast-benchmark-overview.en.png
  • .tmp-benchmark-podcast-topics-natural/report/podcast-benchmark-overview.ja.svg
  • .tmp-benchmark-podcast-topics-natural/report/podcast-benchmark-overview.ja.png
  • docs/public/benchmarks/podcast-benchmark-overview.json
  • docs/public/benchmarks/podcast-benchmark-history.json
  • docs/public/benchmarks/podcast-benchmark-history.csv
  • docs/public/benchmarks/podcast-benchmark-en.svg
  • docs/public/benchmarks/podcast-benchmark-ja.svg

Tracked History Data

The longitudinal tracking files store one row (CSV) or one entry (JSON) per generated benchmark snapshot.

| Field | Meaning |
| --- | --- |
| generatedAt | ISO timestamp for the generated report |
| benchmarkKey | Stable identifier for this benchmark recipe |
| gitSha | Short Git commit SHA when the report was generated |
| sourceKind | Whether the report came from fresh local artifacts or the stable snapshot fallback |
| topics | Topic set used for the benchmark |
| firstAudioStreamingSec | Overall streaming mean for first audio delay |
| firstAudioBatchSec | Overall batch mean for first audio delay |
| firstAudioGainSec | Absolute improvement for first audio delay |
| firstAudioGainPct | Relative improvement for first audio delay |
| handoffStreamingSec | Overall streaming mean for handoff silence |
| handoffBatchSec | Overall batch mean for handoff silence |
| handoffGainSec | Absolute improvement for handoff silence |
| handoffGainPct | Relative improvement for handoff silence |
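
The field list above maps naturally onto a typed record. The sketch below is an assumption about shape, not the actual schema: the field names come from the table, but the value types, the `sourceKind` string, the `"; "` topic separator, and the helper names are all hypothetical.

```typescript
// Hypothetical shape for one history entry; field names follow the table above.
interface BenchmarkHistoryEntry {
  generatedAt: string;   // ISO timestamp
  benchmarkKey: string;
  gitSha: string;        // short commit SHA
  sourceKind: string;    // actual allowed values are not documented here
  topics: string[];
  firstAudioStreamingSec: number;
  firstAudioBatchSec: number;
  firstAudioGainSec: number;
  firstAudioGainPct: number;
  handoffStreamingSec: number;
  handoffBatchSec: number;
  handoffGainSec: number;
  handoffGainPct: number;
}

// Column order for the CSV form (assumed to follow the table order).
const HISTORY_COLUMNS: (keyof BenchmarkHistoryEntry)[] = [
  "generatedAt", "benchmarkKey", "gitSha", "sourceKind", "topics",
  "firstAudioStreamingSec", "firstAudioBatchSec", "firstAudioGainSec", "firstAudioGainPct",
  "handoffStreamingSec", "handoffBatchSec", "handoffGainSec", "handoffGainPct",
];

// Quote a cell only when it contains a comma, a double quote, or a newline.
function csvCell(value: string | number | string[]): string {
  const text = Array.isArray(value) ? value.join("; ") : String(value);
  return /[",\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

function toCsvRow(entry: BenchmarkHistoryEntry): string {
  return HISTORY_COLUMNS.map((column) => csvCell(entry[column])).join(",");
}

// Metrics are the headline numbers from this page; identifiers are placeholders.
const exampleEntry: BenchmarkHistoryEntry = {
  generatedAt: "2025-01-01T00:00:00.000Z", // placeholder timestamp
  benchmarkKey: "example-benchmark-key",   // placeholder key
  gitSha: "abc1234",                       // placeholder SHA
  sourceKind: "local-artifacts",           // placeholder value
  topics: ["Today's Meal", "AGI", "AI Consciousness"],
  firstAudioStreamingSec: 1.22,
  firstAudioBatchSec: 10.91,
  firstAudioGainSec: 9.70,
  firstAudioGainPct: 88.9,
  handoffStreamingSec: 2.30,
  handoffBatchSec: 11.91,
  handoffGainSec: 9.61,
  handoffGainPct: 80.7,
};

const exampleRow = toCsvRow(exampleEntry);
```

Keeping the column list separate from the interface makes the CSV order explicit, so adding a field requires a deliberate decision about where it appears in the file.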

Layout Verification

npm run verify:podcast-benchmark-layout checks both localized SVG files for:

  • missing data-fit-boundary annotations on text nodes
  • text overflow outside the assigned boundary
  • text-on-text overlap inside the same boundary

The current outputs pass that verifier for both English and Japanese.
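
The three checks can be pictured as simple rectangle tests. The following is a sketch of the idea only, assuming text nodes and boundaries have already been reduced to axis-aligned boxes. It is not the verifier's implementation, and every type and function name in it is hypothetical.

```typescript
interface Box { x: number; y: number; width: number; height: number; }

// A text node with the boundary it is assigned to (null when the
// data-fit-boundary annotation is missing).
interface TextNode { id: string; boundaryId: string | null; box: Box; }

// True when the text box lies entirely within the boundary box.
function isInside(text: Box, boundary: Box): boolean {
  return text.x >= boundary.x &&
    text.y >= boundary.y &&
    text.x + text.width <= boundary.x + boundary.width &&
    text.y + text.height <= boundary.y + boundary.height;
}

// True when two boxes share any interior area.
function overlaps(a: Box, b: Box): boolean {
  return a.x < b.x + b.width && b.x < a.x + a.width &&
    a.y < b.y + b.height && b.y < a.y + a.height;
}

function verifyLayout(nodes: TextNode[], boundaries: Record<string, Box>): string[] {
  const problems: string[] = [];
  for (const node of nodes) {
    if (node.boundaryId === null) {
      problems.push(`${node.id}: missing data-fit-boundary`);
      continue;
    }
    const boundary = boundaries[node.boundaryId];
    if (boundary === undefined) {
      problems.push(`${node.id}: unknown boundary ${node.boundaryId}`);
    } else if (!isInside(node.box, boundary)) {
      problems.push(`${node.id}: overflows ${node.boundaryId}`);
    }
  }
  // Compare each pair of nodes that share the same boundary.
  for (let i = 0; i < nodes.length; i++) {
    for (let j = i + 1; j < nodes.length; j++) {
      const a = nodes[i];
      const b = nodes[j];
      if (a.boundaryId !== null && a.boundaryId === b.boundaryId && overlaps(a.box, b.box)) {
        problems.push(`${a.id} overlaps ${b.id} in ${a.boundaryId}`);
      }
    }
  }
  return problems;
}

// Illustrative layout: one card boundary with three text nodes.
const demoProblems = verifyLayout(
  [
    { id: "title", boundaryId: "card", box: { x: 10, y: 10, width: 40, height: 10 } },
    { id: "value", boundaryId: "card", box: { x: 30, y: 15, width: 80, height: 10 } },
    { id: "orphan", boundaryId: null, box: { x: 0, y: 0, width: 5, height: 5 } },
  ],
  { card: { x: 0, y: 0, width: 100, height: 50 } },
);
```

In this example `demoProblems` holds three messages: the overflow of `value`, the missing annotation on `orphan`, and the overlap between `title` and `value`. An empty array would correspond to a passing run.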

Caveats

  • The AGI results still depend on one successful run mean that was preserved from console logs, because the original first artifact folder was overwritten before the temp-dir fix.
  • The chart generator prefers the latest local benchmark artifacts. When those raw temp files are gone, it falls back to the stable snapshot at docs/public/benchmarks/podcast-benchmark-overview.json.
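
The fallback rule in the second caveat can be sketched as a small resolver. This is an illustration under assumptions: the `sourceKind` strings, the choice of the temp directory as the marker for fresh artifacts, and the function name are hypothetical. Only the two paths come from this page.

```typescript
// Hypothetical labels; the real sourceKind values are not documented here.
type SourceKind = "local-artifacts" | "stable-snapshot";

interface ResolvedSource { sourceKind: SourceKind; path: string; }

// Paths as they appear on this page.
const LOCAL_ARTIFACT_DIR = ".tmp-benchmark-podcast-topics-natural";
const STABLE_SNAPSHOT = "docs/public/benchmarks/podcast-benchmark-overview.json";

// Prefer fresh local artifacts; fall back to the stable snapshot.
// The existence check is injected so the rule can be exercised without disk access.
function resolveBenchmarkSource(exists: (path: string) => boolean): ResolvedSource {
  if (exists(LOCAL_ARTIFACT_DIR)) {
    return { sourceKind: "local-artifacts", path: LOCAL_ARTIFACT_DIR };
  }
  if (exists(STABLE_SNAPSHOT)) {
    return { sourceKind: "stable-snapshot", path: STABLE_SNAPSHOT };
  }
  throw new Error("no benchmark source available");
}

// Simulate a checkout where the temp directory has been cleaned up.
const resolvedAfterCleanup = resolveBenchmarkSource((path) => path === STABLE_SNAPSHOT);
```

In a Node script, `fs.existsSync` could be passed as the `exists` argument. Recording the resolved kind in the `sourceKind` history field makes it visible which snapshots were regenerated from raw data and which were not.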