Same audio. Four systems.
Very different records.
We ran a blind experiment to test which transcript services can actually handle real-world legal recordings. The results reveal critical differences in speaker accuracy, name capture, and meaning preservation.
The Problem
Transcript errors don't just lose words.
They lose people.
In legal research, jury studies, and focus groups, you're not just capturing what was said — you're tracking who said it. When a transcript system merges two speakers into one, or flips the meaning of a statement, the entire analysis becomes unreliable.
Speaker Identity
Each participant needs a consistent identity throughout the session. Merging speakers makes it impossible to track individual perspectives.
Name Capture
When participants introduce themselves, their names must be transcribed correctly. Mangled names break searchability and attribution.
Meaning Preservation
Legal positions must be captured accurately. "Wasn't negligent" vs "was negligent" is not a typo — it's a reversal of position.
Key Concepts
Understanding the Metrics
Before diving into results, here's what we measured and why each metric matters for legal and research transcripts.
Speaker Collision
Critical: when a transcript system incorrectly assigns statements from two or more different people to the same speaker ID. This makes it impossible to track individual perspectives.
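One way to make the definition concrete is a minimal sketch that assumes a human-labeled reference exists for each transcript line. The function names and the `(true_speaker, assigned_id)` pair format are hypothetical and are not taken from any of the tested systems; how the study itself counted contaminated lines is not specified.

```python
from collections import defaultdict

def find_speaker_collisions(lines):
    """Return {assigned_id: set of true speakers} for every ID shared by 2+ people.

    `lines` is a list of (true_speaker, assigned_id) pairs, one per transcript
    line. The true speaker comes from a human reference (an assumption).
    """
    people_per_id = defaultdict(set)
    for true_speaker, assigned_id in lines:
        people_per_id[assigned_id].add(true_speaker)
    return {sid: people for sid, people in people_per_id.items() if len(people) > 1}

def contaminated_line_count(lines):
    """Count the lines attributed to any speaker ID that has a collision."""
    collided_ids = find_speaker_collisions(lines)
    return sum(1 for _, assigned_id in lines if assigned_id in collided_ids)

# Illustrative data only, not taken from the study.
example = [
    ("Foreperson", "S1"),
    ("Juror 3", "S1"),   # a different person under the same ID: a collision
    ("Juror 5", "S2"),
    ("Foreperson", "S1"),
]
collisions = find_speaker_collisions(example)
contaminated = contaminated_line_count(example)  # 3 lines carry the collided ID "S1"
```

Counting every line under a collided ID, rather than only the misattributed ones, reflects that a reviewer can no longer trust any statement attributed to that ID.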
Diarization
The process of partitioning an audio stream into segments according to speaker identity. Good diarization means each person gets their own consistent speaker ID.
Meaning Reversal
Critical: when a transcription error changes the fundamental meaning of a statement. "Was not negligent" becoming "was negligent" flips a defense position to a plaintiff position.
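A rough way to screen for this kind of error is to compare negation polarity between a reference sentence and the system output. The sketch below is a crude heuristic under stated assumptions (a small English negation list and a parity check), not the method used in the study; `polarity_flipped` is a hypothetical helper name.

```python
import re

# Matches "not", "no", "never", and contractions ending in "n't" (wasn't, didn't).
# Words such as "cannot" are not covered; this is a deliberately small list.
NEGATION_PATTERN = re.compile(r"\b(?:not|no|never|\w+n't)\b", re.IGNORECASE)

def negation_count(sentence):
    """Count negation words, treating curly apostrophes like straight ones."""
    return len(NEGATION_PATTERN.findall(sentence.replace("\u2019", "'")))

def polarity_flipped(reference, hypothesis):
    """True when one sentence has an odd number of negations and the other an even number."""
    return negation_count(reference) % 2 != negation_count(hypothesis) % 2

flagged = polarity_flipped("The driver wasn't negligent", "The driver was negligent")
```

A flag from this check is only a prompt to re-listen to the audio at that timestamp; it cannot decide on its own which version is correct.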
Name Capture
Accurately transcribing participant names when they introduce themselves. Essential for searchability and attribution in the final transcript.
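Name capture can be checked mechanically when the expected names are known in advance. The following sketch uses Python's standard `difflib` to classify a name as exact, mangled, or missing; the function name, the three status labels, and the 0.6 similarity cutoff are illustrative assumptions rather than the study's scoring rules.

```python
import difflib
import re

def check_name_capture(expected_name, transcript_text, cutoff=0.6):
    """Classify how a self-introduced name survived transcription.

    Returns ("exact", token), ("mangled", closest_token), or ("missing", None).
    Only single-word tokens are compared.
    """
    tokens = re.findall(r"[A-Za-z']+", transcript_text)
    lowered = [token.lower() for token in tokens]
    target = expected_name.lower()

    if target in lowered:
        return ("exact", tokens[lowered.index(target)])

    # Fall back to the most similar token, if any clears the cutoff.
    close = difflib.get_close_matches(target, lowered, n=1, cutoff=cutoff)
    if close:
        return ("mangled", tokens[lowered.index(close[0])])
    return ("missing", None)

status, found = check_name_capture("Ingrid", "Hi, I'm grid from Leeds")
```

Because the comparison is token by token, a name split across two words (as in "Shawan Wanka") would need an extra pass over adjacent token pairs.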
Methodology
How We Ran the Experiment
A blind comparison using identical source audio across four leading transcript systems.
Source Audio
We selected two real multi-speaker recordings: a 35-minute US mock trial deliberation and a 45-minute UK jury deliberation over Zoom.
Blind Processing
The same audio files were submitted to four transcript services without any pre-processing or enhancement. No system received an advantage.
Analysis
We compared outputs for speaker collision, name capture accuracy, and meaning preservation — with timestamps for every finding.
Test 1: US Mock Trial
Controlled conditions
Test 2: UK Virtual Deliberation
Challenging conditions
No system is expected to be perfect; the goal was to compare how ready each system's output is for mock trial analysis.
Results
The Scorecard
We tested 4 categories across 2 test conditions. Here's how each system performed.
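To read the pass rates below, it helps to see which values are possible at all. This sketch assumes each system receives one pass or fail per category per test, giving 8 checks in total; the study does not spell out its scoring, so treat that as an assumption. Under it, every pass rate is a multiple of 12.5%, and the published figures (87%, 13%, 50%, 38%) each fall within half a point of such a multiple.

```python
def possible_pass_rates(categories=4, tests=2):
    """List every pass rate (in percent) reachable with one check per category per test."""
    checks = categories * tests
    return [100 * passed / checks for passed in range(checks + 1)]

rates = possible_pass_rates()
# [0.0, 12.5, 25.0, 37.5, 50.0, 62.5, 75.0, 87.5, 100.0]
```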
Best for Mock Trial Intelligence
87% pass rate
Strongest overall performance across reviewed test samples.
- Speaker count aligned with room structure
- Persistent identities across the session
- <1% observed speaker collisions
- Names and introductions preserved
- Built-in manual speaker cleanup
Critical failures
13% pass rate
- Speaker collision in both tests
- Dropped juror name entirely
- Reversed "wasn't negligent" to "was negligent"
- Merged foreperson with another speaker
Mixed results
50% pass rate
- 51 lines contaminated by speaker collision
- "Shwankar" became "Shawan Wanka"
- Created a ghost speaker: a speaker ID with no real participant behind it
Inconsistent
38% pass rate
- 2 speaker collisions in Test 2
- "Shwankar" became "Juanka"
- "Ingrid" became "grid"
- Only detected 10 of 12+ speakers
Why This Matters
Any system can perform well on clean audio. The real test is consistency under challenging conditions — Zoom compression, multiple accents, crosstalk. That's where most systems fail, and where the difference becomes critical for research validity.
Evidence
Detailed Findings
US Mock Trial — Clean Audio
Ghost Speaker + Name Failure
Strong Speaker Separation, Broken Names
UK Zoom Deliberation — Challenging Audio
Foreperson Contaminated + Juror Fragmented
Juror Merged with Court Staff
Reviewable Confidence
Built for review, not blind trust.
Transcript systems perform differently depending on audio quality, crosstalk, accents, room acoustics, microphone placement, and source recordings. SecureStream is built to expose uncertainty, preserve speaker structure, and keep findings tied back to source audio — so teams can review the moments that matter instead of blindly trusting a flat transcript.
- Lower-confidence segments can be reviewed
- Speaker identities can be corrected in workflow
- Findings stay linked to timestamps and source audio
- Source quality limitations are surfaced, not hidden
Confidence Map
Confidence is exposed, not buried. Reviewers know which moments deserve a second listen.
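As an illustration of the idea, and not a description of SecureStream's actual implementation or API, a review queue can be built from per-segment confidence scores. The `Segment` fields, the 0.0 to 1.0 confidence scale, and the 0.8 threshold are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start_seconds: float   # offset into the source audio
    speaker_id: str
    text: str
    confidence: float      # assumed scale: 0.0 (no confidence) to 1.0 (certain)

def format_timestamp(seconds):
    """Render an offset in seconds as HH:MM:SS so a reviewer can seek to it."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"

def segments_to_review(segments, threshold=0.8):
    """Return segments below the threshold, least confident first."""
    flagged = [segment for segment in segments if segment.confidence < threshold]
    return sorted(flagged, key=lambda segment: segment.confidence)

# Illustrative data only.
session = [
    Segment(12.0, "S1", "I think he was negligent.", 0.95),
    Segment(754.2, "S4", "He wasn't negligent.", 0.55),
    Segment(910.0, "S2", "My name is Ingrid.", 0.70),
]
for segment in segments_to_review(session):
    print(format_timestamp(segment.start_seconds), segment.speaker_id, segment.text)
```

Sorting by confidence puts the segments most likely to contain a collision, a mangled name, or a reversed meaning at the top of the reviewer's list.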
Reviewability matters most when audio conditions are challenging.
When the verdict depends on who said what, you need a transcript you can trust.
Every timestamp in this study is auditable. Every finding is reproducible. See for yourself how SecureStream performs on your recordings.
