2026-05-18
How Fast Is My Transcriber on Your Mac?

A quick reference for whisper transcription speed across Mac hardware, the surprise most Intel users don't know about, and the bench script so you can verify on your own machine. Numbers from May 2026.

The short answer

One hour of audio — a typical meeting, a long voice memo, a podcast episode — on the two Macs we benchmarked:

Hardware                | Best backend   | 1 hour of audio takes…
M1 Max / 64 GB          | Metal (GPU)    | ~2 min wall time
Intel i9-8950HK / 32 GB | CPU, 4 threads | ~58 min wall time

Same audio clip, same large-v3-turbo model (1.6 GB, multilingual, what My Transcriber ships on every Mac with 16 GB+ of RAM), same code path the app uses under the hood. The M1 Max finishes a 1-hour file in roughly the time it takes to make coffee. The Intel i9 finishes in roughly the duration of the recording itself — real-time-ish, fine for "drop a file, come back after lunch," painful if you queue up a backlog.

Other Macs land between these two. We didn't get to bench an M2/M3/M4 (or the 14"/16" Apple Silicon variants); if you have one and want to contribute results, the bench harness is at the end of this post.

The Intel-Mac surprise: don't use Metal

The unintuitive finding behind the Intel number above: on an Intel Mac with a discrete GPU (the AMD Radeon Pro 5xx series in 2018–2019 MacBook Pros, for example), the Metal-GPU transcription path is slower than CPU and also crashes reliably on audio longer than ~3 minutes.

"Crashes" means: sona, the small Go program My Transcriber uses to run whisper.cpp, aborts with a Metal-backend assertion failure mid-transcription. Repeatedly. Deterministically. The same audio file runs cleanly on an M1 Max in Metal mode, so this is an Intel + discrete-GPU bug in the underlying whisper.cpp Metal kernels, not something we can fix on the app side.

What My Transcriber does about it: the app picks the right backend for your machine automatically. On Intel with a discrete GPU, that means CPU, 4 threads — the fastest and most stable path. On Apple Silicon, that means Metal. There's nothing in Settings you need to configure. If you've been using My Transcriber on an Intel Mac and noticed transcriptions taking forever (or failing on long files), v0.3 fixes that for you.
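The selection policy described above can be sketched in a few lines of Python. This is an illustration only, not My Transcriber's actual code: BackendChoice, pick_backend, and the has_discrete_gpu flag are hypothetical names. The thread count on Apple Silicon is an arbitrary placeholder, and the fallback for Intel Macs without a discrete GPU is an assumption, since no such machine was benchmarked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackendChoice:
    backend: str    # "metal" or "cpu"
    n_threads: int


def pick_backend(machine: str, has_discrete_gpu: bool) -> BackendChoice:
    """Hypothetical rule mirroring the policy described in the post.

    `machine` is the string returned by platform.machine():
    "arm64" on Apple Silicon, "x86_64" on Intel Macs.
    """
    if machine == "arm64":
        # Apple Silicon: Metal. The thread count barely matters there,
        # so 4 is just a placeholder.
        return BackendChoice("metal", 4)
    if has_discrete_gpu:
        # Intel + discrete GPU: Metal is slower and crashes on long audio.
        return BackendChoice("cpu", 4)
    # Intel without a discrete GPU was not benchmarked; falling back to
    # CPU here is an assumption, not a measured result.
    return BackendChoice("cpu", 4)
```

For example, pick_backend(platform.machine(), has_discrete_gpu=True) on a 2018 Intel MacBook Pro would return the CPU, 4-thread choice. Note that a Python interpreter running under Rosetta reports "x86_64" even on Apple Silicon, so a real implementation would need a more robust hardware check.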

Specifically, the abort described above is GGML_ASSERT(buf_src) failed at ggml-metal-device.m:1561, hit after ~2–3 minutes of audio. Full reproducer and cross-platform comparison: polaris-whisper-crash-long-audio.md.

The Apple Silicon advantage

Apple Silicon's unified memory and well-tuned Metal kernels for whisper add up to a roughly 32× speedup over the best Intel path on the same audio (0.96 vs. 0.03 seconds of wall time per second of audio on the 10-minute clip). Same model, same code, same audio file: the speed gap comes entirely from the hardware. On M1 Max we measured ~0.03 seconds of wall time per second of audio. At that rate a 10-minute file takes about 18 seconds, a 1-hour file ~2 minutes, a 5-hour conference recording ~10 minutes.

Thread count on Apple Silicon doesn't really matter — we ran the same audio through every thread setting from 1 to 10 and the results were within ~5 % of each other. Metal does the actual inference work; the CPU side (tokenizer, beam search) is light enough that one thread or eight makes barely any difference.

Two things do matter: power state, and what else is running.

  • Power state. macOS Low Power Mode throttles the M1 Max GPU 5–6×. macOS also auto-enables Low Power Mode on battery below ~20 %, and we additionally discovered a second silent throttle below ~5 % battery (another ~3× slowdown, even when High Power Mode is explicitly on). For long backlog processing, plug in — otherwise transcription that takes 2 min plugged in can take 30+ min on a draining battery.
  • Other heavy processes. On Intel, transcription is CPU-bound, so a parallel build, a video encode, or a busy browser will slow it down noticeably — sometimes 30–45 %. On Apple Silicon, the transcription work runs on the GPU via Metal, so most CPU load is harmless; the exception is another local AI tool that's also using Metal at the same time (a second whisper, a local LLM, etc.). If your numbers ever drift from the table above, that's usually the cause.
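As a rough check on the power-state numbers above, the two throttles can be treated as multiplying together. That compounding is an assumption made for illustration (the post only reports each slowdown separately), and throttled_minutes is a hypothetical helper name.

```python
def throttled_minutes(baseline_min: float, *slowdown_factors: float) -> float:
    """Multiply a plugged-in baseline by each slowdown factor in turn.

    Assumes the slowdowns compound multiplicatively, which is a
    simplification, not a measured result.
    """
    result = baseline_min
    for factor in slowdown_factors:
        result *= factor
    return result


# 2 min plugged in, Low Power Mode (5-6x), plus the low-battery throttle (~3x)
low_estimate = throttled_minutes(2, 5, 3)
high_estimate = throttled_minutes(2, 6, 3)
print(f"{low_estimate:.0f} to {high_estimate:.0f} min")
```

Under that assumption, a 2-minute job lands in the 30-plus-minute range quoted above.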

What we benched, in detail

Same setup on both machines: large-v3-turbo whisper model (the v0.11 default on 16 GB+ Macs), 60-second test clip (LibriVox public-domain Sherlock Holmes), plus a 10-minute clip from the same source to verify the numbers scale to realistic file lengths. Clean systems — no other heavy processes running, AC plugged in, High Power Mode where the Mac supports it.

Hardware                                  | Best backend          | @ 60 s clip    | @ 10 min clip  | Notes
Intel i9-8950HK / 32 GB / Radeon Pro 560X | CPU, 4 threads        | 0.80 s/s audio | 0.96 s/s audio | CPU 2× faster than Metal here. Metal crashes on audio > ~3 min.
M1 Max / 64 GB                            | Metal (any n_threads) | 0.02 s/s audio | 0.03 s/s audio | Metal 8–22× faster than CPU; gap grows with audio length.

Reported in seconds-of-wall-time per second-of-audio. Multiply by your audio length to get a rough estimate: a 30-minute meeting at 0.03 s/s audio = 54 s wall.
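The multiplication above is easy to script if you want estimates for a batch of files. A minimal sketch in Python follows; the function names are hypothetical, and the rates are the ones reported in this post for the two benchmarked machines, so they will not transfer exactly to other hardware.

```python
def estimate_wall_seconds(audio_seconds: float, rate_s_per_s: float) -> float:
    """Wall time = audio length x (seconds of wall time per second of audio)."""
    return audio_seconds * rate_s_per_s


def format_duration(seconds: float) -> str:
    """Render a duration as whole minutes and seconds."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes} min {secs:02d} s"


# Rates from the 10-minute column of the table in this post
M1_MAX_METAL = 0.03
INTEL_I9_CPU = 0.96

for label, audio_min in [("30-minute meeting", 30), ("1-hour podcast", 60)]:
    audio_s = audio_min * 60
    print(label)
    print("  M1 Max (Metal):", format_duration(estimate_wall_seconds(audio_s, M1_MAX_METAL)))
    print("  Intel i9 (CPU):", format_duration(estimate_wall_seconds(audio_s, INTEL_I9_CPU)))
```

Treat the output as a rough estimate: on CPU the per-second rate itself grows with file length, so long files will run slower than this linear model predicts.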

A subtle thing the 10-minute column reveals: on CPU, whisper's cost per second of audio rises as the file gets longer (total time grows faster than linearly), while on Metal total time scales linearly, on both archs. Intel CPU went from 0.80 to 0.96 s per audio-second (1.2× worse) when the audio grew 10×. M1 Max CPU went from 0.17 to 0.65 (3.8× worse) on the same comparison; the M1's CPU starts much faster but degrades faster too. For very long files (an entire podcast season, a full day's voice memos), the gap above is a lower bound, because CPU paths get slower per audio-second the longer the file.
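The ratios quoted in this section can be reproduced from the reported rates. The short Python sketch below only redoes that arithmetic; the dictionary layout and function names are hypothetical, and the numbers are copied from the table and paragraph above.

```python
# Seconds of wall time per second of audio, as reported in this post
INTEL_CPU = {"60s": 0.80, "10min": 0.96}
M1_CPU = {"60s": 0.17, "10min": 0.65}
M1_METAL = {"60s": 0.02, "10min": 0.03}


def degradation(rates: dict) -> float:
    """How much worse the per-second cost gets from the 60 s to the 10 min clip."""
    return rates["10min"] / rates["60s"]


def speedup(slow_rate: float, fast_rate: float) -> float:
    """How many times faster the path with the lower rate is."""
    return slow_rate / fast_rate


print(f"Intel CPU degradation:  {degradation(INTEL_CPU):.1f}x")
print(f"M1 Max CPU degradation: {degradation(M1_CPU):.1f}x")
print(f"M1 Metal vs M1 CPU, 60 s:   {speedup(M1_CPU['60s'], M1_METAL['60s']):.1f}x")
print(f"M1 Metal vs M1 CPU, 10 min: {speedup(M1_CPU['10min'], M1_METAL['10min']):.1f}x")
print(f"M1 Metal vs Intel CPU, 10 min: {speedup(INTEL_CPU['10min'], M1_METAL['10min']):.0f}x")
```

Because the published rates are rounded to two decimals, the ratios are approximate, especially those that involve the small Metal figures.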

Reproducing on your hardware

We benched exactly one Intel i9-8950HK / 32 GB and one M1 Max / 64 GB. If you have a different Mac and want to know exactly where it lands, the bench script and canonical audio clip are at github.com/rememberthis-ai/local-ai-benchmarks/transcription-bench. Pull via Git LFS to get the LibriVox clip, run ./sona_bench.sh, and you'll get the same 4-arm matrix we did. It takes ~17 minutes on Intel, ~5 minutes on Apple Silicon.

Pre-flight checklist that the script itself reminds you about:

  • Quit My Transcriber (and Remember This if you have it). Concurrent sona processes contaminate results.
  • Plug in. Battery throttles GPU. Power Mode = High if your Mac exposes it.
  • Don't run a big build, cargo compile, etc., in parallel. We caught ourselves doing this once mid-bench, and the numbers were 30–45 % slower until we shut everything down.
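The power part of the checklist can also be checked from a script. The sketch below parses the output of macOS's pmset -g batt command. It is not part of the bench harness, the function names are hypothetical, and the parsing assumes the usual output shape (a "Now drawing from '...'" line followed by a battery line containing a percentage), which may differ across macOS versions. The 20 % threshold comes from the Low Power Mode note earlier in this post.

```python
import re
import subprocess
from typing import List, Optional, Tuple


def parse_pmset_batt(text: str) -> Tuple[Optional[str], Optional[int]]:
    """Extract (power source, battery percent) from `pmset -g batt` output.

    Either value is None if it cannot be found (a desktop Mac has no
    battery line, for instance).
    """
    source = re.search(r"Now drawing from '([^']+)'", text)
    percent = re.search(r"(\d+)%", text)
    return (
        source.group(1) if source else None,
        int(percent.group(1)) if percent else None,
    )


def preflight_warnings(text: str) -> List[str]:
    """Return human-readable warnings; an empty list means power looks fine."""
    source, percent = parse_pmset_batt(text)
    warnings = []
    if source != "AC Power":
        warnings.append("Not on AC power: plug in before benchmarking.")
    if percent is not None and percent < 20:
        warnings.append(f"Battery at {percent} %: macOS may throttle the GPU.")
    return warnings


def read_pmset() -> str:
    """Run `pmset -g batt` (macOS only) and return its text output."""
    result = subprocess.run(
        ["pmset", "-g", "batt"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On a Mac, preflight_warnings(read_pmset()) returns an empty list when the machine is plugged in with a healthy charge. It does not detect concurrent sona processes or background builds, so the other two checklist items still need a manual look.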

Open an issue or PR with your results — we'd love to expand the table above with more Mac tiers. Especially M2/M3/M4 numbers and any Intel + iGPU-only machines (the Polaris-crash finding is dGPU-specific; the older Intel Macs without dGPUs might behave differently).

The bigger picture

My Transcriber is a local-only whisper transcription app. Whisper is the model; everything above is about how fast that model runs on your hardware, with the runtime settings tuned to whatever your Mac can actually use. None of the audio leaves your machine; the model itself ships locally; the only cloud thing in the loop is the auto-updater.

If you're curious about the broader benchmark work this came out of — we also benched 17 LLMs and 18 vision models for our sibling app Remember This — the full write-up of cross-arch local AI on Mac in May 2026 is at rememberthis.ai's blog. The TL;DR for transcription specifically lives here; the broader picture for photo captioning + agentic LLM workloads lives there.


My Transcriber

Free. Local. Private. macOS 15+.

Not sure which Mac you have? Apple menu → About This Mac. "Chip: Apple M..." = Apple Silicon. "Processor: Intel..." = Intel.
