To build your own STT (speech-to-text) tool with a local model and modify it, just ask Claude Code to build it for you with this workflow:
F12 -> sox for recording -> temp.wav -> faster-whisper -> pbcopy -> notify-send to know what’s happening
https://github.com/sathish316/soupawhisper
I found a Linux version with a similar workflow and forked it to build the Mac version. It took less than 15 minutes to ask Claude to modify it to fit my needs.
F12 Press → arecord (ALSA) → temp.wav → faster-whisper → xclip + xdotool
https://github.com/ksred/soupawhisper
Thanks to faster-whisper and quantized local models, I now use it everywhere I previously used Superwhisper: Docs, Terminal, etc.
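For anyone who wants to see the shape of those two pipelines before handing them to Claude, here is a minimal sketch in Python. It is not the code from either repo: the 16 kHz mono settings, the `base` model size, the `int8` compute type, and all function names are assumptions chosen for illustration, and hotkey handling (F12) is left out.

```python
import subprocess
import sys
from typing import Dict, Iterable, List

# Commands follow the two workflows described above; the exact flags
# (16 kHz, mono, 16-bit) are assumptions, not the repos' settings.
COMMANDS: Dict[str, Dict[str, List[str]]] = {
    "darwin": {
        "record": ["sox", "-d", "-r", "16000", "-c", "1"],
        "copy": ["pbcopy"],
    },
    "linux": {
        "record": ["arecord", "-f", "S16_LE", "-r", "16000", "-c", "1"],
        "copy": ["xclip", "-selection", "clipboard"],
    },
}


def platform_key(platform: str = sys.platform) -> str:
    """Map sys.platform to one of the two command sets."""
    return "darwin" if platform == "darwin" else "linux"


def record_command(wav_path: str, platform: str = sys.platform) -> List[str]:
    """Build the recorder command, with the output file appended."""
    return COMMANDS[platform_key(platform)]["record"] + [wav_path]


def start_recording(wav_path: str, platform: str = sys.platform) -> subprocess.Popen:
    """Start recording on key press; call .terminate() on the result at key release."""
    return subprocess.Popen(record_command(wav_path, platform))


def join_segments(texts: Iterable[str]) -> str:
    """Join segment texts into one string, dropping empty segments."""
    return " ".join(t.strip() for t in texts if t.strip())


def transcribe(wav_path: str, model_size: str = "base") -> str:
    """Transcribe locally with faster-whisper using a quantized (int8) model."""
    # Imported lazily so the helpers above work without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(wav_path)
    return join_segments(segment.text for segment in segments)


def deliver(text: str, platform: str = sys.platform, type_into_window: bool = False) -> None:
    """Copy the transcript to the clipboard; on Linux, optionally type it with xdotool."""
    key = platform_key(platform)
    subprocess.run(COMMANDS[key]["copy"], input=text.encode("utf-8"), check=True)
    if type_into_window and key == "linux":
        subprocess.run(["xdotool", "type", "--clearmodifiers", "--", text], check=True)
```

Usage would look roughly like `proc = start_recording("temp.wav")`, then `proc.terminate(); proc.wait()` when the key is released, then `deliver(transcribe("temp.wav"))`. Loading the model once at startup instead of per call would cut latency noticeably.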
I was searching for this just this morning and settled on https://handy.computer/
i've used macwhisper (paid), superwhisper (paid), and handy (free) but now prefer hex (free):
https://github.com/kitlangton/Hex
for me it strikes the balance of good, fast, and cheap for everyday transcription. macwhisper is overkill, superwhisper too clever, and handy too buggy. hex fits just right for me (so far)
There's also an offline-running software called VoiceInk for macos. No need for groq or external AI.
https://github.com/Beingpax/VoiceInk
I just vibe coded my own NaturalReader replacement. The subscription was $110/year... and I just canceled it.
Chatterbox TTS (from Resemble AI) does the voice generation, WhisperX gives word-level timestamps so you can click any word to jump, and FastAPI ties it all together with SSE streaming so audio starts playing before the whole thing is done generating.
There's a ~5s buffer up front while the first chunk generates, but after that each chunk streams in faster than realtime. So playback rarely stalls.
It took about 4 hours today... wild.
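The streaming part is roughly this idea: split the text into chunks, synthesize one chunk at a time, and push each chunk to the browser as a server-sent event as soon as it is ready. The sketch below is an illustration, not the code from this project: `synthesize` is a hypothetical callback standing in for the Chatterbox call, and the payload fields (`index`, `text`, `audio_b64`) and the `done` event name are made up for the example.

```python
import base64
import json
import re
from typing import Callable, Iterator, List


def split_into_chunks(text: str, max_chars: int = 200) -> List[str]:
    """Group whole sentences into chunks of at most max_chars where possible."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks


def sse_events(
    text: str,
    synthesize: Callable[[str], bytes],  # hypothetical TTS callback: text -> audio bytes
    max_chars: int = 200,
) -> Iterator[str]:
    """Yield one SSE message per chunk as soon as its audio has been generated."""
    for index, chunk in enumerate(split_into_chunks(text, max_chars)):
        audio = synthesize(chunk)
        payload = {
            "index": index,
            "text": chunk,
            "audio_b64": base64.b64encode(audio).decode("ascii"),
        }
        yield f"data: {json.dumps(payload)}\n\n"
    # Tell the client that no more chunks are coming.
    yield "event: done\ndata: {}\n\n"
```

In FastAPI the generator can be returned via `StreamingResponse(sse_events(text, synthesize), media_type="text/event-stream")`. Because the first message is only sent after the first chunk is synthesized, a smaller first chunk should shorten the up-front buffer.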
Since many are asking about apps with similar capabilities: I’m very happy with MacWhisper. It has Parakeet, which gives near-instant transcription of my lengthy monologues. All local.
Edit: Ah, but I think Parakeet isn’t available for free. Still a very worthwhile single-purchase app!
For those using something like this daily, what key combinations do you use to record and cancel? I’m using Caps Lock right now but was curious about others.
Sounds like there's plenty of interest in these kinds of tools. I'm not a huge fan of API transcription given how good local models are.
I built https://github.com/bwarzecha/Axii to keep EVERYTHING local and fully open source, so it can easily be used at any company. No data is sent anywhere.
This thread is a beautiful intro to our near future: more and more custom-coded software. Takes me back to the late 90s. Loving this!
https://github.com/rabfulton/Auriscribe
My take for X11 Linux systems. Small and low dependency except for the model download.
I created Voibe which takes a slightly different direction and uses gpt-4o-transcribe with a configurable custom prompt to achieve maximum accuracy (much better than Whisper). Requires your own OpenAI API key.
https://github.com/corlinp/voibe
I do see the name has since been taken by a paid service... shame.
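For reference, the custom-prompt approach can be sketched with the official OpenAI Python SDK as below. This is not Voibe's code: `build_prompt`, `transcribe_with_prompt`, and the prompt wording are assumptions for illustration, and the client reads the API key from the `OPENAI_API_KEY` environment variable.

```python
from typing import Sequence


def build_prompt(vocabulary: Sequence[str], style_hint: str = "") -> str:
    """Combine a style hint and a vocabulary list into one transcription prompt."""
    parts = []
    if style_hint:
        parts.append(style_hint.strip())
    if vocabulary:
        parts.append("Vocabulary: " + ", ".join(vocabulary) + ".")
    return " ".join(parts)


def transcribe_with_prompt(wav_path: str, prompt: str, model: str = "gpt-4o-transcribe") -> str:
    """Send a recording to the OpenAI transcription endpoint with a custom prompt."""
    # Imported lazily so build_prompt works without the openai package installed.
    from openai import OpenAI

    client = OpenAI()  # picks up OPENAI_API_KEY from the environment
    with open(wav_path, "rb") as audio:
        result = client.audio.transcriptions.create(
            model=model,
            file=audio,
            prompt=prompt,
        )
    return result.text
```

A prompt such as `build_prompt(["Parakeet", "faster-whisper"], "Use British spelling.")` is one way to nudge the model toward names and jargon it would otherwise misspell.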
Do any of these work as an iOS keyboard to replace the awful voice transcription Apple is currently shipping?
Utter uses your OpenAI key (~$1/month). https://utter.to/. Has an iPhone app.
Saved you a click: Mac only and actually Grok; local inference too slow.
Won't be free when xAI starts charging.
MacOS only. May this help you skip a click.
Nice! I vibe coded the same thing this weekend, but for OpenAI and less polished: https://github.com/sonu27/voicebardictate
Is it possible to customise the key binding? Most of these services let you customise the binding, and also support toggle for push-to-talk mode.
Does anyone know of an effective alternative for Android?
Seeing this thread, it sounds like a blog post comparing the offerings would be useful.
Could you make it use Parakeet? That's an offline model that runs very quickly even without a GPU, so you could get much lower latency than using an API.
Is there a tool that preserves the audio? I want both the transcript and the audio.
title lacks: for Mac
Anything similar for iOS?
Spokenly?