It’s fun doing interviews, but who actually likes transcribing them afterwards? The novelty wears off after about 5 minutes. Unless you’re a fast and accurate typist, this activity is also incredibly time-consuming. Fortunately, it’s now possible to get near-perfect transcripts at a very low cost.
In my previous post, I explained how you could use software like Dragon Dictate to write with your voice. However, that software doesn’t work properly with anyone else’s voice – it’s trained to understand your tone, speech patterns, and vocabulary.
This time, I’ll introduce you to three tools that can handle multiple voices. They’re ideal for transcribing interviews, events, or meetings.
Rev is perhaps the best-known transcription company. Using a combination of AI and humans, they achieve at least 99% accuracy. You upload your audio or video file, choose your service, then a transcript lands in your inbox within 12 hours.
The price is $1.25 per minute, so a 20-minute interview would cost $25. You can pay an extra $1 per minute if you need it faster. To get it back in 3 hours, that 20-minute interview would cost $45. There’s a useful calculator to show exactly how much you’d pay and how long it might take. They’ll handle multiple speakers, specialist terminology, difficult audio, and accents at no extra cost.
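If you’d rather check the sums yourself before uploading a batch of recordings, here’s a minimal sketch in Python. It assumes the rates quoted above still apply (Rev’s pricing may well have changed), and the function and constant names are my own invention, not anything Rev provides.

```python
# Rates as quoted in this post; treat them as assumptions, not current pricing.
STANDARD_RATE = 1.25   # dollars per minute of audio
RUSH_SURCHARGE = 1.00  # extra dollars per minute for faster turnaround


def rev_cost(minutes: float, rush: bool = False) -> float:
    """Estimate the cost in dollars of transcribing a recording."""
    rate = STANDARD_RATE + (RUSH_SURCHARGE if rush else 0.0)
    return round(minutes * rate, 2)


print(rev_cost(20))             # 25.0
print(rev_cost(20, rush=True))  # 45.0
```

The two printed figures match the 20-minute example: $25 at the standard rate and $45 with the rush surcharge.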
I used Rev myself a few years ago and the transcript was very high quality. I’ve also heard good reports from several other users. My big concern is the humans behind Rev. If you’re only paying $1.25 per minute, they’re receiving a tiny amount for work that demands high accuracy. The company has recently reduced the rates paid to transcribers to remain competitive.
Even though the rates are reasonable (at least for users), the cost does mount up if you have a lot of material to transcribe. If so, you might want to investigate some DIY options, such as Descript and Sonix.
Although aimed mainly at podcasters, Descript is a great tool for any type of transcription. There’s an AI-only version where everything is automated, or you can pay an extra $2 per minute if you want a human to check it. Given that this is well above Rev’s $1.25 per minute, I’m assuming the humans get paid better.
For DIY transcription, you upload your file and wait while the virtual elves do their work. It takes a couple of minutes for a 10-minute recording. I found the accuracy to be about 90%. It’s affected by the quality of the recording and also your accent. Occasionally, I achieved better results by adopting an (atrocious) American accent. Unlike Dragon Dictate, it doesn’t learn your voice.
The cost of the basic package is $12 per month, which includes 10 hours of transcription. I’ve looked at a lot of AI transcription tools and this is an amazing deal. There’s also a generous free trial with three hours’ transcription included.
You can either use Descript through their website or download the desktop app. The app is a lot more stable as you’re not relying on a web browser to handle large files. However, the software needs at least 20 GB of free disk space to work properly.
If you do happen to be a podcaster, there are heaps of other features to make editing much easier for you, such as overdubbing and creating speech from text. When you spot a mistake in your audio recording, you can correct it by typing. Descript uses other examples of your speech to recreate the words. Yes, it’s very clever! I’ve also used it for generating video captions.
I was a dedicated user of Descript for a few months. Unfortunately, it has one major drawback for me: there’s no custom dictionary. This means you’ll have to go through and manually correct any specialist vocabulary or words that Descript routinely mistranscribes. This is unlikely to be a problem for many users, but it’s a limitation for more technical people. I think they’ll add this feature at some point. Sadly, in the meantime, it slows me down too much.
Sonix works in a similar way to Descript, but it has one big advantage: a custom dictionary. You can preload up to 400 words and Sonix transcribes them perfectly. It can also handle 30 different languages. I achieve around 93% accuracy and the excellent web-based software makes correcting transcripts really speedy.
The pay-as-you-go option costs $10 per hour. In return for a monthly subscription of $22, you’ll pay only $5 per hour, and you get various other features for organising and sharing your transcripts. There’s a free trial with 30 minutes’ transcription included (if you subscribe after clicking this link, we both get 100 minutes free).
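Whether the Sonix subscription is worth it depends on how many hours you transcribe each month. Here’s a rough Python sketch for working out the break-even point, assuming the rates quoted above and ignoring the value of the extra subscriber features. The names are purely illustrative.

```python
# Sonix rates as quoted in this post; treat them as assumptions.
PAYG_RATE = 10.0         # dollars per hour, pay-as-you-go
SUBSCRIPTION_FEE = 22.0  # dollars per month
SUBSCRIBER_RATE = 5.0    # dollars per hour for subscribers


def payg_cost(hours: float) -> float:
    """Monthly cost in dollars without a subscription."""
    return hours * PAYG_RATE


def subscription_cost(hours: float) -> float:
    """Monthly cost in dollars with a subscription."""
    return SUBSCRIPTION_FEE + hours * SUBSCRIBER_RATE


def break_even_hours() -> float:
    """Hours per month at which both options cost the same."""
    return SUBSCRIPTION_FEE / (PAYG_RATE - SUBSCRIBER_RATE)


print(break_even_hours())                   # 4.4
print(payg_cost(3), subscription_cost(3))   # 30.0 37.0
print(payg_cost(6), subscription_cost(6))   # 60.0 52.0
```

On these numbers, pay-as-you-go is cheaper below about 4.4 hours a month, and the subscription wins above that.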
I mainly use Sonix for transcribing videos and creating captions. It suits my workflow really well and I can even export my transcripts and captions in different languages – it magically translates all the text in a matter of minutes. This is an extraordinary feature that’ll be a huge benefit to some users.
These three tools all do the same job, but in different ways. As ever, what you use depends on your budget and requirements. Rev is a good choice if you need high accuracy, can wait up to 12 hours, and have the money. Descript is perfect if you want to do it yourself and don’t mind repeatedly correcting any technical terms. It’s definitely the winner for anyone who wants to edit audio files, too. For multi-language support and the custom dictionary, Sonix is the frontrunner.
I flit between Sonix and Descript, depending on my project. Maybe one day there’ll be a transcription tool that does everything perfectly. For now, give them a try and decide which is best for you.