Writing. Thinking. More Writing


AI Audiobooks Will Make Human Narrators Obsolete by 2030

Robot Reading Book into Microphone DALL-E

Artificial Intelligence (AI) Audiobooks will make human narrators obsolete by the end of the decade (2030). That’s a prediction, implying an educated guess, but it’s almost a certainty. And whether it’s 2029 or 2031 doesn’t really matter; what matters is that an entire profession will either vanish or be so greatly diminished it will seem quaint. When I first started thinking about this series, and before doing much research, I thought that human obsolescence so soon was unlikely. Now I think it’s conservative. This profession is under immediate threat from AI, and will disappear without regulation or protections of some kind.

What is Audiobook Narration?

While there are many markets for voice-overs and narration, ranging from video or PowerPoint augmentations to political commercials, the narrations that matter for novelists are audiobooks–the purely aural version of novels that are increasingly popular as people ride, drive and generally multi-task their way through life. You can find thousands (if not millions) of these audiobooks on Amazon.com and its subsidiary, Audible. Because audiobooks rely on a finished creative product (the written novel), and because text-to-speech technologies are already advanced and cheap, human narrators are among the artists most susceptible to replacement by AI even as the market for audiobooks is booming.

Audiobook market forecast (Grand View Research)

Ironically, I doubt that these billions in audiobook sales are driving much of the AI voice technology market. It’s far more likely that the underlying technology is being driven by the cost savings of using deepfake actors and talent, by AI voices overlaid on real or faked news presenters, actors, etc., and by massive big-tech interest in the field.

The net effect of these parallel developments will be that the quality of AI voices will rise rapidly while the cost will plummet–making AI audiobooks an easy market where book purchases might simply come with audiobooks for free.

Current AI Audiobook Offerings

This is a quick and dirty sampling of a few current and leading offerings, with more provided toward the end of this post. The provider names don’t mean much; in tech, there’s always a burgeoning of small companies making innovative offerings in a given space, followed by a glut, excessive me-too VC investment, price wars and then consolidation. It’s likely the usual big names–Google, Amazon, etc.–will drive others out of the market or simply buy the competition, leaving the standard tech oligopolies as the dominant offerings. Which makes no difference to audiobook narrators–you’ll be out of business soon, no matter what the AI competition is called.

Murf.ai, for instance, offers more than a hundred voices across twenty languages (though it’s not targeted at novels). All of them suffer from the flat, uninflected sound of computer-generated systems, and the pronunciation is sometimes…lacking. But compared to five years ago, these voices are miraculous, and very similar to Google’s ebook voices (though a small sampling from Google seemed to be better at unusual word pronunciation).

Murf.ai Screen Capture 2022

Murf.ai showing partial selection of sample voices by age, gender and language

Are these voices good enough for audiobooks today? Absolutely, but they’re not yet as good as human voices; they’re simply good enough for many readers who are willing to accept a less-than-human product for a substantially discounted price (and having an audiobook that might not otherwise exist). And that’s all it takes to start cutting into the human voice artist market.

Here’s one example of Google’s AI audiobook voices. Not great, but not terrible either:

 

For self-published authors like myself, each audiobook done by a professional human narrator costs at least $3,000 and will never pay for itself in audiobook sales. On Google Books, a lesser-but-still-ok product is completely free (though it takes some time to customize and check). This is the inflection point all AI-generated art will go through just before it dominates the market–it will suddenly be good enough, and far less expensive than human art. At that point, without some kind of intervention, the human marketplace for that type of art will start to collapse.

So, without further ado, here’s my prediction of how the human audiobook narration market will collapse (unless something is done):

Human Narrators Obsolete by 2030

| Date | Prediction | Cost & Examples |
|---|---|---|
| Recent past | Inferior AI product: excellent sound quality; clearly a computer; bad pronunciation, timing; no inflection or emotion; not commercialized | NA |
| 2022 | Good-enough AI product: excellent sound quality; clearly a computer; okay pronunciation, timing; some inflection & emotion; somewhat commercialized | $0 Free (Google Books); $-$$ Services (Speechki). Minor Market Disruption (Addition); Inflection Point (Viability) |
| 2023-24 | Human-parity AI product: excellent sound quality; indistinguishable from human; excellent pronunciation, timing; full inflection & emotion; more commercialized | $0 Free (Google Books); $0 (Other Retailer); $-$$ Services (Speechki); $0 Early Apps (MS Office); $-$$$ Platforms (Azure); $0 Open Source |
| 2025 | Superior AI product: excellent sound quality; indistinguishable from human; perfect pronunciation & timing; broadly commercialized + Licensable Human Voices; Platform Agnostic Sources; Multi-Version Audiobooks | $0 Free (Google Books); $0 (Other Retailer); $-$$ Services (Speechki); $0 Viable Apps (MS Office); $-$$$ Platforms (Azure); $0 Open Source. Moderate Market Disruption; Low-Mid Market Cannibalization |
| 2025-30+ | All of the above + Multi-voice Readings; Personalization to Reader; Other Features TBD | $0 Free (Google Books); $0 (Other Retailer); $-$$ Services (Speechki); $0 Viable Apps (MS Office); $-$$$ Platforms (Azure); $0 Open Source. Major Market Disruption; Practical Human Obsolescence |

What will remain after 2030 or so are premier human offerings from large traditional publishers, used with their bestselling authors to increase perceived value and margins. Stephen King will get a human narrator and a “narrated by [human name] (verified human!)” sticker, but the vast majority of self- and traditionally-published books will get very inexpensive but exceptional AI-produced audiobooks.

Some Detail on AI Audiobook Predictions

Good Enough? This is entirely subjective, but refers to a very specific case. Google Books narration quality is flat, there aren’t enough voice options, and the resulting product is nowhere near professional human quality, but it’s not terrible. Trad publishers and self-published authors with enough money and sales to pay for human-narrated audiobooks won’t go anywhere near Google’s offering. But there is a market that will: self-published authors with relatively small sales who see incremental revenues by offering an AI audiobook to readers who will rarely or never buy print or digital books. I suspect that even for these authors, the bad reviews they’ll get on the audiobooks might discourage use where it can be avoided, but that doesn’t matter; AI market disruption has already started.

Patch McQuaid, founder of iD Audio, which produces audiobooks for major publishers, says the audiobook trend isn’t just driven by publishers. Consumers, he says, “expect there to be an audio version of pretty much everything that’s released.” – Input Mag

Google showcases some short clips in multiple languages of its text-to-speech technology here: cloud.google.com/text-to-speech/docs/voices. And many of those are, if not great, good enough.

Sound Quality. This refers to the overall quality of the audiobook product, including clarity of voice, background noise, volume and tone settings, etc. Human narrators have to have or get access to sound-proof studios or at least very quiet recording environments, purchase expensive recording and editing equipment and software, calibrate these systems, etc., in order to get an Audible-worthy sound recording. Computers don’t need any of this; they can produce perfect digital sound every time, simply because nothing is produced in the “real world.” This alone will make it nearly impossible for humans to compete.

Inflection and Emotion. These are two of the greatest challenges for computer-generated voices and narrations. AI voices sound flat and inhumanly monotone, devoid of emotion corresponding to the text or even the personality of the speaker. This can make novels extremely boring and hard to listen to. Which is why Murf.ai and many other offerings allow you to “coach” a cloned system based on your desired style; however, this is in its infancy and will take a year or two to produce human-sounding voices (predicted by 2024) and possibly longer to become widespread. Companies like DeepZen and Speechki are focused on this exact problem, but the greater challenge is having the AI learn what part of your prose deserves which emotion and inflection without tagging or telling it each time.

DeepZen Screenshot

DeepZen has a finished book offered for consideration, The Korean War by Jeremy Maxwell, narrated in the voice of Edward Herrmann. The emotional content and inflection aren’t bad at all.

Licensable Human Voices. Many of you will know that James Earl Jones licensed his voice as Darth Vader to be used in future productions of Star Wars. The same thing will happen in the book world, where people license their voices to be used as narrators (or characters) in books they might never have heard of. This will be one of the great disrupters in the market, allowing authors (or readers) to have AI-replicated versions of voice actors for their books for the same or lower cost as unheard-of human narrators today.

Platform Agnostic Sources. Right now, you can create an audiobook on Google and download it, but Amazon / Audible won’t accept it on their platforms because the audio quality is…computery. Once AI reaches human equivalence, any audiobook file produced anywhere can presumably be used on any platform (so it could occur even earlier than shown), but for now Google’s audiobooks are for Google Books only. I’m hoping non-retailer applications like MS Word offer a viable alternative in the near future, and Microsoft clearly hopes its Read Aloud feature will do the trick.

Multi-Version Audiobooks. Let’s say you want to offer your audiobook in voices that appeal to a male demographic, a female audience, and English-speaking Latinx teenagers. That means three human narrators or one very talented person trying to satisfy multiple audiences, both of which mean time and money. In contrast, there’s no reason AI can’t produce multiple versions of your audiobooks, each of which appeals to a market segment, for the same or similar price as one narrator. Readers can then pick which one appeals to them rather than being stuck with a single generic version.

Multi-Voice Audiobooks. Let’s say you’re reading Game of Thrones because you’ve recovered from the horror of the final season on HBO. The chapters are broken down by character, so why shouldn’t the audiobook narration? Today, with AI products like Google’s, you get one narrator per book. But ideally, you could get one voice per chapter or even per character (changing each time someone speaks). This will truly bring audiobooks to life (well, not literally; that would be terrifying). To get this with human narrators, you’d probably need several voice actors and many thousands of dollars. With AI, multi-voice books will soon be fast to create, low-cost and available with hundreds…or millions of possible voices.
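The one-voice-per-chapter idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the voice identifiers, the `assign_voices` helper and the placeholder chapter text are all hypothetical, and no real text-to-speech service's API or voice names are implied. It assumes chapters are titled by point-of-view character, as described above, and falls back to a generic narrator voice for any character without an assigned voice.

```python
from typing import Dict, List, Tuple

# Hypothetical voice identifiers, invented for this sketch.
VOICE_BY_CHARACTER: Dict[str, str] = {
    "ARYA": "voice_young_female",
    "TYRION": "voice_wry_male",
}
DEFAULT_VOICE = "voice_narrator"


def assign_voices(chapters: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Pair each chapter's text with a voice chosen from its title.

    `chapters` is a list of (title, text) tuples, where the title is
    assumed to be the point-of-view character's name.
    """
    plan = []
    for title, text in chapters:
        # Normalize the title so "Arya" and "ARYA" map to the same voice.
        voice = VOICE_BY_CHARACTER.get(title.strip().upper(), DEFAULT_VOICE)
        plan.append((voice, text))
    return plan


if __name__ == "__main__":
    book = [
        ("Arya", "Placeholder text for the first chapter."),
        ("Bran", "Placeholder text for the second chapter."),
    ]
    for voice, text in assign_voices(book):
        print(f"{voice}: {text}")
```

Switching voices every time a character speaks is a harder problem, because the software first has to work out who is speaking each line of dialogue; the sketch above deliberately avoids that step.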

Personalization to Reader. This makes the tools of AI used to create or select narration details available directly to the reader. Rather than choosing from a pre-set list of audiobooks, the reader can choose the voice, accent, age, emotionality and even race of the narrator and individual book characters, choosing from a list of default and premier options. They might even be able to add a background soundtrack to pump up the drama. Readers then download this unique book to their listening device and have a completely personalized experience. This is nearly impossible for human narrators, unless multiple people read each book in multiple ways, which would make the book prohibitively expensive.

Other Features TBD. I have no doubt there are creative minds out there coming up with far more audiobook features than I have. Maybe you have an idea of your own? How would AI enable that idea to be realized?

A Proposal to Protect the Human Novel

All of the above doomsaying is leading up to a proposal–that we formally lobby Congress and other legal bodies to protect human art and artists in a very specific way:

  1. As mentioned above, make it illegal to copyright work created wholly, largely, or substantially by machines (AI or otherwise).
  2. Require all machine-generated art to be labeled something like “Made by Machine” or “Created by AI” and include the platform it was created on.
  3. Establish serious legal penalties for trying to pass AI or machine-generated art off as human.
  4. Work to make this standard international law, to protect all human artists, everywhere.

I’ll have much more on this later, but for now, can you take a moment to sign the Change.org petition to Protect Human Art & Artists from Artificial Intelligence (AI)?

Protect Human Art

I’d love to know what you think of this idea in general. Would you back it, or not? What concerns do you have? What other ideas do you have to protect human art and artists?

Thank you!

AI Audiobook Providers

If you’d like more information on current AI audiobook service providers, here’s a start, but it’s by no means all-inclusive. There are simply too many new offerings to track, and more coming out every month.

| Provider | Voice Samples | Audiobook Pricing |
|---|---|---|
| Human Baseline | There will usually be several samples of a single narrator or group, depending on service. | According to Scribe: “If you want to make an audiobook, plan on spending anywhere between $2,500 and $3,750 for a five-hour book. That breaks down to about $500 to $750 per finished hour of audiobook content.” |
| Audiobook | Only 1 sample of their 146 voices in 43 languages. | Unknown. |
| DeepZen | 15 samples, male and female, in US and UK accents, in addition to an emotion comparison sample that is excellent. | Managed Service includes a pronunciation check, proofing review, customer quality review and 2 corrections stages, post-processing and mastering ready for distribution. The turn-around is 1 to 2 weeks. The cost per finished hour is $129.00 / £99.00. |
| Google Books | More than 30 voices, all of which can be found here. | Free. That’s $0 per finished hour. |
| Murf | More than 100 examples in 20 languages, multiple genders and accents. | Subscription-based pricing, so costs vary from $6 – $26+ / month. |
| Speechki | More than 360 synthetic voices in 77 languages and dialects including dozens of US and UK accented English voices. | The basic cost is $500 or $1000 per book (depending on which voice you choose) or $120 for each hour of audio production. Pricing includes human proof-listening and corrections. |
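To make the per-hour figures above easier to compare, here is a rough Python sketch that multiplies each quoted rate by a book's finished length. The rates come straight from the providers listed above; the function names and the five-hour example are my own assumptions for illustration. Subscription and flat per-book pricing (Murf, Speechki's $500 or $1000 option) are left out because they don't reduce to a simple hourly rate.

```python
from typing import Dict

# Per-finished-hour rates in USD, as quoted in the provider list above.
RATES_PER_HOUR: Dict[str, float] = {
    "Human (Scribe, low end)": 500.0,
    "Human (Scribe, high end)": 750.0,
    "DeepZen managed service": 129.0,
    "Speechki (hourly option)": 120.0,
    "Google Books": 0.0,
}


def production_cost(rate_per_hour: float, finished_hours: float) -> float:
    """Cost of one audiobook at a flat per-finished-hour rate."""
    return rate_per_hour * finished_hours


def compare(finished_hours: float) -> Dict[str, float]:
    """Cost of the same book under every rate in RATES_PER_HOUR."""
    return {
        name: production_cost(rate, finished_hours)
        for name, rate in RATES_PER_HOUR.items()
    }


if __name__ == "__main__":
    # Five finished hours matches the book length in the Scribe quote.
    for name, cost in compare(5).items():
        print(f"{name:<26} ${cost:>9,.2f}")
```

For a five-hour book the human range reproduces Scribe's $2,500 to $3,750, which is a handy sanity check on the arithmetic. Most novels run longer than five hours, so the gap in absolute dollars grows with length.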

AI Audiobook Provider Details

Just a little more detail on the services listed above. I suspect each offering has pros and cons that will make them more or less applicable to a given book length, genre, style, etc., and you should do more research before picking a solution for your next novel.

Audiobook / Audiobook AI Service

Audiobook offers AI audiobook solutions targeted at both authors and publishers, and is heavily marketing (or at least advertising) its solutions.

Audiobook.ai Value Comp

This type of marketing is typical for AI audiobook services, clearly targeting the human competition as slower, more expensive and more error prone…which it probably is.

Audiobook Workflow

There is only one sample on the site, which is fine and seems mostly human, but it would be nice if they offered more of the 146 voices to review. Pricing was not obvious.

DeepZen / Audiobook AI & Voiceover AI Service

DeepZen offers a range of samples and excerpts of its audiobook AI offerings:

DeepZen Audio Samples

You can also find multiple full-book examples here, including

DeepZen Audiobook Samples

And DZ also offers both managed and automated services with varying features and costs, and presumably quality:

DeepZen Ai Audiobook Services

Google (Play) Books / “Auto-Narrated Audiobooks”

Probably the player everyone assumes will naturally dominate this space, Google has thus far made its audiobooks free for Google Play Books publishers (and these books can be used on any other service that will take them), but the sound quality is surprisingly flat and emotionless–perhaps reflecting a feature-reduced version of what they could release if so desired.

I suspect this offering will evolve quickly if Google puts resources behind it, though it’s always possible they’ll let it flounder like so many other programs they’ve lost interest in over time.

Murf / Voiceover AI Service

See details above. Murf offers more than 100 samples of their AI voices, covering 20 languages, two genders and multiple accents.

Murf.ai Screen Capture 2022

Pricing is on a subscription basis, with time sufficient for a novel-length audiobook starting around $13 / month. Though you’ll notice they throttle voices and other features by subscription plan, and I suspect the best voices are kept for the more expensive options.

Murf Subscriptions

Speechki / Audiobook Narration Service

Speechki claims to be able to produce a human-quality audiobook in about fifteen minutes, though they focus mostly on shorter non-fiction. Here’s the founder talking about what they offer:

And here (below) is their audiobook sample. “Your audiobook could sound like this!” Note that the voice has substantially more emotion and human inflection than the standard Murf.ai or Google Books samples (which I assume is why the cost is substantially greater than the latter).

Two key points from their FAQs:

  • Speechki provides the recording with more than 360 synthetic voices in 77 languages and dialects including dozens of US and UK accented English voices. Speechki does not have human narration available.
  • The basic cost is $500 or $1000 per book (depending on which voice you choose) or $120 for each hour of audio production. Pricing includes human proof-listening and corrections.

Related Stuff

These are just a few links for people who’d like more information on the topic of AI narrated audiobooks. You can tell the market is immature by how amateurish much of this content is, but that will also change rapidly over the next few years.

The Creative Pen / AI Narration for Audiobooks

A quick if visually boring update on AI Narration from YouTube. It’s a good introduction to the topic.

But the related article is far better IMHO if you don’t mind, you know, reading.

You can listen to narrators and voice artists discuss AI narration on the VOBoss Podcast. They have lots of different episodes on the topic, and there are a variety of views from ‘it should not be allowed,’ to ‘it all sounds terrible and AI can never communicate like a human,’ to ‘we should embrace AI and license our voices and use the tools to spread our personal brand and make more money.’ As ever, there are always varied opinions on new tools and technologies. Also, check out VoiceBot.ai which also has a podcast discussing these issues in depth. – The Creative Pen

M.K. Williams / How to create an AI narrated audiobook…

A short and pleasantly amateur first-time lesson on how to create a Google Audiobook that takes you through a bit of the process.

Publishers Weekly / AI Comes to Audiobooks

A great overview of audiobook AI, including value proposition and challenges to the industry.

The professionals in this business are represented by a powerful union, SAG-AFTRA (Screen Actors Guild–American Federation of Television and Radio Artists), which describes itself as “the world’s largest labor union representing performers and broadcasters.”

The union has two concerns about AI. On the one hand, replacing live actors with computerized voices is bad for business. Of great concern also is AI’s increasing capacity to clone human voices, with substantial risk that the voice owner will either be under-compensated or not paid at all.

This article also includes several more vendors than listed here.

The vendors profiled here make it clear that they are not trying to replace narration on top-selling frontlist and backlist titles. The opportunity is in the deeper backlist, where a $500 or $1,000 audiobook investment might make financial sense.

More to Come

Peace.
