The 50% of AI

Recently, I published the audiobook for AI as your Teammate, which I recorded using AI.

I started by paying around $40 to clone my voice. The audio I sent the company was terrible. Just absolutely terrible. It sounded like I was recording my voice from across a room. 

They cleaned up the audio and really created an amazing narrator. It sounded almost exactly like me (which was beneficial as I listened back, as I don’t like the sound of my own voice).

Then, I copied and pasted my book into the platform, and I pressed “Submit.”

It didn’t go well. There were all sorts of formatting errors and other elements that needed to be cleaned up.

So, I went through the whole book, inside the app I used, and I cleaned it up. 

I pressed “Submit” again.

It was better, but it wasn’t perfect. There were logical errors. Emphasis was occasionally placed on the wrong words.

For example, the book uses subheaders. In print, subheaders are set in a different font and weight, and the paragraphs follow. In the AI audiobook, each subheader was copied and pasted as plain text, but it didn’t end with a period, so the AI read it and the first sentence of the following paragraph as one run-on sentence.
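One way to catch the subheader problem before pressing “Submit” is a small preprocessing pass over the manuscript. Here is a minimal sketch in Python. It is not anything the platform provides, and the function names are hypothetical. It assumes the pasted text separates blocks with blank lines, and that a subheader is a single short line (eight words or fewer, an arbitrary cutoff) with no ending punctuation.

```python
import re

# Punctuation that already gives a narrator a natural stopping point.
TERMINAL = (".", "!", "?", ":", "…")
# Closing quotes or brackets that may follow the final punctuation mark.
CLOSERS = "\"'”’)]"


def looks_like_subheader(block: str, max_words: int = 8) -> bool:
    """Heuristic (an assumption): one short line with no ending punctuation."""
    stripped = block.strip()
    if not stripped or "\n" in stripped:
        return False
    core = stripped.rstrip(CLOSERS)
    return len(stripped.split()) <= max_words and not core.endswith(TERMINAL)


def punctuate_subheaders(text: str, max_words: int = 8) -> str:
    """Append a period to likely subheaders so they are read as their own sentence."""
    blocks = re.split(r"\n\s*\n", text.strip())
    fixed = []
    for block in blocks:
        if looks_like_subheader(block, max_words):
            block = block.strip() + "."
        fixed.append(block)
    return "\n\n".join(fixed)


if __name__ == "__main__":
    sample = "Start Small\n\nPick one task and hand it to the AI. See what happens."
    print(punctuate_subheaders(sample))
```

A heuristic like this will occasionally mislabel a short one-line paragraph, so it is worth skimming the result before generating audio.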

Witty remarks I made in the book came out awkward. At one point, regarding the use of data, I say “It’s the laziest person on your team. Hopefully it is, anyway.” A human reads that as a witty remark. The AI said it in a completely flat tone. It took 5 to 10 attempts to get the result I wanted.

Then, the platform, an admittedly very new platform, had a bug. I couldn’t export files. Then another bug. I couldn’t generate new audio. Then another bug.  I couldn’t export files (again). Nothing major, but the bugs stopped me from accomplishing the task at hand.

All in, the process took about 45 days, on and off. It was probably 10 hours of work.

Had I recorded the audio in a sound studio and hired someone on Upwork to edit and export the audio, it would’ve cost me upwards of $600, taken 2.5 hours to record, and required maybe an hour or two to manage the project with the freelancer.

Instead, I spent only $150 but 10 hours.

The AI voice clone was so magical that I thought, “Of course it’ll work to record the book.” From there on, my usage of the AI was like a toddler crawling around and bumping into furniture. I made progress, but not without frequent and sudden setbacks.

In my use of many new AI tools promising the world, there’s a common thread: the AI is literal magic at doing a task, then it sucks at some other part of the task that’s equally important but seems “basic,” and you’re left picking up the pieces.

It seems like I often spend as much or more time finagling the AI than I would’ve spent simply doing the task the old-fashioned way. I don’t have a problem working with platform customer support. Hell, sometimes, our code has bugs! It only bothers me when a platform promises you the world, then seemingly underdelivers.

I was recently talking with a friend who had a similar experience with an audiobook, but using a different service. In our discussion, I called it “the 50% of AI.”

Sometimes, the AI is nothing short of magic, and it’s a 150% improvement on whatever currently exists. Other times, it’s a drag, it’s buggy, and it’s irritating, which is a -100% improvement. Net the two, and you’re left with about 50%.

Is it an improvement? Yes.

If I did another audiobook, would I know what to do, and would it be way easier? Yes.

Is it a perfect-fit solution that changes the way everyone operates forever and ever? No.

If you work with ChatGPT frequently, you’ll notice this phenomenon.  Sometimes, it’s easier to just do the work yourself.

Over time, I’m sure the -100% will get resolved, but I think the question is: when?

Thanks to Miranda Wagner, Shannon Waller and Terise Ryan for reading drafts of this essay.

Feature image credit: DALL-E accessed through ChatGPT with the following prompt: An image of a computer with headphones on in a sound studio recording an audiobook. The sound studio is an old 70’s studio with lots of smoke and wood panel walls. There’s also the room with the sound booth. The computer is smoking a cigarette. illustrative style. The computer is in front of a microphone. There’s a human working on the sound board.
