What could possibly go wrong?
Call From Mom
Meta, formerly Facebook, has unveiled a “breakthrough” new text-to-speech AI that can edit existing audio, speak in six languages, and — in a more unsettling turn — replicate your loved ones’ voices.
You know, just in case you want to hear from your aunt, but don’t actually want to hang out on the phone for an hour.
“Today, we’re announcing a breakthrough in generative AI for speech,” Meta wrote in a press release, published late last week. “We’ve developed Voicebox, a state-of-the-art AI model that can perform speech generation tasks — like editing, sampling and stylizing — that it wasn’t specifically trained to do through in-context learning.”
Basically, all you have to do to replicate someone’s voice is feed the program an audio clip as short as two seconds. Voicebox will then “match the audio style,” and boom: with little more than a written prompt and a few clicks, you can get an AI-powered replica of your friend or family member’s voice — and the ethical and legal implications are palpable.
Deepfake Friend
To be fair, Meta does offer a compelling use case for this specific function of the model, arguing that the tech could “allow visually impaired people to hear written messages from friends in their voices.” Fostering accessibility in tech is essential, and we could certainly see this being helpful.
Nevertheless, the concept of replicating your bestie’s voice is still a bit unsettling, not to mention ripe for abuse. After all, if you can replicate a friend’s voice with just a two-second sound clip, you could replicate practically anyone’s voice — as long as you had the audio.
It’s a potential safety gap that could open the door to phishing scams, misinformation, and even an audio equivalent of deepfaked porn.
Thankfully, Meta is more than aware of this, and is opting to keep the model and its underlying code closed source for the time being.
“There are many exciting use cases for generative speech models, but because of the potential risks of misuse, we are not making the Voicebox model or code publicly available at this time,” the company wrote in a separate research blog.
And that can only be a good thing given the sheer potential for abuse.
More on text-to-speech AI: Startup Shocked When 4Chan Immediately Abuses Its Voice-Cloning AI