AI Tools Make It Easy to Clone Someone’s Voice Without Consent


Tech companies have made it trivially easy to impersonate someone’s voice with artificial intelligence tools capable of producing realistic, machine-generated speech — and most of the companies behind the tools make little or no attempt to ensure that the humans being copied have consented to the process.

Using just a short recording of someone’s voice, AI “voice cloning” tools can convincingly imitate a person’s speech to say anything a user desires. Proof News surveyed eight voice cloning tools and found that most of them make little or no effort to ensure the voices being cloned belong to consenting adult humans. While many of the companies have terms of service that explicitly prohibit copyright infringement and misuse, in many cases there are few — if any — practical hurdles preventing someone from cloning a voice nonconsensually.

Proof News tested the popular Eleven Labs, a $5-per-month service, along with AI voice cloning tools that are available to the public without charge, including Speechify, PlayHT, LOVO, Veed, VoCloner, and Descript. We also looked at Respeecher, which has branded itself as an “ethical” voice cloning service for creative industries. Several of the companies allow voice cloning with a free account, while others are part of a larger suite of AI content generation tools that require a paid monthly or annual subscription. But many of the tools don’t require users to do anything other than create an account, upload an audio sample, and click a checkbox claiming that they have the right to use the voice being cloned. 

Genny is one such tool, created by LOVO, a Berkeley, Calif.–based company that is being sued in U.S. District Court for the Southern District of New York by a pair of voice actors who allege their voices were used without permission. 

LOVO’s terms of service state that users must “have the written consent, release, and/or permission of each and every identifiable individual person” whose voice is used to create an AI clone. But the company doesn’t seem to verify whether this consent was actually obtained. Instead, before cloning a voice, Genny’s interface simply asks users to confirm that they “possess the required rights or permissions to upload and duplicate these voice samples” and “assure that the content generated by the platform will not be utilized for any unlawful, deceitful, or detrimental intentions.” The closest the tool gets to actually verifying consent is by providing a form to add contact information for the cloned voice’s owner — a step which is entirely optional.

Screenshot from LOVO, June 11, 2024

LOVO did not respond to a request for comment.

The tools have caused particular anger among professional voice actors. Tim Friedlander, the founder and president of the National Association of Voice Actors (NAVA), said that these companies have made it far too easy for someone to clone, use, and sell voice actors’ voices without permission.

“Right now, because of the ease with which you can upload and manipulate and distribute and use [voices] without the consent and control of the original voice actor, there’s really nothing that voice actors are very comfortable using,” Friedlander told Proof News.

PlayHT presents a similar “I agree” checkbox before cloning a voice, taking it on faith that the user has “all the necessary rights or consents to clone and use this voice without violating any copyrights and rights of publicity.” The company also disclaims liability for anything uploaded to the platform in its terms of service, saying that it has “no obligation to edit or control User Content that you or other users Submit.”  

Screenshot from PlayHT, June 6, 2024

Eleven Labs, too, presents a checkbox for users to click, affirming that they have the “necessary rights or consents” to use the voice and will not use it for an illegal or harmful purpose. Eleven Labs spokesperson Sam Sklar noted that the company requires “users to complete a ‘voice CAPTCHA’ verification process” in the professional version of its service. Proof News tested the $5-a-month “starter” version of the tool, not the $99-a-month professional version.

Rules with unclear enforcement mechanisms are used by nearly all the services Proof News tested. VoCloner, whose sparse terms of service only specify “noncommercial use,” stipulates no other consent requirements or restrictions.

Coqui AI, the company that built the tool, did not respond to a request for comment. Notably, the company appears to be shutting down.

The few technical hurdles that might prevent someone from cloning a voice without consent were fairly easy to circumvent. For example, when creating a cloned voice, Descript and Veed require users to read from a provided script, then use voice transcription to validate that they are speaking the required text. This prevents someone from cloning a voice by uploading audio clips found on the internet. However, Proof News was still able to get around this constraint by simply activating the microphone and then prompting a different voice cloning tool to read the required text. Doing so was especially easy with PlayHT, which has few limits on what can be generated with the free version of its software.
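The read-aloud check described above can be illustrated with a minimal sketch. It is not Descript's or Veed's actual implementation, which neither company has published; it assumes a transcript has already been produced by some speech-to-text system (not shown), and the function names and the 0.9 threshold are hypothetical. The sketch also shows why the check is easy to defeat: it confirms only that the right words were spoken, not who, or what, spoke them.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def matches_script(expected_script: str, transcript: str, threshold: float = 0.9) -> bool:
    """Return True if the transcript is close enough to the required script.

    The threshold is an illustrative assumption, chosen to tolerate small
    transcription errors. Nothing here verifies the speaker's identity.
    """
    expected = normalize(expected_script)
    spoken = normalize(transcript)
    if not expected:
        return False
    # Compare word sequences; ratio() is 1.0 for identical sequences.
    similarity = SequenceMatcher(None, expected, spoken).ratio()
    return similarity >= threshold
```

A recording of the required script passes whether it comes from a person at a microphone or from another tool's synthetic voice played into that microphone, because the function only ever sees the words.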

In an emailed statement, Veed co-founder Sabba Keynejad called this “an unrealistic use case,” while suggesting that other tools should improve their security. 

“Anyone intent on causing harm will have already achieved their goal (voice cloned without consent) on the original tool,” Keynejad told Proof News. “There is no real value or motivation to re-clone the already cloned voice on a different tool (where registration and user verification is required).”

“This highlights how crucial Industry collaboration to reduce the possibility of misuse is and we hope others with this functionality will add further security measures,” Keynejad added.

Veed also said the company “recently added enhanced measures that mean the voices of well known politicians and celebrities now get automatically blocked.” Eleven Labs also says it has a list of “no-go voices” that prevents the creation of voices based on “political candidates actively involved in presidential or prime ministerial elections,” but that list is currently limited to candidates in the U.S. and U.K.

Descript and PlayHT did not respond to a request for comment. 

Voice cloners are just one product of the ongoing tech industry hype over machine learning tools. Companies have angered artists, musicians, voice actors, and others who say their livelihoods and safety are being threatened by these tools, which are frequently used to create scams and disinformation and aid in harassment. AI experts have repeatedly pointed out the dangers of large AI models, which often use data scraped from the internet and have been known to exhibit bias through their large training datasets.

As with other AI tools, the dangers of voice cloners are exacerbated by their wide availability. Many of the tools tested by Proof News put some features behind paywalls, so users with free-tier accounts can access only part of each tool’s functionality. For example, Speechify allows users to create cloned voices for free but will only generate speech from up to 1,000 characters of text at a time. Descript allows free accounts to generate speech with cloned voices but is limited to a 1,001-word vocabulary and replaces anything outside the limit with gibberish. LOVO’s voice cloner generates lower quality speech for users with free accounts, while PlayHT, the most permissive of those we tested, allows one cloned voice at a time, which is automatically deleted after an hour. (Free users can still make new generations with the cloned voice or delete it to create a new one.)

Of all the tools reviewed, only Respeecher, which markets itself to the film and gaming industry and not individual users, seems to have a verification process to ensure the company has permission to clone a person’s voice. Clicking “Request Custom Voice” links to a contact form, which prompts users to specify the intended purpose of the voice clone and details on whose voice is being used. The company states that “Explicit permission is required for all voice replications, secured through a mutually signed agreement to ensure full understanding and consent” and that creating a voice clone “requires additional security checks, dedicated developer efforts, and may be costly.”

Anna Bulakh, head of ethics and partnerships at Respeecher, said the company made a deliberate choice to mediate the cloning of voices. The company was founded in 2018, and its voice cloning service predates many of its competitors. 

Ingredients
Hypothesis: AI companies do not adequately protect against unauthorized, nonconsensual voice cloning.
Sample size: Eight voice cloning tools, primarily ones that allow users to use some features for free.
Techniques: We attempted to use each tool to clone an identifiable person’s voice and noted any technical hurdles that might prevent an average user from doing so without consent. We also noted “I agree” prompts with ethics statements or terms of service.
Key findings: Companies, for the most part, erect no true barriers to cloning whatever voice you may want. Most companies also disclaim liability for unauthorized voice cloning, putting the onus on users to use the tools responsibly.
Limitations: This is not a comprehensive review of all the voice cloning tools that are available, nor an exhaustive examination of the ways they could be misused.

“What we’re doing is kind of acting several steps ahead,” Bulakh told Proof News. “We want to work with IP owners, and for them to adopt the technology to monetize their likeness. So that’s kind of a more strategic vision. And behind that is a lot of work on the technical side and on the policy side.”

The rest of the tools, meanwhile, allow users to generate voice clones automatically, leaving it up to the companies to implement moderation policies. In most cases, it’s unclear how those policies are enforced. The services instruct individuals and copyright holders to submit a takedown notice if they believe their rights have been infringed. But without some sort of security measure like digital fingerprinting, it’s unclear how an average person might track a nonconsensual voice clone of themselves back to the tool that created it.

Of all the publicly accessible tools tested, only Eleven Labs provided an “AI Speech Classifier” that allows users to test whether a voice was generated with its voice cloner. When a voice sample that Proof News generated was uploaded, the tool returned a 98% certainty that the voice was made with Eleven Labs.

Friedlander emphasized that companies should, at minimum, implement technical methods to verify and track voices back to their original owners — a process known in the industry as provenance. This way, a voice can be traced to an owner before it’s used to train a voice clone or used to generate speech.

“What we’re really pushing for is provenance and watermarking on the input, so that every audio file that gets uploaded to these sites has to pass through a verification process that says, ‘This voice belongs to this person, and this person has given this website authority to upload it,’ ” said Friedlander. “Really what we want is informed consent, and not just, ‘Oh, I found my voice on this [platform] and it let me opt out.’ ”
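The upload check Friedlander describes could look something like the following sketch. It is purely hypothetical: no such registry or standard exists, and the class names, site names, and the use of an exact SHA-256 hash as a fingerprint are assumptions made for illustration. A real system would need a fingerprint or watermark that survives re-encoding and editing, which a byte-level hash does not.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ConsentRecord:
    """Hypothetical record: who owns a voice sample and which sites may use it."""
    speaker: str
    authorized_sites: set[str] = field(default_factory=set)


class ConsentRegistry:
    """Toy in-memory registry mapping audio fingerprints to consent records."""

    def __init__(self) -> None:
        self._records: dict[str, ConsentRecord] = {}

    @staticmethod
    def fingerprint(audio: bytes) -> str:
        # Assumption: an exact hash stands in for a robust audio fingerprint.
        return hashlib.sha256(audio).hexdigest()

    def register(self, audio: bytes, speaker: str, site: str) -> None:
        """Record that the speaker has authorized this site to use this sample."""
        key = self.fingerprint(audio)
        record = self._records.setdefault(key, ConsentRecord(speaker))
        record.authorized_sites.add(site)

    def may_upload(self, audio: bytes, site: str) -> bool:
        """Allow an upload only if this sample is registered for this site."""
        record = self._records.get(self.fingerprint(audio))
        return record is not None and site in record.authorized_sites
```

The design choice that matters is the default: an unregistered sample is refused, so consent has to be given before upload rather than withdrawn afterward through an opt-out.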

Currently, there are no laws or standards around these verification techniques. Several congressional proposals exist that would regulate the creation of voice clones, label content as AI-generated, and establish digital rights for individuals over their own voice and likeness.


UPDATE: This article was updated to reflect Eleven Labs’ comment, which arrived after the article was published.
