Just as Proteus in Greek mythology could change his shape at will, appearing as a young man, a lion, a boar, a serpent, a bull, a stone, a tree, water, or flame, voice cloning technology makes it possible to manipulate and transform voices in an unprecedented manner.
“… but you are only deceiving me, and so far from displaying the subjects of your skill, you decline even to tell me what they are, for all my entreaties. You are a perfect Proteus in the way you take on every kind of shape, twisting about this way and that…” (Plato, Ion 541e).
Voice Cloning 101
Voice cloning technology uses artificial intelligence (AI) and machine learning algorithms to accurately replicate an individual’s voice. This is achieved by analyzing a person’s unique vocal characteristics, such as pitch, tone, and cadence, and creating a digital replica that mimics the original voice. Criminals can and do use this technology to create convincing scams targeting financial advisors, families, businesses, and anyone who relies on voice recognition. In addition, voice actors and other professionals who rely on their distinctive voices are at risk of having their vocal identities misused, potentially undermining their careers and misleading audiences. CBS reported (June 25, 2024) that voice actors are suing an AI startup, claiming it cloned their voices. In The Hollywood Reporter’s Actors Hit AI Startup With Class Action Lawsuit Over Voice Theft, Winston Cho quotes Steve Cohen of Pollock Cohen, a lawyer for the actors: “The voices of other LOVO voice options are undoubtedly the voices of other class Plaintiffs who neither gave their authorization to use their voice – for either teaching Genny, use by LOVO, or sale by LOVO as part of its service – and were never properly compensated.”
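For readers curious about what “analyzing vocal characteristics” looks like in practice, the snippet below is a minimal, illustrative Python sketch using the open-source librosa library. It only extracts coarse pitch and timbre statistics from a speech sample; it does not represent VALL-E, LOVO, or any specific commercial cloning pipeline (those systems typically learn neural speaker embeddings rather than hand-crafted statistics), and the helper name and parameter choices are invented for illustration.

```python
# Illustrative sketch of the "analysis" stage of voice cloning: summarizing the
# vocal characteristics the article mentions (pitch, tone, cadence, timbre).
# This is a toy approximation, not any real product's cloning pipeline.
import numpy as np
import librosa

def rough_voice_profile(wav_path: str) -> dict:
    """Hypothetical helper: summarize pitch and timbre from a short speech sample."""
    y, sr = librosa.load(wav_path, sr=16000)                     # load audio at 16 kHz
    f0, voiced, _ = librosa.pyin(y, fmin=65, fmax=400, sr=sr)    # estimate the pitch contour
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)           # coarse timbre features
    return {
        "median_pitch_hz": float(np.nanmedian(f0)),              # typical speaking pitch
        "pitch_variability": float(np.nanstd(f0)),               # intonation range, a rough "cadence" proxy
        "timbre_vector": mfcc.mean(axis=1).tolist(),             # average spectral shape
    }

# A synthesis model would then condition on such a representation (in real systems,
# a learned speaker embedding) to generate new speech in that voice from arbitrary text.
```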
VALL-E by Microsoft is one example of voice cloning technology, demonstrating its potential for text-to-speech synthesis. Microsoft states in its ethics statement:
“VALL-E could synthesize speech that maintains speaker identity and could be used for educational learning, entertainment, journalistic, self-authored content, accessibility features, interactive voice response systems, translation, chatbot, and so on. While VALL-E can speak in a voice like the voice talent, the similarity, and naturalness depend on the length and quality of the speech prompt, the background noise, as well as other factors. It may carry potential risks in the misuse of the model, such as spoofing voice identification or impersonating a specific speaker.”
According to Zion Market Research in its report Voice Cloning Market: Industry Perspective, the global voice cloning market was worth around USD 1.58 billion in 2023 and is predicted to grow to around USD 14.06 billion by 2032, a compound annual growth rate (CAGR) of roughly 27.50% between 2024 and 2032. Waking up to the threat, in 2023 the Federal Trade Commission (FTC) announced the Voice Cloning Challenge to address the present and emerging harms of artificial intelligence- or “AI”-enabled voice cloning technologies.
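The cited projection is consistent with the standard compound-growth formula, future value = present value × (1 + CAGR)^years. The short check below simply plugs in the report’s own figures as an arithmetic sanity check, not an independent market analysis.

```python
# Sanity check of the cited market projection using the compound annual growth rate formula.
start_2023 = 1.58        # USD billions (Zion Market Research figure for 2023)
cagr = 0.2750            # 27.50% per year
years = 2032 - 2023      # nine growth years, 2024 through 2032

projected_2032 = start_2023 * (1 + cagr) ** years
print(f"Projected 2032 market size: ${projected_2032:.2f}B")  # about $14.07B, close to the cited $14.06B
```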
Time.com’s article From Scams to Music, AI Voice Cloning Is on the Rise describes how AI scam calls are built on voice cloning: “…once a scammer finds an audio clip of someone’s voice online, they can easily upload it to an online program that replicates the voice.” The first episode of the BBC surveillance drama The Capture enacts such a scam. The MP and security minister Isaac Turner (Paapa Essiedu) is impersonated by a deepfake: an unauthorized digital twin of him appears on TV, announcing a policy reversal on Chinese AI, while the real Turner watches helplessly as his digital double misleads the public.
Voice Cloning as a Trojan Horse
“Timeo Danaos et dona ferentes” (“I fear the Danaans [Greeks], even those bearing gifts”), Virgil, Aeneid II, 49
The Trojan Horse embodies deception and infiltration, paralleling contemporary concerns about voice cloning being used to misrepresent identities. Just as the Greeks used the horse to enter Troy undetected, voice cloning could allow imposters to “sneak into” learning environments. Voice cloning technology could have major implications for students, educators, and institutions of higher education. Some potential dangers include:
- Misrepresentation and Cheating: Students could use voice cloning to misrepresent themselves or others during remote learning sessions, leading to confusion and undermining trust within the learning environment.
- Privacy Violations: The use of voice cloning could result in privacy violations for both students and educators. For instance, unauthorized voice cloning could be used to create defamatory content or to hijack or impersonate the voices of faculty or students.
- Threat to Academic Integrity: The proliferation of voice cloning technology could pose a threat to academic integrity. It might become increasingly difficult for educators to verify the authenticity of student work, leading to a potential erosion of academic standards. Students might also receive false or misleading information from sources impersonating their faculty, undermining student learning and trust in educational and academic sources.
The Digital Twin Conundrum
Voice cloning contributes to the burgeoning concept of digital twins, the creation of digital replicas of individuals. According to Gigabyte, “A digital twin is a virtual representation of a real-life entity, such as a person, product, process, or system, which has been designed to mirror or simulate changes to the real-life entity.” Such digital replicas could have significant implications for privacy and identity, blurring the lines between digital and biological identities. If misused, voice cloning could play a significant role in the exploitation and manipulation of individuals via digital twins.
McKinsey (2022), in Digital Twins: From one twin to the metaverse, offers an example: “a digital twin could provide a 360-degree view of customers, including all the details that a company’s business units and systems collect about them—for example, online and in-store purchasing behavior, demographic information, payment methods, and interactions with customer service.” The report further states that “An employee digital twin, for example, could help a company develop an AI-driven coach that provides real-time nudges to improve the performance and productivity of employees.”
In an educational setting, this could mean creating digital replicas of classrooms or even students and faculty to analyze and predict learning outcomes. While this might enhance understanding and provide tailored educational experiences, there is a risk that it could also lead to over-reliance on digital proxies.
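To make the idea of a “digital replica of a student” concrete, such a twin would in essence be an ever-growing data record that analytics systems update and query. The sketch below is purely hypothetical: the class and field names are invented for illustration and imply no real institutional schema; it only gestures at the kind of 360-degree profile the McKinsey description suggests, translated to an educational setting.

```python
# Hypothetical sketch of what a student "digital twin" record might aggregate.
# All names are illustrative inventions, not a real system's data model.
from dataclasses import dataclass, field
from typing import List

@dataclass
class StudentDigitalTwin:
    student_id: str
    enrolled_courses: List[str] = field(default_factory=list)
    quiz_scores: List[float] = field(default_factory=list)   # performance history (0-100 scale)
    forum_posts: int = 0                                      # engagement signal
    voice_sample_on_file: bool = False                        # the very data that enables cloning risks above

    def predicted_risk(self) -> float:
        """Toy 'predictive analytics': flag students whose recent scores are low."""
        if not self.quiz_scores:
            return 0.0
        recent = self.quiz_scores[-3:]
        return max(0.0, 1.0 - sum(recent) / (len(recent) * 100))
```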
The application of voice cloning technology in education raises significant ethical concerns. If faculty or student voices are cloned to interact within these digital spaces, it could lead to scenarios where students are receiving feedback or instruction not from their actual instructors, but from AI-driven replicas. This raises questions about authenticity – students may no longer be sure whether they are interacting with a real human or a digital clone and vice versa.
The combination of these technologies could potentially “hollow out” the educational experience by reducing the presence of genuine human interaction, which is fundamental to nuanced learning and the development of critical thinking. It could transform education into a mere transactional process focused on efficiency and data-driven outcomes, rather than a transformative process that values human insight and developmental growth.
Beware and heed as Virgil stated:
“Facilis descensus Averno:
Noctes atque dies patet atri ianua Ditis;
Sed revocare gradum superasque evadere ad auras,
Hoc opus, hic labor est.
(The gates of Hell are open night and day;
Smooth the descent, and easy is the way:
But to return, and view the cheerful skies,
In this the task and mighty labor lies.)”
P. Vergilius Maro, Aeneid, Book 6, line 124 – Perseus Digital Library
The true essence of education lies not in the transmission of information but in the nurturing of curiosity, empathy, and the innate human desire to learn and grow.
This article has been produced by Dr. Jasmin (Bey) Cowin, Associate Professor and U.S. Department of State English Language Specialist (2024). As a columnist for Stankevicius, she writes on Nicomachean Ethics – Insights at the Intersection of AI and Education.