If you listened to the beginning of this episode, you probably noticed that the voice introducing Kimberly Adams sounded a bit off.
That’s because it’s an audio “deepfake,” and it was created by Yisroel Mirsky at the Offensive AI Research Lab through a process called voice cloning.
Voice cloning uses software to study a short sample of someone’s voice and create a deepfake, which in this case took only 15 seconds to generate, according to Mirsky.
It’s far from a perfect replica, but scammers have recently started using deepfake audio in telephone scams and, in some cases, have succeeded.
Marketplace’s Kimberly Adams recently spoke with Kyle Alspach, a cybersecurity reporter with Protocol, about how the tech behind audio deepfakes works and how they’re being used in phone scams. The following is an edited transcript of their conversation.
Kyle Alspach: The technology involves having [artificial intelligence], deep learning, and it takes an audio sample of someone’s voice. It can be as little as three seconds, and it trains itself on the way that this person sounds. And it creates a model that can then be used to replicate that person’s voice. The current way that it works is you have to use text to speech, which is just you type in a phrase, and then it speaks using the cloned voice.
Kimberly Adams: How are these audio deepfakes being used in scams?
Alspach: So they’re, at least we know, being used to target businesses at this point, especially larger businesses. Someone will call up pretending to be someone else with someone else’s voice, typically some kind of executive or someone’s boss, and ask them to transfer funds or ask them for password credentials, that kind of thing. And in some cases, they have been successful.
Adams: Do you have a sense of how common audio deepfakes are now? And where do experts in the industry see it going?
Alspach: You know, among the larger businesses, I think more and more of them are starting to see these because they’re really ripe targets for this kind of thing. But right now, I don’t think that it’s something that a lot of people are seeing, but it is the beginning of this kind of technology being used in this way.
Adams: What do cybersecurity experts suggest that regular people or businesses do to avoid falling for these deepfake audio scams?
Alspach: For business purposes, for instance, if you’re going to transfer money, you might want to add some additional steps to that process and involve someone saying some kind of challenge phrase or something like that, that everyone’s agreed on beforehand, [to] prevent that from happening. But the biggest thing is just to be aware that it might happen to you and to pay attention when someone calls you up and make sure that it actually sounds the way you would expect, and they are asking you for things that they normally would be asking you for. It’s really at this point kind of a low-tech approach to defending against this kind of thing. But really, it’s just creating the possibility in your mind of this happening. And I guess, unfortunately, you kind of have to become more skeptical now when people call you up.
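To make the challenge-phrase idea concrete, here is a minimal sketch in Python of how a business might gate a transfer request on a phrase everyone agreed on beforehand. It is an illustration only, not a method described by Alspach or any specific security product. The function names (`digest_phrase`, `approve_transfer`), the sample phrase, and the extra `callback_confirmed` step (calling the requester back on a known number) are all assumptions made for the example.

```python
import hashlib
import hmac


def digest_phrase(phrase: str) -> bytes:
    """Hash a phrase after normalizing case and whitespace.

    Storing only the digest means the agreed phrase itself
    never has to sit in a config file in plain text.
    """
    normalized = " ".join(phrase.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).digest()


def approve_transfer(spoken_phrase: str,
                     agreed_digest: bytes,
                     callback_confirmed: bool) -> bool:
    """Approve only if the caller gave the agreed phrase AND a
    hypothetical second step (a call back on a known number) passed.
    """
    # compare_digest avoids leaking information through timing.
    phrase_ok = hmac.compare_digest(digest_phrase(spoken_phrase), agreed_digest)
    return phrase_ok and callback_confirmed


if __name__ == "__main__":
    # Hypothetical phrase agreed on in person, ahead of time.
    stored = digest_phrase("Blue Heron at Noon")
    print(approve_transfer("blue heron at noon", stored, callback_confirmed=True))
    print(approve_transfer("please just wire it", stored, callback_confirmed=True))
```

The point of the sketch is that a familiar-sounding voice is never treated as proof of identity on its own: the request goes through only when a piece of shared knowledge and a separate confirmation step both check out.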
You can read more of Alspach’s reporting on audio deepfakes at Protocol.
Yisroel Mirsky told us that the tech he used to generate the deepfake audio for Alspach’s article, and for our show, is “relatively old.”
Alspach also mentioned in his piece that there is an open-source voice cloning tool online that anyone — even scammers — can download and potentially use.
But as we found out, getting that software to work isn’t a simple process of downloading and hitting the Install button.
Technical barriers can stop you at several steps in the process. For example, if you don’t know the programming language Python, or if your computer isn’t powerful enough, you’ll likely be unable to use it, as our producer Daniel Shin discovered.