With a little help from PVC pipe, Wisconsin researchers fool some voice identification systems

WUWM 89.7 FM | By Chuck Quirmbach

Published August 21, 2023 at 10:04 AM CDT

Shimaa Ahmed is a Ph.D. student at UW-Madison. She investigated whether it was possible to alter the resonance, or specific frequency vibrations, of a voice to defeat automatic speaker identification devices.

Is it possible to fool computer systems that use a person's voice as a passcode? Some Wisconsin engineers say the answer is yes, and that they've done so partly by using plastic pipe you can find in a hardware store.

Some online banking systems use automatic speaker identification, or, more simply put, an account holder's voice as a passcode. UW-Madison Electrical and Computer Engineering Prof. Kassem Fawaz says Apple iPhone users are also likely familiar with the virtual assistant Siri, which only responds to the owner.

"The reason Siri only responds to you is because they employ this technology called speaker identification. So, they get some sort of a voice print, which is similar to a fingerprint, and they can ascertain whether it came from you, or someone else. And this is how Siri can make sure the user or the owner of the phone is talking to it," Fawaz says.

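The voice print comparison Fawaz describes can be pictured with a small Python sketch. This is an illustration only: the story does not say how Siri or any bank actually compares voice prints. The three-number "prints," the cosine similarity measure and the 0.8 threshold are all assumptions made up for this example.

```python
import math

def cosine_similarity(a, b):
    """Return how closely two equal-length vectors point the same way (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_same_speaker(enrolled_print, new_print, threshold=0.8):
    """Accept the new voice only if it is close enough to the enrolled one.

    The threshold is a hypothetical value chosen for illustration.
    """
    return cosine_similarity(enrolled_print, new_print) >= threshold

# Made-up "voice prints" (real systems use far longer vectors).
owner_print = [0.9, 0.1, 0.4]
same_person_print = [0.85, 0.15, 0.42]
stranger_print = [0.1, 0.9, 0.2]

print(is_same_speaker(owner_print, same_person_print))
print(is_same_speaker(owner_print, stranger_print))
```

In a scheme like this, an impersonation succeeds when the altered voice lands close enough to the owner's print to clear the threshold.
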
But Fawaz and two UW-Madison Ph.D. students are part of a multi-university effort to improve digital security. And so, they've been looking into ways to fool the computer systems. They and others have already been working on cloning the human voice and having a computer speak like that person.

More recently, Fawaz says the engineers realized they could backpedal from digital and go analog to trick many speaker identification systems.

"And this is how we got into the idea of designing some sort of an analog device that doesn't have any sort of digital electronics that allows you to impersonate others," Fawaz says.

Kassem Fawaz, assistant professor in the Department of Electrical & Computer Engineering in the College of Engineering at the University of Wisconsin–Madison, is pictured in a studio portrait on Feb. 15, 2022. Fawaz is one of twelve recipients of the 2022 Distinguished Teaching Award (DTA). (Photo by Althea Dotzour / UW–Madison)

Ph.D. student Shimaa Ahmed says she first tried speaking through the cardboard tube found in many paper towel rolls, imitating celebrities.

"And it worked. When I passed some of the celebrity voices through this kitchen paper towel tube it changed the prediction of those celebrities," Ahmed says.

Eventually, Fawaz bought some plastic PVC pipe from the plumbing parts aisle of a hardware store and the team began using that.

Fawaz says they realized the tubes had shortcomings.

"Regular plumbing tubes have fixed dimensions, right? You can control the length by cutting it. But you can't control the diameter. For some of the experiment we needed tubes with a special diameter, which you can't find at Ace Hardware. So what we needed to do was fabricate these tubes, and the easiest way to fabricate the tubes is to 3D print them," Fawaz says.

The team turned to then-undergrad, now Ph.D. student, Yash Wani, who 3D printed some tubes. Wani says the work changed his academic focus.

"It was very cool, honestly, that that's how I wound up doing a Ph.D. It was cool enough for me to continue doing that," he says.

The researchers developed an algorithm, or rigorous instructions, that figured out the pipe dimensions needed to transform the resonance — that's tone intensity and quality — of almost any voice to imitate another.
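
Why pipe dimensions matter can be shown with a textbook approximation. The Python sketch below is not the researchers' algorithm, which the story does not detail. It uses the standard formula for an idealized cylindrical pipe open at both ends, where each open end adds roughly 0.6 times the radius to the acoustic length. A tube held up to a speaker's mouth will behave differently. The function name and the example dimensions are hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0  # approximate value for dry air near 20 C

def open_tube_resonances(length_m, radius_m, count=3):
    """Estimate the first few resonant frequencies (Hz) of an ideal open-open pipe."""
    # End correction: each open end adds about 0.6 * radius to the acoustic length.
    effective_length = length_m + 2 * 0.6 * radius_m
    # Resonances of an open-open pipe fall at whole-number multiples of c / (2 * L).
    return [n * SPEED_OF_SOUND_M_S / (2 * effective_length)
            for n in range(1, count + 1)]

# Illustrative dimensions only: a 28 cm tube with a 2 cm radius,
# then the same tube cut in half, then a wider tube of the original length.
print(open_tube_resonances(0.28, 0.02))
print(open_tube_resonances(0.14, 0.02))
print(open_tube_resonances(0.28, 0.03))
```

In this simplified model, cutting the tube shorter raises its resonances sharply, while widening it lowers them slightly. That is consistent with Fawaz's point that length and diameter are separate controls.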

In one recording, Ahmed read from a conventional speech dataset and sounded a bit like actress Lisa Kudrow — you know, Phoebe from Friends.

"There was nothing on the rock," went the reading. Ahmed also tried an imitation of actress Kelly Reilly, who's in the TV series Yellowstone. "'I have no idea,' replied Phillip," Ahmed said on the recording.

They were not exact imitations. But they were good enough to get past a voice authentication system's digital attack filters and fool it. Using other students' voices as well, the UW-Madison engineers report deceiving the security systems 60% of the time in a test of 91 voices.

Good enough to write a paper and for Ahmed to present the findings at a security symposium this month in California.

"People were like curious as to how we can make devices like tubes, but more complicated, that we can impersonate any person," Ahmed says.

Fawaz says all the makers of speaker identification systems — Apple, Google, IBM, Microsoft and others — are aware of the various shortcomings of their technology and are trying to fix them.

Funders of the UW research include the National Science Foundation and DARPA — the big research arm of the Defense Department.