Your voice can be a biometric identifier, like your fingerprint. Does Apple really have to store it on its own servers?
Even in an age of vanishing privacy, people using Apple’s digital assistant Siri share a distinct concern. Recordings of their actual voices, asking questions that might be personal, travel over the Internet to a remote Apple server for processing. Then they remain stored there; Apple won’t say for how long.
That voice recording, unlike most of the data produced by smartphones and other computers, is an actual biometric identifier. A voiceprint, if disclosed by accident, hack, or subpoena, can be linked to a specific person. And with the current boom in speech recognition apps, Apple isn’t the only one amassing such data.
There may be a way to keep this identifier more private. Researchers say Apple and others developing speech recognition applications like Siri could do part of the data processing right on the phone. Then, instead of sending out the full recording, they could transmit specific information that is harder to definitively link to an individual.
“Maybe anything that IDs you should stay on the phone,” says Prem Natarajan, executive vice president at Raytheon BBN Technologies in Cambridge, Massachusetts, a major center for speech recognition research. He says it might be wiser for Apple to “transmit features from speech—and not the speech itself.”
While this approach would put more burden on the phone’s processor and battery, it wouldn’t hurt the quality of the speech recognition. “I think it is safe to say that not having access to the [full voice] signal does not impose any meaningful penalty,” Natarajan says. Limiting the amount of biometric data that gets shared would follow the example of devices such as Microsoft’s Kinect, which for privacy reasons have been engineered to keep such data onboard.
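To make the idea of sending features rather than the recording concrete, here is a minimal Python sketch. It is an illustration only, not a description of how Siri or any Apple system works. The function name `frame_features`, the frame sizes, and the two features it computes (log energy and zero-crossing rate) are assumptions chosen for simplicity; production recognizers typically rely on richer spectral features, and the sketch does not show how much identifying information such features would actually remove.

```python
import math


def frame_features(samples, frame_len=400, hop=160):
    """Summarize audio as one (log_energy, zero_crossing_rate) pair per frame.

    Hypothetical helper: with 16 kHz audio, the defaults give 25 ms frames
    taken every 10 ms. Only these summaries would leave the device, not the
    raw samples.
    """
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # Average power of the frame, on a log scale
        energy = sum(s * s for s in frame) / frame_len
        log_energy = math.log(energy + 1e-10)
        # Fraction of adjacent sample pairs where the signal changes sign
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        )
        zcr = crossings / (frame_len - 1)
        features.append((log_energy, zcr))
    return features


# A synthetic one-second 440 Hz tone stands in for recorded speech
sample_rate = 16000
audio = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]

# This list of small tuples is what would be transmitted in place of the audio
payload = frame_features(audio)
```

In this sketch the phone does the per-frame arithmetic, which is the extra processor and battery burden described above, and the server receives two numbers per frame instead of 400 raw samples.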
Trudy Muller, an Apple spokeswoman, confirmed that voice recordings are stored when users ask a spoken question like “What’s the weather now?” “This data is only used for Siri’s operation and to help Siri improve its understanding and recognition,” she said. Muller added that the company takes privacy “very seriously,” noting that questions and responses that Siri sends over the Internet are encrypted, and that recordings of your voice are not linked to other information Apple has generated about you. (Siri does upload your contact list, location, and list of stored songs, though, to help it respond to your requests.)
While voiceprints are not as distinctive as fingerprints, they can positively identify the speaker in many circumstances. The U.S. Department of Homeland Security uses voiceprints to identify frequent travelers who have enrolled in a system to allow speedy border crossings.
To see why voiceprints could matter, consider the murder trial of Casey Anthony, the Florida mother acquitted last year in the death of her two-year-old daughter, Caylee. At one point prosecutors pointed to Internet searches—for “chloroform” and other incriminating terms—made from the accused’s computer. Anthony’s mother testified to having typed in the search term herself, as a misspelling of “chlorophyll.” If the searches had been made by voice on Siri, it might have been possible for prosecutors—and jurors—to determine who actually said “chloroform.” (Apple declined to say whether it has ever received a subpoena for anyone’s voiceprint.)
Meanwhile, if you dictated an inappropriate text or asked Siri about a sensitive medical matter and Apple got hacked (or a malicious employee released data), not only would the embarrassing communication be released, but it would be in your own voice. Natarajan says biometrics could raise entirely new privacy questions. For example, someone searching for the location of a protest against a repressive regime could be in trouble if the data became available to that government. “If you have a group of people asking about protests, you now unfortunately have voice biometrics for those people,” he says.
Some observers, including large technology firms, are raising broader questions about Siri. Last month Technology Review reported that IBM had asked its employees not to use the feature, a decision IBM said was motivated by the need to protect contact lists and other sensitive company information. It’s a concern that other organizations should share, some experts believe. “If I were to run an intelligence agency or a large corporation, I would not allow such a service in-house,” says Radu Sion, a computer scientist at Stony Brook University and a leading researcher on cloud computing security.