Human ability to detect artificially generated speech is not reliable. A study showed that humans can detect deepfake speech only 73 per cent of the time.
Deepfakes are synthetic media intended to resemble a real person’s voice or appearance. They fall under the category of generative artificial intelligence (AI), a type of machine learning (ML) that trains an algorithm to learn the patterns and characteristics of a dataset, such as video or audio of a real person, so that it can reproduce original sound or imagery.
Researchers at University College London used a text-to-speech (TTS) algorithm trained on two publicly available datasets, one in English and one in Mandarin, to generate 50 deepfake speech samples in each language.
These samples differed from those used to train the algorithm, to rule out the possibility of it simply reproducing its original input.
These artificially generated samples and genuine samples were played for 529 participants to see whether they could distinguish real speech from fake. Participants were able to identify fake speech only 73 per cent of the time, a rate that improved only slightly after they received training to recognise aspects of deepfake speech.
“Our findings confirm that humans are unable to reliably detect deepfake speech, whether or not they have received training to help them spot artificial content,” said UCL’s Kimberly Mai. The study was published in the journal PLOS ONE.
“It’s also worth noting that the samples that we used in this study were created with algorithms that are relatively old, which raises the question whether humans would be less able to detect deepfake speech created using the most sophisticated technology available now and in the future.”
The next step for the researchers is to develop better automated speech detectors as part of ongoing efforts to create detection capabilities to counter the threat of artificially generated audio and imagery. Though there are benefits from generative AI audio technology, such as greater accessibility for those whose speech may be limited or who may lose their voice due to illness, there are growing fears that such technology could be used by criminals and nation states to cause significant harm to individuals and societies.