Researchers at Stanford University, the Max Planck Institute for Informatics, Princeton University and Adobe Research have demonstrated how deepfake techniques have become increasingly accessible and convincing. Their method uses machine-learning software to change what a speaker says in a particular video simply by editing the text transcript of that speech.
The researchers conducted a study with 138 volunteers. When the fake videos were shown, about 60% of participants indicated that something was off; for the original versions, only 80% affirmed that they were legitimate. Although that result may seem surprising, the fact that participants knew they were taking part in video-editing research may have influenced their responses.
How are fakes created?
The technique works well only on videos that focus on a single speaker and requires at least 40 minutes of input footage, which serves as the training data for the software's artificial intelligence. For best results, the fake speech should not differ too much from the original.
The base videos go through a few steps before a fake is created. They are scanned to isolate the phonemes spoken and to build a 3D model of the lower half of the speaker's face. The phonemes are then matched with the facial expressions that accompany each sound.
When the text transcript of the video is edited, the program combines all the collected data to generate new imagery in which the speaker's mouth moves according to the phonemes and sounds required by the inserted text. The result is then "pasted" on top of the source video to produce the final clip.
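The steps above can be sketched in code. The following is a purely conceptual illustration, not the researchers' actual system: the toy lexicon, the phoneme-to-viseme table, and every function name here are assumptions made for the sake of the example. It shows the core idea that an edit to the transcript determines which new mouth shapes (visemes) must be synthesized, while the rest of the source video is reused unchanged.

```python
# Conceptual sketch of text-driven video editing.
# The lexicon and viseme table below are toy data (assumptions),
# not real phonetic resources.

# Toy lexicon: word -> phoneme sequence.
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
    "there": ["DH", "EH", "R"],
}

# Toy mapping: phoneme -> mouth shape (viseme) to render.
VISEMES = {
    "HH": "open", "AH": "open", "L": "tongue-up", "OW": "rounded",
    "W": "rounded", "ER": "rounded", "D": "tongue-up",
    "DH": "tongue-up", "EH": "open", "R": "rounded",
}

def phonemes(words):
    """Flatten a word list into its phoneme sequence."""
    return [p for w in words for p in LEXICON[w]]

def viseme_track(words):
    """Map words to the sequence of mouth shapes a renderer
    would have to synthesize for the edited span."""
    return [VISEMES[p] for p in phonemes(words)]

original = ["hello", "world"]
edited = ["hello", "there"]  # the transcript edit drives the video edit

# Only the changed words need newly generated mouth imagery;
# everything else comes straight from the source footage.
changed = [w for w in edited if w not in original]
new_mouth_shapes = viseme_track(changed)
```

Under these assumptions, editing "world" to "there" yields `changed == ["there"]` and a viseme track of `["tongue-up", "open", "rounded"]`, i.e. three new mouth shapes to composite onto the lower-face model.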
In the example below, actor Bill Hader has his face replaced with that of Arnold Schwarzenegger.
Benefits vs. misuse
The researchers suggested that the technique could bring real benefits: movie studios, for example, could fix flubbed lines without re-recording entire scenes. However, if text-based fake news already causes enormous damage despite being relatively easy to debunk, one can imagine the negative impact that fake videos could have on society if used with bad intentions. Fortunately, for the moment, they are mostly being used for jokes.