I spent last night staring into a hypnotic grid of fuzzy Travis Kalanicks, as reconstructed by an AI.
For the last couple of days, I’ve been experimenting with FakeApp, a program that uses Google’s TensorFlow machine learning framework to swap faces in videos. The app is mostly associated with pornographic “deepfakes,” where celebrities’ faces are pasted onto a porn actor’s body. (That’s been widely deemed nonconsensual pornography and banned everywhere from Pornhub to Reddit, which is one of the sternest rebukes something can get on the internet.) But watching safe-for-work deepfakes, like videos of Nic Cage starring in every movie, is innocently silly fun. Making them turns out to be surprisingly fun too, even if so far, I’ve basically failed at it.
Deepfakes news coverage rightly emphasizes how much easier this kind of face-swapping is getting, as long as you’ve got a reasonably powerful PC. But easier isn’t the same as foolproof. Being a FakeApp amateur is a constant cycle of “I can’t believe this is working,” “Why isn’t this working?,” and “I’ve lost all conception of what the word ‘working’ means.”
The steps aren’t that complicated. You need a lot of pictures of your face donor, and a lot of pictures from the main video you’ll be using. (Some tutorials refer to “celebrity” and “porn,” which is more memorable but unfortunately creepy.) FakeApp crops the faces from each image set and launches a training sequence that maps frames from your two subjects onto each other. Finally, it uses the model to seamlessly overwrite one face with another. Here’s what a good result looks like:
And here’s one of my attempts:
In the likely event that this video doesn’t make any sense, that’s ex-Uber CEO Travis Kalanick giving Michael Douglas’ “Greed is Good” speech from Wall Street, as seen in the recent legal fight between Uber and Waymo. It’s moderately worse than my best effort so far, where the ghost of Elon Musk possesses Jeff Bezos:
What’s the problem here? Well, I haven’t exactly mastered the training system. You don’t control the algorithm itself, but the model’s quality depends on your source images. There are a few obvious tips: go through the extracted faces and delete any bad crops before starting, for instance. But even with tutorials, I don’t have an intuitive sense of how similar the lighting or skin tone has to be, or when the model is optimally trained. (There’s a “loss” number that goes down over time, but no single definitive stopping point. The Kalanick/Douglas model trained overnight, while Bezos/Musk trained for only a few hours.) I can’t tell when my raw material is too bad to work with, and when I’m just screwing up.
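For a rough sense of what that falling “loss” number means, here’s a toy sketch in Python. It is not FakeApp’s code, and everything in it is an assumption for illustration: it supposes the common face-swap design of one shared encoder with a separate decoder per person, uses random vectors as stand-in “faces,” and the names `train_face_swap`, `swap_to_a`, and `has_plateaued` are made up. The plateau check is just one generic heuristic for deciding that more training has stopped helping, not a rule FakeApp provides.

```python
import numpy as np

def train_face_swap(faces_a, faces_b, code_size=4, steps=500, lr=0.1, seed=0):
    """Toy linear autoencoder: one shared encoder, one decoder per person.
    (Assumed architecture for illustration; FakeApp's internals may differ.)"""
    rng = np.random.default_rng(seed)
    dim = faces_a.shape[1]
    enc = rng.normal(scale=0.3, size=(dim, code_size))    # shared encoder
    dec_a = rng.normal(scale=0.3, size=(code_size, dim))  # decoder for person A
    dec_b = rng.normal(scale=0.3, size=(code_size, dim))  # decoder for person B
    losses = []
    for _ in range(steps):
        grad_enc = np.zeros_like(enc)
        total = 0.0
        for faces, dec in ((faces_a, dec_a), (faces_b, dec_b)):
            code = faces @ enc
            err = code @ dec - faces             # reconstruction error
            total += float(np.mean(err ** 2))    # this is the "loss" number
            g = 2.0 * err / err.size             # gradient of the mean squared error
            grad_enc += faces.T @ (g @ dec.T)    # both people update the encoder
            dec -= lr * (code.T @ g)             # in-place update of this decoder
        enc -= lr * grad_enc
        losses.append(total)
    return enc, dec_a, dec_b, losses

def swap_to_a(faces_b, enc, dec_a):
    """Encode person B's faces, then decode them with person A's decoder."""
    return faces_b @ enc @ dec_a

def has_plateaued(losses, window=100, min_improvement=0.01):
    """True if the average loss over the last `window` steps improved by less
    than `min_improvement` (as a fraction) versus the window before it."""
    if len(losses) < 2 * window:
        return False
    previous = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    if previous <= 0:
        return True
    return (previous - recent) / previous < min_improvement

# Random stand-ins for two sets of cropped faces (64 "images" of 16 values each).
rng = np.random.default_rng(1)
faces_a = rng.normal(size=(64, 16))
faces_b = rng.normal(size=(64, 16))
enc, dec_a, dec_b, losses = train_face_swap(faces_a, faces_b)
swapped = swap_to_a(faces_b, enc, dec_a)
print("first loss:", losses[0], "last loss:", losses[-1])
print("plateaued?", has_plateaued(losses))
```

The window size and the 1 percent threshold are arbitrary choices. A real model could flatten out for a while and then improve again, which is part of why there’s no definitive stopping point.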
I’m figuring FakeApp out pretty slowly, since I can only run a few models each day. But as a side effect, I’ve become ridiculously addicted to watching the training models work. Seriously, look at it:
Those screenshots were taken over half an hour of training, and I probably checked in on the model every couple of minutes. It’s also immensely satisfying to go to bed with alternating rows of crisp still frames and flesh-toned blobs, and wake up to find that an AI has sorted them into this:
Sure, the model still produces half-human monstrosities whose provenance I barely understand. I’ll probably still go home and watch the next one for hours.
The end point of AI face-swapping is that anybody can convincingly fake a video with a couple of clicks, potentially sabotaging our confidence about the entire medium, or at least forcing us to be more careful about trusting it. But for now, there’s something fun about being able to see exactly what’s going on with FakeApp, whatever the end results.