Zoom calls have exploded during the current coronavirus lockdown. But if you’ve already exhausted the possibilities of different angles and backdrops for your video calls, why not try being a different person entirely? You know, like former Apple CEO and co-founder Steve Jobs.
At least, that’s the idea behind a tech demo created by coders Ali Aliev and Karim Iskakov. They’ve developed an Animoji- or Memoji-style tool called Avatarify that lets users superimpose a real-time mask onto themselves during video calls.
How does Avatarify work?
Avatarify is based on an artificial neural network called First Order Motion Model, developed by researchers in Italy. Trained on more than 12,000 videos, it makes it possible to animate a still avatar image without manual tuning. That means users need only add a still image of a face. Avatarify will then turn it into an animated mask.
It works by extracting key points from the webcam video of your actual face, and then tracking those key points as you move. It then applies that motion to the avatar image so that, say, its nose moves at the same time as your own. The animated image is then streamed to Zoom, Skype, or whatever other video calling service you wish to use.
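To make the key-point idea concrete, here is a minimal sketch in Python. It is an illustration only, not Avatarify's code: the function name `transfer_motion` and the plain (x, y) tuples are assumptions made for this example. The real First Order Motion Model learns its own key points and uses a neural network to warp the image, none of which is shown here.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) position in pixels


def transfer_motion(
    avatar_points: List[Point],
    webcam_first_frame: List[Point],
    webcam_current_frame: List[Point],
) -> List[Point]:
    """Move each avatar key point by however far the matching
    webcam key point has travelled since the first frame.

    Hypothetical helper for illustration; it only shifts points
    and does not render or warp any image.
    """
    moved = []
    for (ax, ay), (fx, fy), (cx, cy) in zip(
        avatar_points, webcam_first_frame, webcam_current_frame
    ):
        dx, dy = cx - fx, cy - fy  # how far your own feature moved
        moved.append((ax + dx, ay + dy))  # apply the same shift to the avatar
    return moved


# Your nose moves 5 px right and 2 px up; the avatar's nose follows.
avatar_nose = [(100, 100)]
new_positions = transfer_motion(avatar_nose, [(50, 50)], [(55, 48)])
```

In this toy version the avatar's nose simply inherits your nose's displacement. That is the gist of "move its nose at the same time as your own," minus all the image synthesis.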
To animate a person, just place their picture in a specific folder, fire up the Avatarify app, and then start a video call.
“That idea came spontaneously when I stumbled upon the First Order Motion Model,” Ali Aliev told Cult of Mac. “I was surprised by its performance in terms of animation quality … I decided to make fun of my colleagues, [and] quickly created a prototype, and broke into our weekly Zoom call with [the] face of famous MMA fighter Khabib Nurmagomedov. They appreciated the joke. Karim, who is also my colleague, got an idea. He ported it onto Mac and authored the video with fake Elon Musk.”
The surprisingly convincing results can be seen below.
The Steve Jobs preset
When they decided to publish Avatarify, the pair chose to include a set of preset avatars. “We admire Steve for his commitment to great ideas, so definitely wanted to have him in the avatars preset,” Aliev said.
The results aren’t perfect, of course. The model was trained on 256×256 image crops, meaning that the quality — while perfectly acceptable for Zoom calls — isn’t exactly going to stand up to HD scrutiny. Better training of the AI, particularly with extreme face angles, would help improve that.
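A quick back-of-the-envelope sketch in Python shows why 256×256 falls short of HD. The helper names are made up for this example, and the center crop is an assumption; the app's actual cropping may differ.

```python
def center_square_crop(width: int, height: int) -> tuple:
    """Return (left, top, right, bottom) of the largest centered
    square inside a frame. Assumed cropping strategy, for illustration."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side


def upscale_factor(display_side: int, model_side: int = 256) -> float:
    """How much a model_side-pixel output must be stretched to
    fill a square of display_side pixels."""
    return display_side / model_side


box = center_square_crop(1280, 720)   # square region of a 720p webcam frame
stretch = upscale_factor(720)         # stretch needed to fill that square again
```

For a 720p frame, the 256-pixel output has to be stretched to nearly three times its size to fill the same square, which is where the softness comes from.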
More significantly, unlike the famous deepfake videos you’ve probably seen, Avatarify works with 2D images. As Aliev said, it knows nothing about the 3D world. That’s most notable when it comes to head rotations. (Again, think of this like wearing a flat cardboard mask, rather than a 3D one.)
Avatarify does have some limitations, though
“The other side of the problem is performance,” he said. “Right now you need a GPU-powered computer to run Avatarify with a reasonable 30 FPS. Running it on a CPU-only device [is insufficient] for pleasurable video conferencing. We think it’s possible to speed-up the model so that it works real-time on a CPU-machine [such as a] MacBook, but it’s quite a resource-intensive research problem that requires a lot of efforts and time. Another option for improving performance we are looking at is using cloud GPUs. [That would mean] all heavy computations are done somewhere else, but not on your laptop.”
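To put the 30 FPS figure in perspective, the Python sketch below works out how much time each frame gets and shows one way to time a frame-processing step. Both function names are hypothetical, and `process_frame` is a stand-in for whatever work a real pipeline would do per frame.

```python
import time


def frame_budget_ms(fps: float) -> float:
    """Milliseconds available per frame at a given frame rate."""
    return 1000.0 / fps


def measure_fps(process_frame, frames: int = 30) -> float:
    """Call process_frame repeatedly and return the achieved frames
    per second. process_frame is a placeholder callable here."""
    start = time.perf_counter()
    for _ in range(frames):
        process_frame()
    elapsed = time.perf_counter() - start
    if elapsed == 0:
        return float("inf")  # too fast for the timer to register
    return frames / elapsed


budget = frame_budget_ms(30)  # roughly 33.3 ms per frame
```

At 30 FPS, everything (capturing the frame, running the model, and handing the result to the video call) has to fit into about 33 milliseconds. That is a tight budget for a neural network on a CPU.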
The one final issue — inescapable for anyone who looks at the Elon Musk demo — is that the voice sounds off. At the end of the day, it’s still your voice coming out of another person’s face. There have actually been some pretty impressive (and scary) demos of deepfake audio recently, able to replicate the voice of famous individuals. But those could not easily be used here. So maybe start practicing your impressions!
Avatarify can be downloaded from the online code repository GitHub. It’s free and open-source, although you’ll need a bit of basic coding understanding to get it up and running. After that? It’s just a matter of waiting for your next Zoom call…