Yet another AI genie's out of the bottle. The…geniuses over at Microsoft have unveiled their new VASA technology for "Lifelike audio-driven talking faces generated in real time." The tech can use a single snapshot, then animate it to match whatever audio is plugged into it. "Make anyone say anything," is how this reviewer describes it.
It has to be seen to be believed (obviously you need to turn the sound on):
While it has a slight air of videogame-cut-scene, would you have noticed these aren't real if you weren't told in advance?
Watching them and knowing, it is chilling just how dead the eyes look. I don't know if it's the blinking rate, or if human eyes IRL have some subtle twinkling that we're not aware of, but the demo is downright creepy. Even worse, it's just a matter of time before they attend to this and get it "right."
And as this example shows, it looks like some animation jobs are going to evaporate:
A full demonstration:
Comments
I heard someone say something poignant on this topic the other day: "It's not that AI will make us believe a fake thing is real; the problem is it will make us think real things are fake."
The flexible teeth are a bit creepy