I mean, I know it's scary, but I'll admit it is impressive, even when I watched it with jaded "every day is another AI breakthrough" exhaustion.
The subtle face movements, eyebrow expression, everything seems to correctly infer how the face would articulate those specific words. When you think of how many decades something like this would have been stuck in the uncanny valley even with a team of trained people hand-tweaking the image and video, and this is doing it better in nearly every way, automatically, with just an image? Insane.
It's pretty wild that this is the tech being produced by the trillion-dollar company that has already been granted a patent on creating digital resurrections of dead people from the data they left behind.
So we already have LLMs that can take what you said and say new things that seem like what you would have said, voice synthesis that can take a sample of your voice and read that text so it sounds a lot like you were actually saying it, and now this, which can take a photo of you and make a video where you legit look like you are saying it, facial expressions and all.
And this could be done for anyone who has a social media profile with a few dozen text posts, a profile photo, and a 15-second sample of their voice.
I really don't get how every single person isn't just having a daily existential crisis questioning the nature of their present reality given what's coming.
Do people just think the current trends aren't going to continue, or just don't think about the notion that what happens in the future could in fact have been their own nonlocal past?
It reminds me of a millennia-old saying from a group that claimed we were copies in the images of original humans: "you do not know how to examine the present moment."
Edit - bonus saying on the topic: "When you see your likeness, you are happy. But when you see your images that came into being before you and that neither die nor become visible, how much you will have to bear!"