Yes, but game engines also hold the entire world inside themselves. There's no guessing, no estimating, no making sure that what it's looking at is actually a human or a bush - it already knows that.
The problem with computer vision being lazy is that it can't ignore something without understanding what it's looking at, and it can't understand what it's looking at without analyzing the data. It's a circular problem, and will be ridiculously hard to solve - the crux of the issue is that we as people are analyzing that same data, we just don't realize it.
Humans are bad at it, too. If you've ever ridden a bike or motorcycle, you quickly learn that car and truck drivers simply aren't looking for 2 wheelers. And therefore they don't see them. (I think this reinforces your point).
Anyone who has been a pedestrian is also well acquainted with this. Cars watch for and expect other cars, not people (bikes, mopeds, etc). The amount of times I’ve almost been hit by someone who is staring intently at oncoming traffic and flooring it without looking anywhere else is too damn high.
That's also true. It's less of a problem in pedestrian-heavy walkable cities and towns. But in the average American city or town covered in stroads where car is king, it's a big problem.
They are, but we've mostly got our subconscious doing it still, it's not that we're always doing big tasks we just have dedicated processes for it, so maybe that's one way to tackle the problem, specialised processes for sorting data types that engages the main process to do the processing of said data.