Human-like object concept representations emerge naturally in multimodal large language models
Isn't this just because LLMs are trained on data from actual humans, which already contains our object concept representations?
The object concept representation is an emergent property of these networks. Essentially, the network learns stable associations between different modalities and forms an abstract concept of an object that unites them.
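To make that idea of cross-modal association a bit more concrete, here is a minimal sketch of CLIP-style contrastive alignment, where separate image and text encoders are trained so that representations of the same object land close together in a shared embedding space. This is an illustration of the general technique, not the paper's actual method; the toy encoders, dimensions, and synthetic features are assumptions made for the example.

```python
# Minimal sketch of contrastive alignment between two modalities.
# Illustrative only: toy encoders, dimensions, and random stand-in features.
import torch
import torch.nn as nn
import torch.nn.functional as F

EMBED_DIM = 64  # shared "concept" space (arbitrary toy size)

class ImageEncoder(nn.Module):
    def __init__(self, in_dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, EMBED_DIM))
    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

class TextEncoder(nn.Module):
    def __init__(self, in_dim=300):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, EMBED_DIM))
    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    # Paired image/text features of the same object should be more similar
    # to each other than to any other pair in the batch (InfoNCE-style loss).
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(len(img_emb))
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

# Synthetic stand-ins for precomputed image and text features of the same objects.
images = torch.randn(32, 512)
texts = torch.randn(32, 300)

img_enc, txt_enc = ImageEncoder(), TextEncoder()
optimizer = torch.optim.Adam(list(img_enc.parameters()) + list(txt_enc.parameters()), lr=1e-3)

for step in range(100):
    loss = contrastive_loss(img_enc(images), txt_enc(texts))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# After training, matching image/text pairs share a stable joint representation:
# an "object concept" that is not tied to either modality alone.
```

The point of the sketch is that neither encoder is handed a concept; the shared representation arises from the training objective, which is roughly what "emergent" means in this context.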
But it's emerging from networks trained on human data, which means our object concept representations are already in the data. This isn't random data, after all; it comes from us. It seems like the LLMs are just regurgitating what we're feeding them.
What this shows, I think, is how deeply our influence is embedded in the data we feed to LLMs. They're human-derived models, so they produce human-like outputs.