It's the same thing though, no? Whatever power it takes to run a query on dedicated hardware in a data center is the same or lower than the power to do it on a cell phone. On a cell phone it's even worse, because charging the battery and then using battery power to run AI queries is less efficient than just powering a GPU to run several queries in parallel. That's without getting into other efficiencies of scale and the fact that a data center is designed to keep power usage low, compared to an iPhone, which is designed to be the worst consumer product someone will pay $1000 for.
Not even close. A phone is lightyears more efficient than a server because it has to run on a battery. A server just needs to not outpace the air conditioning unit positioned right in front of it. Servers do a lot more per watt than, say, a desktop or maybe even a laptop. But phones do so much with almost no power; otherwise you'd get an hour of battery life.
You're right that phones are more efficient than I gave them credit for, but power costs are absolutely a consideration for the tech companies that are training large models.
Besides, how much more power-efficient would a phone have to be to make up for only doing 1 query at a time, compared to a GPU that runs several at once and benefits from cache locality, since it's just using the same data over and over for different queries? I highly doubt that the efficiency of scale could be outweighed by mobile hardware's power usage edge.
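To put some structure on that question, here's a rough back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function names are made up, the model is deliberately simple (power times time, divided by charging efficiency on the phone side, and multiplied by a cooling/overhead factor then split across the batch on the server side), and the inputs are placeholders, not measurements of any real device.

```python
def phone_joules_per_query(chip_watts, seconds, charge_efficiency):
    """Wall energy for one query on a phone (hypothetical model).

    The chip draws chip_watts for `seconds`; dividing by charge_efficiency
    (between 0 and 1) accounts for losses from charging the battery first.
    """
    return chip_watts * seconds / charge_efficiency


def server_joules_per_query(gpu_watts, seconds, batch_size, overhead_factor):
    """Wall energy per query on a server GPU (hypothetical model).

    One batch costs gpu_watts * seconds, scaled by overhead_factor
    (cooling, power delivery, etc.), and is shared by batch_size queries.
    """
    return gpu_watts * seconds * overhead_factor / batch_size


def break_even_batch_size(phone_joules, gpu_watts, seconds, overhead_factor):
    """Batch size at which the server matches the phone's energy per query."""
    return gpu_watts * seconds * overhead_factor / phone_joules


if __name__ == "__main__":
    # Placeholder inputs only; swap in real measurements if you have them.
    phone = phone_joules_per_query(chip_watts=5, seconds=10, charge_efficiency=0.8)
    server = server_joules_per_query(gpu_watts=400, seconds=2, batch_size=16,
                                     overhead_factor=1.5)
    needed = break_even_batch_size(phone, gpu_watts=400, seconds=2,
                                   overhead_factor=1.5)
    print(f"phone:  {phone:.1f} J/query")
    print(f"server: {server:.1f} J/query")
    print(f"server breaks even at a batch size of about {needed:.1f}")
```

The answer swings entirely on the inputs, which is kind of the point: the phone's per-watt edge has to be bigger than the batch size divided by the overhead factor before it wins.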
It's so great. You can't even buy a new washing machine now without AI being crammed into it. I'm sure the next kettle I buy will also have AI, somehow.