How does data sent over the internet know where to go?
I saw a map of undersea internet cables the other day and it's crazy how many branches there are. It got me wondering - if I'm based in the UK and playing an online game with someone in Japan, for example, how is the route worked out? Does my ISP know that to get to place X, the data has to be routed via cable 1, cable 2 etc. but to get to place Z it needs to go via cable 3, 4?
I'd like to know this as well, actually, but on a physical level. I understand the TCP/IP stack well enough, but what is the circuitry that actually sends the light down the correct cable?
It doesn't send it down the correct cable. It sends it on.
Imagine your friends. You need to talk to somebody. Let's call him Garry. You don't know Garry's contact info. So instead, you pull out your phone and text Sally, asking her to ask Garry if he knows where your glasses are. Sally pretty much knows everyone. Or at least, you thought she did. Reality is she sent it to Becky, who sends it on to Steve. Now, Steve is the one who invited Becky to Garry's party, and because... reasons, Becky invited Sally who invited you... so now, Steve relays the question to Garry.
Garry hasn't seen your glasses, but he does have a weird set of car keys with a giant Charizard key fob... maybe they're yours? So, he sends his reply to Steve, who forwards it to Becky, who sends it to Sally, who giggles and asks if you really have a Charizard key fob.
You get the idea. Only unlike people, the data usually doesn't get mangled.
I wrote up a whole thing that didn't post. There's good answers here but I think that, like me, you wanted a more "voltage based" one.
Short answer is they don't. Everything on the network is always listening, and security is based solely off of a handshake. Everything is always employing a fancy multimeter that measures voltage high/low as a 1/0 turning it from bits to bytes etc. The router listens to that and decides where to send it upstream, which it isolates from downstream.
For a really basic example look at the Modbus protocol. That's also why industrial equipment folks get real touchy about network access. For things like computers, there's talk back and forth to verify. Modbus is just "if the byte is the thing I do the thing". But fundamentally, that's the physical basis: all devices are always listening, the TCP/IP stack is what tells them what to disregard.
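To make the "voltage high/low as a 1/0, bits to bytes" idea concrete, here's a rough sketch in Python. It's a toy: the 1.65 V threshold and the sample values are made up for illustration, and real Ethernet uses much more involved line coding than "one sample per bit".

```python
def samples_to_bytes(voltages, threshold=1.65):
    """Turn a list of voltage samples (one per bit, MSB first) into bytes.

    Assumption for illustration only: anything at or above `threshold`
    volts counts as a 1, anything below counts as a 0.
    """
    bits = [1 if v >= threshold else 0 for v in voltages]
    out = bytearray()
    usable = len(bits) - len(bits) % 8  # drop a trailing partial byte
    for i in range(0, usable, 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit  # shift in one bit at a time
        out.append(value)
    return bytes(out)


# Made-up samples that spell the bit pattern 01000001 (ASCII "A").
line = [0.1, 3.2, 0.0, 0.2, 0.1, 0.0, 0.3, 3.3]
print(samples_to_bytes(line))
```

Everything above that (addresses, who the bytes are for) is just software deciding what those bytes mean.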
But surely that can't really be true either like if I post a selfie on Instagram in London, some guy's Minecraft server in Minnesota can't be receiving that and be like "oh not for me - ignore". It just seems horribly inefficient. But maybe I'm having trouble conceptualising how fast light is? 😅
And based on another answer ITT by FuglyDuck, it would seem that once you've resolved a domain you do send it to a central hub that then resolves subnets until it gets to its destination, so I can imagine that it does so by physically sending it down "the right cable" as it gets past each layer to get to the final destination via the recipient's ISP, but imagining it as a giant automated telephone switchboard is all my feeble software brain can comprehend it as and that doesn't seem right either.
~~Edit: well actually network switches do operate on the data link layer, but also not on the physical one?
I guess what I'm trying to say is: if I'm sending a packet to Japan from the UK - once my packet reaches a hub of a first tier ISP, does it just go down every oceanic cable in every direction, or the one that actually is in the direction of Japan?~~
The answer is that yes - the internet is basically a telephone switchboard between what amount to otherwise isolated networks of ISPs, and exchange points physically send light down the correct cables with switches.
Yes, sorry, I did oversimplify to the local network. On your local network everything is always listening, but absolutely your home router/modem in Kansas does NOT excite some wires in Tokyo unless you tell it to lol.
And it sounds like you know way more about the software than I do, but I can say with confidence that when a router starts putting oscillating high/low on a cable, everything on that cable "sees" it. I'm fairly sure that's why different address blocks have the limits they do; there's only so many addresses you can have without needing to oscillate that voltage stupid fast.
You should look into some of the serial examples for Raspberry Pis/Arduinos, with your software background you'd probably really enjoy it! It's funny to run into things like the wire not going back to low sometimes, and the myriad other physical issues.
And seriously check out Modbus. It's crazy how "simple" it is. With no handshake and a standardized data format, you can trigger all sorts of stuff. That's the protocol that controls most people's industrial things, including GIANT pumps and valves.
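If you want to see how little there is to it, here's a minimal sketch of a Modbus RTU frame in Python: a unit address byte, a function code, the data, and a CRC-16. The helper names (`build_frame`, `device_accepts`) are made up for this example, and this is only the framing, not a full Modbus implementation.

```python
def crc16_modbus(data: bytes) -> int:
    """CRC-16 as used by Modbus RTU (init 0xFFFF, reflected poly 0xA001)."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def build_frame(unit: int, function: int, payload: bytes) -> bytes:
    """Address byte + function code + payload + CRC (low byte first)."""
    body = bytes([unit, function]) + payload
    return body + crc16_modbus(body).to_bytes(2, "little")


def device_accepts(frame: bytes, my_unit: int) -> bool:
    """Every device on the bus sees the frame; it acts only if the CRC
    checks out and the first byte is its own address."""
    if len(frame) < 4:
        return False
    if crc16_modbus(frame[:-2]) != int.from_bytes(frame[-2:], "little"):
        return False
    return frame[0] == my_unit


# Function 0x03 = read holding registers; start at 0, read 1 register.
frame = build_frame(unit=1, function=0x03, payload=bytes([0, 0, 0, 1]))
print(frame.hex(" "))
print(device_accepts(frame, my_unit=1), device_accepts(frame, my_unit=2))
```

Notice there's no login, no session, nothing: if the bytes are well formed and the address matches, the device acts on them.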
The circuitry doesn't determine which cable is the correct one. That is determined by a routing table (filled in by hand or by a routing protocol) that associates IP networks with network interfaces. So, for example, all data going to 192.168.5.0/24 goes to interface eth0, 192.168.0.0/24 goes to eth1, 10.0.0.1 goes to eth2, and so on. Each interface is, for example, a separate RJ45 Ethernet port on your router. It doesn't have to be RJ45: your router could have a Thick Ethernet or Thin Ethernet connector, or Wi-Fi, or something else.
Anyway, forwarding the packet to the correct interface / subnet can be done with a static route defined on the router. Another way is dynamic routing using BGP (Border Gateway Protocol), an exterior gateway protocol that exchanges routes between your network and networks outside it. Yet another protocol is OSPF (Open Shortest Path First), an interior gateway protocol used inside a corporate network for dynamic routing.
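Here's a rough sketch of that lookup in Python, using the example networks and interface names from above. The default route to `wan0` is my own addition for illustration, and a real router does this in specialized hardware rather than a Python loop, but the rule is the same: of all the entries that match the destination, the most specific one (longest prefix) wins.

```python
import ipaddress

# Routing table from the example above. The 0.0.0.0/0 entry ("wan0")
# is an assumed default route, not something from the original example.
ROUTES = [
    (ipaddress.ip_network("192.168.5.0/24"), "eth0"),
    (ipaddress.ip_network("192.168.0.0/24"), "eth1"),
    (ipaddress.ip_network("10.0.0.1/32"), "eth2"),
    (ipaddress.ip_network("0.0.0.0/0"), "wan0"),
]


def pick_interface(destination: str) -> str:
    """Return the outgoing interface for a destination IP (longest prefix match)."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, iface) for net, iface in ROUTES if addr in net]
    if not matches:
        raise LookupError(f"no route to {destination}")
    _, iface = max(matches, key=lambda m: m[0].prefixlen)
    return iface


for dest in ("192.168.5.17", "192.168.0.9", "10.0.0.1", "8.8.8.8"):
    print(dest, "->", pick_interface(dest))
```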
For any of these the router knows how to send the IP packet to the next hop, another router, which in turn knows how to send it to the next hop.
Where to send is based on the destination IP. The routers know which interfaces and which other routers are responsible for different subnetworks.
It is sort of like how once your mail makes it to a main hub in your state, it is then routed to the main hub for the destination state, and from there to the post office responsible for the destination zip code, and then to the mail route (and hence truck) responsible for the street and number.
So if your destination is 1.1.1.1, maybe there is a router known to be responsible for 1.0.0.0/8, and it knows which router is responsible for 1.1.0.0/16, and so on until we get to a router that has 1.1.1.1 on one of its subnets. That router then sends directly to 1.1.1.1.
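The hop-by-hop idea can be sketched like this in Python. The router names and their tables are entirely made up for illustration; the point is that no single router knows the whole path, each one only knows the next hop for a given prefix.

```python
import ipaddress

# Hypothetical routers, each with its own small table of prefix -> next hop.
# "DIRECT" means the destination is on a subnet attached to that router.
TABLES = {
    "isp-core": {"1.0.0.0/8": "router-A"},
    "router-A": {"1.1.0.0/16": "router-B"},
    "router-B": {"1.1.1.0/24": "DIRECT"},
}


def next_hop(router: str, destination: str) -> str:
    """Longest-prefix lookup in one router's table."""
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, hop in TABLES[router].items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop)
    if best is None:
        raise LookupError(f"{router} has no route to {destination}")
    return best[1]


def trace(start: str, destination: str, max_hops: int = 16) -> list:
    """Follow next hops from `start` until a router can deliver directly."""
    path = [start]
    router = start
    for _ in range(max_hops):
        hop = next_hop(router, destination)
        if hop == "DIRECT":
            path.append(destination)
            return path
        path.append(hop)
        router = hop
    raise RuntimeError("too many hops, possible routing loop")


print(" -> ".join(trace("isp-core", "1.1.1.1")))
```

This is roughly what `traceroute` shows you for a real destination, except the real tables are far bigger and are kept up to date by protocols like BGP and OSPF.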
IPs and packets are well and good and I do have a decent working knowledge of TCP/IP, but what physically is actually happening? Thanks for replying anyway!
Physically, at the physical / link layers, an Ethernet transceiver integrated circuit (a PHY) takes the data provided by the CPU and sends it as electrical signals over the cable plugged into the RJ45 port to reach the switch. The chip's datasheet and the IEEE 802 specs (802.3 for Ethernet) cover the details.