How does data sent over the internet know where to go?
I saw a map of undersea internet cables the other day and it's crazy how many branches there are. It got me wondering: if I'm (based in the UK) playing an online game with someone in Japan, for example, how is the route worked out? Does my ISP know that to get to place X the data has to be routed via cable 1, cable 2, etc., but to get to place Z it needs to go via cables 3 and 4?
There are things called routers that... route traffic. A dumbed-down version: routers talk to other routers to find out which destinations they know about.
If a game server you connect to matches you with someone in Japan, your computer sends a packet with the address in Japan attached to it. Your home router probably has no clue where that is, so it goes to its upstream router and asks if they know, this process repeats until one figures it out and you get a route.
This all happens very quickly, and it's why people say the Internet routes around damage.
Your home router probably has no clue where that is, so it goes to its upstream router and asks if they know, this process repeats until one figures it out and you get a route.
That's not how that works. The router merely sends the packet to the next directly connected router.
Let's take a simplified example:
If you were in the middle of bumfuck nowhere, USA and wanted to send a packet to Kyouto, Japan, your router would send the packet to another router it's connected to on the west coast*. From your router's perspective, that's it; it just sends it over and never "thinks" about that packet again.
The router on the west coast receives the packet, looks at the headers, sees that it's supposed to go to Japan, and sends it over a link to Hawaii.
The router in Hawaii again looks at the packet, sees that it's supposed to go to Japan and sends it over its link to Toukyou.
The router in Toukyou then sends it over its link to Kyouto, and from there it'll be routed locally to the exact host, but you get the idea.
This is generally how IP routing works; always one hop to the next.
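To make "one hop at a time" concrete, here's a toy sketch. The router names and tables are hypothetical stand-ins for the example above; the point is that each router only looks up the next hop in its own table and never knows the full path.

```python
# Hypothetical per-router forwarding tables: each router only knows
# the next hop toward a destination, never the whole route.
FORWARDING_TABLES = {
    "home": {"kyouto-host": "west-coast"},
    "west-coast": {"kyouto-host": "hawaii"},
    "hawaii": {"kyouto-host": "toukyou"},
    "toukyou": {"kyouto-host": "kyouto"},
    "kyouto": {"kyouto-host": "kyouto-host"},  # host is directly connected here
}


def forward(start: str, destination: str, max_hops: int = 30) -> list[str]:
    """Follow next-hop decisions one router at a time and record the hops."""
    hops = [start]
    current = start
    while current != destination:
        if len(hops) > max_hops:
            raise RuntimeError("hop limit exceeded")
        # Each router consults only its own table for the next hop.
        current = FORWARDING_TABLES[current][destination]
        hops.append(current)
    return hops


print(forward("home", "kyouto-host"))
```

No router in the chain ever sees or stores the complete path; the path only exists as the sum of independent local decisions.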
What I haven't explained is how your router knows that it can reach Kyouto via the west coast or how the west coast knows that it can reach Kyouto via Hawaii.
This is where routing protocols come in. You can look up exactly how they work in detail, but what's important is their purpose: to build a "map" of the internet that you can look at to tell which way to send a packet at each intersection, depending on its destination.
In operation, each router then simply looks at the one intersection it represents on the "map" and can then decide which way (link) to send each individual packet over.
The "map" (routing table) is continuously updated as conditions change.
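One simple way routers can build that "map" is distance-vector routing (the RIP family works this way): every router repeatedly shares its distance table with its neighbours and keeps any cheaper route it hears about. This is a minimal sketch with a hypothetical four-router topology; the internet between networks actually uses BGP, and many networks internally use OSPF, so treat this as an illustration of the idea rather than what your ISP runs.

```python
# Hypothetical topology: link costs between directly connected routers.
LINKS = {
    ("A", "B"): 1,
    ("B", "C"): 1,
    ("A", "C"): 5,
    ("C", "D"): 1,
}


def neighbors(router):
    """Yield (neighbor, cost) pairs for a router; links work in both directions."""
    for (x, y), cost in LINKS.items():
        if x == router:
            yield y, cost
        elif y == router:
            yield x, cost


def build_tables(routers):
    """Distance-vector routing: learn from neighbours' tables until nothing changes."""
    # table[r][dest] = (cost, next_hop); every router starts knowing only itself.
    table = {r: {r: (0, r)} for r in routers}
    changed = True
    while changed:
        changed = False
        for r in routers:
            for nbr, link_cost in neighbors(r):
                # The neighbour "advertises" its current distances to r.
                for dest, (nbr_cost, _) in list(table[nbr].items()):
                    new_cost = link_cost + nbr_cost
                    if dest not in table[r] or new_cost < table[r][dest][0]:
                        table[r][dest] = (new_cost, nbr)
                        changed = True
    return table


tables = build_tables(["A", "B", "C", "D"])
print(tables["A"]["D"])  # (cost, next hop) from A toward D
```

A ends up sending traffic for D to B (total cost 3) rather than over its expensive direct link to C (which would cost 6), even though no router was ever told the full topology.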
Never at any point do routers establish a fixed route from one point to another or anything resembling a connection; the internet protocol is explicitly connectionless.
* in reality, there will be a few local routers between the gateway router sitting in your home and the big router that has a big link to the west coast
I think the previous comment omitted something, which is why you think it's inefficient: routers don't ask for directions every packet, they record the directions in their route table.
I'm no expert, but it seems like the most efficient way with the given technology! The hops between routers are much less frantic than (I think) you're imagining.
To oversimplify, think of it like boxes in boxes where each box is a router.
Your PC is in the first small box. It says "I want to connect to [IP]" and the box says "I don't have that IP, let me ask the bigger box"
The bigger box (your ISP) says "I don't have it either, I'll ask the big box"
The big box says "I don't have it but based on the address, I know it's in this other big box"
The other big box says the same thing and sends it to another small box. That small box has the PC you're looking for, and the packet is delivered!
I wouldn't call that "messy and inefficient" but you do you. I'd be curious to know what's a "clean and efficient" solution for you when it comes to routing packets around the planet :)
It's not efficient from the perspective of organization. But the thing nobody tells you here is that packets have no predefined route; they take whatever route gets them there optimally. So it's highly redundant and very fault tolerant. When you consider that, for what it does, it's a highly efficient routing system.
To the point where you could cut an undersea cable and traffic would still route perfectly fine, albeit probably a lot slower, assuming that cable isn't your only connection, of course. The fact that it works at all is kind of a miracle.
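Here's a rough sketch of why a cut cable isn't fatal: if there is any other path, a fresh search over the remaining links finds it. The cable map below is hypothetical and wildly simplified, and real routers don't run a search over a cable map (their routing protocols reconverge instead), but the effect is similar.

```python
from collections import deque

# Hypothetical, heavily simplified cable map: each pair is a two-way link.
CABLES = {
    ("UK", "US-East"), ("US-East", "US-West"), ("US-West", "Japan"),
    ("UK", "Egypt"), ("Egypt", "India"), ("India", "Singapore"),
    ("Singapore", "Japan"),
}


def fewest_hops(cables, src, dst):
    """Breadth-first search: return the route with the fewest links, or None."""
    adjacency = {}
    for a, b in cables:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    previous = {src: None}  # node -> node we reached it from
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            route = []
            while node is not None:  # walk back to the source
                route.append(node)
                node = previous[node]
            return route[::-1]
        for nxt in adjacency.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


print(fewest_hops(CABLES, "UK", "Japan"))
# Cut the US-West <-> Japan cable and ask again.
print(fewest_hops(CABLES - {("US-West", "Japan")}, "UK", "Japan"))
```

The second route is longer (more hops, and in real life more distance and latency), which matches the "still works, but slower" behaviour described above.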
Oh wow, this unlocked a memory! Pretty sure I watched it back in school. Quite informative, though it felt like it skipped a lot between leaving the host computer and reaching the destination. Is it just the same process over and over until it reaches the right place?
Yes, the packet passes through routers at each stage and they direct the packet to the ‘closest’ path based on its destination, until the final router has the destination on its network. This can happen a few times (for something in your ISP network), or 10-30+ times for something further away.
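You can actually see those hops with `traceroute` (`tracert` on Windows). It relies on the IPv4 TTL field: every router decrements it, and the router where it hits zero drops the packet and reports back. The sketch below simulates that idea over a hypothetical path; it doesn't send real packets.

```python
def send_with_ttl(path, ttl):
    """Simulate IPv4 TTL: each router decrements it and drops the packet at zero.

    Returns the router that dropped the packet, or None if it was delivered.
    """
    # path[0] is the sender and path[-1] is the destination host;
    # everything in between is a router that forwards and decrements TTL.
    for router in path[1:-1]:
        ttl -= 1
        if ttl == 0:
            return router  # this router would reply with ICMP "time exceeded"
    return None


def traceroute(path, max_ttl=30):
    """Discover routers by sending probes with TTL 1, 2, 3, ... like traceroute."""
    discovered = []
    for ttl in range(1, max_ttl + 1):
        dropped_at = send_with_ttl(path, ttl)
        if dropped_at is None:
            break  # probe reached the destination
        discovered.append(dropped_at)
    return discovered


# Hypothetical path from your PC to a server.
path = ["my-pc", "home-router", "isp-router", "backbone-router", "game-server"]
print(traceroute(path))
```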
So at a basic level, we'll only talk about routers. Every computer/server on a network has an address. When your computer wants to talk to another, it attaches the IP address of the destination computer to every piece of data that leaves your computer, saying where that data wants to go.
It goes from your computer to your router, which has a table of the addresses it knows (your home network) and the address of another router that it sends everything it doesn't know about.
It does this a few times before your data gets to a router that says "oh, I know a router that knows someone that knows where that is" and sends it that way, until it reaches a router that knows the specific computer to send it to.
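That home-router decision boils down to two cases, which this sketch shows using Python's standard `ipaddress` module. The home subnet and the upstream router address are assumptions for illustration (203.0.113.1 is from a range reserved for documentation).

```python
import ipaddress

# Hypothetical home router configuration.
HOME_NETWORK = ipaddress.ip_network("192.168.1.0/24")
UPSTREAM_ROUTER = ipaddress.ip_address("203.0.113.1")  # ISP router (documentation range)


def next_hop(destination: str):
    """Deliver locally if the address is on the home network, else hand it upstream."""
    dest = ipaddress.ip_address(destination)
    if dest in HOME_NETWORK:
        return dest  # known: send straight to that device on the LAN
    return UPSTREAM_ROUTER  # unknown: the default route sends it upstream


print(next_hop("192.168.1.42"))  # a device in your house
print(next_hop("8.8.8.8"))       # anything else goes to the ISP
```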
A packet is like a sealed mailing envelope. Its headers are like things written on the face of an envelope, including an address. Chunks of data on the internet are so many letters in these envelopes, carried and delivered by a network of other computers.
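To stretch the envelope analogy: the "writing on the face" of an IPv4 packet is a 20-byte header with fixed fields, including TTL, protocol, a checksum, and the source and destination addresses. Here's a minimal sketch that builds one with the standard `struct` and `socket` modules; the addresses are from documentation ranges, and options, fragmentation, and the actual payload are left out.

```python
import socket
import struct


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of 16-bit words, as used by the IPv4 header checksum."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ipv4_header(src: str, dst: str, payload_len: int,
                      ttl: int = 64, protocol: int = 17) -> bytes:
    """Build a minimal 20-byte IPv4 header with no options (protocol 17 is UDP)."""
    version_ihl = (4 << 4) | 5  # version 4, header length of 5 x 32-bit words
    total_length = 20 + payload_len
    header = struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl, 0, total_length,  # version/IHL, type of service, total length
        0, 0,                          # identification, flags/fragment offset
        ttl, protocol, 0,              # TTL, protocol, checksum placeholder
        socket.inet_aton(src), socket.inet_aton(dst),
    )
    checksum = ipv4_checksum(header)
    return header[:10] + struct.pack("!H", checksum) + header[12:]


header = build_ipv4_header("192.0.2.10", "198.51.100.20", payload_len=32)
print(socket.inet_ntoa(header[16:20]))  # the "address on the envelope"
```

Routers along the way mostly read the destination field (bytes 16 to 19), decrement the TTL, and update the checksum; the payload inside stays sealed.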
And the domain name (Google.com) gets converted from words we understand into an IP address. This is the Domain Name System, or DNS. When you look up Google.com, DNS answers with an address such as 142.250.189.174 (large sites often hand out different addresses depending on where you are). If that address changes, the change gets passed through the system until everyone has the new IP address. DNS is how your computer learns the address.
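The "passed through the system" part mostly happens through caching: resolvers keep an answer until its TTL (time to live) expires, then ask again. This toy resolver models only that caching behaviour; real DNS also walks a hierarchy of root, TLD, and authoritative servers. The `CachingResolver` class, the injectable clock, and the 192.0.2.x addresses are all made up for the example.

```python
import time


class CachingResolver:
    """Toy DNS cache: answers from cache until the record's TTL expires."""

    def __init__(self, authoritative, clock=time.monotonic):
        self.authoritative = authoritative  # name -> (address, ttl_seconds)
        self.clock = clock
        self.cache = {}                     # name -> (address, expires_at)

    def resolve(self, name):
        now = self.clock()
        cached = self.cache.get(name)
        if cached and cached[1] > now:
            return cached[0]                # still fresh: no new lookup
        address, ttl = self.authoritative[name]
        self.cache[name] = (address, now + ttl)
        return address


records = {"example.com": ("192.0.2.1", 300)}
fake_now = [0.0]
resolver = CachingResolver(records, clock=lambda: fake_now[0])
print(resolver.resolve("example.com"))         # first lookup
records["example.com"] = ("192.0.2.99", 300)   # the owner changes the address
fake_now[0] = 100
print(resolver.resolve("example.com"))         # still the cached, old answer
fake_now[0] = 301
print(resolver.resolve("example.com"))         # TTL expired: new answer
```

This is why an address change isn't seen everywhere instantly: each cache holds on to the old answer until its own copy expires.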
The simplest explanation is that my computer doesn’t know where to go for everything but does know where to go to get answers. It sends its traffic to the place that will know where to send things. Rinse and repeat until you finally hit the place you wanted to go.
Comments here are correct, with one missing high-level component: the very top-level routers used by tier 1 providers. I started an internet company, and we got large enough to decide to become a tier 1 provider. The one big difference in this configuration is that we publish our own blocks of IPs and we listen for published IPs. We have routers that essentially maintain a list of where all the IPs or blocks of IPs worldwide need to go. More importantly, I would send out a list of my IP blocks that would propagate across all the tier 1 routers around the world. That could take an hour, but more likely minutes.
Having this allowed me to essentially connect to the internet at zero cost. There is some cost to be assigned IPs, but I was trusted. While I say zero cost, I still had to pay for high-bandwidth dark fiber to New York or other major meet-me points. I also had to pay for rack space to put a tier 1 router into these buildings. But what it really gives me is the ability to have multiple connections to the pipes, and because I publish my own IPs, I can balance all the routes, and other providers can find the best way to me through a process called weighting. Also, if I lose a connection, which is rare at this level, I can rapidly and automatically republish my routes on the working connections, and usually within 15 minutes all the routers in the world would know. 15 minutes is actually likely too long; these days it's more like 5 minutes.
Now, the interesting part of this is that I publish my own IPs. I have to be extremely careful, as with a single stroke I could say I own all the IPs in China. Well, likely a few strokes. I certainly could make a simple mistake and take control of a shit load of IPs. That means traffic destined for another country could suddenly come to me. More correctly, because the real owners are also publishing, it would just make a mess and take some IPs down. If I published a big block in China, I would essentially DoS myself, because the pipes I buy are many times smaller. This is a trusted system because we all connect to each other ad hoc. There is not, and cannot be, any central control, as we all need to publish freely for this to work. But if I were to screw up and, say, divert a shit load of IPs destined for Washington, it would rapidly be figured out and I would rapidly be deemed untrustworthy. I would be shut down physically at some point.
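Publishing IP blocks and listening for others' is what BGP does. A rough model of how an announcement spreads: each network (autonomous system, AS) passes it to its peers with its own AS number prepended, ignores any announcement that already contains its own number (loop prevention), and keeps the shortest AS path it has heard. The topology below is hypothetical and uses private AS numbers; real BGP route selection weighs many more attributes (local preference, policies, and so on) before path length.

```python
from collections import deque

# Hypothetical AS-level peering, using private AS numbers (64512-65534).
PEERS = {
    64512: [64513, 64514],
    64513: [64512, 64515],
    64514: [64512, 64515],
    64515: [64513, 64514, 64516],
    64516: [64515],
}


def propagate(origin_as):
    """Simulate an AS announcing a prefix; return each AS's best AS path to it."""
    routes = {origin_as: []}  # locally originated: empty AS path
    queue = deque([origin_as])
    while queue:
        sender = queue.popleft()
        advertised = [sender] + routes[sender]  # sender prepends its own ASN
        for peer in PEERS[sender]:
            if peer in advertised:
                continue  # loop prevention: peer's own ASN is already in the path
            current = routes.get(peer)
            if current is None or len(advertised) < len(current):
                routes[peer] = advertised  # shorter AS path wins in this sketch
                queue.append(peer)
    return routes


routes = propagate(64516)
print(routes[64512])  # the AS path 64512 would use to reach 64516's prefix
```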
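Why a mistaken announcement can pull traffic toward you: routers prefer the most specific matching prefix (longest-prefix match). If someone announces a smaller block inside someone else's range, that smaller block wins for the addresses it covers. This sketch uses hypothetical documentation-range prefixes and labels; real networks increasingly filter announcements (for example with RPKI) to catch this kind of mistake.

```python
import ipaddress

# Hypothetical routing table: prefix -> who announced it.
routes = {
    ipaddress.ip_network("198.51.100.0/24"): "legitimate owner",
}


def best_route(table, destination):
    """Longest-prefix match: the most specific prefix covering the address wins."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in table if addr in net]
    if not matches:
        return None
    return table[max(matches, key=lambda net: net.prefixlen)]


print(best_route(routes, "198.51.100.7"))
# A mistaken (or malicious) announcement of a more specific block:
routes[ipaddress.ip_network("198.51.100.0/25")] = "mistaken announcer"
print(best_route(routes, "198.51.100.7"))    # now goes to the mistaken announcer
print(best_route(routes, "198.51.100.200"))  # outside the /25, unaffected
```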
Essentially I have fairly normal routers with one feature that allows them to dynamically keep track of all the routes worldwide and to periodically publish all the IPs I own.
I'd like to know this as well, actually, but on a physical level. I understand the TCP/IP stack well enough, but what is the circuitry that actually sends the light down the correct cable?
It doesn't send it down the correct cable. It sends it on.
Imagine your friends. You need to talk to somebody; let's call him Garry. You don't know Garry's contact info, so instead you pull out your phone and text Sally, asking her to ask Garry if he knows where your glasses are. Sally pretty much knows everyone. Or at least, you thought she did. In reality, she sent it to Becky, who sends it on to Steve. Now, Steve is the one who invited Becky to Garry's party, and because... reasons, Becky invited Sally, who invited you... so now Steve relays the question to Garry.
Garry hasn't seen your glasses, but he does have a weird set of car keys with a giant Charizard key fob... maybe they're yours? So he sends his reply to Steve, who forwards it to Becky, who sends it to Sally, who giggles and asks if you really have a Charizard key fob.
You get the idea. Only unlike people, the data usually doesn't get mangled.
I wrote up a whole thing that didn't post. There's good answers here but I think that, like me, you wanted a more "voltage based" one.
Short answer: they don't. Everything on the network is always listening, and security is based solely on a handshake. Everything is always employing a fancy multimeter that reads voltage high/low as 1/0, turning it from bits into bytes, etc. The router listens to that and decides where to send it upstream, which it isolates from downstream.
For a realllllly basic example, look at the Modbus protocol. That's also why industrial equipment folks get real touchy about network access. For things like computers, there's talk back and forth to verify. Modbus is just "if the byte is the thing, I do the thing". But fundamentally, that's the physical basis: all devices are always listening, and the TCP/IP stack is what tells them what to disregard.
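A sketch of the "everyone hears everything, each device ignores what isn't for it" idea, modelled on Modbus RTU over a shared serial bus: each frame starts with a device address and ends with a CRC-16 (the Modbus variant, sent low byte first). The `Device` class and the bus of three devices are hypothetical. Note that this shared-medium picture fits serial buses and old Ethernet hubs; modern switched Ethernet delivers most frames only to the port that needs them.

```python
def modbus_crc16(data: bytes) -> int:
    """CRC-16/MODBUS: reflected polynomial 0xA001, initial value 0xFFFF."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def make_frame(address: int, function: int, data: bytes = b"") -> bytes:
    """Address + function code + data, followed by the CRC (low byte first)."""
    body = bytes([address, function]) + data
    crc = modbus_crc16(body)
    return body + bytes([crc & 0xFF, crc >> 8])


class Device:
    """Hypothetical device on the bus: it hears every frame but acts on few."""

    def __init__(self, address: int):
        self.address = address
        self.accepted = []

    def hear(self, frame: bytes):
        body, crc = frame[:-2], frame[-2] | (frame[-1] << 8)
        if modbus_crc16(body) != crc:
            return  # corrupted on the wire: ignore
        if body[0] != self.address:
            return  # not for me: ignore
        self.accepted.append(body)


bus = [Device(1), Device(2), Device(3)]
# Function 0x03 (read holding registers): start register 0, quantity 1.
frame = make_frame(2, 0x03, bytes([0x00, 0x00, 0x00, 0x01]))
for device in bus:
    device.hear(frame)  # every device on the shared line receives the frame
print([len(d.accepted) for d in bus])  # only device 2 acts on it
```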
But surely that can't really be true either? If I post a selfie on Instagram in London, some guy's Minecraft server in Minnesota can't be receiving that and going "oh, not for me, ignore". It just seems horribly inefficient. But maybe I'm having trouble conceptualising how fast light is? 😅
And based on another answer ITT by FuglyDuck, it would seem that once you've resolved a domain, you do send it to a central hub that then resolves subnets until it gets to its destination. So I can imagine that it does so by physically sending it down "the right cable" as it gets past each layer to reach the final destination via the recipient's ISP, but imagining it as a giant automated telephone switchboard is all my feeble software brain can comprehend, and that doesn't seem right either.
~~Edit: well, actually network switches do operate on the data link layer, but not on the physical one as well?
I guess what I'm trying to say is: if I'm sending a packet to Japan from the UK, once my packet reaches the hub of a tier 1 ISP, does it just go down every oceanic cable in every direction, or only the one that actually leads toward Japan?~~
The answer is yes: the internet is just a telephone switchboard between what amount to otherwise isolated networks of ISPs, and exchange points physically send light down the correct cables using switches.
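On the "does it go down every cable?" question: an Ethernet switch only floods a frame out of every port while it doesn't yet know where the destination lives. It learns which port each sender is on and from then on forwards frames out of just that one port. This is a minimal sketch of that learning behaviour; the port numbers and MAC addresses are hypothetical, and real switches also age out entries and handle broadcasts and VLANs. Routers between networks make the equivalent decision using IP prefixes instead of MAC addresses.

```python
class LearningSwitch:
    """Ethernet-style learning switch: floods unknown destinations, then forwards precisely."""

    def __init__(self, ports):
        self.ports = ports
        self.mac_table = {}  # MAC address -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame gets sent out of."""
        self.mac_table[src_mac] = in_port  # learn where the sender lives
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every port except the incoming one.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            return []  # destination is on the same segment; nothing to forward
        return [out_port]


A = "aa:aa:aa:aa:aa:aa"
B = "bb:bb:bb:bb:bb:bb"
switch = LearningSwitch(ports=[1, 2, 3, 4])
print(switch.receive(1, A, B))  # B unknown yet: flooded to ports 2, 3, 4
print(switch.receive(2, B, A))  # A was learned on port 1
print(switch.receive(1, A, B))  # B now known on port 2: only that port
```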
The circuitry doesn't determine which cable is the correct one. That is determined by the routing table (filled in by hand or by a routing protocol), which associates various IP networks with different network interfaces. So, for example, all data going to 192.168.5.0/24 goes to interface eth0, 192.168.0.0/24 goes to eth1, 10.0.0.1 goes to eth2, and so on. Each interface is a separate RJ45 Ethernet port on your router, for example. It doesn't have to be RJ45; your router could have a Thick Ethernet or Thin Ethernet connector, or Wi-Fi, or something else.
Anyway, forwarding the packet to the correct interface/subnet can be done with a static route defined on the router. Another way is dynamic routing using BGP (Border Gateway Protocol), an exterior gateway protocol that dynamically routes between your network and networks outside it. Yet another protocol is OSPF (Open Shortest Path First), an interior gateway protocol used inside a corporate network for dynamic routing.
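Here's the interface table from that comment as a sketch, using Python's `ipaddress` module. When several entries match, routers pick the most specific one (longest-prefix match). The default route on a `wan0` interface is an added assumption so that every address has somewhere to go.

```python
import ipaddress

# Entries from the comment's example, plus a hypothetical default route.
ROUTES = [
    (ipaddress.ip_network("192.168.5.0/24"), "eth0"),
    (ipaddress.ip_network("192.168.0.0/24"), "eth1"),
    (ipaddress.ip_network("10.0.0.1/32"), "eth2"),
    (ipaddress.ip_network("0.0.0.0/0"), "wan0"),  # assumed default route
]


def choose_interface(destination: str) -> str:
    """Pick the interface of the longest prefix that contains the address."""
    addr = ipaddress.ip_address(destination)
    _, interface = max(
        ((net, iface) for net, iface in ROUTES if addr in net),
        key=lambda entry: entry[0].prefixlen,
    )
    return interface


for dest in ["192.168.5.20", "192.168.0.9", "10.0.0.1", "10.0.0.2"]:
    print(dest, "->", choose_interface(dest))
```

Because `0.0.0.0/0` contains every address, there is always at least one match; it only wins when nothing more specific does.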
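The "shortest path first" in OSPF refers to Dijkstra's algorithm: each router runs it over its copy of the network map and keeps, for each destination, the first hop of the cheapest path. This sketch shows only that computation over a hypothetical four-router graph with made-up link costs; it leaves out everything else OSPF does (flooding link-state updates, areas, timers).

```python
import heapq

# Hypothetical link costs inside one network.
GRAPH = {
    "R1": {"R2": 10, "R3": 1},
    "R2": {"R1": 10, "R4": 1},
    "R3": {"R1": 1, "R4": 5},
    "R4": {"R2": 1, "R3": 5},
}


def shortest_path_first(graph, source):
    """Dijkstra's algorithm: return {destination: (cost, first hop from source)}."""
    results = {}
    heap = [(0, source, None)]  # (cost so far, node, first hop used to get there)
    while heap:
        cost, node, first_hop = heapq.heappop(heap)
        if node in results:
            continue  # already settled with a cheaper (or equal) cost
        results[node] = (cost, first_hop)
        for neighbor, link_cost in graph[node].items():
            if neighbor not in results:
                # Leaving the source, the first hop is the neighbor itself.
                hop = neighbor if node == source else first_hop
                heapq.heappush(heap, (cost + link_cost, neighbor, hop))
    return results


for dest, (cost, hop) in sorted(shortest_path_first(GRAPH, "R1").items()):
    print(dest, cost, hop)
```

Notice R1 reaches R2 via R3 (cost 7) instead of its direct but expensive link (cost 10).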
For any of these the router knows how to send the IP packet to the next hop, another router, which in turn knows how to send it to the next hop.
Where to send is based on the destination IP. The routers know which interfaces and which other routers are responsible for different subnetworks.
It is sort of like how once your mail makes it to a main hub in your state, it is then routed to the main hub for the destination state, and from there to the post office responsible for the destination zip code, and then to the mail route (and hence truck) responsible for the street and number.
So if your destination is 1.1.1.1, maybe there is a router known to be responsible for 1.0.0.0/8; it knows which router is responsible for 1.1.0.0/16, and so on, until we get to a router that has 1.1.1.1 on one of its subnets, which then sends it directly to 1.1.1.1.
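The narrowing-down in that example can be sketched as a walk through progressively smaller blocks. The router names and the strict parent/child layout are hypothetical; the real internet is a mesh, not a tidy tree, but the lookup at each step works on the same "which block contains this address?" question.

```python
import ipaddress

# Hypothetical hierarchy: each router knows which child router
# handles each smaller block inside its own block.
HIERARCHY = {
    "router-for-1.0.0.0/8": {"1.1.0.0/16": "router-for-1.1.0.0/16"},
    "router-for-1.1.0.0/16": {"1.1.1.0/24": "router-for-1.1.1.0/24"},
    "router-for-1.1.1.0/24": {},  # 1.1.1.0/24 is directly attached here
}


def walk(start, destination):
    """Hand the packet to ever more specific routers; return the routers visited."""
    addr = ipaddress.ip_address(destination)
    path = [start]
    current = start
    while True:
        children = HIERARCHY[current]
        nxt = next((router for block, router in children.items()
                    if addr in ipaddress.ip_network(block)), None)
        if nxt is None:
            return path  # nothing more specific: deliver on the local subnet
        path.append(nxt)
        current = nxt


print(walk("router-for-1.0.0.0/8", "1.1.1.1"))
```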
IPs and packets are well and good and I do have a decent working knowledge of TCP/IP, but what physically is actually happening? Thanks for replying anyway!
Basically, the entire TL;DR of this post, from a Linux nerd who knows some things about networking:
Everything knows where everything is, and if it doesn't, it knows something else that does, and if that doesn't, well, repeat ad nauseam. The technicality here is that not every individual point knows where every other individual point is, but each knows its immediate neighbors. And those immediate neighbors do as well, at the high routing level (think data center).
Think of it like a tree structure, but a really fucking big one, and with a lot of circular and unusual connection points. You can get from one point, to any other point. It's just a matter of knowing how.
Also, to be pedantically accurate here, the internet is a hodgepodge of packet-flinging hardware; fixed "routes" aren't really a thing. Packets take whatever route the hardware they pass through determines to be optimal, i.e., it changes dynamically as needed. That's one reason your ping always varies.
I didn't see this mentioned yet, but IP ranges are normally assigned by general location, so each of these routers passing packets to the next one (hops) basically keeps a table, learned from prior routes or configured by ISPs, that says "this is the best current upstream router for this destination". They also store the distance between routers and aim for the smallest distance. These are called routing tables, and they're how routers stay fast.
Routing tables can be misconfigured, causing major outages, and older routers could only store a limited number of routes, which is why "512k day" happened. We have already passed the next milestone, 768k, though ISPs mostly had their crap together for that one.
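A toy model of why a growing table is a hardware problem: the fast route memory on a router has a fixed number of slots, and once it's full, new routes simply don't fit. The class, the tiny capacity, and the prefixes below are hypothetical. This sketch just refuses the extra routes; what real affected routers did instead varied by model and configuration.

```python
class FixedSizeFib:
    """Toy forwarding table with a hard size limit, standing in for a
    router's fixed amount of fast route memory."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.routes = {}    # prefix -> next hop
        self.rejected = []  # prefixes that did not fit

    def install(self, prefix, next_hop):
        """Install or update a route; return False if the table is already full."""
        if prefix not in self.routes and len(self.routes) >= self.capacity:
            self.rejected.append(prefix)
            return False
        self.routes[prefix] = next_hop
        return True


# Tiny hypothetical capacity so the effect is visible; real limits were far larger.
fib = FixedSizeFib(capacity=3)
for prefix in ["198.51.100.0/24", "203.0.113.0/24", "192.0.2.0/24", "100.64.0.0/10"]:
    fib.install(prefix, "upstream-1")
print(fib.rejected)
```

Updating a route that is already installed still works when the table is full; only brand-new prefixes are turned away.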