Why did S3/object storage succeed while WebDAV apparently failed?
WebDAV has been around a lot longer and does many of the same things as object storage. It also has support for random access read/writes where object storage requires you to download, edit, and re-upload the whole file. Seems like a no-brainer if you wanted to offer cloud storage to customers.
I thought maybe supporting large uploads was the draw, but WebDAV can support chunking, so you don't need to allocate extra server resources to accommodate large files.
I use both daily, and WebDAV just seems like it does everything better: object storage feels like throwing files in a junk drawer and WebDAV more like an organized filing cabinet.
Aside from Nextcloud and a few FOSS applications, the only big thing I recall that adopted WebDAV was FrontPage back in the day.
So, what am I missing? What makes object storage so compelling that it became ubiquitous while WebDAV is practically a legacy spec?
To give you a real answer, from someone who loves WebDAV and has written a WebDAV server with an S3 backend: object storage is easier (or even just possible) to run at scale, and it serves a different purpose.
Object storage is and always has been based on a key-value model. You put a key and value in, and later you can request that key to get that value. It technically has no concept of hierarchy. WebDAV supports so much more than that. WebDAV has collections (hierarchy), live and dead properties (S3 has something similar to these), methods like MOVE and PROPFIND, and a system of hierarchical locking (depth 1 locking on a collection and depth infinity locking on an entire namespace).
This means that in order to build a WebDAV server, you need to know a lot of information about what exists in the data storage. S3 is a lot “dumber” in that regard. The funny thing is S3 has added functionality that essentially rewrites most of WebDAV in a more convoluted form. Whereas on WebDAV you can just PROPFIND a collection with depth 1, on S3 you need to list keys with a prefix and delimiter, then make additional requests for any other props you may need.
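As a rough illustration of what "list keys with a prefix and delimiter" means, here is a minimal sketch over an in-memory dict. It is not S3 client code; `list_prefix` and the sample keys are made up for the example, and it only imitates the way a flat key space gets presented as folders.

```python
def list_prefix(store, prefix="", delimiter="/"):
    """Imitate an S3-style listing over a flat key space (illustrative only).

    Returns (keys, common_prefixes): keys that sit directly under the prefix,
    plus rolled-up prefixes that behave like subdirectories.
    """
    keys, common = [], set()
    for key in sorted(store):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        cut = rest.find(delimiter) if delimiter else -1
        if cut == -1:
            # No delimiter left after the prefix: a "file" at this level.
            keys.append(key)
        else:
            # Everything up to the next delimiter collapses into one entry.
            common.add(prefix + rest[:cut + len(delimiter)])
    return keys, sorted(common)


# Hypothetical bucket contents: just keys, no real directories anywhere.
store = {
    "photos/2023/a.jpg": b"...",
    "photos/2023/b.jpg": b"...",
    "photos/2024/c.jpg": b"...",
    "photos/readme.txt": b"...",
    "notes.txt": b"...",
}

keys, prefixes = list_prefix(store, prefix="photos/")
root_keys, root_prefixes = list_prefix(store)
```

The "folders" only exist as a side effect of how the keys happen to be named, which is why any extra properties need their own requests.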
This is where the “at scale” thing comes in. If you have hundreds of millions of keys in a bucket, getting them all back at once would certainly break your system, and probably would tax the server unnecessarily. So basically the answer is S3 is designed for scale.
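To make the "not all at once" part concrete: S3's list call hands back keys in pages (at most 1,000 per request) along with a continuation token for the next page. The sketch below imitates that over an in-memory dict. The names `list_page` and `iter_keys` are invented here, and using the last key as the token is an assumption for the example; real tokens are opaque.

```python
def list_page(store, page_size=1000, token=None):
    """Return one page of keys plus a token for the next page (or None)."""
    # A real service would walk an index; sorting the dict is just for the sketch.
    keys = sorted(store)
    if token is not None:
        keys = [k for k in keys if k > token]
    page = keys[:page_size]
    truncated = len(keys) > page_size
    # Assumption: the token is simply the last key returned on this page.
    next_token = page[-1] if truncated else None
    return page, next_token


def iter_keys(store, page_size=1000):
    """Walk the whole key space one bounded page at a time."""
    token = None
    while True:
        page, token = list_page(store, page_size, token)
        yield from page
        if token is None:
            break


# 2,500 hypothetical objects take three requests to enumerate.
store = {f"obj-{i:05d}": b"" for i in range(2500)}
first_page, first_token = list_page(store, page_size=1000)
all_keys = list(iter_keys(store, page_size=1000))
```

Each request does a bounded amount of work no matter how large the bucket is, which is the property that matters at scale.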
That being said, S3 is not really designed for humans to interact with. This is where the “different purpose” thing comes in. It doesn’t have a real concept of hierarchy, just common prefixes and delimiters. So something like renaming a directory would require copying every object with that prefix to a new key, then deleting the originals (which is what my S3 adapter does for my WebDAV server). S3 is more meant to be used with something like UUIDs or hashes for keys. Keys that don’t change. WebDAV is designed more like a file system.
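Here is a minimal sketch of that "rename a directory" dance, again over an in-memory dict rather than a real bucket. `rename_prefix` is a hypothetical helper written for this example, not the code from my adapter, and it assumes the old and new prefixes don't overlap.

```python
def rename_prefix(store, old_prefix, new_prefix):
    """Emulate a directory rename on a flat key space: copy, then delete."""
    # Assumption for the sketch: overlapping prefixes are rejected outright.
    if new_prefix.startswith(old_prefix) or old_prefix.startswith(new_prefix):
        raise ValueError("overlapping prefixes are not handled in this sketch")
    moved = [k for k in store if k.startswith(old_prefix)]
    for key in moved:
        # One copy per object; on a real object store this is one request each.
        store[new_prefix + key[len(old_prefix):]] = store[key]
    for key in moved:
        # Then one delete per original key.
        del store[key]
    return len(moved)


store = {
    "docs/a.txt": b"A",
    "docs/sub/b.txt": b"B",
    "other.txt": b"C",
}
moved_count = rename_prefix(store, "docs/", "archive/")
```

The cost grows with the number of objects under the prefix, whereas a WebDAV MOVE on a collection is a single request from the client's point of view.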
I hope that explains it well.
PS: Two minor corrections. WebDAV itself does not support random writes. That’s a separate RFC that’s not part of WebDAV, but is perfectly compatible, and many WebDAV servers offer that functionality. Also, S3 does support random read requests via the Range header.
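For anyone unfamiliar with Range requests, here is a small sketch of which bytes a single `bytes=` range selects. `read_range` is a made-up helper that slices an in-memory value; it handles only one range and skips the error handling a real server needs.

```python
def read_range(data: bytes, range_header: str) -> bytes:
    """Return the bytes selected by a single HTTP-style 'bytes=' range."""
    unit, _, spec = range_header.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        raise ValueError("only a single bytes range is handled in this sketch")
    first, _, last = spec.strip().partition("-")
    if first == "":
        # Suffix form, e.g. "bytes=-5": the last N bytes of the object.
        n = int(last)
        return data[-n:] if n else b""
    start = int(first)
    # Range ends are inclusive; an open end ("bytes=6-") runs to the last byte.
    end = int(last) if last else len(data) - 1
    return data[start:end + 1]


blob = b"hello world"
head = read_range(blob, "bytes=0-4")
tail = read_range(blob, "bytes=6-")
suffix = read_range(blob, "bytes=-5")
```

So reading the middle of a large object doesn't need a full download; it's writing into the middle that plain S3 and plain WebDAV both lack.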
An additional point is that CardDAV and CalDAV are both extensions of the WebDAV spec, and are widely used by a number of products, so WebDAV is definitely not a legacy spec. It’s the foundation of two very popular specs supported by billions of devices.
Wait, so when I want a directory listing from WebDAV and the directory contained 1000 files, I would always have to wait for the whole thing? That explains so much.
Thanks for the detailed reply. That pretty much answers it.
I definitely agree on the different purposes, but sadly that doesn't help where object storage is used where it really doesn't make sense (my org replaced their fileserver with object storage and a client sync app - grr).
WebDAV itself does not support random writes. That’s a separate RFC that’s not part of WebDAV, but is perfectly compatible, and many WebDAV servers offer that functionality
Ah, true. I was looking at SabreDAV specifically which does support it and made a leap that it was part of the spec.
Also, I am definitely going to check out your Nephele Serve project. Thanks for mentioning that.
I don't know much about the history, but I would guess that adoption was driven by the actual service that was provided, not how good the protocol was. AWS did their own thing instead of adopting WebDAV, who knows why. Then people started using S3 and building stuff on it since it was cheap. Now people build services that are S3 conformant so that the stuff built on S3 can be migrated to it.
When S3 was released, the huge draw was its pay-as-you-go model, not its new protocol. If Amazon had used WebDAV instead of making its own protocol, I bet it would still have become popular.
S3 succeeded because of its scaling capabilities and its ability to abstract completely away from a server or disk. The straightforward key/value nature of the S3 API was a big help in achieving that scale and adoption.
Comparing it to WebDAV seems like comparing apples and... an orange smoothie.
Couldn’t say for sure but WebDAV probably would be clunky if fronted by a distributed database. The beauty of S3 is you add more servers, add more disks, and bam you’ve got more S3. That happens most easily when the metadata system sitting in the front can expand easily. I don’t know how easy that would be to plumb up with WebDAV. Whether or not one was better here, S3 ultimately won because it’s a primitive API that was essentially impossible to fuck up.
I'm only cursorily familiar with WebDAV, but I think the needs of cloud storage aligned much better to the object storage model than WebDAV's file/directory structure. For example, in a distributed cloud across continents, referencing a file in WebDAV might have a canonical path, but object storage would just need a key or hash. And by using a key/hash, automatic deduplication is achieved, since the same object should hash to the same key. File paths necessarily imply context, but the point of clouds is to be homogeneous. If paths need to be world-unique but locally-cached, then the path is just a unique identifier and we slowly end up with the database-like semantics of object storage anyway.
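A minimal sketch of the "key is a hash" idea, to show where the deduplication comes from. This is not something S3 does for you; it's what falls out when the application picks the content hash as the key. `ContentStore` is a hypothetical class for the example, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the key is the SHA-256 of the value."""

    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same key, so it is stored only once.
        self.blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self.blobs[key]


cas = ContentStore()
k1 = cas.put(b"same bytes")
k2 = cas.put(b"same bytes")       # a second "upload" of identical content
k3 = cas.put(b"different bytes")
```

Here `k1` and `k2` are the same key and only two blobs are stored; the trade-off is that the key says nothing about what the object is or where it belongs.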
Phrased another way, a file/directory structure imparts an organization to the contents of those files. Cloud doesn't need that organization, so throwing stuff in the junk drawer is perfectly reasonable.
This is funny because most object stores now use keys that represent a path. For example, you can host a website on S3 with folders for js/css/etc and it "just works".
Thanks. So content-based addressing is the draw then? I guess I can see that. Unfortunately, that's one of the things I really dislike about it (and why it feels like throwing files in a junk drawer lol).
Dunno but I remember trying WebDAV back in the day when my webhost offered it as an alternative to FTP and I remember it not working very well for that.