How do HTTP servers figure out Content-Length?

94 points | 35 comments | 7 hours ago
simonjgreen

Along this theme of knowledge, there is the lost art of tuning your page and content sizes so that they fit in as few packets as possible, to speed up transmission. The front page of Google, for example, famously fit in a single packet (I don't know if that's still the case). There is a brilliant book from the Yahoo Exceptional Performance Team that used to be a bit of a bible in the world of web sysadmin; it's less relevant these days, but interesting for understanding the era.

https://www.oreilly.com/library/view/high-performance-web/97...

hobofan

I think the article should be called "How do Go standard library HTTP servers figure out Content-Length?".

In most HTTP server implementations I've worked with in other languages, I recall having to either:

- explicitly define the Content-Length up-front (clients then usually don't like it if you send too little and servers don't like it if you send too much)

- have a single "write" operation with an object where the Content-Length can be figured out quite easily

- turn on chunking myself and handle the chunk writing myself

I don't recall having seen the kind of automatic chunking described in the article before (and I'm not too sure whether I'm a fan of it).
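For what it's worth, here's a minimal sketch of the Go behaviour the article describes, assuming a stock net/http server: if the handler finishes while its output still fits in the server's small internal write buffer, Go fills in Content-Length itself; as soon as the handler writes past that buffer or calls Flush, it switches to chunked encoding.

    package main

    import (
        "fmt"
        "log"
        "net/http"
        "strings"
    )

    func main() {
        // Small response: the handler returns before the server's internal
        // buffer fills up, so net/http computes Content-Length automatically.
        http.HandleFunc("/small", func(w http.ResponseWriter, r *http.Request) {
            fmt.Fprint(w, "hello") // goes out with "Content-Length: 5"
        })

        // Streamed response: once the handler flushes (or writes more than
        // the buffer holds), the final size is unknowable up front, so the
        // server falls back to "Transfer-Encoding: chunked".
        http.HandleFunc("/stream", func(w http.ResponseWriter, r *http.Request) {
            flusher, _ := w.(http.Flusher)
            for i := 0; i < 10; i++ {
                fmt.Fprint(w, strings.Repeat("x", 1024))
                if flusher != nil {
                    flusher.Flush() // forces chunked encoding
                }
            }
        })

        log.Fatal(http.ListenAndServe(":8080", nil))
    }

Hitting /small and /stream with curl -v shows the two framings side by side.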

pkulak

And if you set your own Content-Length header, most HTTP servers will respect it and not chunk. That way, you can stream a 4-gig file whose size you know from the metadata. This makes downloading nicer, because browsers and such will then show a progress bar and a time estimate.

However, you had better be right! I just found a bug in some really old code that was gzipping every response when appropriate (i.e., asked for, textual, etc.). But it was ignoring the Content-Length header! So if it was set manually, it would be wrong after compression. That caused insidious bugs for years. The fix, obviously, was to just delete that manual header if the stream was going to be compressed.
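A rough sketch of that fix in Go, with made-up names, assuming the compression is done by a wrapping middleware: whatever recompresses the body has to drop the manually set Content-Length before the headers go out, because that value describes the uncompressed bytes.

    package main

    import (
        "compress/gzip"
        "net/http"
        "strings"
    )

    // gzipResponseWriter is a hypothetical wrapper illustrating the fix:
    // drop any manually set Content-Length before the headers are sent,
    // since it describes the *uncompressed* body.
    type gzipResponseWriter struct {
        http.ResponseWriter
        gz          *gzip.Writer
        wroteHeader bool
    }

    func (g *gzipResponseWriter) WriteHeader(status int) {
        g.wroteHeader = true
        g.Header().Del("Content-Length") // the old bug: this line was missing
        g.Header().Set("Content-Encoding", "gzip")
        g.ResponseWriter.WriteHeader(status)
    }

    func (g *gzipResponseWriter) Write(b []byte) (int, error) {
        if !g.wroteHeader {
            g.WriteHeader(http.StatusOK)
        }
        return g.gz.Write(b) // body bytes go through the compressor
    }

    // gzipHandler compresses responses when the client advertises gzip support.
    func gzipHandler(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            if !strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") {
                next.ServeHTTP(w, r)
                return
            }
            gz := gzip.NewWriter(w)
            defer gz.Close()
            next.ServeHTTP(&gzipResponseWriter{ResponseWriter: w, gz: gz}, r)
        })
    }

With the header gone, the compressed stream just goes out chunked, which is the safe default.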

jaffathecake

The results might be totally different now, but back in 2014 I looked at how browsers behave if the resource length differs from the Content-Length: https://github.com/w3c/ServiceWorker/issues/362#issuecomment...

Also, in 2018, some fun where, when downloading a file, browsers report bytes written to disk vs the Content-Length, which is wildly out once you factor in gzip: https://x.com/jaffathecake/status/996720156905820160

skrebbel

I thought I knew basic HTTP 1(.1), but I didn't know about trailers! Nice one, thanks.

flohofwoe

Unfortunately the article doesn't mention compression, which is where it gets really ugly (especially with range requests). IIRC the Content-Length reported in HTTP responses and the ranges defined in range requests apply to the compressed data, but at least in browsers you only get the uncompressed data back and don't even have access to the compressed data.
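Go's standard HTTP client happens to behave much like the browsers described here, so as a rough illustration (the URL is a placeholder, and this assumes the server actually gzips the response): when the transport transparently decompresses, it strips Content-Length and Content-Encoding, and all you ever see is the decoded body.

    package main

    import (
        "fmt"
        "io"
        "net/http"
    )

    func main() {
        // Because we don't set Accept-Encoding ourselves, Go's transport adds
        // "Accept-Encoding: gzip" and transparently decompresses the body.
        resp, err := http.Get("https://example.com/") // placeholder URL
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()

        body, _ := io.ReadAll(resp.Body)

        // If the server gzipped the response: ContentLength is -1, the
        // Content-Length and Content-Encoding headers are removed, and
        // Uncompressed is true -- the compressed bytes are out of reach.
        fmt.Println("ContentLength:", resp.ContentLength)
        fmt.Println("Uncompressed: ", resp.Uncompressed)
        fmt.Println("Decoded bytes:", len(body))
    }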

lloeki

Chunked transfer is fun; not many know it supports more than just sending the chunk size: it can synchronously multiplex extra information!

e.g. I drafted this a long time ago: if you generate something live and send it in a streaming fashion, you can't have progress reporting, since you don't know the final size in bytes, even though server-side you know how far into the generation you are.

This was used for multiple things, like generating CSV exports from a bunch of RDBMS records, or compressed tarballs from a set of files, or a bunch of other silly things like generating sequences (Fibonacci, random integers, whatever...) that could take "a while" (as in, long enough that it's friendly to report progress).

https://github.com/lloeki/http-chunked-progress/blob/master/...
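For anyone curious how that works on the wire: each chunk may carry an extension after its size, which is the hook such a scheme can use. Here's a rough Go sketch that hijacks the connection, because net/http writes its own chunk framing (the progress=i/total extension name is purely illustrative, not necessarily what the draft above specifies):

    package main

    import (
        "fmt"
        "log"
        "net/http"
        "time"
    )

    func progressHandler(w http.ResponseWriter, r *http.Request) {
        // net/http normally writes the chunk framing itself, so take over
        // the raw connection to control the exact bytes on the wire.
        hj, ok := w.(http.Hijacker)
        if !ok {
            http.Error(w, "hijacking unsupported", http.StatusInternalServerError)
            return
        }
        conn, buf, err := hj.Hijack()
        if err != nil {
            return
        }
        defer conn.Close()

        buf.WriteString("HTTP/1.1 200 OK\r\n" +
            "Content-Type: text/plain\r\n" +
            "Transfer-Encoding: chunked\r\n\r\n")

        const total = 5
        for i := 1; i <= total; i++ {
            payload := fmt.Sprintf("part %d\n", i)
            // A chunk is "<size in hex>[;extension]\r\n<data>\r\n"; the
            // extension slot carries the progress information.
            fmt.Fprintf(buf, "%x;progress=%d/%d\r\n%s\r\n", len(payload), i, total, payload)
            buf.Flush()
            time.Sleep(200 * time.Millisecond) // pretend generation takes a while
        }
        buf.WriteString("0\r\n\r\n") // final zero-length chunk ends the body
        buf.Flush()
    }

    func main() {
        http.HandleFunc("/export", progressHandler)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }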

aragilar

Note that there can be trailer fields (the phrase "trailing header" is both an oxymoron and a good description of them): https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Tr...
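Since the article is about Go's server, here is a quick sketch of how trailers look there (the X-Body-Sha256 name is made up): declare the names in the Trailer header before writing the body, set the values afterwards, and they are sent after the final chunk.

    package main

    import (
        "crypto/sha256"
        "encoding/hex"
        "log"
        "net/http"
    )

    func main() {
        http.HandleFunc("/report", func(w http.ResponseWriter, r *http.Request) {
            // Trailer names must be announced before the body is written...
            w.Header().Set("Trailer", "X-Body-Sha256")

            body := []byte("some body that is only final once fully generated\n")
            sum := sha256.Sum256(body)
            w.Write(body)

            // ...and their values are set after the body; they go out after
            // the terminating zero-length chunk. Trailers ride on chunked
            // encoding (a fixed Content-Length leaves no room for them).
            w.Header().Set("X-Body-Sha256", hex.EncodeToString(sum[:]))
        })
        log.Fatal(http.ListenAndServe(":8080", nil))
    }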

remon

Totally worth an article.

TZubiri

len(response)

Am4TIfIsER0ppos

stat()?

_ache_

> Anyone who has implemented a simple HTTP server can tell you that it is a really simple protocol

It's not. Like, hell no. It is so complex: multiplexing, underlying TCP specifics, Server Push, stream prioritization (vs. priorization!), encryption (ALPN or NPN?), extensions like HSTS, CORS, WebDAV or HLS, ...

It's a great protocol, nowhere near simple.

> Basically, it’s a text file that has some specific rules to make parsing it easier.

Nope. Since HTTP/2, that is just a textual representation, not the real on-the-wire protocol. HTTP/2 is 10 years old now.

pknerd

Why would someone implement chunking logic when WebSockets exist? Am I missing something? What are the use cases?
