Improved web caching proposal

Many webmasters decide to load JS libraries and fonts from CDNs instead of hosting them locally. The primary benefit of loading popular JS libraries from popular CDNs is that browsers can cache them across websites. Thus users likely have those libraries cached already after loading them once, on whatever site they first encountered them.

There are, however, a number of security and privacy issues to consider. One of the main providers of CDNs for JS libraries is Google. You may have seen or used ajax.googleapis.com and fonts.googleapis.com. Needless to say, Google likely does not do this for fun but for profit. How do they profit? Simple. When a user requests a site, the browser also requests those JS libraries. Requests to the CDN server typically contain, among all the usual data that can be used to identify the user, a referrer that identifies the website that requires the resource. In simple terms: whoever runs the CDN knows who visits which websites. They may not hear of it reliably every time, since the responses are cached after all, but they can still get a pretty good picture. For a nice introduction to browser caching have a look at this article.

Caching works based on URLs: browsers identify JS libraries by their URL and use that to figure out whether they have them cached or not. Because of that, caching does not work if you load the exact same JS library from two different URLs. I think it would be nice if it did, not only to improve caching but also to take away the main argument for using libraries hosted on CDNs.
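
To make the problem concrete, here is a toy model of a URL-keyed cache. It is only a sketch and not how any real browser is implemented; the class name, the example hosts and the fake download function are all made up for illustration.

```python
# Toy model of a URL-keyed cache (illustrative only, not real browser code).
class UrlKeyedCache:
    def __init__(self):
        self._store = {}  # maps URL -> response body

    def fetch(self, url, download):
        """Return (body, was_cache_hit). `download` is called only on a miss."""
        if url in self._store:
            return self._store[url], True
        body = download(url)
        self._store[url] = body
        return body, False


# Hypothetical setup: two hosts serve byte-identical copies of one library.
LIBRARY_BYTES = b"/* pretend this is jquery 3.2.1 */"


def fake_download(url):
    return LIBRARY_BYTES


cache = UrlKeyedCache()
cdn_url = "https://cdn-a.example/jquery/3.2.1/jquery.min.js"
local_url = "https://example.org/js/jquery.min.js"

_, hit_first = cache.fetch(cdn_url, fake_download)
_, hit_same_url = cache.fetch(cdn_url, fake_download)
_, hit_other_url = cache.fetch(local_url, fake_download)

print(hit_first, hit_same_url, hit_other_url)  # False True False
```

The third request is a miss even though the bytes are identical, simply because the URL differs.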

I propose to identify JS libraries by checksums instead of URLs. This would enable caching for locally hosted JS libraries as well as between different CDNs. Some URL pointing to the library would still be required, but it would no longer serve as the identifier.
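
A minimal sketch of what a checksum-keyed cache could look like follows. The choice of SHA-256, the class name and the example URLs are assumptions made for illustration; the proposal itself does not prescribe any of them.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest (the hash function is an assumption)."""
    return hashlib.sha256(data).hexdigest()


class ChecksumKeyedCache:
    def __init__(self):
        self._store = {}  # maps checksum -> response body

    def fetch(self, url, expected_checksum, download):
        """Return (body, was_cache_hit). The URL is only used on a miss."""
        if expected_checksum in self._store:
            return self._store[expected_checksum], True
        body = download(url)
        # Never store content that does not match the checksum the page asked for.
        if sha256_hex(body) != expected_checksum:
            raise ValueError("checksum mismatch for " + url)
        self._store[expected_checksum] = body
        return body, False


# Hypothetical setup: a CDN and a local host serve the same bytes.
LIB = b"/* pretend this is jquery 3.2.1 */"
LIB_SUM = sha256_hex(LIB)
downloads = []  # records every URL that actually hit the network


def recording_download(url):
    downloads.append(url)
    return LIB


checksum_cache = ChecksumKeyedCache()
_, hit_cdn = checksum_cache.fetch(
    "https://cdn-a.example/jquery/3.2.1/jquery.min.js", LIB_SUM, recording_download)
_, hit_local = checksum_cache.fetch(
    "https://example.org/js/jquery.min.js", LIB_SUM, recording_download)

print(hit_cdn, hit_local, len(downloads))  # False True 1
```

The second request uses a different URL but is served from the cache, so only one download happens. Verifying the checksum before storing matters here: otherwise any site could plant arbitrary content under a checksum that other sites rely on.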

There is one obvious side effect: wildcard versions of libraries would no longer be possible. You would not be able to use jquery/3/jquery.min.js; you could only load specific versions, like jquery/3.2.1/jquery.min.js. The reason is simply that you can't checksum something you do not yet know or that does not yet exist. There may be workarounds for this: the server could request the checksum of the latest version from somewhere and inject it into the page.
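
The workaround could look roughly like the sketch below, run on the server when it renders a page. Everything specific in it is an assumption: the in-memory table of available versions stands in for "somewhere", the `data-checksum` attribute is a made-up name, and SHA-256 is just one possible hash.

```python
import hashlib
from html import escape

# Hypothetical stand-in for wherever the server learns about released versions.
AVAILABLE_VERSIONS = {
    "3.1.1": b"/* pretend this is jquery 3.1.1 */",
    "3.2.1": b"/* pretend this is jquery 3.2.1 */",
    "2.2.4": b"/* pretend this is jquery 2.2.4 */",
}


def latest_version(major, available):
    """Resolve a wildcard like "3" to the highest matching full version."""
    candidates = [v for v in available if v.split(".")[0] == major]
    if not candidates:
        raise LookupError("no version found for major " + major)
    # Compare numerically, so that e.g. 3.10.0 would sort above 3.9.0.
    return max(candidates, key=lambda v: tuple(int(p) for p in v.split(".")))


def render_script_tag(major, available):
    """Build a script tag that pins the wildcard to one version and checksum."""
    version = latest_version(major, available)
    checksum = hashlib.sha256(available[version]).hexdigest()
    url = "jquery/{}/jquery.min.js".format(version)
    # `data-checksum` is a made-up attribute name used only for this sketch.
    return '<script src="{}" data-checksum="sha256-{}"></script>'.format(
        escape(url, quote=True), checksum)


resolved = latest_version("3", AVAILABLE_VERSIONS)
tag = render_script_tag("3", AVAILABLE_VERSIONS)
print(resolved)  # 3.2.1
print(tag)
```

The page author still writes the wildcard, but every rendered page names one exact version and its checksum, so the browser always knows what it is supposed to get.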

This is just a vague description of what could be done. I'm not interested in fleshing out the details and I doubt that this will get implemented anyway. I just had this idea lying around for a few months and now it's out.

2017-05-14

murks
