Caching issue with gtfs.org #23

Closed
bdferris-v2 opened this issue Aug 1, 2022 · 12 comments

@bdferris-v2

I've had an issue where gtfs.org seems to be showing stale/old content. The site looks like the following for me:

Screen Shot 2022-08-01 at 3 27 05 PM

Clicking through to links like "Static" shows a version of the GTFS spec that hasn't been updated since 2019. Apparently others have experienced this as well?

After some digging, I'm not 100% sure what's going on, but here is some speculation on what might be happening that might help (or hinder!) any follow-up investigations:

Critically, I see the latest, updated version of the site if I load the page in an Incognito window in Chrome.

When I load https://gtfs.org/ or https://gtfs.org/reference/static in a regular Chrome window with DevTools running, I see that most of the page content is actually coming from a Service Worker running for the site:

Screen Shot 2022-08-01 at 3 31 38 PM

My understanding is that this means that no content is actually being pulled over the network; instead, a Service Worker is intercepting the request and returning content.
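For anyone who wants to check the same thing without screenshots, here is a minimal sketch that lists the service worker registrations and Cache Storage entries for the current origin. The function name `describeServiceWorkers` and its two parameters are hypothetical, introduced only so the logic can be exercised outside a browser; it relies on the standard `getRegistrations()` and `caches.keys()` calls.

```javascript
// Summarize service worker registrations and Cache Storage entries for an origin.
// `container` and `cacheStorage` are parameters (an assumption, for testability);
// in a browser, pass `navigator.serviceWorker` and `caches`.
async function describeServiceWorkers(container, cacheStorage) {
  const registrations = await container.getRegistrations();
  const cacheNames = await cacheStorage.keys();
  return {
    workers: registrations.map((registration) => ({
      scope: registration.scope,
      // `active` is null until a worker has finished activating.
      scriptURL: registration.active ? registration.active.scriptURL : null,
    })),
    cacheNames,
  };
}

// In the DevTools console of the affected tab:
// describeServiceWorkers(navigator.serviceWorker, caches).then(console.log);
```

A non-empty `workers` list in the regular window and an empty one in Incognito would be consistent with the behavior described above.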

Looking a bit closer at the Service Worker:

Screen Shot 2022-08-01 at 3 34 13 PM

sw.js:

/**
 * Welcome to your Workbox-powered service worker!
 *
...

It appears there is a Workbox service worker serving as an application-level cache in front of a Gatsby CMS.

Now, when I compare that to https://gtfs.org/ loaded from an Incognito tab or on a different computer, I don't see a Service Worker or Gatsby-style content. Looks like a different site with a different framework.

Screen Shot 2022-08-01 at 3 45 54 PM

So, not knowing anything about how gtfs.org is run or set up, I'm guessing that at some point in the past it was configured with Workbox + Gatsby, but was then migrated to something else. But Service Workers appear to be pernicious, especially when it comes to caching. See also Service Workers Break the Browser’s Refresh Button by Default; Here’s Why. I /think/ a force-refresh of the page fixes the issue, but I wanted to keep my browser in its current state in case it helps with investigations.

@fredericsimard
Contributor

Hi Brian, the website is hosted on GitHub Pages; the domain is verified by GitHub and all the DNS records are set properly (verified twice). Updates are pushed automatically by GitHub from a specific branch of the repo, so we have little to no control over the hosting side, besides essentially turning it on and off.

The old site was hosted on an Amazon S3 bucket, but nothing should still connect to that anymore.

Force-reloading the page usually works. I have no idea why the old website is still being shown; it has been happening unpredictably since the new site was launched.

This is the gist of it, from my perspective.

@derhuerst

If the start page is not being cached client-side indefinitely, would it be a fix if MobilityData were to serve a script that explicitly clears all service workers?

@fredericsimard
Contributor

> If the start page is not being cached client-side indefinitely, would it be a fix if MobilityData were to serve a script that explicitly clears all service workers?

The issue is that we do not have control over the web server. The pages are written in Markdown, converted to HTML, and deployed by GitHub itself. And the client would need to load the new page in order to see the script and clear its caches. Every trick to force the client to empty its caches requires it to load the new page first in order to receive that command.

@derhuerst

> And the client would need to load the new page in order to see the script and clear its caches. [...]

I missed that the start page is cached by the service worker as well.

@bdferris-v2 Does the service worker also respond with data for pages that you have never visited in this browser session, e.g. ones that don't exist? We could serve a custom script there.

@bdferris-v2
Author

It is a tricky problem, but I see two possible paths:

  1. The Service Worker was originally registered using the URL https://gtfs.org/sw.js. Per the Chrome Service Worker Lifecycle docs, Chrome should be checking for an update to the service worker itself every time there is a page navigation on gtfs.org (assuming I understand the docs correctly). And while I can't see Chrome checking for sw.js in DevTools, I can see the request using chrome://net-export. And indeed, it's coming back as a 404 because the new GitHub-hosted site doesn't have a sw.js file in its root directory. (Apparently, under that scenario, Chrome just keeps using the existing service worker instance?)

Critically, I think this is a potential point of ingress. Specifically, I believe you can upload a sw.js file to the root of your gtfs.org gh-pages branch and GitHub will start serving it (I've seen other projects with sw.js files).

What should be in sw.js? I think we could start with an empty file and see if that clears the existing Service Worker. If that doesn't work, we could try a code snippet like:

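(function () {
  // Runs in a page context: `window` is not defined inside a service worker script.
  if (window.navigator && navigator.serviceWorker) {
    navigator.serviceWorker.getRegistrations()
      .then(function (registrations) {
        // Unregister every service worker registered for this origin.
        for (let registration of registrations) {
          registration.unregister();
        }
      });
  }
})();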

Per https://stackoverflow.com/questions/33704791/how-do-i-uninstall-a-service-worker

  2. There is another potential ingress point that I'm less certain about. The Gatsby CMS + Service Worker is making a request for https://gtfs.org/page-data/app-data.json. It's coming back 404 from GitHub (because the URL no longer exists), which causes the Service Worker to use its previously cached version. We could potentially create that file manually on the GitHub-hosted site? But what should go in it? I'm not sure if it's actually possible to inject a script via a Gatsby app-data.json file, so more research would be needed. But maybe it's an option? Either way, I'd try option 1 first.
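For the first path, if an empty sw.js turned out not to be enough, a commonly used alternative is a "self-destructing" service worker served from the same URL. The following is a minimal sketch under that assumption, not something that has been tried on gtfs.org. The helper name `selfDestruct` is hypothetical, and its two parameters exist only so the logic can be run outside a browser; inside a worker they would be `self.registration` and `caches`.

```javascript
// sw.js: hypothetical "self-destructing" service worker (a sketch, not a tested fix).

// Delete every Cache Storage entry for the origin, then remove this registration.
// Returns the names of the caches that were deleted.
async function selfDestruct(registration, cacheStorage) {
  const names = await cacheStorage.keys();
  await Promise.all(names.map((name) => cacheStorage.delete(name)));
  await registration.unregister();
  return names;
}

// Only wire up lifecycle events when actually running as a service worker.
if (typeof ServiceWorkerGlobalScope !== "undefined" &&
    self instanceof ServiceWorkerGlobalScope) {
  // Take over immediately instead of waiting for old tabs to close.
  self.addEventListener("install", () => self.skipWaiting());

  self.addEventListener("activate", (event) => {
    event.waitUntil(
      selfDestruct(self.registration, caches).then(async () => {
        // Reload open tabs so they fetch the page from the network.
        const windows = await self.clients.matchAll({ type: "window" });
        windows.forEach((client) => client.navigate(client.url));
      })
    );
  });
}
```

Unlike the page-level snippet above, this version uses only APIs available inside a worker, so it can live directly in sw.js and be picked up by the browser's regular update check.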

@fredericsimard
Contributor

@scmcca I think there's a possible fix here, as explained by Brian. Do you think it would be possible to add this file at the root and have MkDocs export it to the gh-pages branch?

@scmcca
Contributor

scmcca commented Aug 2, 2022

@fredericsimard @bdferris-v2

I went ahead and added an empty sw.js at the root. You can see it here (let me know if I put it in the right place!).

Please let me know if the issue has resolved, otherwise we could try putting some code in the file. Thanks!

@bdferris-v2
Author

I think that fixed it! I'm now seeing the updated site, and on inspection in DevTools I see a blank sw.js Service Worker loaded in Chrome.

@scmcca
Contributor

scmcca commented Aug 2, 2022

@bdferris-v2 Incredible! Thank you for the on-the-nose insight.

Should we assume this as a solution for all browsers?

@bdferris-v2
Author

I think so? Mozilla mentions a similar update frequency.

@fredericsimard
Contributor

Well that's awesome. This was beyond my knowledge pool. I'm glad it got resolved. Perhaps we'll see the impacts in Google Analytics :-) Thank you Brian!

scmcca closed this as completed Aug 3, 2022

@derhuerst

A friend just told me about the Clear-Site-Data header with its storage directive, which would also solve this problem.
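As a rough illustration of what that would look like, here is a minimal sketch of a Node-style request handler that sends the header. It assumes a host where custom response headers can be set, which, per the discussion above, is not the case for the current GitHub-hosted setup. The names `clearSiteDataValue` and `handler` are hypothetical. Browsers only honor the header on responses they actually receive from the network over HTTPS, so a page served entirely from the stale worker's cache would not see it.

```javascript
// Build a Clear-Site-Data header value; each directive must be a quoted string.
function clearSiteDataValue(directives) {
  return directives.map((directive) => `"${directive}"`).join(", ");
}

// Hypothetical handler with the (req, res) shape used by Node's http module.
function handler(req, res) {
  // "storage" clears origin storage, including service worker registrations.
  res.setHeader("Clear-Site-Data", clearSiteDataValue(["cache", "storage"]));
  res.setHeader("Content-Type", "text/html; charset=utf-8");
  res.end("<!doctype html><title>Example</title><p>Fresh page</p>");
}

// Usage with Node's built-in server (left commented out so nothing starts listening):
// require("node:http").createServer(handler).listen(8080);
```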
