New ask Hacker News story: Tell HN: Server error (5xx) in Google Search Console may not be 5xx at all

3 by santah | 0 comments on Hacker News.
I run https://ift.tt/HqpY9z4 and a few months ago I noticed that pages were being reported as not indexed because of “Server error (5xx)” in Google Search Console (GSC): https://ift.tt/N85WDTH

On the website everything looked good and all the reported links worked fine. I tried validating the errors as fixed in GSC, but it would always report back that the issue was still present, and new links kept showing up with 5xx errors (as seen in the screenshot). This was worrying because it suggested there was some kind of issue I wasn’t aware of that might be affecting not only Google’s crawlers, but my users as well.

I did what everyone would do: checked my server, Cloudflare and analytics logs for anything suspicious, and added some extra logging to try to catch what was happening. This turned up nothing. As far as I could tell, no requests returned any 5xx errors, so I decided it was just a weird Google quirk and ignored it for a while.

Over time, though, Google kept reporting these problems and the count of 5xx URLs only grew, so about 2 weeks ago I started investigating again. This time I tried to match the URLs reported by GSC against the analytics provided by Cloudflare, and bingo: all of these requests had an Edge Status Code (and Origin Status Code) of “429 Too Many Requests”.

Now that was progress. There is only one thing on my service that returns this status code: my custom rate limiting, which is triggered if you make more than 30 requests in less than 10 seconds. What changed to make Google suddenly crawl so aggressively that it hit that limit (something that had never happened before, and Next Episode has been online for more than 19 years now!), and why it reports these as 5xx in GSC when my server clearly returns 429, I don’t know. What I do know for sure is that Google is misreporting the 429 status as 5xx.
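For anyone wondering what a limit like "more than 30 requests in less than 10 seconds" looks like in practice, here is a minimal sliding-window sketch. It is not the actual code running on the site; the class name `SlidingWindowLimiter`, the in-memory storage and the `client_id` key are all assumptions made for illustration.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical per-client limiter: allow max_requests per window."""

    def __init__(self, max_requests=30, window_seconds=10.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self._hits = defaultdict(deque)  # client_id -> recent request times

    def status_for(self, client_id):
        """Return the HTTP status to send: 200 if allowed, 429 if over the limit."""
        now = self.clock()
        hits = self._hits[client_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return 429  # Too Many Requests
        hits.append(now)
        return 200
```

With the defaults, the 31st request from the same client inside 10 seconds gets a 429, which is the status that showed up in the Cloudflare analytics for the URLs GSC flagged.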
To fix this (at least as a quick fix for now), I whitelisted all of Google’s crawler IPs in my rate limiter. I found them through here: https://ift.tt/qC9uxG1 and they are listed in this JSON provided by Google: https://ift.tt/xWn13sh Just in case, I also passed the ASN along in a request header (through a Cloudflare transform rule) and whitelisted the whole Google ASN (15169) as well. Since then I have been monitoring for new 5xx errors in GSC and new 429 statuses logged in Cloudflare from Google’s ASN, and so far (more than 2 weeks) so good.
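The whitelist check itself can be sketched roughly like this. It is only an illustration under a few assumptions: that the JSON has a `prefixes` list whose entries carry an `ipv4Prefix` or `ipv6Prefix` key, that the ASN arrives as an integer already parsed from the request header, and the names `load_networks` and `is_exempt` are made up. The sample ranges are documentation placeholders, not real Google ranges.

```python
import ipaddress

GOOGLE_ASN = 15169  # the ASN mentioned in the post

# Placeholder data in the assumed shape of the published JSON.
SAMPLE_RANGES = {
    "prefixes": [
        {"ipv4Prefix": "192.0.2.0/24"},
        {"ipv6Prefix": "2001:db8::/32"},
    ]
}


def load_networks(ranges):
    """Turn the JSON prefix entries into ip_network objects."""
    networks = []
    for entry in ranges.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            networks.append(ipaddress.ip_network(cidr))
    return networks


def is_exempt(client_ip, asn, networks):
    """True if the request should skip rate limiting."""
    if asn == GOOGLE_ASN:
        return True
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # unparseable address: never exempt
    # Membership is False when the IP version differs from the network's.
    return any(addr in net for net in networks)
```

A request would then only go through the rate limiter when `is_exempt(...)` returns False. Exempting a whole ASN is broader than the crawler list, since it covers any traffic originating from that network, so it is a trade-off rather than a strict requirement.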
