I recently came across the website https://www.web.sp.am/, a site that catches internet crawlers in a loop.

I thought about it for a while and wondered: wouldn't it be fun to waste the resources of the internet giants that scrape our data and use it to train their AI models?

I clearly answered that question with a yes: let's waste some resources by doing something similar and catching them in loops.

Over the course of a couple of evenings I hacked together something called Gridlock, which can be found here.

Screenshot of Gridlock

It is essentially just a little HTTP server written in Go that generates a bunch of links with unique subdomains on each page load. In my setup it is fronted by an nginx server configured to handle all subdomains and forward the requests to the Go HTTP server.

There is a live example here.

Hopefully, the crawlers will eventually find it, follow those links, and get trapped in the loop.

The crawlers could become smart enough not to follow these links, but let's see whether that happens.


Before you say "but surely they aren't that stupid": well, in some cases they are. Here is a recent article from 404 Media about OpenAI's crawler getting stuck: https://www.404media.co/openai-training-bot-crawls-worlds-lamest-content-farm-3-million-times-in-one-day/.