When you want to prevent AI from stealing your content

AI bots scrape the web, looking for content to learn from or (depending on how you see it) to plagiarize. You can protect yourself by blocking some of those bots.

The companies behind ChatGPT, Midjourney and other AI platforms scrape the web for content, which is then used to train their systems to generate similar content. For example, if you’re a horticulture blogger, everything about your website is fair game to these scrapers: your technical knowledge, the kinds of subjects you write about, how your website is organized, and even the design of your branding.

Unpaid, nonconsensual participation in the education of AI makes me deeply uneasy. Is it theft? Or just inspiration? I’d rather not grapple with these big questions, and would just prefer to recuse myself entirely.

If you’re suitably unsettled, I’ll tell you how to turn off scraper bots in Squarespace, Wix and WordPress. These settings aren’t guaranteed to work, since each bot has to choose to honor them. (That said, in most cases, the big systems honor these rules.)

Preventing AI theft in Squarespace

Preventing AI scrapers in Squarespace is (annoyingly) the easiest of them all. Just go to Settings > Crawlers and you’ll see a slider for preventing AI bots. Turn it off!

I’m not sure *which* of the AI bots Squarespace includes in its list, but hopefully the list is updated regularly. As of this writing, you cannot manually edit your robots.txt file in Squarespace.

Preventing AI theft in WordPress

Users have significantly more control within WordPress, but it means it can be a bit of a maze to navigate. Many guides online suggest adding a WordPress plugin to fix the problem and prevent scrapers. You can do this *without* a plugin, so of course I would say you must.

One way of preventing AI scraper bots is to edit your robots.txt file. What’s that? It’s a file in your website’s main folder which tells search engines and other crawlers how to interact with your website. One example: when “discourage search engines” is checked in your WordPress settings, it adds a line to your robots.txt file to tell Google and Bing to cool it.

Unfortunately, robots.txt has no single switch for AI scrapers. A blanket rule would turn away search engines too, so you have to name each bot you want to block. Here’s an example of some lines you might add to your robots.txt file, from Neil Clarke:

And here are a few other bots other sites have blocked for similar reasons:

Ultimately, you’ll want to look into these bots and make your own decision. Besides needing the bots to adhere to your request, another problem with this method is that it requires you to keep tabs on which user-agents to block as new AI scrapers get released. Other, more effective methods are in the works, including legal declarations of ownership and consent embedded in <meta> tags or an ai.txt file that also lives in your website’s main folder.
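Since the list of user-agents keeps changing, it can help to keep the names in one place and generate the robots.txt lines from them. Here is a minimal Python sketch of that idea. The function name `build_block_rules` and the three bot names (GPTBot, CCBot, Google-Extended) are my own illustrative picks, not the list from Neil Clarke or anyone else, so swap in whichever bots you decide to block.

```python
def build_block_rules(user_agents):
    """Return robots.txt text that disallows the whole site for each named bot."""
    groups = []
    for agent in user_agents:
        # One group per bot: name it, then disallow every path on the site.
        groups.append(f"User-agent: {agent}\nDisallow: /")
    # A blank line between groups keeps the file easy to read.
    return "\n\n".join(groups) + "\n"


# Illustrative names only (an assumption, not a complete or current list).
AI_USER_AGENTS = ["GPTBot", "CCBot", "Google-Extended"]

if __name__ == "__main__":
    print(build_block_rules(AI_USER_AGENTS))
```

Running it prints one `User-agent` / `Disallow: /` pair per bot, which you can paste into your robots.txt wherever your platform lets you edit it. When a new scraper shows up, you add one name to the list and regenerate.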

The other problem is that WordPress and SEO plugins often create their own dynamic robots.txt file. So while you could edit or upload a robots.txt file in your public HTML directory (most hosts have a text editor built into their system), a physical file takes the place of the generated one, so it’s better to add your rules through your WordPress theme’s functions.php file. (Unless, of course, you have Yoast installed, in which case you can use Yoast’s file editor.)

If you want to go the WordPress developer route, plop this code from 10Web into your functions.php file (visible under Appearance > Theme Editor, or in your host’s file editor) and input the bots you want to block using the syntax from earlier.

As ever, proceed at your own risk!

Preventing AI theft in Wix

In Wix, the path is similar to WordPress. You’ll want to edit the robots.txt file with the information above. Wix’s help documentation includes a guide for editing your robots.txt file.
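Whichever platform you use, it’s worth checking that the rules you ended up with say what you meant. The sketch below uses Python’s built-in `urllib.robotparser` to read some robots.txt text and report which bots it turns away. The function name `blocked_agents`, the sample file and the bot names are assumptions for illustration. It only tells you what the file *asks* for, not whether a given bot will actually comply.

```python
from urllib.robotparser import RobotFileParser


def blocked_agents(robots_txt, user_agents, url="/"):
    """Map each user-agent to True if the rules disallow it from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: not parser.can_fetch(agent, url) for agent in user_agents}


# Hypothetical robots.txt contents: one AI bot blocked, everyone else
# only kept out of the admin area.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /wp-admin/
"""

if __name__ == "__main__":
    for agent, blocked in blocked_agents(SAMPLE, ["GPTBot", "CCBot"]).items():
        print(f"{agent}: {'blocked' if blocked else 'still allowed'}")
```

With this sample, GPTBot comes back as blocked and CCBot as still allowed, because nothing in the file names CCBot. To check your real site, paste the contents of your own robots.txt (whatever your domain serves at /robots.txt) in place of `SAMPLE`.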

