AOI image scraping: What you can do

Copyright law in the UK forbids the copying of images without specific permission. Yet creators have so far been unable to challenge the use of their work when it has been scraped and used to train generative AI, because AI developers have not been transparent about exactly which works have been used, and how.

So, what can be done? Although it cannot stop bots from scraping your website, we recommend including the following wording in your website terms and conditions:

Other than strictly as permitted by law you must not carry out on this website or its content any automated data mining, web scraping or other processes for extracting data or images.

This means that any bot that ignores this wording is breaching your terms and conditions.

Practical measures

Two programs developed at the University of Chicago are available:

Glaze – a cloaking tool that can be applied to artwork files to disrupt AI models’ ability to interpret images effectively.

Nightshade – transforms images into “poison” samples, ‘so that models training on them without consent will see their models learn unpredictable behaviors that deviate from expected norms.’

Datasets

Generative AI is trained on data illegally scraped from the internet. LAION-5B is an example of a dataset formed this way, and the Have I Been Trained website lets artists check whether their images are included.

LAION-5B database – a dataset of 5.85 billion CLIP-filtered image-text pairs scraped from the internet. New version released August 2024.

Have I Been Trained – Search 5.8 billion images in the LAION-5B dataset used to train a number of generative AI models. Drag your image in, or search by title text, to see if it is included in the dataset.