BREIN takes LLM offline
Last week, BREIN took the Dutch generative AI large language model GEITje-7B offline. The model was trained on the largest Dutch-language training dataset, which BREIN had already taken offline last summer.
BREIN summoned the LLM provider because, among other things, the model was trained on tens of thousands of copies of Dutch-language books from an illegal source. This source is Library Genesis, a service that has been found unlawful by the Dutch courts and is being blocked by Dutch access providers at BREIN’s request. The LLM was also trained on, for example, texts copied from news sites without permission.
We see a worldwide trend that creators of AI models have little or no respect for copyright. Apparently, the thinking is that all the attention, time and money put into copyrighted works by creators and media companies are less important than the AI models. Whether it is music, text, photos or video, the entire Internet is being copied without permission to train generative AI models without compensating the creators and rights holders of the original works. This should stop.
BREIN is not against AI or the training of AI, but it does believe that the authors of all that music, those books, etc. should receive fair compensation for the use of their work. If the original creators do not want their material to be used for training AI, that should also be respected.
The LLM provider argued, among other things, that text and data mining is allowed for scientific purposes and that the model is used by scientists. However, the model was made publicly available for commercial use on Huggingface.co, the community of AI developers. The AI Act requires that scientists have lawful access to material in order to be allowed to use such a source for text and data mining for AI. That is not the case if obviously illegal sources are used in training an AI model.
Dozens of lawsuits are already pending in the United States against providers of AI models. In Europe, the first cases are now also being brought before the courts. Gradually, the realization is dawning that copyright must be respected, and we are seeing the first licensing agreements, for example between OpenAI and the Financial Times and, recently, the preliminary agreement between the large music companies and Claude AI. Ultimately, it is about the tech industry also abiding by the law and respecting copyrights. Makers and producers should be able to earn an honest living and (big) tech should pay for the use of other people’s copyrighted material just like everyone else, according to BREIN managing director Bastiaan van Ramshorst.