Do you remember last summer, when the war on memes took place? When you simply had to sign that petition to #SaveYourInternet? I would like to take you back to the legal instrument that created so much controversy and focus on two articles that got very little attention yet substantially change the legal situation they govern.
Articles 3 and 4 of the Directive on Copyright in the Digital Single Market (DSM Directive) did not cause a great upheaval in the public sphere for the simple reason that they do not affect your own daily internet experience. Instead, these provisions affect how attractive the European Union is to, among others, certain start-ups, scientific research, private research establishments, and journalism and information institutions.
In order to understand the implications, we must first define what Text and Data Mining (TDM) is. Recital 8 of the DSM Directive explains that TDM “makes the processing of large amounts of information with a view to gaining new knowledge and discovering new trends possible. Text and data mining technologies are prevalent across the digital economy […] However, in the Union, […] organisations and institutions are confronted with legal uncertainty as to the extent to which they can perform text and data mining of content.”
This was the first time a European Union legal instrument on copyright actually defined TDM, even though the technique had been used in government intelligence since the 1980s.
All of that is very well and interesting, but you would be entitled to ask what the link between TDM and copyright is. TDM is used in many different areas, but let us take scientific research as an example.
Research papers are protected by copyright, as they are the tangible expression of an idea and the intellectual creation of their author. A new paper is published every 30 seconds in the global research community, making it practically impossible for a scientist to read all the relevant literature in the field before starting their own research! That is where TDM generally comes into play: when there are large swathes of data that need to be sorted and analysed. This technique has the potential to infringe some of the rights conferred by copyright, because it often requires copying, downloading, using and perhaps even replicating thousands of scientific articles, all acts that may be reserved to the author and that require the author’s authorisation. Asking for consent is simply not a viable option when dealing with such huge numbers of potential copyright owners. If you are further interested in how copyright and TDM interact specifically with the scientific publication community, this video is great.
Coming back to the regulation of TDM in copyright more generally, let us first look at the earlier Directive that governed the situation. The Directive on the harmonisation of certain aspects of copyright and related rights in the information society (InfoSoc Directive) made no explicit mention of TDM, so anyone wanting to use TDM for any purpose had to infer, or simply guess, whether they were acting within the law. Article 2 of the InfoSoc Directive gives the copyright owner the exclusive right to authorise or prohibit the reproduction of their work, whether direct or indirect, temporary or permanent, by any means and in any form, in whole or in part.
However, Article 5(1) of the same Directive contains the only mandatory exception in the whole Directive: a temporary act of reproduction that is transient or incidental, forms an integral and essential part of a technological process, and whose sole purpose is to enable either a transmission in a network (think of your internet service provider having to reproduce the data for you to see this article) or a lawful use of a work. The exception also applies only as long as the reproduction has no independent economic significance. Having to rely on this narrow exception created the legal uncertainty described in Recital 8 of the new DSM Directive for the many industries that would benefit from TDM. A commercial entity such as Google or IBM developing an AI through TDM was, however, almost certain to be infringing the law if it intended to profit from the results afterwards, given the requirement of no independent economic significance.
The new DSM Directive changes things for the scientific research community, as Article 3 now explicitly allows research organisations and cultural heritage institutions to carry out TDM for the purposes of scientific research. The exception covers reproductions and extractions of works to which they have lawful access, which copyright holders cannot prohibit, and it also allows the resulting copies to be stored. This last part is especially important in scientific research, as results must be reproducible and explainable, which is a lot easier if you have the data you used at hand.
Article 4 of the DSM Directive deals with the other industries, which use TDM not for scientific research but mostly for commercial purposes. There are, of course, other uses for TDM too: Smart Disclosure Systems, for example, aim to give consumers better access to the data they need to make informed decisions. Users without a scientific research purpose may therefore use TDM techniques to reproduce and extract works they have lawfully accessed, and may even keep the copies for as long as is necessary for the purposes of TDM. So far this closely resembles Article 3, except in relation to the data retention period.
The biggest difference between the two articles is that, under Article 4, copyright holders can opt out and forbid the use of TDM on their works. An appropriate way for a copyright holder to ‘reserve’ this right would be a robots.txt file (which, very basically, tells automated crawlers which parts of a website they should not access, so that a crawler respecting it leaves your work out of the TDM process) or a technological protection measure. Other ways of reserving the right could include contracts or licences, but apart from the first one mentioned (robots.txt) nothing is certain yet. This leaves TDM users falling within the scope of Article 4 at the mercy of copyright holders. For now, however, the uncertain wording of the article also exposes content owners to TDM users who may claim that the owner’s reservation was not made in an appropriate manner. The Court of Justice of the European Union will ultimately have to deal with this matter and define more precisely what an appropriate manner of reserving the right requires.
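For the technically curious, here is a minimal sketch of what honouring a robots.txt reservation looks like from the side of a TDM crawler. It uses the `urllib.robotparser` module from Python’s standard library; the robots.txt contents, the crawler name `ExampleTDMBot` and the site `publisher.example` are all hypothetical. Note that robots.txt is only an instruction that well-behaved crawlers choose to follow, and whether it counts as an “appropriate manner” of reservation is exactly the legal question discussed above; the code says nothing about that.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a publisher's website: it tells one named
# TDM crawler to stay away from the whole site, and tells every other
# crawler to stay out of the /papers/ section.
ROBOTS_TXT = """\
User-agent: ExampleTDMBot
Disallow: /

User-agent: *
Disallow: /papers/
"""


def may_mine(url: str, user_agent: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt does not disallow user_agent from url."""
    parser = RobotFileParser()
    # Parse the rules from a string instead of downloading them.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleTDMBot", "OtherBot"):
        for url in ("https://publisher.example/papers/1234",
                    "https://publisher.example/about"):
            print(agent, url, may_mine(url, agent))
```

Run as a script, `ExampleTDMBot` is refused everywhere, while `OtherBot` is refused only under `/papers/`. A compliant TDM tool would simply skip the refused pages, which is how the work ends up excluded from the mining process.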