Evolving Paradise for Machine Learning—Revisions to the Copyright Act Further Accelerate Development of Japan’s AI
Generating a trained model requires a large volume of raw data, and of training datasets generated from that raw data. The raw data used (written matter, photographs, still images, etc.) is frequently also copyrighted products.
Under copyright law, copyrighted products generally cannot be used (downloaded, modified, etc.) without the consent of the copyright holder. However, Article 47-7 of Japan’s current Copyright Act contains a provision that is unusual even from a global perspective (discussed in more detail below). It allows copyrighted products to be used to a certain extent without the copyright holder’s consent if the use is for the purpose of developing AI.
Focusing on this point, Professor Tatsuhiro Ueno of Waseda University’s Faculty of Law has characterized Japan as a “paradise for machine learning.” This is an apt description.
Column: Machine Learning Paradise (Tatsuhiro Ueno)
However, the current Article 47-7 also has certain limitations.
The revised Copyright Act is scheduled to come into effect on January 1, 2019. Once it does, Article 47-7 will be repealed and two new provisions, Article 30-4 and Article 47-5, will come into force.
Because these new provisions broaden the permitted actions for AI development, AI development is expected to accelerate even further, creating the possibility of extremely significant business opportunities for AI-related businesses.
This article summarizes the actions permitted under Article 47-7 of the current Copyright Act (hereafter the “current Article 47-7”), the limitations of the current Article 47-7, and the actions that will become permitted under new Article 30-4 and other provisions of the revised Copyright Act. Because these revisions to the Copyright Act were based on a report by the Subdivision on Copyright of the Agency for Cultural Affairs (April 2017), that report is referred to below as the “2017 report”.
Acts Permitted under the current Article 47-7
First, let’s summarize the acts permitted under the current Article 47-7.
Work performed when generating models
The series of steps involved from the collection of raw data to the generation of a model is illustrated in the following diagram.
In concrete terms, the “data collection”, “data processing” and “machine learning, DL” steps in this diagram consist of copying data, processing it (for example, converting its format), and performing machine learning and deep learning on the resulting dataset. Since these actions amount to “reproduction” or “adaptation”, they are in principle a copyright violation unless the consent of the copyright holder is obtained.
Copyright Act Article 47-7—Savior of Japan’s Machine Learning
However, Article 47-7 of the Copyright Act has come to the rescue.
Please keep Article 47-7 of the Copyright Act in mind since it is an extremely important provision for trained model generation. The text of this provision is as follows:
Article 47-7 To the extent that it is considered to be necessary, it is permissible to record a work onto a recording medium or to make an adaptation of a work (including recording a derivative work created by adaptation) if the purpose of doing so is data analysis (meaning the extraction, comparison, classification, or other statistical analysis of language, sound, or image data, or other elements of which a large number of works or a large volume of data is composed; the same applies hereinafter in this Article) by means of a computer; provided, however, that this does not apply with regard to database works compiled for use by persons who carry out data analyses.
Simply put, copyrighted products can be recorded or adapted to the extent necessary without the consent of the copyright holder if it is for the purpose of “data analysis” (albeit with some exceptions). Therefore, if machine learning and deep learning are included in data analysis, copyrighted products can be freely recorded or adapted without the copyright holder’s consent for the purpose of machine learning and deep learning.
As far as I know, “machine learning/deep learning” is included in “data analysis”. In other words, I believe the majority view is that Article 47-7 of the Copyright Act applies to “machine learning/deep learning”. (As I will mention later, this point has been made clearer in the revisions to the Copyright Act.)
On this view, Article 47-7 of the Copyright Act allows even the copyrighted products of others to be used freely, without consent, for the purpose of “machine learning/deep learning”. Moreover, the crucial point of this provision is that it is not limited to “use for non-commercial purposes”. This means that it also applies to trained model generation for commercial purposes (sale or provision for a fee), so even the “recording and adaptation” of copyrighted products for commercial purposes is permitted.
As a side note, although the laws of some foreign countries contain provisions with a similar effect to Article 47-7 of Japan’s Copyright Act, all of them limit such use to development for non-commercial purposes or development by research organizations. Because Article 47-7 of Japan’s Copyright Act also applies to commercial purposes, it can be considered unique from a global perspective.
In short, frankly speaking, Article 47-7 of the Copyright Act is so valuable to machine learning in Japan that if you develop machine learning, you should come to Japan.
Frequently Asked Questions
I often mention Article 47-7 of Japan’s Copyright Act in seminars. Every time I do, the participants invariably express great surprise. Although this is not something I can take credit for, I feel a small sense of satisfaction at these moments. Since I am asked the same questions each time, I have summarized them below. (For your information, the questions and answers below apply equally to the revised Copyright Act.)
1. Although it is legal to use copyrighted products without consent when Article 47-7 of Japan’s Copyright Act applies, does Article 47-7 also apply to learning processes performed on a foreign server?
This involves the issue of applicable law with respect to the use of copyrighted products (i.e., the question of which country’s laws apply to a given use). In copyright law, the law of the “place of use” of the copyrighted product applies. However, determining where the “place of use” is located is a difficult problem, particularly when the use takes place over the web.
One view is that the place of use is the “location of the server”. However, where an individual in country A uses a server located in country B for a learning process, it is difficult to determine whether the copyright laws of country A or those of country B apply.
Nonetheless, it is almost certain that Article 47-7 of Japan’s Copyright Act applies where an individual in Japan uses a server located in Japan to download and label data for a learning process. For this reason, if you develop machine learning, you should (physically) come to Japan.
2. Does Article 47-7 also apply to the use of copyrighted products whose rights are held by foreign right holders (for example, Disney, etc.)?
This is also an issue of applicable law, which is determined by the “place of use” and has nothing to do with the “location of the right holder”.
Accordingly, Article 47-7 of Japan’s Copyright Act applies to learning processes conducted in Japan even when they use copyrighted products whose rights are held by foreign right holders, making such use legal.
For this reason, we can definitely say that if you develop machine learning, you should (physically) come to Japan.
3. In the case of joint research related to AI development undertaken with a foreign university or business operator, which country’s copyright and other laws will apply? (The place where the data is located? Where the process is performed? The place where it can be viewed? Etc.)
As you may understand from the explanation thus far, the laws of the country where the learning process occurs will apply.
Limitations in the current Article 47-7
However, the current Article 47-7 only applies when the same business operator conducts every step in the series, from raw data collection, database creation and preparation of the training dataset all the way through to machine learning and DL.
This is the conclusion drawn from the fact that the permitted actions under Article 47-7 are limited to “record a work onto a recording medium or to make an adaptation of a work” and the fact that Article 47-7 is not included in Article 47-10 (transfer of copies made pursuant to restrictions on the right of reproduction).
Therefore, Article 47-7 does not apply in the cases below, and under the general principle, any use made without the consent of the copyright holder is a copyright violation.
1. An act of preparing a training dataset, not to generate a model yourself but so that others can generate models, and selling it to an unspecified number of third parties or disclosing it on the web
Example: A training dataset for generating image recognition models is created by copying a large volume of image data available on the web or provided to the public by the right holder, and is then sold.
2. An act in which a business operator that created a training dataset and generated a model on its own sells the training dataset used to generate that model to an unspecified large number of third parties, or makes it available on the web at no charge
Example: A business operator that generated an image generation model sells the training dataset used to generate that model together with the model as a set.
3. An act of sharing a training dataset within a consortium consisting of specific business operators
Example: A business operator that generates an automatic translation engine using deep learning shares, within the consortium, a translation corpus generated by collecting a large volume of natural language data from the web.
Further, although examples 1 and 2 may be legally permissible if the action involves “the transfer to a specific third person”, there was no clear view on the extent of what falls under “a specific third person”, and legal precedent has interpreted its scope extremely narrowly.
This issue was pointed out by the Intellectual Property Strategy Headquarters in its “New Information Review Board Journal” (March 2017) and in the 2017 report, and both reports shared the view that a response to this issue was necessary.