8 Points about AI Development Agreements that can be learned from the “Contract Guidance on Utilization of AI and Data”

In the case of AI software, even if the contract stipulates who owns the intellectual property rights in the deliverables and the like, and on what terms they may be used, the effect of the contract may not extend to (a) a derivative model, created by further training the trained model on different data, or (b) a distilled model, created by so-called distillation (the generation of a different trained model using only the input-output data of the original model).
This is an important contractual limitation to be aware of.
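
To make the distillation point concrete, the following is a minimal sketch in Python, not a description of any real system. The `teacher` function, the `distill` helper, and all numerical settings are hypothetical names and values chosen for illustration. The sketch shows that a second model can be fitted using nothing but the inputs sent to the original model and the outputs it returns.

```python
import random

def teacher(x: float) -> float:
    """Hypothetical stand-in for the original trained model.
    The student never looks inside this function; it only calls it."""
    return 3.0 * x + 1.0

def distill(query, n_samples: int = 200, epochs: int = 500,
            lr: float = 0.1, seed: int = 0):
    """Fit a student model y = w * x + b using only input-output pairs
    obtained by calling `query` (assumed settings, for illustration)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
    ys = [query(x) for x in xs]          # the only information taken from the teacher
    w, b = 0.0, 0.0
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w and b.
        grad_w = sum((w * x + b - y) * x for x, y in zip(xs, ys)) / n_samples
        grad_b = sum((w * x + b - y) for x, y in zip(xs, ys)) / n_samples
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

student_w, student_b = distill(teacher)
print(student_w, student_b)  # values close to the teacher's 3.0 and 1.0
```

The student ends up behaving like the teacher even though none of the teacher's code or parameters were copied, which is why contract terms attached to the original deliverable may not reach it.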

“(1) Among the materials, interim deliverables, and deliverables, know which are or are not covered by intellectual property rights.”
“(2) With respect to (1) above, know who has what rights under the default rules (i.e., a legal rule).”

Since these two points are easier to understand when explained together, I will discuss them together.
The objects that need to be examined here (materials, interim deliverables, and deliverables) are the following six.

  1. raw data
  2. training dataset
  3. training program
  4. trained model
  5. trained parameters
  6. know-how

If you want these six objects to be protected under current intellectual property laws, there are three statutes to consider: the Patent Act, the Copyright Act, and the Unfair Competition Prevention Act (trade secrets, etc.). Strictly speaking, the Unfair Competition Prevention Act is not a law that pertains to intellectual property rights. However, when an object falls under the trade secret or limited provision data category, it can be the subject of an injunction or a claim for damages on the ground of an act of wrongful acquisition or another act of unfair competition. For that reason, the Unfair Competition Prevention Act is treated here in the same manner as the current intellectual property laws.
Our goal here, then, is to fill in the blank parts of the chart below.

1. Raw Data

(1) Whether or not it is covered by intellectual property rights
The answer to this question depends on the type of raw data. For example, certain types of data (mechanical operating data, sensor data, and factual data) do not involve intellectual property rights, so they can be protected only if they fall under the trade secret category (Unfair Competition Prevention Act, Article 2, Paragraph 6) or the limited provision data category (Revised Unfair Competition Prevention Act, Article 2, Paragraph 7).
There are no legal default rules for raw data that does not fall under the trade secret or any other category.

(2) Who has what rights under the default rules (i.e., a legal rule)?
Since raw data that does not fall under the trade secret or any other category is not covered by intellectual property rights, no one holds these rights. Therefore, in such a situation, both the user and the vendor have no choice but to stipulate in a contract who can use the raw data and in what manner.

2. Training Dataset

“Training dataset” means the secondary processed data generated by the conversion and preprocessing of raw data to make the learning task easier.
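
As a rough illustration of what “conversion and preprocessing” can involve, here is a minimal Python sketch. The sensor records, the field names, and the `build_training_dataset` function are all hypothetical. The sketch drops incomplete records, rescales the numerical values, and encodes the labels as numbers.

```python
from statistics import mean, pstdev

# Hypothetical raw data: sensor readings, one of them incomplete.
raw_data = [
    {"temperature": 20.0, "vibration": 0.10, "failed": "no"},
    {"temperature": 35.0, "vibration": 0.90, "failed": "yes"},
    {"temperature": None, "vibration": 0.20, "failed": "no"},
    {"temperature": 25.0, "vibration": 0.30, "failed": "no"},
    {"temperature": 40.0, "vibration": 0.80, "failed": "yes"},
]

FEATURES = ["temperature", "vibration"]

def build_training_dataset(records):
    """Turn raw records into (features, label) pairs ready for learning."""
    # Selection: keep only records with every feature present.
    complete = [r for r in records if all(r[c] is not None for c in FEATURES)]
    # Conversion: standardize each feature to zero mean and unit variance.
    stats = {}
    for c in FEATURES:
        values = [r[c] for r in complete]
        stats[c] = (mean(values), pstdev(values))
    dataset = []
    for r in complete:
        features = [(r[c] - stats[c][0]) / stats[c][1] for c in FEATURES]
        label = 1 if r["failed"] == "yes" else 0
        dataset.append((features, label))
    return dataset

training_dataset = build_training_dataset(raw_data)
```

Which records to keep and how to organize them are choices made by whoever does the processing, and these choices are what the discussion of “selection” and “systematic construction” below refers to.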

(1) Whether or not it is covered by intellectual property rights
Since the training dataset is really just a way of presenting information, it is generally considered not to be an “invention” that can be the object of patent rights. However, even though the individual data may not be copyrightable, if the training dataset falls under the “database works” category (Copyright Act, Article 12-2), then it will have copyright protection.
“Database works” means databases that “by reason of the selection or systematic construction of information contained therein, constitute intellectual creations”. In most cases, a training dataset whose raw data has been sifted through and organized into a systematic construction for efficient machine learning or deep learning falls under “database works”.
In addition, the training dataset will also be protected if it falls under the “trade secret” category (Unfair Competition Prevention Act, Article 2, Paragraph 6) or the “limited provision data” category (Revised Unfair Competition Prevention Act, Article 2, Paragraph 7).

(2) Who has what rights under the default rules (i.e., a legal rule)?
If the training dataset falls under the “database works” category, the copyright holder will be the person who creatively “selected the information” or “systematically constructed” it.
Therefore, if the processing uses only the know-how of the vendor, the vendor will be the copyright holder. If the vendor and the user engage in the creative act together, the training dataset will be considered a joint work of the vendor and the user, and both may be joint copyright holders.

3. Training Program

(1) What is a “training program”?
A “training program” is a program that uses a training dataset for learning to generate a trained model.
Although a training program may be developed in various ways, such as using what the vendor already possesses or developing right from the start based on a concrete development plan, in reality, OSS (open source software) is used in most cases.
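
The relationship between a training dataset, a training program, and what the program produces can be sketched as follows. This is a minimal illustration in plain Python under assumed settings, not the method of any particular OSS framework. The `train` function, the toy dataset, and the learning settings are hypothetical.

```python
import math

def train(dataset, epochs: int = 2000, lr: float = 0.5):
    """A tiny training program: takes a training dataset of
    (features, label) pairs and returns learned numerical values."""
    n_features = len(dataset[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for features, label in dataset:
            # Current prediction (logistic regression).
            z = sum(w * x for w, x in zip(weights, features)) + bias
            pred = 1.0 / (1.0 + math.exp(-z))
            # Nudge the values to reduce the prediction error.
            error = pred - label
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return {"weights": weights, "bias": bias}

# Hypothetical training dataset: the label is 1 only when both features are 1.
dataset = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([1.0, 0.0], 0), ([1.0, 1.0], 1)]
params = train(dataset)
```

The code of `train` is written by a person, while the numbers in `params` are computed automatically. This distinction matters in the analysis of the training program, the trained model, and the trained parameters that follows.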

(2) Whether or not it is covered by intellectual property rights
Since a training program constitutes a “program” [under the Patent Act], the analysis about whether or not it is covered by intellectual property rights is exactly the same as that for ordinary programs.
In other words, the algorithm portion, if it satisfies the requirements of the Patent Act, will be protected under the Patent Act as an “invention of a product (a computer program)”, and the source code portion will be protected under the Copyright Act as a “work of computer programming” (Copyright Act, Article 10, Paragraph (1), item (ix)). The same applies even after the source code is converted to object code.
In addition, the training program will also be protected under the Unfair Competition Prevention Act if it falls under the “trade secret” category (Unfair Competition Prevention Act, Article 2, Paragraph 6).

(3) Who has what rights under the default rules (i.e., a legal rule)?
Legally, since patent rights are granted to inventors (the persons who invent) and copyrights are held by creators (the persons who create), the person who invents or creates the program will be granted the patent rights and will hold the copyright for the program.
Therefore, if the vendor develops the training program from scratch, then under the legal default rule, the vendor would be granted patent rights and would also hold the copyright. Further, when using a training program provided as OSS, both the vendor and the user need to pay attention to the contents of the license. The reason is that, depending on the contents of the OSS license, certain obligations may apply, such as an obligation to disclose source code.

4. Trained Model

(1) What is a “trained model”?
The trained model is a deliverable in which parties to the contract are very interested since, like the training dataset, the trained model can be reused.
However, it is necessary in both the contract and negotiations to carefully determine the meaning of the term “trained model”.
More particularly, “trained model” is defined in various ways (such as “functions”, “mathematical model”, “algorithms”, “network structure”, “inference program”, “parameters”, and “any combination of these concepts”), so if the parties use the term with different meanings, this could become the source of great trouble.
Here, following the AI Guidelines, “trained model” means an inference program that includes “trained parameters”.
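
Under this definition, a trained model has two parts that can be told apart even in a very small example. The following Python sketch is only an illustration. The names `inference_program` and `TrainedModel` and the numerical values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

def inference_program(params: dict, features: list) -> int:
    """The inference program: code written by a person.
    It contains no learned values of its own."""
    score = sum(w * x for w, x in zip(params["weights"], features)) + params["bias"]
    return 1 if score > 0 else 0

@dataclass(frozen=True)
class TrainedModel:
    """Trained model = inference program + trained parameters."""
    inference: Callable[[dict, list], int]
    params: dict

    def predict(self, features: list) -> int:
        return self.inference(self.params, features)

# Hypothetical trained parameters, as a training program might output them.
model = TrainedModel(inference_program, {"weights": [2.0, 2.0], "bias": -3.0})
print(model.predict([1.0, 1.0]))  # 1
print(model.predict([1.0, 0.0]))  # 0
```

Keeping the two parts separate in the contract wording, as in this sketch, makes it easier to state which party has what rights in the program and which in the parameters.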

(2) Whether or not it is covered by intellectual property rights
It is fine to think of the inference program part of the trained model in the same way as the training program.
In other words, if it satisfies the requirements of the Patent Act, the algorithm portion will be granted protection under the Patent Act as an “invention of a product (a computer program)”, and the source code will be protected under the Copyright Act as a “work of computer programming”.
For example, let’s assume that you discover a highly innovative, extremely accurate network structure related to a specific development subject. You may be able to submit a patent application for that network structure as an “invention of a product (a computer program)”.
In addition, [the network structure] will be protected under the Unfair Competition Prevention Act if it constitutes a “trade secret” (Unfair Competition Prevention Act, Article 2, Paragraph 6).
Although I will discuss the “trained parameters” of the trained model later, I believe that ultimately the trained parameters will not give rise to any intellectual property rights.

(3) Who has what rights under the default rules (i.e., a legal rule)?
In this case also, similar to the training program, if the vendor develops the “inference program” part from scratch, under the legal default rule, the vendor would be granted patent rights and would also hold the copyright.

5. Trained Parameters

(1) What are trained parameters?
“Trained parameters” are the parameters (coefficients) obtained as a result of learning using the training dataset and the training program.
They are a large volume of numerical values automatically generated by the training program. In the case of deep learning, the major parameters among the trained parameters can be considered to be the ones used for the weighting of each internode link.
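
To give a sense of what “weighting of each internode link” means in terms of data, here is a minimal Python sketch of a very small network with 2 input nodes, 3 intermediate nodes, and 1 output node. The function names are hypothetical, and the values are random placeholders standing in for values that a training program would compute.

```python
import random

def init_parameters(layer_sizes, seed: int = 0):
    """Create one weight per link between adjacent layers and one bias per node.
    Random placeholder values are used here instead of trained ones."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append({"weights": weights, "biases": biases})
    return layers

def count_parameters(layers):
    """Count the link weights and the biases separately."""
    n_weights = sum(len(row) for layer in layers for row in layer["weights"])
    n_biases = sum(len(layer["biases"]) for layer in layers)
    return n_weights, n_biases

params = init_parameters([2, 3, 1])
n_weights, n_biases = count_parameters(params)
print(n_weights, n_biases)  # 9 link weights (2*3 + 3*1) and 4 biases (3 + 1)
```

Even this toy network is nothing more than 13 numbers. Networks used in practice have the same kind of content, only in far larger quantities.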

(2) Whether or not it is covered by intellectual property rights
As I explained earlier, since trained parameters are a large volume of numerical values automatically generated by a training program, and as such involve no creativity, I believe that they constitute neither an “invention” nor a “work”.
However, the trained parameters can be protected if they fall under the trade secret category (Unfair Competition Prevention Act, Article 2, Paragraph 6) or the limited provision data category (Revised Unfair Competition Prevention Act, Article 2, Paragraph 7).

(3) Who has what rights under the default rules (i.e., a legal rule)?
Since trained parameters that do not fall under the trade secret (or other similar) category involve no intellectual property rights, no one holds any rights to them. As such, both the user and the vendor have no choice but to stipulate in a contract who can use the trained parameters and in what manner.