8 Points about AI Development Agreements that can be learned from the “Contract Guidance on Utilization of AI and Data”
In the case of AI software, even if the contract stipulates who owns the intellectual property rights in the
deliverables and sets out the terms of use, the effect of the contract may not extend to a derivative model
(a model produced by further training the trained model on different data) or to a distilled model (a
different trained model generated by so-called distillation, that is, by using only the input-output data of
the original model).
This is an important contractual limitation to be aware of.
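To make the idea of distillation concrete, the following is a minimal sketch in Python. It is only an illustration under stated assumptions: the "teacher" is a hypothetical stand-in for the original trained model, reduced to a single linear function, and the names teacher_predict and distill are invented for this example. The point it shows is that the new model is built from input-output pairs alone, without ever reading the original model's parameters.

```python
import random


def teacher_predict(x: float) -> float:
    """Hypothetical stand-in for the original trained model (a black box)."""
    hidden_weight, hidden_bias = 3.0, 2.0  # never read by the student
    return hidden_weight * x + hidden_bias


def distill(num_queries: int = 50, seed: int = 0) -> tuple[float, float]:
    """Fit a new (student) model using only input-output pairs."""
    rng = random.Random(seed)
    inputs = [rng.uniform(-10.0, 10.0) for _ in range(num_queries)]
    outputs = [teacher_predict(x) for x in inputs]  # query the black box

    # Ordinary least squares for y = w * x + b, computed by hand.
    mean_x = sum(inputs) / num_queries
    mean_y = sum(outputs) / num_queries
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, outputs))
    var_x = sum((x - mean_x) ** 2 for x in inputs)
    weight = cov_xy / var_x
    bias = mean_y - weight * mean_x
    return weight, bias


student_weight, student_bias = distill()
print(round(student_weight, 6), round(student_bias, 6))
```

In this toy case the student ends up behaving like the teacher even though it is, technically, a separately generated model. That is why a contract clause written only for "the trained model" may fail to reach it.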
(1) Among the materials, interim deliverables, and deliverables, know which are or are not covered by intellectual property rights.
(2) With respect to (1) above, know who has what rights under the default rules (i.e., a legal rule).
Since these two points are easier to understand when they are explained together, I will discuss them
together.
The objects that need to be examined here (materials, interim deliverables, and deliverables) are the
following six.
- raw data
- training dataset
- training program
- trained model
- trained parameters
- know-how
If you want these six objects to be protected under current intellectual property laws, you should consider
three statutes: the Patent Act, the Copyright Act, and the Unfair Competition Prevention Act (trade secrets,
etc.). Strictly speaking, the Unfair Competition Prevention Act is not a law that pertains to intellectual
property rights. However, when an object falls under the trade secret or limited provision data category,
[such an object] can be the subject of an injunction or a claim for damages against wrongful acquisition or
other acts of unfair competition. For that reason, the Unfair Competition Prevention Act is treated here in
the same manner [as one of the current intellectual property laws].
Our goal here, then, is to fill in the blank parts of the chart below.
1. Raw Data
(1) Whether or not it is covered by intellectual property rights
The answer to this question depends on the type of raw data. For example, certain types of data (mechanical
operating data, sensor data, and factual data) do not involve intellectual property rights, so they can only
be protected if they fall under the trade secret category (Unfair Competition Prevention Act, Article 2,
paragraph 6) or the limited provision data category (Revised Unfair Competition Prevention Act, Article 2,
paragraph 7).
There are no legal default rules for raw data that does not fall under the trade secret or any other category.
(2) Who has what rights under the default rules (i.e., a legal rule)?
Since raw data that does not fall under the trade secret or any other category is not covered by intellectual
property rights, no one holds these rights. Therefore, in such a situation, both the user and the
vendor have no choice but to stipulate in a contract who can use the raw data and in what manner.
2. Training Dataset
“Training dataset” means the secondary processed data generated by the conversion and preprocessing of raw data to make the learning task easier.
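As a rough illustration of what "conversion and preprocessing" can look like in practice, here is a minimal Python sketch. Everything in it is assumed for the example: the raw sensor log, the labels, and the function name build_training_dataset are hypothetical. It shows rows being selected, a feature being standardized, and a label being attached.

```python
from statistics import mean, pstdev

# Hypothetical raw sensor log: (temperature reading, operator note).
raw_data = [
    (20.5, "normal"),
    (None, "sensor dropout"),
    (21.0, "normal"),
    (35.2, "overheat"),
    (19.8, "normal"),
]


def build_training_dataset(rows):
    """Convert raw rows into (standardized_feature, label) pairs."""
    # Selection: drop rows with missing readings.
    usable = [(temp, note) for temp, note in rows if temp is not None]
    temps = [temp for temp, _ in usable]
    center, spread = mean(temps), pstdev(temps)
    # Preprocessing: standardize the feature and attach a 0/1 label.
    return [
        ((temp - center) / spread, 1 if note == "overheat" else 0)
        for temp, note in usable
    ]


training_dataset = build_training_dataset(raw_data)
```

Which rows to keep, how to scale them, and how to label them are choices made by whoever prepares the data, so the result is a different object from the raw data it came from.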
(1) Whether or not it is covered by intellectual property rights
Since the training dataset is really just a way of presenting information, it is generally considered as not
being an “invention” that can be the object of patent rights.
However, even though individual data may not be copyrightable, if the training dataset falls under the
“database works” category (Copyright Act, Article 12-2), then it will have copyright protection.
“Database works” are databases that, “by reason of the selection or systematic construction of information
contained therein, constitute intellectual creations”. In most cases, a training dataset whose raw data has
been sifted and organized into a systematic construction for efficient machine learning or deep learning
falls under “database works”.
In addition, [the training dataset] will also be protected if it falls under the “trade secret” category
(Unfair Competition Prevention Act, Article 2, Paragraph 6) or the “limited provision data” category (Revised
Unfair Competition Prevention Act, Article 2, Paragraph 7).
(2) Who has what rights under the default rules (i.e., a legal rule)?
If the training dataset falls under the “database works” category, the copyright holder will be the person who
creatively “selected the information” or “systematically constructed” [such information].
Therefore, if the processing relies only on the know-how of the vendor, the vendor will be the copyright
holder. If the vendor and the user engage in the creative act together, [the training dataset] will be
considered a joint work of the vendor and the user, who may both be joint copyright holders.
3. Training Program
(1) What is a “training program”?
A “training program” is a program that uses a training dataset for learning to generate a trained
model.
A training program may be obtained in various ways, such as reusing one the vendor already possesses or
developing one from scratch based on a concrete development plan. In reality, however, OSS (open source
software) is used in most cases.
(2) Whether or not it is covered by intellectual property rights
Since a training program constitutes a “program” [under the Patent Act], the analysis about whether or not it
is covered by intellectual property rights is exactly the same as that for ordinary programs.
In other words, the algorithm portion, if it satisfies the requirements of the Patent Act, will be protected
under the Patent Act as an “invention of a product (a computer program)”, and the source code portion will be
protected under the Copyright Act as a “work of computer programming” (Copyright Act, Article 10, Paragraph
(1), item (ix)). The foregoing still applies even if the source code is converted to object code.
In addition, the training program will also be protected under the Unfair Competition Prevention Act if it
falls under the “trade secret” category (Unfair Competition Prevention Act, Article 2, Paragraph 6).
(3) Who has what rights under the default rules (i.e., a legal rule)?
Legally, since patent rights are granted to inventors (the persons who invent) and copyrights are held by
authors (the persons who create), the person who invents or creates such a program will be granted the patent
rights and will hold the copyright [for such program].
Therefore, if the vendor develops the training program from scratch, then under the legal default rule, the
vendor would be granted patent rights and would also hold the copyright.
Further, when using a training program provided as OSS, both the vendor and the user need to pay attention
to the contents of the license. The reason for this is that, depending on the contents of the OSS license,
there may be certain obligations such as the obligation to disclose source code.
4. Trained Model
(1) What is a “trained model”?
The trained model is a deliverable in which parties to the contract are very interested since, like the
training dataset, the trained model can be reused.
However, both in the contract and in negotiations, the meaning of the term “trained model” must be
determined carefully.
More specifically, “trained model” is defined in various ways (such as “functions”, “mathematical model”,
“algorithms”, “network structure”, “inference program”, “parameters”, and “any combination of these
concepts”), so if the parties use [“trained model”] with different meanings, this could become the source of
great trouble.
Here, similar to the AI Guidelines, “trained model” means an inference program that includes “trained
parameters”.
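The distinction between the inference program and the trained parameters can be sketched in Python as follows. This is only an illustrative structure, not how any particular framework organizes a model; the class and function names (TrainedParameters, inference_program, TrainedModel) and the two-value parameter set are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainedParameters:
    """Numeric values produced by training (hypothetical two-value example)."""
    weight: float
    bias: float


def inference_program(params: TrainedParameters, x: float) -> float:
    """Code part: written by a developer, the same whatever the values are."""
    return params.weight * x + params.bias


@dataclass(frozen=True)
class TrainedModel:
    """'Trained model' in the sense used here: inference program plus parameters."""
    params: TrainedParameters

    def predict(self, x: float) -> float:
        return inference_program(self.params, x)


model = TrainedModel(TrainedParameters(weight=2.0, bias=1.0))
prediction = model.predict(3.0)
```

Separating the two parts in this way makes it easier to see why the contract may need to address them separately: the code is written by a person, while the values are produced by training.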
(2) Whether or not it is covered by intellectual property rights
The inference program part of the trained model can be thought of in the same way as the training program.
In other words, if it satisfies the requirements of the Patent Act, the algorithm portion will be granted
protection as an “invention of a product (a computer program)” under the Patent Act, and the source code will
be protected as a “work of computer programming” under the Copyright Act.
For example, let’s assume that you discover a highly innovative, extremely accurate network structure related
to a specific development subject. You may be able to submit a patent application for that network structure
as an “invention of a product (a computer program)”.
In addition, [the network structure] will be protected under the Unfair Competition Prevention Act if it
constitutes a “trade secret” (Unfair Competition Prevention Act, Article 2, Paragraph 6).
Although I will discuss the “trained parameters” of the trained model later, I believe that ultimately the
trained parameters will not give rise to any intellectual property rights.
(3) Who has what rights under the default rules (i.e., a legal rule)?
In this case also, similar to the training program, if the vendor develops the “inference program” part from
scratch, under the legal default rule, the vendor would be granted patent rights and would also hold the
copyright.
5. Trained Parameters
(1) What are trained parameters?
“Trained parameters” are the parameters (coefficients) obtained as a result of learning using the
training dataset and the training program.
They are a large volume of numerical values automatically generated by the training program. In the case of
deep learning, the main trained parameters can be considered to be the weights assigned to each link between
nodes.
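The following minimal Python sketch illustrates how a training program turns a training dataset into trained parameters. It is a toy under stated assumptions: the model has only two parameters, the dataset is made up, and the function name train and its default settings are hypothetical. A real deep learning model works on the same principle but with a vastly larger number of values.

```python
def train(dataset, learning_rate=0.05, epochs=2000):
    """Minimal training program: fits y = w * x + b by gradient descent."""
    weight, bias = 0.0, 0.0  # parameters start with no information
    n = len(dataset)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to each parameter.
        grad_w = sum(2 * (weight * x + bias - y) * x for x, y in dataset) / n
        grad_b = sum(2 * (weight * x + bias - y) for x, y in dataset) / n
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return {"weight": weight, "bias": bias}


# Made-up training dataset of (input, expected output) pairs.
training_dataset = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
trained_parameters = train(training_dataset)
```

No person chooses the final values of weight and bias. They are the result of repeated numerical updates by the program.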
(2) Whether or not it is covered by intellectual property rights
As I explained earlier, since trained parameters are a large volume of numerical values automatically
generated by a training program, and as such involve no creativity, I believe that they constitute neither
an “invention” nor a “work”.
However, the trained parameters can be protected if they fall under the trade secret category (Unfair Competition
Prevention Act, Article 2, Paragraph 6) or the limited provision data category (Revised Unfair Competition
Prevention Act, Article 2, Paragraph 7).
(3) Who has what rights under the default rules (i.e., a legal rule)?
Since a trained parameter that does not fall under the trade secret (or other similar) category involves no
intellectual property rights, no one holds any rights [to such trained parameter]. As such, both the user and
the vendor have no choice but to stipulate in a contract who can use the trained parameter and in what manner.