Various know-how is required when developing AI software. For example, this is know-how related to, among other things, the acquisition and selection method of raw data, the processing method into training dataset, an efficient learning method using a training program, and preparation of the trained model in the production environment.
(1) Whether or not it is covered by intellectual property rights
Since know-how is intangible information, it cannot be the object of a copyright; however, if know-how meets the requirements for an “invention”, [such know-how] may be the object of patent rights. If the [know-how] falls under the trade secret category (Unfair Competition Prevention Act, Article Paragraph 6) or the limited provision data category (Revised Unfair Competition Prevention Act, Article 2, Paragraph 7), it will be protected.
(2) Who has what rights under the default rules (i.e., a legal rule)?
Since know-how that does not fall under the trade secret category or the invention category involves no intellectual property rights, no one holds any rights [to such know-how]. As such, both the user and the vendor have no choice but to stipulate in a contract who can use the know-how and in what manner.
If we were to summarize the [information] above, it could look like the chart below. (Postscript added).
By the way, since the Ministry of Economy, Trade and Industry’s “Enhanced Environment for Open Data Distribution Structures” (August 29, 2016, METI, hereafter the “METI material”) has a similar chart pertaining to AI-related intellectual property rights (P. 82, hereafter the “Enhanced Environment Chart”), I would like to provide a simple explanation of the relationship between the summary chart above and the Enhanced Environment Chart. (Mr. Takuji Hashizume, thank you for pointing this out!!)
First, although it depends on the type of raw data, since certain kinds of data (for example, mechanical
operating data, sensor data, and factual data) do not include intellectual property rights, while other
copyrightable data (such as photographs, voices, images, and novels) involve copyrights, I have revised the
summary chart that first appeared with respect to (raw) data for consistency
with the Enhanced Environment Chart on this point.
Next, in the summary chart and the Enhanced Environment Chart, practically the same items are mentioned with respect to “training dataset”.
In addition, if you look at page 78 in the METI material, I think that the “learning” in the Enhanced Environment Chart probably refers to the “training program” in the summary chart.
Further, judging from page 79 in the METI material indicating that “a trained model is an enumeration of numbers calculated by a computer (matrix, etc.)…”, I think that the “ trained model” in the Enhanced Environment Chart refers to the “trained parameter” in the summary chart. (Obviously, this is not an issue of whether the summary chart or the Enhanced Environment Chart is correct; rather, as explained before, it is an issue of defining “trained model”.)
Moreover, although it is noted in the Enhanced Environment Chart (the trained parameter section in the summary chart) that the trained model may be protected under the Patent Act and Copyright Act, I personally believe that this would be very difficult for the reason given earlier (i.e., [it is] “a large string of numerical values automatically generated by the training program”).
Finally, judging from the explanation in page 75 in the METI material to “incorporate the model in the application and use it as software”, I believe that the “use” in the Enhanced Environment Chart probably refers to the “trained model” (an inference program that incorporated trained parameters) in the summary chart. (This is confusing, isn’t it?)
“Enhanced Environment for Open Data Distribution Structures” (August 29, 2016, METI)
By now, you understand the legal default rules for the 6 objects in AI development.
The next important matter is how to craft contract provisions that benefit your company premised on those default rules.
Typical Deadlocked Pattern
In AI development agreements, a typical deadlocked pattern concerning rights and intellectual property is shown below.
The training dataset and the trained model are generated using raw data that is filled with our know-how and secrets and we pay a subcontracting fee for development. We have certain rights, don’t we?
No, you cannot generate a trained model with only raw data. What makes a high-performing model possible is largely thanks to our advanced know-how and intensive labor efforts in both the preprocessing of data and the training process of model. We have certain rights, don’t we?
How should we think about this?
This type of confrontation mainly stems from the [position] of the user and the vendor that the deliverables belong to their respective company, in other words, their persistent claims that the rights to the deliverables should belong to their company.
So, as long as both the vendor and the user continuously persist in their claims of “who has the right” (ownership of intellectual property rights) without any mutual resolution, the negotiations will take a tremendous amount of effort and time with both companies ultimately losing their respective competitiveness. In essence, there should be more contractual provisions than either party initially thought that can simultaneously meet the needs of both parties since the business structures of the person providing the data (the user) and the person generating the trained model (the vendor) are different.