Why Generative AI Businesses are Fighting So Hard for Fair Use
And a little bit of science, too!
📰 HEADLINES: Recent articles with eye-catching headlines reported that OpenAI admits generative AI models require copyrighted works to function. This limitation arises from a fundamental property of AI models.
💬 WHAT OPEN AI SAID: "Because copyright today covers virtually every sort of human expression -- including blogposts, photographs, forum posts, scraps of software code, and government documents -- it would be impossible to train today’s leading AI models without using copyrighted materials. Limiting training data to public domain books and drawings created more than a century ago might yield an interesting experiment, but would not provide AI systems that meet the needs of today’s citizens."
💡 KEY IDEA: This limitation arises because generative AI models, and AI models in general, are not mechanistic. They are empirical.
📖 IDEA DETAILS: An empirical model captures the statistics of the training data. A mechanistic model captures how the underlying process creates the data. A critical feature arising from this difference is that mechanistic models extrapolate well while empirical models fail at extrapolation.
🔑 WHY IT MATTERS: Generative AI models are bad at extrapolation so they will fail to generate output consistent with human expectation unless the user's prompt is well supported by human-created training data. E.g., the directive "in the style of" will only work if there are enough training samples of that style to guide the interpolation.
💰 THE FIGHT: Only by training on a supporting range of art and language artifacts produced by humans can these tools generate interesting, monetizable outputs. That is why companies working on generative AI are fighting to obtain a legal right of FAIR USE to as wide a swath of human works as possible. Without it the business model may not be viable. One alternative route is licensing with companies that already own rights to large repositories of content (think stock photo repositories, social media sites, etc.).
🎮 IN GAMES: For generative AI tools in the game industry, training materials can include art of all types (3D, 2D, concept, environment, etc.), voice recordings, VO scripts, lore, and writing.
⚠️ PREDICTION: A successful fair use argument can result in an essentially involuntary mass transfer of intellectual property rights from millions of individuals to a handful of business entities, or effectively to the public domain. Either way, millions of creative individuals could go uncompensated for their work and AI tools, monetized or not, could create similar or near-reproductions of their work.
🔮 PREDICTION: The generative AI FOMO being amplified by AI maximalists is incredibly strong, as are the possibilities for boosting human productivity. But if history is a guide, the outcome of this latest technology wave (a) will not be what we hope, (b) will not be what we fear, (c) will still surprise us, and (d) will also disappoint us.