California federal court allows AI copyright training claims against Databricks and Mosaic ML to proceed, finding the authors' complaint sufficient to survive dismissal
A California federal judge has denied a motion to dismiss brought by software and artificial intelligence companies Databricks and Mosaic ML, allowing a proposed class action by authors to proceed. The plaintiffs allege their published works were used without authorisation to train large language models (LLMs), AI systems trained on vast datasets of text to generate human-like outputs, operated by the defendants. The court found that the proposed class of writers had pleaded a legally sufficient complaint and declined to dismiss the case at the pleading stage. The ruling means the parties will proceed to discovery, the pre-trial evidence gathering phase, which will require Databricks and Mosaic ML to disclose details of their training data sourcing, curation, and licensing practices.
The decision joins a growing body of US case law testing whether using copyrighted works to train AI systems constitutes infringement or is protected by the fair use doctrine (which permits limited use of copyrighted material without authorisation in certain circumstances). While this case is US-domestic, it has direct implications for UK and EU AI developers: the EU AI Act (which entered into force in August 2024 and is being phased in through 2026–2027) includes transparency obligations requiring providers of general-purpose AI models to publish summaries of training data, with copyright compliance a key condition. UK courts are also expected to face similar training data disputes as the UK's AI and Copyright consultation, launched by the government in 2024, moves towards legislative action.
Why this matters
The survival of this claim past the pleading stage is legally significant because it forces disclosure of training data practices, information that could form the evidential basis for similar claims in the UK and EU. For City law firms advising AI developers or corporate clients licensing AI tools, the case reinforces the need for robust data processing agreements and training data IP clearance processes before model deployment. The EU AI Act's transparency requirements for general-purpose AI models are directly engaged: Article 53 requires providers to maintain a policy for complying with EU copyright law and to publish a summary of the content used for training, with further obligations layered on top for models with systemic risk, including the most capable LLMs. UK-based developers face parallel exposure under the Copyright, Designs and Patents Act 1988 (CDPA) and the unresolved text-and-data-mining exception regime. Law firms advising AI vendors need to build compliance frameworks that anticipate litigation discovery: the training data decisions made now will be the documents produced in court later.
On the Ground
A trainee on an AI governance mandate would assist with data processing agreement markup to include training data provenance warranties, draft a vendor due diligence questionnaire for a corporate client procuring LLM-based tools (covering training data sourcing and copyright clearance), and prepare a regulatory impact assessment memo on the client's exposure under the EU AI Act's transparency obligations.
Interview prep
Soundbite
Training data discovery in US AI copyright cases will reach UK and EU developers — the documents you create now become the evidence in future litigation.
Question you might get
“How would you advise a UK-based AI startup on structuring its training data licensing programme to minimise copyright infringement exposure in light of both US litigation trends and the EU AI Act's transparency obligations?”
Full answer
A California federal court has allowed authors' copyright claims against Databricks and Mosaic ML to survive dismissal, meaning the companies must now face discovery into their AI training data practices. For lawyers advising AI developers, this ruling signals that training data sourcing decisions are litigation-ready facts — the due diligence frameworks built today will be scrutinised in court tomorrow. The wider picture connects to the EU AI Act's transparency requirements for general-purpose AI models and the UK's unresolved copyright-and-AI legislative process, both of which will generate compliance mandates for firms with technology and IP practices. This suggests that AI governance and IP clearance work will be a sustained growth area for City firms advising technology and financial services clients through 2026 and beyond.