
Two Judges, Same District, Opposite Conclusions: The Messy Reality Of AI Training Copyright Cases – Above the Law

Within days of each other, two federal judges in the same district reached completely opposite conclusions about AI training on copyrighted works. Judge William Alsup said it's likely fair use as transformative. Judge Vince Chhabria said it's likely infringing because of the supposed impact on the market. Both rulings came out of the Northern District of California, both involve thoughtful judges with solid copyright track records, and both can't be right.

The disconnect reveals something important: we're watching judges fixate on their personal bugbears rather than grappling with the fundamental questions about how copyright should work in the age of AI. It's a classic case of blind men and an elephant, with each judge touching one part of the problem and declaring that's the whole animal.

I just wrote about Judge Alsup's careful analysis, which found that training AI was likely protected as fair use, but building an internal digital library of unlicensed downloaded works was probably not. Before that piece was even published, Judge Vince Chhabria came out with a ruling that disagrees.

The summary: AI training is likely infringing. But here, the plaintiff authors failed to present evidence, and thus their case against Meta is dismissed. Ironically, Alsup's ruling was probably a win for AI innovation but a loss for Anthropic. Chhabria's is the opposite: a clear win for Meta, but potentially devastating for AI innovation generally.


Chhabria's Flawed Market Harm Analysis

Chhabria's ruling seems to overweight (and, I think, incorrectly predict) the "effect on the market" aspect of the fair use analysis:


Because the performance of a generative AI model depends on the amount and quality of data it absorbs as part of its training, companies have been unable to resist the temptation to feed copyright-protected materials into their models—without getting permission from the copyright holders or paying them for the right to use their works for this purpose. This case presents the question whether such conduct is illegal.


Although the devil is in the details, in most cases the answer will likely be yes. What copyright law cares about, above all else, is preserving the incentive for human beings to create artistic and scientific works. Therefore, it is generally illegal to copy protected works without permission. And the doctrine of "fair use," which provides a defense to certain claims of copyright infringement, typically doesn't apply to copying that will significantly diminish the ability of copyright holders to make money from their works (thus significantly diminishing the incentive to create in the future). Generative AI has the potential to flood the market with endless amounts of images, songs, articles, books, and more. People can prompt generative AI models to produce these outputs using a tiny fraction of the time and creativity that would otherwise be required. So by training generative AI models with copyrighted works, companies are creating something that often will dramatically undermine the market for those works, and thus dramatically undermine the incentive for human beings to create things the old-fashioned way.

I find this entire reasoning extremely problematic, and it's why I mentioned in the Alsup piece that I don't think the "effect of the use upon the market" should really be a part of the fair use calculation. After all, any type of competition can lead fewer people to buy a different work. Or it can inspire people to actually buy more works because of increased interest. Chhabria's example here seems particularly… weird:


Take, for example, biographies. If a company uses copyrighted biographies to train a model, and if the model is thus capable of generating endless amounts of biographies, the market for many of the copied biographies could be severely harmed. Perhaps not the market for Robert Caro's Master of the Senate, because that book is at the top of so many people's lists of biographies to read. But you can bet that the market for lesser-known biographies of Lyndon B. Johnson will be affected. And this, in turn, will diminish the incentive to write biographies in the future.

This is where Chhabria's reasoning completely falls apart. He admits in his own example that Robert Caro's biography would be fine because "that book is at the top of so many people's lists." But that admission destroys his entire argument: people recognize that a good biography is a good biography, and AI slop—even AI slop generated from reading other good biographies—is not a credible substitute.

More fundamentally, his logic would make any learning from existing works potentially infringing.

If you go to Ford's Theatre in DC, where Lincoln was shot and killed, you can actually see a very cool tower of every book they could find written about Lincoln. Under Chhabria's reasoning, this abundance should have killed the market for Lincoln biographies decades ago. Instead, new ones keep getting published and finding audiences.

If any of the authors of any of those books read any of the other books, learned from them, and then wrote their own take which did not copy any of the protectable expression of the other books, would that be infringing? Of course not. Yet Chhabria's analysis seems to argue that it would likely be so.


Or take magazine articles. If a company uses copyrighted magazine articles to train a model capable of generating similar articles, it's easy to imagine the market for the copied articles diminishing substantially. Especially if the AI-generated articles are made available for free. And again, how will this affect the incentive for human beings to put in the effort necessary to produce high-quality magazine articles?

This argument would be more compelling if the internet hadn't already been flooded with free content for decades. Plenty of the internet (including this very site) consists of freely available articles based on our reading and analysis of magazine articles. This hasn't destroyed the market for original journalism—it's just competition. And, indeed, some of that competition can actually increase the market for the original works as well. If I read a short summary of a magazine article, that may make me even more likely to want to read the original, professionally written one.

So I don't find either of these examples particularly compelling, and am a bit surprised that Chhabria does. He does admit that other kinds of works are "murkier":


With some types of works, the picture is a bit murkier. For example, it's not clear how generative AI would affect the market for memoirs or autobiographies, since by definition people read those works because of who wrote them. With fiction, it might depend on the type of book. Perhaps classic works of literature like The Catcher in the Rye would not see their markets diminished. But the market for the typical human-created romance or spy novel could be diminished substantially by the proliferation of similar AI-created works. And again, the proliferation of such works would presumably diminish the incentive for human beings to write romance or spy novels in the first place.

Again, even his murkier claims seem weird. There are so many romance and spy novels out there, with more coming out all the time, and the fact that the market is flooded with such books doesn't seem to diminish the demand for new ones.

This all feels suspiciously like the debunked arguments during the big internet piracy wars about how downloading music for free would magically make it so that no one wanted to make music ever again. The reality was quite different: because the tools for production and distribution became much easier to use and more democratic, more music than ever before was actually produced, released, distributed… and monetized in some form.

So the entire premise of Chhabria's argument just seems… wrong.


The Alsup vs. Chhabria Split

Chhabria also takes a fairly dismissive tone on the question of transformativeness. And even though he likely wrote most of this opinion before Alsup's became public, he adds in a short paragraph addressing Alsup's ruling:


Speaking of which, in a recent ruling on this topic, Judge Alsup focused heavily on the transformative nature of generative AI while brushing aside concerns about the harm it can inflict on the market for the works it gets trained on. Such harm would be no different, he reasoned, than the harm caused by using the works for "training schoolchildren to write well," which could "result in an explosion of competing works." Order on Fair Use at 28, Bartz v. Anthropic PBC, No. 24-cv-5417 (N.D. Cal. June 23, 2025), Dkt. No. 231. According to Judge Alsup, this "is not the kind of competitive or creative displacement that concerns the Copyright Act." Id. But when it comes to market effects, using books to teach children to write is not remotely like using books to create a product that a single individual could employ to generate countless competing works with a miniscule fraction of the time and creativity it would otherwise take. This inapt analogy is not a basis for blowing off the most important factor in the fair use analysis.

Here we see the fundamental disagreement: Alsup thinks transformativeness is the key factor; Chhabria thinks market impact trumps everything else. Both can't be right, and the fair use four-factor test gives judges enough wiggle room to justify either conclusion.

Chhabria does agree that training LLMs is transformative:


This factor favors Meta. There is no serious question that Meta's use of the plaintiffs' books had a "further purpose" and "different character" than the books—that it was highly transformative. The purpose of Meta's copying was to train its LLMs, which are innovative tools that can be used to generate diverse text and perform a wide range of functions. Cf. Oracle, 593 U.S. at 30 (transformative to use copyrighted computer code "to create a new platform that could be readily used by programmers"). Users can ask Llama to edit an email they have written, translate an excerpt from or into a foreign language, write a skit based on a hypothetical scenario, or do any number of other tasks. The purpose of the plaintiffs' books, by contrast, is to be read for entertainment or education.

But he thinks market harm is more important—a conclusion that would gut much of fair use doctrine if applied consistently.

Also, while Alsup focused heavily on the unauthorized works that Anthropic downloaded and then stored in an internal "library," Chhabria, despite going into great detail about how Meta used BitTorrent to download similar (and in some cases, identical) copies of books, leaves for another day the question of whether that aspect is infringing.

Indeed, in some ways, these two cases represent the old claim that the four fair use factors are just an excuse for a judge to do whatever the judge wants to do and then work backwards to justify it in more legalistic terms using those four factors.


The Plaintiffs' Spectacular Failure

Given all this, you might think that Chhabria ruled against Meta, but he did not, mainly because the crux of his opinion—that these AI tools will flood the market and diminish the incentives for new authors—is so ludicrous that the plaintiffs in this case barely even raised it as an issue and presented no evidence in support.


In connection with these fair use arguments, the plaintiffs offer two primary theories for how the markets for their works are affected by Meta's copying. They contend that Llama is capable of reproducing small snippets of text from their books. And they contend that Meta, by using their works for training without permission, has diminished the authors' ability to license their works for the purpose of training large language models. As explained below, both of these arguments are clear losers. Llama is not capable of generating enough text from the plaintiffs' books to matter, and the plaintiffs are not entitled to the market for licensing their works as AI training data. As for the potentially winning argument—that Meta has copied their works to create a product that will likely flood the market with similar works, causing market dilution—the plaintiffs barely give this issue lip service, and they present no evidence about how the current or expected outputs from Meta's models would dilute the market for their own works.


Given the state of the record, the Court has no choice but to grant summary judgment to Meta on the plaintiffs' claim that the company violated copyright law by training its models with their books.

In short, the court's ruling in this case is that the winning argument is the impact on the market, while the plaintiffs in this case focused on the claim that the outputs of AI tools trained on their works were infringing. But, Chhabria notes, that argument is silly.

The irony is delicious: Chhabria essentially handed the authors a roadmap for how to beat AI companies in future cases, but these particular authors were too focused on their other weak theories to follow it. It's a clear win for Meta, but potentially devastating precedent for AI development generally.

What we're watching is how the fair use four-factor test can be manipulated to justify almost any conclusion a judge wants to reach. Alsup prioritized transformativeness and found for fair use. Chhabria prioritized market harm and found against it (even while ruling for Meta on procedural grounds). Both wrote lengthy, seemingly reasoned opinions reaching opposite conclusions from largely similar facts.

This case isn't settled. Neither is the broader question of AI training and copyright. We're still years away from definitive answers, and in the meantime, companies and developers are left navigating a legal minefield where identical conduct might be fair use in one courtroom and infringement in another.

