It’s A Small (Language Model) World After All

At this year’s ILTACON, between the open bars and the marketing bingo cards, I picked up on a murmur running through the legal tech crowd. While OpenAI and Anthropic continue begging for more and more investor cash in the face of consistently lackluster earnings, some vendors delivering advanced AI to the legal industry dropped hints about growing interest in small models. It’s not that large language models don’t work (though they often don’t), but they’re overbloated science experiments that, as Goldman Sachs observed, require exponentially increased resources to achieve tiny linear gains. Practical applications, at least in legal, don’t need models that need the human battery array from The Matrix just to say, “here’s a haiku about ERISA.”

This week, a number of developments from the greater tech world tend to confirm that the future is small.

Small models, for the purpose of this discussion, are “small” only as compared to the labyrinthine architectures behind products like GPT-5. That said, these smaller models deliver cheaper results without much drop-off in quality. Some could be light enough to run on institutional hardware, meaning law firms and corporate clients can keep their data in-house instead of shipping it off to Silicon Valley narcs. For an industry that still treats the cloud like it’s a Soviet spy balloon (an overreaction, but a persistent one), the pitch for small models is obvious: more control, less spend, nearly the same output.

This week, Meta announced its small reasoning model, confirming that the race toward small might be on. Its new model is designed to be hosted locally and will be specialized (as small models are by necessity) for math and coding applications, but the announcement bucks what had been a runaway train behind building bigger and bigger models. Going small might also be in Meta’s best interest, since this week’s demonstration of its general AI offering imploded on stage during a live demo.

I’ll bet Zuckerberg never thought he’d ever find himself on stage thinking back fondly to the Metaverse announcement. Wifi problems? Sure, bud.

For some time now, I’ve been saying that whoever delivers the “American DeepSeek” wins the long-term AI crown. China-based DeepSeek is still a large model by technical standards, but much smaller than the competition, and it burst onto the scene this year claiming to do basically everything the behemoth American models can for a fraction of the price. Except tell you what happened in Tiananmen Square in 1989, of course. Investors up to their necks in the big American foundational models tried to downplay DeepSeek’s cheapness claims, arguing that the Chinese government must have contributed more money under the table to bring the product to life. Though even the most aggressive theories of Chinese government involvement still ended in a product that cost a tiny fraction of what the Americans spent while still outperforming American models on some tasks. Anyone able to replicate that without the lingering concern that the product is scraping corporate secrets into a PRC database should dominate the space.

This week, in a preprint of a peer-reviewed paper, DeepSeek disclosed the cost of training its R1 model was… $294,000. That’s cheaper than a second-year associate once you include the bonus and the cost of every midnight Uber Eats order and 2 a.m. black car voucher. With cheaper training comes cheaper operation. DeepSeek charges something like $0.0011 per thousand tokens, which is a whopping 27 times cheaper than OpenAI.

But are smaller models ready for the “agentic” revolution? The answer is yes. And not just because “agentic” is an empty buzzword that should be purged from legal tech conversations. According to VentureBeat, “agentic” is, charitably, “a largely nebulous term still to this day in the AI industry.” Less charitably, tech commentator Ed Zitron describes it as “one of the most egregious acts of fraud I’ve seen in my entire career writing about this crap, and that includes the metaverse.” Fundamentally, it’s a batch file of chatbot prompts, which is not necessarily a dig, since curated and vetted prompts make for better results. But, in action, agents take short, general prompts from the user, build a workflow from them (which a chatbot can do), and then use that workflow to generate results, often by pinging outside resources. It can save some time over repeatedly prompting a bot, but it’s not a robot lawyer run amok like the “agent” branding might suggest.
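
To make the “batch file of chatbot prompts” point concrete, here is a minimal Python sketch of what an agent loop typically amounts to. The call_llm function is a hypothetical stand-in for whatever chat-completion API a given vendor actually wires in; nothing here is any particular product’s implementation.

```python
# A minimal sketch of an "agentic" workflow as described above: a short user
# request gets expanded into a plan of sub-prompts, each of which is run back
# through a chatbot (and, in a real system, sometimes an outside resource).

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real chat-completion call (a hosted model, a local small model, etc.)."""
    return f"[model response to: {prompt!r}]"

def run_agent(user_request: str) -> str:
    # Step 1: ask the model to turn the short, general request into a workflow.
    plan = call_llm(
        f"Break this request into numbered steps:\n{user_request}"
    )
    steps = [line for line in plan.splitlines() if line.strip()]

    # Step 2: execute each step as its own prompt, feeding results forward.
    # (A real agent might also ping outside resources here: a search API, a document store.)
    context = ""
    for step in steps:
        context = call_llm(f"Context so far:\n{context}\n\nNow do this step:\n{step}")

    # Step 3: return the final result. No robot lawyer, just a batch of prompts.
    return context

if __name__ == "__main__":
    print(run_agent("Summarize the key admissions in the Smith deposition."))
```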

They also fail a lot. According to Salesforce, the company putting more eggs in agentic AI than anyone, agents “achieve around a 58 percent success rate on tasks that can be completed in a single step without needing follow-up actions or more information” and this falls “to 35 percent when a task requires multiple steps.” This is their own research!

However, designed by the right hands, these systems can produce better and faster results than a user working alone. But, again, do they need large models to pull this off?

Also this week, Alibaba’s AI research team dropped Tongyi DeepResearch, “on par with OpenAI’s DeepResearch across a comprehensive suite of benchmarks.” Per VentureBeat:

The new Tongyi DeepResearch Agent is setting off a furor among AI power users and experts around the globe for its high performance marks: according to its makers, it’s “the first fully open-source Web Agent to achieve performance on par with OpenAI’s Deep Research with only 30B (Activated 3B) parameters.”

That is… small. By way of comparison, GPT-4 supposedly ran on 2 trillion parameters. Compared to an activated 3 billion, that’s an ominous 666x difference.

Look, large models played their part. Without them, we probably wouldn’t have these workable smaller models. The real trick of a large model is that it’s nearly impossible to properly weight a model from scratch to get the most efficient results. But once the model is massive, it will develop smaller sub-models doing the real work on various queries. The premise of the Lottery Ticket Hypothesis is that once you have a big enough model, you can start paring down to find the ideally weighted model that wouldn’t have been uncovered but for the original massive investment. At that point, you can, as the joke goes, build the whole plane out of the black box: market a smaller model that does everything an application actually needs and nothing more.
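
For anyone who wants the paring-down intuition spelled out, here is a toy Python sketch of the pruning idea behind the Lottery Ticket Hypothesis. It uses random numbers in place of a trained network and skips the retraining step entirely; it illustrates the concept, not anyone’s production pipeline.

```python
# Toy illustration of the Lottery Ticket Hypothesis idea described above:
# start from a big (here, fake) trained weight matrix, then prune away the
# lowest-magnitude weights to leave a much smaller "winning ticket" subnetwork.
import numpy as np

rng = np.random.default_rng(0)
big_model_weights = rng.normal(size=(1000, 1000))  # stand-in for a trained large model

def prune_to_winning_ticket(weights: np.ndarray, keep_fraction: float) -> np.ndarray:
    """Zero out all but the largest-magnitude weights, keeping roughly `keep_fraction` of them."""
    threshold = np.quantile(np.abs(weights), 1.0 - keep_fraction)
    mask = np.abs(weights) >= threshold
    return weights * mask

# Keep only 3% of the weights -- the "ticket" that, in the full hypothesis,
# would then be retrained from the original initialization to match the big model.
ticket = prune_to_winning_ticket(big_model_weights, keep_fraction=0.03)
print(f"Nonzero weights kept: {np.count_nonzero(ticket) / ticket.size:.1%}")
```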

As an industry, AI can start cashing in those winning tickets instead of doubling down on lotto scratchers.

This is especially true in legal, where our applications don’t require paving over the Mojave with server farms; we just need something smart enough to speed up the job. When you’re summarizing depositions, you’re not going to find yourself hurting because the underlying model wasn’t trained on a 10-year-old TypePad blog post about birdwatching. For our profession, small is both beautiful and indispensable.

And cheaper. Did we mention cheaper yet? Because it’s cheaper.

The AI landscape isn’t going to shift overnight, but as this week suggests, the tide might be turning. It’s hard to imagine OpenAI going belly up in a few months (unless you actually look at their revenues and expenditures).

But it was also hard to imagine a world without Napster or MySpace.




Joe Patrice is a senior editor at Above the Law and co-host of Thinking Like A Lawyer. Feel free to email any tips, questions, or comments. Follow him on Twitter or Bluesky if you’re interested in law, politics, and a healthy dose of college sports news. Joe also serves as a Managing Director at RPN Executive Search.