
We talk a lot about the ethical duty of lawyers and legal professionals to understand the risks and benefits of relevant technology. But when it comes to using GenAI, that might not be enough. If we want to prevent the increasing number of hallucinations and inaccurate citations that are bedeviling lawyers and even judges, we need to understand how and why GenAI systems fail.
That was the point of a recent paper by a group of scientists and engineers: Dylan Restrepo, Nicholas Restrepo, Frank Huo, and Neil Johnson. The paper carried the lengthy title, When AI Output Tips to Bad but Nobody Notices: Legal Implications of AI’s Mistakes. In addition to their own calculations and analysis, the group also consulted a couple of lawyers: Daniela Restrepo and Jean Paul Roekaert. I can’t vouch for the mathematical calculations, but what the group concludes squares with my own experience.
The Basic Premise
The group concludes at the outset that hallucination is not a random, unpredictable glitch: a physics-based analysis demonstrates that it is a “foreseeable engineering risk.” Meaning, of course, that the circumstances that produce it are at least somewhat predictable.
According to the paper, a GenAI system has “a deterministic mechanism at its core that can cause output to flip from reliable to fabricated at a calculable step.” And that step, unfortunately, comes when the lawyer’s need is the greatest.
The group’s analysis starts from a proposition we should all know by now: GenAI is “a probabilistic text generator engineered to predict the next most plausible token in a sequence, without any internal concept of legal truth.” It is not, the group argues, a database of verified legal authorities. (The group focused on the publicly available systems, not on the closed systems that claim to rely on verified legal authorities.)
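To make that quoted point concrete, here is a minimal, purely illustrative Python sketch of next-token sampling: the model scores candidate continuations and samples one by plausibility, and nothing in the loop checks whether the winner is legally true. The prompt, tokens, and scores below are hypothetical examples of mine, not anything from the paper.

```python
import math
import random

def softmax(logits):
    # Turn raw scores into a probability distribution over tokens.
    m = max(logits.values())
    exps = {tok: math.exp(score - m) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: v / total for tok, v in exps.items()}

# Hypothetical scores a model might assign to continuations of
# "The limitations period for this claim is ..." (invented numbers).
logits = {
    "two years": 4.0,
    "three years": 3.6,
    "one year": 1.5,
    "tolled under Smith v. Jones": 0.8,  # plausible-sounding, possibly invented
}

probs = softmax(logits)
tokens, weights = zip(*probs.items())
# The sampler picks by plausibility alone; there is no "verify" step.
print(random.choices(tokens, weights=weights)[0])
```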
What This Means
Because it’s predicting, not analyzing, GenAI does well when faced with inquiries about valid legal principles, logical-sounding arguments, undisputed case facts, procedural history, and the like. But when faced with something novel and complex, the tool is pushed “into a region where training data is sparse.” In an effort to please and respond, it is then prone to, well, make stuff up.
The paper puts it this way:

“The tool is therefore most prone to failure exactly when the lawyer’s need is greatest: on a difficult point of law with sparse precedent. The act of researching an unsettled legal issue via an LLM becomes the principal trigger for the tipping instability.”
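One crude way to picture why sparse training data invites fabrication (my own illustration, not the paper’s physics-based calculation): where the data is dense, the next-token distribution is sharply peaked on the right answer; where it is sparse, many continuations look about equally plausible, and the sampler is far more likely to emit one that was never grounded in a real authority. The numbers below are invented for illustration.

```python
import math

def entropy_bits(probs):
    # Shannon entropy of a next-token distribution; higher means more
    # candidates look roughly equally "plausible" to the model.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical next-token distributions (illustrative numbers only).
settled_question = [0.90, 0.05, 0.03, 0.02]  # dense training data
novel_question = [0.28, 0.26, 0.24, 0.22]    # sparse training data

print(f"settled: {entropy_bits(settled_question):.2f} bits")
print(f"novel:   {entropy_bits(novel_question):.2f} bits")
# Sampling happens either way, so on the novel question a low-probability
# (possibly fabricated) continuation wins far more often.
```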
These are important points, since lawyers live in a world where a hallucination, an error, can have devastating consequences. So, as we have discussed, given that risk, GenAI outputs must be checked over and over, often eroding the cost savings of using the tools in the first place. But if we understand why the errors occur and, more importantly, when, we can use the tools better and more safely.
A Blessing…And a Curse
If true, the group’s findings are a blessing, since they suggest a sliding scale of verification: less where the output focuses on well-known information, much more when it strays into the novel. That saves time and energy.
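If one wanted to operationalize that sliding scale, it might look something like the toy policy below. The novelty score and the thresholds are hypothetical placeholders I am supplying for illustration; the paper does not prescribe any of them.

```python
def verification_level(novelty: float) -> str:
    """Map a rough novelty/ambiguity score for a query (0 = well settled,
    1 = unsettled) to how much human checking the output deserves.
    The thresholds are arbitrary placeholders, not from the paper."""
    if novelty < 0.3:
        return "light spot-check of citations"
    if novelty < 0.7:
        return "verify every authority cited"
    return "treat output as a hypothesis; research it independently"

for score in (0.1, 0.5, 0.9):
    print(f"{score:.1f} -> {verification_level(score)}")
```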
But for those unaware of this predictability, the fact that failure can occur at a certain point can be a curse. Why? A lawyer with a legal project often starts with undisputed facts, then seeks information on what the law generally is with respect to the issues at hand. And then moves on to more complex, ambiguous areas, thinking everything is okay.
The example given in the paper is a statute of limitations question. A lawyer starts their use of ChatGPT by plugging in undisputed facts. They then seek the general law with respect to the limitation period. All well and good: the lawyer gets correct responses and then, in the words of the paper, “gains confidence in the tool.”
So, the lawyer then begins asking for more ambiguous information about how that law can be used to leverage the facts or to develop arguments. If the lawyer takes all the outputs and prepares a brief based on the information obtained, they (or their supervisor) might be tempted to spot-check the first few paragraphs, find nothing amiss and, when pressed for time, conclude the rest of the outputs are also fine when they are not.
So, the blessing becomes a curse: “AI’s period of correct output increases rather than decreases the risk of harm, because it builds the user’s trust just before the fabrication appears.”
What To Do
So, what do we make of all this? Again, I’m no scientist, but I do know from experience that the more general the information I seek from GenAI, the more likely it is to be correct. And the more I stray into ambiguous areas where less is known about a subject, the more errors I tend to get.
For example, I once asked for information about a well-known painter. I got great information. But when I asked about another painter in the same school of painting who was relatively obscure, the tool just made up a name.
Or when I asked which subway stop to take to catch the Q70 bus to LaGuardia Airport, it got it right. But when I asked the best route from my hotel (which involves more ambiguity), it sent me to the wrong stop. It did say sorry when I pointed out the error (after some argument).
The point for lawyers and legal professionals is to understand that “AI possesses no independent legal agency: it is a computational tool.” Granted, it is a computational tool with which you can converse like a human. It reacts in human ways. It’s tempting to anthropomorphize it. But that’s where we go wrong.
Instead, we need to start thinking of it not as a person but as a product with a foreseeable engineering risk. Like a sharp knife or an ATV. It is a risk that appears to materialize in the face of novelty and ambiguity, which, according to the paper, is exactly where the danger of hallucination is greatest.
For lawyers, that means that if you are going to use this sharp knife, you had better know how and in what circumstances. You need to know how to use it safely.
The paper says it best: “The duty of technological competence, as expressed in ABA Model Rule 1.1 and its state-level counterparts, must evolve. It is no longer sufficient for a lawyer to know how to operate a piece of software. Competence now requires a practical understanding of how that software can fail.”
About that, the paper is clearly right.
Want to use GenAI? Use it to access known information that would otherwise be time-consuming or difficult to get. Ask it to do lots of things where accuracy isn’t that important.
But don’t ask it novel or unsettled legal questions without checking and double-checking what you get back. Otherwise, you might get off at the wrong subway stop. Or much worse.
Stephen Embry is a lawyer, speaker, blogger, and writer. He publishes TechLaw Crossroads, a blog devoted to the examination of the tension between technology, the law, and the practice of law.
