GenAI Hallucinations? Lawyers Aren’t Perfect Either

By Anonymous.

Lawyers make mistakes. That’s why they take out professional liability insurance. In the wake of the Stanford University study into the accuracy of Thomson Reuters’ and LexisNexis’ generative AI tools, that raises a question: what are we comparing these systems against when it comes to hallucinations and inaccuracy?

Or, to put it another way: yes, generative AI systems produce errors, but are lawyers working without such technology any more accurate?

The regulation around this topic makes it clear that legal perfection is not expected. For example, in England & Wales the Solicitors Regulation Authority (SRA) requires firms to hold professional indemnity insurance. And it’s this ongoing need for insurance coverage that is a key reason why the SRA will step in when a law firm collapses. In short, regulators expect lawyers to make errors.

So when we compare the accuracy rates of generative AI – or any other overhyped and under-explained technology – with what lawyers do, we are comparing two inherently flawed processes. Humans are imperfect, and so is generative AI. The question then is: how flawed is each, and what level of error is acceptable?

What is not properly quantified at present is how many mistakes ‘a’ lawyer makes, or how many mistakes a legal ‘team’ providing advice makes. Clearly it’s not zero. So how good are human lawyers?

Can we get some hints from looking at how law firms are charged for insurance?

Going through the renewal process for professional indemnity cover (in England & Wales), you are evaluated on your ‘gross fee income’, which is then adjusted for the kind of work done and, of course, your claims history. This works out at roughly 1.5% of turnover for larger firms in England & Wales. But even assuming actuarially fair insurance, that figure does not imply a 98.5% accuracy rate for legal work overall: premiums price expected claim payouts, not the underlying frequency of mistakes.
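To make that concrete, here is a minimal back-of-envelope sketch in Python. Every figure in it – the turnover, the average claim severity, the fraction of errors that ever become paid claims – is an illustrative assumption, not data from any insurer or from the Stanford study:

```python
# Back-of-envelope sketch: why a premium of ~1.5% of fee income says little
# about an underlying 'accuracy rate'. All figures below are illustrative
# assumptions, not data from the article, any insurer, or any study.

fee_income = 100_000_000        # hypothetical firm turnover (GBP)
premium_rate = 0.015            # ~1.5% of turnover, per the article
premium = fee_income * premium_rate

# Under roughly actuarially fair pricing, premium ~= expected claim payouts
# (simplifying away insurer loading and expenses).
expected_payouts = premium

avg_paid_claim = 500_000        # assumed average severity of a paid claim
expected_paid_claims = expected_payouts / avg_paid_claim   # ~3 per year

# Most errors never become paid claims: they are caught internally, fixed
# quietly, or simply never noticed. Assume only 1 in 50 errors ever turns
# into a paid claim (a loud assumption).
claims_per_error = 1 / 50
implied_errors = expected_paid_claims / claims_per_error   # ~150 per year

print(f"Premium: £{premium:,.0f}")
print(f"Expected paid claims/year: {expected_paid_claims:.1f}")
print(f"Implied underlying errors/year: {implied_errors:.0f}")
```

Under these (loud) assumptions, a £1.5m premium is consistent with only a handful of paid claims a year, yet with a far larger number of underlying errors – which is exactly why the premium figure tells us so little about accuracy.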

Many of the mistakes of the kind spotted in the Stanford study are unlikely to contribute to this figure, because the structure of legal work mitigates them: supervisors, managing associates and partners catch their juniors’ errors; opposing counsel query language that doesn’t make sense to them; and, in the most embarrassing case, clients spot things that seem off.

Also, many mistakes are never spotted. Or, if they are, they get smoothed over. Looking through old work bibles usually leads to comments like ‘I’m not sure I would have drafted it that way’ and then a resolution not to use those documents in the future.

In other words, professional indemnity insurance claims – and the premiums built on them – may dramatically understate the frequency of mistakes.

This means that although regulators expect lawyers to make significant errors, many of those errors are 1) picked up internally, 2) highlighted by opposing counsel, 3) spotted by the client, or 4) slip through the net with no-one noticing… until someone eventually does.

This in turn raises the question of what is ‘good enough’ for a client, given that clients must already be accepting work product that is often imperfect.

It’s clear that generative AI tools are not ‘good enough’ on their own. It is far less clear what ‘good enough’ actually looks like. We cannot write these tools off until we know what ‘good enough’ means – for the tools, and for humans.

[ Note: Anonymous works in the field of legal technology, but asked not to be named due to their role. It can be said, however, that they are not at any of the organisations mentioned in the Stanford study. ]

If you would like to respond to the points raised here in a following think piece, please let Artificial Lawyer know.

From this site’s perspective this piece raises plenty of questions, such as:

  • Errors will creep into legal work, but how much supervisory effort to correct those mistakes is acceptable? As mentioned in last week’s AL piece on this subject, if juniors using generative AI (or working with no technology at all) were making a very high level of mistakes, a law firm would grind to a halt: senior lawyers could not be expected to double-check everything, continually, down to the base layer of key facts, for every piece of work. I.e. there would be no point in having those junior lawyers in the first place. So, what is an acceptable level of supervision, and at what point does it break the law firm model?
  • Lawyers are assumed to have ‘passed the test’ once they qualify, and certainly once they progress through level after level of associate life in a firm. Beyond that, there are no accuracy records; everything is taken on trust. And maybe that’s a sensible system, as you can’t expect lawyers to sit the Bar exam, or something similar, every year to prove their professional accuracy. Moreover, one could say that firms sort this out themselves by letting go junior lawyers who are not good with details – although we have no scientific evidence that this is so, and no public data on it. Instead we assume that ‘good lawyers’ tend not to make mistakes and that those who are senior make very few – and we live with that.
  • The challenge with genAI, as with a lot of legal tech, is that it makes us question and think again about how human lawyers do what they do, and what ‘good’ looks like. If we are to judge AI, as the author notes, then it seems fair to judge human accuracy too.
  • One other point is how these things blend together. A lawyer may make a mistake of their own, then add a mistake from a genAI system, which is then missed by someone more senior, and so it is fed into the final work product. Likewise, they may be on the ball, spot the genAI error themselves, and everything is fine. Or, in a more positive light, the genAI system correctly finds an answer that the junior lawyer would not have found on their own; this information gets ferried upwards and into the final work product, and everyone is happy, including the client. All three scenarios are possible. The question is how frequent each one is – a question the rough sketch after this list tries to make concrete.
  • To conclude, accuracy and AI will remain a central issue. Why? Because if we want genAI to become all it can be in the legal field, we have to answer the questions above – and then we will be on solid ground for truly changing how the legal world works.
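On the point about blended outcomes, here is a minimal probability sketch in Python. Every probability in it is assumed purely for illustration – none of these numbers come from the article, the Stanford study, or any other source:

```python
# Illustrative sketch of how human and genAI errors might compound or
# cancel in a single piece of work. All probabilities are assumptions
# for illustration only; none come from the article or any study.

p_ai_err = 0.10         # genAI output contains an error
p_junior_catch = 0.70   # junior spots a genAI error on review
p_senior_catch = 0.80   # senior reviewer catches a surviving error
p_ai_saves = 0.15       # genAI surfaces an answer the junior would miss

# Scenario 1: a genAI error slips past both junior and senior review.
p_compound = p_ai_err * (1 - p_junior_catch) * (1 - p_senior_catch)

# Scenario 2: genAI errs, but the junior catches it; the work is fine.
p_caught = p_ai_err * p_junior_catch

# Scenario 3: genAI adds value the junior alone would not have found.
p_uplift = (1 - p_ai_err) * p_ai_saves

print(f"Error reaches final work product: {p_compound:.1%}")
print(f"genAI error caught by the junior:  {p_caught:.1%}")
print(f"genAI improves the work product:   {p_uplift:.1%}")
```

The point is not these particular outputs, but that the debate becomes tractable once firms start measuring the inputs: how often juniors, seniors and genAI tools each err, and how often each layer of review catches what the layer below missed.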