Crosby Starts Contract Benchmark, Launches Agent Research Group

NewMod law firm Crosby has launched ‘Multi-turn Negotiation Bench’, a contract negotiation benchmark for AI outputs. They are also rolling out Crosby Intelligence, a research arm focused on legal agents.

First the benchmark, also called Redline: Crosby and micro1 will be publishing a benchmark ‘evaluating how frontier models perform the workflow of senior commercial lawyers in live settings’.

‘This benchmark measures contract negotiation as a sequence of judgment calls rather than a collection of isolated clause edits. Each turn requires the attorney or model to decide what matters, what to leave alone, how hard to push, and how to adapt as the negotiation evolves,’ they added.

They explained that this is because ‘contract redlining is not a single-turn drafting task: it requires understanding deal context, each party’s commercial leverage, making legally sound edits, anticipating the counterparty’s response, and preserving momentum toward deal execution’.

So far, their findings are that ChatGPT 5.5 performed the best overall on the benchmark with a score of 50.5%, trailed by Gemini 3.5 Flash at 45.1% and Claude Opus 4.8 at 44.4%. Claude Fable 5’s overall score was 47.3% – although access was soon cut off, so Crosby wants to return to Fable and run multiple tests to get a more balanced result.

Interestingly, and despite what some may think, all the models scored relatively close to one another. Crosby also found that, for now, human lawyers were still better at finding ‘new routes to resolution’ in a negotiation, whereas AI tools tended to get stuck on their initial positions – which to AL suggests that ‘the judgment layer’ is still missing from such tools. So, lawyers can breathe a sigh of relief.
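For readers who want to see how close the results are, here is a minimal Python sketch that ranks the overall scores quoted above and works out each model’s gap to the leader. The figures are the ones reported in this article; the variable names are purely illustrative, and the Fable figure comes from limited testing, as noted.

```python
# Overall benchmark scores (percent) as quoted in the article.
scores = {
    "ChatGPT 5.5": 50.5,
    "Claude Fable 5": 47.3,  # limited testing; access was cut off
    "Gemini 3.5 Flash": 45.1,
    "Claude Opus 4.8": 44.4,
}

# Rank from highest to lowest score.
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
best = ranked[0][1]

# Gap between the top and bottom scores, in percentage points.
spread = round(best - ranked[-1][1], 1)

for name, score in ranked:
    gap = round(best - score, 1)
    print(f"{name}: {score}% ({gap} points behind the leader)")
print(f"Top-to-bottom spread: {spread} points")
```

On these figures the whole field sits within about six percentage points, and no model scores much above 50%.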

That said, Crosby intentionally employs skilled lawyers to handle this very element. So, traditional firms may not be able to feel so relieved about the types of contract work that they handle at scale and at a fixed fee – and at a high velocity.

Meanwhile, Crosby Intelligence is a team of researchers, applied AI engineers and technical lawyers ‘building the agentic attorneys for Crosby’s law firm’.

‘Crosby Intelligence will release benchmarks for legal domains where judgment matters most. Its focus is building agents to simulate negotiations, and moving time-to-signature from weeks to hours’, they said.

And this not only makes sense, but is likely to bear fruit. Crosby works on fixed fees, with AI at its core. In such an environment, the more a workflow can be automated the better – as long as it has the judgment layer to make sure it’s accurate.

Traditional law firms – despite the desire of many legal tech providers – remain not that engaged with agents. And this is for the same old reasons as ever – time, time, and time again. Crosby doesn’t have that problem, any more than Henry Ford had a problem with making his factories more efficient.

They added that over the coming months, Crosby Intelligence will continue publishing benchmarks on the field’s hardest open problems, host a monthly speaker series with leading scholars and practitioners, and, in conjunction with OpenAI, fund two fellows pursuing frontier research.

New York-based Crosby has raised over $85 million in funding from Sequoia Capital, Index Ventures, Lux Capital, Elad Gil, and Bain Capital Ventures among others.

Is this a big deal?

Well, if you thought that NewMods such as Crosby would be happy to just sit quietly in their niche then you’d be wrong. They’re marking out their territory in the contract space with both ‘value-adds’ such as the benchmark, and also focusing more on agentic flows. And that cuts right across the market.

And as AL notes above, a NewMod has nothing to fear from automating workflows. In fact, that’s what it wants and needs to do in order to make its business model really fly. If you are on fixed fees and heavily leveraging AI, then the more you automate – with quality control and human judgment built in – the more your profit margins improve.

Equally, the opposite is true: fixed fees + inefficiency = decreasing profits.
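To put rough numbers on that equation, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the fee, the hourly cost of a lawyer’s time, and the hours per matter are hypothetical, not Crosby’s figures, and `fixed_fee_margin` is just a made-up helper name.

```python
def fixed_fee_margin(fee, hours, cost_per_hour):
    """Profit margin on one fixed-fee matter, as a fraction of the fee."""
    return (fee - hours * cost_per_hour) / fee


# Hypothetical figures, chosen only to illustrate the point.
FEE = 5_000          # fixed fee charged per contract matter
COST_PER_HOUR = 200  # internal cost of one hour of lawyer time

# The fee stays the same; only the hours spent on the matter change.
for hours in (5, 10, 20, 30):
    margin = fixed_fee_margin(FEE, hours, COST_PER_HOUR)
    print(f"{hours:>2} lawyer-hours -> margin {margin:.0%}")
```

With these made-up inputs, cutting the time spent from 20 hours to 10 lifts the margin from 20% to 60%, while a matter that drags on to 30 hours loses money. Under hourly billing, by contrast, those extra hours would have been extra revenue.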

The mention of OpenAI is interesting as well, given the LLM-maker is moving into the legal vertical. That said, it’s a huge area and there’s plenty of room for lots of players. Moreover, the most efficient may well be the ones who come out on top in the long run.

More about Crosby here.

Discover more from Artificial Lawyer

Subscribe to get the latest posts sent to your email.
