
The International AI Safety Report 2026 provides a forensic assessment of general-purpose AI and a wealth of information about the emerging risks associated with it. Reuben Vandercruyssen and Lydia Savill from Hogan Lovells highlight the key takeaways for risk professionals and explain the implications for UK litigation risk.
Unpredictability is the recurring theme of the International AI Safety Report 2026 – a comprehensive review of the latest scientific evidence regarding the capabilities and risks associated with general-purpose AI.
The report finds that AI systems’ performance is “jagged”, with leading systems “still failing at some seemingly simple tasks”. Model behaviour, it warns, can be “hard to understand or predict”, making it “challenging to foresee or confidently rule out specific failures”.
Understanding where that unpredictability sits is key for risk professionals assessing their AI-related legal exposures.
Key dispute angles
1) What was promised – jagged capability and overstated claims
Uneven performance is a breeding ground for expectation disputes. If capability claims (accuracy, reliability, “near-human” support) encouraged reliance, they may sit at the heart of breach, warranty and misrepresentation claims.
2) The evidential battleground – evaluation, benchmarks and methodology
The report highlights that pre-deployment performance tests of AI models do not reliably predict real-world performance.
Benchmark results are used to justify procurement and valuation – but courts tend to look past headline metrics and ask whether the supplier’s work was reasonably capable of achieving the stated purpose in the conditions of deployment (and, depending on the contract, whether there was a fitness-for-purpose commitment rather than a best-efforts narrative).
In practice, the fight will rarely be only about benchmark scores. It will be about whether evaluation and validation were adequate, transparent and fit for purpose.
3) Agents, autonomy and attribution
The report flags the surge in AI agents – systems designed to browse the internet and execute multi-step tasks with reduced human oversight. More autonomy means fewer natural points for human intervention, and a shorter window to catch and correct errors.
The litigation risk is not “AI consciousness”. It is that agents can act in ways that are difficult to reconstruct after the event: what actually happened, what triggered what, and who (or what) caused the outcome.
The report also notes that attributing liability for harms caused by agents can be difficult, especially where it is hard to identify when and how failures occurred. That creates predictable evidential pressure points – including “black box” causation arguments – but uncertainty alone is unlikely to excuse a failure to meet ordinary standards of care.
Liability allocation – why this may get messier
The report cautions that existing liability frameworks may not be capable of adequately addressing AI-related harms.
It points to the practical difficulty that “harms can be difficult to trace to specific design choices”, and that “responsibility is distributed across model developers, application builders, deployers, and users”.
That is a counterweight to more optimistic domestic commentary – including UK Jurisdiction Taskforce work – which suggests that existing frameworks can address most AI harms. The report also keeps targeted adaptation of legal frameworks on the table, including clearer responsibility for generating or disseminating unauthorised or undisclosed synthetic content.
Practical takeaways for risk professionals
The themes above translate into concrete housekeeping: record the capability claims that were made and relied on; document how systems were evaluated and validated against the conditions of deployment, not just headline benchmarks; preserve logs that allow agent behaviour to be reconstructed after the event; and be clear in contracts about how responsibility is allocated across model developers, application builders, deployers and users.
Conclusion: clear direction of travel
The report does not offer tidy answers, but it does point to a clear direction of travel for disputes.
As general-purpose AI is deployed more widely – and relied on in higher-stakes workflows – uneven capability, imperfect evaluation and rising autonomy are likely to generate more contested questions about what was promised, and where responsibility sits.
In that context, litigation risk and outcomes will turn on the quality of the evidence trail.
Reuben Vandercruyssen is a senior associate at Hogan Lovells, and Lydia Savill is a partner and co-head of the firm’s Global Insurance Disputes practice.