June 6, 2025

Beyond the Algorithm: Rethinking Liability and Trust in AI Radiology

Artificial intelligence is maturing from proof-of-concept prototypes to daily clinical practice. Nowhere is this evolution more visible than in AI radiology, where algorithms interpret X-rays, CT, and MRI images alongside, and sometimes ahead of, human readers. Yet as technical performance improves, new questions emerge about accountability when the machine and the physician disagree. A fresh randomized trial in NEJM AI adds empirical heft to these concerns and provides a springboard for strategic reflection on how innovators such as AZmed should navigate the shifting medico-legal landscape.¹

What the new evidence shows

In the study by Bernstein et al. (NEJM AI, Vol. 2, No. 6, May 22, 2025), 1,334 U.S. adults read vignettes in which a radiologist missed either a brain bleed or a lung cancer. Five experimental arms varied whether an algorithm was also used and whether its output matched the radiologist’s read. When the AI flagged the abnormality that the human missed (“AI disagree”), 72.9% of mock jurors judged the radiologist liable in the brain-bleed scenario, versus 50.0% when the system also missed it (“AI agree”). The pattern held for the missed lung cancer (78.7% vs. 63.5%). Importantly, providing error metrics softened judgments: disclosing a 50% false-discovery rate cut liability in the brain-bleed cases from 72.9% to 48.8%.

Key takeaway: disagreement between human and algorithm amplifies perceived negligence, but transparent performance data can restore balance.

Implications for clinical practice

These findings reverberate well beyond the courtroom. They suggest that AI in radiology introduces a “second observer effect”: once an algorithm is in the loop, clinicians are evaluated against a higher technological bar. Radiologists who silently overrule a machine may appear reckless even when their clinical judgment is correct. Conversely, blind trust in software invites its own hazards. The safer path combines documented reasoning (recording why an imaging finding was accepted or rejected) with systematic disclosure of algorithmic accuracy.

From a risk-management standpoint, healthcare organizations should:

  1. Mandate structured justification fields in reporting systems whenever the radiologist overrides AI advice.
  2. Log algorithm confidence and known error rates in the picture archiving and communication system (PACS) so that context follows the image through the enterprise.
  3. Provide jurors and patients with calibrated information on the uncertainties inherent in both human and machine interpretation, mirroring the false-discovery-rate (FDR) and false-omission-rate (FOR) data that reduced liability in the trial (a short sketch of both metrics follows this list).
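To make the FDR/FOR vocabulary concrete, here is a minimal sketch of how both metrics are computed from a detector’s confusion matrix. The counts are invented for illustration; only the formulas themselves (FDR = FP / (FP + TP), FOR = FN / (FN + TN)) correspond to the quantities disclosed in the trial.

```python
# Illustrative only: the counts below are invented, but the formulas are the
# standard definitions of false discovery rate (FDR) and false omission rate (FOR).

def fdr_and_for(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    """Return (FDR, FOR) for a binary detector's confusion matrix."""
    fdr = fp / (fp + tp)    # share of positive AI flags that are wrong
    f_or = fn / (fn + tn)   # share of negative AI calls that miss real disease
    return fdr, f_or

# Hypothetical example: a detector that raises as many false alarms as true
# ones has an FDR of 0.50, the figure shown to study participants.
fdr, f_or = fdr_and_for(tp=40, fp=40, tn=900, fn=20)
print(f"FDR = {fdr:.2f}, FOR = {f_or:.2f}")   # FDR = 0.50, FOR = 0.02
```

A 50% FDR, as in the study, simply means that half of the algorithm’s positive flags turn out to be false alarms; communicating that figure is what shifted juror judgments.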

A fast-moving regulatory backdrop

These legal questions land amid intense regulatory activity. As of February 2025, the U.S. FDA had cleared 758 radiology-focused AI/ML products, by far the largest share of any specialty.² The agency’s January 2025 draft guidance on “AI-Enabled Device Software Functions” signals closer scrutiny of post-market performance monitoring and emphasizes the need for human oversight.³

Meanwhile, generative models are carving out a different niche. Large language models now compose preliminary impressions and patient instructions, reducing documentation time while largely sidestepping diagnostic claims and, with them, heavier regulatory burdens.⁴ For innovators, the message is clear: the bar for autonomous diagnostic algorithms is rising, while supportive, non-diagnostic tools enjoy a shorter path to adoption.

The AZmed vantage point

Rayvolve®, the company’s flagship AI suite, processes tens of thousands of X-rays each day, automatically triaging studies by inserting bounding boxes around suspicious findings and flagging urgent cases for the radiologist’s immediate review.
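To illustrate how a triage output like this can travel with the study, here is a hypothetical sketch of a single finding record with a bounding box, a confidence score, and an urgency flag. The structure and field names are assumptions made for this example, not Rayvolve’s actual output format.

```python
# Hypothetical triage payload; field names are illustrative, not Rayvolve's schema.
from dataclasses import dataclass

@dataclass
class Finding:
    study_uid: str                    # identifier of the X-ray study
    label: str                        # suspected abnormality
    bbox: tuple[int, int, int, int]   # (x, y, width, height) in pixels
    confidence: float                 # model score in [0, 1]
    urgent: bool                      # should the study jump the reading queue?

finding = Finding(
    study_uid="study-0001",
    label="suspected fracture",
    bbox=(412, 308, 96, 74),
    confidence=0.91,
    urgent=True,
)
print(finding)
```

Attaching a record like this to the study, for example as metadata alongside the image in PACS, is one way to keep the algorithmic context available for later review, echoing the risk-management list above.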

From an AZmed perspective, the new NEJM AI data underscore two priorities:

  • Radical transparency. Beyond publishing accuracy numbers, AZmed should embed real-time, case-specific confidence scores and highlight known failure modes (e.g., overlap with plaster casts or atypical pediatric ossification centers).
  • Human-machine dialogue. Building a lightweight “reason for rejection” module will allow users to annotate why they disregarded an alert, generating forensic breadcrumbs that de-risk litigation and feed continuous learning loops.
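As a thought experiment, the sketch below shows what a minimal “reason for rejection” record might capture so that every override leaves an auditable trace. Everything here, field names included, is an assumption made for illustration rather than a description of an existing AZmed feature.

```python
# Hypothetical override record; the structure is an assumption, meant only to
# illustrate what a "reason for rejection" audit entry could hold.
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json

@dataclass
class OverrideRecord:
    study_uid: str          # which study the alert belonged to
    alert_label: str        # what the algorithm flagged
    model_confidence: float
    radiologist_id: str
    reason: str             # free-text justification for dismissing the alert
    timestamp: str          # ISO-8601, recorded at the moment of override

record = OverrideRecord(
    study_uid="study-0001",
    alert_label="suspected fracture",
    model_confidence=0.62,
    radiologist_id="rad-117",
    reason="Finding consistent with an old, healed fracture on prior imaging.",
    timestamp=datetime.now(timezone.utc).isoformat(),
)

# Serialising to JSON keeps the record portable across reporting and audit systems.
print(json.dumps(asdict(record), indent=2))
```

Keeping the record small and serialisable makes it easy to store alongside the report and to mine later for the continuous learning loop mentioned above.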

Strategic recommendations

  1. Codify accountability: Adopt a shared-decision protocol in which the radiologist, the algorithm, and, when appropriate, the multidisciplinary team each leave a signed digital footprint in the record.
  2. Educate stakeholders: Use grand rounds, juror simulations, and patient-facing infographics to demystify algorithmic statistics. The 24-point drop in perceived liability when false-discovery-rate data were shown (brain-bleed scenario) is a compelling argument for such outreach.
  3. Monitor in production: Pair every AI model with a “shadow audit” that flags distribution drift (a minimal sketch follows this list); regulators increasingly expect this continuous vigilance.³
  4. Leverage generative AI responsibly: Deploy large language models for report drafting and communication, but fence them off from unsupervised diagnostic claims to stay within current regulatory comfort zones.⁴
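As one possible reading of recommendation 3, the sketch below implements a toy shadow audit that compares recent model confidence scores against a historical reference window using a two-sample Kolmogorov–Smirnov test from SciPy; the significance threshold and window sizes are illustrative assumptions, not regulatory requirements.

```python
# Toy drift check for a "shadow audit": compare the distribution of recent
# model confidence scores against a historical reference window.
# The 0.01 threshold and the window sizes are illustrative assumptions.
import numpy as np
from scipy.stats import ks_2samp

def drift_alarm(reference_scores, recent_scores, alpha: float = 0.01) -> bool:
    """Return True if recent scores look drawn from a different distribution."""
    stat, p_value = ks_2samp(reference_scores, recent_scores)
    return p_value < alpha

rng = np.random.default_rng(0)
reference = rng.beta(2, 5, size=5000)   # scores logged around deployment
recent = rng.beta(2, 3, size=500)       # scores from the most recent week
print("Distribution drift detected:", drift_alarm(reference, recent))
```

In practice the same check can be run on input statistics (patient age, acquisition parameters) as well as on output scores, since drift in either can degrade performance silently.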

Conclusion

The future of AI radiology is not merely smarter algorithms but smarter governance. The Bernstein trial makes plain that public perception, and by extension juror opinion, hinges on how harmoniously human expertise and machine intelligence mesh. For AZmed, the mandate is twofold: keep pushing technical boundaries while embedding design features that make accountability explicit and understanding intuitive. Do this well, and AI in radiology will not only accelerate diagnoses but also elevate trust across the care continuum.

Read the full study here.

