May 4, 2026

Trauma X-ray Review in Small ERs

In the emergency department, most suspected fractures are assessed with plain radiography. This test is fast, low-cost, and readily available at initial presentation. However, errors often occur in the interpretation of these radiographs, particularly when fractures are missed because the clinical presentation and the interpretation of the radiograph do not match. This diagnostic discrepancy is one reason emergency X-ray interpretation remains vulnerable in high-volume settings.

These discrepancies most often involve subtle fractures, which are at higher risk of being missed. Non-displaced fractures, poorly projected fractures, and fractures obscured by overlapping anatomy all increase the likelihood of an error.

Numerous publications have consistently reported the same reasons for missed fractures on radiographs: perceptual errors, incomplete projections, anatomic overlap, distracting background pathology, and failure to adjust clinically when the radiographic findings do not align with the clinical picture.¹ ² ³

Missed fractures can occur in any trauma imaging pathway; however, the risk is higher when trauma radiographs are interpreted by a general radiologist, emergency physician, junior reader, or mixed-expertise trauma review team than when they are interpreted by a musculoskeletal subspecialty radiologist at the site of care.

At smaller institutions, the initial reading of a trauma radiograph is typically performed in the emergency department due to time constraints, limited patient history, and variable levels of expertise with little or no immediate musculoskeletal subspecialty support.

The central question is whether artificial intelligence can help reduce missed fractures, either by improving the review of trauma X-rays or by standardizing interpretation through reduced variation in workflow and reader expertise.¹ ⁴ ¹⁰

The current guidance for AI in radiology is that, before use in a given clinical setting, AI should be examined for its intended use, fit with the workflow process, performance in testing conditions that match actual use, and whether human supervision would still be required.

When evaluating an AI model for emergency radiology, the real test is not the performance of the model alone but whether AI-supported X-ray interpretation helps clinicians in the settings where missed findings are most likely to occur.⁴

The complexity and difficulty in detecting fractures on X-ray imaging

Fracture detection on X-rays is not equally difficult across all cases. With long-bone fractures, deciding whether a fracture is present is usually straightforward.

The degree of difficulty increases when an X-ray shows subtle cortical disruption, a small avulsion-type fracture, indirect signs without a visible fracture line, or a non-displaced pelvic fracture; when multiple structures overlap significantly; when the exam is incomplete (e.g., not all appropriate views were obtained); or when cognitive errors arise from preconceived notions of normal anatomy or the reinforcement of viewing a run of apparently normal images.

These factors all lead to missed fractures in emergency radiology.¹ ²

Fractures of certain anatomical structures have a higher chance of being missed than other fracture types. Occult scaphoid fractures are a common clinical example, which may be missed on initial X-ray and later appear as bone edema on MRI.

Other common examples include radial head fractures visible only as a posterior fat-pad sign with no direct fracture line on plain film, Lisfranc joint injuries missed because adequate weight-bearing radiographs are absent or diastasis is not visible, and pelvic ring injuries in elderly patients with osteopenia or when bowel gas obscures the fracture.

All these examples illustrate the challenges in detecting small, indirect fractures during an emergency, especially when there is incomplete clinical history and a high workload.

A Dutch study of 25,957 fractures in the emergency department revealed that 289 fractures were missed initially and that the rate of missed fractures varied by age and location.

Delayed diagnosis also affected fracture management, showing that patterns of missed fractures are not random and that certain body regions carry a greater risk of being overlooked.³

Although trauma case review is primarily concerned with identifying fractures, the reviewer must also decide if additional imaging is necessary and whether CT or MRI is warranted, even when the radiograph is negative.

This decision will ultimately be determined by clinical judgment, which is based on the context in which the radiograph is interpreted and not by simply labeling the image. In trauma imaging, radiographic interpretation must therefore be connected to the clinical picture.¹ ⁴

Where variability enters first-pass review

Most emergency departments have the first review, or first pass, of trauma radiographs completed by a general radiologist, an emergency physician, or a junior doctor, with a subspecialist providing confirmation, often hours later.

In many instances, patients have been managed before subspecialist confirmation of the initial read, which greatly narrows the window for timely intervention if a subtle fracture is missed initially.¹ ⁴

A number of factors increase variation among first-pass reviews, including but not limited to reader experience, the timing of the shift, and the volume of work in the radiology department.

When a reader has a routine caseload, they may be less thorough with atypical injuries, and the quality of the triage note will also affect how injury searches are performed by that reviewer.

Without documentation supporting the clinical history of an injury, the reviewer has difficulty localizing a potential area of injury, which increases the likelihood of missing an injury outside the initial area of focus.

These factors have been identified as contributing causes to missed fractures in both community and academic emergency departments. They also contribute to variation in the emergency imaging workflow.¹ ²

Compared with larger tertiary centers that have continuous musculoskeletal coverage, smaller emergency departments have a higher degree of variability.

In smaller or more general emergency departments, the initial radiologic interpretation will often be the only interpretation performed prior to the provider making the initial clinical decision.

An assistive tool does not need to outperform a musculoskeletal subspecialist to be of benefit in these settings. It only needs to improve the initial interpretation enough to catch an overlooked injury or prompt a second look at studies that were cleared too quickly.¹ ⁴ ⁶

Evidence supporting the use of AI for fracture detection in trauma radiographs

Multiple systems and studies support the use of artificial intelligence for reviewing trauma radiographs. In a 2020 study, Jones et al. reported strong performance for a deep-learning system detecting fractures on musculoskeletal radiographs.

With missed fractures being relatively frequent, there should be continued efforts to improve detection.⁶

There continues to be a growing body of literature on this subject, including systematic reviews and meta-analyses. In 2022, a meta-analysis in Radiology compared the diagnostic performance of AI systems to that of clinicians in numerous studies on trauma radiograph review.

This analysis found that AI diagnostic performance was comparable to clinicians in the majority of the studies. However, bias was prevalent among a majority of the studies, indicating that additional external validation was needed.

Although the technical promise of AI-assisted X-ray review is apparent, it should not be treated as validated for clinical use until it has been demonstrated in real-world settings.⁷

The relevant issue is how AI assistance affects clinician performance during active review of the radiograph, not how algorithms perform in isolation.

A systematic review and meta-analysis of the literature published in 2026 showed that AI assistance improved pooled fracture detection sensitivity from 77% to 87%, pooled specificity from 88% to 92%, and summary AUC from 0.90 to 0.95.

Despite the high degree of heterogeneity and high risk of bias across many of the included studies, the overall direction of effect showed that clinicians assisted by AI were more accurate than clinicians without AI assistance.⁸
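These pooled figures come from per-study confusion matrices. As an illustrative sketch (with hypothetical counts, not data from the cited meta-analysis), sensitivity and specificity reduce to simple ratios:

```python
# Illustrative only: hypothetical counts, not data from any cited study.

def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true fractures the reader detects: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """Fraction of fracture-free studies correctly cleared: TN / (TN + FP)."""
    return tn / (tn + fp)

# Hypothetical example: 100 fractured and 100 normal radiographs.
# An unassisted reader misses 23 fractures; an AI-assisted reader misses 13.
print(sensitivity(tp=77, fn=23))  # 0.77 -> unassisted
print(sensitivity(tp=87, fn=13))  # 0.87 -> AI-assisted
print(specificity(tn=88, fp=12))  # 0.88
```

A gain from 77% to 87% sensitivity on this hypothetical caseload means 10 additional fractures caught per 100 fractured patients, which is the clinically meaningful way to read such pooled differences.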

Not all external evidence supports using AI to assist in evaluating musculoskeletal radiographs in order to detect fractures.

In a 2025 publication, it was reported that an AI model demonstrated lower sensitivity and specificity when evaluating pelvic, hip, and extremity fractures in a clinical population, with CT as the reference standard, when compared to the radiologist.

Performing well on the development dataset does not guarantee comparable performance across all scanner fleets, patient populations, or fracture types.

This is one reason multi-organization recommendations call for local validation of AI, rather than reliance on published benchmarks, when deploying it across different health care delivery systems.⁴ ⁹

Why reader-assistance studies matter

Reader-assistance studies provide a more robust assessment of AI tools compared to performance benchmarks alone. AUC or sensitivity alone, as calculated in a held-out test set, demonstrates the model's discriminative capacity.

However, it does not demonstrate how the model changes a clinician's behavior during interpretation. The practical question is whether AI output would alter a clinician's read: would they stop to re-evaluate a subtle finding? Would they flag the case for follow-up?

This question cannot be answered by algorithm benchmarking alone; reader-assistance studies are needed to assess how clinician behavior changes when using the tool, which is why they are central to evaluating X-ray interpretation AI.⁴ ⁸ ¹⁰
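For context on what a standalone AUC measures: it is the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative one. A minimal rank-based sketch with made-up scores (not data from any cited study):

```python
# Illustrative sketch: rank-based (Mann-Whitney) AUC from hypothetical scores.
from itertools import product

def auc(pos_scores, neg_scores):
    """Probability a random fractured case scores above a random normal case;
    ties count as half a win."""
    pairs = list(product(pos_scores, neg_scores))
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs)
    return wins / len(pairs)

# Hypothetical model scores for 4 fractured and 4 normal radiographs.
print(auc([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1]))  # 0.9375
```

This is why AUC summarizes discriminative capacity across all thresholds but says nothing about how a clinician acts on any single flagged case.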

Reader-assistance studies provide data in addition to the algorithm's performance measures to identify differences among reader types impacted by AI.

A musculoskeletal radiologist who reads extremity radiographs daily typically operates at a level of pattern recognition that leaves little room for AI to further improve their interpretation of these studies.

In contrast, an emergency physician or general radiologist who is reading a trauma series outside their main subspecialty may have less pattern-recognition experience and could potentially gain more benefit from an AI tool through highlighting a cortical irregularity or raising suspicion regarding an area of interest that they may not have prioritized otherwise.

Reader-assistance results showing greater improvements in general radiologists and non-MSK radiologists than in subspecialists are therefore more relevant to frontline trauma image interpretation than standalone model performance metrics.

As you consider an AI for X-ray interpretation, rather than asking “what is the performance of the model alone?” you should be asking “how does the clinician perform with the model?”¹⁰

AZmed evidence base

AZmed’s largest validation study for the Rayvolve AI Suite evaluated 258,373 radiographs taken at 100 institutions from 26 countries across 5 continents between January 2022 and April 2025.

For AZtrauma, AUC performance was reported as 98.3%, sensitivity as 97.4%, and specificity as 96.4%.

The size and geographic diversity of this dataset provide robust evidence for technical generalizability across different equipment, patient groups, and imaging protocols.⁵

AZmed also conducted an external prospective registry study at the Technical University of Munich, which sets a higher bar: a direct, head-to-head comparison of 3 commercial emergency imaging AI systems on real-world trauma review data from 1,037 adult patients, 2,926 radiographs, and 22 separate anatomical regions of interest.

In this study, Rayvolve had the highest AUC of 84.88% and achieved the highest sensitivity for all fractures of 79.48%. One other system achieved somewhat higher specificity than Rayvolve.

Performance also varied across fractures, dislocations, and joint effusions. This variability is expected, reflecting differences in image characteristics, fracture morphology, and baseline miss rates across injury categories.¹¹

AZmed completed the most clinically relevant study addressing the question posed here. In a 2024 Academic Radiology reader study evaluating a deep learning fracture detection tool using 2,626 de-identified radiographs of extremities, 24 readers, including 8 emergency physicians, 8 non-musculoskeletal radiologists, and 8 musculoskeletal radiologists, read the radiographs with and without AI assistance.

AI assistance resulted in improvements in overall sensitivity from 86.5% to 95.5% and average reader accuracy, while average reading time decreased by 27%.

AI assistance was found to have more of an impact upon diagnosis by emergency physicians and non-musculoskeletal radiologists, compared with musculoskeletal radiologists.¹⁰

One of the main findings from this study was the variation between the different reader groups. AI assistance provided its largest effect among the reader groups most likely to perform first-pass trauma assessments in mixed-expertise settings.

Additionally, the increase in accuracy was not uniform across readers; the gains from AI-assisted reading came primarily from the readers with the highest chance of an incorrect diagnosis before AI intervention.

In 2025, AZmed published an additional reader study in pediatric populations that demonstrated the same pattern of improvement in accuracy. Reader assessments on a total of 3,016 pediatric musculoskeletal radiographs collected from 4 U.S.-based imaging centers were conducted with and without the use of AI assistance.

Reader accuracy was increased across all groups of readers, and reader completion time was decreased by 26.1% compared to baseline measurements.

Because skeletal development differs by age, fracture identification poses different challenges in pediatric versus adult imaging; the consistent improvement supports the idea that AI-assisted evaluation can generalize across clinical populations despite age-related anatomical variation.¹²

Limitations of AI

As indicated above, there are instances when fractures will remain undetectable after multiple careful reads of the initial radiograph.

For example, patients with scaphoid fractures, early stress fractures, non-displaced radial head fractures, or occult hip fractures may show no signs of their injury on the initial radiograph.

If a practitioner has a high suspicion of a fracture and the radiograph appears negative, the appropriate action is to obtain CT, MRI, or repeat imaging, not to provide reassurance based on the output of an algorithm. This is a key limit of AI support for trauma X-ray review.¹ ⁴ ⁹

Other limitations to the detection of injuries by both radiologic professionals and AI systems include failure to provide lateral projections in addition to an adequate anteroposterior projection, incomplete shoulder series, and inadequate image quality due to motion of the patient during the study.

Accurate diagnostic information from plain radiographs therefore depends, for both radiologic professionals and AI systems, on adequate image acquisition.

Adequate imaging depends on 3 factors: 1) the quality of the X-ray study made by the radiologic technologist, 2) the clinical condition of the patient, and 3) the quality of the radiological equipment used to perform the examination.

Increased sensitivity may create increased false-positive rates, additional reviews, and changes in reading patterns.

All of these considerations should be weighed when deciding whether AI support for X-ray review can be deployed safely and effectively.

The clinical impact will vary by setting: a higher false-positive rate increases the resources required within the emergency department’s X-ray review process.⁴ ⁸ ¹¹
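The false-positive burden depends heavily on local case mix. A minimal sketch, with assumed volumes and rates (not figures from any cited study), of how a small specificity drop translates into extra flagged studies:

```python
# Hypothetical sketch: how specificity and prevalence drive false-positive workload.
# All numbers below are assumptions for illustration, not measured values.

def false_positives(n_studies: int, prevalence: float, specificity: float) -> float:
    """Expected number of fracture-free studies incorrectly flagged."""
    negatives = n_studies * (1.0 - prevalence)
    return negatives * (1.0 - specificity)

# Assume 200 trauma radiographs per shift, 20% showing a true fracture.
print(round(false_positives(200, 0.20, 0.92), 1))  # 12.8 flagged negatives
print(round(false_positives(200, 0.20, 0.88), 1))  # 19.2 flagged negatives
```

Under these assumptions, a 4-point specificity drop adds roughly 6 extra negative studies per shift needing second review, which is why the resource impact is setting-dependent.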

Clinical guidelines for the use of AI technology contain key requirements: a clinically valid test based on a well-defined intended-use definition, comprehensive and qualified local validation, regular human supervision, and ongoing post-deployment monitoring.

A significant evidence base exists to support the validity of AI technology for clinical use based on large multi-institutional retrospective study databases.

Despite this, large retrospective databases alone are not sufficient to provide conclusive support for the technical validity of AI technologies, and therefore, further studies are necessary, including prospective comparison studies and controlled reader studies.

Ultimately, no AI technology can fully predict or account for the influence of local variables on the clinical outcome of injuries detected on X-ray.⁴

Conclusion

AI technology provides significant value for improving trauma X-ray evaluation in small or mixed-expertise emergency departments when used as a support tool alongside clinical judgment, rather than as a replacement for it. This is especially relevant in trauma settings where immediate subspecialty support is limited.

There is sufficient evidence to support this role, including studies of standalone AI performance, systematic reviews of AI-assisted fracture detection, and controlled reader studies among both adult and pediatric populations.

This evidence supports the conclusion that AI technology will reduce missed fractures most effectively in the clinical settings where first-pass review is most exposed to missed findings: non-subspecialist readers, time-pressured workflows, and environments without immediate subspecialty coverage.

The results of the reader study from AZmed, the Munich prospective comparison, and the overall body of meta-analytic evidence provide strong clinical evidence that the clinical value of AI is most meaningful when it improves detection among the readers most likely to miss an injury.

The strength of AI technology in enhancing clinical evaluation of injuries lies primarily in the detection of minor injuries that may not be easily identified on traditional X-rays, such as a cortical step-off, a posterior fat-pad sign, or a scaphoid waist hairline fracture.⁵ ⁸ ¹⁰ ¹¹

References

[1] Pinto A, Berritto D, Russo A, et al. Traumatic fractures in adults: missed diagnosis on plain radiographs in the Emergency Department. Acta Biomed. 2018. https://pmc.ncbi.nlm.nih.gov/articles/PMC6179080/

[2] Wei CJ, Tsai WC, Tiu CM, et al. Systematic analysis of missed extremity fractures in emergency radiology. Acta Radiol. 2006. https://journals.sagepub.com/doi/10.1080/02841850600806340

[3] Mattijssen-Horstink L, Langeraar JJ, Mauritz GJ, et al. Radiologic discrepancies in diagnosis of fractures in a Dutch teaching emergency department: a retrospective analysis. Scand J Trauma Resusc Emerg Med. 2020. https://sjtrem.biomedcentral.com/articles/10.1186/s13049-020-00727-8

[4] Brady AP, Allen B, Chong J, et al. Developing, purchasing, implementing and monitoring AI tools in radiology: practical considerations. A multi-society statement from the ACR, CAR, ESR, RANZCR & RSNA. 2024. https://insightsimaging.springeropen.com/articles/10.1186/s13244-023-01541-3

[5] Cohen E, Ouertani MS, Beaumel P, et al. Performance of a complete AI radiographic suite across 258,373 X-rays from 26 countries: A worldwide evaluation. Radiography. 2026. https://www.sciencedirect.com/science/article/abs/pii/S1078817426000374

[6] Jones RM, Sharma A, Hotchkiss R, et al. Assessment of a deep-learning system for fracture detection in musculoskeletal radiographs. npj Digital Medicine. 2020. https://www.nature.com/articles/s41746-020-00352-w

[7] Kuo RYL, Harrison C, Curran TA, et al. Artificial Intelligence in Fracture Detection: A Systematic Review and Meta-Analysis. Radiology. 2022. https://pmc.ncbi.nlm.nih.gov/articles/PMC9270679/

[8] Qin H, Ding Y, Ju J, et al. Enhanced fracture detection on radiographs with AI assistance for clinicians: a systematic review and meta-analysis. Annals of Medicine. 2026. https://pmc.ncbi.nlm.nih.gov/articles/PMC12795274/

[9] Delabrousse É, Marty M, Gervaise A, et al. Comparison between artificial intelligence solution and radiologist for the detection of pelvic, hip and extremity fractures on radiographs in adult using CT as standard of reference. 2025. https://www.sciencedirect.com/science/article/pii/S2211568424001979

[10] Fu T, Viswanathan V, Attia A, et al. Assessing the Potential of a Deep Learning Tool to Improve Fracture Detection by Radiologists and Emergency Physicians on Extremity Radiographs. Academic Radiology. 2024. https://www.sciencedirect.com/science/article/pii/S1076633223005950

[11] Luiken I, Lemke T, Komenda A, et al. Evaluation of commercial AI algorithms for the detection of fractures, effusions, and dislocations on real-world clinical data: A prospective registry study. Radiography. 2025. https://pubmed.ncbi.nlm.nih.gov/41066829/

[12] Raj S, Sadegi B, Simon J, et al. Enhancing Pediatric Fracture Detection: Multicenter Evaluation of a Deep Learning AI Model and Its Impact on Radiologist Performance. Academic Radiology. 2025. https://www.sciencedirect.com/science/article/pii/S1076633225010748

Regulatory information

US - Medical Device Class II according to 510(k) clearance. Rayvolve is a computer-assisted detection and diagnosis (CAD) software device to assist radiologists and emergency physicians in detecting fractures during the review of radiographs of the musculoskeletal system. Rayvolve is indicated for adult and pediatric populations (≥ 2 years).

Rayvolve PTX/PE is a radiological computer-assisted triage and notification software that analyzes chest X-ray images of patients 18 years of age or older for the presence of pre-specified suspected critical findings (pleural effusion and/or pneumothorax). Rayvolve LN is a computer-aided detection software device to assist radiologists in identifying and marking regions related to suspected pulmonary nodules from 6 to 30 mm in size in patients 18 years of age or older.

EU - Rayvolve: Medical Device Class IIa in Europe (CE 2797) in compliance with the Medical Device Regulation (2017/745). Rayvolve is a computer-aided diagnosis tool, intended to help radiologists and emergency physicians to detect and localize abnormalities on standard X-rays.

Caution: The data mentioned are sourced from internal documents, internal studies, and literature reviews. This material is for distribution to Health Care Professionals only and should not be relied upon by any other persons. Carefully read the instructions for use before use. Please refer to our Privacy Policy on our website. For more information, please contact contact@azmed.co.

AZmed 10 rue d’Uzès, 75002 Paris - www.azmed.co - RCS Laval B 841 673 601

© 2026 AZmed – All rights reserved. MM-26-21

