AI Scribes in the ED: Do They Actually Work?
The research says AI scribes save time and reduce burnout. But the tools available today weren't built for emergency medicine — and the gap between promising data and practical reality is wider than the marketing would suggest.
If you're an ER doc in 2026, you've heard the pitch by now. Some company wants to sell you an AI scribe that will "transform your documentation workflow" and "let you focus on what matters." The marketing copy writes itself because the problem is real and everyone knows it. The documentation climate keeps changing; every week there's something new leadership wants you documenting. The 2023 E/M documentation and billing guideline updates were supposed to reduce the amount of irrelevant material you had to chart. Whether you think they did is up to you, and you're right either way. And now everyone and their mother-in-law has a scribe they want to sell you. I do, too; I'm not going to lie to you.
But like me, you've been burned by health tech promises before. The EHR was supposed to save time (cue the frustrated, sarcastic laughs from everyone who worked in the pre-EHR era). Voice dictation helped, but it arguably contributed to the ever-growing list of BS you need to document to keep your job, and the thing still misspells in ways that occasionally make your notes look like they were written by a second grader (do you really proofread all your notes…every time?).
So when someone tells you an AI is going to write your notes for you, you're right to be skeptical. I was as well, and continue to be. In fact, building an AI scribe has made me even more critical of AI scribes in the ED. Is this thing really saving me time, or am I going from dictating notes that occasionally have dictation errors to proofreading long, drawn-out notes filled with hallucinations and material irrelevant to the chief complaint and the emergency medical conditions I'm evaluating for? What is being done with the data I'm feeding into this machine, and do I actually trust it not to be mining my data? Then there's the big challenge that, in my opinion, no current scribe manages to pull off in the ED: does it write MDMs that reflect the care I provided and help the future providers who read them? Does it optimize billing? And does it defend me from a med-mal standpoint if something goes sideways?
In other words…do AI scribes in the ED actually work?
I've spent a lot of time with this question — as a user, as a builder, and as someone who's followed the research over the past year or so as it's blown up. Here's my honest answer:
The Data Says: Yes…Kind of…With Caveats
Let's start with what the evidence actually shows, because there's real data now — not just whitepapers from vendors.
The biggest signal comes from The Permanente Medical Group, which rolled out ambient AI scribes across their system and tracked over 2.5 million patient encounters over one year. The results: an estimated 15,791 hours of physician time saved, 84% of docs reporting improved patient communication, and 82% reporting improved work satisfaction. Patients noticed too — 47% said their doctor spent less time looking at the computer [1]. That's not a pilot with 12 enthusiastic early adopters. That's a system-wide deployment with real numbers.
In the ED specifically, Preiksaitis et al. published the first major study of ambient AI scribes in emergency medicine. Among nearly 9,000 eligible encounters, AI-assisted documentation was associated with a 28% reduction in on-shift documentation time and 16% reduction in total EHR time [2]. That's meaningful.
A separate head-to-head comparison from Mayo Clinic's ED pitted AI scribes directly against human scribes across 710 encounters. Note quality scores were comparable for adult patients. The AI didn't embarrass itself next to a trained human — and that's honestly impressive given how young this technology is [3].
And the UCLA randomized trial published in NEJM AI tested two commercial products — Nabla and Microsoft DAX — across 238 physicians and over 72,000 encounters. Nabla reduced documentation time by roughly 10% compared to controls, with modest improvements in burnout and cognitive workload scores [4].
So yes. The technology works. It saves time (or at least did in those trials).
But…
Does the tech work? Yes. Does it show promise? You bet. Are the companies marketing these scribes being fully transparent about everything? Not even close.
Hallucinations are real. AI scribes fabricate information. Not often, but they do. A dictaphone misspells things; it doesn't flat-out invent them. The reason comes down to a fundamental difference in what these tools are actually doing. A dictaphone is transcribing: it listens to what you said and tries to write it down. It can mishear you, but it can't put words in your mouth. An AI scribe is generating: it takes a transcript and produces new text, interpreting, reorganizing, summarizing, and writing prose that goes beyond what was literally said. That generative step is where fabrication enters the picture.
Think of it this way: your "low risk chest pain MDM" dotphrase is completely deterministic. When you pull it into the note, you get the same thing every single time. An AI scribe doesn't work like that. It produces original text based on patterns it learned from its training data, which means it can handle a remarkable range of clinical contexts, but it can also fill in gaps with what it "expects" to be there rather than what actually happened. LLMs also have an adjustable "temperature" that controls how much creative latitude the model takes: set it too low and you get repetitive, rigid output; set it too high and you've got a model that says more nonsense than your last conscious sedation patient. But even a conservatively configured model can hallucinate, because the underlying issue isn't randomness. It's that the model is generating text, not retrieving it.
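To make the dotphrase-versus-scribe distinction concrete, here's a minimal sketch in Python. It assumes an OpenAI-style chat API; the model name, prompt, and function names are placeholders for illustration, not how any commercial scribe actually works.

```python
# A dotphrase is a deterministic template: same inputs, same output, every time.
LOW_RISK_CP_MDM = (
    "Chest pain, low risk. HEART score {heart}. Troponin negative x{trops}. "
    "No ECG ischemia. Discussed return precautions; outpatient follow-up."
)

def dotphrase_mdm(heart: int, trops: int) -> str:
    # Pure string substitution -- nothing is invented.
    return LOW_RISK_CP_MDM.format(heart=heart, trops=trops)

# An AI scribe is generative: it writes new text conditioned on the transcript.
from openai import OpenAI

client = OpenAI()

def generated_mdm(transcript: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": "Draft an ED MDM from this transcript. "
             "If a detail is not in the transcript, omit it rather than guess."},
            {"role": "user", "content": transcript},
        ],
        temperature=0.2,  # low temperature narrows variance; it does not
                          # stop the model from generating unsupported text
    )
    return response.choices[0].message.content
```

Note what the low temperature in the sketch does and doesn't buy you: it makes the output less erratic, but the model is still composing prose rather than copying it, which is exactly where fabrication can slip in.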
Published hallucination rates hover around 1–3% per data point [5], which sounds low until you do the math: each note contains many discrete data points, so even at a 1% per-data-point rate, one or two notes per busy ER shift might contain something the patient never said and you never did. In a comparative analysis of commercial scribe tools, error rates ranged from 12.2% to 24.4% per encounter when omissions and additions were counted alongside outright fabrications [6]. Documented examples include scribes fabricating diagnoses the patient doesn't have, inventing medication instructions that were never discussed, and misattributing statements between the patient and the physician. One study documented the AI replacing a procedure the physician actually mentioned with a more common one from its training data: the AI substituted what it "expected" to hear for what was actually said.
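Here's the back-of-the-envelope version of that math. The per-shift patient load and the count of consequential data points per note are my assumptions, not figures from the studies; only the 1–3% rates come from the literature [5].

```python
# Back-of-the-envelope math on hallucination exposure per shift.
# notes_per_shift and data_points_per_note are assumed, not from [5].
notes_per_shift = 15       # assumed patient load for one busy ED shift
data_points_per_note = 10  # assumed consequential facts per note

for p in (0.01, 0.03):     # published per-data-point hallucination rates [5]
    # Probability a single note contains at least one fabricated data point,
    # treating data points as independent (a simplification).
    p_note = 1 - (1 - p) ** data_points_per_note
    print(f"rate {p:.0%}: {p_note:.1%} of notes affected, "
          f"~{p_note * notes_per_shift:.1f} flawed notes per shift")

# rate 1%: 9.6% of notes affected, ~1.4 flawed notes per shift
# rate 3%: 26.3% of notes affected, ~3.9 flawed notes per shift
```

At the low end of the published range, that lines up with the "one or two notes per night" estimate; at the high end, it's closer to four.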
Not really a hallucination, but in my opinion just as big of an annoyance, is over-attribution in the subjective history. A patient tells you "I had this same pain last year and my neighbor said it sounded like a hernia" and the AI documents "Recurrent abdominal pain; patient reports history consistent with hernia per collateral." Now you've got a note that implies a hernia was diagnosed by a credible source and that this is a recurring problem, neither of which is true. That's not the patient's fault (they didn't go to med school; they just want answers), but it's the kind of thing that muddies the clinical picture and forces you to go back and clean it up. Take an AI scribe into an encounter with a patient with a pan-positive ROS, and you're better off just using regular old dictation.
In the emergency department, where you're managing 15 or more patients simultaneously and you might not review every note with a fine-tooth comb before signing, that's a real risk. And it's a different risk than not proofreading your dictated notes. It doesn't mean the technology is useless. In my opinion, it means the technology needs to get better, adapt to your style, and document in a way that makes review — and correction — fast and easy.
Adoption in the ED is still low. The Preiksaitis study found that only 11.2% of eligible ED encounters used the ambient AI, and usage was heavily skewed toward lower acuity encounters [2]. Meaning: the docs who tried it were cherry-picking the easy patients. The complex patients, the critically ill, the ones who actually generate the most documentation burden weren't being captured.
Most of the scribes on the market were designed for the outpatient world, where you have one patient at a time, a structured conversation, and a predictable workflow. Many of them have added an "ED Template" or some other bolt-on feature that simply attempts to reformat the standard outpatient SOAP note into an ED one, without considering that the documentation workflow and standards could not be more different from a primary care clinic's. You're toggling between patients, getting interrupted by nurses and consultants, ducking out of a resus to look at a patient's CT angio, managing a different resuscitation in another room while your AI scribe is supposedly "listening" to a conversation that ended twenty minutes ago. The workflow assumptions baked into most ambient AI products don't map to how emergency medicine actually works, as the sketch below illustrates.
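In data-model terms: an outpatient scribe can treat one continuous recording as one encounter, while an ED system has to re-stitch interleaved fragments from multiple patients. This is a hypothetical Python illustration of that mismatch; the structures and names are mine, not any vendor's actual architecture.

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    patient_id: str   # which encounter this fragment belongs to
    start_min: float  # minutes into the shift
    transcript: str

@dataclass
class Encounter:
    patient_id: str
    segments: list[Segment] = field(default_factory=list)

def stitch(shift_audio: list[Segment]) -> dict[str, Encounter]:
    """Group interleaved fragments back into per-patient encounters."""
    encounters: dict[str, Encounter] = {}
    for seg in sorted(shift_audio, key=lambda s: s.start_min):
        enc = encounters.setdefault(seg.patient_id, Encounter(seg.patient_id))
        enc.segments.append(seg)
    return encounters

# Bed 4's history, a resus in bed 1, then back to bed 4 for re-eval:
shift = [
    Segment("bed4", 0, "initial history: chest pain since this morning..."),
    Segment("bed1", 25, "resus: pushing etomidate and rocuronium..."),
    Segment("bed4", 90, "re-eval: troponin negative, pain resolved..."),
]
print(stitch(shift)["bed4"].segments)  # two fragments, 90 minutes apart
```

The outpatient assumption collapses this whole problem to a single segment per encounter, which is precisely why a bolted-on "ED Template" doesn't fix it.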
Note quality varies. The Mayo study found that while AI and human scribe notes scored similarly for adults, AI notes scored lower for pediatric patients [3]. A Frontiers in Artificial Intelligence study found that AI-generated notes were more thorough and better organized than physician-authored notes, but also less succinct and prone to hallucination [7]. So you trade one problem (documentation burden) for a different one (editing burden). For some docs, that's still a net win. But it's not the "just hit record and forget about it" magic that some vendors sell.
The revenue question is complicated. There's a policy brief in npj Digital Medicine flagging that insurers are already responding to AI-assisted documentation by tightening audits and auto-downcoding E/M claims [8]. Cigna started automatically reducing many level 4–5 E/M claims by one level in October 2025 unless documentation clearly supports the higher complexity. If AI scribes drive more thorough documentation that captures legitimate complexity, that should be defensible. But if every note suddenly looks like a level 5, expect pushback from payers. This is an evolving space and anyone who tells you they have the billing angle fully figured out is lying.
Why Most of These Tools Weren't Built for You
Here's the thing that frustrates me most about the current landscape.
Ambient AI scribes have been called the fastest-adopted generative AI solution in healthcare [9]. The big players are all in — Nuance (Microsoft DAX), Abridge, Nabla, Suki, and others. Health systems are deploying these enterprise-wide to thousands of clinicians. Kaiser Permanente has over 7,000 physicians using the technology across 2.5 million encounters in a single year [1].
But the vast majority of this adoption, this research, and this development is happening in the ambulatory space. Primary care. Specialty clinics. Outpatient visits where you sit down with one patient, have a structured conversation, and move on.
Emergency medicine is fundamentally different:
We don't see one patient at a time. We're running a department. An "encounter" with a single patient might be spread across four separate conversations over three hours, interrupted by twelve other things. Most AI scribes assume a single continuous conversation per encounter. That's not how the ED works.
Our documentation needs are different. An outpatient note and an EM note serve different purposes, face different legal scrutiny, and have different billing structures. The MDM is the most important section in all of ED clinical documentation, and it's the hardest for AI to get right because it requires understanding why you made decisions, not just what you did.
And our medicolegal exposure is higher. We all know this already, but it bears repeating: EM docs get sued more than almost any other specialty. Your note isn't just a clinical document; it's your defense. An AI hallucination in a primary care note about a wellness visit is unlikely to matter. An AI hallucination in an EM note about a chest pain rule-out could end your career.
Another reason EM docs don't get shown as much love in this developing space is that we're a small market: EM docs are about 4–5% of all US physicians, versus roughly 24–48% in primary care (depending on how you define primary care). You think Microsoft is going to go through the enormous hassle of building a scribe from the ground up for the difficulties and nuance of a market that's, at best, one-fifth the size of a much easier one? Fat chance.
What an EM-Specific AI Scribe Needs to Get Right
So where does this leave us? AI scribes work — the data is clear on that. But the tools that exist today were largely built for a workflow that isn't yours. The question isn't whether AI can help with EM documentation. It's whether anyone has built the right version of it.
Here's why that question matters more than the skeptics think: physicians who actually stick with AI scribes see real, measurable improvements in their quality of life. One multicenter study across six health systems found that consistent AI scribe users had 74% lower odds of experiencing burnout compared to baseline [10]. A separate study documented significant reductions in cognitive task load and after-hours charting time [11]. These aren't marginal gains. For a specialty where more than half of us are burned out, and where the documentation burden is a primary driver, that's a lifeline.
The problem is that those numbers come from physicians who stuck with the tools long enough to get comfortable — and in the ED, most don't. The 11.2% adoption rate [2] isn't a reflection of the technology's ceiling. It's a reflection of how poorly the current products fit the EM workflow. Docs try them, hit the wall of hallucinations and generic MDMs and bloated notes, and go back to dictation. The positive outcomes data represents the survivors, not the population.
I wrote about this gap in detail — the specific failure modes that make current tools fall short in the ED. But to summarize what an EM-specific scribe needs to get right:
It needs to earn trust through accuracy, not just speed. If the physician's review process feels like a forensic audit rather than a quick scan, the tool has failed regardless of how fast it generates the draft. That means the system needs to know what it doesn't know, omitting uncertain information rather than guessing (a sketch of that idea follows below), and it needs to be engineered for the acoustic reality of a busy department, not a quiet clinic room.
It needs clinical intelligence about what belongs in the note and what doesn't. Patient speculation isn't the same as clinical history. A nurse's hallway update about a different patient isn't documentation for this one. The system has to make those distinctions or the physician ends up editing more than they would have written.
And the MDM — the section that carries the billing weight and the medicolegal exposure — can't be an afterthought. It needs to reflect the physician's actual clinical reasoning: what was considered, what was ruled out and why, what decision tools were applied, what the risk stratification looked like. An MDM that reads like a generic template isn't worth the time it takes to review it.
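Here's one way the "omit rather than guess" requirement could look in code. This is a hedged sketch under my own assumptions: the confidence floor, the field names, and the review-queue idea are illustrative, not a description of any shipping product.

```python
from dataclasses import dataclass

CONFIDENCE_FLOOR = 0.85  # assumed cutoff; tuning this is the hard part

@dataclass
class ExtractedFact:
    field: str         # e.g. "HPI.onset" or "MDM.ddx_ruled_out"
    value: str
    confidence: float  # calibrated confidence in this extraction

def draft_note(facts: list[ExtractedFact]) -> tuple[dict, list[ExtractedFact]]:
    """Split facts into the draft note and a review queue; never guess."""
    note: dict = {}
    needs_review: list[ExtractedFact] = []
    for f in facts:
        if f.confidence >= CONFIDENCE_FLOOR:
            note[f.field] = f.value
        else:
            needs_review.append(f)  # surfaced to the doc, not written as fact
    return note, needs_review
```

The hard part, of course, is calibration: a confidence score only earns trust if the model's 85% actually behaves like 85%.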
None of this is impossible. But it requires building for emergency medicine from the ground up, not retrofitting an outpatient product and calling it done.
The Bottom Line
AI scribes in the ED work. Not perfectly. Not for every patient. Certainly not without physician oversight. But the data supports real time savings, real improvements in satisfaction, and real reductions in the documentation burden that's been crushing our specialty.
The technology will get better. The hallucination rates will drop. The ED-specific workflows will improve.
My advice: be an early adopter, but be a skeptical one. Try the tools. Read your notes before you sign them (insert mandatory "you should be doing this regardless"). Push vendors on accuracy, on hallucination rates, on how their product handles the chaos of the ED. And demand that the tools you use were built by people who actually understand what your job looks like.
Because the documentation burden isn't going to fix itself. I'm sorry to say it but we're not going back to paper charts, or even the spirit of paper charts. And you deserve a solution that was designed for the reality you work in — not a polished outpatient product with "emergency medicine" slapped on the marketing page.
References
1. The Permanente Medical Group. Ambient Artificial Intelligence Scribes: Learnings After 1 Year and Over 2.5 Million Uses. NEJM Catalyst. 2025.
2. Preiksaitis C, Alvarez A, Winkel M, et al. Ambient Artificial Intelligence Scribe Adoption and Documentation Time in the Emergency Department. Annals of Emergency Medicine. 2026.
3. Morey J, Jones D, Walker L, et al. Ambient Artificial Intelligence Versus Human Scribes in the Emergency Department. Annals of Emergency Medicine. 2025.
4. Tierney AA, Garimella G, et al. Ambient AI Scribes in Clinical Practice: A Randomized Trial. NEJM AI. 2025.
5. Niu B, et al. A Framework to Assess Clinical Safety and Hallucination Rates of LLMs for Medical Text Summarisation. npj Digital Medicine. 2025.
6. Arko IV L, Hudelson C, Kumar J, et al. Documenting Care With AI: A Comparative Analysis of Commercial Scribe Tools. Studies in Health Technology and Informatics. 2025.
7. Nguyen BL, et al. Assessing the Quality of AI-Generated Clinical Notes: Validated Evaluation of a Large Language Model Ambient Scribe. Frontiers in Artificial Intelligence. 2025.
8. Li E, et al. Policy Brief: Ambient AI Scribes and the Coding Arms Race. npj Digital Medicine. 2025.
9. Peterson Health Technology Institute. Adoption of AI in Healthcare Delivery Systems: Early Applications & Impacts. 2025.
10. Olson KD, Meeker D, Troup M, et al. Use of Ambient AI Scribes to Reduce Administrative Burden and Professional Burnout. JAMA Network Open. 2025.
11. Schneider KR, Swann-Thomsen HE, Ribbens TG, et al. The Impact of Artificial Intelligence Scribes on Physician and Advanced Practice Provider Cognitive Load and Well-Being. Journal of the American Medical Informatics Association. 2026.
Sampson is an AI scribe built specifically for emergency medicine — by an ER doc, for ER docs. Not a general-purpose tool. Not an outpatient product with an EM template bolted on. The real thing. Try it.