A med spa owner called us a few weeks ago, half excited and half panicked. A vendor had just demoed an AI phone system that, in the demo, sounded incredible. Smooth voice, instant answers, booked a fake appointment without a hitch. She was about to sign. Then she asked us one question that saved her a year of regret: how do I know it will actually work on my real patients, not the script in the demo?
That is the right question, and almost nobody asks it. Which is strange, because this same week some of the largest health systems in the country are asking exactly that, loudly and with budgets behind it.
Even the giants refuse to trust AI on faith
On the clinical side, the news this June makes the point better than we could. Modern Healthcare reported that Mass General Brigham, one of the most respected academic health systems in the United States, unveiled a new benchmark built to measure how healthcare AI models actually perform. In plain English, they built a test to grade AI, because taking a vendor's word for it was not good enough.
At the same time, Fierce Healthcare reported that Yale New Haven Health is rolling out new AI radiology tools across its network after Microsoft announced it is sunsetting an older product many hospitals relied on. Big systems are not just adopting AI. They are constantly testing it, switching it, and grading it.
Sit with that for a second. Organizations with research departments, compliance teams and money to burn still assume AI has to prove itself before they rely on it. They build benchmarks precisely because AI has a famous flaw: it can be confidently, fluently wrong. And in healthcare, a confident wrong answer is not a cute bug. It is a real problem.
Now look at how AI gets sold to a small practice. A polished demo, a friendly rep, a monthly fee, and a quiet assumption that you will just trust it. No benchmark. No proof. No test on your real patients. The asymmetry is almost funny. The people with the most to lose test the most, and the people being sold the most test the least.
The AI in the headlines is not the AI being pitched to you
First, clear up a confusion the sales pitches love to blur. The AI in hospital headlines reads scans, flags tumors and analyzes records under heavy clinical oversight. The AI being pitched to your front desk answers the phone, books appointments and chats on your website. Same family of technology, completely different jobs.
This matters because vendors borrow the shine of the medical headlines to sell you a glorified chatbot. The fact that AI can read a mammogram tells you nothing about whether this particular phone bot can understand a nervous patient mumbling a hard-to-spell last name at 8 p.m. You have to judge the tool on the job it will actually do in your office, which for most practices is one thing: capturing the patient and getting them booked without a human having to chase them.
We have written before about where AI genuinely helps a smaller office in "AI for small medical practices," and about the trust side of it in "How patients feel about doctors using AI." The short version: the wins are real, but only if the tool is actually good.
How to grade a healthcare AI tool in five minutes
You do not need a research team or a benchmark like Mass General Brigham's. You need a scaled-down version of the same instinct: do not trust it, test it. Here is the five-minute version you can run on any vendor before you sign anything.
1. Test it on your messiest real situations, not the demo script
The demo is rigged. Of course it works when the vendor asks it perfect questions. So break it on purpose. Throw your real Tuesday at it: a patient asking about a service you do not offer, someone trying to reschedule and cancel in the same breath, a caller who switches to Spanish halfway through, a name like Nguyen or Szczepanski. Watch what happens. Good AI stays calm, asks a clarifying question and gets it right. Weak AI guesses, books the wrong thing, or makes something up.
2. Check whether it knows when to get out of the way
The best healthcare AI is not the one that tries to answer everything. It is the one that knows its limits. When a caller has a clinical question, an emergency, or a situation it cannot handle, does it hand off cleanly to a human or to a clear next step, or does it bluff? An AI that bluffs about anything medical is a liability. An AI that says, in effect, "let me get you to the right person" is one you can trust at the front of your practice.
3. Ask for proof from practices like yours
Vendors love to talk about technology. Ask them about results instead. How many calls does it answer for a real dental office or med spa? How many appointments does it actually book a month? What happens to the calls it cannot handle? If they can only show you the demo and not a single real practice outcome, you are the test case, and you are paying for the privilege.
4. Confirm it was built for healthcare and protects patient data
A generic chatbot with a stethoscope icon slapped on is not a healthcare tool. The patient information a phone or chat tool collects is generally protected health information, and a tool that was not designed with that in mind can quietly create a HIPAA problem. We dug into this in "HIPAA compliant AI for healthcare marketing." Ask directly: is this built for healthcare, how do you handle patient data, and will you sign a business associate agreement? A good vendor answers without flinching.
5. Make sure you can actually test it yourself
This is the big one. If a vendor will not let you pick up the phone and try the AI yourself, right now, with your own weird questions, ask why. The whole point of the Mass General Brigham benchmark is that real testing beats a sales pitch. The same rule scales down perfectly. If you cannot test it in a few minutes, that hesitation is your answer.
The one test that matters most
Forget the clever voice and the long feature list. The only question that pays your rent is this: does it reliably capture the patient and get them booked, and does it know when to hand off to a human? An AI that books the appointment and never invents an answer is worth more than a brilliant one that loses a new patient at 8 p.m. because it could not understand them. Judge it on booked patients and clean handoffs, nothing else.
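If you or someone on your team likes to keep score, the tally below is one way to turn a handful of test calls into the two numbers that matter. It is a rough sketch, not a benchmark: the field names, the sample scenarios and the `grade` function are all made up for illustration, and you would replace the sample calls with notes from your own test session.

```python
from dataclasses import dataclass


@dataclass
class TestCall:
    """One test call you placed yourself (all fields are hypothetical)."""
    scenario: str
    needed_human: bool       # should this call have been handed to a person?
    booked_correctly: bool   # did the AI book the right appointment?
    handed_off: bool         # did the AI hand off cleanly?
    made_something_up: bool  # did the AI invent an answer?


def grade(calls):
    """Tally booking rate, handoff rate and invented answers."""
    bookable = [c for c in calls if not c.needed_human]
    needs_handoff = [c for c in calls if c.needed_human]
    booked = sum(c.booked_correctly for c in bookable)
    clean = sum(c.handed_off for c in needs_handoff)
    return {
        # None means you did not test that kind of call at all
        "booking_rate": booked / len(bookable) if bookable else None,
        "handoff_rate": clean / len(needs_handoff) if needs_handoff else None,
        "invented_answers": sum(c.made_something_up for c in calls),
    }


# Sample notes from an imaginary test session
calls = [
    TestCall("Reschedule and cancel in one breath", False, True, False, False),
    TestCall("Hard-to-spell last name", False, False, False, False),
    TestCall("Clinical question", True, False, True, False),
    TestCall("Possible emergency", True, False, False, True),
]

result = grade(calls)
for name, value in result.items():
    print(f"{name}: {value}")
```

Any invented answer at all is a red flag, however good the other two numbers look.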
Our honest opinion: most AI pitched to practices is hype with a healthcare logo
Here is where we plant a flag. A lot of what gets sold to practices as healthcare AI is a generic chatbot wearing a costume. It demos beautifully and falls apart on a real Tuesday afternoon. We have watched practices sign up, hand their phones to a bot that frustrates callers, and lose new patients for months without ever realizing the AI was the leak. That is worse than no AI at all, because a caller your staff misses can at least leave a voicemail. A new patient who hangs up on a clumsy bot is just gone.
We are not anti-AI. We build with it every day. We are against using your patients as the test group for a tool nobody bothered to grade. The reason this matters so much for a front desk is speed: many patients who reach a voicemail or a confusing system simply call the next practice on the list and never come back. We covered that hard truth in "How fast you should respond to a new patient." Bad AI does not just fail to help. It actively loses you the people your marketing worked so hard to bring in.
How EtherealMinds approaches AI for practices
We treat AI the way the big systems do: it has to earn its place. Our AI receptionist, Emma, has one job and does it well. She answers every call and message instantly, day or night, books the appointment into your calendar, follows up with leads who slipped away, and hands off to a human the moment a situation calls for one. She is built for healthcare, not borrowed from a generic call center, and she logs where every booking came from, so that data plugs straight into your patient acquisition system.
And we hold ourselves to the rule in this whole article: do not trust it, test it. You can try our AI receptionist live right now. Pick up your phone, ask it your messiest real questions, try to trip it up, switch languages, throw it a hard name. We would rather you stress-test it for ten minutes than take our word for anything. That is the same instinct Mass General Brigham just turned into a benchmark, scaled down to something a busy practice owner can do over a coffee.
The AI that touches your patients sits next to your website, your social media and your ads as the front door of your practice. It should be held to the highest standard in the building, because it is the first voice a new patient hears. So the next time someone pitches you AI, do not ask how smart it is. Ask if you can test it, right now, on your worst day. The good ones say yes immediately. The rest suddenly want to schedule another call.
Test our AI receptionist before you trust it
Book a free strategy call and we will show you exactly where your front desk is leaking patients, then let you stress-test Emma yourself, live, with your own hardest questions. No scripted demo, no jargon, no pressure. If it does not impress you, you will know in five minutes.