Standardized onboarding
New clinicians ramp on the same vetted cases, so every hire starts from a consistent, certifiable baseline instead of whatever walks through the door.
ChatGeneT runs realistic, multi-turn consultations so junior clinicians can practice the hardest part of medicine: asking the right questions. Its simulated patients lead with their worries, open up unevenly, and stay true to their history, and they are available on demand.
Each session is a full multi-turn encounter. The simulated patient leads with what worries it most, volunteers history unevenly, asks its own questions, and sometimes holds back, exactly the way a real person does in the room. It is a safe place to build judgment before a clinician ever sits across from a patient.
Objective, repeatable evaluation across the same scenarios turns informal judgment into a measurable, defensible signal of readiness.
Lifelike practice sharpens history-taking and reasoning, raising the quality and efficiency of real consultations once clinicians are on the floor.
Prompting a model to act sick is not enough. We learn how real patients actually talk, then build that behavior into the simulator and hold it to a measurable bar before it ships.
We distill patient dialogue strategies from real doctor and patient conversations: how patients open, what they volunteer, when they push back, and how they show worry.
Those strategies, paired with structured case records, generate training dialogues. The simulator is fine-tuned entirely on this curated, fully anonymized data.
Tight evaluation holds the simulator to a 0.31% hallucination rate and a 0.87 anthropomorphism score, so dialogue stays faithful to the record and human in feel.
Annotation and review become a repeatable pipeline, so case coverage expands and quality improves with every release.
Most medical AI is judged on the diagnosis. Our work focuses on the step before it: the questions. What we found reshapes how clinicians should be trained and assessed.
Inquiry quality sets the ceiling. A clinician with excellent diagnostic instinct still fails when the questioning is poor, and sharp questioning is wasted on weak reasoning. The weaker of the two decides the outcome.
Accuracy climbs as a clinician asks more questions, but only up to the point where a real patient will stay engaged. Beyond that, patients disengage.
Opening the encounter and surfacing the main concern the patient came in with.
Pinning down the character, timing, and severity of what the patient has already raised.
Probing related signs that widen or narrow the differential before committing.
Drawing out background and risk factors that can shift the diagnosis entirely.
Where a clinician spends their questions across these four types measurably changes the diagnosis they reach. ChatGeneT makes that skill something clinicians can practice and measure.
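As a rough illustration of what measuring that allocation could look like, the sketch below tallies the share of a session's questions that fall into each of the four types. The type labels, the function name, and the sample session are all assumptions made for this example; they are not ChatGeneT's actual labels, API, or data.

```python
from collections import Counter

# Hypothetical labels for the four question types described above.
# The source does not name them; these are placeholders for illustration.
QUESTION_TYPES = (
    "chief_concern",    # opening and surfacing the main concern
    "symptom_detail",   # character, timing, and severity
    "related_signs",    # signs that widen or narrow the differential
    "background_risk",  # background and risk factors
)


def question_allocation(labels):
    """Return the fraction of questions spent on each question type."""
    unknown = set(labels) - set(QUESTION_TYPES)
    if unknown:
        raise ValueError(f"unknown question types: {sorted(unknown)}")
    counts = Counter(labels)
    total = len(labels)
    # An empty session yields 0.0 for every type rather than dividing by zero.
    return {t: (counts[t] / total if total else 0.0) for t in QUESTION_TYPES}


# Illustrative session: one label per question the clinician asked.
session = [
    "chief_concern",
    "symptom_detail",
    "symptom_detail",
    "related_signs",
    "background_risk",
    "symptom_detail",
    "related_signs",
    "background_risk",
]
allocation = question_allocation(session)
for question_type, share in allocation.items():
    print(f"{question_type}: {share:.1%}")
```

Comparing these shares across sessions on the same case is one simple way to see how two clinicians spent their questions differently.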
Share of replies that contradict the patient’s own record. Lower is better, and ours sits far below earlier systems.
How human the patient feels: emotion, initiative, and natural phrasing, scored from 0 to 1.
Real patients sometimes sidestep a question. We preserve that instead of forcing tidy answers, so practice matches the clinic.
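The hallucination rate defined above, the share of replies that contradict the patient's own record, reduces to a simple ratio once each reply has been judged. The minimal sketch below assumes those judgments already exist as one boolean per reply; how a contradiction is detected is not specified in the source, and the function name and sample flags are illustrative only, not reported results.

```python
def hallucination_rate(contradiction_flags):
    """Percentage of replies flagged as contradicting the patient's record.

    `contradiction_flags` holds one boolean per simulated-patient reply:
    True if a reviewer (or checker) judged the reply to contradict the record.
    """
    flags = list(contradiction_flags)
    if not flags:
        raise ValueError("no replies to score")
    return 100.0 * sum(flags) / len(flags)


# Illustrative input only: 3 contradicting replies out of 1,000.
flags = [False] * 997 + [True] * 3
rate = hallucination_rate(flags)
print(f"hallucination rate: {rate:.2f}%")
```

Because the metric is a plain proportion, it can be recomputed on the same scenarios for every release and compared directly.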
High satisfaction from clinicians training on the simulator.
Standardized onboarding and assessment at scale.
Deployed for onboarding and assessment across partner sites.
Real consultation behavior distilled into the training corpus.
“Realistic multi-turn patient dialogue gave our junior clinicians a consistent way to practice and be assessed, training and assisted consultation finally on the same standard.”
We work with hospitals to roll out ChatGeneT for onboarding and competency assessment. Reach out to see the simulator in action.