Speech2FHIR: How Speech AI Generates Authentic, Interoperable Patient Data
nursIT Editorial Team
May 23, 2026 10:48:07 AM
Why voice AI in care must be able to do more than transcribe
AI-supported documentation is one of the great hopes in the healthcare sector. But the crucial question is not: Can AI listen? It is: Can it generate reliable, structured and interoperable data from a conversation?
The idea sounds impressively simple: a nurse talks to a patient. The AI listens, recognizes the relevant information and automatically transfers it to the right place in the care documentation. Less typing. Less rework. Fewer media breaks. More time for care.
But there is a crucial difference between a good idea and a viable solution in everyday clinical practice. After all, medical and nursing documentation does not consist of pretty summaries of conversations. It must be complete, comprehensible, verifiable and structured. Above all, it must be processable where care actually takes place: in the clinical workplace system, in the HIS, in the FHIR-based data room and, in future, in cross-sector care processes.
This is precisely where the DMEA presentation "From Conversations to FHIR Questionnaire Responses" by Dr. Thomas Hartkens and Mubeen Ahmed Soomro came in. In the embedded video, they show not only what voice AI can achieve in documentation, but above all how to make its quality measurable. The basis was a specific use case: an AI was to generate a structured FHIR QuestionnaireResponse from ordinary nurse-patient conversations - not a free-text transcript, but a standardized response structure for a comprehensive FHIR medical history form.
The market is on the move - but hype is not enough
The timing could hardly be more appropriate. AI-supported documentation is no longer a marginal topic. Large technology providers and specialized start-ups are investing heavily in so-called ambient documentation and voice AI solutions. In 2025, Reuters reported a financing round of 250 million US dollars for a company that uses AI to create medical documentation from doctor-patient conversations.
The political direction is also clear. In its digitalization strategy, the Federal Ministry of Health formulates the goal that AI-supported documentation should become standard in healthcare and nursing care; more than 70 percent of facilities should be actively using it by 2028. At the same time, the interoperability of documentation via syntactically and semantically interoperable data formats is explicitly emphasized. At European level, the European Health Data Space sets the framework for the secure exchange and reuse of electronic health data across borders.
The market situation is therefore clear: voice AI is coming. The only open question is in what quality - and with what structural foundation.
After all, merely converting speech into text solves only the first half of the problem in healthcare. Speech-to-text is helpful, but its output often remains unstructured text. Modern healthcare needs more: speech-to-FHIR.
The actual goal: not text, but usable data
FHIR is not a technical detail, but the decisive difference between digital storage and digitally usable care. The FHIR resource QuestionnaireResponse describes structured responses to defined questionnaires and can map complete or partial responses to a questionnaire. It is used for medical histories, assessments, admission forms and other structured surveys.
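To make the structure concrete, here is a minimal sketch of a partial QuestionnaireResponse built as a plain Python dictionary. The field names (resourceType, questionnaire, status, item, linkId, answer, valueBoolean, valueCoding) come from the FHIR specification; the questionnaire URL, the linkIds, the code system and the helper make_item are invented for illustration and are not taken from the nursIT form.

```python
import json


def make_item(link_id, text, answers=None):
    """Build one response item; leave out 'answer' when nothing was said."""
    item = {"linkId": link_id, "text": text}
    if answers:
        item["answer"] = answers
    return item


response = {
    "resourceType": "QuestionnaireResponse",
    # Hypothetical canonical URL of the questionnaire being answered.
    "questionnaire": "http://example.org/fhir/Questionnaire/nursing-anamnesis",
    # "in-progress" marks a partial response, "completed" a finished one.
    "status": "in-progress",
    "item": [
        make_item(
            "mobility.walking-aid",
            "Does the patient use a walking aid?",
            [{"valueBoolean": True}],
        ),
        make_item(
            "nutrition.diet",
            "Special diet",
            [{"valueCoding": {
                "system": "http://example.org/fhir/CodeSystem/diet",
                "code": "low-salt",
                "display": "Low-salt diet",
            }}],
        ),
        # Not mentioned in the conversation: the item carries no answer.
        make_item("sleep.problems", "Sleep problems"),
    ],
}

print(json.dumps(response, indent=2))
```

Each answer is tied to a question through its linkId, which is what allows a receiving system to process the content without interpreting free text.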
This is exactly where the use case from the presentation becomes exciting. nursIT works with consistently FHIR-based care documentation. In the study, an AI model was not just supposed to summarize a conversation, but to correctly fill out a very extensive FHIR questionnaire with more than 160 items. The AI therefore had to recognize relevant information from a natural conversation, assign it to the appropriate fields, use answer options correctly and at the same time avoid inventing information that did not appear in the conversation.
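One way to check that answer options are used correctly is to validate a generated response against the questionnaire definition. The following is a minimal sketch, not the method used in the study. It assumes coded options are given as answerOption entries with valueCoding (as in FHIR R4) and that the response items form a flat list; the function names and the sample data are hypothetical.

```python
def allowed_codes(questionnaire):
    """Map each linkId to its set of allowed codes (None means no fixed options)."""
    allowed = {}

    def walk(items):
        for item in items:
            options = item.get("answerOption")
            if options is not None:
                allowed[item["linkId"]] = {
                    o["valueCoding"]["code"] for o in options if "valueCoding" in o
                }
            else:
                allowed[item["linkId"]] = None
            walk(item.get("item", []))  # questionnaire items may be nested

    walk(questionnaire.get("item", []))
    return allowed


def check_response(questionnaire, response):
    """Return a list of problems found in a flat list of response items."""
    allowed = allowed_codes(questionnaire)
    problems = []
    for item in response.get("item", []):
        link_id = item["linkId"]
        if link_id not in allowed:
            problems.append(f"unknown linkId: {link_id}")
            continue
        codes = allowed[link_id]
        for answer in item.get("answer", []):
            coding = answer.get("valueCoding")
            if codes is not None and (coding is None or coding.get("code") not in codes):
                problems.append(f"answer not among options: {link_id}")
    return problems


questionnaire = {
    "resourceType": "Questionnaire",
    "item": [
        {"linkId": "nutrition.diet", "type": "choice", "answerOption": [
            {"valueCoding": {"code": "low-salt"}},
            {"valueCoding": {"code": "vegetarian"}},
        ]},
        {"linkId": "sleep.problems", "type": "boolean"},
    ],
}
generated = {
    "resourceType": "QuestionnaireResponse",
    "item": [
        {"linkId": "nutrition.diet", "answer": [{"valueCoding": {"code": "keto"}}]},
        {"linkId": "sleep.problems", "answer": [{"valueBoolean": False}]},
        {"linkId": "pets.name", "answer": [{"valueString": "Bello"}]},
    ],
}

problems = check_response(questionnaire, generated)
print(problems)
```

A structural check like this catches invented fields and invalid option codes. It cannot tell whether a structurally valid answer was actually stated in the conversation; that requires comparison against a reference.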
This is the real challenge of clinical AI: it must not only be linguistically convincing. It must be documentable.
The test: a normal conversation, a long FHIR questionnaire
The presentation examined a specific use case: a large language model (LLM) was to generate a structured FHIR QuestionnaireResponse from a normal conversation between a nurse and a patient.
The task was deliberately challenging. The medical history form used comprised more than 160 items. The conversations did not follow the form item by item, but unfolded naturally. The AI therefore had to recognize for itself which information was relevant, which field it belonged to and when a field had to be left blank.
Evaluation was not based on gut feeling. A manual ground truth was created for each conversation and then compared with the AI's output. The comparison covered not only correct and incorrect answers, but also missing entries and hallucinations - i.e. information that did not appear in the conversation. This distinction is particularly important in clinical documentation. Omitted information is problematic. Invented information can be even more so.
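A comparison of this kind can be sketched in a few lines. The example below is an illustrative assumption, not the scoring scheme from the presentation: both the ground truth and the AI output are reduced to dictionaries from linkId to answer value, a field absent from a dictionary counts as left blank, and an answer for a field that the ground truth leaves blank is counted as hallucinated. The linkIds and values are invented.

```python
from collections import Counter


def score(truth, predicted):
    """Classify every linkId that appears in the ground truth or the prediction."""
    counts = Counter()
    for link_id in truth.keys() | predicted.keys():
        if link_id not in truth:
            counts["hallucinated"] += 1  # answered, but nothing was said
        elif link_id not in predicted:
            counts["missing"] += 1       # said, but not documented
        elif predicted[link_id] == truth[link_id]:
            counts["correct"] += 1
        else:
            counts["incorrect"] += 1     # documented with the wrong value
    return counts


truth = {"mobility.walking-aid": True, "nutrition.diet": "low-salt", "pain.score": 2}
predicted = {"mobility.walking-aid": True, "nutrition.diet": "vegetarian", "pets.name": "Bello"}

result = score(truth, predicted)
print(dict(result))
```

Keeping the four categories separate, rather than collapsing them into a single accuracy figure, makes it possible to see whether a model tends to omit information or to invent it.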
What the results show
Dedicated commercial AI solutions, GPT-4 variants and open-source models such as LLaMA and Mistral were compared. The team also tested various prompting strategies and investigated how stably the models respond to identical input.
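Stability under identical input can be quantified, for example, as per-item agreement across repeated runs. The sketch below is one possible measure, not the one used by the team: each run is assumed to be a dictionary from linkId to answer value, and the sample runs are made up.

```python
from collections import Counter


def item_agreement(runs):
    """For each linkId, the share of runs that produced the most common answer."""
    link_ids = set().union(*(run.keys() for run in runs))
    agreement = {}
    for link_id in link_ids:
        # repr() makes any value hashable; a missing key counts as "no answer".
        answers = [repr(run.get(link_id)) for run in runs]
        most_common_count = Counter(answers).most_common(1)[0][1]
        agreement[link_id] = most_common_count / len(runs)
    return agreement


# Four hypothetical runs of the same model on the same conversation.
runs = [
    {"mobility.walking-aid": True, "nutrition.diet": "low-salt"},
    {"mobility.walking-aid": True, "nutrition.diet": "vegetarian"},
    {"mobility.walking-aid": True},
    {"mobility.walking-aid": True, "nutrition.diet": "low-salt"},
]

agreement = item_agreement(runs)
print(agreement)
```

A value of 1.0 means every run gave the same answer for that item; lower values point to fields where the model's output depends on sampling rather than on the conversation.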
One result was particularly striking: in the scenario examined, specialized providers were not automatically better than general language models or open models. Their advantage may lie in integration, support and product maturity - but when it comes to the pure filling of the FHIR structure, there was no simple correlation along the lines of: specialized product equals better response.
Equally important: open-source models proved to be a realistic alternative. This is more than just a technical side note for the German healthcare market. Operation in a German cloud, data protection, control over infrastructure and possible on-premises scenarios are decisive criteria for many institutions.
The presentation thus makes a sober point: the question is no longer just whether AI can solve such tasks in principle. The greater challenge lies in integrating it properly into clinical workflows.
From speech-to-text to speech-to-FHIR
This perspective is central to nursIT. With careIT Voice, nursIT offers an approach that goes beyond traditional speech input: documentation should not only be transcribed, but transferred directly into FHIR-based structures. Accordingly, careIT Voice is a speech-based FHIR documentation solution - in short: Speech2FHIR.
The difference is significant. Speech-to-text turns speech into text. Speech-to-FHIR turns speech into structured information that can be reused in digital processes.
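The practical difference shows up as soon as a downstream process needs a specific value. A transcript has to be searched or interpreted again; a structured response can be read directly by linkId. The sketch below flattens a nested QuestionnaireResponse into a lookup table. The nesting via item and answer.item follows the FHIR resource definition, while the function name, linkIds and values are hypothetical.

```python
def flatten_answers(items, out=None):
    """Collect all answer values of a (possibly nested) response by linkId."""
    out = {} if out is None else out
    for item in items:
        values = []
        for answer in item.get("answer", []):
            for key, value in answer.items():
                if key.startswith("value"):  # valueBoolean, valueString, ...
                    values.append(value)
            # Follow-up items can be nested under an answer.
            flatten_answers(answer.get("item", []), out)
        if values:
            out[item["linkId"]] = values
        # Groups carry their questions as nested items.
        flatten_answers(item.get("item", []), out)
    return out


transcript = "Yes, I use a rollator at home since my hip operation."

response = {
    "resourceType": "QuestionnaireResponse",
    "status": "completed",
    "item": [{
        "linkId": "mobility",
        "item": [
            {"linkId": "mobility.walking-aid", "answer": [{"valueBoolean": True}]},
            {"linkId": "mobility.aid-type", "answer": [{"valueString": "rollator"}]},
        ],
    }],
}

flat = flatten_answers(response["item"])
print(flat["mobility.walking-aid"])
```

With the flattened view, a handover summary or an assessment can pick up exactly the fields it needs, which is not possible with the transcript string alone.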
This means that AI documentation does not become an additional interface alongside the system, but part of an interoperable care infrastructure. Information can flow into assessments, fill in forms, support handovers and be used across sectors in the future.
The DMEA presentation shows an important building block for this: it is not the most beautiful demo that counts, but the measurable quality. Which models deliver stable results? Which errors occur? What role does prompting play? And how can conversation content be structured in such a way that it really helps in everyday clinical practice?