On July 1, 2026, Ananya Mantravadi, Harshit Rajgarhia, Prasanna Desikan, and Abhishek Mukherji published the paper "World Feedback for Clinical Agents: Diagnosing RL in FHIR Environments." The study examines how reinforcement learning (RL) performs on clinical tasks in FHIR environments, identifies the barriers that limit it, and presents a revised benchmark intended to remove one of them.
Understanding Reinforcement Learning in Clinical Protocols
Clinical protocol-execution tasks, such as checking lab values and placing structured FHIR orders, are increasingly seen as promising targets for reinforcement learning. The authors argue that these tasks are natural candidates for RL because they can draw on world feedback. They stress that RL is effective here only when two conditions hold: the feedback channel is robust, and the base model already has sufficient capability.
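To make the shape of such a task concrete, here is a minimal sketch of one protocol step: check a lab value and, if it falls below a threshold, build a structured order. The resource layout follows the general form of a FHIR ServiceRequest (resourceType, status, intent, code, subject), but the function names, the threshold, the code system URL, and the code "EXAMPLE-CODE" are all hypothetical placeholders and are not taken from the paper or from MedAgentBench.

```python
from typing import Optional


def build_service_request(patient_id: str, system: str, code: str, display: str) -> dict:
    """Build a ServiceRequest-shaped dict (illustrative, not validated against a server)."""
    return {
        "resourceType": "ServiceRequest",
        "status": "active",
        "intent": "order",
        "code": {"coding": [{"system": system, "code": code, "display": display}]},
        "subject": {"reference": f"Patient/{patient_id}"},
    }


def protocol_step(latest_value: float, threshold: float, patient_id: str) -> Optional[dict]:
    """Return an order only when the lab value is below the threshold, else None."""
    if latest_value >= threshold:
        return None  # protocol says no action is needed
    # Placeholder coding: a real task would require an exact clinical code.
    return build_service_request(
        patient_id,
        system="http://example.org/codes",
        code="EXAMPLE-CODE",
        display="Follow-up lab (placeholder)",
    )


order = protocol_step(latest_value=3.0, threshold=3.5, patient_id="123")
no_order = protocol_step(latest_value=4.0, threshold=3.5, patient_id="123")
```

In this sketch the environment could score the step by inspecting whether an order was placed and whether its coding matches the expected one, which is the kind of signal an RL agent would learn from.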
The authors audit MedAgentBench versions 1 and 2 and find a 41.7% silent-finish ceiling, which makes inaction the dominant strategy for RL. To address this, they developed MedAgentBench-v3 (MAB-v3), which comprises 508 tasks and lowers the silent-finish ceiling to 8.9%.
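A short sketch may help show how such a ceiling could be measured. It assumes the silent-finish ceiling is the share of tasks that an agent passes by finishing without taking any action; that reading, the Task class, and its passes_with_no_action field are assumptions made for illustration. The data is a toy set, not the benchmark.

```python
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    # Hypothetical flag: True when the grader accepts a run that takes no action.
    passes_with_no_action: bool


def silent_finish_ceiling(tasks: list[Task]) -> float:
    """Fraction of tasks that a do-nothing agent would pass."""
    if not tasks:
        raise ValueError("need at least one task")
    passed = sum(1 for t in tasks if t.passes_with_no_action)
    return passed / len(tasks)


# Toy data only: 5 of 12 tasks pass under inaction.
toy_tasks = [Task(f"t{i}", i < 5) for i in range(12)]
ceiling = silent_finish_ceiling(toy_tasks)
print(f"{ceiling:.1%}")  # 41.7%
```

When this fraction is high, a policy that learns to stop immediately collects a large share of the available reward without doing any clinical work, which is why a lower ceiling gives RL a more useful training signal.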
Identifying Barriers to Effective Learning
The study identifies two barriers encountered while training Qwen3-8B: a capability ceiling and a format-knowledge barrier. On the capability side, 10 of the 20 task types showed 0% base performance, so they produced no reward signal and therefore zero gradient. On the format side, 3 of the 20 task types required exact clinical codes that are not easily discoverable through exploration.
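The zero-gradient effect can be illustrated with a small sketch. The summary does not say which RL algorithm was used, so this assumes a policy-gradient setup in which each rollout's advantage is its reward minus the mean reward of its group; the function name and the reward values are illustrative only.

```python
def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Advantage of each rollout relative to the mean reward of its group."""
    if not rewards:
        raise ValueError("need at least one rollout")
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]


# A task type with 0% base performance: every rollout fails.
all_fail = group_relative_advantages([0.0, 0.0, 0.0, 0.0])

# A task type where one rollout in four succeeds.
mixed = group_relative_advantages([0.0, 1.0, 0.0, 0.0])

print(all_fail)  # [0.0, 0.0, 0.0, 0.0]
print(mixed)     # [-0.25, 0.75, -0.25, -0.25]
```

Because the policy-gradient update scales with these advantages, the all-fail group contributes nothing to learning, while even a single success gives the update a direction. The same logic suggests why exact clinical codes are a barrier: if exploration never produces the right code, the group never contains a success.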





