On July 1, 2026, Karthikeya Aditya Vissa and colleagues published a paper titled "Beyond Next-Token Prediction: An RLVR Proof of Concept for Tool-Use Agents on Atlassian Workflows." The study examines the limitations of large language models when they execute specific API tasks within niche enterprise SaaS workflows, and it proposes Reinforcement Learning with Verifiable Rewards (RLVR) as a solution.
Understanding the Challenges of API Integration
Large language models (LLMs) are trained primarily for next-token prediction, an objective that does not guarantee correct calls to a specific API. In environments like Jira and Confluence, the result can be calls that omit required fields or invoke tools that do not exist (hallucinated tools). These failures are silent, and they undermine the effectiveness of LLMs in real-world applications.
The research highlights the need for a tailored approach to improve how these models interact with APIs, particularly in enterprise settings. The authors argue that the standard training objective of LLMs does not align with the requirements of precise API interactions.
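To make the two failure modes concrete, the following minimal sketch checks a single tool call for a hallucinated tool name or missing required fields. The tool names, field names, and the `check_tool_call` helper are hypothetical and chosen for illustration; they are not the actual Jira or Confluence schema, and they are not taken from the paper.

```python
from typing import Any

# Hypothetical tool registry: tool names and required fields are
# illustrative assumptions, not the real Jira or Confluence API schema.
TOOL_SCHEMAS: dict[str, set[str]] = {
    "create_issue": {"project_key", "summary", "issue_type"},
    "add_comment": {"issue_key", "body"},
}


def check_tool_call(name: str, arguments: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one tool call (empty if none)."""
    # A tool that is not in the registry is treated as hallucinated.
    if name not in TOOL_SCHEMAS:
        return [f"unknown tool: {name}"]
    # Any required field absent from the arguments is reported.
    missing = sorted(TOOL_SCHEMAS[name] - set(arguments))
    return [f"missing required field: {field}" for field in missing]


if __name__ == "__main__":
    print(check_tool_call("create_issue", {"project_key": "ENG", "summary": "Fix login"}))
    print(check_tool_call("delete_workspace", {}))
```

A check like this only surfaces the problem when something runs it. A model that emits the malformed call as plain text gets no such signal, which is one way to read the "silent failure" concern.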
RLVR: A Novel Approach for Improvement
The authors developed a proof-of-concept using RLVR, which applies reinforcement learning directly in the target environment. This method evaluates the model's performance based on tool-call traces rather than relying on human feedback or external judges. The study involved creating five synthetic environments that closely emulate the Jira REST v3 and Confluence v2 APIs.
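As a rough illustration of a reward computed from tool-call traces, the sketch below scores a trace against an expected sequence of calls without any human or model judge. The scoring rule (positional match, zero reward for extra calls), the `trace_reward` function, and the example tool names are assumptions made for this example; the article does not specify the reward the authors actually used.

```python
from typing import Any

# A tool call is modeled as (tool_name, arguments). This representation
# is an assumption for illustration, not the paper's trace format.
ToolCall = tuple[str, dict[str, Any]]


def trace_reward(trace: list[ToolCall], expected: list[ToolCall]) -> float:
    """Return a reward in [0, 1] computed purely from the tool-call trace.

    The reward is the fraction of expected calls that appear at the same
    position in the trace with identical name and arguments. A trace with
    more calls than expected receives zero reward.
    """
    if len(trace) > len(expected):
        return 0.0
    if not expected:
        return 1.0  # nothing was expected and nothing was called
    matched = sum(1 for got, want in zip(trace, expected) if got == want)
    return matched / len(expected)


if __name__ == "__main__":
    expected = [
        ("create_issue", {"project_key": "ENG", "summary": "Fix login", "issue_type": "Bug"}),
        ("add_comment", {"issue_key": "ENG-1", "body": "Triaged"}),
    ]
    # Second call is missing, so only one of two expected calls matches.
    partial = expected[:1]
    print(trace_reward(expected, expected))
    print(trace_reward(partial, expected))
```

Because the score depends only on the recorded calls, it can be recomputed deterministically for any rollout, which is the property that makes a reward "verifiable" in this sense.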



