Indi-RomCoM Benchmark Evaluates LLMs on Romanized Indic-English Instructions

On June 29, 2026, researchers Avisha Das, Mihir Parmar, Mohana Ramnath, and Pulkit Verma introduced Indi-RomCoM, a benchmark for evaluating Large Language Models (LLMs) on Romanized Indic-English instructions. The benchmark measures how well LLMs handle the code-mixed communication that is prevalent in multilingual communities.

Understanding Romanized Code Mixing

Romanized Code Mixing (RCM) is a form of communication in which bilingual speakers fluidly blend local languages with English, writing both in Roman script. The practice has become increasingly common in linguistically diverse environments. Despite its growing use, LLM performance on RCM remains largely unexplored.

The Indi-RomCoM benchmark addresses this gap with a systematic evaluation framework that covers seven instruction-following tasks across four widely spoken Indic languages. It also incorporates three controlled levels of code-mixing intensity, so that LLM performance can be compared as the degree of mixing changes.
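To make the size of that evaluation space concrete, here is a minimal Python sketch that enumerates every task, language, and mixing-level combination. The article gives only the counts (seven, four, and three), so every label below is a hypothetical placeholder, not a name used by the benchmark.

```python
from itertools import product

# Hypothetical placeholder labels: the article states only the counts
# (7 tasks, 4 languages, 3 mixing levels), not the actual names.
TASKS = [f"task_{i}" for i in range(1, 8)]
LANGUAGES = [f"language_{i}" for i in range(1, 5)]
MIX_LEVELS = ["level_1", "level_2", "level_3"]


def evaluation_grid():
    """Return every (task, language, mix_level) combination as a tuple."""
    return list(product(TASKS, LANGUAGES, MIX_LEVELS))


grid = evaluation_grid()
print(len(grid))  # 7 * 4 * 3 combinations
```

Under these assumptions, each model would be scored on every cell of the grid, and the zero-shot and few-shot settings would each run over the same cells.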

Performance Evaluation of LLMs

The researchers evaluated a range of LLMs, including proprietary, open-weight, and Indic-focused models, under zero-shot and few-shot settings. The models consistently underperformed on RCM instructions, and performance deteriorated as code-mixing density increased.
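One way to quantify this kind of trend is sketched below in Python. It is not the authors' analysis: the helper names are hypothetical, and the article does not describe a reference score to compare against, so the baseline used here is an assumption.

```python
from typing import Sequence


def relative_drop(baseline: float, score: float) -> float:
    """Fraction of the baseline score lost, e.g. 0.25 means a 25% drop.

    `baseline` is an assumed reference score (for instance, the same task
    with unmixed instructions); the article does not specify one.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - score) / baseline


def degrades_with_density(scores_by_level: Sequence[float]) -> bool:
    """True if scores never increase as the mixing level goes up.

    `scores_by_level` is ordered from the lowest to the highest
    code-mixing intensity.
    """
    return all(a >= b for a, b in zip(scores_by_level, scores_by_level[1:]))
```

A caller would pass one model's scores for a single task and language, ordered by mixing level, and could compare `relative_drop` values between task types to see which degrade more.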

The study also found that reasoning tasks degraded less than detection tasks such as toxicity detection. The explanations that models generate provide necessary context, which helps mitigate some of the difficulty of processing code-mixed content.

Implications for Multilingual Systems

The Indi-RomCoM benchmark is a step towards more inclusive multilingual systems. By enabling evaluations that account for the specific challenges of RCM, the research aims to foster improvements in LLM performance across diverse language settings.

As demand for effective communication tools in multilingual communities grows, understanding and addressing the limitations of LLMs on code-mixed instructions will become increasingly important. The benchmark gives researchers and developers a resource for improving LLM capabilities in these linguistic environments.

🤖 This article was rewritten by Feed and Figures' editorial AI from a report originally published by arXiv NLP. Facts and quotes are preserved from the original; the rewrite focuses on clarity and structure. For the unedited original, see the source link below.
