On June 29, 2026, Tanvir Ahmed Sijan and colleagues published a study evaluating how robust event detection systems for the Bangla language are to noisy text. The work addresses a common gap in low-resource language research, where models are typically tested only on clean, curated datasets.
Understanding the Need for Robust Event Detection
Event detection (ED) systems identify and classify events mentioned in text, which makes them important for processing real-time information, especially across multiple languages. The study introduces a new Bangla news event ontology along with a benchmark containing 9,979 annotated sentences across 40 event subtypes. The benchmark covers three text conditions: clean news text, Automatic Speech Recognition (ASR) transcripts, and orthographically corrupted text. Together they allow the same task to be evaluated under conditions closer to real-world use.
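To make "orthographically corrupted text" concrete, the sketch below injects simple character-level noise into a sentence. It is an illustration only: the function name `corrupt_text`, the three operations (delete, duplicate, swap), and the `rate` parameter are assumptions made here, not the corruption procedure used in the study. It also works on individual Unicode code points, so it can split Bangla conjuncts and vowel signs in ways a more careful, grapheme-aware approach would avoid.

```python
import random


def corrupt_text(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Apply random character-level noise to `text` (hypothetical sketch).

    Each non-whitespace character is perturbed with probability `rate`
    by deleting it, duplicating it, or swapping it with the next
    non-whitespace character. Whitespace is never altered, so word
    boundaries are preserved.
    """
    rng = random.Random(seed)  # seeded so the corruption is reproducible
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        ch = chars[i]
        # Leave whitespace and unselected characters untouched.
        if ch.isspace() or rng.random() >= rate:
            out.append(ch)
            i += 1
            continue
        op = rng.choice(("delete", "duplicate", "swap"))
        if op == "delete":
            pass  # drop the character
        elif op == "duplicate":
            out.extend([ch, ch])
        elif op == "swap" and i + 1 < len(chars) and not chars[i + 1].isspace():
            out.extend([chars[i + 1], ch])
            i += 1  # the following character has been consumed as well
        else:
            out.append(ch)  # swap was not possible here; keep the character
        i += 1
    return "".join(out)


# Toy sentence for illustration, not taken from the benchmark.
sentence = "ঢাকায় আজ ভারী বৃষ্টি হয়েছে"
noisy = corrupt_text(sentence, rate=0.3, seed=42)
```

Raising `rate` produces heavier corruption, which would let the same sentence be tested at several noise levels.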
The researchers systematically assessed the performance of fine-tuned encoder-only models, such as BanglaBERT and XLM-R, against instruction-tuned decoder-only large language models (LLMs), including Llama 3 and Gemma 3. Their findings reveal significant differences in how these models handle various text conditions.
Key Findings on Model Performance
The research indicates a clear architectural trade-off: encoder models perform well on clean text but degrade substantially on noisy input. In contrast, decoder-only LLMs are more robust, particularly when the event trigger itself is corrupted. This matters for the design of future event detection systems, especially for low-resource languages such as Bangla.
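One way to quantify a trade-off like this is to score the same model on clean and noisy versions of a test set and report the relative drop. The sketch below does that with macro-averaged F1 over sentence-level labels. The choice of metric, the function names `macro_f1` and `relative_drop`, and the toy labels are all assumptions for illustration; the article does not state which metric the study reports, and the label names are not drawn from its ontology.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Macro-averaged F1 over all labels seen in gold or pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label was wrong
            fn[g] += 1  # gold label was missed
    labels = set(gold) | set(pred)
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def relative_drop(clean_score, noisy_score):
    """Fraction of the clean-text score lost under noise."""
    if clean_score == 0:
        return 0.0
    return (clean_score - noisy_score) / clean_score


# Made-up labels and predictions, purely to show the calculation.
gold = ["Attack", "Election", "Flood", "Attack"]
pred_clean = ["Attack", "Election", "Flood", "Attack"]
pred_noisy = ["Attack", "Flood", "Flood", "Election"]

clean_f1 = macro_f1(gold, pred_clean)
noisy_f1 = macro_f1(gold, pred_noisy)
drop = relative_drop(clean_f1, noisy_f1)
```

A model with a high clean score but a large relative drop fits the encoder pattern described above, while a smaller drop indicates greater robustness.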





