The Arabic-Russian Parallel Corpus, launched on June 29, 2026, aims to improve scientific communication between Arabic-speaking and Russian-speaking researchers. Developed by M. K. Arabov, the initiative addresses language barriers that hinder the exchange of sustainability-related research.
Significance of the Arabic-Russian Benchmark
Russian and Arabic are both important languages of global scientific discourse. However, language differences restrict collaboration and slow progress in critical areas such as sustainability. The newly created benchmark comprises a hybrid parallel corpus of approximately 27,000 sentence pairs, sourced from scientific abstracts and other texts, including religious and news material.
The initiative is intended to foster international partnerships and innovation, in line with United Nations Sustainable Development Goals (SDGs) 9 and 17.
Multilingual Models and Performance Metrics
Three multilingual language models were fine-tuned using Low-Rank Adaptation (LoRA): mT5-base (580M parameters), NLLB-200-distilled-1.3B (1.3B parameters), and Qwen2.5-7B-Instruct (7B parameters). The Qwen2.5-7B-Instruct model, fine-tuned with QLoRA at rank 8, achieved a BLEU score of 23.15, a chrF score of 43.89, a BERTScore of 0.906, and a COMET score of 0.758.
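
To give a sense of why low-rank adaptation keeps fine-tuning cheap, the sketch below counts the trainable parameters LoRA adds to a single weight matrix at rank 8, the rank reported above. LoRA freezes the original matrix and learns two small factors, so the trainable count is rank × (input size + output size). The 4096 × 4096 layer size and the function names are illustrative assumptions, not figures or code from the benchmark.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Parameters in one dense weight matrix of shape (d_out, d_in)."""
    return d_in * d_out


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two LoRA factors: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


# Hypothetical layer size, chosen only for illustration.
D_IN = D_OUT = 4096
RANK = 8  # rank reported for the QLoRA run

full = full_params(D_IN, D_OUT)
lora = lora_trainable_params(D_IN, D_OUT, RANK)

print(f"full matrix:      {full:,} parameters")
print(f"LoRA factors:     {lora:,} parameters")
print(f"trainable share:  {lora / full:.2%}")
```

Under these assumptions, the adapter trains 65,536 parameters for a matrix that holds 16,777,216, or about 0.39% of it. QLoRA applies the same idea on top of a quantized frozen base model, which lowers memory use further.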





