Rabbi Dr. Ari Berman, President and Rosh Yeshiva | Yeshiva University
In a bid to bridge the communication divide in artificial intelligence, researchers at the Katz School have developed a new framework to enhance AI's understanding of low-resource languages. Current AI systems struggle with languages that lack extensive digital text or annotated datasets, which limits multilingual interaction. The project, led by Dr. David Li, was presented at the IEEE SoutheastCon 2025 in Charlotte, N.C.
“Our work centers on a field called cross-lingual natural language understanding, which involves building systems that can learn from high-resource languages and apply that knowledge to others,” said Dr. Li. The approach combines innovative data techniques to transfer knowledge from well-documented languages to those with less digital presence, enhancing AI's flexibility across various languages.
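The core idea of cross-lingual transfer can be illustrated in miniature: train a classifier only on high-resource-language examples, then apply it to another language through a shared embedding space. The tiny hand-built bilingual "embeddings" below are invented for illustration; real systems learn such shared spaces with multilingual encoders like XLM-RoBERTa.

```python
# Toy sketch of cross-lingual transfer. Translation pairs share one
# vector, standing in for a learned multilingual embedding space.
SHARED_EMBEDDINGS = {
    "good": (1.0, 0.0), "buena": (1.0, 0.0),
    "bad": (0.0, 1.0), "mala": (0.0, 1.0),
    "food": (0.5, 0.5), "comida": (0.5, 0.5),
}

def embed(sentence):
    """Average the shared embeddings of known words in a sentence."""
    vecs = [SHARED_EMBEDDINGS[w] for w in sentence.split() if w in SHARED_EMBEDDINGS]
    if not vecs:
        return (0.0, 0.0)
    n = len(vecs)
    return (sum(v[0] for v in vecs) / n, sum(v[1] for v in vecs) / n)

def train_centroids(labeled_examples):
    """Nearest-centroid 'classifier': one mean vector per label."""
    sums, counts = {}, {}
    for text, label in labeled_examples:
        x = embed(text)
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x[0], sy + x[1])
        counts[label] = counts.get(label, 0) + 1
    return {lbl: (sx / counts[lbl], sy / counts[lbl])
            for lbl, (sx, sy) in sums.items()}

def predict(centroids, text):
    """Assign the label whose centroid is nearest in the shared space."""
    x = embed(text)
    return min(centroids, key=lambda lbl: (x[0] - centroids[lbl][0]) ** 2
                                        + (x[1] - centroids[lbl][1]) ** 2)

# Train only on English labels; Spanish is never seen during training,
# yet classification works because both languages map to the same space.
centroids = train_centroids([("good food", "pos"), ("bad food", "neg")])
```

Because "buena comida" lands on the same vector as "good food", the English-trained centroids classify it correctly with zero Spanish training data.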
Traditional multilingual AI models such as XLM-RoBERTa and mBERT can recognize patterns across languages, but their effectiveness diminishes without abundant training data. Without diverse datasets, these models struggle to grasp the subtleties of less common languages.
The researchers introduced a four-part strategy aimed at improving efficiency and accuracy in multilingual settings. Their methodology combines enhanced data augmentation, contrastive learning, dynamic weight adjustment, and adaptation layers. Together, these components generate more diverse training examples, align features across languages, and adapt the model to specific language tasks.
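Two of the named components can be sketched concretely. The article does not give the team's exact formulations, so the code below assumes a standard InfoNCE-style contrastive loss (pulling each sentence toward its translation and away from other sentences in the batch) and a simple loss-proportional scheme for dynamic weight adjustment.

```python
import math

def contrastive_loss(source_vecs, target_vecs, temperature=0.1):
    """InfoNCE-style contrastive loss: each source sentence should sit
    closest to its own translation (the positive pair) among all target
    sentences in the batch. This is a common formulation, assumed here."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)

    total = 0.0
    for i, s in enumerate(source_vecs):
        logits = [cos(s, t) / temperature for t in target_vecs]
        log_denom = math.log(sum(math.exp(l) for l in logits))
        total += -(logits[i] - log_denom)  # negative log-softmax of the positive
    return total / len(source_vecs)

def dynamic_weights(per_language_losses):
    """Toy dynamic weight adjustment: upweight languages whose current
    loss is high, so low-resource languages receive more attention."""
    total = sum(per_language_losses.values())
    return {lang: loss / total for lang, loss in per_language_losses.items()}
```

In this sketch, a batch of correctly aligned translation pairs yields a lower loss than a misaligned one, which is exactly the pressure that pulls the two languages' features into a shared space.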
Testing on multilingual benchmarks such as XNLI, MLQA, and XTREME demonstrated consistent gains over traditional methods. “In all cases, our new framework outperformed traditional methods, especially in low-resource settings,” said Hang Yu, reflecting on the combined impact of contrastive learning and augmentation.
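Benchmarks like XNLI report a separate accuracy score for each evaluation language, which is what makes low-resource gains visible. A minimal sketch of that per-language scoring (the labels below are made-up placeholders, not benchmark data):

```python
def accuracy_by_language(predictions, gold):
    """Per-language accuracy, in the style of XNLI-type evaluations.
    predictions, gold: dicts mapping a language code to a label list."""
    return {
        lang: sum(p == g for p, g in zip(predictions[lang], gold[lang]))
              / len(gold[lang])
        for lang in gold
    }
```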
Because the framework requires only a modest increase in computational resources, it is practical for real-world applications. An ablation study confirmed that key elements, including contrastive learning and cross-lingual feature mapping, are crucial to the model's success.
“This research isn’t just academic. In real-world scenarios—such as disaster response, global health communications or inclusive tech development—understanding low-resource languages can have life-saving consequences,” said Ruiming Tian. The advance also holds promise for cultural preservation, bringing AI tools to otherwise neglected languages.
Dr. Li emphasized the broader implications of their work: “As AI becomes more deeply embedded in daily life, from phone apps to government services, ensuring it works well for everyone is both a technical and moral challenge.”
The team's efforts bring AI a step closer to inclusivity, demonstrating its potential to understand and communicate across all languages, regardless of resource availability.