Hang Yu, a student in the Katz School's M.S. in Artificial Intelligence, will present his research at the 2025 IEEE Conference in January. The study, co-authored with Dr. David Li, focuses on improving video quality assessment (VQA) using deep learning techniques.
The research introduces a dual-path deep learning framework for assessing the quality of streamed video under varying network conditions. Traditional video quality metrics often fall short on the complex distortions that arise in real-world footage. The Katz School approach instead uses deep learning to analyze the video itself and identify the subtle distortions that degrade the viewing experience.
Video quality assessment involves an inherent trade-off between sharp spatial detail and broader motion context. Models that focus too heavily on fine detail can miss how a scene moves over time, while models that emphasize motion can overlook critical details in fast-moving video.
The researchers employ the SlowFast architecture for video analysis. Its "slow" pathway samples frames at a low rate to capture fine spatial detail, while its "fast" pathway samples frames at a higher rate to track overall motion and flow. Combining the two ensures that both fine detail and large-scale temporal context inform the quality score.
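For readers who want a concrete picture, the sketch below (in PyTorch, and not the team's published code) pairs a slow pathway that samples every fourth frame with many channels against a fast pathway that sees every frame with few channels; the layer sizes, sampling ratio, and pooling are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DualPathSketch(nn.Module):
    """Minimal SlowFast-style sketch: the slow pathway sees few frames with
    many channels, the fast pathway sees every frame with few channels.
    All sizes are illustrative assumptions, not the published model."""

    def __init__(self, alpha: int = 4):
        super().__init__()
        self.alpha = alpha  # the fast pathway sees alpha times more frames
        self.slow = nn.Conv3d(3, 64, kernel_size=(1, 7, 7), stride=(1, 2, 2), padding=(0, 3, 3))
        self.fast = nn.Conv3d(3, 8, kernel_size=(5, 7, 7), stride=(1, 2, 2), padding=(2, 3, 3))
        self.head = nn.Linear(64 + 8, 1)  # regress a single quality score

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        # video: (batch, channels, frames, height, width)
        slow_feat = self.slow(video[:, :, ::self.alpha]).mean(dim=[2, 3, 4])  # sparse frames, fine detail
        fast_feat = self.fast(video).mean(dim=[2, 3, 4])                      # dense frames, motion
        return self.head(torch.cat([slow_feat, fast_feat], dim=1))

# Example: two 32-frame clips at 112x112 resolution -> two quality scores
scores = DualPathSketch()(torch.randn(2, 3, 32, 112, 112))
```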
Additional components built into the model include PatchEmbed3D, which carves each clip into spatial-temporal patches; WindowAttention3D, which preserves local detail; Semantic Transformation and Global Position Indexing, which keep the model's representations consistent; and Cross Attention with Patch Merging, which strengthen communication between the two pathways.
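As an example of the first component, a 3D patch embedding of the kind the name PatchEmbed3D suggests can be written as a strided 3D convolution that slices a clip into space-time patches and projects each one into an embedding vector; the patch size and embedding dimension below are assumptions, not the team's settings.

```python
import torch
import torch.nn as nn

class PatchEmbed3DSketch(nn.Module):
    """Illustrative 3D patch embedding: split a clip into non-overlapping
    space-time patches and project each one to an embedding vector.
    The patch size (2, 4, 4) and dimension 96 are assumed defaults."""

    def __init__(self, patch_size=(2, 4, 4), in_chans=3, embed_dim=96):
        super().__init__()
        self.proj = nn.Conv3d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        # video: (batch, channels, frames, height, width)
        x = self.proj(video)                 # (batch, embed_dim, T', H', W')
        return x.flatten(2).transpose(1, 2)  # (batch, num_patches, embed_dim)

tokens = PatchEmbed3DSketch()(torch.randn(1, 3, 16, 224, 224))
print(tokens.shape)  # torch.Size([1, 25088, 96]) -> 8 x 56 x 56 patches
```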
To train the system, the team combined a PLCC (Pearson linear correlation coefficient) loss with a rank loss and used cosine annealing of the learning rate for efficient training. Dr. David Li noted that testing on public datasets showed the model outperforming existing methods both in numerical metrics and in subjective comparisons.
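The exact loss formulation isn't spelled out here, but a common way to combine these two objectives is a differentiable Pearson-correlation term, which keeps predictions linearly aligned with human scores, plus a pairwise ranking term, which penalizes pairs of videos ordered differently from their labels. The sketch below, including the 0.3 weighting, is an assumption for illustration.

```python
import torch
import torch.nn.functional as F

def plcc_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """1 minus the Pearson linear correlation between predicted and true scores."""
    p, t = pred - pred.mean(), target - target.mean()
    return 1.0 - (p * t).sum() / (p.norm() * t.norm() + 1e-8)

def rank_loss(pred: torch.Tensor, target: torch.Tensor, margin: float = 0.0) -> torch.Tensor:
    """Pairwise hinge loss: penalize pairs whose predicted order disagrees with the labels."""
    diff_pred = pred.unsqueeze(0) - pred.unsqueeze(1)      # predicted score differences
    diff_true = target.unsqueeze(0) - target.unsqueeze(1)  # ground-truth differences
    return F.relu(margin - diff_pred * torch.sign(diff_true)).mean()

# Combined objective for a batch of 8 clips; the 0.3 weight is illustrative.
pred, mos = torch.randn(8, requires_grad=True), torch.rand(8)
loss = plcc_loss(pred, mos) + 0.3 * rank_loss(pred, mos)
loss.backward()
```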
A two-stage training process was crucial to this success: the model first learned broad patterns and was then fine-tuned to recognize intricate details. Future work could explore more sophisticated training strategies or target specific challenges such as compression artifacts.
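One plausible reading of that two-stage schedule, sketched below with placeholder hyperparameters rather than the study's actual values, is a coarse first pass followed by a lower-learning-rate fine-tuning pass, each with its own cosine-annealed learning rate.

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

def train_stage(model, loader, loss_fn, lr, epochs):
    """Run one training stage with its own cosine-annealed learning rate."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    scheduler = CosineAnnealingLR(optimizer, T_max=epochs)
    for _ in range(epochs):
        for clips, scores in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(clips).squeeze(-1), scores)
            loss.backward()
            optimizer.step()
        scheduler.step()  # decay the learning rate along a cosine curve

# Stage 1 learns broad patterns; stage 2 fine-tunes detail at a smaller rate.
# The learning rates and epoch counts are placeholders, not the study's values.
# train_stage(model, train_loader, combined_loss, lr=1e-4, epochs=30)
# train_stage(model, train_loader, combined_loss, lr=1e-5, epochs=10)
```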
“This research bridges the gap between technology and human experience,” said Hang Yu, highlighting its potential impact across fields such as gaming and virtual reality, where video quality is vital.