The evolution of Large Language Models (LLMs) has significantly advanced artificial intelligence, driving innovation across a wide range of applications. Their continued development depends on a thorough understanding of their capabilities and limitations, which is gained primarily through rigorous evaluation on diverse datasets. However, assessing state-of-the-art models in Arabic remains a formidable challenge because comprehensive benchmarks are scarce. This lack of robust evaluation tools hinders the progress and refinement of Arabic LLMs and limits their effectiveness in real-world applications. In response, we introduce GATmath (7k questions) and GATLc (9k questions), two large-scale, multitask Arabic benchmarks for reasoning and language understanding. Derived from the General Aptitude Test (GAT) examination, each dataset covers multiple categories that demand skills in reasoning, semantic analysis, language comprehension, and mathematical problem-solving. To the best of our knowledge, these are the first comprehensive, large-scale reasoning datasets tailored specifically to the Arabic language. We evaluated and analyzed seven prominent LLMs on our datasets. Even the highest-performing model attained only 66.9% and 64.3% accuracy, underscoring the considerable challenge our datasets pose. This outcome illustrates the intricate nature of the tasks and highlights substantial room for improvement in Arabic language model development.
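The headline results above are plain accuracy figures over multi-category benchmarks. As a minimal sketch, assuming each item is a multiple-choice question with a single gold answer letter and a category label, overall and per-category accuracy could be computed as below. The `Item` fields, the `accuracy_by_category` helper, and the category names are hypothetical illustrations, not the published GATmath/GATLc schema or the authors' evaluation code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    category: str  # hypothetical field: question category label
    answer: str    # hypothetical field: gold answer choice, e.g. "A".."D"


def accuracy_by_category(items, predictions):
    """Return (overall accuracy, {category: accuracy}) as fractions in [0, 1]."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, pred in zip(items, predictions):
        total[item.category] += 1
        # Exact match on the answer letter after trimming whitespace and ignoring case.
        if pred.strip().upper() == item.answer.strip().upper():
            correct[item.category] += 1
    n = sum(total.values())
    overall = sum(correct.values()) / n if n else 0.0
    per_category = {c: correct[c] / total[c] for c in total}
    return overall, per_category


if __name__ == "__main__":
    # Illustrative toy data only; category names are hypothetical.
    items = [Item("analogy", "A"), Item("analogy", "C"),
             Item("arithmetic", "B"), Item("arithmetic", "D")]
    overall, per_category = accuracy_by_category(items, ["A", "B", "b", "D"])
    print(overall, per_category)  # 0.75 {'analogy': 0.5, 'arithmetic': 1.0}
```

Reporting per-category scores alongside the overall figure makes it easier to see which skill areas drive a model's errors; a real harness would also need to extract the answer letter from free-form model output, which this sketch assumes has already been done.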
| Full text | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12404542 | PMC |
| http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0329129 | PLOS |
J Safety Res
September 2025
Department of Construction Engineering and Management, North China University of Water Resources and Electric Power, Zhengzhou 450046, China. Electronic address:
Introduction: This study aims to provide a comprehensive review of the application of eye-tracking technology in construction safety, establishing a theoretical foundation and benchmark to guide future research and innovation in the field.
Method: This study identified 116 relevant papers published between 2003 and 2023 and indexed by Web of Science (WoS), Scopus, and the American Society of Civil Engineers (ASCE) Library. Analysis of these papers revealed trends in publication dates, research locations, the journals and conference proceedings that published the studies, and the extent of collaboration between authors; these trends indicate that eye-tracking technology has become an important tool for enhancing construction safety.
J Safety Res
September 2025
Institute for Traffic Medicine, Daping Hospital, Army Medical University, Chongqing, China.
Introduction: The continuous progression of autonomous driving technology is propelling the automotive industry into a new era, with the intelligence and driving-safety capabilities of autonomous vehicles serving as crucial benchmarks of industry development. However, crashes involving autonomous vehicles have raised concerns about this technology among both government authorities and the general public. A comprehensive analysis of crash causes and key causal factors is therefore important for technological progress, personnel safety, and the future direction of the automotive industry.
Neural Netw
September 2025
School of Automation, Southeast University, Nanjing, 210096, China; Advanced Ocean Institute of Southeast University Nantong, Nantong, 226010, China. Electronic address:
Unmanned Aerial Vehicle (UAV) tracking requires accurate target localization from aerial top-down perspectives while operating under the computational constraints of aerial platforms. Constrained by limited resources, current mainstream UAV trackers predominantly employ lightweight Convolutional Neural Network (CNN) feature extractors coupled with appearance-based fusion mechanisms. The absence of comprehensive target perception significantly constrains the balance between tracking accuracy and computational efficiency.
PLoS One
September 2025
School of Computer Science, CHART Laboratory, University of Nottingham, Nottingham, United Kingdom.
Background and Objective: Male fertility assessment through sperm morphology analysis remains a critical component of reproductive health evaluation, as abnormal sperm morphology is strongly correlated with reduced fertility rates and poor assisted reproductive technology outcomes. Traditional manual analysis performed by embryologists is time-intensive, subjective, and prone to significant inter-observer variability, with studies reporting up to 40% disagreement between expert evaluators. This research presents a novel deep learning framework that combines the Convolutional Block Attention Module (CBAM) with a ResNet50 architecture and advanced deep feature engineering (DFE) techniques for automated, objective sperm morphology classification.
IEEE Trans Comput Biol Bioinform
September 2025
Accurately identifying associations between human genes (proteins) and clinical phenotypes is critical for advancing drug development and precision medicine. While the human phenotype ontology (HPO) standardizes clinical phenotypes, current computational approaches for predicting human protein-phenotype associations suffer from two limitations: (1) underutilization of multimodal protein-related information and (2) lack of state-of-the-art deep learning representations tailored to diverse data modalities, such as text and sequence. To overcome these limitations, we introduce MultiFusion2HPO, a novel multimodal model that integrates diverse features and advanced learning methods from multiple data sources to enhance the prediction of human protein-HPO associations.