Category Ranking

98%

Total Visits

921

Avg Visit Duration

2 minutes

Citations

20

Article Abstract

This article describes the Extended Quranic Treebank (EQTB), a comprehensive, multi-layered, and computationally accessible linguistic resource for Classical Arabic (CA), meticulously developed to overcome the documented limitations of the original Quranic Treebank. Leveraging foundational data from established Quranic digital resources, EQTB features systematically expanded orthographic representations generated via algorithmic processing and validation; rigorously refined morphological annotations based on expanded expert-informed schemas, automated re-annotation, and manual curation; and critically, a novel, complete syntactic layer constructed through algorithmic conversion of prior graphical data, Deep Learning-based parsing achieving full coverage under a hybrid constituency-dependency framework, and expert validation. Encompassing the entire Quran (∼132,736 tokens), the dataset is structured in an adapted CoNLL-X format across 43 columns, detailing multiple orthographies, fine-grained morphology (45 tags), and complete hybrid syntax (140 tags/labels), complemented by auxiliary lexicons and schemas. EQTB offers significant reuse potential, providing crucial training/evaluation data for diverse CA NLP tasks (parsing, morphology, diacritization), supporting linguistic research, and enabling the development of advanced pedagogical tools and language technologies.

Download full-text PDF

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12361616PMC
http://dx.doi.org/10.1016/j.dib.2025.111940DOI Listing

Publication Analysis

Top Keywords

quranic treebank
12
classical arabic
8
complete multi-layered
4
quranic
4
multi-layered quranic
4
treebank dataset
4
dataset hybrid
4
hybrid syntactic
4
syntactic annotations
4
annotations classical
4

Similar Publications

This article describes the Extended Quranic Treebank (EQTB), a comprehensive, multi-layered, and computationally accessible linguistic resource for Classical Arabic (CA), meticulously developed to overcome the documented limitations of the original Quranic Treebank. Leveraging foundational data from established Quranic digital resources, EQTB features systematically expanded orthographic representations generated via algorithmic processing and validation; rigorously refined morphological annotations based on expanded expert-informed schemas, automated re-annotation, and manual curation; and critically, a novel, complete syntactic layer constructed through algorithmic conversion of prior graphical data, Deep Learning-based parsing achieving full coverage under a hybrid constituency-dependency framework, and expert validation. Encompassing the entire Quran (∼132,736 tokens), the dataset is structured in an adapted CoNLL-X format across 43 columns, detailing multiple orthographies, fine-grained morphology (45 tags), and complete hybrid syntax (140 tags/labels), complemented by auxiliary lexicons and schemas.

View Article and Find Full Text PDF