98%
921
2 minutes
20
This Research presents the first-ever, high-quality, automatically annotated Kurdish stance detection dataset in the Sorani dialect to fill the gap of lacking annotated resources for Kurdish, a low-resource language in Natural Language Processing (NLP). The dataset consists of 2,174 Kurdish news articles-1,410 economic and 764 political-that were originally published in 2024 and 2025, which are recent and topically relevant. By selecting these texts from well-known Kurdish news agencies, content validity and linguistic purity were preserved throughout. Necessary preprocessing techniques are applied. Annotation is carried out in two steps. First, a pattern-recognition method with 2,456 phrases and keywords was applied to determine if the subject of every text fell into the economics or politics category. Next, the position of every article was annotated with an extended lexicon of 4,243 adjectives and verbs, categorized under support, oppose, and neutral. Wherever direct matches were not possible, semantic similarity and zero-shot classification were used as fallback measures. In order to verify the automatic annotation, a team of domain experts manually assessed a representative sample of the annotated texts, with a high inter-annotator agreement score confirming the validity of the approach. The dataset is made available in XLSX (Excel) format, facilitating ease of use and versatility for a variety of research tasks in NLP. Due to its annotated and organized corpus, this dataset is a solid starting point for researchers who are building Kurdish language processing models. The dataset is released publicly to allow other researchers to build upon it and push the limits of NLP system performance on low-resource languages.
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12266528 | PMC |
http://dx.doi.org/10.1016/j.dib.2025.111839 | DOI Listing |
Nat Commun
September 2025
Institute of Computational Biology, German Research Center for Environmental Health, Helmholtz Zentrum München, Neuherberg, Germany.
Atherosclerosis, a major cause of cardiovascular diseases, is characterized by the buildup of lipids and chronic inflammation in the arteries, leading to plaque formation and potential rupture. Despite recent advances in single-cell transcriptomics (scRNA-seq), the underlying immune mechanisms and transformations in structural cells driving plaque progression remain incompletely defined. Existing datasets often lack comprehensive coverage and consistent annotations, limiting the utility of downstream analyses.
View Article and Find Full Text PDFMar Pollut Bull
September 2025
Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8563, Japan. Electronic address:
Existing studies have identified a substantial amount of invisible floating debris in low-visibility marine environments, in addition to debris on the surface and seabed. These suspended pollutants represent a persistent and dynamic threat to marine ecosystems and maritime safety. Although sonar technology facilitates debris monitoring in low-visibility waters, the automatic extraction of small and weakly contrasted debris targets remains a critical challenge.
View Article and Find Full Text PDFNeural Netw
September 2025
Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen, China. Electronic address:
Automatic segmentation of retinal vessels from retinography images is crucial for timely clinical diagnosis. However, the high cost and specialized expertise required for annotating medical images often result in limited labeled datasets, which constrains the full potential of deep learning methods. Recent advances in self-supervised pretraining using unlabeled data have shown significant benefits for downstream tasks.
View Article and Find Full Text PDFPLoS One
September 2025
School of Computer Science, Georgia Institute of Technology, Atlanta, Georgia, United States of America.
Background: When analyzing cells in culture, assessing cell morphology (shape), confluency (density), and growth patterns are necessary for understanding cell health. These parameters are generally obtained by a skilled biologist inspecting light microscope images, but this can become very laborious for high-throughput applications. One way to speed up this process is by automating cell segmentation.
View Article and Find Full Text PDFJ Magn Reson Imaging
September 2025
Department of Medical Imaging and Intervention, Chang Gung Memorial Hospital at Linkou, Taoyuan City, Taiwan.
Background: Automated cardiac MR segmentation enables accurate and reproducible ventricular function assessment in Tetralogy of Fallot (ToF), whereas manual segmentation remains time-consuming and variable.
Purpose: To evaluate the deep learning (DL)-based models for automatic left ventricle (LV), right ventricle (RV), and LV myocardium segmentation in ToF, compared with manual reference standard annotations.
Study Type: Retrospective.