One of the first steps in many text-based social science studies is to retrieve the documents that are relevant for an analysis from large corpora of otherwise irrelevant documents. The conventional approach in social science is to apply a set of keywords and to treat as relevant every document that contains at least one of them. But applying incomplete keyword lists carries a high risk of drawing biased inferences. More complex and costly methods, such as query expansion techniques, topic model-based classification rules, and active as well as passive supervised learning, could separate relevant from irrelevant documents more accurately and thereby reduce the potential size of the bias. Yet whether these more expensive approaches increase retrieval performance compared to keyword lists at all, and if so by how much, is unclear because a comparison of these approaches is lacking. This study closes this gap by comparing these methods across three retrieval tasks associated with a data set of German tweets (Linder in SSRN, 2017. 10.2139/ssrn.3026393), the Social Bias Inference Corpus (SBIC) (Sap et al. in Social bias frames: reasoning about social and power implications of language. In: Jurafsky et al. (eds) Proceedings of the 58th annual meeting of the association for computational linguistics. Association for Computational Linguistics, p 5477-5490, 2020. 10.18653/v1/2020.aclmain.486), and the Reuters-21578 corpus (Lewis in Reuters-21578 (Distribution 1.0). [Data set], 1997. http://www.daviddlewis.com/resources/testcollections/reuters21578/). Results show that query expansion techniques and topic model-based classification rules in most studied settings tend to decrease rather than increase retrieval performance. Active supervised learning, however, if applied on a not too small set of labeled training instances (e.g. 1000 documents), reaches a substantially higher retrieval performance than keyword lists.
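The keyword baseline described in the abstract can be illustrated with a minimal sketch. The function names, the toy documents, the relevance labels, and the keyword list below are all hypothetical and are not taken from the study; the sketch only shows how a document counts as retrieved when it contains at least one keyword, and how an incomplete keyword list lowers recall.

```python
import re


def keyword_retrieve(documents, keywords):
    """Return indices of documents that contain at least one keyword.

    Matching is case-insensitive and on whole tokens (an assumption of
    this sketch, not a detail given in the abstract).
    """
    keyword_set = {k.lower() for k in keywords}
    retrieved = []
    for i, text in enumerate(documents):
        tokens = set(re.findall(r"\w+", text.lower()))
        if tokens & keyword_set:
            retrieved.append(i)
    return retrieved


def precision_recall_f1(retrieved, relevant):
    """Compute precision, recall, and F1 from retrieved and relevant indices."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical toy corpus; labels are illustrative only.
documents = [
    "The refugee crisis dominated the debate",    # relevant
    "Asylum seekers arrived at the border",       # relevant, no keyword match
    "The crisis in the banking sector deepened",  # irrelevant
    "Football results from the weekend",          # irrelevant
]
relevant = [0, 1]
keywords = ["refugee", "refugees"]  # deliberately incomplete list

retrieved = keyword_retrieve(documents, keywords)
precision, recall, f1 = precision_recall_f1(retrieved, relevant)
print(retrieved, precision, recall, round(f1, 3))
```

In this toy setting the second relevant document is missed because it uses "asylum seekers" rather than a listed keyword, which is the kind of omission that can bias downstream inferences.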
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9762672 | PMC |
| http://dx.doi.org/10.1007/s42001-022-00191-7 | DOI Listing |
Food Res Int
November 2025
School of Preclinical Medicine, Chengdu University, Chengdu, Sichuan 610106, China.
Background: Type 2 Diabetes Mellitus (T2DM) is a chronic metabolic disease characterized by insulin resistance and progressive decline in pancreatic beta cell function. It is a public health problem of great magnitude that has been increasing globally over the last 4 decades. The latest research has found that sugar-sweetened beverages (SSBs), as an important dietary risk factor, are closely related to the occurrence and development of T2DM.
Acad Psychiatry
September 2025
University of South Carolina School of Medicine, Greenville, SC, USA.
Objective: Application review is a lengthy time commitment. The objective of this study is to retrospectively compare the list of recommended applicants as generated by two processes: (1) faculty holistic review and (2) keyword search via Thalamus Cortex, residency application management software, to see how much overlap exists between the two strategies.
Methods: Faculty at the training program completed the traditional application review performed by manual, holistic review of each eligible application, and submitted scores on their top 10-15 applicants to the program director (PD).
Klin Mikrobiol Infekc Lek
June 2025
Department of Infectious Diseases and Travel Medicine, Second Faculty of Medicine, Charles University and University Hospital Motol, Prague, Czech Republic.
Skin and soft tissue infections (SSTIs) represent a diverse spectrum of conditions, including erysipelas, cellulitis, cutaneous abscesses, necrotizing fasciitis, and myonecrosis. Erysipelas and cellulitis are the most common community-acquired SSTIs. Erysipelas is typically caused by pyogenic streptococci, while cellulitis often has a staphylococcal etiology.
Epidemiol Rev
August 2025
INSPIIRE, Université de Lorraine, F-54000, Nancy, France.
This systematic review aimed to identify effect modification and interaction factors that moderate the association between socioeconomic status (SES) and smoking behavior among adolescents. We searched PubMed, Embase, PsycINFO, and Web of Science using keywords including "adolescents," "smoking," "inequality," "effect modification," and "interaction." Peer-reviewed articles published in English or French between January 1, 2011, and December 31, 2021, were included, alongside relevant studies identified from reference lists.
Medicina (Kaunas)
August 2025
Department of Diabetes, Nutrition and Metabolic Diseases, University of Medicine and Pharmacy "Carol Davila", 030167 Bucharest, Romania.
The management of type 2 diabetes (T2D) extends beyond glycemic control, requiring a more global strategy that includes optimization of body composition, even more so in the context of sarcopenia and visceral adiposity, as they contribute to poor outcomes. Past reviews have typically been focused on weight reduction or glycemic effectiveness, with limited inclusion of new therapies' effects on muscle and fat distribution. In addition, the emergence of incretin-based therapies and dual agonists such as tirzepatide requires an updated synthesis of their impacts on body composition.