J Comput Soc Sci
December 2024
This paper studies the performance of open-source Large Language Models (LLMs) in text classification tasks typical for political science research. By examining tasks like stance, topic, and relevance classification, we aim to guide scholars in making informed decisions about their use of LLMs for text analysis and to establish a baseline performance benchmark that demonstrates the models' effectiveness. Specifically, we assess both zero-shot and fine-tuned LLMs across a range of text annotation tasks using datasets of news articles and tweets.
Soc Netw Anal Min
August 2024
Twitter data has been widely used by researchers across various social and computer science disciplines. A common aim when working with Twitter data is the construction of a random sample of users from a given country. However, while several methods have been proposed in the literature, their comparative performance is mostly unexplored.
Some major social media companies are announcing plans to tokenize user engagements, derived from blockchain-based decentralized social media. This would bring financial and reputational incentives for engagement, which might lead users to post more objectionable content. Previous research showed that financial or reputational incentives for accuracy decrease the willingness to share misinformation.
Many NLP applications require manual text annotations for a variety of tasks, notably to train classifiers or evaluate the performance of unsupervised models. Depending on the size and degree of complexity, the tasks may be conducted by crowd workers on platforms such as MTurk as well as trained annotators, such as research assistants. Using four samples of tweets and news articles (n = 6,183), we show that ChatGPT outperforms crowd workers for several annotation tasks, including relevance, stance, topics, and frame detection.
Int J Data Sci Anal
December 2021
The COVID-19 pandemic resulted in an upsurge in the spread of diverse conspiracy theories (CTs) with real-life impact. However, the dynamics of user engagement remain under-researched. In the present study, we leverage Twitter data across 11 months in 2020 from the timelines of 109 CT posters and a comparison group (non-CT group) of equal size.
We study how easy it is to distinguish influence operations from organic social media activity by assessing the performance of a platform-agnostic machine learning approach. Our method uses public activity to detect content that is part of coordinated influence operations based on human-interpretable features derived solely from content. We test this method on publicly available Twitter data on Chinese, Russian, and Venezuelan troll activity targeting the United States, as well as the Reddit dataset of Russian influence efforts.