A large-scale dataset of AI-related tweets: Structure and descriptive statistics.

Nathalie de Marcellis-Warin, Daniel Kouloukoui, Thierry Warin

Data Brief

HEC Montréal, Full Professor at Department of International Business, 3000 de la Cote-Sainte-Catherine Ch, Montreal, Quebec, H3T2A7, Canada.

Published: October 2025


This article presents a curated and anonymized dataset of tweets related to artificial intelligence (AI), comprising 893,076 entries collected using the Twitter API between January 1, 2017, and July 19, 2021. These tweets were extracted from a larger initial corpus using the keyword "Artificial Intelligence" and subsequently filtered to ensure data quality, multilingual coverage, and public accessibility. The final dataset includes structured metadata such as media elements (images, videos, and URLs), user engagement metrics (likes, retweets, replies), hashtags, language codes, and temporal indicators (hour and weekday of posting). Additional linguistic features, such as text length and tokenization, were used in internal analyses but are not included in the public release. This dataset offers a robust foundation for research on the evolution of public discourse surrounding AI, including sentiment analysis, topic modeling, social engagement dynamics, and policy-relevant evaluations. It is openly available through established repositories and adheres to the FAIR principles, facilitating transparency, reproducibility, and interdisciplinary applications in computational social science, natural language processing, and AI governance research.
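The metadata fields listed in the abstract lend themselves to simple descriptive statistics. The following is a minimal sketch, not the authors' pipeline: the field names (`created_at`, `lang`, `likes`, `retweets`, `replies`, `hashtags`) and the three sample rows are hypothetical stand-ins, since the abstract does not give the released schema. It derives the hour and weekday indicators from a timestamp and tallies tweets by language, hour, weekday, and hashtag.

```python
from collections import Counter, defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical sample rows: field names and values are illustrative only,
# not the actual schema or contents of the published dataset.
SAMPLE_TWEETS = [
    {"created_at": "2017-01-01T09:15:00", "lang": "en", "likes": 12,
     "retweets": 3, "replies": 1, "hashtags": "AI;MachineLearning"},
    {"created_at": "2019-06-14T18:40:00", "lang": "fr", "likes": 4,
     "retweets": 0, "replies": 2, "hashtags": "IA"},
    {"created_at": "2021-07-19T09:05:00", "lang": "en", "likes": 30,
     "retweets": 11, "replies": 5, "hashtags": "AI"},
]

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def add_temporal_indicators(row):
    """Return a copy of the row with hour and weekday of posting added."""
    ts = datetime.fromisoformat(row["created_at"])
    return {**row, "hour": ts.hour, "weekday": WEEKDAYS[ts.weekday()]}


def describe(rows):
    """Compute a few descriptive statistics over a list of tweet rows."""
    rows = [add_temporal_indicators(r) for r in rows]

    # Group like counts by language code.
    likes_by_lang = defaultdict(list)
    for r in rows:
        likes_by_lang[r["lang"]].append(r["likes"])

    # Count hashtags case-insensitively; ';' is an assumed separator.
    hashtag_counts = Counter(
        tag.lower()
        for r in rows
        for tag in r["hashtags"].split(";")
        if tag
    )

    return {
        "n_tweets": len(rows),
        "tweets_per_lang": {k: len(v) for k, v in likes_by_lang.items()},
        "mean_likes_per_lang": {k: mean(v) for k, v in likes_by_lang.items()},
        "tweets_per_hour": dict(Counter(r["hour"] for r in rows)),
        "tweets_per_weekday": dict(Counter(r["weekday"] for r in rows)),
        "top_hashtags": hashtag_counts.most_common(3),
    }


if __name__ == "__main__":
    for name, value in describe(SAMPLE_TWEETS).items():
        print(name, value)
```

To run this against the real release, the sample list would be replaced by rows read from the downloaded files, with the field names adjusted to match the published column headers.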

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12361611
DOI: http://dx.doi.org/10.1016/j.dib.2025.111960


Similar Publications

ResDeepGS: A deep learning-based method for crop phenotype prediction.

Methods

September 2025

School of Computer and Information Engineering, Henan University, Kaifeng, Henan, China; Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng, Henan, China.

Chaokun Yan, Jiabao Li, Qi Feng, Junwei Luo, Huimin Luo

Genomic selection (GS) is a breeding technique that utilizes genomic markers to predict the genetic potential of crops and animals. This approach holds significant promise for accelerating the improvement of agronomic traits and addressing food security challenges. While traditional breeding methods based on statistical or machine learning techniques have been useful in predicting traits for some crops, they often fail to capture the complex interactions between genotypes and phenotypes.


Unveiling the Pathogenicity of Allosteric Protein Mutations via Multifaceted Feature Ensembling.

Methods

September 2025

Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.

Huiling Zhang, Xijian Li, Junwen Huang, Yuetong Li, Shaozhen Cai

Allosteric proteins play a central role in biological processes and systems. Identifying the biological impact of mutations on allosteric proteins and the phenotypes they influence during disease initiation and progression presents a significant challenge. In theory, computational methods have the potential to facilitate the interpretation of genetic variants in allosteric proteins on a large scale.


EndoChat: Grounded multimodal large language model for endoscopic surgery.

Med Image Anal

August 2025

The Chinese University of Hong Kong, 999077, Hong Kong Special Administrative Region of China.

Guankun Wang, Long Bai, Junyi Wang, Kun Yuan, Zhen Li

Recently, Multimodal Large Language Models (MLLMs) have demonstrated their immense potential in computer-aided diagnosis and decision-making. In the context of robotic-assisted surgery, MLLMs can serve as effective tools for surgical training and guidance. However, there is still a deficiency of MLLMs specialized for surgical scene understanding in endoscopic procedures.


Benchmarking AI-driven acoustic monitoring for floating marine debris: Challenges in deep learning-based debris extraction.

Mar Pollut Bull

September 2025

Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8563, Japan.

Xiaoteng Zhou, Katsunori Mizuno

Existing studies have identified a substantial amount of invisible floating debris in low-visibility marine environments, in addition to debris on the surface and seabed. These suspended pollutants represent a persistent and dynamic threat to marine ecosystems and maritime safety. Although sonar technology facilitates debris monitoring in low-visibility waters, the automatic extraction of small and weakly contrasted debris targets remains a critical challenge.


Enhancing fake news detection with transformer-based deep learning: A multidisciplinary approach.

PLoS One

September 2025

Department of Computer Science, COMSATS University Islamabad, Sahiwal, Pakistan.

Nabeel Raza, Said Jadid Abdulkadir, Yawar Abbas Abid, Sami S Albouq, Ayed Alwadain

The widespread dissemination of fake news presents a critical challenge to the integrity of digital information and erodes public trust. This urgent problem necessitates the development of sophisticated and reliable automated detection mechanisms. This study addresses this gap by proposing a robust fake news detection framework centred on a transformer-based architecture.
