A large-scale dataset of AI-related tweets: Structure and descriptive statistics.

Data Brief

HEC Montréal, Full Professor at Department of International Business, 3000 de la Cote-Sainte-Catherine Ch, Montreal, Quebec, H3T2A7, Canada.

Published: October 2025


Article Abstract

This article presents a curated and anonymized dataset of tweets related to artificial intelligence (AI), comprising 893,076 entries collected using the Twitter API between January 1, 2017, and July 19, 2021. These tweets were extracted from a larger initial corpus using the keyword "Artificial Intelligence" and subsequently filtered to ensure data quality, multilingual coverage, and public accessibility. The final dataset includes structured metadata such as media elements (images, videos, and URLs), user engagement metrics (likes, retweets, replies), hashtags, language codes, and temporal indicators (hour and weekday of posting). While additional linguistic features (such as text length and tokenization) were used in internal analyses, they are not included in the public release. This dataset offers a robust foundation for research on the evolution of public discourse surrounding AI, including sentiment analysis, topic modeling, social engagement dynamics, and policy-relevant evaluations. It is openly available through established repositories and adheres to the FAIR principles, facilitating transparency, reproducibility, and interdisciplinary applications in computational social science, natural language processing, and AI governance research.
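As a rough illustration of the kind of descriptive statistics such metadata supports, the following is a minimal sketch in Python using only the standard library. The column names (`language`, `likes`, `retweets`, `replies`, `hour`, `weekday`, `hashtags`), the semicolon-separated hashtag encoding, and the three sample rows are all assumptions made for this example; the released files may use different headers and formats.

```python
import csv
import io
from collections import Counter
from statistics import mean

# Hypothetical sample in an assumed CSV layout; not taken from the real dataset.
SAMPLE_CSV = """tweet_id,language,likes,retweets,replies,hour,weekday,hashtags
1,en,10,2,1,14,Monday,AI;MachineLearning
2,fr,0,0,0,9,Tuesday,IA
3,en,5,1,0,14,Monday,AI
"""


def describe(rows):
    """Return simple descriptive statistics for an iterable of tweet records."""
    rows = list(rows)
    # Count tweets per language code and per posting hour.
    languages = Counter(r["language"] for r in rows)
    hours = Counter(int(r["hour"]) for r in rows)
    # Split the assumed semicolon-separated hashtag field and count each tag.
    hashtags = Counter(
        tag for r in rows for tag in r["hashtags"].split(";") if tag
    )
    return {
        "n_tweets": len(rows),
        "languages": languages,
        "tweets_per_hour": hours,
        "top_hashtags": hashtags.most_common(3),
        "mean_likes": mean(int(r["likes"]) for r in rows),
        "mean_retweets": mean(int(r["retweets"]) for r in rows),
    }


stats = describe(csv.DictReader(io.StringIO(SAMPLE_CSV)))
```

To apply the same function to a real file, the `io.StringIO(SAMPLE_CSV)` argument could be swapped for an open file handle, after adjusting the column names to match the actual release.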

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12361611
DOI: http://dx.doi.org/10.1016/j.dib.2025.111960

Similar Publications

ResDeepGS: A deep learning-based method for crop phenotype prediction.

Methods

September 2025

School of Computer and Information Engineering, Henan University, Kaifeng, Henan, China; Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng, Henan, China.

Genomic selection (GS) is a breeding technique that utilizes genomic markers to predict the genetic potential of crops and animals. This approach holds significant promise for accelerating the improvement of agronomic traits and addressing food security challenges. While traditional breeding methods based on statistical or machine learning techniques have been useful in predicting traits for some crops, they often fail to capture the complex interactions between genotypes and phenotypes.


Allosteric proteins play a central role in biological processes and systems. Identifying the biological impact of mutations on allosteric proteins and the phenotypes they influence during disease initiation and progression presents a significant challenge. In theory, computational methods have the potential to facilitate the interpretation of genetic variants in allosteric proteins on a large scale.


EndoChat: Grounded multimodal large language model for endoscopic surgery.

Med Image Anal

August 2025

The Chinese University of Hong Kong, 999077, Hong Kong Special Administrative Region of China.

Recently, Multimodal Large Language Models (MLLMs) have demonstrated their immense potential in computer-aided diagnosis and decision-making. In the context of robotic-assisted surgery, MLLMs can serve as effective tools for surgical training and guidance. However, there is still a deficiency of MLLMs specialized for surgical scene understanding in endoscopic procedures.


Benchmarking AI-driven acoustic monitoring for floating marine debris: Challenges in deep learning-based debris extraction.

Mar Pollut Bull

September 2025

Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8563, Japan.

Existing studies have identified a substantial amount of invisible floating debris in low-visibility marine environments, in addition to debris on the surface and seabed. These suspended pollutants represent a persistent and dynamic threat to marine ecosystems and maritime safety. Although sonar technology facilitates debris monitoring in low-visibility waters, the automatic extraction of small and weakly contrasted debris targets remains a critical challenge.


The widespread dissemination of fake news presents a critical challenge to the integrity of digital information and erodes public trust. This urgent problem necessitates the development of sophisticated and reliable automated detection mechanisms. This study addresses this gap by proposing a robust fake news detection framework centred on a transformer-based architecture.
