Related papers: The Pushshift Reddit Dataset
This contribution argues that Reddit, as a massive, categorized, open-access dataset, is a useful data source, for "almost any topic". Hence, it can be used in data science, e.g. for knowledge exploration. This statement is backed-up with…
The World Wide Web is a complex interconnected digital ecosystem, where information and attention flow between platforms and communities throughout the globe. These interactions co-construct how we understand the world, reflecting and…
Online forums provide rich environments where users may post questions and comments about different topics. Understanding how people behave in online forums may shed light on the fundamental mechanisms by which collective thinking emerges…
Stress is a nigh-universal human experience, particularly in the online world. While stress can be a motivator, too much stress is associated with many negative health outcomes, making its identification useful across a range of domains.…
There has been a dramatic increase in the popularity of utilizing social media data for research purposes within the biomedical community. In PubMed alone, there have been nearly 2,500 publication entries since 2014 that deal with analyzing…
Music engagement spans diverse interactions with music, from selection and emotional response to its impact on behavior, identity, and social connections. Social media platforms provide spaces where such engagement can be observed in…
Wikipedia is a rich and invaluable source of information. Its central place on the Web makes it a particularly interesting object of study for scientists. Researchers from different domains used various complex datasets related to Wikipedia…
Social media is very popular for facilitating conversations about important topics and bringing forth insights and issues related to these topics. Reddit serves as a platform that fosters social interactions and hosts engaging discussions…
Social media data has become a vital resource for studying mental health, offering real-time insights into thoughts, emotions, and behaviors that traditional methods often miss. Progress in this area has been facilitated by benchmark…
Existing image editing models struggle to meet real-world demands. Despite excelling in academic benchmarks, they have yet to be widely adopted for real user needs. Datasets that power these models use artificial edits, lacking the scale…
In the context of modern life, particularly in Industry 4.0 within the online space, emotions and moods are frequently conveyed through social media posts. The trend of sharing stories, thoughts, and feelings on these platforms generates a…
In the past few years, Reddit -- a community-driven platform for submitting, commenting and rating links and text posts -- has grown exponentially, from a small community of users into one of the largest online communities on the Web. To…
As researchers use computational methods to study complex social behaviors at scale, the validity of this computational social science depends on the integrity of the data. On July 2, 2015, Jason Baumgartner published a dataset advertised…
Proactively identifying misinformation spreaders is an important step towards mitigating the impact of fake news on our society. In this paper, we introduce a new contemporary Reddit dataset for fake news spreader analysis, called FACTOID,…
Messaging platforms, especially those with a mobile focus, have become increasingly ubiquitous in society. These mobile messaging platforms can have deceivingly large user bases, and in addition to being a way for people to stay in touch,…
Selective exposure is the main driver for the economy of attention when consuming online content. We select information adhering to our system of beliefs and ignore dissenting information. However, even personal interest is likely to play a…
We present a method for mapping Reddit communities that accounts for temporal shifts, using quantitative and qualitative analyses of clustering techniques to produce high-quality, stable, and meaningful maps for researchers, journalists and…
As millions of people use ChatGPT for tasks such as education, writing assistance, and health advice, concerns have grown about how personal prompts and data are stored and used. This study explores how Reddit users collectively negotiate…
High-Frequency Trading (HFT) is pivotal in cryptocurrency markets, demanding rapid decision-making. Social media platforms like Reddit offer valuable, yet underexplored, information for such high-frequency, short-term trading. This paper…
Substance use disorders (SUDs) are a growing concern globally, necessitating enhanced understanding of the problem and its trends through data-driven research. Social media are unique and important sources of information about SUDs,…