Related papers: Word statistics in Blogs and RSS feeds: Towards em…
Collective human behaviors are analyzed using the time series of word appearances in blogs. As expected, we confirm that the number of fluctuations is approximated by a Poisson distribution for very-low-frequency words. A non-trivial…
To uncover the underlying mechanism of collective human dynamics, we survey more than 1.8 billion blog entries and observe the statistical properties of word appearances. We focus on words that show dynamic growth and decay with a tendency to…
We observe the statistical properties of blogs that are expected to reflect social human interaction. First, we introduce a basic normalization preprocessing step that enables us to evaluate the genuine word frequency in blogs that are…
To elucidate the non-trivial empirical statistical properties of fluctuations of a typical non-steady time series representing the appearance of words in blogs, we investigated approximately five billion Japanese blogs over a period of six…
Background: Zipf's discovery that word frequency distributions obey a power law established parallels between biological and physical processes, and language, laying the groundwork for a complex systems perspective on human communication.…
Online communities offer a great opportunity to investigate human dynamics, because much information about individuals is registered in databases. In this paper, based on data statistics of online comments on blog posts, we first present…
What dynamics govern a time series representing the appearance of words in social media data? In this paper, we investigate an elementary dynamics, from which word-dependent special effects are segregated, such as breaking news, increasing…
Ultraslow diffusion (i.e. logarithmic diffusion) has been extensively studied theoretically, but has hardly been observed empirically. In this paper, firstly, we find the ultraslow-like diffusion of the time-series of word counts of already…
The rate of occurrence of words is not uniform but varies from document to document. Despite this observation, parameters for conventional n-gram language models are usually derived using the assumption of a constant word rate. In this…
The distribution of frequency counts of distinct words by length in a language's vocabulary will be analyzed using two methods. The first will look at the empirical distributions of several languages and derive a distribution that…
This paper describes the analysis of quantitative characteristics of frequent sets and association rules in the posts of Twitter microblogs related to different event discussions. For the analysis, we used a theory of frequent sets,…
Online social media such as the micro-blogging site Twitter have become a rich source of real-time data on online human behaviors. Here we analyze the occurrence and co-occurrence frequency of keywords in user posts on Twitter. From the…
In many complex systems studied in statistical physics, inter-arrival times between events such as solar flares, trades and neuron voltages follow a heavy-tailed distribution. The set of event times is fractal-like, being dense in some time…
We study the dynamics of public media attention by monitoring the content of online blogs. Social and media events can be traced by the propagation of word frequencies of related keywords. Media events are classified as exogenous - where…
Current models for opinion dynamics typically utilize a Poisson process for speaker selection, making the waiting time between events exponentially distributed. Human interaction tends to be bursty, though, having higher probabilities of…
It is part of our daily social-media experience that seemingly ordinary items (videos, news, publications, etc.) unexpectedly gain an enormous amount of attention. Here we investigate how unexpected these events are. We propose a method…
Inspired by previous works on human dynamics, we collect the temporal statistics of the article creation by three Western scientists and an Eastern writer. We investigate the distributions of the time intervals between the creations of…
The massive diffusion of online social media allows for the rapid and uncontrolled spreading of conspiracy theories, hoaxes, unsubstantiated claims, and false news. Such an impressive amount of misinformation can influence policy…
Recent observations in the theory of verse and empirical metrics have suggested that constructing a verse line involves a pattern-matching search through a source text, and that the number of found elements (complete words totaling a…
Weblogs are the fourth means of network communication after email, BBS and MSN. Most bloggers begin to write blogs with great interest, and their interest then gradually reaches a balance over time. In order to describe the…