Online Contributions to the Global Community: The Power Laws for New Media
Goran S. Milovanović

[note: a Serbian version of “Online Contributions to the Global Community: The Power Laws for New Media” is available on the Blic blog]
The Greek philosopher Aristotle lived and developed his philosophy in the 4th century B.C. (384 to 322 B.C.). Aristotle’s philosophy had such a profound influence on later developments of Western thought that one could reasonably ask whether any other author was ever more influential. According to standard editions, the Corpus Aristotelicum encompasses some 47 works, not counting fragments [1]. According to UNESCO [2], 206,000 books were published in the UK alone in 2005, and 172,000 new titles were published in the USA; the data available from bowkerinfo.com [3] report 561,590 new titles in the USA in 2008. By the time this text was finished, around 744,899 books had been published worldwide [4]. My question: who will read Aristotle ten thousand years from now?
Focus for a moment on your memory of some history textbook, or a historical atlas, where you could find illustrated timelines of the development of humankind. Less central events would be marked in small print, while essential changes would receive large, bold letters, standing against the tick marks that referred to years and centuries. The largest font would refer to the invention of writing and Arabic numerals, of the clock and of money, the birth of philosophy and drama in ancient Greece, the codification of Roman law, the discovery of linear perspective in the Renaissance, the invention of calculus and probability theory, of print, the novel, the telegraph, the steam engine, the airplane, nuclear fission, the computer… Soon the years and decades that span our lifetimes will be accompanied by enormous, bold letters: The Internet. In the summer of 2010, two billion people (around 1,966,514,816, more precisely), or 28.7% of the total world population (around 6,845,609,960, more precisely), were online [5].
Of course, the growth in the number of people online was followed by a growth of online information. People create and publish texts and photographs; they make videos and share music, remixes, animations. The explosion of social media like Facebook [6] and Twitter [7] naturally extended the technological capacities of any person with decent Internet access. The revolution of social media created Web 2.0: if you have a Google account, you can automatically use your Google or YouTube profile, or share photos and graphics on Picasa; Tumblr, WordPress, Blogger and who knows how many more platforms offer free blogging; add Digg or a few more social networks perhaps, Delicious for your bookmarks or Flickr for photos; link everything to your Facebook and Twitter accounts. Anyone can create content and share it with ease with whomever they choose. Wikipedia lets you start your own entry or edit an existing one. The direct consequence of this revolution is stunning: the number of authors who contribute to the knowledge base of humanity with their text, image, sound or video has exploded in comparison to our understanding of authorship in any medium only ten years ago.
In 2009, the American scientist Denis G. Pelli and the professor of graphic arts Charles Bigelow published a simple but very convincing analysis of the development of authorship and publishing [8]. They compared the historical growth of book authorship with the growth of authorship in the new media. According to their findings, the number of book authors has grown approximately tenfold each century since the beginning of the 15th century. In 2000, book authorship reached a rate of approximately one million authors yearly, accounting for some 0.01% of the world population (almost seven billion). The current trend of authorship in new media (social networks) follows a tenfold increase yearly, which on a logarithmic scale makes it one hundred times steeper than the increase in book authorship over centuries. The extrapolation of the Twitter authorship growth curve, with all users with 100 or more followers being considered authors, predicts a 100% rate of participation in authorship in 2013; after strengthening this criterion to 1000 or more followers, the Pelli-Bigelow model predicts that full participation in Twitter authorship will be reached only a year later.
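The “one hundred times steeper” comparison, and the one-year delay caused by the stricter criterion, can be checked with a little arithmetic on a logarithmic scale. The sketch below is only an illustration: the function names and the two starting shares are hypothetical values chosen for the example, not figures from Pelli and Bigelow, and constant tenfold yearly growth is assumed.

```python
import math


def orders_of_magnitude_per_year(factor: float, period_years: float) -> float:
    """Slope of growth on a log10 scale: powers of ten gained per year."""
    return math.log10(factor) / period_years


def years_until_everyone(share_now: float, factor_per_year: float = 10.0) -> float:
    """Years of constant multiplicative growth needed for a share to reach 100%."""
    return math.log(1.0 / share_now) / math.log(factor_per_year)


book_slope = orders_of_magnitude_per_year(10, 100)  # tenfold per century
media_slope = orders_of_magnitude_per_year(10, 1)   # tenfold per year
steepness_ratio = media_slope / book_slope          # about 100

# Hypothetical starting shares (assumptions, not data): a stricter criterion
# met by ten times fewer users costs one extra year of tenfold growth.
delay = years_until_everyone(0.0001) - years_until_everyone(0.001)

print(round(steepness_ratio), round(delay, 6))
```

Under these assumptions the delay does not depend on the starting share itself, only on the ratio between the two criteria, which is why a tenfold stricter threshold shifts the extrapolated date by a single year.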
Hundreds of millions of people are authors on Twitter, Facebook, Flickr, etc. But who reads, listens to, and thinks about all this production, and when? Is this growth of intellectual output leading humanity into an unbearable information noise, where problems arise from having too much rather than too little information? My answer is: no. In spite of this eruption of knowledge and information, or the impression of a constant inflow of “the new” that follows new media, our contributions to the global human knowledge and information base still follow a regularity recognized long ago: the power law.
Power laws describe a regular relationship between the frequency of some event and the rank of that frequency. Take for example the number of followers a user has on Twitter. The number of followers expresses the frequency of an event, while its rank is obtained simply by ordering users on the basis of that number. Or, one can count how many Twitter users have between 1 and 100 followers, 101 and 200 followers, and so on, and then order these intervals on the basis of how many users fall into each of them. Our intuition tells us that most people will be found in the interval between 1 and 100 followers, followed by the people who have between 101 and 200 followers, etc. The following graphs depict this relationship in four datasets: the number of followers on Twitter, the number of views received by the most popular entries on (English language) Wikipedia, the number of sold copies of the best selling books in the history of publishing, and the number of unique visitors that viewed the most visited websites.

Figure 1. The power laws depicting (a) the distribution of the number of followers on Twitter (K = thousands of followers), (b) the distribution of the number of views per entry for the 995 most visited Wikipedia entries (M = millions of views), (c) the distribution of the number of sold copies for the best selling books in the history of publishing, and (d) the distribution of the number of unique visitors for the 1000 most visited websites on the Internet. All x-axes represent ranks, and all y-axes represent frequencies. The sources for the datasets depicted in Figure 1 and the statistical properties of the corresponding power laws are presented and discussed in note [9].
Let’s pay attention to the first graph in Figure 1. It depicts the relationship between the number of Twitter followers and the corresponding rank. Spot the lonely point in the upper left of the graph: that point tells us that most of the 11.5 million users in the sample used to create the graph received the rank of 1, having fewer than 100 followers on Twitter. They account for around 94% of the total number of users. That means that all “interesting” Twitter users, those having 101 or more followers, fall into some 6% of the total; those with more than 3901 followers, for example, comprise only slightly more than 0.11% of users. To illustrate the skewness of the distribution, pop singer Lady Gaga is ranked 1 with more than 6.5 million followers, followed by her colleague Britney Spears with a little more than 6 million followers. This should illustrate how rarely interesting phenomena are found in the huge amount of information characteristic of contemporary online developments. As one can see in the second graph, the same regularity is found for the number of views of the most visited Wikipedia entries (the most visited entry is the one on the actress Brittany Murphy, with around 5 million views, followed by the entry on the movie “Avatar”, which received around 4.5 million views). The same regularity seems to hold for the number of sold copies of the best selling books (with a caveat for the bestsellers dataset: see the discussion in note [9]) and the number of unique visitors of the most visited websites. The power law for media: only a small fraction of the total population of events consists of cases that receive a huge share of the attention.
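The binning and ranking behind a graph like the Twitter one can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name `rank_frequency` is hypothetical, and the follower counts are synthetic draws from a heavy-tailed distribution (shape 1.2 and scale 10 are arbitrary choices), not the sample described in note [9].

```python
import random
from collections import Counter


def rank_frequency(follower_counts, step=100):
    """Bin counts into intervals of width `step` and rank the bins by size.

    Returns (rank, (low, high), number_of_users) tuples, where rank 1 is the
    interval that holds the most users.
    """
    # 1-100 followers -> bin 0, 101-200 -> bin 1, and so on;
    # users with 0 followers are counted in the first interval.
    bins = Counter((max(n, 1) - 1) // step for n in follower_counts)
    ordered = sorted(bins.items(), key=lambda item: item[1], reverse=True)
    return [
        (rank, (index * step + 1, (index + 1) * step), users)
        for rank, (index, users) in enumerate(ordered, start=1)
    ]


# Synthetic, heavy-tailed follower counts (an assumption, NOT the real data).
rng = random.Random(0)
sample = [int(10 * rng.paretovariate(1.2)) for _ in range(100_000)]

table = rank_frequency(sample)
top_rank, top_interval, top_users = table[0]
print(top_interval, round(100 * top_users / len(sample), 1), "% of users")
```

Plotting the third element of each tuple against the first would give a rank-frequency graph of the same kind as the panels in Figure 1, with the lonely rank-1 point holding the great majority of the synthetic users.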
Power laws are ubiquitous in human societies. Historically they are most often linked to the name of the famous Italian economist and sociologist Vilfredo Pareto (born Wilfried Fritz Pareto) [10], who noticed in 1906 that 80% of the land in Italy was owned by 20% of the population. The 80/20 ratio is often mentioned in relation to power laws, but it is not obligatory in any way (as the Twitter example illustrates). Alfred James Lotka [11, 12], a US demographer born in Austria-Hungary, was the first to realize that around 60% of authors in the fields of physics and chemistry published only a single paper, with the number of authors decreasing sharply as the number of publications per author grows, so that only a small fraction of them publish a large number of papers. The distribution of personal contributions to an intellectual discipline is known as Lotka’s curve and is an instance of the power law for intellectual contribution. Power laws describe relationships in the distribution of wealth, income, publishing and authorship, word frequency in language, achievements in sports and human production of any other kind: they represent one of the fundamental statistical laws that govern our societies in general.
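The figure of around 60% single-paper authors is what one would expect if the number of authors with n papers falls off as 1/n². The text does not state an exponent, so the inverse-square form (the one usually associated with Lotka’s law) is an assumption here, as are the function name and the cutoff used to truncate the sum.

```python
import math


def lotka_share(papers: int, exponent: float = 2.0, max_papers: int = 100_000) -> float:
    """Share of authors with exactly `papers` publications under an inverse-power law."""
    # Normalizing constant: sum of k^(-exponent), truncated at max_papers.
    normalizer = sum(k ** -exponent for k in range(1, max_papers + 1))
    return papers ** -exponent / normalizer


single_paper_share = lotka_share(1)
print(round(single_paper_share, 3))
```

With an exponent of 2 the share of single-paper authors approaches 6/π², roughly 61%, and each further paper becomes rapidly rarer: authors with two papers are a quarter as common as authors with one.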
In order to provide a more complete illustration of the power law, we proceed by comparing its theoretical model with the more common model of the so-called bell curve. Let’s study Figure 2. The bell-curve model (whose proper name is the normal, or Gaussian, distribution) describes a probability distribution where the majority of cases group around a single point, the distribution’s mean, with the probability of extreme cases, those falling far from the mean, decreasing rapidly (precisely: in the tails of the normal distribution the probability density falls off as the exponential of the negative squared distance from the mean, which is faster than any power of that distance). The Gaussian curve represents a boring world: the huge majority of events are close to the average, while extreme events are extreme rarities. In contrast, the probability distribution that describes the power law, the Pareto distribution (it is termed the Pareto distribution in the case of continuous data, and Zipf’s distribution in the case of discrete data), has the property of allocating high probabilities to only a small fraction of events while letting extreme events appear along its “long tail” (or “heavy tail”), with enough probability left for them to occur far less rarely than under the Gaussian distribution.

Figure 2. The Gaussian (normal) distribution is depicted on the left panel, and the Pareto distribution on the right panel. The way to think about probability distributions is the following: the area under the curve represents probability, and this area measures 1 in total. The probability of a specific event, defined as an interval between any two points on the X-axis, is given by the sub-area under the curve bounded by the corresponding interval (technically, it is given by a definite integral of the probability density function). Extreme events are placed away from the mean of the symmetrical Gaussian distribution and receive extremely low probabilities. The Pareto distribution, on the other hand, “treats exceptions as a rule”: because of the allocation of probability on the long tail of the distribution, extreme events receive enough probability to occur more often than in the processes governed by the Gaussian model.
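The difference between the two tails can be made concrete by computing the area under each curve beyond a given point. The following is a rough sketch, not a fair like-for-like comparison: the function names are hypothetical, and the parameters (a standard normal, and a Pareto distribution with scale 1 and shape 2) are arbitrary assumptions chosen only to show how differently the two tails behave.

```python
import math


def normal_tail(x: float, mean: float = 0.0, sd: float = 1.0) -> float:
    """P(X > x) for a Gaussian variable, via the complementary error function."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2.0)))


def pareto_tail(x: float, x_min: float = 1.0, alpha: float = 2.0) -> float:
    """P(X > x) for a Pareto variable with scale x_min and shape alpha."""
    return 1.0 if x < x_min else (x_min / x) ** alpha


def interval_probability(tail, a: float, b: float) -> float:
    """Area under the density between a and b, i.e. P(a < X <= b)."""
    return tail(a) - tail(b)


# Probability of exceeding increasingly extreme values under each model.
for x in (2.0, 5.0, 10.0):
    print(x, normal_tail(x), pareto_tail(x))
```

Under these assumed parameters, a value beyond 10 has probability 0.01 in the Pareto model, while in the Gaussian model it is smaller by many orders of magnitude; that gap is what “treating exceptions as a rule” amounts to numerically.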
Every now and then, more often than in the boring world of the Gaussian distribution, events from the Pareto distribution arise to shake our world. Charles Murray, a libertarian, a political scientist and himself an important author, has quantified and statistically analyzed the history of human intellectual contribution in numerous areas, and documented his research in the monumental “Human Accomplishment” [12], a book highly recommended to anyone interested in the matter. History brought a tremendous number of musicians, with only a small fraction of them composing their own music, and only from time to time, but often enough to occupy our attention, it gave us Bach, Mozart, Beethoven, and Wagner. A huge number of people practice photography; in the last two decades, thanks to digital photography, Adobe Photoshop, Gimp, Flickr and photoblogs, we have been overwhelmed with photo production; again, the works of Cartier-Bresson, Salgado, Josef Koudelka and Robert Capa remain the highlights of this artistic endeavor. Among the thousands of tennis players who change ranks on the ATP list, Roger Federer appears and wins 16 Grand Slam tournaments. In the age of modern literature, among millions of authors and titles, Borges, Kafka, Robert Musil and Hermann Broch created works which are of indispensable value for the study of literature. Aristotle, who lived in the 4th century B.C. and whose works have been (un)carefully collected, archived and systematized for centuries, is still read by thousands of scholars, two and a half millennia after his death. Google any of these or other names that contributed significantly to any area of human accomplishment, count the number of search results that you get, and you will soon be able to sketch by hand any power law of your interest.
The huge inflow of knowledge and information enabled by the Internet and social media will not and cannot cause an informational chaos which would leave us without criteria for our choices and judgments. Will Aristotle be read ten thousand years from now? My answer: very probably, yes. The power law, which arises as a consequence of complex interactions among people, our peer-to-peer exchanges of judgments of quality (guess the distribution of Facebook “likes” or “retweets” on Twitter), described by networks in graph theory [13], is there to rescale our attention, which would otherwise scatter randomly among countless authors, contributions and events. Above all, this process summarizes the most significant contribution of the Internet to humankind: the opportunity for all to participate, create, publish and choose.
Sources and Notes
[1] Aristotle (Ἀριστοτέλης), a Greek philosopher, was a student of Plato and mentor of Alexander the Great. More about the systematization of the Corpus Aristotelicum can be found at http://en.wikipedia.org/wiki/Corpus_Aristotelicum. It is widely considered that only around one-third of his original writings were collected during the course of history.
[2] Source: http://www.worldometers.info/books/, accessed: September 27, 2010, 12:59:54 AM.
[3] Source: http://www.bowkerinfo.com/bowker/IndustryStats2010.pdf, accessed: September 27, 2010, 03:29:31 PM.
[4] Source: http://www.worldometers.info/books/, accessed: September 28, 2010, 03:34:37 PM.
[5] Source: Internet World Stats Usage and Population Statistics, http://www.internetworldstats.com/stats.htm, accessed: September 27, 2010, 11:44:23 PM. The data present estimates of June 30, 2010.
[6] According to statistics available from http://www.facebook.com/press/info.php?statistics (June 30, 2010), more than 500 million people actively use this social network, and half of them log on every day; the average user has 130 friends; there are more than 900 million objects (pages, groups, events, etc.) that users interact with; an average user creates 90 pieces of content monthly, with more than 30 billion (!) content objects (links, news, blogs, notes, photo albums and similar) shared each month.
[7] Twitter is used by 190 million people monthly, who post 65 million tweets daily. Source: http://techcrunch.com/2010/06/08/twitter-190-million-users/, accessed: September 28, 2010, 03:51:45 PM.
[8] Pelli, D. G. & Bigelow, C. (2009). A Writing Revolution. SEEDMAGAZINE.COM, October 20, 2009. Source: http://seedmagazine.com/content/article/a_writing_revolution/, accessed: September 28, 2010, 03:54:25 PM.
[9] Datasets and the Estimation of Power Laws
(A) Twitter dataset. Source: http://www.sysomos.com/insidetwitter/, accessed: September 27, 2010, around 11:00 PM. The sample in this research encompassed 11.5 million Twitter users. The data on the graph are frequencies and corresponding ranks based on categories of the number of followers per user, binned in steps of 100 followers and ranging from fewer than 100 to more than 3901 followers. The graph and the estimation of the power law rely on 36 data points after excluding the categories with more than 3501 followers (these categories received 0% of users after rounding to two decimal places).
(B) Wikipedia dataset. Source: http://stats.grok.se/en/201009/, accessed: September 27, 2010, around 11:00 PM. This statistical inventory encompasses the 1000 most viewed entries on the English language Wikipedia. The five most frequently visited pages were excluded from the analysis because they refer to system messages and pages that are not part of the content of the encyclopedia. The power law was estimated from 995 observations.
(C) Bestsellers dataset. Source: http://en.wikipedia.org/wiki/List_of_best-selling_books, accessed: September 27, 2010, around 11:00 PM. This dataset is of the lowest quality among the datasets presented in Figure 1; the comments in the source of this dataset frequently include warnings that the data are based on unreliable estimates. The graph and the estimation of the corresponding power law rely on 103 observations.
(D) Websites dataset. Source: http://www.google.com/adplanner/static/top1000/, accessed: September 27, 2010, around 11:00 PM. The data are based on the number of unique visitors received by the 1000 most visited websites on the Internet.
(E) Estimation of power laws. For the illustrative purposes of this article, the power-law fits to the empirical data were performed by a standard least-squares procedure in logarithmic coordinates. However, because of the above-mentioned properties of heavy-tailed phenomena, fitting power laws is a technically challenging task, and anyone who attempts to establish them on firm ground should use a maximum-likelihood estimation procedure. A recent paper that presents maximum-likelihood estimates for power laws is A. Clauset, C.R. Shalizi, and M.E.J. Newman, “Power-law distributions in empirical data”, SIAM Review 51(4), 661-703 (2009), available online at: http://arxiv.org/abs/0706.1062. Since power laws take linear forms in logarithmic coordinates, the fitting of power laws to the four datasets was performed by a linear regression of log(frequency) on log(rank). The fitting procedures yielded the following results: (a) Twitter set, R² = .967, F(1,34) = 987.96, p < .01, slope = -2.25, intercept = 6.352; (b) Wikipedia set, R² = .997, F(1,993) = 306313.14, p < .01, slope = -.488, intercept = 6.694; (c) Bestseller set, R² = .948, F(1,101) = 1832.768, p < .01, slope = -.758, intercept = 8.578; (d) Websites set, R² = .997, F(1,998) = 347470.336, p < .01, slope = -.765, intercept = 8.877. The R² values are squared correlation coefficients, bounded above by 1, and describe the strength of the linear relationship between the variables. F values are used to test the statistical significance of the linear relationship, while p stands for the probability of a Type I error (and calls for a prolonged explanation which we will not undertake here). The slope of the linear function describes the rate of change in one variable that accompanies a change in the other, while the intercept represents the point where the linear function intersects the Y-axis. Figure 3 presents the signatures of power laws (their linear forms in logarithmic coordinates).
Once again, the bestsellers dataset provides only data of low quality, and the corresponding estimated power law is the most suspicious one; it is questionable whether this dataset would prove to obey a power law under a maximum-likelihood estimation procedure.

Figure 3. Power laws for Twitter, Wikipedia, Bestsellers and Websites datasets plotted in logarithmic coordinates.
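For readers who would like to try the procedure from note [9] (E) themselves, here is a minimal sketch of a least-squares fit in logarithmic coordinates, together with the continuous maximum-likelihood estimator discussed by Clauset, Shalizi and Newman. Several things are assumptions rather than a record of how the figures in this article were produced: the function names are hypothetical, base-10 logarithms are used (the base affects the intercept but not the slope), and both datasets below are synthetic. Note also that the maximum-likelihood estimator targets the exponent of the density of the raw values, which is not the same quantity as the slope of a rank-frequency line.

```python
import math
import random


def fit_loglog(ranks, freqs):
    """Least-squares line for log10(freq) against log10(rank).

    Returns (slope, intercept, r_squared).
    """
    xs = [math.log10(r) for r in ranks]
    ys = [math.log10(f) for f in freqs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy ** 2 / (sxx * syy)


def f_statistic(r_squared, n):
    """F(1, n - 2) for a simple linear regression with n observations."""
    return r_squared / (1.0 - r_squared) * (n - 2)


def mle_exponent(values, x_min):
    """Continuous maximum-likelihood estimate of alpha in p(x) ~ x ** -alpha."""
    tail = [v for v in values if v >= x_min]
    return 1.0 + len(tail) / sum(math.log(v / x_min) for v in tail)


# Synthetic rank-frequency data: slope -2, intercept 6, a little noise.
rng = random.Random(1)
ranks = list(range(1, 37))
freqs = [10 ** (6.0 - 2.0 * math.log10(r) + rng.gauss(0.0, 0.05)) for r in ranks]
slope, intercept, r2 = fit_loglog(ranks, freqs)

# Synthetic continuous power-law sample with alpha = 2.5 (inverse transform).
draws = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(50_000)]
alpha_hat = mle_exponent(draws, x_min=1.0)

print(round(slope, 2), round(intercept, 2), round(r2, 3), round(alpha_hat, 2))
```

The helper `f_statistic` shows how the F values relate to R² and the number of observations: applied to the rounded Twitter figures (R² = .967, 36 points) it gives roughly 996, close to the reported 987.96, with the gap plausibly due to rounding of R².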
[10] Wikipedia entry on Vilfredo Pareto: http://en.wikipedia.org/wiki/Vilfredo_Pareto, Pareto’s principle: http://en.wikipedia.org/wiki/Pareto_principle, New School University: http://homepage.newschool.edu/het//profiles/pareto.htm, accessed: September 28, 2010, 05:15:42 PM.
[11] Wikipedia entry on Alfred James Lotka: http://en.wikipedia.org/wiki/Alfred_J._Lotka, accessed: September 28, 2010, 05:17:15 PM.
[12] Murray, C. (2003). Human Accomplishment: The Pursuit of Excellence in the Arts and Sciences, 800 B.C. to 1950. HarperCollins. Wikipedia on Murray’s research: http://en.wikipedia.org/wiki/Human_Accomplishment, accessed: September 28, 2010, 05:20:43 PM.
[13] Barabási, A.L. (2002). Linked: The New Science of Networks. Perseus Publishing, Cambridge, Massachusetts.