AA #5a: Measuring User Influence in Twitter: The Million Follower Fallacy

25 Apr

The final article analysis covers a paper on measuring user influence in Twitter. I thought that this paper would be particularly interesting given that we have been using Twitter over the course of the past semester, and thus we have (most likely, if not definitely) noticed that there are users who will follow us on a whim, follow us and then drop off of Twitter, follow and then leave the listing, or could not be less interested. This paper tries to understand how influential an individual can be by quantifying the behaviors of users.

The Twitter data was used to measure three primary factors: indegree (the number of followers a user has), retweets, and mentions. The researchers were allowed to gather data from the Twitter website at scale, and the Twitter API was also used to gather information about a user’s social links and retweets. The study focused on 6,189,636 users across Twitter who were steadily active, had valid user names, and generated an outside network of 52 million users.
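To make the three measures concrete, here is a minimal sketch of how they could be counted from raw data. This is not the researchers' code: the toy `follows` and `tweets` data, the `influence_measures` function, and the rule that a retweet is marked by "RT" or "via" right before the @name are all my own assumptions for illustration.

```python
from collections import Counter

# Hypothetical toy data: (follower, followed) pairs and (author, text) tweets.
follows = [("ann", "bob"), ("cat", "bob"), ("bob", "ann")]
tweets = [
    ("ann", "RT @bob: big news today"),
    ("cat", "RT @bob: big news today"),
    ("bob", "thanks @ann for the link"),
]

def influence_measures(follows, tweets):
    """Count indegree, retweets, and mentions per user name."""
    # Indegree: how many users follow each account.
    indegree = Counter(followed for _, followed in follows)
    retweets = Counter()
    mentions = Counter()
    for _, text in tweets:
        words = text.split()
        for i, word in enumerate(words):
            if not word.startswith("@"):
                continue
            name = word.strip("@:,.!?")
            if not name:
                continue
            # Assumption: "RT @name" or "via @name" marks a retweet;
            # any other @name counts as a mention.
            if i > 0 and words[i - 1] in ("RT", "via"):
                retweets[name] += 1
            else:
                mentions[name] += 1
    return indegree, retweets, mentions

indegree, retweets, mentions = influence_measures(follows, tweets)
print(indegree["bob"], retweets["bob"], mentions["ann"])
```

In this toy data, bob comes out ahead on followers and retweets while ann only picks up a mention, which is the kind of difference between measures the paper is interested in.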

To commence the study, users were ranked on each influence measure, and the rankings were compared using Spearman’s rank correlation coefficient, which measures the strength of association between two ranked data sets. These same measures were used to determine whether influence would vary based on topic genre. This second part was done by using popular topics that exploded in pop culture in 2009: the Iranian election, the outbreak of H1N1, and the death of Michael Jackson. These were chosen since, while popular, they span the political, health, and social genres. Relevance within the tweets was established using the same keywords that were associated with stories on popular news sites. Using this first distribution, it was established that over 40% of the users within this population knew of at least one of the three topics.
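For anyone who has not met Spearman’s coefficient before, the sketch below shows the idea: rank each list, then compute an ordinary correlation on the ranks. The helper names (`average_ranks`, `spearman`) and the follower and retweet counts are made up for illustration, and I am assuming tied values share their average rank. It is not taken from the paper.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank lists."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    # Note: undefined (division by zero) if either list is constant.
    return cov / (var_x * var_y) ** 0.5

# Hypothetical counts for four users: followers vs. retweets received.
followers = [1000, 50, 300, 20]
retweeted = [5, 40, 12, 1]
rho = spearman(followers, retweeted)
print(round(rho, 3))
```

A value near 1 means the two measures rank users in almost the same order, while a value near 0 means a user’s follower rank says little about their retweet rank.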

Within these three genres, it was found that less than 2% of users discussed all three topics. However, for the purposes of this study, that yielded 13,219 users, which was more than enough for statistical analysis. The researchers then studied how influence played out across a given topic, and how a user’s influence was determined by the three factors listed previously. It was found that retweets were significantly more likely to hold influence over time, while mentions started to lose significance as time passed. Essentially, as long as the news was still circulating, it was significant. The mention of the individual who originally posted it became somewhat of a passing thought as time elapsed. The concluding remarks establish that influence is not gained spontaneously or accidentally, but through very concentrated efforts. Influence is most likely gained through great personal involvement. Lastly, topics play a role in what is and isn’t influential, as more ‘important topics’ (Iran presidency) are more likely to circulate than those that have little impact (H1N1 in US) to no impact (Jackson death).

I should start my reflection by stating this paper was extremely difficult to read. The verbiage used, the extensive number of graphs, and the discussions made me consider suicide. It’s a damn hard paper. I can appreciate the results and information that is being passed along, but couldn’t it be simplified just a little more?! That said, some of the results were extremely interesting, and I can’t much knock them for such a large population. However, it makes me wonder about some of the users that got knocked out for factors that might be intentional, such as an invalid user name. I’ve seen a lot of ‘silly’ or ‘invalid’ user names that actually contribute a great deal of significant conversation, especially in the political debate world. Granted, it can be relatively easy to pull out which users are spam bots. That said, I’ve also found a few ‘experts’ who somewhat function as their own spam bots, as they post the same link on a more than regular basis, and break it up by including a link of them eating ice cream every 10 links or so. The selection process makes me slightly nervous: at what point can you argue that this is oversaturation of a population? I also think it would have been nice (though much more work and much more painful reading) to see a broader range of topics, or a more unique time frame. That isn’t to say that the Iranian election isn’t interesting, but there can be more ways to break down that pool of data (before, during, after), or even going as far as establishing what was being retweeted. Again, I’m aware that it’s a lot of extra work. Overall, I think this paper does very well in shedding light on how influence is established across a constantly changing medium.

-Vanessa B!

Posted on April 25, 2011 in Uncategorized

