Andy Baio, application programming interfaces, Big Data, data manipulation, data visualisation, datasets, Dhiraj Murthy, DITA, Google Sheets, information architecture, JSON, markup languages, Martin Hawksey, metadata, Raffi Krikorian, social media, TAGS, Twitter
In an information era defined by an exponential growth in data output, fuelled by the connective and interactive technological possibilities provided by the Internet, there is perhaps no better exemplar of these phenomena than Twitter. Twitter is a social media website that allows its users to send short (140 characters or fewer) messages (“tweets”) that can be viewed instantly by their followers, or by other users searching for particular words or phrases. The scale of the enterprise is vast: Twitter estimates that it has 284 million monthly active users, and that 500 million tweets are sent by these users every day. Despite its brevity, each message also carries a great deal of associated metadata, as shown by this “Map of a Tweet” (in the markup language JSON) produced by Twitter developer Raffi Krikorian.
(The most re-tweeted tweet of 2012.)
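To give a feel for how much metadata travels with each tweet, here is a minimal Python sketch that reads a tweet encoded in JSON and pulls out a few fields. The sample tweet is invented and heavily trimmed, and `summarise_tweet` is a helper name of my own. The field names (`id_str`, `created_at`, `user.screen_name`, `entities.hashtags` and so on) follow Twitter's API v1.1 tweet object, so they may differ from those in Krikorian's original map.

```python
import json

# Invented sample data, trimmed to a handful of fields; not a real tweet.
raw = """
{
  "id_str": "1234567890",
  "created_at": "Sun Nov 02 14:05:00 +0000 2014",
  "text": "Enjoying the #citylis lab session on APIs",
  "retweet_count": 3,
  "in_reply_to_screen_name": null,
  "user": {"screen_name": "example_user", "followers_count": 120},
  "entities": {"hashtags": [{"text": "citylis"}], "user_mentions": []}
}
"""


def summarise_tweet(tweet):
    """Flatten a few of the nested metadata fields into one dictionary."""
    return {
        "id": tweet["id_str"],
        "author": tweet["user"]["screen_name"],
        "sent": tweet["created_at"],
        "characters": len(tweet["text"]),
        "hashtags": [h["text"] for h in tweet["entities"]["hashtags"]],
        "retweets": tweet["retweet_count"],
        "reply_to": tweet["in_reply_to_screen_name"],
    }


tweet = json.loads(raw)
summary = summarise_tweet(tweet)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Even this cut-down example carries more metadata than message, which is the point the map makes so vividly.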
Twitter’s importance in contemporary culture can be seen in its use by political leaders the world over (a verified list includes a total of 66). The role of Twitter, and other forms of social media, in influencing world events such as the Arab Spring has also been well discussed and documented, and its sociological implications are being studied by academics such as Dhiraj Murthy. This makes it a valuable tool for scholarly research, but there is a key problem that must be overcome: Twitter’s main strengths as a communications medium—its timeliness, conciseness and immediacy—and its popularity mean that older Tweets are continually buried beneath successive waves of newer ones. The site’s own search function is notoriously limited (only extending back for one week, for example), and despite recent improvements, is still not capable of producing useful data. For instance, searching for mentions of an extremely popular hashtag (a Twitter tool used to denote key words or phrases, e.g. #citylis, City University London’s Library and Information Science course) will not return all of the possible results due to bandwidth limitations imposed on individual user queries.
However, the technological possibilities opened up by APIs (as first mentioned in a previous post) offer a way around this problem. A user can create their own Twitter app to gain access to the Twitter platform (under strict conditions) and then use the ingenious TAGS tool, developed by Martin Hawksey. TAGS in turn uses the APIs of Twitter and Google Sheets (an online spreadsheet programme similar to Microsoft Excel) to automatically export the metadata, encoded in JSON, of selected Tweets (usually filtered by hashtag) into a database. One of my classmates, Daniel van Strien, has written more fully on the technicalities of how this is done, and I recommend that you read his blog to find out more.
These API processes result in a Google Sheets document which consists of four spreadsheets:
- A “Readme/Settings” sheet that the user manipulates to set the search parameters, and which also includes links to data visualisation tools (more on which later).
- An “Archive” sheet, consisting of the tabulated metadata of each tweet retrieved.
- A “Summary” sheet, listing users by number of Tweets about the desired hashtag, word or phrase, and also including other basic information derived from the Archive.
- A “Dashboard” sheet, consisting of the Archive data presented in graph and chart form.
My TAGS Google Sheets document, which is used to search for the #citylis hashtag, can be viewed here.
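The relationship between the Archive and Summary sheets can be sketched in a few lines of Python. This is only my illustration of the idea, not how TAGS itself is implemented: it assumes the Archive has been exported as CSV with a `from_user` column (the column name and the sample rows are assumptions), and `tweets_per_user` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# Invented rows standing in for an exported Archive sheet.
archive_csv = """from_user,text,created_at
alice,Hello #citylis,Mon Oct 27 10:00:00 +0000 2014
bob,@alice welcome to #citylis,Mon Oct 27 10:05:00 +0000 2014
alice,Reading about APIs #citylis,Tue Oct 28 09:30:00 +0000 2014
"""


def tweets_per_user(csv_text, user_column="from_user"):
    """Count how many archived tweets each user has sent."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[user_column] for row in reader)


counts = tweets_per_user(archive_csv)
# most_common() lists users from most to least active, like the Summary sheet.
for user, n in counts.most_common():
    print(user, n)
```

A real export would be read from a file rather than a string, but the counting step would be the same.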
Perhaps the most compelling result of TAGS, however, is the creation of tools to further manipulate and visualise the raw metadata. TAGS Archive creates something that looks, superficially at least, like a standard Twitter feed, but is in fact a fully-preserved archive of Tweets on the given search term that can also be filtered by screen name or tweet content, or searched chronologically, far more effectively than by using Twitter’s own search interface. TAGS Explorer is even more visually exciting, as it creates a map of interactions between all of the different users who have used the search term.
This is particularly useful as it clearly demonstrates Twitter activity—within the chosen parameters—over time. Another of my classmates, Shermaine Waugh, tweeted this image last Monday (October 27), showing a map of tweet replies between users within #citylis:
The equivalent image from today (November 2) indicates that new links have appeared and that existing ones have been strengthened after a further six days of Twitter activity (another advantage of using TAGS is that all tweets sent since the automated programme was set up are retained, not just those from the seven days before the most recent data export):
TAGS Explorer also allows the user to create more complex maps by including mentions (when a Twitter user mentions another user’s screen name in one of their Tweets):
An even more intricate map can be created by also including retweets (when a user republishes another user’s tweet without modification):
Finally, every node in the network can be clicked on to view each individual Twitter user’s interactions with the wider group (this can even be animated!), in this case myself:
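Stepping back from the screenshots, the reply, mention and retweet maps all rest on the same idea: each tweet that refers to another user becomes a link between two nodes. The Python sketch below is my own guess at that logic, not TAGS Explorer's actual code. The sample tweets, the `from_user` and `in_reply_to_screen_name` column names, and the `build_edges` helper are all assumptions, and spotting retweets by an "RT @" prefix is a simplification (the Twitter API marks them more reliably with a `retweeted_status` field).

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")

# Invented tweets; the keys mimic columns I assume exist in the Archive sheet.
tweets = [
    {"from_user": "alice", "text": "Hello #citylis", "in_reply_to_screen_name": ""},
    {"from_user": "bob", "text": "@alice welcome!", "in_reply_to_screen_name": "alice"},
    {"from_user": "carol", "text": "RT @alice: Hello #citylis", "in_reply_to_screen_name": ""},
    {"from_user": "bob", "text": "Good point from @carol and @alice", "in_reply_to_screen_name": ""},
]


def build_edges(tweets, include_mentions=False, include_retweets=False):
    """Return a Counter mapping (sender, target, kind) to a link weight."""
    edges = Counter()
    for t in tweets:
        sender = t["from_user"].lower()
        text = t["text"]
        reply_to = (t.get("in_reply_to_screen_name") or "").lower()
        # Simplification: treat any tweet starting with "RT @" as a retweet.
        if text.startswith("RT @"):
            if include_retweets:
                original_author = MENTION.search(text).group(1).lower()
                edges[(sender, original_author, "retweet")] += 1
            continue
        if reply_to:
            edges[(sender, reply_to, "reply")] += 1
        if include_mentions:
            for name in MENTION.findall(text):
                name = name.lower()
                # Skip the reply target (already counted) and self-mentions.
                if name != reply_to and name != sender:
                    edges[(sender, name, "mention")] += 1
    return edges


replies_only = build_edges(tweets)
with_mentions = build_edges(tweets, include_mentions=True)
everything = build_edges(tweets, include_mentions=True, include_retweets=True)
for (sender, target, kind), weight in sorted(everything.items()):
    print(f"{sender} -> {target} ({kind}, weight {weight})")
```

Each extra option adds links to the map, which is why the mention and retweet views look so much denser than the replies-only view. Filtering the result to the links that involve a single screen name would give the per-user view.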
These examples have all been rather self-indulgent, of use only to someone encountering these tools for the first time (or perhaps a future historian researching how Twitter was integrated into LIS university courses in the early 21st century? I can dream!). The same visualisation principles, however, can also be applied to areas of more value for scholarly research, or indeed public interest. For example, I mentioned the Arab Spring earlier, and this video shows the activity, in real time, concerning the #jan25 hashtag at the moment of former President Hosni Mubarak’s resignation:
Another example concerns Twitter itself, specifically its use in the current “Gamergate” controversy. American technologist and blogger Andy Baio used a set of 316,669 tweets and their associated metadata to produce a series of data visualisations that support a number of illuminating conclusions.
It is clear that Twitter offers a great variety of research possibilities and I am very much looking forward to continuing on this theme in the future lectures and lab sessions that comprise this module.