Update: The Twitter Mood Map now uses Mapbox to display mood data. Hover over each state to see information about how the color is calculated. You can also click “Show Tweets” to visualize the number of Tweets in each state and then zoom in to see information about individual Tweets.
For a long time I’ve been wanting a Raspberry Pi (a credit-card-sized computer that costs about $40) without really knowing what I wanted to do with it. There are so many possibilities and tutorials out there, but I wanted to do something I could call my own. That’s when I decided to make a Raspberry Pi Twitter Mood Map of the United States. Of course, this concept is not entirely new, but my unique take is that I want to make it into a physical object – more work of art than tech toy.
I haven’t yet begun building the Pi Mood Map, but in preparation I’ve written two Python programs that cover the basic software requirements. I then used the results to create the dynamic live mood map you see above. Eventually the same mechanisms will control a physical map, which I’ll post about some time in the future. For now, I’ll talk about some of the Python modules I used to make this a reality and how the system is deployed as a web service.
To produce the result seen above I needed four things: 1) a way to hook into the Twitter API from Python, 2) a way to reverse-geocode the coordinates in Twitter data into a state of origin, 3) a way to analyze the sentiment of each Tweet, and 4) a way to persist the information so that it can be averaged and displayed.
Twython: Twitter API for Python
It’s all in the title. Twython is probably the most popular Python library for accessing Twitter, and for good reason: it is actively maintained and supports the latest version of Twitter’s API. You will need to register a Twitter application and have a basic grasp of OAuth (the mechanism Twitter uses to authorize API calls), but beyond that it’s very easy to use, and you can have an app up and running in no time.
I used the “filter” streaming API. This lets you hold a long-lived HTTP connection to the Twitter servers that provides a live stream of Tweets filtered by user, location, or keyword. For my purposes I used a location filter. Locations are specified as coordinate bounding boxes, so this project uses three: one for the contiguous United States, one for Alaska, and one for Hawaii.
Reverse Geocoding with Shapefiles

Geocoding is the process of taking a semantic address or place name and turning it into a pair of coordinates. Since Twitter’s API returns coordinates in its stream, I needed to do the opposite: turn a pair of coordinates into a state name. The most obvious and popular way to do reverse geocoding is the Google Maps API, which I did use…initially. There are two problems with this approach: 1) it is a web service, so the time it takes to respond is unpredictable. That is especially troublesome because I am processing live information – Twitter will terminate my connection if the processing starts to lag. 2) The Google Maps API has a request limit that I was sure to surpass when processing this volume of Tweets.
So I needed a way to perform this task locally, without a web service. My solution was to use a low-resolution “shapefile” of the United States. A shapefile describes regions as polygon shapes – in this case, individual states. In addition to a polygon, each state carries a set of attributes (name, abbreviation, etc.). Shapefiles are widely available; the one I ended up using is from Natural Earth. I chose that source because it offers a low-resolution map of just the states – I’m not after high precision, but I do want to minimize the program’s size and the complexity of the calculation, since this will eventually run on a Raspberry Pi.
OK, now I have the shapefile – how do I use it? The first step is a Python module that can read shapefiles; the most popular, and the one I chose, is PyShp. Now that I can read the data, how do I determine which shape (state) contains a given coordinate? I use a fast algorithm called point-in-polygon; a Python implementation can be found on the excellent GeospatialPython website.
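A sketch of the lookup, assuming the Natural Earth states file and a `name` attribute field (both assumptions about the data on disk); `point_in_polygon` is the standard ray-casting test:

```python
def point_in_polygon(x, y, poly):
    """Ray-casting test: count how many polygon edges a ray from (x, y)
    crosses; an odd count means the point is inside."""
    inside = False
    j = len(poly) - 1
    for i in range(len(poly)):
        xi, yi = poly[i]
        xj, yj = poly[j]
        if (yi > y) != (yj > y) and x < (xj - xi) * (y - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside


def load_states(path="ne_110m_admin_1_states_provinces"):
    # pyshp ("pip install pyshp") is imported lazily so the geometry helper
    # above works standalone; the path and the "name" field are assumptions
    # about the Natural Earth download.
    import shapefile
    sf = shapefile.Reader(path)
    name_idx = [f[0] for f in sf.fields[1:]].index("name")
    # Note: a real implementation should split multi-part shapes (islands)
    # on shape.parts before testing each ring separately.
    return [(rec.record[name_idx], rec.shape.points)
            for rec in sf.shapeRecords()]


def state_for(lon, lat, states):
    """Return the name of the first state polygon containing the point."""
    for name, points in states:
        if point_in_polygon(lon, lat, points):
            return name
    return None
```

With the low-resolution file, a linear scan over fifty-odd polygons is plenty fast for a live Tweet stream.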
Sentiment Analysis

Sentiment analysis is a really cool, up-and-coming field of study and one that I hope to learn more about before the end of this project. To build a really good sentiment analyzer of your own, you need 1) a large dataset of manually labeled data and 2) a lot of time and/or computing power to “train” your model. Neither of those has been especially easy for me to come by, so I decided to use an existing tool called TextBlob. TextBlob is a full-on Python text-processing library that can also do sentiment analysis. To improve the results slightly, I do some preprocessing on the Tweets, such as removing URLs and hashtags.
Persisting the Data

To compute average “moods” over time, I needed a way to persist data for a given lag period. I decided on a MySQL database, mostly because I already had MySQL installed in my local development environment. I may change this at some point, but for now it works fine. There are oodles of Python MySQL connectors out there; the one I landed on is PyMySQL. I needed to be able to install it via the Python Package Index (pip), and PyMySQL meets that requirement and is competent in every other way.
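A sketch of the persistence layer, assuming a hypothetical `tweets` table with `state`, `polarity`, and `created` columns (my real schema may differ):

```python
def get_connection():
    # PyMySQL ("pip install PyMySQL") is imported here; the credentials are
    # placeholders -- in production they come from environment variables.
    import pymysql
    return pymysql.connect(host="localhost", user="mood",
                           password="secret", db="moodmap")


def save_tweet(conn, state, polarity):
    """Log one scored Tweet, timestamped by the database server."""
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO tweets (state, polarity, created) "
            "VALUES (%s, %s, NOW())",
            (state, polarity))
    conn.commit()


def average_moods(conn, minutes=60):
    """Average polarity per state over the trailing lag window."""
    with conn.cursor() as cur:
        cur.execute(
            "SELECT state, AVG(polarity) FROM tweets "
            "WHERE created > NOW() - INTERVAL %s MINUTE "
            "GROUP BY state",
            (minutes,))
        return dict(cur.fetchall())
```

Letting `AVG(...) GROUP BY state` do the aggregation in SQL keeps the web process simple: it just runs one query per request and maps the result onto states.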
Deploying as a Web Service
For this task I turned to my favorite Platform as a Service, Heroku. I had never hosted a Python application on Heroku before, but luckily there is a very helpful guide available on the site. There are two distinct parts to the app: 1) a worker process that connects to Twitter and constantly logs Tweets, and 2) a web service that calculates the average mood and responds to HTTP requests. This could of course be done in a single Heroku app, but instead I made two separate apps that share a database.
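Concretely, each app declares its single process in a one-line Procfile (the script names here are hypothetical):

```
worker: python stream_tweets.py
```

for the worker app, and

```
web: python web.py
```

for the web app. Heroku starts whichever process types the Procfile declares, so the two apps scale and restart independently.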
Now all that’s left is to put the credentials into environment variables and we’re ready to display! For the front end I used a simple SVG jQuery plugin called U.S. Map, which you saw above.
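Reading those credentials back in Python is straightforward; the variable names below are my hypothetical choices (on Heroku they would be set with `heroku config:set`):

```python
import os

# Hypothetical names for the four Twitter OAuth config vars.
CREDENTIAL_NAMES = ["TWITTER_APP_KEY", "TWITTER_APP_SECRET",
                    "TWITTER_OAUTH_TOKEN", "TWITTER_OAUTH_TOKEN_SECRET"]


def load_credentials():
    """Return the four Twitter OAuth values, failing fast if any is unset."""
    missing = [n for n in CREDENTIAL_NAMES if n not in os.environ]
    if missing:
        raise RuntimeError("missing config vars: " + ", ".join(missing))
    return [os.environ[n] for n in CREDENTIAL_NAMES]
```

Failing fast at startup beats discovering a missing key only when the first API call is rejected.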
After watching the map for a while since completing this portion of the project, it seems to skew positive. This could be for a number of reasons. Most likely, the TextBlob sentiment analyzer was trained on data (such as movie reviews) that differs from Tweets in syntax and grammatical correctness. It could also be that I tend to observe on nights and weekends, when most people are happy…
In any case, it’s a start! All code is available on my GitHub, please let me know if you have questions or comments!