Back Up Your Twitter Posts
February 1, 2010

One of the things that bugs me about Twitter is that there is no efficient way to search through the tweets you have made or the tweets others have made. Countless times I have thought, “I know I tweeted about this a month or two ago, where is it?”

I have decided to solve this problem myself by backing up my own Twitter account and the accounts of people I care about. Using the twython Twitter module for Python, I grab as many of a person’s tweets as I can and store them in a MySQL database, where I can query and manipulate them however I like. I also wrote a simple PHP script that queries the database and displays all the information in a “nice” way. Instead of breaking the tweets up into pages of 20 each, I’d rather see all the tweets at the same time.
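To give a rough idea of the storage side, here is a minimal sketch, not the actual code from the project. It assumes each tweet has already been fetched and reduced to a dict with the keys id, user, created_at and text, and it swaps in Python's built-in sqlite3 module for MySQL so that it runs without a server. The table layout and the function names (open_store, save_tweets, search) are made up for illustration.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the database (SQLite stand-in for MySQL) and create the table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        " id INTEGER PRIMARY KEY,"   # the tweet ID doubles as the primary key
        " user TEXT NOT NULL,"
        " created_at TEXT NOT NULL,"
        " text TEXT NOT NULL)"
    )
    return conn


def save_tweets(conn, tweets):
    """Insert tweets (hypothetical dicts); tweets already stored are skipped."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO tweets (id, user, created_at, text)"
            " VALUES (:id, :user, :created_at, :text)",
            tweets,
        )


def search(conn, term):
    """Return (user, created_at, text) rows whose text contains term, oldest first."""
    rows = conn.execute(
        "SELECT user, created_at, text FROM tweets"
        " WHERE text LIKE ? ORDER BY id",
        ("%" + term + "%",),
    )
    return rows.fetchall()
```

Keying the table on the tweet ID means the backup can be run over and over without creating duplicates, and a simple LIKE query is enough to answer the "where did I tweet about this" question.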

If you are interested in looking into this project further, it is on my GitHub account at http://github.com/geoffhotchkiss/TwitterBackup and that is where updates will go. I still have to add the Python file that sets up the tables I use to store the information, as well as the PHP files I wrote to display everyone’s tweets. Obviously my method isn’t for everyone, since not everyone runs their own MySQL server and web server, but I figure those who do would be interested in something like this.

There are some improvements I would like to make, such as splitting the “getting a new user” and “updating current users” sections into separate functions and asking the user which one they want to run. I should also reverse the order in which I get tweets. My current implementation retrieves tweets from the most recent back to the earliest. This is a problem because sometimes the API doesn’t like to work, or something funky goes on with the authentication, and the program stops getting tweets partway through. If I go from earliest to most recent instead, then when there is a problem I can simply pick up after the last tweet I saved and get the ones I missed the first time around. I also need to add a “sleep until I can ask for more” feature, so that when you are close to using up your allocated API calls, the program waits until you have more before it continues.
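Here is a rough sketch of how the oldest-first order and the “sleep until I can ask for more” idea could fit together. It is only an illustration under assumptions, not what the project does today: fetch_batch is a hypothetical function you would write around the Twitter API that takes the last saved tweet ID and returns the next batch of newer tweets (as dicts with an id key), the number of API calls left, and the number of seconds until the limit resets. The store callback is hypothetical too.

```python
import time


def backup_oldest_first(fetch_batch, store, last_id=0, min_calls_left=1,
                        sleep=time.sleep):
    """Save tweets in ascending ID order and return the last ID saved.

    fetch_batch(after_id) is a hypothetical wrapper around the API. It must
    return (tweets, calls_left, seconds_until_reset), where tweets holds the
    next batch with IDs greater than after_id and is empty when caught up.
    """
    while True:
        tweets, calls_left, seconds_until_reset = fetch_batch(last_id)
        if not tweets:
            return last_id  # nothing newer than last_id: the backup is complete
        # Store the oldest tweet first, so a crash never leaves a gap behind us.
        for tweet in sorted(tweets, key=lambda t: t["id"]):
            store(tweet)
            last_id = tweet["id"]
        # Too few calls left: wait for the limit to reset before asking again.
        if calls_left < min_calls_left:
            sleep(seconds_until_reset)
```

If the API or the authentication fails in the middle of a run, everything saved so far is one unbroken stretch starting at the earliest tweet, so the next run can pass the highest stored ID as last_id and continue from there.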

It would also be a good idea to request the 20,000 API call limit, since only 150 API calls is sometimes too few to back up everyone’s tweets.
