TwapperKeeper – The Final Days. Thanks for your support.

January 5, 2012 by

First, I want to apologize to everyone out there for not keeping this blog current, especially as announcements were publicized in September 2011 regarding the Hootsuite acquisition of TwapperKeeper.

Things have been a little crazy following the announcement – though I realize that isn’t a good excuse.  Sorry!

As many of you are aware, on January 6, 2012 (tomorrow) the site will finally shut down and the primary archiving features will be available via the Hootsuite Pro platform.

I want to thank everyone who has been involved with TwapperKeeper over the years, as it evolved from a weekend science project in 2009, to a grant-funded effort in 2010, to a freemium service in early 2011, and finally to acquisition in late 2011.

It has been a roller coaster ride of highs (grant engagement with JISC and the release of the open source yourTwapperKeeper) and lows (the Twitter ToS violations regarding exports proved to be tons of fun, and scaling challenges were a constant battle) – but more importantly a learning experience that I plan to carry forward in new projects.

I also want to personally thank Brian Kelly and the whole JISC / UK HE community for all your support over the years.  It has been fun and challenging – and, I hope, of value to the HE / archiving community.

Now on to new and other exciting projects in 2012!



New Filtering Capability on Archives

April 9, 2011 by

Many people have requested that we allow filtering on not only begin / end dates of archives, but also by time. This has been added to the site as of this afternoon. Let us know if you have any questions or see any issues. It was a quick “afternoon” tweak… 🙂
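As a rough illustration of what date-plus-time filtering adds over date-only filtering, here is a minimal Python sketch. The `Tweet` record, the `filter_by_window` function, and the sample data are all hypothetical and made up for illustration; this is not the site's actual code.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Tweet:
    # Hypothetical minimal record; a real archive stores more fields.
    tweet_id: int
    text: str
    created_at: datetime


def filter_by_window(tweets, start, end):
    """Return tweets whose timestamp falls within [start, end], inclusive."""
    return [t for t in tweets if start <= t.created_at <= end]


# Made-up sample archive covering a single day.
archive = [
    Tweet(1, "morning keynote", datetime(2011, 4, 9, 9, 15)),
    Tweet(2, "lunch break", datetime(2011, 4, 9, 12, 30)),
    Tweet(3, "afternoon panel", datetime(2011, 4, 9, 15, 45)),
]

# Date-only bounds would return the whole day; adding a time of day
# narrows the view to one part of it.
session = filter_by_window(
    archive,
    start=datetime(2011, 4, 9, 12, 0),
    end=datetime(2011, 4, 9, 17, 0),
)
print([t.tweet_id for t in session])
```

With date-only bounds all three sample tweets would match; the time bounds keep only the two posted from noon onward.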

Sometimes I jump the gun.. and sometimes I misinterpret rules… But better to push forward as fast as possible – and ask for forgiveness after the fact than to not push at all. Sorry.

March 30, 2011 by

Looks like my great idea of re-enabling the “Save as Excel” feature was a bad idea – as I just got additional clarification from Twitter that the only way I can export data is 1) if it is the actual user’s and if not 2) only the IDs.

Therefore, the “Save as Excel” will be removed tomorrow morning around 5:00 am ET.  Sorry I got everyone excited…

I have toyed with the idea of just exporting IDs and giving you guys scripts to do the lookups – but the rate limits will kill anyone trying to do that to grab all the actual tweet text.
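To get a feel for why rate limits make ID-only exports impractical, here is a back-of-the-envelope sketch in Python. The function name and all of the numbers are assumptions chosen for illustration; they are not a statement of Twitter's actual limits.

```python
import math


def hours_to_hydrate(id_count, ids_per_request, requests_per_hour):
    """Estimate the hours needed to look up `id_count` tweet IDs.

    All three parameters are assumptions supplied by the caller; they are
    not actual Twitter limits.
    """
    requests_needed = math.ceil(id_count / ids_per_request)
    return requests_needed / requests_per_hour


# Hypothetical numbers: one tweet per lookup call, 350 calls per hour.
hours = hours_to_hydrate(id_count=100_000, ids_per_request=1, requests_per_hour=350)
print(f"{hours:.1f} hours (~{hours / 24:.1f} days)")
```

Under those assumed numbers, even a modest 100,000-tweet archive would take well over a week of continuous lookups for a single user.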

Don’t really know what else to do at this point…!

“Save as Excel” feature has been brought back online

March 22, 2011 by

After reviewing the new Twitter ToS published on March 17, 2011, I have decided to bring the “Save as Excel” link back online when viewing an archive.

This will allow you to get the currently viewed content into an Excel file for review.

While it doesn’t replace the “Export and Download” feature completely (which could extract very large datasets), it will give you the ability to download smaller data sets, which is very useful for conference / chat transcripts, etc.

And remember you can change the view from different time periods to get different data sets.

Let us know if you have any questions.

Reminder: On March 20, we will be removing Export / Download and API features

March 15, 2011 by

This is just a reminder to all of our users that we will be removing all Export / Download and API features on March 20th.

This will also include the “view in Excel” feature which is basically a download of the tweets.

If you have any questions, please contact us at



A little more information about the Twitter TOS violations…

February 28, 2011 by

Our interview with WebProNews –

Removal of Export and Download / API Capabilities

February 22, 2011 by

We regret to inform our users that on March 20, 2011 we will be removing the “Export and Download” and “API” features of the website.

While we realize that these features are very important to many of our users, this change comes at the request of Twitter to bring our service into alignment with the API Terms of Service regarding redistribution and syndication of content.

If you require access to structured raw tweets for analysis, we highly recommend leveraging the open source version of TwapperKeeper (yourTwapperKeeper) created during our partnership with JISC.

If you have any questions, please contact us at

Help – please log in to TwapperKeeper!

October 15, 2010 by

As noted yesterday, we are beginning to use the rate limits of the owner of each archive to help us increase our ability to query Twitter for tweets in our crawling process.  This crawling process is important as it also prioritizes which tweets are being watched by our streaming connection.

If you have created an archive and have not logged into the site since Oct 15, we ask that you log in at least once so we can properly capture your login information (of course, just the OAuth tokens per Twitter security)!

Your help is much appreciated!   And if you have any questions please contact us at

We need your rates and your help if you see missing tweets.

October 14, 2010 by

In the coming days we are going to be testing some new archiving processes that will leverage the rate limit of the user who created each archive, rather than a single account as TwapperKeeper has historically done.

Therefore, we are going to start storing your OAuth tokens (most 3rd party apps do this, we just simply haven’t to date) and leverage your credentials to crawl your personal timeline (if you have an @person archive) or search hashtags / keywords for archives you have created.

This is in response to increased rate limiting we are seeing – and will be tested periodically for feasibility.
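The idea of spreading crawl requests across archive owners' rate limits can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Credential` class, the `pick_credential` and `crawl` functions, the account names, and the budgets are all hypothetical, and no real Twitter API calls are made.

```python
from dataclasses import dataclass


@dataclass
class Credential:
    # Hypothetical stand-in for a stored OAuth token pair and its call budget.
    owner: str
    remaining_calls: int


def pick_credential(archive_owner, credentials, shared):
    """Prefer the archive owner's own rate budget; fall back to the shared account."""
    cred = credentials.get(archive_owner)
    if cred is not None and cred.remaining_calls > 0:
        return cred
    return shared


def crawl(archives, credentials, shared):
    """Spend one call per archive and record which account paid for it."""
    used = {}
    for name, owner in archives:
        cred = pick_credential(owner, credentials, shared)
        if cred.remaining_calls <= 0:
            used[name] = None  # no budget left anywhere; skip this round
            continue
        cred.remaining_calls -= 1
        used[name] = cred.owner
    return used


# Made-up accounts: "alice" has logged in, "bob" has not.
shared = Credential("shared-account", remaining_calls=1)
credentials = {"alice": Credential("alice", remaining_calls=2)}
archives = [("#tag1", "alice"), ("#tag2", "bob"), ("#tag3", "bob")]
result = crawl(archives, credentials, shared)
print(result)
```

In this toy run the archive whose owner has logged in is crawled on that owner's budget, while the other two compete for the single shared call and one of them goes uncrawled.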

Also, if you see tweets missing in your archives let us know ASAP so we can try to fix them before they disappear from Twitter search.  Time is of the essence when it comes to missing tweets.

6 months later – and we are a very different TwapperKeeper

October 13, 2010 by

In this blog post, I want to take a few moments and share some of the highlights of our partnership with JISC and UKOLN and set the stage for where we are going in the future.

First, it is hard to believe that it was only 8 months ago that I attended dev8D and met with David Flanders (JISC) and Brian Kelly (UKOLN) to discuss a potential partnership – as I feel like they have been helping guide TwapperKeeper from the very beginning.

During that event we laid the plans for a JISC / UKOLN partnership and drafted a ~6 month schedule that focused on 1) stabilization, 2) capability evolution, and 3) sustainability / openness of the platform.

Ironically we released the news about the partnership on April 16, 2010 – which happened to be the same week as Twitter’s Chirp conference, at which they announced that the Library of Congress (LoC) and Google would have archives available.

Initially this looked like a potential setback to the partnership, but after discussions we felt it was important to continue to press on, since there was still a desire for crowd sourced tweet archiving and the capabilities of and access to the LoC and Google archives were unclear.  (And even as of today they seem unclear.)


After announcing the partnership in April, my initial focus was on stabilizing the platform.  I had just released Version 2 of the platform a month earlier which increased the archiving capabilities from  #hashtags to also include keywords and @person timelines – and we were growing like crazy.

To get a sense of the growth during that period, from March to April the volume of tweets on file doubled from 50 million to 100 million.

Plans to implement a larger VM were set in motion and an additional VM was procured to act as a hot backup.  All was good for a short period of time.

Unfortunately, the volume of tweets continued to grow and the growing load on the host’s “shared” VMs was becoming a point of contention between the host and me, resulting in back end archiving processes being shut down on occasion.

I’ll be honest, my plans to use VMs quickly became foolish – and over the last 6 months I have been in an ongoing battle with growth, new servers, tuning, and infrastructure changes.

As a result TwapperKeeper has made many infrastructure changes including:  migrating from a small VM to a larger VM, migrating to a single dedicated box, and migrating to its current state of two dedicated boxes (with a 3rd one just around the corner).

And while this was a struggle which included many sleepless nights, I am happy to announce that now the application’s architecture has been refactored to take advantage of the multiple servers.  We can now grow horizontally across N-number of database servers which is increasingly important to support the ever growing number of archives.
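One common way to spread archives across N database servers is to route each archive by a stable hash of its ID. The sketch below shows that general technique only; it is an assumption on my part for illustration, not a description of how TwapperKeeper actually assigns archives, and the `shard_for` function and host names are hypothetical.

```python
import hashlib


def shard_for(archive_id, servers):
    """Map an archive to one database server with a stable hash.

    hashlib is used instead of the built-in hash() so the mapping stays
    the same across Python processes.
    """
    digest = hashlib.sha256(str(archive_id).encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]


# Hypothetical host names.
servers = ["db1", "db2", "db3"]
placement = {archive_id: shard_for(archive_id, servers) for archive_id in range(1, 7)}
print(placement)
```

Note that plain modulo routing moves many archives to different servers whenever the server count changes; a lookup table or consistent hashing are the usual ways around that.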

Capability / Evolution

Following the partnership announcement, Brian began to solicit feedback from the Higher Education (HE) community to gather input on evolving the capability of the TwapperKeeper platform.

Enhancement requests predominantly centered on two areas: improving / standardizing the API / RSS endpoints so that 3rd party application developers (such as Andy Powell’s Eduserv Summarizr and Martin Hawksey’s iTitle) could tap the TwapperKeeper archives, and increasing the ability of end users to filter and view tweet archives and to group archives together into collections.

Enhancements were rolled into production on an ongoing basis during the last 6 months and continue to be tweaked when issues / bug-fixes are raised by the HE community.

One request that caught us all off guard was around privacy – where users were concerned about their public tweets being archived.

The discussion that followed on user privacy rights resulted in two important enhancements: 1) restricting @person public timeline archiving to only the person who owns the timeline and 2) allowing users to opt out of archiving.

The findings from this discussion were also presented at iPres 2010 in the paper Twitter Archiving Using Twapper Keeper: Technical and Policy Challenges.

Sustainability / Openness

During our initial partnership discussions, David, Brian, and I talked about open sourcing part of the platform.  Initially I was hesitant and committed to at a minimum outlining the strategy of how TwapperKeeper was archiving tweets (which includes a hybrid approach of crawling and tweet stream ingestion / processing).
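A hybrid of crawling and stream ingestion means the same tweet can arrive from both sources, so the two feeds have to be merged and de-duplicated. Here is a minimal sketch of that merge step, assuming the tweet ID is used as the key; the `merge_sources` function and the sample data are hypothetical and are not taken from the TwapperKeeper code.

```python
def merge_sources(crawled, streamed):
    """Combine tweets from a search crawl and a stream, dropping duplicates.

    Each tweet is a (tweet_id, text) pair; the tweet ID is the dedupe key.
    """
    by_id = {}
    for tweet_id, text in list(crawled) + list(streamed):
        by_id.setdefault(tweet_id, text)  # keep the first copy seen
    return [(tweet_id, by_id[tweet_id]) for tweet_id in sorted(by_id)]


# Made-up sample data: tweet 102 shows up in both sources.
crawled = [(101, "first"), (102, "second")]
streamed = [(102, "second"), (103, "third")]
merged = merge_sources(crawled, streamed)
print(merged)
```

The stream catches tweets as they happen and the crawl backfills anything the stream missed, so the overlap is expected rather than an error.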

As the 6 month period continued, I came to the realization that the service cannot be the only archiving platform, especially in special cases where people want quicker archiving times / etc – and I knew the right direction was releasing the code.

Therefore, I took the best pieces of TwapperKeeper and rewrote them from the ground up into a simple self-managed web application.

yourTwapperKeeper and its code were released on August 25, 2010, and to date we have had over 100 people download the application.

As a result of the release, we caught the attention of Ross Gardler at OSS Watch who had some important advice on how best to license and manage the project to facilitate traction and growth.   We are now working with his team to ensure we have all of the appropriate licensing and governance models – so that yourTwapperKeeper can grow and continue to be used by the HE community in other projects.

Where do we go from here?

The huge growth over the last 6 months means we have now grown from 150 MILLION to 1+ BILLION tweets on file.  The server infrastructure is becoming increasingly complex and costly.  We have to constantly battle and tune around Twitter API rate limits to try to get tweets requested by users.  Frankly, the 24/7 maintenance and operations / support is really too much for one person.

So with that said, how do we keep the primary service sustainable?

I know this has been a concern of JISC and UKOLN leadership from the beginning – and is also important now as the partnership winds down.

It concerns me as well – and means we have to start to monetize.

Whilst the service will continue to be free to the HE sector, new paid-for services will begin to be rolled out to others.

Premium services being considered include 1) sponsored archives that will have priority archiving processes and increased reach back capability, 2) charging a small fee for exports, and 3) possibly charging for access to API endpoints.


In closing I want to once again thank David, Brian, and all of the HE community that participated in input, testing, bug fixing, etc.

As a team we have taken social media archiving a step forward and have set the foundation for the future growth of both the hosted service and yourTwapperKeeper – and without your help this would not have been possible.

– John