1 BILLION Tweets Archived!

October 8, 2010

Back in June 2009 when I started TwapperKeeper as a fun weekend hack, I had no idea 16 mos later I would still be running the site.

Heck, I was just having some fun with the Twitter API while my family was out of town.

However, the last 16 months have been crazy. User demand for raw Twitter data sets has continued to increase. Major world events were captured, exported, and analyzed. Small businesses, major brands, market researchers, conference leaders, and academia continued to ask for more data. And the data sets continued to grow…

And as of this morning I am proud to announce that TwapperKeeper has passed the 1 BILLION TWEET milestone!

All I can say is WOW!

I want to thank our users for your continued support, praise, suggestions, and thanks – as your feedback helps drive us to be better.

I also want to sincerely thank JISC and UKOLN for stepping in and partnering with TwapperKeeper to help drive growth, enhancement, and sustainability of the platform.  Their support came at a critical time and has been instrumental in our ability to support the heavy growth in users and archives over the last 6 months!

So now the question is… when do we hit 10 BILLION? 🙂

Should TwapperKeeper request READ / WRITE access?

October 7, 2010

Recently, members of the UK HE community asked why TwapperKeeper was requesting READ / WRITE access when they tried to opt out of being archived.

Recognizing that many people are concerned with 3rd party web applications having WRITE access to their Twitter accounts, we paused to think a little about our implementation of OAuth – and why we were asking for READ / WRITE vs. simply READ.

Stepping back in history, early users of TwapperKeeper may remember that originally we did not require logins to create archives.  This was done to allow for a frictionless way of creating archives, and minimized the need to create an account management solution or try to validate users with basic auth (pre-OAuth days).  It simply wasn’t important to know who the user was.

However, as we began to roll-out new features in partnership with JISC and UKOLN, it became more and more important to identify the user (for example, to confirm who was creating an @person archive, to confirm who is requesting to opt out of archiving, etc).

That led us to implementing an application wide OAuth login for the user – which we just happened to set to a READ / WRITE request.

However, upon review of current features, we are technically only using OAuth to identify the user (i.e. get your screen_name) – and never use the OAuth tokens to READ or WRITE anything on behalf of the user (honestly, we don’t even store the tokens, because we really don’t need them at this time).

Therefore, we have decided to go ahead and reduce the permission to READ at this time.  (The lowest level.)

In the future, this decision may need to be revisited as additional features are added that require WRITE access (for instance, the cool features of Twitter @anywhere require READ / WRITE to be turned on) – but it is best to keep the permission at the lowest level required for operations at this time.
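As a rough sketch of what asking for the lower permission looks like: Twitter’s OAuth 1.0a flow accepts an x_auth_access_type parameter on the request-token step, and passing “read” requests a read-only token. The helper below only builds the query string for that step (the function name and callback URL are illustrative, not our actual code):

```python
# Illustrative sketch -- not TwapperKeeper's actual implementation.
from urllib.parse import urlencode

def request_token_query(callback_url: str, access_type: str = "read") -> str:
    """Build the query string for Twitter's oauth/request_token step.

    Passing x_auth_access_type=read asks for a read-only token -- the
    lowest permission level, which is all that is needed to learn the
    user's screen_name after sign-in.
    """
    return urlencode({
        "oauth_callback": callback_url,
        "x_auth_access_type": access_type,
    })
```

The actual request must still carry the usual OAuth consumer signature; the point here is simply that the access level is declared up front, at token-request time.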

Announcing yourTwapperKeeper – archive your own tweets on your own server!

August 25, 2010

As part of our partnership with JISC, we are now releasing an open version of TwapperKeeper that is designed to run on your own server.

This gives users the same archiving power as TwapperKeeper, but on their own server. It also provides many of the same output features as TwapperKeeper, including HTML, EXCEL, RSS, and JSON.

We are also planning to host yourTwapperKeeper for customers if they want their own environment (yes, even TwapperKeeper needs to make some revenue at some point!)

Finally, if you are a hacker, pick up the code, change, branch, and run with it!

Output in Excel… why not?

August 23, 2010

A quick lunch time hack – and now you can view your archive in Excel.

IMPORTANT NOTE:  This will only work for result sets of less than roughly 50k tweets, and therefore leverages the filter criteria. If you want a full dump, we still recommend the EXPORT AND DOWNLOAD feature, which allows you to download EVERYTHING!

New API to extract tweets from archives

August 5, 2010

Many people have contacted us about the limitations of the /notebook/tweets API: the requirement to use start / end timestamps to extract portions of an archive, combined with the 10,000-tweet limit per call, forces people to write creative hacks to ensure they don’t miss anything.

We have just released Version 2 of the /notebook/tweets API, which now allows for easy pagination through the archive and accepts many other parameters to further slice / dice the data.

We have also worked to ensure parameters align with the Twitter search API where possible.

Try it out and let us know if you have any issues!  (And if you need an API Key just let us know!)

tweets within notebook – VERSION 2

http://api.twapperkeeper.com/2/notebook/tweets/?apikey=xxxx&name=xxxxx&type=xxxx[&optional parameters]
GET arguments
apikey [required]
type (hashtag, keyword, person, collection, person-collection) [required]
name [required]
lang [optional – ISO-639-1 2 letter code included in tweet metadata]
max_id [optional – maximum twitter id]
since_id [optional – minimum twitter id]
since [optional – start date – format = YYYY-MM-DD]
until [optional – end date – format = YYYY-MM-DD]
order_by [optional – a = ascending, d = descending (default)]
nort [optional – set to 1 to remove all tweets starting with RT, default = 0]
text [optional – tweet text to search for]
from_user [optional – twitter username of sender]
latitude | longitude | radius [optional – must include each parameter individually – radius in km]
rpp = results per page [optional – but default set to 10, max allowed = 1000]
page [optional – default = 1]
example w/ Version 2 – page 1 of tweets (results per page = 10) within the #jisc hashtag from 2010-01-01 to 2010-06-01, within a 1000 km radius of lat = 51, long = 0, with language = en
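The example above can be sketched as a URL build plus a simple page loop, using only the documented parameters (the apikey value is a placeholder – request a real key from us):

```python
# Sketch of the Version 2 example call, built from the documented parameters.
from urllib.parse import urlencode

BASE = "http://api.twapperkeeper.com/2/notebook/tweets/"

def tweets_url(page=1):
    params = {
        "apikey": "xxxx",        # placeholder -- use your own key
        "type": "hashtag",
        "name": "jisc",
        "since": "2010-01-01",   # YYYY-MM-DD
        "until": "2010-06-01",
        "latitude": 51,          # all three geo parameters
        "longitude": 0,          # must be supplied together
        "radius": 1000,          # km
        "lang": "en",
        "rpp": 10,               # results per page (max 1000)
        "page": page,
    }
    return BASE + "?" + urlencode(params)

# To walk a whole archive, fetch page 1, 2, 3, ... and stop once a page
# returns fewer than rpp tweets -- no timestamp windowing needed.
```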

(This enhancement aligns with enhancement API-2 and API-4 outlined in the post Plans for Updates to Twapper Keeper Functionality and APIs.)

More RSS options added to all archive types

July 30, 2010

As of today, all archives (person, keywords, hashtags, and collections) now have an RSS feed available.

In addition, the RSS feeds are now auto-discoverable and include all of the “search / filter” parameters that have been selected on the archive page, giving you more options for integrating the RSS feeds with other 3rd party services.

(This enhancement aligns with enhancement API-1 outlined in the post Plans for Updates to Twapper Keeper Functionality and APIs.)

Feed of archives as they are created

July 28, 2010

A new ATOM feed has been added to Twapper Keeper to allow you to track new archives as they are created.

This feed can be subscribed to from the main page and should be auto-discoverable by your browser. As we address other enhancements that touch feeds, we will be making them auto-discoverable on the relevant pages as well.

(This enhancement aligns with enhancement API-3 outlined in the post Plans for Updates to Twapper Keeper Functionality and APIs.)

Opting Out of Twapper Keeper Archiving

July 22, 2010

We often get requests from users about whether they can keep their tweets from being archived at Twapper Keeper.  With a goal of allowing you, the user, to have a choice, we are beginning to roll out a way to opt out of Twapper Keeper archiving.

If you are interested in not having your tweets archived, please head over to http://twapperkeeper.com/optout.php and submit the form. If you have any questions, send us an email at support@twapperkeeper.com.

(This enhancement aligns with enhancement UI-3 outlined in the post Plans for Updates to Twapper Keeper Functionality and APIs.  We will add the opt-out link to the main site shortly after we have processed our initial set of user requests.)

Now you can merge your archives together into a collection!

July 16, 2010

A new feature has now gone live that allows you to merge multiple archives (up to 5) together into a single collection!

Why would you want to do this?  How about when multiple hashtags are used for a conference?  Or maybe you are doing market analysis on a company but want to aggregate multiple keywords that represent the company?  Or maybe you want to view a set of person archives in a single view?

Let us know if you see any issues.

(This enhancement aligns with enhancement UI-5 outlined in the post Plans for Updates to Twapper Keeper Functionality and APIs)

NOTE:  We also made some minor updates to the APIs and API docs (http://twapperkeeper.com/api.php) that allow you to query, view, and create collections.  The documentation is a little “rough” right now and will be updated when we do the planned API enhancements, so if there are any questions let us know!

We are seeing some database issues occurring…

June 29, 2010

We need to take a small outage to fix them.