Archive for the ‘Uncategorized’ Category

Outages over the last 24 hours [UPDATED 06/24 @ 10:30 pm ET]

June 24, 2010

As many of you probably noticed, TwapperKeeper has been tough to reach over the last 24 hours.

At first we thought it was a network issue, but after further investigation I believe the cause was a minor change we made to the Apache config file over the weekend, which was causing Apache to overload and crash.  (THINK: too much memory allocated to PHP, causing excessive load, leading to crashes.)

The change was reverted as of this morning, and we will continue to monitor the site to make sure it keeps running more smoothly.

One thing to note: even though the site was tough to reach, the archiving processes have continued to run.  We have, however, slowed the post-processing jobs, which will result in tweets taking a little longer than normal to show up in their archives.  (This was done during troubleshooting, and additional jobs will be started shortly to catch up more quickly.)

[UPDATE 06/24 – 10:30 PM ET] I believe we may have found the issue.  Looks like a code update pushed to production mid-week resulted in many archive queries being run unconstrained, resulting in heavy load in our middle / db tier – and causing things to freewheel “out of control”.  Will continue to monitor through the night.  Archiving processes are running but there is a very large queue of records waiting to be processed.
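
To illustrate what an "unconstrained" archive query looks like versus a bounded one, here is a minimal sketch using Python's built-in sqlite3 module.  The `tweets` table, its columns, and the `fetch_archive_page` helper are all hypothetical and are not TwapperKeeper's actual schema or code; the only point is that each query carries a key bound and a LIMIT so a single request cannot pull an entire archive.

```python
import sqlite3

def fetch_archive_page(conn, archive_id, after_id=0, page_size=100):
    """Return at most page_size tweets for one archive, with id > after_id."""
    # Hypothetical schema: tweets(id, archive_id, text).
    # The WHERE clause and LIMIT keep the query bounded, so one request
    # can never scan or return the whole table.
    cur = conn.execute(
        "SELECT id, text FROM tweets "
        "WHERE archive_id = ? AND id > ? "
        "ORDER BY id LIMIT ?",
        (archive_id, after_id, page_size),
    )
    return cur.fetchall()

def demo():
    """Build a throwaway in-memory table and fetch one bounded page."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tweets (id INTEGER PRIMARY KEY, archive_id INTEGER, text TEXT)"
    )
    conn.execute("CREATE INDEX idx_archive ON tweets (archive_id, id)")
    conn.executemany(
        "INSERT INTO tweets (archive_id, text) VALUES (?, ?)",
        [(1, f"tweet {n}") for n in range(1000)],
    )
    return fetch_archive_page(conn, archive_id=1, page_size=50)
```

Passing the last id of one page as `after_id` for the next call walks through an archive in fixed-size steps instead of loading it all at once.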


Tweet Backlog Update

June 22, 2010

Over the weekend, we made huge strides with our archiving algorithm, which has allowed us to catch up on most tweets (there is still a batch of tweets from June 15 being processed, but most likely those tweets are already in their archives thanks to our backup process that looks at the Twitter search cache).

We will continue to monitor, but the good news is we are now capturing the Twitter Streaming API with no issue, and are processing all the tweets to the right archive within a matter of minutes.

Thanks for everyone’s support and for hanging in there as we grow!


Backlog of Tweets Being Processed

June 17, 2010

Many of you might have noticed we are still lagging in the tweets being post processed from the Twitter STREAM into the right archives.

(Or some of you may not have noticed, because this backlog can often be hidden by the fact that we are also searching the Twitter SEARCH CACHE which can fill in tweets.)

While our recent upgrade this week removed the inbound contention on ingesting the data from Twitter (now consistently ingesting 100-200 tweets/sec – which I believe may be close to 20% of the entire Twitter Stream assuming the estimated 65M tweets per day that I have heard previously), we now have a new bottleneck as we try to post process all the tweets to route them to the correct archive.
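
As a quick sanity check on that estimate, the arithmetic can be sketched as follows.  The 65M tweets/day figure is the hearsay number quoted above, not a measured value, and `share_of_stream` is just an illustrative helper name.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def share_of_stream(ingest_per_sec, total_per_day=65_000_000):
    """Fraction of the full stream represented by a given ingest rate."""
    total_per_sec = total_per_day / SECONDS_PER_DAY  # about 752 tweets/sec
    return ingest_per_sec / total_per_sec

for rate in (100, 150, 200):
    print(f"{rate} tweets/sec -> {share_of_stream(rate):.1%} of the stream")
```

Under that assumption, 100-200 tweets/sec works out to roughly 13% to 27% of the stream, and the midpoint of 150 tweets/sec lands right around 20%.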

Twitter engineers have recommended a slightly different approach to matching predicates to the tweet text, which I am coding now and targeting for release tonight, and which hopefully should speed things up.  The next step will be to begin establishing a distributed architecture with other servers to further offload / distribute processing.
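
This post does not spell out the approach Twitter's engineers suggested, so the following is only an assumption about what a faster matcher could look like: instead of testing every archive's predicate against every tweet, tokenize the tweet once and look each token up in a keyword index.  The names `build_index` and `route` and the sample archives are hypothetical.

```python
import re
from collections import defaultdict

# A token is a word, optionally prefixed with # or @.
TOKEN_RE = re.compile(r"[#@]?\w+")

def build_index(archives):
    """Map each lowercase keyword to the set of archive names tracking it."""
    index = defaultdict(set)
    for name, keyword in archives.items():
        index[keyword.lower()].add(name)
    return index

def route(tweet_text, index):
    """Return the archives whose keyword appears as a token in the tweet."""
    matched = set()
    for token in TOKEN_RE.findall(tweet_text.lower()):
        # One dictionary lookup per token, regardless of archive count.
        matched |= index.get(token, set())
    return matched
```

With this shape the per-tweet cost grows with the number of words in the tweet rather than the number of archives, which is the property that matters once there are many predicates.  It only handles single-token keywords; multi-word phrases would need extra handling.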

Bear with us as we continue to catch up!

(BTW, if you need to see tweets in your archive in the short run, hit the RESET button on the archive, which will force it to go regrab tweets in the Twitter SEARCH CACHE – which will help in the short run.)

Our archiving process evolves again…

June 15, 2010

As we continue to scale up, we continue to find slowdowns in our archiving process that require tuning.

This last week, I continued to witness contention on the inbound table that ingests the raw tweet stream from Twitter, which created a bottleneck in the stream.

To remedy this contention, as of this morning, we have stopped writing directly to the database as we ingest the stream, and instead, are now writing directly to the file system.  We then post process from the files to put the tweets into the proper archive.
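
The two-stage flow described above can be sketched roughly like this.  It is a simplified illustration, not our production code: the JSON-lines file format, the substring match, and the names `append_to_spool` and `post_process` are all assumptions made for the example.

```python
import json
import os
import tempfile

def append_to_spool(spool_path, tweet):
    """Ingest step: append one raw tweet as a JSON line; no database involved."""
    with open(spool_path, "a", encoding="utf-8") as spool:
        spool.write(json.dumps(tweet) + "\n")

def post_process(spool_path, archives):
    """Later step: read the spooled file and route each tweet to its archive."""
    with open(spool_path, encoding="utf-8") as spool:
        for line in spool:
            tweet = json.loads(line)
            for keyword, bucket in archives.items():
                if keyword in tweet["text"].lower():
                    bucket.append(tweet["id"])
    os.remove(spool_path)  # the file is removed once it has been routed

def demo():
    """Spool three tweets to a temp file, then route them in a second pass."""
    spool_path = os.path.join(tempfile.mkdtemp(), "stream.jsonl")
    for n, text in enumerate(["Go #worldcup!", "lunch time", "#WorldCup final"]):
        append_to_spool(spool_path, {"id": n, "text": text})
    archives = {"#worldcup": []}
    post_process(spool_path, archives)
    return archives
```

The ingest side only ever does a cheap sequential append, so it no longer competes with the post-processing jobs for locks on a shared inbound table.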

Yes, I know this is what Twitter recommended in the first place (writing to the file system), but in the early days of us leveraging the Streaming API we seemed to stay pretty efficient, and it was much easier to handle the data in a structured format in the DB vs. managing files.  However, we have hit a point where that simply isn’t feasible anymore.

Throughout the next 24-48 hours I expect us to be playing catchup.  Bear with us as we continue to scale.

Tweet Details…

June 11, 2010

For fun I added a “tweet details” link under each tweet that shows all the raw data we capture from Twitter for a given tweet.  This is the same type of data you will get when doing an export from the system.

Twapper Keeper Update

June 7, 2010

As many of you know we took an emergency outage last week to do another server migration.

I apologize for the last-minute nature of the migration, but our increased load caused our hosting company to “strongly recommend” moving to another server after we continued to negatively impact other customers.  Therefore, we had to do the server upgrade “on the fly.”  It wasn’t as smooth as I had hoped 😦 and I am really, really sorry.

Now to the good news…

Twapper Keeper celebrated its ONE YEAR anniversary this last weekend.

To think, one year ago I threw together a quick hack to better understand the Twitter API – honestly, just for fun.  I had no idea at the time how many people were passionate about archiving and analyzing tweets – but I quickly realized it after I continued to get feedback from users all over the world.

Together we have lived through major events, learned what works / what doesn’t, and established new partnerships to help grow the archiving platform.

Now as we go into the second year I am excited about the continued evolution of the platform supported by our partnership with UKOLN and JISC as outlined in the blog.  Stay tuned to this blog as we begin to release various new features over the coming weeks.

Year number 2 is going to be lots of fun – thanks for being a part of it!


We are doing server maintenance tonight – and will be offline

June 2, 2010

Sorry for short notice, but we are doing urgent maintenance this evening.  Be back online shortly!

[update 06/04/2010 – 6:53 am ET] Unfortunately our maintenance is taking longer than expected due to the high growth in the last few months and the need to also move servers (AGAIN).  We don’t have an exact ETA but are hopeful things will be back online by mid-day.  As soon as we are back online we will start crawling back into the Twitter cache to catch up.  Sorry for the downtime!

[update 06/04/2010 – 2:00 pm ET] We have begun the file import of data that we need to do in our server move – and we have turned on our connection with the phirehose.  Once the import is complete, we will turn on the other archiving processes.  Bear with us – this took much longer than expected.

[update 06/04/2010 – 3:15 pm ET] Core backend archiving processes have been turned back on and are running against current archives.  However, a couple of key processes have to be held back until the final import of data is complete – which I am expecting to be done in the next few hours [fingers crossed].  Then we will bring the front end up and flip DNS.  Sorry for all of these issues; this server move became somewhat forced because of negative impacts we were having on other customers in the cyber center.  Ingesting 8000 tweets per minute takes its toll on a server and anything next to it…

[update 06/04/2010 – 8:15 pm ET] Processes started.  Front end web server started.  Flipping DNS now.  It will take all weekend to catch up with 1) post processing the stream data that we have on file (~ 10 MILLION records) and 2) reaching back into the Twitter search cache.  Thanks to everyone for your continued patience.

Why the Twitter Login buttons?

June 2, 2010

You will begin to see Twitter Login buttons emerge on various Twapper Keeper pages.  This is in preparation for various enhancements that will be rolling out in the coming weeks that will require login.  Stay tuned!

TwapperKeeper is currently migrating to new servers

May 20, 2010

We are currently in the process of migrating to new servers. Sorry for the short notice, but we were also told late when the changeover would happen.

Once the system is back online, we will begin crawling back through the Twitter SEARCH cache for any tweets missed during the downtime.

New Archiving Processes Being Implemented Today

April 30, 2010

In an effort to realize tweets in their appropriate archives more efficiently, a new archiving process has been implemented as of approximately 5:00 am ET (GMT-4) this morning.  During the switch over we fell behind on processing tweets (roughly 150K) but are catching up relatively quickly.

The new process is much more efficient at putting tweets in their appropriate archives (roughly 6x faster) and should keep things running more smoothly.

However, I am also noticing many tweets in the queue that we are receiving from Twitter that are not finding a home in an archive, and I am keeping them to the side for additional investigation over the weekend.  So if you see something amiss in one of your archives, let me know.