Archive for June, 2010

We are seeing some database issues occurring…

June 29, 2010

We need to take a brief outage to fix them.

Filter Archive Based Upon User and Tweet Text

June 29, 2010

As of this morning, you now have the ability to filter your view of an archive based on who sent the tweet and on the tweet text.
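A filter like this boils down to a simple predicate over an archive's tweets. Here is a minimal sketch in Python; the field names (`from_user`, `text`) are assumptions for illustration, not TwapperKeeper's actual schema:

```python
def filter_archive(tweets, from_user=None, text_contains=None):
    """Return only tweets matching the optional sender and text filters."""
    results = []
    for tweet in tweets:
        # Skip tweets not sent by the requested user (case-insensitive).
        if from_user and tweet["from_user"].lower() != from_user.lower():
            continue
        # Skip tweets whose text does not contain the search string.
        if text_contains and text_contains.lower() not in tweet["text"].lower():
            continue
        results.append(tweet)
    return results

archive = [
    {"from_user": "alice", "text": "Testing #oauth today"},
    {"from_user": "bob", "text": "RT @alice: Testing #oauth today"},
]
```

Both filters are optional, so either one (or both together) can narrow the view.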

(This enhancement aligns with enhancements UI-1, UI-2, and UI-4 outlined in the post Plans for Updates to Twapper Keeper Functionality and APIs)

Outages over the last 24 hours [UPDATED 06/24 @ 10:30 pm ET]

June 24, 2010

As many of you probably noticed, TwapperKeeper has been tough to reach over the last 24 hours.

At first we thought it was a network issue, but after further investigation I believe it was caused by a minor change we made to the Apache config file over the weekend, which was causing Apache to load up and crash.  (Think: too much memory allocated to PHP, causing excessive load and leading to crashes.)

The change has been reverted as of this morning, and we will continue to monitor the site to ensure things run more smoothly.

One thing to note: even though the site was tough to reach, the archiving processes continued to run.  We have, however, slowed the post-processing jobs, which means tweets will take a little longer than normal to appear in their archives.  (This was done during troubleshooting, and additional jobs will be started shortly to catch up more quickly.)

[UPDATE 06/24 – 10:30 PM ET] I believe we may have found the issue.  It looks like a code update pushed to production mid-week resulted in many archive queries running unconstrained, creating heavy load on our middle / DB tier and causing things to spiral “out of control.”  We will continue to monitor through the night.  Archiving processes are running, but there is a very large queue of records waiting to be processed.
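This class of bug — a query that silently loses its constraints — can be guarded against in the data-access layer by refusing to build an unconstrained query at all. A hypothetical sketch (the table and column names are invented for illustration; this is not TwapperKeeper's actual code):

```python
def build_archive_query(archive_id=None, limit=500):
    """Build a parameterized archive query, refusing to run unconstrained."""
    if archive_id is None:
        # An unconstrained query would scan every tweet in every archive,
        # hammering the middle / DB tier -- fail fast instead.
        raise ValueError("archive query must be constrained by archive_id")
    sql = "SELECT * FROM tweets WHERE archive_id = %s LIMIT %s"
    return sql, (archive_id, limit)
```

A guard like this turns a production meltdown into an immediate, visible error during development.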

Filter RTs when Viewing Archive

June 23, 2010

As of this morning, you now have the ability to remove “retweets” when viewing TwapperKeeper archives.  The “No RT” flag is also carried through to the permalink, so you can bookmark the link or send it to others.
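A retweet filter of this kind can be approximated by checking for the classic “RT @” prefix. This is a sketch of the idea only; the post doesn't describe TwapperKeeper's exact matching rules:

```python
def without_retweets(tweets):
    """Drop tweets that look like old-style retweets (text starts with 'RT @')."""
    return [
        t for t in tweets
        if not t["text"].lstrip().upper().startswith("RT @")
    ]

tweets = [
    {"text": "Archiving is back to normal"},
    {"text": "RT @jobrieniii: Archiving is back to normal"},
]
```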

(This enhancement aligns with enhancement UI-6 outlined in the post Plans for Updates to Twapper Keeper Functionality and APIs)

“Tag and Find” Enhancement Live

June 23, 2010

As of this morning, the “Tag and Find” enhancement is live.

All archives can now be tagged when first created.  In addition, the owner of an archive can edit its tags at any time (as well as the description, which has been a much-requested feature).

Since we did not require login until recently, all archives have been updated so that the Twitter screen name provided at creation time is used as the owner.  If a non-existent Twitter ID was used, or the user has since changed their screen name, those archives will technically have no owner.

If anyone sees any problems or has any questions, please let us know!

(This enhancement aligns with enhancement UI-8 outlined in the post Plans for Updates to Twapper Keeper Functionality and APIs)

API Update : Archive / Notebook Create

June 22, 2010

As we continue to implement OAuth and battle spam, we are making an important update to the /notebook/create API.

In addition to the CREATED_BY field, we now also ask developers to include the numeric USER_ID of the user making the call.  This should be a valid screen name / user ID combination.

This will help us control edits on archives in the future, as well as combat spam in cases where the CREATED_BY (screen_name) and USER_ID don’t match.

The update is also reflected at http://twapperkeeper.com/api.php

create a notebook

http://api.twapperkeeper.com/notebook/create/?apikey=xxx&name=abcdefg&type=hashtag&description=This is a test.&created_by=jobrieniii&user_id=1234
POST arguments
apikey [required], type (hashtag, keyword, person) [required], name [required], created_by [required], user_id [required], description [required]

example – create hashtag notebook abcdefg

http://api.twapperkeeper.com/notebook/create/
$_POST or $_GET -> apikey=xxx&name=abcdefg&type=hashtag&description=This is a test.&created_by=@jobrieniii&user_id=1234

{"status":1,"message":"Notebook created successfully",
"response":{"url_to_notebook":"http:\/\/twapperkeeper.com\/h\/abcdefg"}}
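Using the parameters documented above, a client call might be assembled like this. This is a sketch with a placeholder API key; only the parameter names shown in the post are real:

```python
from urllib.parse import urlencode

API_URL = "http://api.twapperkeeper.com/notebook/create/"

def build_create_payload(apikey, name, type_, description, created_by, user_id):
    """Assemble the POST body for /notebook/create, including the newly
    required user_id field alongside created_by."""
    return {
        "apikey": apikey,
        "name": name,
        "type": type_,              # hashtag, keyword, or person
        "description": description,
        "created_by": created_by,   # Twitter screen name of the caller
        "user_id": user_id,         # numeric Twitter user ID of the caller
    }

payload = build_create_payload("xxx", "abcdefg", "hashtag",
                               "This is a test.", "jobrieniii", 1234)
body = urlencode(payload)
# To actually send the request:
# import urllib.request
# urllib.request.urlopen(API_URL, data=body.encode()).read()
```

Keeping `created_by` and `user_id` together in the payload is what lets the server flag mismatched pairs as potential spam.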

Tweet Backlog Update

June 22, 2010

Over the weekend, we made huge strides with our archiving algorithm, which has allowed us to catch up on most tweets (there is still a batch of tweets from June 15 being processed, but most likely those tweets are already in their archives thanks to our backup process that checks the Twitter search cache).

We will continue to monitor, but the good news is that we are now capturing the Twitter Streaming API with no issue, and are processing all tweets into the right archive within a matter of minutes.

Thanks for everyone’s support and for hanging in there as we grow!

@jobrieniii

Backlog of Tweets Being Processed

June 17, 2010

Many of you might have noticed we are still lagging in post-processing tweets from the Twitter STREAM into the right archives.

(Or some of you may not have noticed, because this backlog is often hidden by the fact that we also search the Twitter SEARCH CACHE, which can fill in tweets.)

While our recent upgrade this week removed the inbound contention on ingesting the data from Twitter (we are now consistently ingesting 100–200 tweets/sec, which I believe may be close to 20% of the entire Twitter stream, assuming the estimated 65M tweets per day I have heard previously), we now have a new bottleneck as we post-process all the tweets to route them to the correct archive.

Twitter engineers have recommended a slightly different approach to matching predicates against tweet text, which I am targeting to code and release tonight and which should hopefully speed things up.  The next step will be to establish a distributed architecture with other servers to further offload and distribute processing.
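One common way to speed up this kind of matching is to tokenize each tweet once and look the tokens up in a hash of predicates, rather than scanning every predicate against every tweet. The post doesn't spell out what Twitter's engineers actually recommended, so the sketch below is only an illustration of that general idea:

```python
import re

def build_index(archives):
    """Map each lowercase predicate (hashtag or keyword) to its archive names."""
    index = {}
    for name, predicate in archives:
        index.setdefault(predicate.lower(), []).append(name)
    return index

def route(tweet_text, index):
    """Tokenize the tweet once, then do constant-time lookups per token,
    instead of one substring scan per predicate per tweet."""
    matches = set()
    for token in re.findall(r"#?\w+", tweet_text.lower()):
        matches.update(index.get(token, ()))
    return matches

index = build_index([("oauth-archive", "#oauth"), ("spam-archive", "spam")])
```

With N predicates and T tokens per tweet, this is roughly O(T) per tweet instead of O(N), which matters once the predicate list grows large.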

Bear with us as we continue to catch up!

(BTW, if you need to see tweets in your archive in the short run, hit the RESET button on the archive, which will force it to re-grab tweets from the Twitter SEARCH CACHE.)

“Tag and Find” Starting to Roll Out

June 17, 2010

We are starting to roll out the capability to tag and find archives.  At this time, keyword / hashtag archives can have a set of tags added to them when first created, and these tags are shown when the notebook is displayed.

However, we are still finalizing the code that will allow the tags to be “edited.”

This is the first time we have allowed an archive’s metadata to be edited by users.  Initially we were going to allow only the creator of the archive to edit tags (and the description as well), since we can now validate the user with OAuth.

However, we could also argue that since the archives are “public,” anyone should be able to edit them (with a history log, of course).

Would love your thoughts on this matter… (1) should anyone be able to update the description / tags or (2) only the creator?

(This enhancement aligns with enhancement UI-8 outlined in the post Plans for Updates to Twapper Keeper Functionality and APIs)

Our archiving process evolves again…

June 15, 2010

As we continue to scale up, we keep finding slowdowns in our archiving process that require tuning.

This last week, I continued to witness contention on the inbound table that ingests the raw tweet stream from Twitter, which created a bottleneck in the stream.

To remedy this contention, as of this morning we have stopped writing directly to the database as we ingest the stream and are instead writing directly to the file system.  We then post-process from the files to put the tweets into the proper archive.

Yes, I know this is what Twitter recommended in the first place (writing to the file system), but in the early days of leveraging the Streaming API we stayed fairly efficient, and it was much easier to handle the data in a structured format in the DB than to manage files.  However, we have hit a point where that simply isn’t feasible anymore.
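The new ingest path — append raw stream data to flat files, then post-process into archives — can be sketched like this. The file layout (hourly JSON-lines spool files) and field names are assumptions for illustration:

```python
import json
import os
import time

def ingest(raw_tweet, spool_dir="spool"):
    """Append one raw tweet as a JSON line to an hourly spool file.
    Appending to a flat file avoids the table contention seen when
    every stream write hits the database directly."""
    os.makedirs(spool_dir, exist_ok=True)
    path = os.path.join(spool_dir, time.strftime("%Y%m%d%H") + ".jsonl")
    with open(path, "a") as f:
        f.write(json.dumps(raw_tweet) + "\n")
    return path

def post_process(path):
    """Later, a separate job reads a spool file back and can route each
    tweet to its archive at its own pace."""
    with open(path) as f:
        return [json.loads(line) for line in f]
```

Decoupling ingest from routing this way means a slow post-processing job only grows the backlog of files instead of stalling the inbound stream.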

Over the next 24–48 hours I expect us to be playing catch-up.  Bear with us as we continue to scale.