Why not? Went ahead and re-implemented the auto trending topics feature so that new notebooks are created automagically. 200 MILLION tweets by next week? We’ll see…
Archive for March, 2010
In just over 1 week since the Version 2.0 go-live of Twapper Keeper our archive has grown from 50 MILLION to 100 MILLION! All we can say is WOW…
First, I want to thank everyone for your continued patience last week as we handled the backlog and began to tune our new archiving algorithms. After adding more server capacity, spinning up a few more archiving processes, and tweaking a few settings, we were able to catch up on over TWO MILLION tweets that were waiting for processing. Now we are current: new tweets usually get into the archive within approximately 30 minutes (we are just starting to measure this, so it will vary), and we are ingesting 150+ tweets per second from Twitter. So hang in there as we continue to make things even more efficient.
With regard to the operation of the system, we have received a couple of complaints about archives “missing” tweets. One of them was very blatant and I was shocked to find out how much was missing. In most cases these “missed” tweets will be filled in over time, as we have a background archiving process that reaches back into the Twitter search cache to fill in any holes from the Twitter stream (which can happen if we temporarily lose connectivity, etc). This is something we are going to continue to monitor b/c 100% accuracy is always our goal. If you see an issue, don’t hesitate to log something via Feedback (http://community.twapperkeeper.com) or contact us directly at email@example.com – b/c we can usually fix things pretty quickly.
Finally, SPAM. As a temporary fix we have blocked the string HTTP from our notebook descriptions, since spammers usually put URLs in the description to get links to their sites. This is just a short-term fix and we plan to incorporate a CAPTCHA as a first line of defense. In the meantime we will also keep an eye out for spam and simply delete it.
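For the curious, the temporary HTTP block amounts to something like the sketch below. This is only an illustration, not the actual Twapper Keeper code: the function name `description_allowed` is made up, and the case-insensitive match is an assumption.

```python
def description_allowed(description: str) -> bool:
    """Return False for any notebook description containing 'http'.

    Hypothetical helper: the check is case-insensitive (an assumption),
    so 'HTTP', 'http' and 'https' links are all rejected.
    """
    return "http" not in description.lower()
```

A check this blunt also rejects legitimate descriptions that happen to mention a link, which is why it is only a stopgap until something like a CAPTCHA is in place.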
Yesterday, we made a huge dent in the backlog of tweets that needed to move into their appropriate archives. We expect that the backlog will be gone as of late Monday (ET).
Contact us at firstname.lastname@example.org if you have any questions!
Hey all. Thanks for bearing with us over the last few days since the V2 release as we tune our archiving algorithms and fix some of the bugs that surfaced.
With regard to the backlog, we are making progress. We are down to 1.4M records in the Twitter Stream backlog, and have added a few more archiving nodes, which should help cut further into it.
However, you may not even notice a difference in your notebook because we also periodically fill in the blanks from the Search API, and well, that is running a little more efficiently right now.
Finally, SPAM. We are starting to get a good deal of spam that we are manually attacking and will implement some controls / automated routines shortly to better police this.
Thanks again for all your support!
The good news is we are starting to make progress on the backlog again, though we are still about 1.8 million records behind. If you have any issues or need assistance, do not hesitate to contact us at email@example.com. And thanks for understanding as we continue to catch up.
Also, we are finding that the @person notebook routines sometimes fail. We will continue to monitor this, but you may see some delays in @person notebook updates. But don’t worry, we’ll get your tweets soon enough.
2 days into Version 2.0, and we are getting crushed by the new users (doubled yesterday), new archives (5+% increase in a single day), and a new archiving process that needs to be tuned.
As of this morning our backlog queue of tweets was growing over time (vs. getting smaller). However, we are applying more resources as I type and hope to start “keeping up” and “reducing backlog.”
The bad news is that this means you may not see your tweets in your archives immediately. However, rest assured we have them (good news); we just need to put them in the right notebooks.
Bear with us as we try to scale this thing called Twapper Keeper…
So things seem to be running OK this morning, BUT we are behind in processing about 1 million rows of data.
The good news is we have the data and just need to go through it and put it into the right notebooks [that’s the great part about now hooking directly into the Twitter Streaming API].
The bad part is we have a million+ records that need to be slowly processed.
This may result in some data missing from the notebooks (though some will get filled in by our hybrid archiving process, which is really cool and does some fancy things like searching the Twitter cache… oh wait, you don’t care, you just want complete archives).
Just bear with us as we tune this thing… 🙂
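For anyone wondering what “putting tweets into the right notebooks” involves, here is a rough sketch. It is a guess at the shape of the problem, not the real routine: the dict layout, the `kind` values (hashtag, keyword, person) and the function name are all assumptions made up for the example.

```python
import re


def matching_notebooks(tweet, notebooks):
    """Return the names of the notebooks a backlog tweet belongs in.

    Assumed (hypothetical) shapes:
      tweet    -> {"user": str, "text": str}
      notebook -> {"name": str, "kind": "hashtag" | "keyword" | "person", "term": str}
    """
    text = tweet["text"].lower()
    # Collect hashtags without the leading '#', e.g. "#SXSW!" -> "sxsw".
    hashtags = set(re.findall(r"#(\w+)", text))
    matches = []
    for notebook in notebooks:
        kind = notebook["kind"]
        term = notebook["term"].lower()
        if kind == "hashtag" and term in hashtags:
            matches.append(notebook["name"])
        elif kind == "keyword" and term in text:
            matches.append(notebook["name"])
        elif kind == "person" and term == tweet["user"].lower():
            matches.append(notebook["name"])
    return matches
```

Each record in the backlog has to be checked against every notebook like this, which is part of why a million rows take a while to work through.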
No doubt we will be wrestling through a few bugs over the next few days as a result of our new release, so please post any findings to our Get Satisfaction forum at http://community.twapperkeeper.com.
Enjoy the new capabilities!
BTW, if you are an API user, the endpoints should still work [they assume you are pulling a hashtag notebook], but may give some unexpected results due to schema changes. Highly recommend moving over to the new API calls… and sorry I wasn’t able to give you a longer lead time on the changes.
Well, I know it has been a long time coming and I have been promising it for weeks – but FINALLY Version 2 of Twapper Keeper is set to release on the evening of March 16.
This is a major release for the Twapper Keeper archiving platform which will:
- Increase the scope of archiving from just #hashtags to also include keywords and @person’s timelines (so that we can hammer the shared host’s servers more)
- Introduce a hybrid approach to Twitter archiving that will tap the Twitter Streaming API as well as the Twitter Search API (to keep Twitter from banning me for always using search)
- Introduce more API capabilities (b/c more and more developers want to harness the power of our archiving engine)
- Allow for time slicing of archives when viewing (b/c hashtags get reused)
- And many more subtle improvements… (b/c users keep giving me great ideas)
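To give a feel for the hybrid approach in the list above, here is a minimal sketch of how tweets from the stream and from search might be combined into one archive. It is an illustration under assumptions, not the production code: tweets are plain dicts with a numeric `id`, IDs are assumed to increase over time, and `merge_archive` is a made-up name.

```python
def merge_archive(stream_tweets, search_tweets):
    """Combine tweets from both sources, keeping one copy per tweet id.

    Assumes each tweet is a dict with a numeric "id" key and that ids
    grow over time, so sorting by id gives chronological order.
    When both sources have the same tweet, the stream copy is kept.
    """
    by_id = {}
    for tweet in list(stream_tweets) + list(search_tweets):
        # setdefault keeps the first copy seen, i.e. the stream's.
        by_id.setdefault(tweet["id"], tweet)
    return [by_id[tweet_id] for tweet_id in sorted(by_id)]
```

The idea is that the stream does most of the work, and search only fills in whatever the stream missed.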
Due to the major changes, we (I always say we, but it is really just me at the moment… which should change soon enough) are in the process of migrating all data over to the new platform which may yield some unexpected results – so bear with us (I mean me…)
See everyone tomorrow night with Version 2 – and watch this blog for any updates if I screw things up… 🙂