First, I want to thank everyone for your patience last week as we worked through the backlog and began tuning our new archiving algorithms. After adding server capacity, spinning up a few more archiving processes, and tweaking some settings, we caught up on over TWO MILLION tweets that were waiting to be processed. We are now current: new tweets usually reach the archive within about 30 minutes (we have only just started measuring this, so expect it to vary), and we are ingesting 150+ tweets per second from Twitter. So hang in there as we keep making things more efficient.
With regard to system operations, we have received a couple of complaints about archives "missing" tweets. One case was glaring, and I was shocked to find out how much was missing. In most cases these "missed" tweets will be filled in over time, because a background archiving process reaches back into Twitter Search to fill any holes left by the Twitter stream (which can happen if we temporarily lose connectivity, etc.). We will keep monitoring this, because 100% accuracy is always our goal. If you see an issue, don't hesitate to log it via Feedback (http://community.twapperkeeper.com) or contact us directly at firstname.lastname@example.org, because we can usually fix it pretty quickly.
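For the curious, here is roughly what "filling in holes" means. This is a minimal sketch, not our production code: it assumes each tweet is keyed by its unique numeric ID, that higher IDs are newer, and that the archive can be treated as a dictionary. The `Tweet` class and `backfill` function are hypothetical names used only for illustration. Tweets found via search that the stream already delivered are skipped, so running the backfill repeatedly never creates duplicates.

```python
from dataclasses import dataclass


# Hypothetical minimal tweet record; real tweets carry far more fields.
@dataclass(frozen=True)
class Tweet:
    id: int
    text: str


def backfill(archive: dict[int, Tweet], search_results: list[Tweet]) -> list[Tweet]:
    """Add tweets found via search that the stream missed.

    Assumes tweet IDs are unique and that higher IDs are newer.
    Returns the newly added tweets, oldest first.
    """
    added = []
    for tweet in sorted(search_results, key=lambda t: t.id):
        # Skip anything the stream (or an earlier backfill) already stored.
        if tweet.id not in archive:
            archive[tweet.id] = tweet
            added.append(tweet)
    return added
```

For example, if the stream stored tweets 1 and 3 but dropped 2 during a brief disconnect, a later search returning tweets 2, 3, and 4 adds only 2 and 4. Keying on the ID is what makes the process safe to run in the background as often as needed.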
Finally, SPAM. As a temporary fix, we now block the string "HTTP" in notebook descriptions, since spammers usually put URLs in the description to drive traffic to their sites. This is only a short-term fix; we plan to add a CAPTCHA as a first line of defense. In the meantime, we will keep an eye out for spam and simply delete it.
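A check like this can be very small. The sketch below assumes the match is case-insensitive (so "http", "HTTP", and "https" are all caught); the function name `is_allowed_description` is hypothetical, and this is an illustration rather than our actual filter.

```python
def is_allowed_description(description: str) -> bool:
    """Return False if the description contains "http" in any letter case.

    casefold() normalizes case, so "HTTP://", "Http://", and "https://"
    are all rejected.
    """
    return "http" not in description.casefold()
```

A filter this blunt will still let through links written without a scheme (for example "www.example.com"), which is one reason it is only a stopgap until a CAPTCHA is in place.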