Many of you may have noticed that we are still lagging in post-processing tweets from the Twitter STREAM into the right archives.
(Others may not have noticed, because the backlog is often hidden: we also search the Twitter SEARCH CACHE, which can fill in missing tweets.)
Our upgrade earlier this week removed the inbound contention on ingesting data from Twitter. We are now consistently ingesting 100-200 tweets/sec, which I believe may be close to 20% of the entire Twitter Stream, assuming the estimated 65M tweets per day that I have heard cited previously. However, we now have a new bottleneck: post-processing all of these tweets to route each one to the correct archive.
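For anyone curious where the "close to 20%" figure comes from, here is a quick back-of-the-envelope check. It assumes the quoted 65M tweets/day estimate and that tweets are spread evenly across the day, which real traffic is not. It is a rough sanity check, not a measurement.

```python
# Back-of-the-envelope check of the "close to 20%" estimate.
# Assumption: the full Twitter Stream averages 65M tweets/day
# (the estimate quoted in the post), spread evenly over 24 hours.
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400 seconds
STREAM_TWEETS_PER_DAY = 65_000_000

# Average rate of the whole stream, roughly 752 tweets/sec.
stream_rate = STREAM_TWEETS_PER_DAY / SECONDS_PER_DAY

def share_of_stream(ingest_rate: float) -> float:
    """Fraction of the full stream represented by an ingest rate (tweets/sec)."""
    return ingest_rate / stream_rate

for rate in (100, 150, 200):
    print(f"{rate} tweets/sec ~ {share_of_stream(rate):.0%} of the stream")
# 100 tweets/sec ~ 13% of the stream
# 150 tweets/sec ~ 20% of the stream
# 200 tweets/sec ~ 27% of the stream
```

So the 100-200 tweets/sec range works out to roughly 13-27% of the stream under that assumption, with the midpoint landing right around 20%.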
Twitter engineers have recommended a slightly different approach to matching predicates against the tweet text. I am coding it now and aiming to release it tonight, and hopefully it will speed things up. The next step will be to establish a distributed architecture that uses additional servers to further offload and distribute the processing.
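The specifics of the approach the Twitter engineers suggested aren't described here, so the following is only an illustrative sketch of one common way to speed up predicate matching, not necessarily the approach being released. Instead of testing every archive's keyword against every tweet (work that grows with the number of archives), it builds an inverted index from keyword to archive ids once, then tokenizes each tweet and does one dictionary lookup per token. The archive ids, keywords, and function names below are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical archive definitions: archive id -> tracked keyword/hashtag.
ARCHIVES = {
    "archive-1": "#python",
    "archive-2": "earthquake",
    "archive-3": "#python",
}

# Words, optionally prefixed with '#' (hashtag) or '@' (mention).
TOKEN_RE = re.compile(r"[#@]?\w+")

def tokenize(text: str) -> set[str]:
    """Lowercase the tweet and split it into word/hashtag/mention tokens."""
    return set(TOKEN_RE.findall(text.lower()))

def build_index(archives: dict[str, str]) -> dict[str, set[str]]:
    """Invert archive -> keyword into keyword -> {archive ids}, built once."""
    index: dict[str, set[str]] = defaultdict(set)
    for archive_id, keyword in archives.items():
        index[keyword.lower()].add(archive_id)
    return index

def route(text: str, index: dict[str, set[str]]) -> set[str]:
    """Return every archive whose keyword appears as a token in the tweet."""
    matches: set[str] = set()
    for token in tokenize(text):
        # .get() does not insert missing keys into the defaultdict.
        matches |= index.get(token, set())
    return matches

index = build_index(ARCHIVES)
print(sorted(route("Loving #Python today", index)))  # ['archive-1', 'archive-3']
```

The per-tweet cost here depends on the tweet's length rather than on how many archives exist, which is what makes this kind of index attractive at 100-200 tweets/sec. This sketch only handles single-token predicates; multi-word phrases or boolean combinations would need extra checks after the index lookup.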
Bear with us as we continue to catch up!
(BTW, if you need to see tweets in your archive in the short run, hit the RESET button on the archive. This forces it to re-grab tweets from the Twitter SEARCH CACHE.)