Our archiving process evolves again…


As we continue to scale up, we keep finding slowdowns in our archiving process that require tuning.

This last week, I continued to witness contention on the inbound table that ingests the raw tweet stream from Twitter, which created a bottleneck in the stream.

To remedy this contention, as of this morning we have stopped writing directly to the database as we ingest the stream and are instead writing directly to the file system. We then post-process the files to put the tweets into the proper archive.
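A minimal sketch of that spool-then-process pattern in Python follows. It is illustrative only: the file name, the function names, and the hashtag-based routing rule are all assumptions, not a description of the production code.

```python
import json
import os
import tempfile
from collections import defaultdict


def spool_tweet(spool_path, raw_tweet):
    """Ingest step: append one raw tweet (a JSON string) as a single line."""
    # Re-serialising guarantees the record contains no embedded newlines,
    # so each line of the spool file is exactly one tweet.
    line = json.dumps(json.loads(raw_tweet))
    with open(spool_path, "a", encoding="utf-8") as spool:
        spool.write(line + "\n")


def post_process(spool_path, route):
    """Batch step: read the spool file and group tweets by archive."""
    archives = defaultdict(list)
    with open(spool_path, encoding="utf-8") as spool:
        for line in spool:
            if not line.strip():
                continue  # skip blank lines
            tweet = json.loads(line)
            for archive in route(tweet):
                archives[archive].append(tweet)
    return dict(archives)


def route_by_hashtag(tweet):
    """Hypothetical routing rule: one archive per hashtag in the text."""
    words = tweet.get("text", "").split()
    return [w.lower() for w in words if w.startswith("#")]


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        spool_path = os.path.join(tmp, "stream.jsonl")
        spool_tweet(spool_path, '{"id": 1, "text": "scaling is hard #archive"}')
        spool_tweet(spool_path, '{"id": 2, "text": "files first #archive #scale"}')
        return post_process(spool_path, route_by_hashtag)
```

The point of the split is that the ingest step only ever appends to a file, so it never waits on a database lock. The slower work of sorting tweets into archives happens afterwards, in a batch, at whatever pace the database can sustain.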

Yes, I know this is what Twitter recommended in the first place (writing to the file system), but in the early days of our use of the Streaming API we stayed pretty efficient, and it was much easier to handle the data in a structured format in the DB than to manage files. However, we have hit a point where that simply isn't feasible anymore.

Over the next 24-48 hours I expect us to be playing catch-up. Bear with us as we continue to scale.

