I often read complaints that twitter is buggy, or broken, or malfunctioning in some way. And although I’ve never seen any of the twitter code, I have worked on large scale distributed systems my whole career, and I have really good intuition when it comes to diagnosing bugs. The vast majority of twitter problems I see are undoubtedly caused by the choice of the twitter team to use a database with “eventual consistency” instead of a traditional ACID database. And now I’m going to explain what that gobbledegook sentence means.
Let’s start with ACID. It’s an acronym made of more jargon, so don’t worry about what it stands for. An ACID database is the kind of database you imagine computers would use. If you put some information in there, and then you go look where you just put it, the information is there. And if you leave and come back, it’s still there. And if something goes wrong, you get an error, so you can try again later or tell the user to try again later, or whatever. You know, computers acting all computery.
Your bank uses an ACID database. When they take money out of your account and put it into another account, they absolutely don’t want that money to still show up in your account too.
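To make “computery” concrete, here is a minimal sketch using Python’s built-in sqlite3 module, which is an ACID database. The account names, the amounts, and the `transfer` helper are made up for illustration; this is not how any particular bank does it. The point is that the two updates either both happen or neither does, and a failure comes back as an error you can act on.

```python
import sqlite3

# An in-memory SQLite database; SQLite transactions are ACID.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("you", 100), ("landlord", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Hypothetical helper: move money so both updates happen, or neither does."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
        )


transfer(conn, "you", "landlord", 60)
try:
    # Would overdraw your account, so the CHECK constraint rejects it.
    transfer(conn, "you", "landlord", 60)
except sqlite3.IntegrityError as err:
    print("transfer failed, try again later:", err)

print(dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name")))
# {'landlord': 60, 'you': 40}
```

In the failed transfer the landlord’s credit ran first; the rollback undid it, so the 60 never existed in two accounts at once.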
The trouble with ACID databases is that you can only make them so big before they get too slow to use. Your bank will only ever have so many accounts, and if it needs to, it’s easy to split those accounts across different databases. But imagine that your bank had 271 million users doing 5,787 transactions a second, and there was no way to separate them into reasonable groups. That ACID database wouldn’t work. You can do big, or you can do fast, but you can’t do both and still have what they call “transactional integrity,” where things act, you know, computery.
Twitter has 271 million active users sending 5,787 tweets a second. And, in fact, they are doing a lot more than 5,787 transactions a second, because they are also starring and following and deleting and listing and searching and on and on… So to make a system like twitter (or facebook or any other internet-scale service) you have to give up ACID. You have no choice. Transactional integrity simply is not possible at that kind of scale.
The alternative to ACID is BASE, a contrived acronym of jargon that is also not worth decoding. The E in BASE stands for “Eventually Consistent.” It means that when you send data to the database, it’ll eventually be there when you go look for it. But if you go looking right away, it probably won’t be there yet. And if something goes wrong and it never gets there, you’ll never get an error message. And “Eventually Consistent” doesn’t mean “Eventually Right.” It just means that all the different copies of the database will eventually have the same thing in them. Not necessarily the right thing. Just the same thing. (In theory, they are supposed to eventually all have the last thing you wrote to that spot, but in practice shit happens and the “last written” data is just lost.)
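Here is a toy model of the “same thing, not necessarily the right thing” problem. It assumes one common conflict rule, last-writer-wins, where each copy keeps whichever value carries the larger timestamp. I have no idea what rule twitter actually uses; the `Replica` class, the `sync` function, and the timestamps are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of a key-value store; each value carries a write timestamp."""
    name: str
    data: dict = field(default_factory=dict)  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(a: Replica, b: Replica) -> None:
    """Anti-entropy pass: for every key, both copies keep the entry with the
    larger timestamp (last-writer-wins)."""
    for key in set(a.data) | set(b.data):
        candidates = [e for e in (a.data.get(key), b.data.get(key)) if e is not None]
        winner = max(candidates, key=lambda e: e[0])
        a.data[key] = winner
        b.data[key] = winner


east, west = Replica("east"), Replica("west")
east.write("status", "hello", timestamp=100)
print(west.read("status"))  # None: west hasn't heard about the write yet

# West's clock runs behind, so a write that really happened later
# gets a smaller timestamp.
west.write("status", "goodbye", timestamp=90)
sync(east, west)
print(east.read("status"), west.read("status"))  # hello hello
```

Both copies agree on “hello,” so the database is consistent. But “goodbye” was the last thing actually written, and it is just gone, with no error raised anywhere.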
So let’s look at some of the weird shit that happens when you use twitter:
I tweeted from my phone, but I don’t see it on the tablet. But it’s on my phone. Wait, there it is on the tablet. That was weird.
Classic eventual consistency bug. The tablet was connected to a database copy that didn’t have the tweet yet. In fact, the phone probably was, too. The app almost certainly keeps a copy of the tweet, and shows it to you even if twitter’s database isn’t showing it yet. It is imperative when you build an app on an eventually consistent database that you lie to the user about what’s in the database, because otherwise the user will freak out.
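Here is a minimal sketch of that lie. It assumes the app keeps its own list of posts it has sent but hasn’t yet seen come back from the server. `TimelineClient`, its methods, and the integer tweet ids are hypothetical. A real app would also handle failed sends and much more.

```python
class TimelineClient:
    """Hypothetical app-side cache that papers over replication lag
    for the user's own posts."""

    def __init__(self):
        # tweet_id -> text, for posts sent from this device that the
        # server has not shown back to us yet
        self.pending = {}

    def post(self, tweet_id, text):
        # A real app would also send the tweet to the server here.
        self.pending[tweet_id] = text

    def render(self, server_timeline):
        """Merge a replica's timeline (tweet_id -> text) with our pending posts,
        newest first. Assumes larger ids are newer."""
        merged = {**server_timeline, **self.pending}
        # Once the replica we read from has a post, stop faking it.
        for tweet_id in list(self.pending):
            if tweet_id in server_timeline:
                del self.pending[tweet_id]
        return [merged[tweet_id] for tweet_id in sorted(merged, reverse=True)]


phone = TimelineClient()
phone.post(2, "my new tweet")
stale_replica = {1: "old tweet"}
print(phone.render(stale_replica))  # ['my new tweet', 'old tweet']

tablet = TimelineClient()  # nothing pending on this device
print(tablet.render(stale_replica))  # ['old tweet']
```

The tablet has nothing pending, so it shows whatever its replica returns. That is exactly the “where’s my tweet?” moment, until the replica catches up.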
The favorites count doesn’t match the list of people who starred my tweet.
This can happen because one of the people who starred it is a private account, so you can’t see them, but their star still counts. But more often it happens because the number of stars on a tweet is just an estimate. You can look at the list of people who starred it and count them, but the server doesn’t. Well, it could, but it is much faster to just keep a counter and add or subtract one whenever a star is added or removed. The trouble with that is that if a bunch of people are starring and unstarring at the same time all over the world, you are pretty much guaranteed to end up with a wrong count. Recall that eventually consistent doesn’t mean eventually right. This is one of those cases. Eventually all the databases will agree on the number of stars, but the number they agree on is almost never the actual right one.
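Here is one way a counter can converge on the wrong number. It assumes each database copy stores the star count as a plain integer, bumps it locally, and reconciles with last-writer-wins. The class, the timestamps, and the reconciliation rule are assumptions for illustration, not twitter’s design.

```python
from dataclasses import dataclass


@dataclass
class CounterReplica:
    """One database copy's star count for a single tweet (hypothetical)."""
    value: int = 0
    last_write: int = 0  # timestamp of the most recent write to this copy

    def star(self, now: int) -> None:
        # Read, add one, write back, against this copy only.
        self.value = self.value + 1
        self.last_write = now


def reconcile(replicas):
    """Last-writer-wins: every copy adopts the most recently written value."""
    newest = max(replicas, key=lambda r: r.last_write)
    winner = (newest.value, newest.last_write)
    for r in replicas:
        r.value, r.last_write = winner


us, eu = CounterReplica(), CounterReplica()
for t in (1, 2, 3):  # three people star it via the US copy
    us.star(now=t)
for t in (4, 5):  # two more star it via the EU copy before the copies sync
    eu.star(now=t)
reconcile([us, eu])
print(us.value, eu.value)  # 2 2: the copies agree, but five people starred it
```

Counting the list of people who starred it would give 5. The copies have consistently settled on 2.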
My follower count is bouncing all over the place.
This is a lot like the star-counting problem. When twitter suspends people or they unfollow you, the count goes down, and when suspended accounts are reactivated or people follow you, the count goes up. But unlike stars, people really want to know their actual follower count, so twitter undoubtedly goes through periodically, actually counts them, and writes the real number in there. Until then, it’s just an estimate, and if there is any activity at all going on, it’s an estimate that will only eventually be consistent. So each time your phone asks for an update, it might be getting the answer from a different database, and that database might not be consistent with the others yet. Hence the bouncing.
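Here is a toy simulation of the bouncing. It assumes three database copies whose running estimates have drifted apart, an app that reads from whichever copy it happens to be routed to, and a periodic recount job. The follower names, the numbers, and the `poll`/`recount` functions are invented. The seed just makes the run repeatable.

```python
import random

# Hypothetical ground truth: who actually follows you right now.
true_followers = {"alice", "bob", "carol", "dave"}

# Three database copies whose running estimates have drifted apart.
replica_counts = {"r1": 4, "r2": 6, "r3": 3}


def poll(rng):
    """One app refresh: read the count from a randomly chosen copy."""
    name = rng.choice(sorted(replica_counts))
    return replica_counts[name]


def recount():
    """Periodic job: count followers for real and overwrite every estimate."""
    exact = len(true_followers)
    for name in replica_counts:
        replica_counts[name] = exact


rng = random.Random(42)
before = [poll(rng) for _ in range(8)]  # values drawn from 3, 4, and 6: bouncing
recount()
after = [poll(rng) for _ in range(8)]  # steady at 4 until the estimates drift again
print(before)
print(after)
```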
My deleted tweet keeps coming back.
It’s clear from the behavior of twitter that some things get higher priority than others when changes propagate from database to database. So “Eventually” is really fast for some stuff, but takes longer for other stuff. New tweets are very high priority. Deleted tweets less so. Account suspension/deactivation is very low priority. (Refreshing a suspended account is a downright bizarre thing to watch.) I’ve never seen a deleted tweet stay undeleted for good. But I have seen them stick around for as long as an hour. Again, this is something your phone app will hide from you. It knows you deleted the tweet, so it’ll probably hide it when it shows back up.
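If propagation really is priority-based, as the behavior suggests, a sketch might look like this: a heap of pending changes, shipped a few at a time. The change kinds, their priority numbers, and the per-tick budget are all guesses for illustration. Notice that a delete queued before a burst of new tweets still ships after them.

```python
import heapq
import itertools

# Hypothetical priorities: lower number = replicated sooner.
PRIORITY = {"new_tweet": 0, "delete_tweet": 1, "suspend_account": 2}


class ReplicationQueue:
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: FIFO within a priority

    def enqueue(self, kind, payload):
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._seq), kind, payload))

    def drain(self, budget):
        """Ship at most `budget` changes to the other copies this tick."""
        shipped = []
        while self._heap and len(shipped) < budget:
            _, _, kind, payload = heapq.heappop(self._heap)
            shipped.append((kind, payload))
        return shipped


q = ReplicationQueue()
q.enqueue("suspend_account", "spammer42")
q.enqueue("delete_tweet", 1001)
for i in range(3):
    q.enqueue("new_tweet", 2000 + i)
print(q.drain(budget=2))  # [('new_tweet', 2000), ('new_tweet', 2001)]
print(q.drain(budget=2))  # [('new_tweet', 2002), ('delete_tweet', 1001)]
print(q.drain(budget=2))  # [('suspend_account', 'spammer42')]
```

As long as new tweets keep arriving at least as fast as the budget, the delete keeps waiting. That would be one way a deleted tweet could hang around for an hour.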
So, back to my tweet. If your star doesn’t stick, it could be an eventual consistency bug: the star is there in some versions of the database but not in others. However, it could also be that you are in “twitter jail,” or, as twitter puts it, “you are rate limited.” Rate limited means that the twitter servers have decided that you are doing too much stuff, and so you need to cool it for a while. This happened to me on Monday, but I hadn’t been starring all that much. So let’s guess why they thought I should be rate limited.
To know how many things you have done in the last hour, twitter needs to keep a counter. But we’ve already established that counters are problematic. Like star counts and follower counts, these transaction counts are only an estimate. So my bet is that sometimes that estimate gets wildly out of whack. And even though eventual consistency would sort it out later, the bad-user trigger doesn’t wait for that, and boom, you’re in jail.
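Here is one hypothetical way an action counter could get wildly out of whack. The copies forward “+n” increments to each other, delivery is at-least-once so a retried message can land twice, and the limit check reads the inflated estimate immediately. The limit of 10, the class, and the duplicate-delivery scenario are assumptions, not anything twitter has documented.

```python
from collections import Counter

RATE_LIMIT = 10  # hypothetical: max actions per user per hour


class ActionCounter:
    """One database copy's estimate of each user's actions this hour."""

    def __init__(self):
        self.counts = Counter()

    def record_local(self, user, n=1):
        self.counts[user] += n
        return (user, n)  # increment to forward to the other copies

    def apply_remote(self, delta):
        user, n = delta
        self.counts[user] += n  # no de-duplication, so a retry double-counts

    def is_rate_limited(self, user):
        # Checked right now, against whatever the estimate currently says.
        return self.counts[user] > RATE_LIMIT


home, remote = ActionCounter(), ActionCounter()
delta = home.record_local("me", n=6)  # six stars: well under the limit
remote.apply_remote(delta)
remote.apply_remote(delta)  # a network hiccup delivers the same message twice
print(home.counts["me"], home.is_rate_limited("me"))  # 6 False
print(remote.counts["me"], remote.is_rate_limited("me"))  # 12 True
```

Whether you land in jail depends on which copy the limiter happens to ask.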
Anyway, that’s my guess, and I’m almost always right about this stuff.