Today Facebook launched their new privacy settings and forced all their users to update their settings.
We noticed at Yahoo! that we had started to get lots of timeouts from the Facebook APIs that we use on the Yahoo! homepage.
I’ve not heard anything official from Facebook yet, so this next part is all speculation.
I suspect that when a user updates their privacy settings, as they are now forced to do the first time they hit facebook.com, Facebook flags that user’s account as dirty (something changed) and purges the cache of that user’s stream data.
Now that the cache is empty for a particular user, when a friend of that user views Facebook, the backend servers must go back to the master data source and confirm permissions on what is visible before it is displayed.
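To make the speculation concrete, here is a minimal sketch of the cache pattern I am imagining. It is not Facebook’s code; `StreamCache`, `mark_dirty` and the `load_from_master` callback are names I made up for illustration. It only shows how purging one user’s entry turns the next read into a trip to the master data source.

```python
class StreamCache:
    """Hypothetical read-through cache for per-user stream data."""

    def __init__(self, load_from_master):
        # load_from_master stands in for the expensive backend query
        # that re-checks permissions against the master data source.
        self._load = load_from_master
        self._cache = {}
        self.master_reads = 0

    def get_stream(self, user_id):
        # Cache hit: no backend work at all.
        if user_id not in self._cache:
            # Cache miss: go back to the master and repopulate.
            self.master_reads += 1
            self._cache[user_id] = self._load(user_id)
        return self._cache[user_id]

    def mark_dirty(self, user_id):
        # Something changed (e.g. privacy settings): purge the cached entry.
        self._cache.pop(user_id, None)
```

With one user marked dirty, the cost is a single extra master read. With every user marked dirty on the same day, every read is a master read.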
That is fine when only a few users are updating their permissions. If, however, you force ALL your users to do this, your site rapidly becomes overwhelmed, because you have effectively removed your cache. This effect is known as “The Thundering Herd”.
As the load increases, due to all the extra requests for data that would normally be handled by the caches, you start to get race conditions as requests return and try to populate the cache. Requests also start to take longer, because connections are kept open while the backends are queried. With connections held open, the server eventually runs out of connection slots and stops handling new requests, so you start to get errors as clients can no longer connect.
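One common way to soften this on the cache side is to coalesce concurrent misses for the same key, so that only one request queries the backend and the rest wait for its result. The sketch below is my own illustration of that idea, not something I know Facebook to use; `SingleFlightCache` and `loader` are hypothetical names.

```python
import threading


class SingleFlightCache:
    """Hypothetical cache that lets only one caller load a missing key."""

    def __init__(self, loader):
        self._loader = loader      # expensive backend query (assumed)
        self._lock = threading.Lock()
        self._values = {}          # key -> cached value
        self._in_flight = {}       # key -> Event set when the load finishes
        self.loads = 0             # number of successful backend loads

    def get(self, key):
        with self._lock:
            if key in self._values:
                return self._values[key]
            event = self._in_flight.get(key)
            leader = event is None
            if leader:
                # First miss for this key: this caller does the load.
                event = threading.Event()
                self._in_flight[key] = event

        if not leader:
            # Someone else is already loading; wait, then re-check the cache.
            event.wait()
            return self.get(key)

        try:
            value = self._loader(key)
            with self._lock:
                self._values[key] = value
                self.loads += 1
            return value
        finally:
            # Wake the waiters even if the load failed, so they can retry.
            with self._lock:
                del self._in_flight[key]
            event.set()
```

This removes the race to populate the cache for a single key, but it does not help when millions of different keys are all empty at once, which is why the release strategy matters more than the cache code.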
How do you prevent this? Don’t push changes that require your whole userbase to update their records at once. Stage the release, starting with 1%, until you can gauge how much load it will generate on your servers. Then you can safely ramp up your change.
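A staged release like this can be sketched with a stable hash of the user ID, so that each user lands in a fixed bucket and raising the percentage only ever adds users. The function name `in_rollout` and the `salt` value are assumptions for illustration, not anyone’s real rollout system.

```python
import hashlib


def in_rollout(user_id: str, percent: float, salt: str = "privacy-settings") -> bool:
    """Return True if this user falls inside the first `percent` of buckets.

    The salt is a hypothetical per-feature label so that different
    rollouts do not always pick the same users.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    # Map the hash onto 10,000 buckets, giving 0.01% granularity.
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < percent * 100


# Start with 1% of users; ramp up only after watching backend load.
if in_rollout("user-12345", percent=1):
    pass  # prompt this user to update their privacy settings
```

Because the bucket depends only on the salt and the user ID, a user who is included at 1% stays included at 5%, 25% and 100%, so nobody is prompted and then un-prompted as the release ramps up.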