During our ongoing efforts to modernize the infrastructure the Roll20 platform runs on, we implemented a few changes to our backend that caused cascading failures. Basically, we pulled the thread in our sweater a little too far. This resulted in significant downtime and absolutely unacceptable impacts to some of your games. If you’ve used Roll20 in recent weeks, you may have experienced character sheet issues, custom sheet sandbox failures, and general network interruptions.
We hate that. We hate knowing we caused it. It sucks. And we are sorry.
We have been working feverishly to implement solutions that continue to push our infrastructure forward while we regain a state of normalcy. Our top priority is to provide players with a stable platform on which to connect and create amazing stories. Upgrading our infrastructure is a huge part of achieving that goal: every weekend, every day, every game session.
Next week the Roll20 Development team will be giving you a breakdown of the issues we’ve experienced, the steps we’ve taken to restore stability, and our plans for the future. We will be sharing that information on Twitter and here on the Roll20 blog.
Over the course of this week, we have rolled out changes addressing all known causes of service instability. Most recently, we isolated and upgraded the main service that routes all of the traffic to the Roll20 platform, relieving a bottleneck and improving monitoring transparency. Our engineers, customer support, and community management teams will remain on-call throughout the weekend, actively monitoring the infrastructure.
We know playing on Roll20 has been frustrating lately. Thank you for sticking with us as we continue to improve our platform to handle the enormous growth of the community over the last year. Stay tuned for more updates next week.