Behind the Scenes: How we 10x'd VTT Performance

Hi everyone! I’m Jeff, Roll20’s Software Development Director, working closely with the VTT team. I want to talk about the performance work that’s helping your maps load faster and run more smoothly during play, especially for games with a lot of tokens, maps, and drawings.

When we embarked on rebuilding the virtual tabletop, we wanted to architect a platform that could be built upon for many years to come. To accomplish this, we had a few critical goals in mind:

Enable Cool Stuff: Build an architecture that supports delightful, useful, and powerful features (now and in the future)
Empower our Developers: Standardize our code around internal and external best practices to make it easier to build new features and harder to introduce painful bugs
Enhance Performance: Dramatically improve performance for both existing and new features across the VTT experience for the widest possible audience
Roll Out Smoothly: Deliver all of this in a way that allows players to migrate seamlessly, without disrupting ongoing games

Our fourth goal led us to Project Jumpgate: the “jumpstart” that made the new engine a reality. Because it was developed as an opt-in alternative to the legacy engine, we were able to build the new system side by side with the old one. It worked with the same Roll20 game data but rendered it with a completely new engine built on Babylon.js, a powerful WebGL-based rendering engine.

That project is now complete. The vast majority of players are using the new engine, and we hope that the transition felt as smooth as possible. That said, the Virtual Tabletop is an ongoing project, and our original goals are still always on our mind.

You’ll mostly have to trust us on our first and second goals, and can look to features like the new Foreground Layer and Map Pins (not to mention many other improvements, small and large) as proof that we’ve honored them over the past year.

Goal three, performance, is different. It’s something you really notice when it isn’t working (and ideally, don’t notice at all when it is). Performance work is never finished, but some fundamentals are especially important when implementing a rendering engine, particularly one like ours, where users have the freedom to create, modify, and destroy arbitrary content at any time.

We’ve recently finished the last major fundamental performance tuning we knew we needed to tackle in the new engine, so I want to dive into some details below. It’s fairly technical, but it’s also interesting, and it points the way toward even more improvements down the road.

Draw Calls

When I use the word “fundamental”, Draw Calls are what I’m talking about. Our usage of the Babylon engine relies on building meshes with dynamic textures to represent everything on the tabletop: tokens, maps, doors, windows, Map Pins, and more. Each of these objects is its own mesh, with some optimizations applied (including rendering through an orthographic camera) so that, despite using a 3D engine, we’re effectively rendering a 2D view.

However, by default, each mesh sent to the GPU by Babylon requires its own draw call (sometimes several). A draw call is essentially the function where the CPU tells the GPU to draw something. Because the CPU is much slower than the massively parallel GPU, each draw call carries significant overhead. In rendering, you generally want to minimize how often the CPU gets involved.

You might imagine how this adds up. On maps with many tokens, tiled maps, doors and windows, drawings, text, and other elements, we observed 2,000 to 3,000 draw calls per frame. While this number might be okay for certain applications, every single aspect of our canvas is customizable, which limits how far we can optimize based on assumptions about the content. On weaker or older devices, especially within a busy application like Roll20, the CPU can already be heavily taxed. So, we knew we wanted to reduce draw calls, their overhead, or both. Ultimately, we landed on a common optimization that uses instancing and a texture atlas to batch those draw calls.

Instancing

Mesh Instancing allows you to take a single mesh and tell the GPU to draw it many times with different parameters. The parameters can vary, enabling the GPU to render many seemingly different objects at once, leveraging its parallelism.

In our case, nearly everything on the tabletop uses the same mesh, differing in position, size, and texture. (The Texture is the actual art that you’re uploading, or the text, or the door icon, etc.) Instancing requires that instances share a single texture, but you can control which part of the texture is displayed by adjusting UV coordinates. This leads us to the creation of a texture atlas, a large texture that contains many smaller textures and metadata to store where and how those textures are aligned.
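To make the atlas idea concrete, here is a minimal sketch of one way to lay images out in an atlas and derive the UV rectangle for each one. It is not the actual Roll20 implementation: the names (`packShelves`, `toUv`, `AtlasEntry`, `UvRect`) are hypothetical, the simple row-by-row "shelf" layout is just one common packing strategy, and a real engine may flip the V axis.

```typescript
// Hypothetical types for illustration; not taken from Roll20's codebase.
interface AtlasEntry {
  id: string;
  x: number; // pixel position of the image inside the atlas
  y: number;
  width: number;
  height: number;
}

interface UvRect {
  uOffset: number; // where the image starts, as a fraction of the atlas
  vOffset: number;
  uScale: number; // how much of the atlas the image covers
  vScale: number;
}

// Place images left to right in rows ("shelves"), starting a new row when one fills up.
// Returns null when the images do not all fit, which signals that another atlas is needed.
function packShelves(
  images: { id: string; width: number; height: number }[],
  atlasSize: number
): AtlasEntry[] | null {
  const entries: AtlasEntry[] = [];
  let cursorX = 0;
  let cursorY = 0;
  let shelfHeight = 0;
  for (const img of images) {
    if (img.width > atlasSize || img.height > atlasSize) return null;
    if (cursorX + img.width > atlasSize) {
      // Current row is full: move down by the tallest image in it.
      cursorY += shelfHeight;
      cursorX = 0;
      shelfHeight = 0;
    }
    if (cursorY + img.height > atlasSize) return null;
    entries.push({ id: img.id, x: cursorX, y: cursorY, width: img.width, height: img.height });
    cursorX += img.width;
    shelfHeight = Math.max(shelfHeight, img.height);
  }
  return entries;
}

// Convert a pixel rectangle into normalized UV coordinates in the range 0..1.
function toUv(entry: AtlasEntry, atlasSize: number): UvRect {
  return {
    uOffset: entry.x / atlasSize,
    vOffset: entry.y / atlasSize,
    uScale: entry.width / atlasSize,
    vScale: entry.height / atlasSize,
  };
}
```

The metadata returned by `packShelves` is the "where and how" part of the atlas: each object only needs its own UV rectangle to pick its art out of the shared texture.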

For every game object we render, we send the token mesh to the GPU with instance parameters that specify where the object’s texture appears in the atlas. The result is that all tokens on a map can often be rendered with a single draw call, instead of thousands of CPU-driven calls per frame!
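As a rough illustration of what "instance parameters" can look like, the sketch below flattens a list of objects into one typed array, with a fixed number of floats per instance. The layout (position, size, then the UV rectangle) and the names `SpriteInstance` and `buildInstanceBuffer` are assumptions made for this example, not the actual format used in the VTT.

```typescript
// Hypothetical per-instance data: where to draw, how big, and which part of the atlas to show.
interface SpriteInstance {
  x: number;
  y: number;
  width: number;
  height: number;
  uOffset: number;
  vOffset: number;
  uScale: number;
  vScale: number;
}

// Assumed layout: 8 floats per instance, in the field order above.
const FLOATS_PER_INSTANCE = 8;

// Pack every instance into one contiguous buffer that could be uploaded to the GPU in one go.
function buildInstanceBuffer(sprites: SpriteInstance[]): Float32Array {
  const data = new Float32Array(sprites.length * FLOATS_PER_INSTANCE);
  sprites.forEach((s, i) => {
    data.set(
      [s.x, s.y, s.width, s.height, s.uOffset, s.vOffset, s.uScale, s.vScale],
      i * FLOATS_PER_INSTANCE
    );
  });
  return data;
}
```

The point of this shape is that the CPU prepares one buffer for the whole batch rather than issuing one call per object; the GPU then reads each instance's slice of the buffer while drawing the same shared mesh.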

Complexities

Since we can’t specify draw order, we batch tokens, text, and other objects into groups based on the z-order defined on the Tabletop by GMs. We also need to account for GPU texture size limits; you can’t always fit every single token into a single texture atlas, so those have to be batched and grouped as well.
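One way to picture this grouping: sort objects by z-order, then merge neighbors that share an atlas into a single batch, starting a new batch whenever the atlas changes. The sketch below is an illustrative assumption about how such grouping could work, with hypothetical names (`Drawable`, `Batch`, `buildBatches`); it is not the engine's actual batching code.

```typescript
// Hypothetical shapes for illustration only.
interface Drawable {
  id: string;
  zIndex: number; // lower values are drawn first (further back)
  atlasId: number; // which atlas holds this object's texture
}

interface Batch {
  atlasId: number;
  ids: string[]; // objects drawn together, in back-to-front order
}

// Group consecutive objects (in z-order) that share an atlas.
// Each resulting batch stands for one instanced draw call.
function buildBatches(objects: Drawable[]): Batch[] {
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...objects].sort((a, b) => a.zIndex - b.zIndex);
  const batches: Batch[] = [];
  for (const obj of sorted) {
    const last = batches[batches.length - 1];
    if (last !== undefined && last.atlasId === obj.atlasId) {
      last.ids.push(obj.id);
    } else {
      // The atlas changed, so stacking order forces a new batch here.
      batches.push({ atlasId: obj.atlasId, ids: [obj.id] });
    }
  }
  return batches;
}
```

In this sketch, five objects whose atlases alternate as 0, 0, 1, 0, 0 in z-order produce three batches rather than two, because the object in the middle has to be drawn between the others to keep the stacking correct.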

When tokens resize or require a different quality level (directly or via zooming), they need to be removed and re-added to the atlas (or a new one). A lot of behind-the-scenes shuffling is taking place to ensure everything works smoothly. When all is said and done, we see the number of draw calls drop from roughly 3,000 to under 100. On moderate-to-large pages, this can translate into performance gains of up to 10x (or more), depending on hardware and other factors.
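To illustrate the resize and quality-level point, here is a small sketch of one possible policy: snap each object's on-screen size to a power-of-two "quality bucket" and only re-add it to an atlas when the bucket changes. The bucket limits (32 and 2048 pixels) and the names `qualityBucket` and `needsRepack` are invented for this example; the real engine's rules are not described in this post.

```typescript
// Assumed limits for this example only.
const MIN_BUCKET = 32;
const MAX_BUCKET = 2048;

// Round an on-screen pixel size up to the next power of two, clamped to the limits above.
function qualityBucket(screenPixels: number): number {
  let size = MIN_BUCKET;
  while (size < screenPixels && size < MAX_BUCKET) {
    size *= 2;
  }
  return size;
}

// An object only needs to be removed and re-added when its bucket changes,
// so small zoom changes do not cause constant atlas churn.
function needsRepack(oldScreenPixels: number, newScreenPixels: number): boolean {
  return qualityBucket(oldScreenPixels) !== qualityBucket(newScreenPixels);
}
```

Under this policy, zooming a token from 100 to 120 on-screen pixels keeps it in the 128 bucket and leaves the atlas alone, while zooming to 200 pixels moves it to the 256 bucket and triggers a re-add.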

In particular, maps with many small tokens like Tomb of Annihilation’s Player Map of Chult (which uses individual hex cover to reveal the map piece by piece) perform dramatically better, surpassing even the HTML5 engine of yesteryear.

All of that work, dynamically building and managing texture atlases, is done and enabled in your games (unless you are using the legacy engine). Because we need to build a texture atlas before rendering anything, it might take a second or two to load maps with lots of tokens, but on average GPU memory usage should still be slightly lower due to reduced overhead. And once you’re playing, the experience should feel faster and smoother.

We’d love to hear how this performs for you… especially if you notice improvements, but just as importantly: if anything feels off, or performance is worse than expected. Let us know right away, and we’ll not only investigate, but squash any pesky bugs to get you back in your games running smoothly!

What’s Next?

As I’ve mentioned frequently, performance is an ongoing effort, and we’ll continue looking for ways to improve over time. That said, we feel really good about where we are today. With a powerful engine and a solid foundation in place, the team is excited to help GMs continue to deliver awesome experiences for their players.

The Map Pins Open Beta is still underway, and the team is actively responding to feedback as we work toward the full launch. Report any issues that arise during your games so our team can address them, and let us know if you have specific thoughts about the direction you’d like to see Map Pins continue to develop (outside of the list above). Feel free to share on our Forums, on social (X/Twitter, Bluesky, Instagram), in our Discord Server, or via our direct feedback form.

Forums

Discord

Feedback Form

As for what comes after that… you’ll have to wait and see for now! We have a really powerful platform to build on, and we can’t wait to show you what we’re cooking, so stay tuned!

Jeff Lamb
