Feb 23, 2025
For the past couple months I’ve spent a few hours a week building a sort-of-clone of HaxBall in Godot. My previous gamedev experience is limited to some dabbling in ThreeJS, but so far Godot has been a joy, with an intuitive scene-tree model and a surprisingly capable gradually-typed scripting language.
The tl;dr is I have an ugly-but-functional, peer-to-peer multiplayer, 2D top-down car-soccer game playable here. Harangue your group chat into trying it out with you, or try playing against the AI on your own!
Godot is a FOSS game engine with both 2D and 3D support. As far as I’m aware, multiple commercial 2D games have successfully shipped using Godot (Case of the Golden Idol, Luck be a Landlord, Brotato), and Slay the Spire 2 is a big upcoming one.
Given that this was my first game in Godot and I was more interested in building out networked multiplayer, I’ve stuck with 2D.
To get myself acquainted, I followed along with the first 5 hours of this Godot tutorial (after which I got impatient and jumped into developing this game).
Impressively, the Godot editor is self-hosted, built with Godot’s own UI system. For a project as small as mine it’s felt quite lightweight and responsive: it initializes quickly and is laid out relatively intuitively. I haven’t used Unity or Unreal, so it’s hard to compare, but I imagine they’re both a bit heavier.
Godot uses a scene-tree model to organize the game world. This seems natural based on what I remember from my computer graphics class in university, and it lends itself to a familiar component-based model of development where scripts attached to parent nodes generally call functions on their children, and children emit signals that parents can register callbacks on.
The editor and GDScript together make this feel natural enough that I’m not noticing the framework too much while coding game behavior - it pretty much just gets out of your way. For prototyping, the fact that everything can be stringly-typed if you want it to means you can quickly slap some behavior together without formally codifying an interface for it, but the gradual typing system and tree structure nudge you towards relatively maintainable patterns over time.
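To make that concrete, here’s a minimal sketch of the pattern; the node, signal, and group names are invented for illustration, not taken from my actual project:

```gdscript
# Ball.gd (child node): declares a typed signal and emits it upward.
extends RigidBody2D

signal goal_scored(team: int)

# Assumed to be connected (e.g. in the editor) to a goal Area2D's signal.
func _on_goal_area_entered(area: Area2D) -> void:
    if area.is_in_group("left_goal"):
        goal_scored.emit(1)
```

```gdscript
# Game.gd (parent node): calls down into children, registers callbacks
# on the signals they emit.
extends Node2D

var score := [0, 0]

func _ready() -> void:
    $Ball.goal_scored.connect(_on_goal_scored)

func _on_goal_scored(team: int) -> void:
    score[team] += 1
```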
Like many other systems programmers, I enjoyed Mike Acton’s talk on data-oriented design, and in the game dev space that’s more-or-less synonymous with ECS, especially given recent efforts around engines like Bevy.
Interestingly, the Godot maintainers have themselves answered the obvious question (why not ECS?) in a blog post. Their one-line summary is that in Godot’s model, composition of components happens at a higher level than it does in ECS: in their view, data-oriented optimizations belong inside the engine implementation, and game code itself usually doesn’t need to be concerned with them. Most Godot user code executes in callbacks responding to signals or input. This sounds a lot like many front-end development frameworks, so it makes sense that a tree-of-components model would be well-suited to it.
Relying on a physics engine for emergent gameplay has always seemed like a great idea to me. Humans have strong intuitions about the physical world, so you get to leverage a big, complicated codebase with a clear API to code against, while the result still feels conceptually small to you and to players. Godot’s built-in physics engine has been more than capable for this simple project.
Because I want to rely on the physics engine to determine the results of player-player and player-ball collisions, I have to model the player’s cars as rigid bodies subject to engine physics rather than purely controlled by movement code responding to input. This is counter to how many games work, where as a gameplay programmer you want more fine-grained control over exactly how player characters speed up, slow down, jump, and fall.
This cashes out to the arrow keys telling the engine to apply a force to the player in a given direction, and the space bar applying “boost” in the form of an impulse in the direction the player is facing. This has required quite a bit of tweaking, and the way the car moves still doesn’t feel quite natural to new players (most people seem to expect tank controls, and the car’s facing doesn’t rotate quickly enough toward the direction of travel to feel properly physical), so I’ll need to iterate on this more.
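A simplified sketch of that control code; the tuning constants and input action names here are made up, and the real thing involves more massaging:

```gdscript
# Car.gd - the car is a RigidBody2D, so the physics engine owns all collisions.
extends RigidBody2D

const ENGINE_FORCE := 600.0   # invented tuning constant
const BOOST_IMPULSE := 300.0  # invented tuning constant

func _physics_process(_delta: float) -> void:
    # Arrow keys become a continuous force; the engine integrates it for us.
    var dir := Input.get_vector("move_left", "move_right", "move_up", "move_down")
    apply_central_force(dir * ENGINE_FORCE)

    # Space applies a one-off impulse along the car's current facing.
    if Input.is_action_just_pressed("boost"):
        apply_central_impulse(transform.x * BOOST_IMPULSE)
```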
Godot being able to export to HTML + WASM is great for plopping builds on GitHub Pages to share around. The networking-related wrappers for browser APIs worked well in my experience - I didn’t have to debug CORS issues or anything like that for WebSocket or WebRTC connections.
The stickiest parts have actually been around input. For example, the Clipboard API checks that the user has recently interacted with the page before it allows the game to read from the clipboard, and in practice this seems to make pasting the lobby code into the connect-to-room box on my menu screen flaky. Longer term, if I stick with deploying to the web, it seems like it’d be cleaner to avoid relying on Godot’s UI elements for text input and handle it some other way.
Godot has an interesting set of high-level multiplayer APIs, which present abstractions for RPCs between instances of the “same” node synced across different peers, where any given peer might be the “authority” for some subtrees of nodes.
Again the official Godot blog presents a nice worked example for multiplayer, including spawning the same scene onto multiple clients, sending RPCs from client to server, and synchronizing properties of nodes (e.g. position, velocity) from the node authority to other clients.
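To give a flavor of what those APIs look like, here’s a rough sketch of shipping a client’s input to the server; the node layout and function here are my own invention, not the blog post’s exact example:

```gdscript
# PlayerInput.gd - each client is the multiplayer authority for its own
# input node, and forwards its input to the server via an RPC.
extends Node

var direction := Vector2.ZERO

func _process(_delta: float) -> void:
    if is_multiplayer_authority():
        direction = Input.get_vector("move_left", "move_right", "move_up", "move_down")
        # Peer id 1 is the server by convention in Godot's high-level API.
        send_input.rpc_id(1, direction)

@rpc("any_peer", "call_remote", "unreliable_ordered")
func send_input(dir: Vector2) -> void:
    direction = dir
```

Property synchronization (position, velocity, and so on) can then be configured declaratively on a MultiplayerSynchronizer node, though as the next paragraph notes, physics-driven properties need special handling.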
One interesting caveat in my game’s case: because all of the relevant objects in the game are physics-engine-controlled, I can’t just naively synchronize position and velocity in the normal Godot processing callback. Instead, I have to hook into the separate _integrate_forces callback to manipulate the state the physics engine tracks for any given body.
Here the Godot community helps out with a useful example of how to implement this.
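What I ended up with is shaped roughly like this (simplified; the RPC stands in for however you actually receive state from the authority):

```gdscript
# On non-authority peers, the latest state received over the network is
# buffered, then applied inside _integrate_forces rather than _process,
# because the physics server owns a RigidBody2D's transform and velocity.
extends RigidBody2D

var _pending_state := {}

@rpc("authority", "call_remote", "unreliable_ordered")
func sync_state(xform: Transform2D, lin_vel: Vector2) -> void:
    _pending_state = {"transform": xform, "linear_velocity": lin_vel}

func _integrate_forces(state: PhysicsDirectBodyState2D) -> void:
    if not is_multiplayer_authority() and not _pending_state.is_empty():
        state.transform = _pending_state["transform"]
        state.linear_velocity = _pending_state["linear_velocity"]
        _pending_state = {}
```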
With the help of said resources, I was quickly able to get multiplayer working for native builds using Godot’s simpler UDP networking backend, which meant I had a working baseline to compare my WebRTC setup against next.
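For reference, that baseline is only a few lines (the port is arbitrary):

```gdscript
# Native-build baseline using ENet (Godot's UDP-based multiplayer peer).
func host_game() -> void:
    var peer := ENetMultiplayerPeer.new()
    peer.create_server(9999)  # arbitrary port
    multiplayer.multiplayer_peer = peer

func join_game(ip: String) -> void:
    var peer := ENetMultiplayerPeer.new()
    peer.create_client(ip, 9999)
    multiplayer.multiplayer_peer = peer
```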
There are several reasons to use WebRTC for a game like this: it’s essentially the only way to get a UDP-like (unordered, unreliable) transport in the browser, and peer-to-peer connections between players mean there’s no authoritative game server to host and pay for.
A reasonable reply to the peer-to-peer point is - won’t it be hard to prevent cheating if there’s no authoritative server? Guarding against cheating like that would be interesting in its own right, but for the time being (especially without world-discoverable lobbies or matchmaking) I’m satisfied with taking the stance of: don’t play with people who cheat!
I was disappointed to learn that WebRTC actually depends on an intermediary server to let peers negotiate connection setup, even though the peers can communicate directly once the connection is established.
In all the WebRTC documentation I can find (example), this intermediary is referred to as a signaling server. Typically this part is implemented using WebSockets, so after studying some examples, including a demo Godot project, I was able to implement my own signaling server in Rust in relatively short order and deploy it to Shuttle.
Fortunately, a signaling server is more-or-less the simplest possible WebSocket server project, akin to a basic chat room. The only functionality you need is creating/joining/leaving rooms and exchanging messages between peers in a room; the messages themselves can be treated as opaque blobs (they contain the WebRTC protocol messages, which we can assume the client libraries handle for us).
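To show how small the protocol is, here’s roughly what the client side looks like from Godot. The message schema (`join`, `relay`, and so on) is just what my server happens to speak, not any standard:

```gdscript
# SignalingClient.gd - talks JSON over a WebSocket to the signaling server.
extends Node

var socket := WebSocketPeer.new()

func connect_to_server(url: String) -> void:
    socket.connect_to_url(url)  # e.g. "wss://your-server.example/..." (placeholder)

func join_room(room_code: String) -> void:
    socket.send_text(JSON.stringify({"type": "join", "room": room_code}))

func relay(to_peer: int, payload: Dictionary) -> void:
    # The server forwards this to the target peer without inspecting it; the
    # payload carries the opaque WebRTC offer/answer/candidate messages.
    socket.send_text(JSON.stringify({"type": "relay", "to": to_peer, "payload": payload}))

func _process(_delta: float) -> void:
    socket.poll()
    if socket.get_ready_state() == WebSocketPeer.STATE_OPEN:
        while socket.get_available_packet_count() > 0:
            var msg = JSON.parse_string(socket.get_packet().get_string_from_utf8())
            # ...dispatch on msg["type"]: peer joined, peer left, relayed payload.
```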
While reading up on ways to avoid deploying my own server, I came upon a couple of interesting projects. Both look like they’d be worth investigating further, but for my purposes it was actually easier and more debuggable to implement my own signaling server. Being able to run everything on localhost is huge for figuring out where in the stack a problem is.
Now that I have everything working end-to-end I could consider swapping to one of these alternatives if hosting costs ever became a concern.
Games essentially use WebRTC as a UDP transport, without caring about the features that video calls use for synchronizing various streams of media. Because of this, virtually all of the complexity of WebRTC has to do with NAT traversal: establishing a connection between two peers that might each be behind network address translation.
(IPv4’s refusal to go away bites us again.)
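As an aside, the “UDP transport” part is visible directly in the API: you open data channels with the reliability and ordering guarantees switched off. In Godot that looks something like this (WebRTCMultiplayerPeer sets up equivalent channels internally when you use the high-level multiplayer API):

```gdscript
var peer := WebRTCPeerConnection.new()
peer.initialize()

# A pre-negotiated channel with no retransmits and no ordering behaves
# like UDP: late or lost packets are simply dropped.
var channel := peer.create_data_channel("game", {
    "negotiated": true,
    "id": 1,
    "ordered": false,
    "maxRetransmits": 0,
})
```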
Tailscale has a great post on NAT traversal that explains all the concepts without getting too deep into WebRTC specifics. Essentially, in the common scenario where your users are on residential internet, it’s unlikely either of their computers is publicly addressable: there’s probably a router in front of them performing NAT. Because of this, even if you have a WebSocket signaling server allowing them to exchange messages, you don’t trivially know the public IP:port tuples at which each client can be reached, or whether any even exist.
That’s where STUN servers come in: essentially just a publicly-addressable server you can connect to that tells you what IP address and port you’re connecting from. If your client application then retains the same socket it used to talk to the STUN server, it now has a known address and port it can pass along to other clients via your signaling server. Additionally, for the less stringent NAT variants, this process establishes outbound UDP state in your router’s firewall, which the router consults when deciding whether to allow inbound traffic from the other client.
Fortunately, because the bandwidth requirements for STUN servers are low, there are many STUN servers run for free on the open internet.
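Using one is just a matter of passing it in when initializing the peer connection; Google’s public STUN server is a common choice:

```gdscript
var peer := WebRTCPeerConnection.new()
peer.initialize({
    "iceServers": [
        {"urls": ["stun:stun.l.google.com:19302"]},  # free public STUN server
    ],
})
```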
In some cases, both users are behind sufficiently restricted NAT setups that neither can be reached even with the benefit of a STUN server. That’s where a TURN server comes in, which is essentially the nuclear option — you’re proxying all traffic through a server that you know both clients can reach.
In my case, my home and workplace NAT setups were sufficiently restrictive that I had to set up a TURN server in order to get my game working. Because the data path goes through a TURN server if it’s being used, this is substantially more expensive than a STUN server, so there are no free unmetered TURN servers out there (that I know of).
For now, I’ve gone with Open Relay, which has a free tier of 500 MB a month without requiring payment information. This is relatively low; for reference, after only a few short gameplay sessions over the past week or so I’m already at 37 MB.
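Wiring a TURN server in looks like the STUN configuration plus credentials; the host, username, and credential below are placeholders for whatever your provider issues:

```gdscript
peer.initialize({
    "iceServers": [
        {"urls": ["stun:stun.l.google.com:19302"]},
        {
            # Placeholder TURN entry - substitute your provider's host,
            # username, and credential (e.g. from Open Relay's dashboard).
            "urls": ["turn:turn.example.com:3478"],
            "username": "YOUR_USERNAME",
            "credential": "YOUR_CREDENTIAL",
        },
    ],
})
```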
There are a few other options I’m considering for the future.
With all of the multiplayer in place I can settle into the real work of trying to make the game actually fun.
One thing that stood out immediately from playtesting was input lag and apparent jitter for players not hosting the game. Up until now, I’d been testing the multiplayer synchronization on localhost, which meant that I hadn’t exposed my game to actual network latency until my WebRTC setup went live.
For any given piece of game state, I need to determine which peer is the authority on what that state actually is. Naturally, the host (which I’ll refer to as the server, though in a p2p game this is just another client) is the authority on the current score and on which players are present in the room. But each client must necessarily be the authority on its own input.
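In Godot terms, this partitioning maps to set_multiplayer_authority calls on different subtrees. A sketch of how I split it (node names invented):

```gdscript
# Run on every peer whenever a player scene is spawned.
func setup_player(player: Node, owner_peer_id: int) -> void:
    # The server (peer 1) remains the authority for the physics body itself...
    player.set_multiplayer_authority(1)
    # ...but the input subtree belongs to the client driving this car.
    player.get_node("PlayerInput").set_multiplayer_authority(owner_peer_id)
```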
Things get muddy when we ask who should be the authority on the position of each player. The physics computations are cleanest if I make the server authoritative over all of them, limiting the clients’ authority to only their input. I still let the physics simulation run locally for each client according to their own input, and have the clients adjust the game state according to the server’s overriding information. But in playtesting, this naive approach produced some visible jankiness for players.
After observing the results from the previous section, I thought a simple fix might be to let clients be the authority for their own positions instead — I’ve already decided I don’t care about cheating (for now), after all!
This turned out to have other problems. If players are the authority for their cars’ positions — so their cars move according to their own clientside physics simulation — then this interacts poorly with the server continuing to be the authority for the ball’s position. Essentially, a player would collide with the ball, which would stop the player’s own car, and should send the ball flying in the other direction. That does happen in the player’s clientside prediction, but the server is the authority for the ball, and it’s possible given some latency for the server to observe the player’s car stop without actually observing the player’s car collide with the ball at all! This leads to some jankiness where the ball seems to stop a player’s car dead without responding at all to their car hitting it.
Similar issues happen in car-car collisions.
I think this means the client-authority approach is not a good fit when physics are involved, and I’ll have to return to the server being the authority for real game state.