How to Animate Multiplayer Cursors (liveblocks.io)
235 points by stevenfabre on July 5, 2022 | 61 comments



There's another approach that might work better.

Instead of sampling the mouse position every 100ms, you'd save off all the mouse positions, and then send the latest batch every 100ms. The other side would then replay the exact positions, just delayed by 100ms. It'll end up with the same latency as these motion smoothing approaches, while only using slightly more bandwidth.
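A minimal sketch of the sender side of that idea, assuming samples arrive as (timestamp, x, y) tuples. The class name `CursorBatcher` and its fields are made up for illustration, and a real client would probably flush from a timer instead of from the mouse event itself.

```python
from dataclasses import dataclass, field


@dataclass
class CursorBatcher:
    """Keeps every cursor sample and releases them in fixed-interval batches."""

    interval_ms: int = 100                       # batch period from the comment above
    _pending: list = field(default_factory=list)
    _last_flush_ms: float = 0.0

    def record(self, t_ms, x, y):
        """Store one sample; return the batch once the interval has elapsed, else None."""
        self._pending.append((t_ms, x, y))
        if t_ms - self._last_flush_ms >= self.interval_ms:
            batch, self._pending = self._pending, []
            self._last_flush_ms = t_ms
            return batch                         # hand this to the network layer
        return None
```

The receiver gets the full path for each 100ms window, so it can replay the exact positions instead of guessing what happened between two samples.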


Yeah, wanted to say the same thing. If latency is the only problem, just batch up the updates and replay on the other end. If bandwidth also becomes a problem, only then start compressing the data. But to be honest, if we can stream video over the internet, we surely have enough bandwidth to stream cursor positions.


Good point, it's also important to consider this in the context of the application that will use those multiplayer cursors. It's important that the state of the document matches the cursors, so it makes sense to have both presence (cursors, selection) and storage (document data) be perfectly in sync even if that means having a slight 80-200ms delay.


I think this is one of those situations where “just because we can doesn’t mean we should” applies.

The marginal benefit of exact cursor positions is so low that sending all that data still feels like a waste.


Could also have a positive impact on battery life as it would let the WiFi/cellular radio sleep longer.


What about a mix:

Every 100ms you fit the best Bézier curve to the last batch of mouse positions and send that.

It seems like that would give a more precise reconstruction than fitting a Bézier curve only to one sample every 100ms on the server.


Agreed. This would be only slightly annoying if you're talking to the person. But 100ms or even 200ms is an acceptable latency, especially if it's constant. This solution is not just simpler, but it's also more efficient as you can bundle the data up efficiently, and include state changes as well.

There is a lot of prior art in this space, btw. I remember Meteor.js having a great real-time demo over websockets that actually used predictive techniques to keep things (imperfectly, but still impressively) 0 latency.


Honestly, I don't see why you'd need to batch it every 100ms. Sure, you don't want to send a mouse movement every time an event triggers, but surely 30fps looks smooth and won't overload the system.


You can't reliably send, fail, retry, and confirm receipt of a TCP packet in 33ms over arbitrary Internet connections. 8ms is right out. I can get 200us over a local EtherCAT realtime industrial IO network, but that's with careful management of well-isolated single-machine network conditions and it just doesn't work with a cellular modem for download and oversubscribed residential cable for upload. Assuming latency of 120ms (as used in the defaults in the linked tutorial) is much more realistic.

And you also can't set up your system with a 5 second delay to send data every 5 seconds, because any jitter will result in hiccups.

You could set up your system to send out the new data each time the previous buffer is acknowledged, but that's kind of pointless. If you get lucky with a good connection and can send data to be rendered 4.970 to 5.000 seconds from now, what's the difference for the user between doing that and reducing the network load by approximately a factor of 3 by waiting until you have data for 4.900 to 5.000 seconds?

I think 100ms is a reasonable minimum batch size.


The biggest problem is TCP. Ping times for multiplayer games on decent Internet connections can be on the order of 10-20ms or even lower.


You're confusing latency with throughput.

I don't need to reliably send the TCP packets in 33ms. I just need to be able to start sending a packet every 33ms. Assuming sending, acking, etc. is non-blocking and uses few enough resources asynchronously, it's fine.


What we need is UDP websockets! Isn’t that what WebRTC is though?


Just for comparison, the default USB mouse polling rate is 125 Hz, so 8 ms. If that is too often, 16 or 32 ms would make sense, which is close to 60 or 30 Hz/fps, respectively.


Am I missing how that would work? That's an exponential increase in bandwidth needed to send all the mouse positions: 1 position per 100ms per player vs 10-30 per 100ms per player. Those positions all have to be propagated to other players. So in the first case, 10 players = 10 positions per 100ms. In the 2nd case it's 100-300 positions per 100ms.


Yeah but stop and think about how little bandwidth it still is.

X and Y can easily be 2 bytes each, 4 bytes total. 100 samples per second is a mere 400 bytes per second. You could do it from a dialup modem from the early 90s!
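To make that arithmetic concrete, here is a small sketch using Python's `struct` module. The helper names are made up, and it assumes coordinates fit in an unsigned 16-bit integer (0 to 65535).

```python
import struct

SAMPLE_FORMAT = "<HH"  # two unsigned 16-bit little-endian integers: x, then y


def encode_sample(x, y):
    """Pack one cursor position into 4 bytes."""
    return struct.pack(SAMPLE_FORMAT, x, y)


def decode_sample(payload):
    """Unpack 4 bytes back into an (x, y) tuple."""
    return struct.unpack(SAMPLE_FORMAT, payload)


BYTES_PER_SAMPLE = struct.calcsize(SAMPLE_FORMAT)     # 4
SAMPLES_PER_SECOND = 100
bytes_per_second = BYTES_PER_SAMPLE * SAMPLES_PER_SECOND  # 400
```

That figure is payload only; per-message framing and transport headers come on top, which is one more argument for batching samples into fewer messages.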


You could, but I think for most web applications the authors wouldn’t think about binary encoding. So you’d end up with something like:

  {"x":50,"y":56}
Encoded as a UTF-8 string, which is 15 bytes, x100 is 1.5 kB/second/participant.

Ok, I guess that’s still not that much.
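A quick check of that JSON figure, assuming the compact form with no spaces (Python's `json.dumps` adds spaces by default, hence the `separators` argument) and the same 100 samples per second:

```python
import json

sample = {"x": 50, "y": 56}

# Compact separators reproduce the no-whitespace form shown in the comment.
payload = json.dumps(sample, separators=(",", ":")).encode("utf-8")

bytes_per_sample = len(payload)            # 15 for two-digit coordinates
bytes_per_second = bytes_per_sample * 100  # 1500 bytes, i.e. 1.5 kB per second
```

The size grows with the number of digits, so four-digit coordinates would cost 19 bytes per sample, which is still small.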


So you implement spline logic rather than a byte stream? Doesn't make sense to me.


That’ll get compressed a bit too.


Well spotted, and in reality it would more likely be a 4096x4096 plane, encoded as 12 bits + 12 bits = 3 bytes. And probably 30 FPS, giving 90 B/s. So the only problem is how often you want to send those packets, but the bandwidth of the cursor data becomes completely irrelevant.
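A sketch of that 12 + 12 bit packing, with made-up helper names. It assumes both coordinates are already quantized to the range 0 to 4095.

```python
def pack12(x, y):
    """Pack two 12-bit coordinates into 3 bytes (x in the high bits, y in the low bits)."""
    if not (0 <= x < 4096 and 0 <= y < 4096):
        raise ValueError("coordinates must fit in 12 bits")
    return ((x << 12) | y).to_bytes(3, "big")


def unpack12(payload):
    """Inverse of pack12: recover (x, y) from 3 bytes."""
    value = int.from_bytes(payload, "big")
    return value >> 12, value & 0xFFF


FRAMES_PER_SECOND = 30
bytes_per_second = 3 * FRAMES_PER_SECOND  # 90
```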


60 FPS = 60 F/s = 6 F/100 ms - so where did you get 10-30? That would mean 90-270 FPS, you don't need that much for a cursor.


I like this solution but would latency create issues? You'd only be sending 100ms worth of motion every ~120ms. Would you just drift 200ms behind with every passing second (20ms behind after each 100ms batch)? I think I may be missing something though.


With this method you intentionally start with more playback delay, and sync playback with the other user about 300ms behind (assuming a 100ms latency).

So the remote user packets up 100ms of mouse movement with timestamps, sends it with ~100ms latency. Your side now has a buffer of ~100ms to start playing the positions back.

This also removes all jitter in the playback from varying latency (as long as the jitter stays under 100ms).

All of the above numbers are made up for this example. You can adjust the playback delay as much as needed for smooth playback.
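Here is a rough sketch of the receiving side, assuming batches of (sender_timestamp_ms, x, y) tuples sorted by time. `PlaybackBuffer` and its methods are hypothetical names, and the clock offset is simply taken from the first batch, which is a simplification. With 100ms batches, a `delay_ms` of 200 leaves about 100ms of slack for jitter and puts playback roughly 300ms behind the sender at 100ms latency.

```python
from bisect import bisect_right


class PlaybackBuffer:
    """Replays timestamped cursor samples a fixed delay behind the sender."""

    def __init__(self, delay_ms=200.0):
        self.delay_ms = delay_ms
        self.samples = []      # (sender_t_ms, x, y), kept in time order
        self.offset_ms = None  # local clock minus sender clock, set by the first batch

    def push_batch(self, batch, local_now_ms):
        if not batch:
            return
        if self.offset_ms is None:
            # Align clocks on the newest sample of the first batch (includes latency).
            self.offset_ms = local_now_ms - batch[-1][0]
        self.samples.extend(batch)

    def position_at(self, local_now_ms):
        """Interpolated (x, y) to draw at local_now_ms, or None before any data."""
        if not self.samples:
            return None
        target = local_now_ms - self.offset_ms - self.delay_ms
        times = [s[0] for s in self.samples]  # rebuilt per call; fine for a sketch
        i = bisect_right(times, target)
        if i == 0:
            return self.samples[0][1:]        # not started yet: hold first position
        if i == len(self.samples):
            return self.samples[-1][1:]       # buffer ran dry: hold last position
        (t0, x0, y0), (t1, x1, y1) = self.samples[i - 1], self.samples[i]
        f = (target - t0) / (t1 - t0)
        return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))
```

A batch that shows up 30ms late is invisible to the viewer, because playback has not reached those timestamps yet. A real implementation would also drop samples that have already been played.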


Thanks, that makes sense!


You've added a crap ton of data though. The point is to show someone's movement, not every pixel they hovered.

Sampling X times per second is enough, plus sending every click position in between, as those are valuable information. The rest is noise.


Not quite the same thing, but the author (and others) might be interested in some modern techniques used in the game industry to filter inputs:

https://twitter.com/evil_arev/status/1128062338156900353

https://www.shadertoy.com/view/NlcXWM


The animations on the blog itself are incredible. Would love to read something in the future about how you guys put this together (i.e. what frameworks, components, tooling is used to generate these blogs).


That would be pretty meta! "how to create how to interactive articles" :)


I'm super interested in this. I have a few interactive articles I'd like to make, but I'm not sure what tech to use for it to make it relatively quickly and without taxing the page too much.

Was thinking maybe the Phaser game framework, but that might be too heavy, especially if there needs to be 7 or 8 of them on the same page.


We build our interactive visuals with React and embed those into the mdx file for the blog post.

I made a note for us to write a blog post about it. Stay tuned :)


Already learned something. Wasn't aware of mdx and it looks right up my alley, thanks for mentioning it. I look forward to your blog post!



I think it's a mistake to just drop most of the data and then try to rebuild the lost data afterwards.

Nothing requires you send just a set of points across. A better approach is to transform and compress the data BEFORE sending it across the wire. This way you get the benefit of using all the available data to create a more accurate simplification.

For example, on each update take your cursor point set and construct a Bézier curve that best fits the data.
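One way this could look, as a sketch only: a least-squares fit of a single quadratic Bézier to the batch, with the endpoints pinned to the first and last samples and a chord-length parameterization. Those are my simplifying assumptions, not something from the article; a cubic curve or several segments would fit tighter paths better.

```python
import math


def fit_quadratic_bezier(points):
    """Return (p0, p1, p2) of a quadratic Bezier approximating a list of (x, y) points."""
    if len(points) < 3:
        raise ValueError("need at least three points")
    p0, p2 = points[0], points[-1]

    # Chord-length parameterization: t grows with distance travelled along the path.
    dists = [0.0]
    for (xa, ya), (xb, yb) in zip(points, points[1:]):
        dists.append(dists[-1] + math.hypot(xb - xa, yb - ya))
    total = dists[-1]
    if total == 0:
        return p0, p0, p2  # cursor did not move
    ts = [d / total for d in dists]

    # Closed-form least squares for the one unknown control point p1.
    num_x = num_y = den = 0.0
    for (x, y), t in zip(points, ts):
        a = 2 * t * (1 - t)
        num_x += a * (x - (1 - t) ** 2 * p0[0] - t ** 2 * p2[0])
        num_y += a * (y - (1 - t) ** 2 * p0[1] - t ** 2 * p2[1])
        den += a * a
    if den == 0:
        return p0, ((p0[0] + p2[0]) / 2, (p0[1] + p2[1]) / 2), p2
    return p0, (num_x / den, num_y / den), p2


def bezier_point(p0, p1, p2, t):
    """Evaluate the quadratic Bezier at parameter t in [0, 1]."""
    u = 1 - t
    return (u * u * p0[0] + 2 * u * t * p1[0] + t * t * p2[0],
            u * u * p0[1] + 2 * u * t * p1[1] + t * t * p2[1])
```

The sender then ships three points per update however many raw samples it saw, and the receiver animates `bezier_point` from t = 0 to t = 1 over the batch interval.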


Nobody will ever see this response since the article was posted 3 days ago but:

Another advantage of doing it this way is that the sender can downsample on large/gross movements that would render faithfully as splines with fewer points, and send more samples on tight movements.

If you're not sending uniformly-spaced (in time) samples, though, you'll want some kind of timing information encoded as well.
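A sketch of sender-side adaptive downsampling that keeps timing information. It uses the Ramer-Douglas-Peucker line simplification algorithm on (t_ms, x, y) samples, which is my choice of technique here and not something the thread or article prescribes. Distance is measured in x/y only, and each kept sample carries its original timestamp.

```python
import math


def _distance_to_segment(p, a, b):
    """Distance from sample p to the segment a-b, using only the x/y parts."""
    (_, px, py), (_, ax, ay), (_, bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    u = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    u = max(0.0, min(1.0, u))
    return math.hypot(px - (ax + u * dx), py - (ay + u * dy))


def simplify(samples, tolerance):
    """Drop samples that lie within `tolerance` pixels of the simplified path."""
    if len(samples) < 3:
        return list(samples)
    first, last = samples[0], samples[-1]
    worst_i, worst_d = 0, -1.0
    for i in range(1, len(samples) - 1):
        d = _distance_to_segment(samples[i], first, last)
        if d > worst_d:
            worst_i, worst_d = i, d
    if worst_d <= tolerance:
        return [first, last]  # a gross, straight movement: two samples are enough
    left = simplify(samples[:worst_i + 1], tolerance)
    right = simplify(samples[worst_i:], tolerance)
    return left[:-1] + right  # the split sample appears in both halves
```

One caveat: this only looks at shape, so a speed change along a straight stretch is lost and the receiver will assume constant speed between kept samples.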


This is analogous to player position in any multiplayer game.

One additional parameter to keep in mind is how real-time does your simulation of remote players need to be? If you don't need real-time positioning there's a whole other dimension of shortcuts you can take by introducing what amounts to a 'streaming delay'.

In a lot of these apps the cursors can't interact with each other so you have no need for real-time positioning and its accompanying smoothing techniques. Cheat cheat cheat! That's how games get their performance, way more often than being smart they're clever opportunistic cheaters!


> This is analogous to player position in any multiplayer game.

Not sure. Player movement in multiplayer games is easier to predict. Once someone starts moving forward, you can predict that they'll move forward for a little while, start turning, continue forward and so on. Add in physics (like the motions/movements of a car, or the running of a human) and there will be constraints the player can't break (when you stop moving forward, you'll move forward slower and slower until you stop, maybe over 100ms or so).

But mouse movement is highly erratic. It'll be just short of impossible to add any sort of prediction, since it'll be incorrect most of the time instead of mostly correct.


The real fun starts when online players start shooting at each other. How to decide if a hit was fair AND make it look believable for both parties to minimize outrage?


> AND make it look believable for both parties to minimize outrage?

Call of Duty MW2 used to replay what your opponent saw, after your death. That always made an inexplicable death much easier to swallow.

Of course, they probably had to add that feature since they did away with dedicated servers, but it was still nice.


It's very common for games to deal with erratic input from keyboard controls. Consider player position in a moba or fps where two humans are responding to each others' presence (footsie.)

The penalty for a wrong prediction is rubber banding, so if you can get away with streaming their past at high latency, I would avoid predicting at all.

That's the model often used by async PvE games (think social games, invest/express, etc) and I think that approach would map best onto SaaS presence features.


Have you ever played a multiplayer game? Have you seen how people move in FPS games? Highly erratic, especially during gun fights. Instantaneous starting and stopping.

Your example is only relevant if the player just loves to hold down the "move forward" key for long periods of time.


Here's a series of articles about client-server game architecture which I found tremendously interesting, and it's a bit more involved than what's explained in this article on cursor positions though I agree the basics are similar. It starts out explaining a naive approach, identifying the issues with that, and moving to more advanced solutions which involve prediction. The article also has a good explanation that goes a bit more in-depth to how the latency causes issues with the gameplay.

https://www.gabrielgambetta.com/client-server-game-architect...


The spline stuff is super useful to me for another non-cursor idea I've had simmering.

But I'm torn by this. The spline approach seems to look the most accurate. But when all three approaches are shown at the same time at the end, I think the spring animation might look more visually pleasing. But then, if the spring approach is only degrees better than CSS transitions, is it really worth all the extra code?


You're comparing the end-path, but the CSS with easing just isn't smooth, because the easing abruptly stops when the target changes. Without easing you still have abrupt changes in cursor speed. In the end, as calculated above, you only need 90 bytes per second for 30 FPS cursor movement, so why bother... Maybe in your case there are more objects and then it makes sense to approximate, but then if you can deal with some CPU usage for that, you can build a detailed Bézier curve, and then unsubdivide it where it wouldn't change the overall shape much (probably detectable with segment length). This way the approximation is done on the sender that has the entire data, rather than on the receiver.


Can we just take a moment to appreciate how well done this article is.

It's like the AAA studio equivalent of a tech article.


I was hoping that this article would go into deeper techniques than just interpolation and splines.

It would be cool to see an example of how a Kalman filter approach would compare in terms of precision and latency. My expectation is that it would be the best of both worlds.


Or even a physics simulation—if the mouse is moving with a given velocity, that velocity won't change super fast. So even if your data is a bit behind, you can use physics to estimate where they probably are now. And if you get new data, and your estimation is too far off, then move the mouse to the right spot. If it's mostly correct, just base your future simulation off of the new information, so that it moves smoothly towards the correct value.
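A bare-bones sketch of that idea (games usually call it dead reckoning), assuming constant velocity between the two most recent samples. The function names, the 50px snap threshold, and the 0.25 blend factor are arbitrary values picked for illustration.

```python
import math


def extrapolate(prev, last, now_ms):
    """Predict (x, y) at now_ms from the two latest (t_ms, x, y) samples."""
    (t0, x0, y0), (t1, x1, y1) = prev, last
    dt = t1 - t0
    if dt <= 0:
        return (x1, y1)                       # no usable velocity estimate
    vx, vy = (x1 - x0) / dt, (y1 - y0) / dt   # pixels per millisecond
    ahead = now_ms - t1
    return (x1 + vx * ahead, y1 + vy * ahead)


def correct(displayed, predicted, snap_distance=50.0, blend=0.25):
    """Move the displayed cursor toward the prediction; jump if it is too far off."""
    dx, dy = predicted[0] - displayed[0], predicted[1] - displayed[1]
    if math.hypot(dx, dy) > snap_distance:
        return predicted                      # estimate was badly wrong: snap
    return (displayed[0] + blend * dx, displayed[1] + blend * dy)
```

Calling `correct(current, extrapolate(prev, last, now))` once per frame gives the "mostly right, gently fixed" behaviour described above. How well it works on real mouse input depends on how erratic the movement is.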


Another potential approach could be using the Web Animations API with an additive/relative animation approach via the `composite: 'add'` option. I say potential as it really only fully works in Chrome atm — Firefox has it but has weird rendering bugs and Safari says it has experimental support but I've never managed to get it working.

I like the potential of this approach as it lets you get smoother results than just CSS transitions yet doesn't require you to use a RAF loop-based animation library.


The interactive animations in this post are mind-blowingly good!!!


I opened the article just to check and you're right. This is A+, top-tier stuff. Sets a standard for many other technical articles.


Loved the deep dive into the difference between timing functions and spring/spline animations (ctrl-f comparison to jump to the animation).


It depends on the app, but in some cases you should also include a content identifier alongside the usual X,Y coordinates, as from the user's perspective it is more important to see the cursor hovering on top of something instead of being "very close". The meaning is different in such a case.
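As a sketch of what such a payload might look like: every name below is hypothetical and not taken from any library. The sender includes the id of the hovered element plus the cursor offset as a fraction of that element's size, and each receiver resolves it against its own layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CursorPresence:
    x: float                          # fallback document coordinates
    y: float
    target_id: Optional[str] = None   # element under the cursor, if any
    rel_x: float = 0.0                # offset inside the element, 0.0 to 1.0
    rel_y: float = 0.0


def resolve(presence, layout):
    """Map a presence to local coordinates.

    `layout` maps element ids to (left, top, width, height) on the receiving client.
    """
    box = layout.get(presence.target_id) if presence.target_id else None
    if box is None:
        return (presence.x, presence.y)   # unknown element: use raw coordinates
    left, top, width, height = box
    return (left + presence.rel_x * width, top + presence.rel_y * height)
```

With this, the remote cursor stays on top of the same button even when the two clients lay the page out differently.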


Exactly! With Liveblocks, you can also store anything presence related. You can for instance store a selectedId and use that on the other side to highlight the selected element.


Would be interested to hear a signal processing perspective on this problem. I feel like splines might be susceptible to creating false information (e.g. ringing/overshoot), but I don't know enough about it.


I always felt like using a CSS ease transition between points would be enough. Love the spline approach with perfect-cursors, cursors really feel like they're being moved by real people.


I love this article! Great analysis and solutions. It's nice when user interface designers care enough to think through, implement, try out, and refine such important details.

I developed a multi player version of SimCity for X11 that I released in 1993 and demonstrated at the InterCHI '93 Interactive Experience, which showed you other players' cursors moving around and editing the map, but of course it required a fast network to run on and updated all the clients synchronously, due to the limitations of X-Windows, so there were no interpolation issues. (X-Windows clients aren't even capable of performing local computation and feedback the way NeWS clients could and web browser clients now can, so it was a moot point.)

https://www.donhopkins.com/home/catalog/simcity/simcity-anno...

https://www.donhopkins.com/home/catalog/simcity/simcitynet.h...

The multi player demo showing different kinds of voting dialogs, multiple cursors, the tool palette and pie menus, and the voting "yes" shortcut of building the same thing in the same place, starts at 5m45s:

https://www.youtube.com/watch?v=_fVl4dGwUrA&t=5m45s

One interesting thing about the SimCity cursor was that it was color and shape coded to show which tool was selected. The tool palette (and also the pie menus which had the same icons and layout as the tool palette) served as a legend for the cursor by showing the same color coded outline around the icons as the cursor used. So you could tell which tool other users had selected. You could hide the tool palette to make the map bigger, and use the pie menus instead, which were much more efficient.

Multi Player SimCityNet for X11 on Linux: Demo of the latest optimized Linux version of Multi Player SimCity for X11. Ported to Unix, optimized for Linux and demonstrated by Don Hopkins:

https://www.youtube.com/watch?v=_fVl4dGwUrA

Micropolis Online (SimCity) Web Demo: A demo of the open source Micropolis Online game (based on the original SimCity Classic source code from Maxis), running on a web server, written in C++ and Python, and displaying in a web browser, written in OpenLaszlo and JavaScript, running in the Flash player. Developed by Don Hopkins:

https://www.youtube.com/watch?v=8snnqQSI0GE

Source Code:

https://github.com/SimHacker/micropolis

HAR 2009 talk: Constructionist Educational Open Source SimCity:

https://donhopkins.medium.com/har-2009-lightning-talk-transc...


These are pointers and not cursors, unless I misunderstand. Cursor made me think of sharing tmux or IDE text sessions.


Not only is this needless pedantry, it's also incorrect, making it a very typical HN comment.

It is not uncommon to refer to pointers as "mouse cursors" or shorten it to just "cursor". In fact, Wikipedia considers pointers to be a subset of cursors.

https://en.wikipedia.org/wiki/Cursor_(user_interface)#Pointe...

"In computer user interfaces, a cursor is an indicator used to show the current position for user interaction on a computer monitor or other display device that will respond to input from a text input or pointing device."


Is it so needless? Perhaps I'm misremembering but the distinction was at least useful in the past as GUIs evolved out of TUIs.

If they had said pointers there would be no confusion.


Since it's wrong, I'd think it's basically needless by definition.


Exactly. In CSS the property is literally called

cursor:pointer/crosshair


Awesome deep dive!


Good article!



