Multiplayer Ranking Explained



fredizzimo
05-17-2015, 10:08 AM
Note, I copied this post from the news section here for more visibility. I don't think it makes sense for it to be deeply hidden in a thread called "Update 8 has arrived"

Hi All,

First let me introduce myself. I'm Fred Sundvik, a programming architect at RedLynx, with a big focus on online related things. I have evaluated many different ranking systems, and finally suggested that we should use this one. I also wrote the technical design for the ranking, so I'm the first to blame if you think that the ranking system doesn't work as it should, or if you think it should be replaced by something else.

This post is rather long and technical, so I completely understand if you don't want to read it all. If you are not technically interested, you can scroll down to the FAQ section, where I will try to explain common problems and misconceptions in a hopefully easy to understand way.

The ranking system that Trials Fusion is using is called Glicko-2. It's a system originally developed for chess, and the output is quite similar to the standard Elo rating system (http://en.wikipedia.org/wiki/Elo_rating_system).

There are a few differences though. From a practical point of view, the Glicko-2 system converges to the actual rating much faster than Elo does. It can also handle a group of matches as if they all happened at once. If we applied Elo to a multiplayer game, the order in which the results were applied would have a big effect on the output, while with Glicko-2 we can treat the whole match as if everything happened at the same time. Glicko-2 and normal Glicko are quite similar, but Glicko-2 can adapt the rating a bit faster if a player suddenly gets better or worse, for example by practising more or by not playing for a while.

Microsoft's TrueSkill is similar to Glicko, but it has special optimizations for team games. It also suffers from a problem that is often referred to as "level locking": at some point it's almost impossible to get your rating to change, even if your actual skill has improved. Do a Google search for more information about that.

I'm not going to go very deep into all the mathematics behind Glicko-2, since it's all in the article linked above in the first post and repeated here (http://www.glicko.net/glicko/glicko2.pdf). Don't worry if you don't understand it, I haven't bothered to understand all the small details either. It's complicated not just because of the formulas, but also because there's an iterative step (step 5), which makes things very hard to calculate by hand. If that article is not enough, there's an even more technical one here (http://www.glicko.net/research/dpcmsv.pdf), that actually shows how all the formulas are derived.

Fortunately, Barry Cox has made an Excel sheet (http://www.bjcox.com/?page_id=20&did=11) that does the hard work for you; you just need to enter the starting numbers and results. We are using a system constant (tau) with a value of 0.3, so you need to change that in the parameters sheet first. You also need to disable some macro security as described on the downloads page. Also note that the starting skill is 1500, and the starting RD is 350. You will probably not be able to get the exact same scores as the game though, since the volatility parameter isn't displayed within the game.

Now on to the actual application of Glicko-2 in the game, and what input values it actually uses. As you can see from the Glicko description, the algorithm takes a list of two player games as input. This has to be mapped to a game that consists of matches with up to 8 players, which is furthermore split into multiple heats.

In order to do that, we imagine that you are playing against every other player in a separate game. So if you are playing an 8 player match, the game will actually see you playing 7 different matches. We furthermore divide this into heats, so each heat is considered separately. But to give a bit more weight to actually winning the match, the final score is also input as another set of up to 7 matches.

So in total for a single match, up to 7 * 5 = 35 different results (up to 7 opponents, once for each heat and once more for the final score) will be input into the Glicko-2 algorithm.

For the actual results, we look at the points for a single heat (or the end result), so both the number of faults and the time count. A better score than the other player is considered a win (1.0), a worse score a loss (0.0), and the same score is of course a draw (0.5). It doesn't matter how much you win or lose by.
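To make the mapping a bit more concrete, here is a rough Python sketch of how one match could be expanded into two-player results. This is only an illustration and not the game's actual code: the function names and the data layout are made up for the example, and it assumes that a higher points value means a better score and that a match has four heats plus the final result.

```python
from typing import Dict, List, Tuple


def outcome(my_points: float, other_points: float) -> float:
    """Win, draw or loss only; the size of the margin is ignored."""
    if my_points > other_points:
        return 1.0
    if my_points < other_points:
        return 0.0
    return 0.5


def pairwise_results(player: str,
                     heats: List[Dict[str, float]],
                     final: Dict[str, float]) -> List[Tuple[str, float]]:
    """Expand one match into (opponent, score) pairs for one player.

    Each heat, and then the final score table, is treated as a separate
    two-player game against every other player (hypothetical layout:
    each table maps a player name to that player's points).
    """
    results = []
    for table in heats + [final]:
        for other, points in table.items():
            if other != player:
                results.append((other, outcome(table[player], points)))
    return results
```

With 8 players and four heats plus the final table, this produces 7 * 5 = 35 entries for each player, which matches the number given above.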

Technically Glicko-2 supports different grades of win or loss, however if we wanted to take that into account, we would have to find some reliable metrics for it. The points are not good, since then it would determine that you just barely won over a much weaker player if the heat consisted of only weak players. One of the players would have a high score, which likely would turn into a negative rating gain for you.

Also remember that it would work both ways. Right now, every time you win, you get the full gain. If we took grades of wins into account, you would likely not gain as much.

A disconnection is always treated the same way as if you had lost the game. It doesn't matter if you quit the game by yourself or if there's a genuine disconnect. If we start to treat different disconnects differently, then we are at the same time opening up for cheating. You should note however that there's currently a slight bug: if you get disconnected during the first round, you will get a much bigger penalty than if you got disconnected during the later rounds. This will be fixed in a future patch.

After the match, we end the current Glicko-2 rating period. Note that we do not increase the RD when a player has been idle during the rating period. The reason for this is that we don't have any global rating periods, like days or weeks. We decided to do it this way, because we considered it important that you can see the rating changes immediately after the game finishes, as opposed to once per day or once per week.

Now that you know the technical details behind how the system works, let's move on to the perhaps more interesting part: what do the actual numbers tell, and what exactly does the system measure?

The English Chess Federation has written an excellent article, The Glicko System for beginners (http://www.englishchess.org.uk/wp-content/uploads/2012/04/The_Glicko_system_for_beginners1.pdf). I'm not sure if I would classify that article as a beginner's article though, since it contains some pretty advanced information. The article is of course written with chess in mind, but many things apply to the game as well. One obvious thing that doesn't concern the game is the rating deviation vs time, because it's disabled as explained earlier. Here I will explain only things related to the game and leave out a lot of other details.

Let's start with the most basic value, the rating. The rating represents your currently estimated skill. It's important to realise that the value is just an estimation and not the exact skill. I will talk more about the accuracy of the estimation later on, when explaining the RD value.

The higher the rating is, the more skilled the player is. Exactly how much better the player is can be determined by using the Elo win expectancy formula (there's a more complicated formula for Glicko that takes RD into account, but the end result is so similar that it doesn't really matter).

E = 1 / (1 + 10^((B - A) / 400))
Where
E = The chance of player A winning the game
A = The rating of player A
B = The rating of player B

One interesting property of this formula is that only the difference between the players matters. So a player with a rating of 2000 has exactly the same probability of winning against a 1500 player, as a 1500 player has against a 1000 player.
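If you want to try the formula yourself, here is a small Python sketch of it. The function name is just something I picked for the example; the formula itself is the one given above.

```python
def win_expectancy(a: float, b: float) -> float:
    """Chance of player A (rating a) winning against player B (rating b)."""
    return 1.0 / (1.0 + 10.0 ** ((b - a) / 400.0))


# Only the difference matters: both of these pairs are 500 points apart.
print(win_expectancy(2000, 1500) == win_expectancy(1500, 1000))

# Win expectancy for a few rating differences.
for diff in (0, 100, 200, 400, 800):
    print(diff, round(win_expectancy(1500 + diff, 1500), 3))
```

Swapping the two arguments gives the other player's chance, and the two chances always add up to 1.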

Let's look at this formula in more detail by plotting the rating differences in a graph.

http://i60.tinypic.com/qxl1n4.jpg

As you can see a rating difference of 400 is roughly equal to a 90% chance of winning the game, while a difference of 800 gives you a 99% chance of winning. Or in other words, if you have a rating of 2400, then the system expects you to win 99 times out of 100 against a player with a 1600 rating. If you aren't able to do that then you are rated too high. That's also why you immediately lose points, even if you just lose one heat against a low ranked player.

The system will look at only the heats within the single match. So it will see that you lost 1 out of 5 games, and change your rating according to that. If you then win the next 95 heats as expected, you will regain your score.

This means that your current rating will most likely oscillate around your real skill level, sometimes it will be too high and other times it will be too low. So don't be too concerned when your score drops down because you lost, just as you are expected to do once in a while.

When Mark Glickman designed the Glicko algorithm he was very much aware of this oscillation that will occur. In fact it's built in as a very integral part of the system, as the Rating Deviation, or RD.

The RD value basically tells you how unreliable the displayed rating value is. It starts high (350), and will drop when you play more games. At some point it will stop dropping, and might even increase. This depends on how consistently you play: if you play very consistently, the RD value will go very low, but if you are inconsistent the value will remain higher.

It should be noted that inconsistent here is relative to your current rating. So if you are currently playing at a higher level than your rating is, then you should see that the RD value will slowly start growing. The actual mathematics behind all that is not actually that simple, it all depends on the hidden volatility value that is very hard to calculate, but this description should be good enough for a reasonable understanding of the RD calculation.

This basically means that you can roughly tell if a player is playing consistently at his rating or not, just by looking at the RD value. A high RD value probably means that the player's rating actually could be something completely different. A very high RD value of over 100 most likely means that the player is new, and you should not trust the rating at all.

Another way to look at the rating deviation is to treat it as a standard deviation (http://en.wikipedia.org/wiki/Standard_deviation) and use the 68-95-99.7 rule (http://en.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule). So for example if your rating is 2000 and you have an RD value of 50, the system is 68% confident that your actual rating is between 1950 and 2050, 95% confident that it's between 1900 and 2100, and 99.7% confident that it lies between 1850 and 2150.
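The same arithmetic as a small Python sketch, in case you want to plug in your own numbers. The function name and the returned layout are just made up for this example.

```python
from typing import Dict, Tuple


def rating_intervals(rating: float, rd: float) -> Dict[str, Tuple[float, float]]:
    """Treat RD as a standard deviation and apply the 68-95-99.7 rule."""
    levels = (("68%", 1), ("95%", 2), ("99.7%", 3))
    return {label: (rating - k * rd, rating + k * rd) for label, k in levels}


for label, (low, high) in rating_intervals(2000, 50).items():
    print(label, low, high)
```

For a brand new player with a rating of 1500 and an RD of 350, the 95% range would span from 800 to 2200, which shows why such a rating tells you very little.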

This means that even if you lose to a low ranked player as described a few paragraphs above, the combination of the rating and RD value is still most likely correct. Another consequence of this is that you shouldn't blindly look at the rating value and say that you are better than the other player.

The RD value has another, probably even more interesting effect. Your RD value controls how much rating you gain or lose. If you have a low RD value, your ratings will be considered stable, so you will not gain or lose that many points. Play around with the excel sheet that I linked to see the effect of different RD values.
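If you would rather see this effect in code than in a spreadsheet, below is a simplified Python sketch of a Glicko-2 update for one rating period, following the steps in Glickman's article linked earlier. Please note the assumptions: it skips the iterative volatility step (step 5) and simply keeps the volatility fixed at 0.06, which is the value used in the article's worked example and not necessarily what the game uses, and the function names are made up. So it will not reproduce the game's numbers exactly.

```python
import math
from typing import List, Tuple

SCALE = 173.7178  # conversion factor between rating points and the Glicko-2 scale


def g(phi: float) -> float:
    """Weight that reduces the impact of opponents with an uncertain rating."""
    return 1.0 / math.sqrt(1.0 + 3.0 * phi * phi / math.pi ** 2)


def update(rating: float, rd: float,
           games: List[Tuple[float, float, float]],
           sigma: float = 0.06) -> Tuple[float, float]:
    """Return (new rating, new RD) after one rating period.

    games holds (opponent rating, opponent RD, score) with score 1.0, 0.5 or 0.0.
    sigma is the volatility, held constant here (assumption, step 5 is skipped).
    """
    mu, phi = (rating - 1500.0) / SCALE, rd / SCALE
    inv_v = 0.0   # 1 / v, the inverse of the estimated variance
    total = 0.0   # sum of g * (score - expected score)
    for opp_rating, opp_rd, score in games:
        mu_j, phi_j = (opp_rating - 1500.0) / SCALE, opp_rd / SCALE
        gj = g(phi_j)
        expected = 1.0 / (1.0 + math.exp(-gj * (mu - mu_j)))
        inv_v += gj * gj * expected * (1.0 - expected)
        total += gj * (score - expected)
    phi_star_sq = phi * phi + sigma * sigma
    new_phi = 1.0 / math.sqrt(1.0 / phi_star_sq + inv_v)
    new_mu = mu + new_phi ** 2 * total
    return SCALE * new_mu + 1500.0, SCALE * new_phi


# Same loss against a 1600 rated player, once with a high RD and once with a low RD.
print(update(2000, 350, [(1600, 50, 0.0)]))
print(update(2000, 50, [(1600, 50, 0.0)]))
```

The player with the high RD drops a lot more from the same single loss than the player with the low RD, which is the effect described above.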

So if you stopped playing because you lost a lot of points due to losing against a lower ranked player, it could have been because your rating wasn't stable yet. Once it's more stable, you won't lose that many points any more.

A similar thing could happen for new players if they manage to win the first few games. Because the RD value is high they could gain very high ratings. However, the rating should quickly drop to a normal level when they start losing. We are considering requiring a minimum number of games before you can get ranked on the leaderboard, but until then I suggest that you ignore everyone with high RD values, like I already suggested earlier.

The matchmaking takes the rating into account. It tries to avoid putting skilled players together with unskilled ones. However, it tries to do so while still maintaining reasonable matchmaking times. This is totally dynamic, and the number of different groups it splits into depends on the number of players playing multiplayer at that time.

In practice it doesn't work so well, because most of the time there are not enough players to both have the players split into skill buckets, and maintain the current target matchmaking time of 40 seconds.

Finally, in order to ensure the integrity of the system, care is taken to have only one result uploaded for every match started. If a game is split into two parts because of disconnects, only the bigger part with the most players connected will continue. The rest of the players will get an error, and the game will end for them. They will have their rating updated when the original game ends, probably losing points because they got disconnected.

FAQ

Why did I lose points even if I won the game?
The game calculates ratings for every heat separately, so it's possible to get a negative rating by losing just one of the heats. If you lose or even draw against a player that is much lower rated than yourself, it could be enough to make your rating change turn negative.

Please note that the score is calculated as the combination of faults and the time. So if you are the first to finish, it doesn't necessarily mean that you won the heat.

We have tried to find bugs in the code regarding this, but so far we haven't found any. That doesn't mean that the code is bug free, so if you encounter a situation that you believe is wrong, we would be very pleased if you could provide a video of the whole match as proof. With the help of the server logs that we have, we should then be able to figure out what's wrong.

I also encourage you to play around with the excel sheet that I linked above in order to get a better feel for the ratings.

What is the strange RD value?
In short, it just tells how reliable the rating for that player is. A lower RD value means that the player has played more, and at a consistent level. High values, say above 100, are quite unreliable, and the ratings for those players should really not be trusted.

If you are interested in the exact technical details, they are all described above.

Does the margin that I win a heat matter?
No, we only consider wins, draws and losses. So it doesn't matter if you win by 1 minute or 0.1 second. See the details above if you are interested.

I believe someone is cheating, he has a very high rating
First check the RD value. Is it high? If yes, then he has probably played just one or a few games. In that case you should ignore the rating. In a future patch we will introduce a minimum number of games for being ranked, or alternatively a maximum RD value.

It's also possible that the player is genuinely cheating or that there's a glitch somewhere, in that case please contact us and we will investigate.

Why do I lose many more points if I get disconnected during or before the first heat than if I get disconnected during the last one?

This is a bug. If you get disconnected during the first heat, you will currently also lose all the other heats. The same thing applies for the second heat: you will lose that one, and also the third and the fourth. This is going to be fixed so that you only lose the heat that you disconnected in.

Why do I lose points if there's some kind of network error, like "Host migration failed", during searching for games, or before the game starts?

First of all the host migration failed error should really never happen. It does happen from time to time, due to very complicated reasons. We probably won't be able to fix this, as it would most likely require us to rewrite a major part of the network code that has been in use since Trials Evolution.

The good thing is that these kinds of errors are much more likely to happen before you actually get into the game itself. So in a future patch, we will ensure that no points are lost unless you managed to get that far.

Note that you only lose points if the majority of the other players manage to continue despite the fact that you get the error message.

This bug is also much worse because you lose points for all of the heats, as opposed to just the first one like you are supposed to.

Why do I have to play with so weak/strong players? Shouldn't the matchmaking make sure that we get even matches?

Yes, the matchmaking should make sure of that. However, the simultaneous player count is quite low for Trials Fusion, so in order to find a game to play within a reasonable time, this logic usually doesn't get applied. This would fix itself if more players were playing at the same time.

Why are there no unranked games?
This was one of the hardest decisions that we had to make. Splitting the community into two parts could make it hard to find games, and the multiplayer would eventually die out. We also wanted to ensure that grouping by skill works a little bit better.

We might be able to combine both the players that play ranked and the ones that play unranked games into the same session in a future patch. It would work by having you only gain or lose rank by winning or losing against other players that are playing ranked games.

I hope this helps, and if you have further questions, please don't hesitate to ask.

Best Regards,
Fred Sundvik