In the last few weeks I have looked at a basic method of oddsline creation and an approach that uses fuzzy logic. Today I’m going to look at an approach that is used by a lot of betting teams around the world. It is a statistical method called multinomial logistic regression. This is a model that estimates the probability of each possible outcome (for us, each horse winning the race) based on a set of independent variables.
Possibly the hardest part of building an oddsline is determining what importance (or weight) you should give the different factors you are using in it. This goes hand-in-hand with the choice of factors themselves. In a previous article someone commented that you would need a model which allows you to adjust the importance because, for example, in some races, such as sprints, speed may be more important than in others. This was a good point and very relevant to the article, however it is not necessarily an issue when building your models. Rather than having a factor called speed, we can have a factor that measures speed under today's conditions. This factor can already take into account the importance of speed in today's race by being adjusted up or down based on how much speed matters in the current race. This means that by the time we get to making our odds we have a single factor for speed which takes into account not just how fast the horse is likely to be but also how important that is under the current race conditions.
In fact, if you are making an oddsline it is best to try to condense your factors down to just a few by combining information. This means that you may have one factor for Form which takes into account recent form, collateral form, conditional form and so on. In other words, you combine them first, before making your oddsline, so that you are only making your odds from 6 or 7 pieces of information.
Doing this brings us to another issue with this type of model. Related factors!
When using a multinomial logit regression model we need the factors in it to be as independent of each other as possible. Unfortunately in horse racing this is very difficult. After all, if a horse was the fastest in the race then there is a good chance that this will show up in the form rating as well as the speed rating. This means that those two ratings have a cross-over of information. In other words we are counting a portion of the information twice! This overlap is known as correlation. It is impossible to get away from it completely, but if we are going to use this approach then we need to reduce it as much as possible. If you are planning on developing this type of oddsline model, you need to be aware of correlation when combining your ratings. Your focus should be to make them as un-correlated as possible, and you can check this using a correlation ratio test.
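As a rough illustration of checking for overlap between two ratings, here is a minimal sketch in Python. It uses the plain Pearson correlation coefficient rather than a correlation ratio test, simply because it is the easiest check to write by hand. The ratings and the names speed, form and pace are made up for the example and do not come from any real race.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length rating lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical ratings for five horses (invented numbers, illustration only).
speed = [82.0, 75.0, 90.0, 68.0, 77.0]
form = [80.0, 70.0, 88.0, 65.0, 79.0]
pace = [3.0, 5.0, 2.0, 4.0, 1.0]

speed_form = pearson(speed, form)  # close to +1: the two ratings overlap heavily
speed_pace = pearson(speed, pace)  # weaker relationship than speed vs form
```

A value near +1 or -1 suggests the two ratings are telling you much the same thing, while a value near 0 suggests they carry different information. In practice you would run this over a large sample of past runners, not five horses.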
For the rest of this article I am going to assume that we have a set of factors that show the picture of a horse in the race as a whole and are as un-correlated as possible. The challenge now is to determine what level of importance (weight) to give them. This is very similar to the approach we used in Creating Odds Lines – Episode 1 but this time we are going to use multinomial regression to calculate the weights.
To do this you are going to need a piece of software. The options range from command-line programming languages, through Excel plugins such as Unistat and Solver, all the way to fully fledged packages such as SAS. When you run your regression it will go through all the past data and calculate what the weights should be for each factor in your model. It essentially takes the issue of importance and weighting out of your hands.
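To give a feel for what the software is doing under the hood, here is a minimal sketch of fitting the weights yourself in plain Python. It assumes a conditional (within-race) form of the logit model fitted by simple gradient ascent, which is only one way of setting this up and not necessarily what any particular package does. The five races, the two factors (speed and form), the learning rate and the number of steps are all invented for illustration.

```python
from math import exp, log

def win_probs(weights, runners):
    """Win probability for each runner in one race (softmax of weighted factors)."""
    scores = [sum(w * x for w, x in zip(weights, factors)) for factors in runners]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def log_likelihood(weights, races):
    """How well the weights explain the past winners (higher is better)."""
    return sum(log(win_probs(weights, runners)[winner]) for runners, winner in races)

def fit_weights(races, n_factors, learning_rate=0.1, steps=2000):
    """Fit one weight per factor by gradient ascent on the log-likelihood."""
    weights = [0.0] * n_factors
    for _ in range(steps):
        grad = [0.0] * n_factors
        for runners, winner in races:
            probs = win_probs(weights, runners)
            for k in range(n_factors):
                expected = sum(p * factors[k] for p, factors in zip(probs, runners))
                # Winner's factor value minus what the model expected to see.
                grad[k] += runners[winner][k] - expected
        weights = [w + learning_rate * g / len(races) for w, g in zip(weights, grad)]
    return weights

# Hypothetical past races: ([speed, form] for each runner, index of the winner).
races = [
    ([[1.0, 0.2], [0.3, 0.5], [-0.5, -0.1]], 0),
    ([[0.8, -0.3], [0.1, 0.4], [-0.9, 0.0]], 0),
    ([[0.5, 0.1], [0.9, -0.2], [-0.2, 0.3]], 1),
    ([[0.2, 0.6], [0.7, 0.1], [-0.4, -0.5]], 0),
    ([[-0.3, 0.2], [0.6, 0.0], [0.4, 0.3]], 2),
]

weights = fit_weights(races, n_factors=2)
```

With real data you would use thousands of races and a proper statistics package, which will also report how reliable each weight is. The sketch is only meant to show that the weights come from the past results rather than from your own judgement.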
Once you have the weights you then simply raise each factor to its weight and then combine them to create a final rating for each horse, from which you can make a probability and odds.
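A minimal sketch of that last step in Python might look like this. It assumes that "combine" means multiplying the weighted factors together, that every factor is a positive number, and that the probability is each horse's rating as a share of the total for the race. The horse names, factor values and weights are made up so that the arithmetic is easy to follow.

```python
def rating(factors, weights):
    """Raise each factor to its weight and multiply the results together."""
    result = 1.0
    for factor, weight in zip(factors, weights):
        result *= factor ** weight  # assumes every factor is positive
    return result

def odds_line(field, weights):
    """Return {horse: (probability, decimal odds)} for one race."""
    ratings = {horse: rating(factors, weights) for horse, factors in field.items()}
    total = sum(ratings.values())
    line = {}
    for horse, value in ratings.items():
        prob = value / total            # share of the total rating in the race
        line[horse] = (prob, 1.0 / prob)  # fair decimal odds are 1 / probability
    return line

# Hypothetical weights and factors (invented for illustration only).
weights = [2.0, 1.0]
field = {
    "Horse A": [2.0, 1.0],  # rating 2^2 * 1^1 = 4
    "Horse B": [1.0, 2.0],  # rating 1^2 * 2^1 = 2
    "Horse C": [1.0, 2.0],  # rating 1^2 * 2^1 = 2
}

line = odds_line(field, weights)
```

Here Horse A has half of the total rating (4 out of 8), so it gets a probability of 0.5 and fair decimal odds of 2.0, while the other two get 0.25 each and odds of 4.0.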
However, even having done all this, it is still unlikely that your oddsline is going to be accurate enough to use. What is now needed is the added influence of something that we know is accurate, the public odds. To get your final odds you use a process that takes your odds and the public odds and combines them because the likelihood is that the accurate odds are going to be somewhere between the two.
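One simple way to picture this combining step is a weighted average of your probabilities and the public's, sketched below in Python. This is an assumption on my part for illustration: the 50/50 split, the function names and the prices are all made up, and serious teams would normally estimate how much to trust each side from past data instead of picking a number.

```python
def implied_probs(decimal_odds):
    """Turn public decimal odds into probabilities that sum to 1."""
    raw = {horse: 1.0 / price for horse, price in decimal_odds.items()}
    total = sum(raw.values())  # differs from 1 because of the bookmaker/tote margin
    return {horse: p / total for horse, p in raw.items()}

def blend(model_probs, public_probs, model_share=0.5):
    """Weighted average of the two probabilities (model_share is an assumed setting)."""
    return {
        horse: model_share * model_probs[horse] + (1.0 - model_share) * public_probs[horse]
        for horse in model_probs
    }

# Hypothetical model probabilities and public prices (invented for illustration).
model_probs = {"Horse A": 0.5, "Horse B": 0.3, "Horse C": 0.2}
public_odds = {"Horse A": 2.5, "Horse B": 3.0, "Horse C": 5.0}

public_probs = implied_probs(public_odds)
final_probs = blend(model_probs, public_probs)
final_odds = {horse: 1.0 / p for horse, p in final_probs.items()}
```

Because it is an average, each final probability always lands between your figure and the public's figure, and the final probabilities still add up to 1.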
If you are thinking that this is a lot of work, then you are right. It is, and there are a lot of obstacles to overcome, which is why this process is usually only used by betting syndicates or multi-player teams who can spread the workload of creating the model.
What happens if you want to create an oddsline from a single rating but do it in an effective way?
Well that is something that I will look at in the next article in the series!