Here's something I'm monitoring. I'll record the selections and measure results to BSP in the win and place markets.
12.2 | PLUM | MIDNIGHT CALLISTO(GB) |
12.4 | HERE | LORD APPARELLI(GB) |
12.5 | PLUM | LINELEE KING(FR) |
13 | CURR | BASEMAN(GB) |
13.2 | PLUM | HIDDEN CARGO(IRE) |
13.3 | CURR | ROUGH DIAMOND(IRE) |
13.4 | HERE | BENSON(GB) |
13.5 | KEMP | MINI MILK(GB) |
13.55 | PLUM | LANGER DAN(IRE) |
14.05 | CURR | BORN INVINCIBLE(IRE) |
14.05 | CURR | MISS FLORENTINE(IRE) |
14.1 | HERE | ACROSS THE LINE(IRE) |
14.2 | KEMP | PERFECT INCH(GB) |
14.3 | PLUM | THE BROTHERS(IRE) |
14.45 | HERE | KAUTO THE KING(FR) |
14.55 | KEMP | LIGHTS ON(GB) |
15.05 | PLUM | HAREFIELD(IRE) |
15.15 | CURR | CALL ME DOLLY(IRE) |
15.3 | KEMP | DELICATE KISS(GB) |
15.3 | KEMP | MASKED IDENTITY(GB) |
15.4 | PLUM | NOCTURNAL MYTH(GB) |
15.5 | CURR | STELLIFY(IRE) |
15.55 | HERE | COTTON END(IRE) |
16.05 | KEMP | ARATUS(IRE) |
16.15 | WOLV | SIR HAMILTON(IRE) |
16.2 | CURR | PRETTY BOY FLOYD(IRE) |
16.25 | HERE | PATS FANCY(IRE) |
16.35 | KEMP | LOXLEY(IRE) |
16.5 | WOLV | HIGHEST AMBITION(FR) |
17.05 | KEMP | LUCANDER(IRE) |
17.2 | WOLV | BATRAAN(IRE) |
18.2 | WOLV | ARABIAN WARRIOR(GB) |
19.2 | WOLV | GET BOOSTING(GB) |
19.5 | WOLV | CAFE SYDNEY(IRE) |
20.2 | WOLV | DRAGONS WILL RISE(IRE) |
I'm recording this on a spreadsheet here,
https://www.dropbox.com/s/9tplt1onyrdfnja/ADP%20Test.xlsx?dl=0
The selections are determined purely statistically and automatically with no final form reading or filtering.
I am very much going to enjoy following this. I like a high volume method. Are you using a model that considers all race conditions?
I am looking for something high-volume and low-volatility that is quick and easy to identify, and easy to set up and run automatically.
The selections are generated from my electronic formbook. This considers race conditions and dozens of factors. It arranges these according to their statistical significance (in terms of SR or ROI) for each race code/type. I am taking the 24 most significant factors for each race code/type.
Selections must be above the median for all competitors in the race in at least ten of the 24 factors. This is the tricky bit, deciding on how many of the 24 factors to set as a minimum. I set it to all 24 initially. Qualifiers were very few but with a high strike rate and high A/E. Ten is the most workable at the other end of the spectrum, lots of qualifiers, lower SR and A/E.
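As a rough illustration of the qualifying rule just described, here is a minimal sketch. It assumes every factor is scored so that higher is better and that "above the median" means strictly above; the function name and data layout are made up for the example and are not taken from the formbook software.

```python
from statistics import median

def qualifiers(race, min_factors=10):
    """Return {runner: count} for runners above the race median
    in at least `min_factors` factors.

    `race` maps runner name -> list of factor scores, in the same
    factor order for every runner (hypothetical layout).
    """
    names = list(race)
    n_factors = len(race[names[0]])
    # Median of each factor across all runners in the race
    medians = [median(race[n][i] for n in names) for i in range(n_factors)]
    result = {}
    for n in names:
        # Count the factors where this runner is strictly above the median
        count = sum(1 for i in range(n_factors) if race[n][i] > medians[i])
        if count >= min_factors:
            result[n] = count
    return result

# Toy race with 3 factors instead of 24, threshold 2 instead of 10
toy_race = {"A": [3, 3, 3], "B": [2, 2, 2], "C": [1, 1, 1]}
toy_result = qualifiers(toy_race, min_factors=2)
```

Raising `min_factors` towards 24 gives fewer qualifiers, lowering it towards 10 gives more, which is the trade-off described above.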
This sounds like an excellent approach.
How many years' data did you use to determine the most significant factors?
How far do you go with your race type, handicap / non-handicap, distance, going etc.?
What would you consider to be the most important factors? In NH I have found the key factors to be:-
1. Age of horse.
2. Days since last run.
3. Weight.
4. Forecast odds.
How many years' data did you use to determine the most significant factors?
This is built into the software and is just over 8 years' worth.
How far do you go with your race type, handicap / non-handicap, distance, going etc.?
Again, built into the software. To keep sample sizes reasonable they tend to be fairly general. For example, UK AW Hcaps, UK Flat Hcaps 8f to 36f, Irish Hrdle Stakes, UK Hrdle Novice Stakes.
What would you consider to be the most important factors?
For this exercise they are determined by the software and vary according to each of the groups. For example, for UK Hurdle Handicaps the top 12 are,
- Efficiency in the last two races (any code)
- Efficiency in the last two races at this class
- Efficiency on adjacent goings to today
- Today's official mark compared to last official mark (same code only)
- Efficiency over courses with similar topography to today's course
- Trainer SR over last 12 months
- Efficiency in all races
- Efficiency on today's going
- Efficiency at adjacent distances to today's
- Consistent improvement in handicap ratings
- Efficiency in today's class
- Jockey's SR over last 12 months
Over a sample of 8,461 races.
The strike rate 'significance' has been calculated using likelihood ratios.
'Efficiency' is a cumulative percentage measure calculated according to how many horses finished in front of and behind the runner in previous races.
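The exact efficiency formula isn't spelled out here, so the following is only one plausible reading: horses beaten as a percentage of all rivals met, accumulated over previous races. The function name and the (position, runners) input format are assumptions for illustration.

```python
def efficiency(runs):
    """Cumulative efficiency as a percentage (one possible reading).

    `runs` is a list of (finish_position, number_of_runners) tuples
    for the previous races being considered.
    """
    beaten = sum(runners - pos for pos, runners in runs)   # finished behind
    ahead = sum(pos - 1 for pos, runners in runs)          # finished in front
    total = beaten + ahead
    if total == 0:
        return None  # no rivals met, so no figure can be calculated
    return 100.0 * beaten / total

won_only_race = efficiency([(1, 10)])    # beat all 9 rivals
mid_division = efficiency([(3, 5)])      # 2 behind, 2 in front
```

Filtering `runs` to races at today's class, going or distance would give the narrower versions of the measure listed above.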
More generally, I have found the important factors to be,
- Market odds
- Forecast odds
- Recent form ratings (similar to Racing Post or Timeform performance ratings, i.e. an assessment of what the horse 'ran to' in previous races, not what it 'ran off' like the official handicap marks)
- Master ratings (overall current assessment of the horse's potential)
- Fitness (DSLR)
- Trainer strike rate (12 months and lifetime at course)
The first two are most significant by a country mile. The others add some more to the picture but must be assessed 'intra-race'. That is relative to the other runners in the race.
Then it's down to converting this information to relative win chances, then to prices, then finding those prices.
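To make that last step concrete, here is a minimal sketch of turning intra-race scores into win chances and then into prices. It assumes each runner already has a single positive score and simply normalises the scores so they sum to 1; how the scores are produced, and whether straight normalisation is the right conversion, is not something stated in this thread. Names are hypothetical.

```python
def fair_prices(scores):
    """Map runner -> (win probability, fair decimal price).

    `scores` maps runner -> positive rating relative to the
    other runners in the same race (assumed input).
    """
    total = sum(scores.values())
    prices = {}
    for runner, score in scores.items():
        prob = score / total               # relative win chance
        prices[runner] = (prob, 1.0 / prob)  # fair price is 1 / probability
    return prices

def is_value(market_price, fair_price):
    """A bet is only of interest if the available price beats the fair price."""
    return market_price > fair_price

toy_prices = fair_prices({"A": 2.0, "B": 1.0, "C": 1.0})
```

In this toy race runner A comes out at a 50% chance and a fair price of 2.0, so anything bigger than 2.0 in the market would count as value under these assumptions.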
Thanks Andrew, very interesting. I must admit I have only been using very basic information over the last year or so (i.e. information that is included on a basic race-card such as the Racing Post's). My results have improved substantially since I started doing this.
I do find that using rules that seem to go against the norm (e.g. excluding "fit" horses) seems to work well.
One of the key factors seems to be how many selections I have in my base data. Systems with more than 540 winners have performed a lot better than those with fewer than 540 winners, even though the smaller systems had a higher A/E. This tends to limit the specificity of the particular analysis, but that is compensated for by the increased sample size.
'Efficiency' is a cumulative percentage measure calculated according to how many horses finished in front of and behind the runner in previous races.
I have previously looked at various methods of classifying a good run including efficiency, lengths beaten by, finish position compared to SP rank etc. The conclusion I came to was that none were any better than using finish position last time out although I may revisit this.
Thank you for sharing the details, very interesting. Have you considered looking at profit or A/E for the significance of a factor? ROI vs profit is interesting here, because ranking by ROI will tend to reduce selections significantly, and with them profit, whereas ranking by profit will give more selections but potentially tiny ROIs. A combination of all the information would be preferable for comfort levels.
I'd be interested in how much impact the 22 non-odds-based factors are having on the odds you are creating. If they are only moving them marginally, could you make it simpler by leaving them out but requiring a larger advantage?
Over 1,300 bets with this so far. Whilst it's made a small profit to level stakes the underlying A/E shows no advantage over the market. There's nothing wrong with the strike rates at 33% & 62%. These are similar to market favourites and many of the selections are favourites. I suspect this is the issue, at BSP any value has been sucked out of the selections. I would need to get prices better than BSP.
I have lots more data than is shown here. It indicates some other potential angles to explore but, of course, these are based on reduced sample sizes. So I will carry on recording and gathering data for now.
Measure | Win | Place | Comb |
Bets | 668 | 658 | 1326 |
Winners | 221 | 409 | 630 |
SR | 33.1% | 62.2% | |
LSP | 19.68 | -13.71 | 5.97 |
POT (LSP) | 2.9% | -2.1% | |
Profit (prop) | -2.59 | -7.89 | -10.48 |
POT (prop) | -1.2% | -1.9% | -1.6% |
Stakes (ExpWins) | 220.90 | 414.17 | 635.07 |
A/E | 1.00 | 0.99 | 0.99 |
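For anyone wanting to check the arithmetic, the percentage rows can be reproduced from the raw counts. This is a sketch that assumes SR is winners / bets, POT (LSP) is level-stakes profit / bets, and A/E is actual winners / expected winners. It also takes the "Stakes (ExpWins)" label to mean that the proportional stakes total equals the expected winners, which is my reading of the row rather than something confirmed.

```python
def summarise(bets, winners, lsp, prop_profit, exp_wins):
    """Recompute the percentage rows of the results table from the raw rows."""
    return {
        "SR %": round(100 * winners / bets, 1),
        "POT (LSP) %": round(100 * lsp / bets, 1),
        # Assumption: total proportional stakes == expected winners
        "POT (prop) %": round(100 * prop_profit / exp_wins, 1),
        "A/E": round(winners / exp_wins, 2),
    }

win = summarise(668, 221, 19.68, -2.59, 220.90)
place = summarise(658, 409, -13.71, -7.89, 414.17)
```

An A/E of 1.00 means the selections won almost exactly as often as their BSP prices implied, which is why the level-stakes profit on its own doesn't indicate an edge.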
https://www.dropbox.com/s/9tplt1onyrdfnja/ADP%20Test.xlsx?dl=0
With enough data will you be able to put the selections into odds brackets and determine which brackets have an advantage and then adjust the staking based on the advantage in each odds range at BSP? If a negative is big enough they could be laid.
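The bracket idea could be sketched along these lines. It assumes each recorded bet is a (BSP, won) pair and that the expected winners for a bet are 1 / BSP, ignoring commission; the bracket edges and names are hypothetical.

```python
import bisect

def ae_by_bracket(bets, edges):
    """Group bets into odds brackets and report A/E for each.

    `bets` is a list of (bsp, won) pairs; `edges` is an ascending
    list of bracket boundaries in decimal odds.
    """
    stats = {}
    for bsp, won in bets:
        i = bisect.bisect_right(edges, bsp)
        lo = edges[i - 1] if i > 0 else 1.0
        hi = edges[i] if i < len(edges) else float("inf")
        n, wins, exp = stats.get((lo, hi), (0, 0, 0.0))
        # Expected winners for one bet at BSP is 1 / BSP (assumption)
        stats[(lo, hi)] = (n + 1, wins + int(won), exp + 1.0 / bsp)
    return {
        key: {"bets": n, "wins": w, "exp": e, "ae": w / e}
        for key, (n, w, e) in sorted(stats.items())
    }

toy_bets = [(2.0, True), (2.0, False),
            (4.0, True), (4.0, False), (4.0, False), (4.0, False)]
toy_brackets = ae_by_bracket(toy_bets, edges=[3.0])
```

Brackets with an A/E comfortably above 1 would be candidates for bigger stakes and those well below 1 for laying, but each bracket has a smaller sample than the whole, so the figures will be noisier.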
Seems to be going reasonably well Andrew now with a small profit.
I have analysed the last few years NH data in an attempt to identify the best way of classifying a good run. Nothing particularly stands out.
The best appears to be using the last 3 form figures with a greater weighting given to the latest figure. The top 1% gives a strike rate of 26% and an AE of 1.04. The top 5% gives a strike rate of 20% and an AE of 1.03.
The worst appears to be the efficiency rating of the last race. The top 1% gives a strike rate of 14% and an AE of 0.86. The top 5% gives a strike rate of 18% and an AE of 0.98.
The highest individual figure is for distance won. The top 1% gives a strike rate of 28% and an AE of 1.07. This includes those horses that won by 13 lengths or more on their last start. There were 401 winners from 1,441 runners. It is also profitable for winning distances of 7 lengths to 12 lengths. I may look at this in a bit more detail.
Hi Michael, as a 'straight out of the box' strategy mine looks like it might be a reasonable starting point. With some additional study of the selections it ought to be possible to make a profit.
The larger dataset also points to other possibilities. For instance, those with 20+ factors all above the median for the race are showing a decent return in the place markets. I'll need a few thousand more in the sample before I reach the confidence interval I'm happy with. Today's are,
13.25 | KEMP | MY GIRL MAGGIE(GB) |
13.55 | KEMP | RIVER CHORUS(IRE) |
13.55 | KEMP | VANITAS(GB) |
Interesting NH data analysis. Recent form figures and 'distance beaten' seem logical factors on which to base a good run. Perhaps 'days since' those recent runs might strengthen it more. I might look at this myself.
By form figures do you mean rating-type figures rather than last three finishing positions? If so, what figures are you using? Are they intra-race (e.g. rank, distance from top) rather than absolute? How are you measuring 'efficiency' in last race?
The form figures are simply the last 3 finishing positions added together, so a horse that won its last 3 races would be top rated with a score of 3. If the horse finished worse than 8th or did not finish then I have given it a rating of 9.
For efficiency I have taken the number of horses beaten less the number of horses beaten by. So the top rated would be the winner of the Grand National with a rating of 39 (assuming 40 runners).
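Those two definitions are simple enough to write down directly. A minimal sketch, with function names of my own choosing and `None` standing in for a horse that did not finish:

```python
def form_score(positions):
    """Sum of the last three finishing positions (lower is better).

    Anything worse than 8th, or a non-finish (None), counts as 9.
    """
    return sum(p if p is not None and p <= 8 else 9 for p in positions)

def last_race_efficiency(position, runners):
    """Horses beaten minus horses beaten by, for a single race."""
    beaten = runners - position
    beaten_by = position - 1
    return beaten - beaten_by

three_wins = form_score([1, 1, 1])
national_winner = last_race_efficiency(1, 40)
```

Because the score is a plain sum, the order of the three runs makes no difference to it: figures of 4, 4, 1 give 9 however they are arranged.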
All three 20+ horses placed yesterday. Today's are,
14.45 | LIME | CAPTAIN MC(IRE) |
14.45 | LIME | FISSA(FR) |
20 | WOLV | ROYAL PLEASURE(IRE) |
Thanks Michael. I can't help thinking some sort of time or sequence factor would help the form figures. After all, there's a difference between a horse with figures of 4-4-1 and one with 1-4-4 especially if there's a long seasonal break somewhere between the runs. I guess this is your 'weighting' factor.
A logistic regression would do the same thing and view 4-4-1 total 9 and 1-4-4 total 9 completely differently. How many years of data did you use? I'll run a regression on last three finishing positions, days since run and distance beaten in last race and see what figures I get.
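To show why a regression separates 4-4-1 from 1-4-4, here is a sketch of the scoring side only. The coefficients are invented purely for illustration and are not fitted to any data; a real model would estimate them from results. It also assumes form reads oldest run first, so 4-4-1 means the latest run was a win.

```python
import math

# Hypothetical coefficients (NOT fitted): latest, second-last, third-last run
COEFS = (-0.30, -0.15, -0.05)
INTERCEPT = 0.0  # also hypothetical

def win_probability(latest_first):
    """Logistic model with one coefficient per run.

    `latest_first` holds finishing positions with the most recent run first.
    """
    z = INTERCEPT + sum(c * x for c, x in zip(COEFS, latest_first))
    return 1.0 / (1.0 + math.exp(-z))

p_441 = win_probability((1, 4, 4))  # form 4-4-1: latest run was a win
p_144 = win_probability((4, 4, 1))  # form 1-4-4: the win was three runs ago
```

With any set of coefficients that weights the latest run most heavily, `p_441` comes out higher than `p_144` even though both horses total 9 on the simple sum. Days since last run and distance beaten would just be further inputs with their own coefficients.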