Exploring Lichess data on improvement using e4 vs d4 and blitz vs rapid
A common question in the chess community is whether to play e4 or d4 as a beginner. Opinions are generally split between e4 being better for learning, no difference, and rarely that d4 is better. Typical reasoning being that e4 leads to sharper games and gives you more opportunities to learn. The other starting moves as white are rarely recommended for beginners altogether and rarely played at all levels. Another common question is whether playing shorter or longer games helps you improve faster, with the orthodoxy being that longer games help you learn more overall. The standard argument I’ve seen is that you have more time to analyze things in longer games which helps in all modes while the skills you improve in quicker games are less applicable in the longer time controls. I’ve downloaded games from Lichess’ database to explore those questions.
In short, my analysis suggests that d4 is actually somewhat better for your improvement (confidence: high), and so are longer games (confidence: medium).
I download the January 2022 (102,110,423 games) and January 2023 (103,178,407 games) PGNs from Lichess’ database. I went over all rapid, and blitz games saving them separately and recorded whether White played e4 or d4 and their rating at the time. I threw everything else - Black, different time controls, non-rated games, anything after the first move etc.
I filtered out anyone above 1800 rating since the majority who are that high aren’t beginners - according to Lichess only 18% of weekly rapid players have a rating > 1800 and 21% for blitz.
Further, I removed anyone that wasn’t present in both 2022 and 2023, and anyone who played both e4 and d4. While this reduces the number of remaining players, if we assume that e4 or d4 is better we dont want to have many players marked as say d4 but improving just as fast because they also play e4. The above left me with 125447 e4 rapid players and 22274 d4 rapid players, and 950477 blitz e4 players to 477755 blitz d4 players. It also made the ratio of e4 to d4 players worse than beforehand since there were more ‘pure’ e4 players than pure ‘d4’ players.
That leaves us with a good amount of players, as we are comparing the same players at both times and we should expect to detect an effect if there is one.
I personally prefer rapid, so I looked at those games first. After filtering the data and extracting the rating differences between 2022 and 2023, we get a mean increase of 78.6 points for e4 over a year of play vs 85.5 for d4. This looks like a small but noticeable difference and A quick ttest suggests
pvalue=0.001, so the results are significant.
Looking at the graph it appears that most of the difference is in the 1350 and below range. I’m not completely sure why there is such a jump in rating improvement there but the starting rating on Lichess is 1500, and I believe 1 lost game on a new account brings you to the 1350-1400 group so that’s where the newest player are and where ratings are most volatile.
Looking at the blitz data, the results do replicate, however, the improvements are much smaller. The mean improvement for E4 is 5.2 points and 8.1 points for blitz (pvalue=0.009). It’s important to note that the rapid and blitz ratings aren’t directly comparable and the cutoff at 1800 does not mean the same thing for blitz as it does for rapid. The 50th percentile for rapid is 1550 on Lichess and 1430 for blitz, with rapid ratings generally being much higher for the same player. Nonetheless, the results broadly replicate and look like this
Given how hard it is to compare rapid and blitz ratings, I have less to say about the relative improvement in the two time controls but the average improvement in <1800 rapid is ~80 points and only 5-8 points for blitz, which is a rather large difference. If I cut off the data to only <1500 the blitz improvement jumps to 60-68 points which is still lower, and even limiting it to below 1300 leads to lower improvements in blitz (78 points for e4 and 66 for d4) than for <1800 rapid (and that number only grows if I also limit rapid to a lower rating). It is admittedly not conclusive as the two distributions are different, and I might explore the question further by comparing blitz players’ rapid improvements and vice versa but what I am seeing here certainly suggests that your improvement might be slower with blitz.
The biggest limitation is that we are simply comparing the improvement of d4 and e4 players a year later but can’t know if the improvement is due to their choice of opening or because the groups of e4 and d4 players are different themselves - our study is simply correlattional.
It’s also important to note, that some in the ‘play longer games’ camp recommend going for games even longer than rapid but if longer games lead to more improvement I’d expect to see a difference in rapid vs blitz already.
Longer games or at least rapid seem plausibly better for your improvement over blitz games (at least at lower ratings). d4 in particular seems to lead to a faster rating rise than e4 BUT the difference is fairly small, so if a player prefers a particular opening and would play more games using it then it’s unlikely to be worth switching. Personally, I will attempt to switch away from my default - the Scotch - if only because it would be somewhat silly to have ran this analysis and not changed anything because of it. If anyone wants to try to replicate my findings the quick code I wrote is here, however, you’ll need to download the PGNs from Lichess as a month’s worth of games is ~200gb uncompressed and hard to share. Let me know if you get different results or spot any mistakes.