Maths Education and AI in 2025
Today there is a story in the news about UK investment in artificial intelligence. It feels like a suitable moment to reflect on where we are with AI in mathematics education, and on some of the wider improvements in performance over the last year or so.

Firstly, LLMs have made huge advances in a very short time. The most popular, ChatGPT, has gone through several iterations and currently offers o1 for advanced tasks, which slows the response time a little in exchange for much higher-quality results. LLMs are now very, very good at solving mathematical problems, in part due to focused improvement in this specific area, and we're going to see giant leaps again soon with the development of AGI (artificial general intelligence) models that match or exceed human performance. These models 'exist' in the sense that they are being developed and are making rapid progress (ChatGPT o3, for example, was revealed in December) and are supposedly going to be available either this year or next. Given the pace of development, I wouldn't be surprised if it was before the end of 2025.

All this rapid advancement has made existing benchmarks obsolete (remember the Will Smith eating pasta video? That was less than two years ago!). Benchmark saturation has been met with much higher bars being designed, such as the FrontierMath project setting incredibly difficult PhD-level mathematics problems that Terence Tao himself suggested would resist AI for several years. However, OpenAI claim their newest model improved performance in that arena by more than 10x. It seems the AI plateau is still some way off.
Before we dive into AI in education specifically, it's also worth noting that the environmental cost of AI usage appears to be falling rapidly thanks to advances in efficiency. Yet even though the energy use of LLMs is generally decreasing, that hasn't stopped companies future-proofing by investing in huge energy-consuming data centres, which will inevitably be bad for the environment.
AI in maths education
Despite all these enormous gains, it almost feels absurd to point out that LLMs are still incredibly poor at writing maths questions. Granted, they are now generally outstanding at solving them, which works nicely for showing off capabilities, but in maths education we have little need for solving machines. However, take the maths out of the maths lesson and they are also very good at identifying suitable lesson structures, misconceptions to be aware of, models to use (although where diagrams are required, there's some way to go), vocabulary to use in explanations, and so on. This in itself bodes well for an AI tutor for teachers, which I'll discuss shortly.
What would be more useful is something that can create strong resources, mark them (there's a government-funded project working on this!) or tailor them (e.g. for adult learners, younger learners, context specificity, additional practice on sub-skills) using simple prompts from teachers. At Oak National Academy we have an impressive lesson planner, but it performs less reliably in mathematics specifically. Elsewhere, there is an abundance of interest in pupil tuition models for maths, but these too appear to have some seemingly immovable flaws, ignoring basic principles around child motivation and personalisation and still producing hallucinatory errors. As such, trust remains fragile.

Part of the issue is that LLMs simply aren't calculators; they're more like multi-dimensional predictive text. This is a recognised area for improvement, and there are some interesting projects, such as Microsoft's Phi-4, that are beginning to address these performance dips. Rest assured it won't be a barrier for much longer, and I'm confident we'll see iterations this year that improve on these weaknesses, integrating more robust maths-specific tools and packages that are called upon when LLMs are recognisably low-performing. A more multi-faceted, agentic LLM response that retrieves, say, a set of real example maths questions to use and iterate from should, in some form or another, become part of the quality gains I think we'll see soon. Similarly, integrating robust self-evaluation of output will improve LLMs' ability to recognise whether output is high or low quality, and hopefully prevent poorer results reaching the end-user.
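To make the agentic idea concrete, here is a minimal sketch of that retrieve-then-draft-then-self-evaluate loop. Everything here is hypothetical: the question bank, function names and the pass/fail check are invented placeholders, and in a real system the two stub functions would be LLM calls rather than string operations.

```python
# Hypothetical bank of vetted, human-written questions to ground generation.
QUESTION_BANK = [
    {"topic": "percentages", "text": "Find 15% of 240."},
    {"topic": "percentages", "text": "A coat costs £60 after a 25% discount. What was the original price?"},
    {"topic": "fractions", "text": "Work out 3/4 of 96."},
]

def retrieve_examples(topic, k=2):
    """Pull up to k real questions on the topic for the model to iterate from."""
    return [q["text"] for q in QUESTION_BANK if q["topic"] == topic][:k]

def draft_question(topic, examples):
    """Stand-in for an LLM call that writes a new question from retrieved examples."""
    return f"[{topic}] New question modelled on: {examples[0]}"

def self_evaluate(question, topic):
    """Stand-in for an output-quality check; rejects drafts that drift off-topic."""
    return topic in question

def generate_question(topic, max_attempts=3):
    """Retrieve, draft, self-evaluate; never return an unchecked draft."""
    examples = retrieve_examples(topic)
    if not examples:
        return None  # nothing real to ground on, so don't invent anything
    for _ in range(max_attempts):
        draft = draft_question(topic, examples)
        if self_evaluate(draft, topic):
            return draft
    return None  # low-quality output never reaches the end-user

print(generate_question("percentages"))
```

The shape of the loop is the point: retrieval anchors the output to real material, and the evaluation gate means a failed draft is retried or dropped rather than shown to a teacher.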
It would also reduce associated costs. Breaking the processing of an input into smaller parts, some of which use an LLM and some of which don't (e.g. a database retrieval, a comparison, an evaluation or a mathematical calculation), means AI is only allocated to the sub-components of a process that actually need it before an output is produced. That's handy when the financial costs of heavy LLM usage are getting pretty big.
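A rough illustration of that decomposition, assuming a hypothetical router: each sub-task goes to the cheapest capable component, and the (expensive, fallible) LLM is only called when nothing deterministic will do. The component names are invented and the LLM call is a stub.

```python
import re

def calculator(expression):
    """Deterministic arithmetic: no LLM, no hallucination, near-zero cost."""
    # Only digits and arithmetic symbols allowed, so eval is safe in this sketch.
    if not re.fullmatch(r"[\d\s+\-*/().]+", expression):
        raise ValueError("not a pure arithmetic expression")
    return eval(expression)

def llm_call(prompt):
    """Stand-in for a paid LLM API call."""
    return f"(LLM response to: {prompt!r})"

def handle(task):
    """Route: arithmetic goes to the calculator, everything else to the LLM."""
    try:
        return calculator(task)
    except ValueError:
        return llm_call(task)

print(handle("240 * 15 / 100"))  # deterministic path: 36.0
print(handle("Explain why 15% of 240 is 36"))  # language task: LLM path
```

Even this toy version shows the economics: the calculation never touches the LLM at all, so only the genuinely linguistic sub-task incurs an API cost.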
AI tuition models
I think we're starting to see a shift away from 'AI as a child-tutor' (a teacher-replacement model) towards 'AI as a teacher-tutor' (a teacher-augmentation model). This is a welcome development, and it bypasses one of the seemingly immovable problems above: child motivation. I see a really positive and adoptable role for AI here that is already showing signs of making gains in the sector. Dylan Wiliam, Mary Myatt and John Tomsett are all advising on a new product called Aristotal, an AI tool geared towards mentoring teachers to improve their teaching. It has been trialled in schools, seemingly successfully, and appears to reach into an area that is ripe for AI. As a teacher, you can receive personalised tuition based in part on your own reflections on your teaching, but crucially, it's not only personalised, it's private. It bypasses the relational complexity of a teacher being advised on how to improve by a peer or senior leader, and it can offer strong expertise on micro-elements of teaching, accessed by the teacher at their own convenience. I find it hard to critique a product that could, say, effectively advise an inexperienced teacher or teaching assistant on the best intervention design for a pupil struggling with percentages.

This kind of teacher-nudging feels less intrusive and more apolitical, and it could go some way to improving pupil performance in maths anywhere in the world. I think that's a good thing, and it's a model that does not encroach on teachers' sense of autonomy. It can feel overwhelming trying to keep up with the somewhat relentless pace of development in AI (and the ethical and legislative issues it brings with it), but I'm excited and a little more optimistic about how these kinds of models will progress and evolve in 2025.