IPHS 300: AI for Humanity
Document Type
Poster
Publication Date
Spring 2025
Abstract
This research project investigates the ability of large language models (LLMs) to construct conceptually and mathematically correct proofs, using a benchmark based on the Taylor series representation of functions. The task examines models' capacity to apply definitions, theorems, and calculus techniques correctly. A range of model sizes was tested, including Qwen, Gemma, and LLaMA. Models under 2B parameters demonstrate a poor understanding of Taylor and geometric series, apply incorrect theorems, and lack logical reasoning about convergence. Models with at least 27B parameters typically generate proofs that are coherent and nearly complete. The research identifies current limitations in the symbolic reasoning capabilities of language models and proposes a rigorous evaluation standard for these capabilities.
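For context, the mathematics underlying the benchmark can be sketched as follows (an illustrative example; the specific functions and expansion points used in the benchmark are not given in this abstract). A proof task of this kind typically involves the Taylor series of a function f about a point a,

\[ f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}\,(x-a)^n, \]

and the geometric series, whose convergence condition the smaller models reportedly mishandle:

\[ \sum_{n=0}^{\infty} x^n = \frac{1}{1-x}, \qquad |x| < 1. \]

A correct proof must establish convergence (for example, via the ratio test or a remainder bound) before equating the series with the function on its interval of convergence.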
Recommended Citation
Idowu, Godwin, "AI Proof Benchmarking: Evaluating Mathematical Reasoning in Open-Source LLMs via Taylor Series Analysis" (2025). IPHS 300: AI for Humanity. Paper 57.
https://digital.kenyon.edu/dh_iphs_ai/57
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
