IPHS 300: AI for Humanity

Document Type

Poster

Publication Date

Spring 2025

Abstract

This research project investigates the ability of large language models (LLMs) to produce conceptually and mathematically correct proofs, using a benchmark based on the Taylor series representation of functions. The task tests whether models can correctly apply definitions, theorems, and calculus techniques. A range of models of varying sizes was tested, including Qwen, Gemma, and LLaMA. Models under 2B parameters demonstrate a poor understanding of Taylor and geometric series, apply incorrect theorems, and fail to reason logically about convergence. Models with at least 27B parameters typically generate proofs that are both coherent and nearly complete. The research highlights current limitations in the symbolic reasoning capabilities of language models and proposes a rigorous evaluation standard for these skills.
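To illustrate the kind of fact the benchmark asks models to prove, the sketch below (not the poster's actual benchmark code; the function name and tolerance are illustrative) numerically checks that partial sums of the geometric series, the Taylor series of 1/(1 - x) at 0, converge for |x| < 1:

```python
def geometric_partial_sum(x: float, terms: int) -> float:
    """Sum the first `terms` terms of the Taylor series of 1/(1-x) at 0,
    i.e. 1 + x + x**2 + ... + x**(terms-1)."""
    return sum(x**n for n in range(terms))

x = 0.5
approx = geometric_partial_sum(x, 50)
exact = 1 / (1 - x)
# Inside the radius of convergence (|x| < 1) the partial sums
# approach 1/(1-x); with 50 terms the error is far below 1e-12.
print(abs(approx - exact) < 1e-12)  # True
```

A rigorous proof of this convergence, rather than a numerical check, is what the benchmark requires of the models.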

Creative Commons License

Creative Commons Attribution 4.0 License
This work is licensed under a Creative Commons Attribution 4.0 License.
