Document Type


Publication Date

Spring 2024


AI language models such as ChatGPT (GPT-4), Claude, Grok, Pi, and Gemini Advanced have shown remarkable capabilities across many domains. However, their performance varies significantly with the prompting technique and the domain of application. This research investigates the performance of these models under zero-shot, few-shot, and chain-of-thought prompting in three domains: HellaSwag (common-sense reasoning), TruthfulQA (popular misconceptions), and Game Theory (textbook problems). By evaluating the models with a qualitative scoring rubric and exploring a novel domain, we aim to identify the most effective prompting strategies, characterize the models' strengths and limitations, and inform future research and development. The findings will contribute to the academic discourse on AI language models and guide practitioners in using these tools effectively in their own domains.

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.


