LLM: Automated Prompt Scoring

How to objectively choose and improve your ChatGPT prompts, and reduce hallucinations, using Python

Generative AI

Michael Malin

April 10, 2023

Large language models (LLMs) like ChatGPT are having a huge impact, and they are just the beginning. Over the next year, companies big and small will begin to roll out LLMs specialized for a particular domain or persona. This is already becoming a reality with new products like the finance-specialized BloombergGPT and Microsoft’s developer-focused Copilot. We will soon see AI personal trainers, health coaches, counselors, legal assistants, and many more. While some cases may require models fine-tuned on domain-specific data, the majority can be accomplished with simple prompt engineering. But how do you know when your prompt is good enough? How can we generate objective accuracy scores for subjective text?

This guide will cover:

Theory
Prompt Engineering
Prompt Testing
Prompt Scoring
Prompt Feedback

Michael Malin

With over 12 years of experience, I have deployed over 40 successful projects across all AI domains, including computer vision, NLP, GNNs, and forecasting. I specialize in TensorFlow graph and deep learning networks.