Generation AI
AI Trust, Eval Frameworks, and Why Data Quality Matters
Episode Summary
In this episode of Generation AI, hosts JC and Ardis tackle one of the most pressing concerns in higher education today: how to trust AI outputs. They explore the psychology of trust in technology, the evaluation frameworks used to measure AI accuracy, and how Retrieval Augmented Generation (RAG) helps ground AI responses in factual data. The conversation offers practical insights for higher education professionals who want to implement AI solutions but worry about accuracy and reliability. Listeners will learn how to evaluate AI systems, what questions to ask vendors, and why having public-facing content is crucial for effective AI implementation.
Episode Notes
Introduction: The Trust Challenge in AI (00:00:06)
- JC Bonilla and Ardis Kadiu introduce the topic of trusting AI outputs
- Contrasting traditional predictive modeling metrics with new AI evaluation methods
- Understanding that trust is both earned and lost through interactions
The Psychology of Trust in AI (00:03:35)
- How human psychology frameworks for trust transfer to technology
- Challenge appraisal (seeing AI as enhancement) versus threat appraisal (seeing AI as risky)
- Example: How autonomous driving shows trust being built or lost through micro-decisions
- The importance of making AI systems more predictable to humans
Evaluating AI Outputs: The Evals Framework (00:11:41)
- Moving from traditional machine learning metrics to new evaluation methods
- How OpenAI Evals works as a standard for measuring AI performance
- Creating test sets with thousands of variations to check AI outputs
- The concept of "AI checking on AI" for more thorough evaluation (a toy eval loop is sketched after this list)
- Element451's reported 94-95% accuracy on its own evaluations
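The episode describes evals only at a high level. As a rough illustration of the pattern, here is a minimal sketch of an eval loop in Python: run a fixed test set through the system, grade each answer, and report an accuracy score. The `ask_assistant` and `llm_judge` functions are hypothetical placeholders standing in for real model calls (not OpenAI Evals' or Element451's actual APIs), and the test cases are invented examples.

```python
# Minimal sketch of an eval loop: run a fixed test set through an AI system
# and score the answers. ask_assistant() and llm_judge() are hypothetical
# stand-ins for real model calls; the test cases are invented examples.

TEST_SET = [
    {"question": "What is the application deadline for fall admission?",
     "expected_facts": ["January 15"]},
    {"question": "Does the university offer need-based aid?",
     "expected_facts": ["need-based", "FAFSA"]},
]

def ask_assistant(question: str) -> str:
    """Placeholder for the system under test (e.g., a RAG-backed chatbot)."""
    return "Applications for fall admission are due January 15."

def contains_expected_facts(answer: str, facts: list[str]) -> bool:
    """Simple string-match grader; real evals often use an LLM judge instead."""
    return all(fact.lower() in answer.lower() for fact in facts)

def llm_judge(question: str, answer: str) -> bool:
    """Placeholder for 'AI checking on AI': a second model grades the first."""
    return True  # a real judge would return a model-scored verdict

def run_evals() -> float:
    """Return the fraction of test cases the assistant answers correctly."""
    passed = 0
    for case in TEST_SET:
        answer = ask_assistant(case["question"])
        if (contains_expected_facts(answer, case["expected_facts"])
                and llm_judge(case["question"], answer)):
            passed += 1
    return passed / len(TEST_SET)

if __name__ == "__main__":
    print(f"Accuracy: {run_evals():.0%}")
```

In practice the test set would contain thousands of question variations, as discussed in the episode, and the grading step would be a model-based judge rather than simple string matching.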
Retrieval Augmented Generation (RAG) Explained (00:27:23)
- RAG as an "open book exam" approach for AI systems
- How data is processed, categorized, and made searchable
- The importance of re-ranking information to find the most relevant content
- How multiple documents can be combined to create accurate answers (a toy pipeline is sketched after this list)
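The "open book exam" framing boils down to a retrieve-then-generate pipeline: index the institution's content, pull the passages most relevant to a question, re-rank them, and hand only those passages to the model. The toy sketch below uses keyword overlap in place of real embedding search and a stubbed `generate_answer()` in place of a language-model call; the documents and function names are illustrative, not any vendor's actual API.

```python
# Toy RAG pipeline: index documents, rank passages against the query,
# and combine the top passages into a grounded answer. Keyword overlap
# stands in for real embedding search; generate_answer() is a stub for
# the language-model call. All names and documents are illustrative.

DOCUMENTS = [
    {"source": "admissions.html", "text": "Fall applications are due January 15."},
    {"source": "aid.html", "text": "Need-based aid requires a completed FAFSA."},
    {"source": "housing.html", "text": "First-year students live on campus."},
]

def score(query: str, text: str) -> float:
    """Crude relevance score: fraction of query words present in the passage."""
    q_words = set(query.lower().split())
    t_words = set(text.lower().split())
    return len(q_words & t_words) / max(len(q_words), 1)

def retrieve(query: str, k: int = 2) -> list[dict]:
    """Rank all passages by relevance and keep the top k. A real system would
    retrieve candidates with vector search, then re-rank them more carefully."""
    ranked = sorted(DOCUMENTS, key=lambda d: score(query, d["text"]), reverse=True)
    return ranked[:k]

def generate_answer(query: str, passages: list[dict]) -> str:
    """Stub for the generation step: a real system would prompt an LLM with
    the retrieved passages so the answer stays grounded in source material."""
    context = " ".join(p["text"] for p in passages)
    sources = ", ".join(p["source"] for p in passages)
    return f"Based on {sources}: {context}"

if __name__ == "__main__":
    question = "When are fall applications due?"
    print(generate_answer(question, retrieve(question)))
```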
Addressing Common AI Trust Concerns (00:33:31)
- Reducing hallucinations through proper grounding in source material
- Why "garbage in, garbage out" fears are often overblown
- Using public-facing content as reliable data sources
- The value of traceable sources in building confidence in AI responses (see the sketch after this list)
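One concrete way to make responses traceable, under the same assumptions as the RAG sketch above, is to return the cited sources alongside the generated text so a reader can check the claim against the public-facing page it came from. The `GroundedAnswer` structure and its fields below are illustrative, not a description of any particular product.

```python
# Illustrative shape for a traceable answer: the generated text travels
# with the sources it was grounded in, so users can verify the claim.
from dataclasses import dataclass

@dataclass
class GroundedAnswer:
    text: str
    sources: list[str]  # e.g., URLs of the public-facing pages used

    def render(self) -> str:
        """Format the answer with its citations appended."""
        return f"{self.text}\n\nSources: {', '.join(self.sources)}"

answer = GroundedAnswer(
    text="Fall applications are due January 15.",
    sources=["https://example.edu/admissions/deadlines"],
)
print(answer.render())
```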
Conclusion: Building Earned Trust (00:38:11)
- Trust in AI comes from reliability and transparency
- The importance of asking the right questions when selecting AI partners
- How to distinguish between companies just talking about AI versus implementing best practices