Test AI & LLM App with DeepEval, RAGAs & more using Ollama

Roadmap to become an AI QA Engineer and test LLMs and AI applications using DeepEval, RAGAs and HF Evaluate with local LLMs

Master the essential skills for testing and evaluating AI applications, particularly those built on Large Language Models (LLMs). This hands-on course equips QA engineers, AI QA specialists, developers, data scientists, and AI practitioners with cutting-edge techniques to assess AI performance, detect bias, and ensure robust application development.



Topics Covered:

  • Section 1: Foundations of AI Application Testing (Introduction to LLM testing, AI application types, evaluation metrics, LLM evaluation libraries).

  • Section 2: Local LLM Deployment with Ollama (AI model options, running LLMs locally, Ollama via GUI and CLI, setting up Ollama as an API; see the Ollama API sketch after this list).

  • Section 3: Environment Setup (Jupyter Notebook for tests, setting up Confident AI).

  • Section 4: DeepEval Basics (Traditional LLM testing, first DeepEval code for Answer Relevancy and Context Precision, evaluating in Confident AI, testing with a local LLM, understanding LLMTestCases and Goldens; a first-test sketch appears after this list).

  • Section 5: Advanced LLM Evaluation (LangChain for LLMs, evaluating Answer Relevancy, Context Precision, bias detection, custom criteria with GEval, advanced bias testing).

  • Section 6: RAG Testing with DeepEval (Introduction to RAG, understanding RAG apps, a demo, creating GEval metrics for RAG, testing for conciseness & completeness; a GEval conciseness sketch appears below).

  • Section 7: Advanced RAG Testing with DeepEval (Creating multiple test records, Goldens in Confident AI, capturing actual output and retrieval context, building LLMTestCases from a dataset, running the RAG evaluation; sketched below).

  • Section 8: Testing AI Agents and Tool Calling (Understanding AI agents, working with agents, testing agents with and without the actual system, testing with multiple datasets; a tool-call check is sketched below).

  • Section 9: Evaluating LLMs using RAGAS (Introduction to RAGAS, Context Recall, Noise Sensitivity, MultiTurnSample, general-purpose metrics for summaries and harmfulness; a RAGAS run is sketched below).

  • Section 10: Testing RAG applications with RAGAS (Introduction and setup, creating retrievers and vector stores, MultiTurnSample dataset for RAG, evaluating RAG with RAGAS).
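
To give a flavour of the hands-on work, here are a few illustrative sketches; they are not taken from the course materials, and library APIs may differ by version. For Section 2's "Ollama as API" topic, a minimal sketch of querying Ollama's local REST endpoint, assuming Ollama is serving on its default port and a model such as llama3 has already been pulled:

```python
# Minimal sketch: querying a locally served Ollama model over its REST API.
# Assumes `ollama serve` is running on the default port (11434) and
# `ollama pull llama3` has been done; swap in whichever model you use.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "In one sentence, what is RAG?", "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```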
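
For Section 4, a sketch of a first DeepEval test using the Answer Relevancy metric. It assumes `deepeval` is installed and an evaluation model is configured (an OpenAI/Confident AI key, or a local model as the course sets up); the hard-coded answer stands in for your application's real output:

```python
# First DeepEval test sketch: one LLMTestCase scored with Answer Relevancy.
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

test_case = LLMTestCase(
    input="What are your store's opening hours?",
    # In a real test this would come from your LLM application.
    actual_output="We are open 9am to 6pm, Monday to Saturday.",
)

evaluate(test_cases=[test_case], metrics=[AnswerRelevancyMetric(threshold=0.7)])
```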
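
For Section 6's custom criteria with GEval, a sketch of a conciseness metric; the criteria wording and sample texts are placeholders, and the metric still needs an evaluation model configured:

```python
# Custom GEval sketch: scoring a RAG answer for conciseness.
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCase, LLMTestCaseParams

conciseness = GEval(
    name="Conciseness",
    criteria="Check that the actual output answers the input fully without "
             "unnecessary padding or repetition.",
    evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT],
)

test_case = LLMTestCase(
    input="Summarise the refund policy.",
    actual_output="Refunds are issued within 14 days of purchase with a receipt.",
)
conciseness.measure(test_case)
print(conciseness.score, conciseness.reason)
```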
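
For Section 7, a sketch of turning Goldens into LLMTestCases and running a batch evaluation. Here `my_rag_app` is a hypothetical stand-in for your RAG pipeline, and in the course the Goldens would come from Confident AI rather than being defined inline:

```python
# Sketch: building LLMTestCases from Goldens and evaluating them as a dataset.
from deepeval import evaluate
from deepeval.dataset import EvaluationDataset, Golden
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

goldens = [
    Golden(input="What is the warranty period?"),
    Golden(input="Do you ship internationally?"),
]

def my_rag_app(question: str) -> tuple[str, list[str]]:
    # Hypothetical placeholder: returns (answer, retrieved context chunks).
    return "Our warranty lasts 12 months.", ["Warranty: 12 months from purchase."]

test_cases = []
for golden in goldens:
    answer, context = my_rag_app(golden.input)
    test_cases.append(
        LLMTestCase(input=golden.input, actual_output=answer, retrieval_context=context)
    )

dataset = EvaluationDataset(test_cases=test_cases)
evaluate(test_cases=dataset.test_cases, metrics=[AnswerRelevancyMetric()])
```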
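
For Section 8, a sketch of checking which tools an agent called against the tools you expected, using DeepEval's ToolCall and ToolCorrectnessMetric; the agent behaviour and tool names here are invented:

```python
# Sketch: asserting an agent called the expected tool.
from deepeval import evaluate
from deepeval.metrics import ToolCorrectnessMetric
from deepeval.test_case import LLMTestCase, ToolCall

test_case = LLMTestCase(
    input="Book me a flight to Berlin next Friday.",
    actual_output="I found three flights to Berlin for next Friday.",
    tools_called=[ToolCall(name="search_flights")],    # what the agent actually did
    expected_tools=[ToolCall(name="search_flights")],  # what it should have done
)

evaluate(test_cases=[test_case], metrics=[ToolCorrectnessMetric()])
```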
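
For Sections 9 and 10, a sketch of a RAGAS run over a small Hugging Face Dataset using the classic column-based API (exact imports and column names vary across RAGAS versions); the sample data is made up, and an evaluator LLM, for example one served by Ollama, still has to be wired in:

```python
# RAGAS sketch: scoring one question/answer/context record.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_recall, faithfulness

data = {
    "question": ["When was the city museum founded?"],
    "answer": ["The museum was founded in 1952."],
    "contexts": [["The city museum opened its doors in 1952."]],
    "ground_truth": ["It was founded in 1952."],
}

result = evaluate(Dataset.from_dict(data),
                  metrics=[faithfulness, answer_relevancy, context_recall])
print(result)
```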