
If I am using a local RAG, which context am I supposed to use during evaluation? #1337

Closed
zeoxleh opened this issue Sep 20, 2024 · 3 comments
Labels: question (Further information is requested)

Comments

zeoxleh commented Sep 20, 2024

Your Question
From my understanding, the "create your own test set" documentation uses OpenAI models and embeddings to create the question/context/ground-truth triples. In that case, the contexts are retrieved using OpenAI embeddings.

When I use my own RAG pipeline, I use GenAI embeddings, which will retrieve different contexts.

So I'm wondering: during evaluation, should I include the contexts from my RAG pipeline, or the contexts obtained during test set generation?

Or maybe I am misunderstanding something; if so, please do correct me. Thank you.

zeoxleh added the question (Further information is requested) label on Sep 20, 2024

dosubot bot commented Sep 20, 2024

Hey @zeoxleh! I'm here to help you with any bugs, questions, or contributions. If you need assistance, feel free to ask!

You should use the contexts retrieved by your own RAG pipeline for evaluation. The documentation describes building a RAG pipeline and then evaluating it with the contexts that pipeline itself retrieves, so that the evaluation metrics reflect the actual performance of your RAG system in retrieving and generating relevant content.

Supporting evidence from the document:

## The Data

For this tutorial, we'll use an example dataset from one of the baselines we created for the [Amnesty QA](https://huggingface.co/datasets/explodinggradients/amnesty_qa) dataset. The dataset contains the following columns:

- question: `list[str]` - These are the questions your RAG pipeline will be evaluated on.
- context: `list[list[str]]` - The contexts which were passed into the LLM to answer the question.
- ground_truth: `list[str]` - The ground truth answer to the questions.

## Metrics

Ragas provides several metrics to evaluate various aspects of your RAG systems:

1. Retriever: Offers `context_precision` and `context_recall` that measure the performance of your retrieval system.
2. Generator (LLM): Provides `faithfulness` that measures hallucinations and `answer_relevancy` that measures how relevant the answers are to the question.

This indicates that the evaluation is performed using the contexts retrieved by the RAG pipeline built with the specified embeddings [1].
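
For concreteness, here is a minimal evaluation sketch, assuming the 0.1-style Ragas API (`ragas.evaluate` over a Hugging Face `Dataset` with `question`/`contexts`/`answer`/`ground_truth` columns). The placeholder values are illustrative only: `contexts` and `answer` should be whatever your own retriever and generator produced for each question, and the metrics call OpenAI by default unless you pass your own `llm=`/`embeddings=` to `evaluate`.

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    context_precision,
    context_recall,
    faithfulness,
)

# Placeholder rows: in practice `contexts` and `answer` come from *your* RAG
# pipeline at query time, while `question` and `ground_truth` come from the
# generated test set.
eval_dataset = Dataset.from_dict({
    "question": ["<a question from your generated test set>"],
    "contexts": [["<chunk your retriever returned>", "<another retrieved chunk>"]],
    "answer": ["<answer your generator produced>"],
    "ground_truth": ["<reference answer from the test set>"],
})

result = evaluate(
    eval_dataset,
    metrics=[context_precision, context_recall, faithfulness, answer_relevancy],
)
print(result)  # per-metric scores for the run
```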


zeoxleh (author) commented Sep 20, 2024

I see.

So effectively, the question and ground truth portions of my test set come from the "create your own test set" documentation, while the answer and context portions come from my own RAG pipeline?


dosubot bot commented Sep 20, 2024

Yes, the question and ground truth portions of your test set should come from the "create your own test set" documentation, while the answer and context portions should come from your own RAG pipeline [1][2].
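
As a rough sketch of that split (with `retrieve` and `generate` as hypothetical stand-ins for your local retriever and generator), the assembly could look like this; the resulting dataset is what you would pass to `ragas.evaluate` as in the earlier snippet:

```python
from datasets import Dataset


def retrieve(question: str) -> list[str]:
    """Placeholder for your local retriever (e.g. GenAI embeddings + a vector store)."""
    return ["<retrieved chunk 1>", "<retrieved chunk 2>"]


def generate(question: str, contexts: list[str]) -> str:
    """Placeholder for your local generator LLM."""
    return "<generated answer>"


# question + ground_truth come from the generated test set.
testset_rows = [
    {"question": "<generated question>", "ground_truth": "<generated reference answer>"},
]

# contexts + answer are produced by running each test question through *your* pipeline.
records = {"question": [], "ground_truth": [], "contexts": [], "answer": []}
for row in testset_rows:
    ctx = retrieve(row["question"])
    records["question"].append(row["question"])
    records["ground_truth"].append(row["ground_truth"])
    records["contexts"].append(ctx)
    records["answer"].append(generate(row["question"], ctx))

eval_dataset = Dataset.from_dict(records)  # ready for ragas.evaluate(...)
```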


zeoxleh closed this as completed on Sep 20, 2024