
If I am using a local RAG, which context am I supposed to use during evaluation? #1337

Closed
zeoxleh opened this issue Sep 20, 2024 · 3 comments
Labels: question (Further information is requested)

Comments

zeoxleh commented Sep 20, 2024

Your Question
From my understanding, the "create your own test set" documentation uses OpenAI models and embeddings to create the question/context/ground-truth triples. In that case, the contexts are retrieved using OpenAI embeddings.

When I use my own RAG pipeline, I use GenAI embeddings, which will retrieve different contexts.

So I'm wondering: during evaluation, should I include the contexts from my RAG pipeline, or the contexts obtained during test set generation?

Or maybe I am misunderstanding something; if so, please do correct me. Thank you.

zeoxleh added the question (Further information is requested) label on Sep 20, 2024

dosubot bot commented Sep 20, 2024

Hey @zeoxleh! I'm here to help you with any bugs, questions, or contributions. If you need assistance, feel free to ask!

You should use the contexts retrieved by your own RAG pipeline for evaluation. The documentation describes building a RAG pipeline and then evaluating it with the contexts that pipeline itself retrieves, so that the evaluation metrics reflect the actual performance of your RAG system in retrieving and generating relevant content.

Supporting evidence from the document:

## The Data

For this tutorial, we'll use an example dataset from one of the baselines we created for the [Amnesty QA](https://huggingface.co/datasets/explodinggradients/amnesty_qa) dataset. The dataset contains the following columns:

- question: `list[str]` - These are the questions your RAG pipeline will be evaluated on.
- context: `list[list[str]]` - The contexts which were passed into the LLM to answer the question.
- ground_truth: `list[str]` - The ground truth answer to the questions.

## Metrics

Ragas provides several metrics to evaluate various aspects of your RAG systems:

1. Retriever: Offers `context_precision` and `context_recall` that measure the performance of your retrieval system.
2. Generator (LLM): Provides `faithfulness` that measures hallucinations and `answer_relevancy` that measures how relevant the answers are to the question.

This indicates that the evaluation is performed using the contexts retrieved by the RAG pipeline built with the specified embeddings [1].
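
For concreteness, here is a minimal evaluation sketch, assuming the 0.1-style Ragas API (`ragas.evaluate` over a Hugging Face `Dataset` with `question`/`contexts`/`answer`/`ground_truth` columns). The placeholder values are illustrative only: `contexts` and `answer` should be whatever your own retriever and generator produced for each question, and the metrics call OpenAI by default unless you pass your own `llm=`/`embeddings=` to `evaluate`.

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    context_precision,
    context_recall,
    faithfulness,
)

# Placeholder rows: in practice `contexts` and `answer` come from *your* RAG
# pipeline at query time, while `question` and `ground_truth` come from the
# generated test set.
eval_dataset = Dataset.from_dict({
    "question": ["<a question from your generated test set>"],
    "contexts": [["<chunk your retriever returned>", "<another retrieved chunk>"]],
    "answer": ["<answer your generator produced>"],
    "ground_truth": ["<reference answer from the test set>"],
})

result = evaluate(
    eval_dataset,
    metrics=[context_precision, context_recall, faithfulness, answer_relevancy],
)
print(result)  # per-metric scores for the run
```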


zeoxleh (author) commented Sep 20, 2024

I see.

So effectively, the question and ground truth portions of my test set come from the "create your own test set" documentation, while the answer and context portions come from my own RAG pipeline?


dosubot bot commented Sep 20, 2024

Yes, the question and ground truth portions of your test set should come from the "create your own test set" documentation, while the answer and context portions should come from your own RAG pipeline [1][2].
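
As a rough sketch of that split (with `retrieve` and `generate` as hypothetical stand-ins for your local retriever and generator), the assembly could look like this; the resulting dataset is what you would pass to `ragas.evaluate` as in the earlier snippet:

```python
from datasets import Dataset


def retrieve(question: str) -> list[str]:
    """Placeholder for your local retriever (e.g. GenAI embeddings + a vector store)."""
    return ["<retrieved chunk 1>", "<retrieved chunk 2>"]


def generate(question: str, contexts: list[str]) -> str:
    """Placeholder for your local generator LLM."""
    return "<generated answer>"


# question + ground_truth come from the generated test set.
testset_rows = [
    {"question": "<generated question>", "ground_truth": "<generated reference answer>"},
]

# contexts + answer are produced by running each test question through *your* pipeline.
records = {"question": [], "ground_truth": [], "contexts": [], "answer": []}
for row in testset_rows:
    ctx = retrieve(row["question"])
    records["question"].append(row["question"])
    records["ground_truth"].append(row["ground_truth"])
    records["contexts"].append(ctx)
    records["answer"].append(generate(row["question"], ctx))

eval_dataset = Dataset.from_dict(records)  # ready for ragas.evaluate(...)
```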


zeoxleh closed this as completed on Sep 20, 2024