Examples for RAG Benchmarking - LLM as a judge #96

Open
wants to merge 2 commits into
base: main
from


sreanikdk

This PR adds examples for 'RAG Benchmarking - LLM as a judge', as a new section.

from rag_benchmark_utils.common_utils import create_retriever, validate_response, print_results
from helpers.config import TERRAFORM_DOCS_TABLE_NAME

RELATIVE_FILE_PATH = Path("data/golden_test_set.csv")
Contributor


Not a must, but this path might be easier to configure if it were moved to the config file.
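A minimal sketch of what moving the path into configuration could look like. It assumes the config file meant here is `helpers/config.py` (the module the snippet above already imports from); the names `DEFAULT_GOLDEN_TEST_SET_PATH`, `golden_test_set_path`, and the `GOLDEN_TEST_SET_PATH` environment variable are hypothetical and not part of this PR.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical addition to helpers/config.py (names are assumptions).
# The default mirrors the value currently hard-coded in the script.
DEFAULT_GOLDEN_TEST_SET_PATH = Path("data/golden_test_set.csv")


def golden_test_set_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the golden test set path, allowing an environment override."""
    # Fall back to the real process environment when no mapping is passed.
    env = os.environ if env is None else env
    override = env.get("GOLDEN_TEST_SET_PATH")
    return Path(override) if override else DEFAULT_GOLDEN_TEST_SET_PATH
```

The script would then import the helper from the config module instead of defining `RELATIVE_FILE_PATH` itself, so the location can be changed in one place.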

try:
    log.info(f"Asking question: {query}")
    result = qa_chain.invoke({"query": query})
    actual_answer = result.get("result", "N/A")
Contributor


Again, not a must, but named constants for "N/A" and "Failed" would be better here than string literals.
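One way the suggestion could look, as a sketch only. The constant names and the `ask` helper are hypothetical, `invoke` stands in for `qa_chain.invoke`, and using "Failed" in the `except` branch is an assumption, since that part of the diff is not shown here.

```python
from typing import Any, Callable, Dict

# Hypothetical sentinel constants replacing the inline string literals.
ANSWER_NOT_AVAILABLE = "N/A"
ANSWER_FAILED = "Failed"


def ask(invoke: Callable[[Dict[str, str]], Dict[str, Any]], query: str) -> str:
    """Return the chain's answer, or a sentinel when none is available."""
    try:
        result = invoke({"query": query})
        # Missing "result" key: report that no answer was available.
        return result.get("result", ANSWER_NOT_AVAILABLE)
    except Exception:
        # Assumption: the original code records "Failed" when the call raises.
        return ANSWER_FAILED
```

With the sentinels defined once, later checks such as `actual_answer == ANSWER_FAILED` cannot drift out of sync through a typo in a repeated literal.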

    multi_context: 0.4,
    reasoning: 0.1
}
OUTPUT_PATH = Path('data/golden_test_set.csv')
Contributor


Not a must, but this path might be easier to configure if it were moved to the config file.

Contributor

romaniam left a comment


Some small remarks, but in general LGTM.


3 participants