Daniel/prompt optimization #1737

Draft · sfc-gh-dhuang wants to merge 29 commits into main

Conversation

sfc-gh-dhuang (Contributor)

Description

Please include a summary of the changes and the related issue that can be
included in the release announcement. Please also include relevant motivation
and context.

Other details good to know for developers

Please include any other details of this change useful for TruLens developers.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to
    not work as expected)
  • New Tests
  • This change includes re-generated golden test results
  • This change requires a documentation update


system_prompt = feedback_v2.PromptResponseRelevance.generate_system_prompt(
    min_score_val, max_score_val, criteria, output_space
)

# print(adalflow_optimized_system_prompt)
Contributor: remove
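For reference, a minimal sketch of how the generate_system_prompt call above might be parameterized. Only the call shape comes from the diff; the concrete values, the meaning of output_space, and the assumption that feedback_v2 is already in scope from the surrounding module are all illustrative:

# Illustrative values only; the call shape comes from the diff above, but
# these bindings (and the meaning of output_space) are assumptions.
min_score_val, max_score_val = 0, 3  # matches the 0-3 range used in the prompts
criteria = "RESPONSE must be relevant to the entire PROMPT to get a maximum score of 3."
output_space = None  # assumed optional; would constrain the allowed score values

# feedback_v2 is assumed imported by the surrounding module.
system_prompt = feedback_v2.PromptResponseRelevance.generate_system_prompt(
    min_score_val, max_score_val, criteria, output_space
)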

@@ -540,12 +540,15 @@ def relevance(
min_score_val, max_score_val
)

# adalflow_optimized_system_prompt = """You are a RELEVANCE grader; providing the relevance of the given RESPONSE to the given PROMPT.\nRespond only as a number from 0 to 3, where 0 is the lowest score according to the criteria and 3 is the highest possible score.\n\nA few additional scoring guidelines:\n\n- Long RESPONSES should score equally well as short RESPONSES.\n\n- RESPONSE must be relevant to the entire PROMPT to get a maximum score of 3.\n- RELEVANCE score should increase as the RESPONSE provides RELEVANT context to more parts of the PROMPT.\n- RESPONSE that is RELEVANT to none of the PROMPT should get a minimum score of 0.\n- RESPONSE that is RELEVANT and answers the entire PROMPT completely should get a score of 3.\n- RESPONSE that confidently FALSE should get a score of 0.\n- RESPONSE that is only seemingly RELEVANT should get a score of 0.\n- Answers that intentionally do not answer the question, such as 'I don't know' and model refusals, should also be counted as the least RELEVANT and get a score of 0.\n\n- Be cautious of false negatives, as they are heavily penalized. Ensure that relevant responses are not mistakenly classified as irrelevant.\n\n- Never elaborate."""
Contributor: why is this left as a comment in llm_provider?

@@ -294,6 +294,16 @@ class Relevance(Semantics):
    pass


adalflow_v2 = """
Contributor: nit: rename to adalflow_v2_groundedness or similar


{criteria}
Never elaborate."""
Respond only as a number from 0 to 3, where 0 is the lowest score according to the criteria and 3 is the highest possible score.\n\nYou should score the groundedness of the statement based on the following criteria:\n\n- Statements that are directly supported by the source should be considered grounded and should get a high score.\n\n- Statements that are not directly supported by the source should be considered not grounded and should get a low score.\n\n- Statements of doubt, admissions of uncertainty, or not knowing the answer are considered abstention, and should be counted as the most overlap and therefore get a max score of 3.\n\n- Consider indirect or implicit evidence, or the context of the statement, to avoid penalizing potentially factual claims due to lack of explicit support.\n\n- Be cautious of false positives; ensure that high scores are only given when there is clear supporting evidence.\n\n- Pay special attention to cases where the prediction is 1 but the ground truth is 0, and ensure that indirect evidence is not mistaken for direct support.\n\nNever elaborate.
Contributor: nit: add newlines directly for easier reading instead of \n

@@ -390,6 +399,24 @@ class Trivial(Semantics, WithPrompt):
)


adalflow_v2_context_relevance = """You are a RELEVANCE grader; providing the relevance of the given CONTEXT to the given QUESTION.
Contributor: new prompts should be class attributes instead of variables. Also separate the criteria from the additional guidelines via the template, as is done below.
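A rough sketch of what that suggestion could look like, assuming the WithPrompt-style classes used elsewhere in the module. The class and attribute names here are illustrative, not the PR's actual code:

from typing import ClassVar

# Hypothetical sketch: the real class would mix in Semantics / WithPrompt.
class ContextRelevance:
    # Criteria kept separate from the additional guidelines, each as a
    # class attribute rather than a module-level variable.
    adalflow_v2_criteria: ClassVar[str] = (
        "You are a RELEVANCE grader; providing the relevance of the given "
        "CONTEXT to the given QUESTION."
    )
    adalflow_v2_additional_guidelines: ClassVar[str] = "Never elaborate."

    # Full system prompt assembled from the two pieces above.
    adalflow_v2_system_prompt: ClassVar[str] = (
        adalflow_v2_criteria + "\n\n" + adalflow_v2_additional_guidelines
    )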
