Daniel/prompt optimization #1737

Draft · sfc-gh-dhuang wants to merge 29 commits into main

Conversation

sfc-gh-dhuang (Contributor)

Description

Please include a summary of the changes and the related issue that can be
included in the release announcement. Please also include relevant motivation
and context.

Other details good to know for developers

Please include any other details of this change useful for TruLens developers.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to
    not work as expected)
  • New Tests
  • This change includes re-generated golden test results
  • This change requires a documentation update


system_prompt = feedback_v2.PromptResponseRelevance.generate_system_prompt(
    min_score_val, max_score_val, criteria, output_space
)

# print(adalflow_optimized_system_prompt)
Contributor: remove
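For reference, a minimal sketch of how the generate_system_prompt call above might be parameterized. Only the call shape comes from the diff; the concrete values, the meaning of output_space, and the assumption that feedback_v2 is already in scope from the surrounding module are all illustrative:

# Illustrative values only; the call shape comes from the diff above, but
# these bindings (and the meaning of output_space) are assumptions.
min_score_val, max_score_val = 0, 3  # matches the 0-3 range used in the prompts
criteria = "RESPONSE must be relevant to the entire PROMPT to get a maximum score of 3."
output_space = None  # assumed optional; would constrain the allowed score values

# feedback_v2 is assumed imported by the surrounding module.
system_prompt = feedback_v2.PromptResponseRelevance.generate_system_prompt(
    min_score_val, max_score_val, criteria, output_space
)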

@@ -540,12 +540,15 @@ def relevance(
min_score_val, max_score_val
)

# adalflow_optimized_system_prompt = """You are a RELEVANCE grader; providing the relevance of the given RESPONSE to the given PROMPT.\nRespond only as a number from 0 to 3, where 0 is the lowest score according to the criteria and 3 is the highest possible score.\n\nA few additional scoring guidelines:\n\n- Long RESPONSES should score equally well as short RESPONSES.\n\n- RESPONSE must be relevant to the entire PROMPT to get a maximum score of 3.\n- RELEVANCE score should increase as the RESPONSE provides RELEVANT context to more parts of the PROMPT.\n- RESPONSE that is RELEVANT to none of the PROMPT should get a minimum score of 0.\n- RESPONSE that is RELEVANT and answers the entire PROMPT completely should get a score of 3.\n- RESPONSE that confidently FALSE should get a score of 0.\n- RESPONSE that is only seemingly RELEVANT should get a score of 0.\n- Answers that intentionally do not answer the question, such as 'I don't know' and model refusals, should also be counted as the least RELEVANT and get a score of 0.\n\n- Be cautious of false negatives, as they are heavily penalized. Ensure that relevant responses are not mistakenly classified as irrelevant.\n\n- Never elaborate."""
Contributor: why is this left as a comment in llm_provider?

@@ -294,6 +294,16 @@ class Relevance(Semantics):
    pass


adalflow_v2 = """
Contributor: nit: rename to adalflow_v2_groundedness or similar


{criteria}
Never elaborate."""
Respond only as a number from 0 to 3, where 0 is the lowest score according to the criteria and 3 is the highest possible score.\n\nYou should score the groundedness of the statement based on the following criteria:\n\n- Statements that are directly supported by the source should be considered grounded and should get a high score.\n\n- Statements that are not directly supported by the source should be considered not grounded and should get a low score.\n\n- Statements of doubt, admissions of uncertainty, or not knowing the answer are considered abstention, and should be counted as the most overlap and therefore get a max score of 3.\n\n- Consider indirect or implicit evidence, or the context of the statement, to avoid penalizing potentially factual claims due to lack of explicit support.\n\n- Be cautious of false positives; ensure that high scores are only given when there is clear supporting evidence.\n\n- Pay special attention to cases where the prediction is 1 but the ground truth is 0, and ensure that indirect evidence is not mistaken for direct support.\n\nNever elaborate.
Contributor: nit: add newlines directly for easier reading instead of \n

@@ -390,6 +399,24 @@ class Trivial(Semantics, WithPrompt):
)


adalflow_v2_context_relevance = """You are a RELEVANCE grader; providing the relevance of the given CONTEXT to the given QUESTION.
Contributor: new prompts should be class attributes instead of variables. Also separate the criteria from the additional guidelines via the template, as is done below.
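A rough sketch of what that suggestion could look like, assuming the WithPrompt-style classes used elsewhere in the module. The class and attribute names here are illustrative, not the PR's actual code:

from typing import ClassVar

# Hypothetical sketch: the real class would mix in Semantics / WithPrompt.
class ContextRelevance:
    # Criteria kept separate from the additional guidelines, each as a
    # class attribute rather than a module-level variable.
    adalflow_v2_criteria: ClassVar[str] = (
        "You are a RELEVANCE grader; providing the relevance of the given "
        "CONTEXT to the given QUESTION."
    )
    adalflow_v2_additional_guidelines: ClassVar[str] = "Never elaborate."

    # Full system prompt assembled from the two pieces above.
    adalflow_v2_system_prompt: ClassVar[str] = (
        adalflow_v2_criteria + "\n\n" + adalflow_v2_additional_guidelines
    )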
