Add pre and post process functions for Bedrock Rerank API #3254 #3339

Open
wants to merge 5 commits into
base: main
from

Conversation

tkykenmt
Contributor

@tkykenmt tkykenmt commented Jan 7, 2025

Description

Amazon Bedrock has introduced support for Rerank models. OpenSearch can already invoke Rerank models on Bedrock through custom pre- and post-processing functions, but pre-built functions are better for performance.

Related Issues

Resolves #3254

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@dhrubo-os
Collaborator

Apply Spotless

Contributor

@brianf-aws brianf-aws left a comment

Minor feedback on validation/converting scores.

}
List<?> outerList = (List<?>) input;
if (!outerList.isEmpty()) {
if (!(outerList.get(0) instanceof Map)) {
Contributor

Can we do this check iteratively? That way we verify that every item is a map with a valid index and relevance score key.
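A minimal sketch of what such an iterative check might look like. The class name, the method name, and the first two error messages are hypothetical; only the key names and the last message follow the snippets quoted elsewhere in this thread. The code actually committed in the PR may differ.

```java
import java.util.List;
import java.util.Map;

public class RerankResultValidator {

    // Hypothetical helper: checks every entry, not only the first one.
    static void validate(Object input) {
        if (!(input instanceof List)) {
            throw new IllegalArgumentException("Post process function input is not a List.");
        }
        List<?> outerList = (List<?>) input;
        for (Object item : outerList) {
            // Each entry must be a Map ...
            if (!(item instanceof Map)) {
                throw new IllegalArgumentException("Rerank result is not a Map.");
            }
            // ... and must carry both keys used later to build the score array.
            Map<?, ?> rerankResult = (Map<?, ?>) item;
            if (!rerankResult.containsKey("index") || !rerankResult.containsKey("relevanceScore")) {
                throw new IllegalArgumentException("Rerank result should have both index and relevanceScore.");
            }
        }
    }
}
```

With this shape, a list whose first three entries are well formed but whose fourth is `Map.of("test1", "value1")` is rejected instead of passing.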

Contributor Author

Thanks, will be updated on the next commit

Contributor Author

updated on 4af169a

Comment on lines 45 to 52
scores[index] = switch (relevanceScore) {
case BigDecimal bd -> bd.doubleValue();
case Double d -> d;
case null -> throw new IllegalArgumentException("relevanceScore is null");
default -> throw new IllegalArgumentException("Unexpected type for relevanceScore: " +
relevanceScore.getClass().getName());
};
}
Contributor

This is nice, but I believe it will cause issues on backports to branches without this language feature (pattern matching for switch). What if we extracted this into a method and did manual if-else checks?

Something like

private Double toDouble(Object relevanceScore) {
    if (relevanceScore instanceof BigDecimal) {
        return ((BigDecimal) relevanceScore).doubleValue();
    } else if (relevanceScore instanceof Double) {
        return (Double) relevanceScore;
    } else if (relevanceScore == null) {
        throw new IllegalArgumentException("relevanceScore is null");
    }
    // Any other type is unsupported.
    throw new IllegalArgumentException("Unexpected type for relevanceScore: " +
            relevanceScore.getClass().getName());
}

Contributor Author

Thanks, will be updated on the next commit

Contributor Author

updated on 4af169a
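
For illustration, a sketch of how an extracted helper like the one suggested above could be used to build the score array without switch pattern matching. The class name `RerankScoreMapper` and the method `scoresByIndex` are hypothetical, and the sketch assumes each result's `index` is a valid position in the array; it is not the code committed in 4af169a.

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.Map;

public class RerankScoreMapper {

    // Plain instanceof checks, so the code compiles on older Java versions.
    static double toDouble(Object relevanceScore) {
        if (relevanceScore instanceof BigDecimal) {
            return ((BigDecimal) relevanceScore).doubleValue();
        } else if (relevanceScore instanceof Double) {
            return (Double) relevanceScore;
        } else if (relevanceScore == null) {
            throw new IllegalArgumentException("relevanceScore is null");
        }
        throw new IllegalArgumentException("Unexpected type for relevanceScore: "
                + relevanceScore.getClass().getName());
    }

    // Places each score at the position given by its "index" entry, so the
    // output follows the original document order rather than the rank order.
    // Assumption: every index is within [0, rerankResults.size()).
    static double[] scoresByIndex(List<Map<String, Object>> rerankResults) {
        double[] scores = new double[rerankResults.size()];
        for (Map<String, Object> rerankResult : rerankResults) {
            int index = ((Number) rerankResult.get("index")).intValue();
            scores[index] = toDouble(rerankResult.get("relevanceScore"));
        }
        return scores;
    }
}
```

Because the loop variable is declared with the list's element type, no cast of the elements is needed.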


@Override
public void validate(MLInput mlInput) {
if (!(mlInput.getInputDataset() instanceof TextSimilarityInputDataSet)) {
Contributor

Maybe also check for null before getInputDataset()?

Contributor Author

Thanks, will be updated on the next commit

Contributor Author

I checked, and a null check is already implemented in the apply method of the superclasses, ConnectorPreProcessFunction and ConnectorPostProcessFunction. The apply method wraps the validate method.

So I don't think it's necessary to implement the null check again in the validate method.
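
The "apply wraps validate" arrangement described here can be sketched roughly as follows. All class names, method bodies, and messages below are hypothetical stand-ins chosen for illustration; they are not the actual ConnectorPreProcessFunction or ConnectorPostProcessFunction code.

```java
import java.util.function.Function;

public class ApplyWrapsValidate {

    // Hypothetical base class: apply() guards against null, then delegates.
    abstract static class BaseProcessFunction<T, R> implements Function<T, R> {
        @Override
        public R apply(T input) {
            if (input == null) {
                throw new IllegalArgumentException("Input is null.");
            }
            validate(input);   // subclasses never see a null input here
            return process(input);
        }

        abstract void validate(T input);

        abstract R process(T input);
    }

    // Hypothetical subclass: validate() only checks what is specific to it.
    static class UpperCaseFunction extends BaseProcessFunction<String, String> {
        @Override
        void validate(String input) {
            if (input.isEmpty()) {
                throw new IllegalArgumentException("Input is empty.");
            }
        }

        @Override
        String process(String input) {
            return input.toUpperCase();
        }
    }
}
```

In a layout like this, the base-class check covers only the input object itself. A field nested inside the input would still need its own check in the subclass if it can be null.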

Contributor Author

I've just added the following validation to the validate method.

        if (mlInput.getInputDataset() == null) {
            throw new IllegalArgumentException("Input dataset cannot be null.");
        }

Contributor Author

updated on 4af169a

exceptionRule.expectMessage("The rerank result should contain index and relevance_score.");
function.apply(Arrays.asList(Map.of("test1", "value1")));
}

Contributor

Maybe let's add a null test, just so the behavior is clear?

Contributor Author

A null test could be implemented by referring to the superclass, but duplicating the superclass's test case in the test class for the subclass could lead to build failures when the superclass is updated.

Map.of("index", 2, "relevanceScore", 0.7711548805236816),
Map.of("index", 0, "relevanceScore", 0.0025114635936915874),
Map.of("index", 1, "relevanceScore", 2.4876489987946115e-05),
Map.of("index", 3, "relevanceScore", 6.339210358419223e-06)
Contributor

Currently this passes as long as the first entry is in the correct format; the rest are not checked. As mentioned earlier, please change the validation to check that each entry is in the right format.

Contributor Author

@tkykenmt tkykenmt Jan 8, 2025

Let me add a test case that checks a list containing an incorrect map. I'll update this in the next commit.

    @Test
    public void process_WrongInput_NotCorrectMap() {
        exceptionRule.expect(IllegalArgumentException.class);
        exceptionRule.expectMessage("Rerank result should have both index and relevanceScore.");
        List<Map<String, Object>> rerankResults = List
                .of(
                        Map.of("index", 2, "relevanceScore", 0.7711548805236816),
                        Map.of("index", 0, "relevanceScore", 0.0025114635936915874),
                        Map.of("index", 1, "relevanceScore", 2.4876489987946115e-05),
                        Map.of("test1", "value1")
                );
        function.apply(rerankResults);
    }

Contributor Author

updated on 4af169a

@tkykenmt tkykenmt had a problem deploying to ml-commons-cicd-env-require-approval January 7, 2025 23:14 — with GitHub Actions Failure
@tkykenmt tkykenmt had a problem deploying to ml-commons-cicd-env-require-approval January 8, 2025 06:57 — with GitHub Actions Failure
@tkykenmt tkykenmt temporarily deployed to ml-commons-cicd-env-require-approval January 8, 2025 06:57 — with GitHub Actions Inactive
@tkykenmt tkykenmt had a problem deploying to ml-commons-cicd-env-require-approval January 8, 2025 08:04 — with GitHub Actions Failure
@brianf-aws
Contributor

brianf-aws commented Jan 8, 2025

It looks like there is a build failure in a test I thought I had fixed, but I can see there is a different underlying problem within the IT, involving encryption:

VisualizationsToolIT > testVisualizationFound FAILED
    org.opensearch.client.ResponseException: method [POST], host [http://127.0.0.1:54529/], URI [/_plugins/_ml/agents/bLjaRJQB515KRnslfdUv/_execute], status line [HTTP/1.1 500 Internal Server Error]
    {"status":500,"error":{"type":"AEADBadTagException","reason":"System Error","details":"Tag mismatch"}}
        at app//org.opensearch.client.RestClient.convertResponse(RestClient.java:501)
        at app//org.opensearch.client.RestClient.performRequest(RestClient.java:384)
        at app//org.opensearch.client.RestClient.performRequest(RestClient.java:359)
        at app//org.opensearch.ml.utils.TestHelper.makeRequest(TestHelper.java:182)
        at app//org.opensearch.ml.utils.TestHelper.makeRequest(TestHelper.java:155)
        at app//org.opensearch.ml.utils.TestHelper.makeRequest(TestHelper.java:144)
        at app//org.opensearch.ml.tools.VisualizationsToolIT.testVisualizationFound(VisualizationsToolIT.java:74)

    java.lang.AssertionError: The response failed to meet condition after 5 attempts. Attempted to perform GET : /_plugins/_ml/models/arjaRJQB515KRnsleNWv
        at org.junit.Assert.fail(Assert.java:89)
        at org.opensearch.ml.tools.ToolIntegrationWithLLMTest.waitResponseMeetingCondition(ToolIntegrationWithLLMTest.java:103)
        at org.opensearch.ml.tools.ToolIntegrationWithLLMTest.checkForModelUndeployedStatus(ToolIntegrationWithLLMTest.java:89)
        at org.opensearch.ml.tools.ToolIntegrationWithLLMTest.deleteModel(ToolIntegrationWithLLMTest.java:74)
        at

... 

2> REPRODUCE WITH: gradlew ':opensearch-ml-plugin:integTest' --tests "org.opensearch.ml.tools.VisualizationsToolIT.testVisualizationFound" -Dtests.seed=AD7A0603B7C68274 -Dtests.security.manager=false -Dtests.locale=fr-GN -Dtests.timezone=America/Argentina/Buenos_Aires -Druntime.java=21

See here and here

Contributor

@brianf-aws brianf-aws left a comment

I agree with the code changes. They cover all the possible scenarios I can think of.

exceptionRule.expectMessage("Rerank result is empty.");
function.apply(Arrays.asList(Map.of()));
}

@Test
public void process_WrongInput_NotCorrectMap() {
Contributor

process_WrongInput_NotCorrectListOfMapsFormat(){
}

Contributor Author

Updated in commit 8a4fdb2

@dhrubo-os
Collaborator

Are you planning to update the corresponding blueprint in a separate PR?

@tkykenmt tkykenmt temporarily deployed to ml-commons-cicd-env-require-approval January 9, 2025 00:52 — with GitHub Actions Inactive
@tkykenmt
Contributor Author

tkykenmt commented Jan 9, 2025

@dhrubo-os
I submitted another PR for a blueprint and new tutorial.
#3352


if (!rerankResults.isEmpty()) {
Double[] scores = new Double[rerankResults.size()];
for (Map<?, ?> rerankResult : rerankResults) {
Collaborator

Why do we need to cast the elements to Map? The parameter is already declared as List<Map<String, Object>> rerankResults.

Contributor Author

Thank you for your review. I agree with you. For this logic, we don't need to cast to Map; we only need to specify the element type in the enhanced for loop, as follows.

for (Map rerankResult : rerankResults) {

Contributor Author

fixed on ededf78

@tkykenmt tkykenmt requested a deployment to ml-commons-cicd-env-require-approval January 10, 2025 01:39 — with GitHub Actions Waiting
@tkykenmt tkykenmt requested a deployment to ml-commons-cicd-env-require-approval January 10, 2025 05:41 — with GitHub Actions Waiting
@tkykenmt tkykenmt requested a review from dhrubo-os January 14, 2025 06:08
Development

Successfully merging this pull request may close these issues.

[FEATURE] Implement BedrockRerank[Pre|Post]ProcessFunction
3 participants