
[BUG] post_process_function on rerank_pipeline_with_bge-rerank-m3-v2_model_deployed_on_Sagemaker.md #3247

Closed
tkykenmt opened this issue Dec 3, 2024 · 3 comments · Fixed by #3296
Assignees
Labels
bug Something isn't working

Comments

@tkykenmt
Contributor

tkykenmt commented Dec 3, 2024

What is the bug?
The post_process_function in rerank_pipeline_with_bge-rerank-m3-v2_model_deployed_on_Sagemaker.md has a logic issue that returns incorrectly sorted results. It needs to be fixed with updated code.

https://github.com/opensearch-project/ml-commons/blob/main/docs/tutorials/rerank/rerank_pipeline_with_bge-rerank-m3-v2_model_deployed_on_Sagemaker.md

How can one reproduce the bug?
Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

What is the expected behavior?
A clear and concise description of what you expected to happen.

What is your host/environment?

  • OS: [e.g. iOS]
  • Version [e.g. 22]
  • Plugins

Do you have any screenshots?
If applicable, add screenshots to help explain your problem.

Do you have any additional context?
Add any other context about the problem.

@mingshl
Collaborator

mingshl commented Dec 3, 2024

Hi @tkykenmt, thanks for making the fix, but more details in the PR would help the community understand the issue better.

This is how I interpret your fix, if I understand it correctly:

Previously, the cross-encoder model returned its output ordered by highest score first, which may not match the order of the documents.

For example:

The sample model response is ordered by highest score first, and each entry carries the original document index:

[
  {"index": 0, "score": 0.95},
  {"index": 2, "score": 0.3},
  {"index": 1, "score": 0.2}
]

and the documents were sent to the model in this order:

[
  {"index": 0},
  {"index": 1},
  {"index": 2}
]

The post-process function then reorders the model output by the original document index, so each score is assigned to the proper document.

With the new post-processing function, the output will be:

[
  {"name": "similarity", "data_type": "FLOAT32", "shape": [1], "data": [0.95]},
  {"name": "similarity", "data_type": "FLOAT32", "shape": [1], "data": [0.2]},
  {"name": "similarity", "data_type": "FLOAT32", "shape": [1], "data": [0.3]}
]

Please correct me if I got this wrong.

@tkykenmt
Contributor Author

tkykenmt commented Dec 4, 2024

Hi @mingshl, thank you for the clarification. My intention is as follows.

The sample model response is ordered by highest score first, and each entry carries the original document index:

[
  {"index": 0, "score": 0.95},
  {"index": 2, "score": 0.3},
  {"index": 1, "score": 0.2}
]

and the documents were sent to the model in this order:

[
  {"index": 0},
  {"index": 1},
  {"index": 2}
]

The current post-process function also aims to reorder the model output by the original document index, but its reordering logic is incorrect. It can accidentally overwrite one item with another, producing a result such as:

[
  {"name": "similarity", "data_type": "FLOAT32", "shape": [1], "data": [0.95]},
  {"name": "similarity", "data_type": "FLOAT32", "shape": [1], "data": [0.2]},
  {"name": "similarity", "data_type": "FLOAT32", "shape": [1], "data": [0.2]}
]
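To illustrate how a reorder can overwrite an entry, here is a hypothetical Python sketch. It is not the tutorial's actual post_process_function; the name `buggy_in_place_reorder` and the list-of-dicts input shape are assumptions based on the sample response above. Reordering inside the same list loses the entry for index 2 and reproduces the duplicated 0.2 shown above:

```python
def buggy_in_place_reorder(results):
    """Hypothetical in-place reorder that reuses the input list as output.

    For each slot i it searches the (already partially overwritten) list for
    the entry whose "index" equals i. Once an entry is copied over slot i,
    whatever was in slot i is lost, so a later search can fail and the slot
    keeps a stale duplicate.
    """
    for i in range(len(results)):
        for j in range(len(results)):
            if results[j]["index"] == i:
                results[i] = results[j]  # overwrites the entry held in slot i
                break
        # if no match is found, slot i silently keeps its old (wrong) entry
    return [r["score"] for r in results]


ranked = [
    {"index": 0, "score": 0.95},
    {"index": 2, "score": 0.3},
    {"index": 1, "score": 0.2},
]
print(buggy_in_place_reorder(ranked))  # → [0.95, 0.2, 0.2]
```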

With the new post-processing function, the issue should be fixed and the result will be:

[
  {"name": "similarity", "data_type": "FLOAT32", "shape": [1], "data": [0.95]},
  {"name": "similarity", "data_type": "FLOAT32", "shape": [1], "data": [0.2]},
  {"name": "similarity", "data_type": "FLOAT32", "shape": [1], "data": [0.3]}
]
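A minimal sketch of an overwrite-free reorder, assuming the model response is a list of entries with `index` and `score` keys and unique indices 0..n-1, as in the sample above. It is written in Python for illustration only; the tutorial's real post_process_function is a connector script (Painless in ml-commons), and `reorder_by_index` is a hypothetical name. Writing into a fresh output list keyed by `index` means no input entry is overwritten before it is read:

```python
def reorder_by_index(ranked):
    """Place each score at its document's original position.

    Writes into a new list sized to the number of results, so reading the
    model response and writing the output never touch the same list.
    """
    ordered = [None] * len(ranked)
    for item in ranked:
        ordered[item["index"]] = {
            "name": "similarity",
            "data_type": "FLOAT32",
            "shape": [1],
            "data": [item["score"]],
        }
    return ordered


ranked = [
    {"index": 0, "score": 0.95},
    {"index": 2, "score": 0.3},
    {"index": 1, "score": 0.2},
]
print([e["data"][0] for e in reorder_by_index(ranked)])  # → [0.95, 0.2, 0.3]
```

Because each entry goes directly to its original position, the function needs a single pass and does not depend on the order in which the model returns results.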

@mingshl
Collaborator

mingshl commented Dec 4, 2024

Thanks for the explanation; this helps a lot in understanding the issue. Can you add it to the issue description as well? Thank you!

tkykenmt added a commit to tkykenmt/ml-commons that referenced this issue Dec 25, 2024
…h_bge-rerank-m3-v2_model_deployed_on_Sagemaker.md (opensearch-project#3247)

Signed-off-by: tkykenmt <[email protected]>
ylwu-amzn added a commit that referenced this issue Jan 3, 2025
…del_deployed_on_Sagemaker.md (#3296)

* fix post_process_function bug on sort results for rerank_pipeline_with_bge-rerank-m3-v2_model_deployed_on_Sagemaker.md (#3247)

Signed-off-by: tkykenmt <[email protected]>

* fix typo

Signed-off-by: Yaliang Wu <[email protected]>

---------

Signed-off-by: tkykenmt <[email protected]>
Signed-off-by: Yaliang Wu <[email protected]>
Co-authored-by: Yaliang Wu <[email protected]>
github-project-automation bot moved this from In Progress to Done in ml-commons projects Jan 3, 2025
opensearch-trigger-bot bot pushed a commit that referenced this issue Jan 3, 2025
…del_deployed_on_Sagemaker.md (#3296)

* fix post_process_function bug on sort results for rerank_pipeline_with_bge-rerank-m3-v2_model_deployed_on_Sagemaker.md (#3247)

Signed-off-by: tkykenmt <[email protected]>

* fix typo

Signed-off-by: Yaliang Wu <[email protected]>

---------

Signed-off-by: tkykenmt <[email protected]>
Signed-off-by: Yaliang Wu <[email protected]>
Co-authored-by: Yaliang Wu <[email protected]>
(cherry picked from commit d5f47b4)
ylwu-amzn pushed a commit that referenced this issue Jan 3, 2025
…del_deployed_on_Sagemaker.md (#3296) (#3331)

* fix post_process_function bug on sort results for rerank_pipeline_with_bge-rerank-m3-v2_model_deployed_on_Sagemaker.md (#3247)

Signed-off-by: tkykenmt <[email protected]>

* fix typo

Signed-off-by: Yaliang Wu <[email protected]>

---------

Signed-off-by: tkykenmt <[email protected]>
Signed-off-by: Yaliang Wu <[email protected]>
Co-authored-by: Yaliang Wu <[email protected]>
(cherry picked from commit d5f47b4)

Co-authored-by: Takayuki Enomoto <[email protected]>