
[BUG] Index "analysis" section is not properly compared against the existing section #882

Open
rhoens opened this issue Jan 9, 2025 · 3 comments
Labels
bug Something isn't working

Comments

@rhoens

rhoens commented Jan 9, 2025

What is the bug?

On these lines, each object in `analysis` is checked to confirm that it exists in `existing_analysis` and is identical to it. The problem is that `existing_analysis` is never converted back to Python types, so you can get diffs such as the following:

```python
existing_analysis[section][k] = {'max_shingle_size': '3', 'min_shingle_size': '2', 'output_unigrams': 'false', 'type': 'shingle'}
analysis[section][k] = {'max_shingle_size': 3, 'min_shingle_size': 2, 'output_unigrams': False, 'type': 'shingle'}
```

These describe the same filter and should compare as equal, but they don't.
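A minimal sketch of why the check trips, using the two dictionaries above. It assumes the check has roughly this shape (every requested analysis object must be present in the existing settings and compare equal with `!=`); the function name `analysis_differs` is hypothetical and not copied from opensearchpy.

```python
from typing import Any, Dict

# Values as returned by the cluster: every scalar is a string
existing_analysis: Dict[str, Dict[str, Any]] = {
    "filter": {
        "2_3_shingles": {
            "max_shingle_size": "3",
            "min_shingle_size": "2",
            "output_unigrams": "false",
            "type": "shingle",
        }
    }
}

# Values as declared in the Index settings: native Python types
analysis: Dict[str, Dict[str, Any]] = {
    "filter": {
        "2_3_shingles": {
            "max_shingle_size": 3,
            "min_shingle_size": 2,
            "output_unigrams": False,
            "type": "shingle",
        }
    }
}


def analysis_differs(
    existing: Dict[str, Dict[str, Any]], requested: Dict[str, Dict[str, Any]]
) -> bool:
    """Hypothetical stand-in for the existing check: plain equality, no type normalization."""
    return any(
        existing.get(section, {}).get(name) != requested[section][name]
        for section in requested
        for name in requested[section]
    )


print(analysis_differs(existing_analysis, analysis))  # True, because '3' != 3 and 'false' != False
```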

How can one reproduce the bug?

Create an index with an analysis section, e.g.:

```python
from typing import Any, Dict

from opensearchpy import Document, Text


class TestDocument(Document):
    text = Text(
        required=True,
        fields={
            "raw": Text(),
            "2_3_s": Text(analyzer="2_3_shingles"),
        },
    )

    # NOTE: Your index needs at least the settings below, but change the name to something else.
    class Index:
        name = "test_index"
        settings: Dict[str, Any] = {
            "index": {
                "max_ngram_diff": 2,
            },
            "analysis": {
                "analyzer": {
                    "2_3_shingles": {
                        "tokenizer": "standard",
                        "filter": ["asciifolding", "2_3_shingles"],
                    },
                },
                "filter": {
                    "2_3_shingles": {
                        "type": "shingle",
                        "min_shingle_size": 2,
                        "max_shingle_size": 3,
                        "output_unigrams": False,
                    },
                },
            },
        }


client = ...  # an OpenSearch client connected to your cluster
TestDocument.init(using=client)
```

Then add a field to the document and call `init` again; this call will fail.

What is the expected behavior?

`existing_analysis` and `analysis` should be recognized as identical when they describe the same configuration, so the analysis settings are not treated as changed.

What is your host/environment?

AWS

Do you have any screenshots?

Not applicable

Do you have any additional context?

None currently

@rhoens rhoens added bug Something isn't working untriaged Need triage labels Jan 9, 2025
@dblock
Member

dblock commented Jan 9, 2025

@rhoens Thanks for reporting this, want to try to write a failing test and maybe a fix?

@dblock dblock removed the untriaged Need triage label Jan 9, 2025
@rhoens
Author

rhoens commented Jan 9, 2025

@dblock is there any code in opensearchpy that converts the dictionary to the "OpenSearch native" format (i.e., the format we get back from the db)?
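A sketch of what such a conversion could look like, assuming the cluster reports scalar settings as strings (booleans as "true"/"false", integers as their decimal text), as in the diff above. `to_native_format` and `analysis_matches` are hypothetical names, not existing opensearchpy functions, and the float formatting in particular is a guess that may not match what the cluster returns.

```python
from typing import Any, Dict


def to_native_format(value: Any) -> Any:
    """Hypothetical helper: render a Python settings value the way the cluster reports it."""
    if isinstance(value, bool):  # check bool first, since bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)  # assumption: numbers come back as their decimal text
    if isinstance(value, dict):
        return {key: to_native_format(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_native_format(item) for item in value]
    return value  # strings (and None) are left unchanged


def analysis_matches(
    existing: Dict[str, Dict[str, Any]], requested: Dict[str, Dict[str, Any]]
) -> bool:
    """Hypothetical check: every requested analysis object exists with an equivalent definition."""
    return all(
        to_native_format(existing.get(section, {}).get(name))
        == to_native_format(definition)
        for section, objects in requested.items()
        for name, definition in objects.items()
    )


existing = {
    "filter": {
        "2_3_shingles": {
            "max_shingle_size": "3",
            "min_shingle_size": "2",
            "output_unigrams": "false",
            "type": "shingle",
        }
    }
}
requested = {
    "filter": {
        "2_3_shingles": {
            "max_shingle_size": 3,
            "min_shingle_size": 2,
            "output_unigrams": False,
            "type": "shingle",
        }
    }
}
print(analysis_matches(existing, requested))  # True
```

Converting the requested side to strings, rather than parsing the returned strings back into Python types, avoids guessing whether a value such as "2" was meant as a number or a string.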

@dblock
Member

dblock commented Jan 9, 2025

@rhoens I don't know. Maybe @saimedhi?
