Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Regex queries #35

Closed
valencik opened this issue Mar 30, 2023 · 3 comments
Closed

Add support for Regex queries #35

valencik opened this issue Mar 30, 2023 · 3 comments
Assignees
Labels
good first issue Good for newcomers query Related to executing queries given an index

Comments

@valencik
Copy link
Contributor

Add in Lucille cozydev-pink/lucille#21

I think we can take the regex string value, build a regex and iterate through the terms array collecting matches.
This obviously won't be particularly efficient, but it's probably fast enough for now.
(And storing an FSA in the index like Lucene does is just too complicated for now)

@valencik valencik added query Related to executing queries given an index good first issue Good for newcomers labels Mar 30, 2023
@VigneshSK17
Copy link
Contributor

Hi @valencik, I am a beginner to the world of open-source contribution and wanted to get started by contributing to this issue. I found out about protosearch through Google Summer of Code and found the scaladoc search project interest. I hope you can guide me through the PR process, as I have implemented what I believe is Regex query support in the forked branch referenced above.

@valencik
Copy link
Contributor Author

Hey @VigneshSK17, thanks so much for your interest and contribution!

Heads up: I have possibly just created some merge conflicts for you by merging a new PR that touches some of the same areas you've changed. Hopefully you can merge latest main into your work without too much trouble. But let me know if you need a hand.

Some thoughts on your approach here:

  • regex creation. There's no validation in Lucille on the correctness of the regex pattern, so it's possible to have the regex creation (calling .r on the string) fail by throwing a PatternSyntaxException. We should wrap the creation of the Regex in a try/catch and return a Left with a descriptive error message if it does fail. This can all still happen in your regexSearch method in IndexerSearcher.

  • TermDictionary. Once we have a valid Regex, I think we could put the term matching inside TermDictionary similar to what we do with termsForPrefix. Perhaps in a termsForRegex(regex: Regex): List[String] method.

  • tests. Can we add a test with a string that fails regex compilation and assert we get a Left. "[a" is an example of a failing string.

Let me know if you have any questions. And feel free to open a PR now and we can continue discussion there.
:)

@VigneshSK17
Copy link
Contributor

I created a PR #188 and will work on fixing the merge conflicts and your suggestions!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers query Related to executing queries given an index
Projects
None yet
Development

No branches or pull requests

2 participants