Add support for Regex queries #35

valencik · 2023-03-30T01:58:02Z

I think we can take the regex string value, build a regex and iterate through the terms array collecting matches.
This obviously won't be particularly efficient, but it's probably fast enough for now.
(Storing an FSA in the index, as Lucene does, is too complicated for now.)
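
A minimal sketch of the linear scan described above. The `RegexScanSketch` object, the sample `terms` array, and the `matchingTerms` name are all hypothetical stand-ins, not protosearch's actual API. It also assumes the pattern must match a whole term rather than a substring.

```scala
import scala.util.matching.Regex

object RegexScanSketch {
  // Hypothetical stand-in for the index's terms array.
  val terms: Array[String] = Array("cat", "catalog", "dog", "dot")

  // Build a Regex from the query string, then scan every term and
  // keep those the pattern matches in full (assumption: whole-term match).
  // Note: `.r` throws on an invalid pattern; no validation is done here.
  def matchingTerms(pattern: String, terms: Array[String]): List[String] = {
    val regex: Regex = pattern.r
    terms.iterator.filter(t => regex.pattern.matcher(t).matches()).toList
  }
}
```

The cost is one regex match per term in the dictionary, which is the inefficiency accepted above.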

VigneshSK17 · 2024-03-24T03:52:23Z

Hi @valencik, I am a beginner to the world of open-source contribution and wanted to get started by contributing to this issue. I found out about protosearch through Google Summer of Code and found the scaladoc search project interesting. I hope you can guide me through the PR process, as I have implemented what I believe is Regex query support in the forked branch referenced above.

valencik · 2024-03-24T17:25:19Z

Hey @VigneshSK17, thanks so much for your interest and contribution!

Heads up: I have possibly just created some merge conflicts for you by merging a new PR that touches some of the same areas you've changed. Hopefully you can merge latest main into your work without too much trouble. But let me know if you need a hand.

Some thoughts on your approach here:

regex creation. There's no validation in Lucille on the correctness of the regex pattern, so it's possible to have the regex creation (calling .r on the string) fail by throwing a PatternSyntaxException. We should wrap the creation of the Regex in a try/catch and return a Left with a descriptive error message if it does fail. This can all still happen in your regexSearch method in IndexerSearcher.
TermDictionary. Once we have a valid Regex, I think we could put the term matching inside TermDictionary similar to what we do with termsForPrefix. Perhaps in a termsForRegex(regex: Regex): List[String] method.
tests. Can we add a test with a string that fails regex compilation and assert that we get a Left? "[a" is an example of a failing string.
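
The three points above could be sketched roughly as follows. This is an illustration under assumptions, not the code merged in the PR. The `RegexQuerySketch` object, `compileRegex`, and the shape of `regexSearch` are hypothetical. `termsForRegex` takes the terms array as an explicit parameter here, whereas the suggested method would live on TermDictionary, and whole-term matching is assumed.

```scala
import java.util.regex.PatternSyntaxException
import scala.util.matching.Regex

object RegexQuerySketch {
  // Compile the pattern, turning a syntax error into a Left
  // with a descriptive message instead of letting it throw.
  def compileRegex(pattern: String): Either[String, Regex] =
    try Right(pattern.r)
    catch {
      case e: PatternSyntaxException =>
        Left(s"Invalid regex '$pattern': ${e.getDescription}")
    }

  // Hypothetical analogue of a TermDictionary.termsForRegex method:
  // keep every term the regex matches in full.
  def termsForRegex(terms: Array[String], regex: Regex): List[String] =
    terms.iterator.filter(t => regex.pattern.matcher(t).matches()).toList

  // Hypothetical search entry point: fail fast on a bad pattern,
  // otherwise collect the matching terms.
  def regexSearch(terms: Array[String], pattern: String): Either[String, List[String]] =
    compileRegex(pattern).map(regex => termsForRegex(terms, regex))
}
```

With this shape, the requested test reduces to checking that `regexSearch(terms, "[a")` is a Left, since "[a" is an unclosed character class.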

Let me know if you have any questions. And feel free to open a PR now and we can continue discussion there.
:)

VigneshSK17 · 2024-03-24T21:20:44Z

I created PR #188 and will work on resolving the merge conflicts and addressing your suggestions!

valencik added the labels query (Related to executing queries given an index) and good first issue (Good for newcomers) Mar 30, 2023

valencik assigned VigneshSK17 Mar 24, 2024

VigneshSK17 mentioned this issue Mar 24, 2024

Support for regex queries #188

Merged

valencik closed this as completed in 4915e61 Mar 29, 2024
