Implement searchv2 #3058

Open
roman-khimov opened this issue Dec 18, 2024 · 7 comments
Labels
feature (Completely new functionality), I1 (High impact), S1 (Highly significant), U2 (Seriously planned)

@roman-khimov
Member

Is your feature request related to a problem? Please describe.

I'm always frustrated when we don't have an implementation for nspcc-dev/neofs-api#314.

Describe the solution you'd like

The per-container DB should be structured like:

Given
DELIM = 0x00 // Not a valid UTF-8, can't be attribute name or value
KEY // Attribute key or special attribute that can be searched for
VALUE // Attribute value, either a string or a number; strings are used as is, numbers are converted to fixed-width (256?) big-endian
PREFIXA // And B/C/D, some byte prefixes
OID // Object ID

For each object the following keys are created (with no values):
PREFIXA_OID // OIDs only to list objects without filters
PREFIXB_KEY_DELIM_VALUE_DELIM_OID // The main search workhorse, created for all keys
PREFIXC_OID_KEY_DELIM_VALUE // Auxiliary for secondary filters and additional data returned
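
A minimal Go sketch of how the three key kinds could be assembled, for illustration only. The prefix byte values, the 32-byte `OID` type and the helper names `keyA`, `keyB` and `keyC` are assumptions made for this example; the issue does not fix any of them. Only string values are handled here.

```go
package main

import (
	"bytes"
	"fmt"
)

// Hypothetical prefix bytes; the issue leaves the actual values open.
const (
	prefixA byte = 0x01 // OID-only keys, used to list objects without filters
	prefixB byte = 0x02 // KEY/VALUE/OID keys, the main search index
	prefixC byte = 0x03 // OID/KEY/VALUE keys, per-object attribute lookups
	delim   byte = 0x00
)

// oidLen assumes 32-byte object IDs.
const oidLen = 32

type OID [oidLen]byte

// keyA builds PREFIXA_OID.
func keyA(id OID) []byte {
	return append([]byte{prefixA}, id[:]...)
}

// keyB builds PREFIXB_KEY_DELIM_VALUE_DELIM_OID.
func keyB(attr, val string, id OID) []byte {
	k := make([]byte, 0, 1+len(attr)+1+len(val)+1+oidLen)
	k = append(k, prefixB)
	k = append(k, attr...)
	k = append(k, delim)
	k = append(k, val...)
	k = append(k, delim)
	return append(k, id[:]...)
}

// keyC builds PREFIXC_OID_KEY_DELIM_VALUE.
func keyC(id OID, attr, val string) []byte {
	k := make([]byte, 0, 1+oidLen+len(attr)+1+len(val))
	k = append(k, prefixC)
	k = append(k, id[:]...)
	k = append(k, attr...)
	k = append(k, delim)
	return append(k, val...)
}

func main() {
	var id OID
	id[0] = 0xAB
	fmt.Printf("%x\n", keyA(id))
	fmt.Printf("%q\n", keyB("FileName", "cat.jpg", id))
	fmt.Printf("%q\n", keyC(id, "FileName", "cat.jpg"))
	// For one attribute, PREFIXB keys sort by value, which is what makes range scans possible.
	fmt.Println(bytes.Compare(keyB("FileName", "a", id), keyB("FileName", "b", id)) < 0)
}
```

Since the keys carry no values, a plain ordered key-value store is enough: all the information sits in the key bytes and their sort order.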

The mechanics are:

  • the node accepting the request forwards it to other nodes, as with the original search
  • it accepts the answers, merges them and limits the result to the number requested (or the max)
  • it generates the cursor corresponding to the result returned to the client (more on the cursor below)
  • the reply is ready and sent
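
The merge step above can be sketched roughly as follows. The `Item` type, the ordering by (value, OID) and the `mergeResults` helper are assumptions for illustration; the issue does not define the wire format or the exact ordering.

```go
package main

import (
	"fmt"
	"sort"
)

// Item is a hypothetical result entry: the primary attribute value plus the object ID.
type Item struct {
	Value string
	OID   string
}

func less(a, b Item) bool {
	if a.Value != b.Value {
		return a.Value < b.Value
	}
	return a.OID < b.OID
}

// mergeResults combines per-node result lists, drops duplicates (the same
// object may be stored on several nodes) and keeps at most limit items.
// The second result reports whether the merged list had more items than limit.
// Simplification: a real implementation would also have to take into account
// whether any node reported that it has more results of its own.
func mergeResults(perNode [][]Item, limit int) ([]Item, bool) {
	var all []Item
	for _, r := range perNode {
		all = append(all, r...)
	}
	sort.Slice(all, func(i, j int) bool { return less(all[i], all[j]) })

	out := make([]Item, 0, limit)
	for _, it := range all {
		if n := len(out); n > 0 && out[n-1] == it {
			continue // duplicate of the previous item
		}
		if len(out) == limit {
			return out, true
		}
		out = append(out, it)
	}
	return out, false
}

func main() {
	nodeA := []Item{{"1", "x"}, {"3", "z"}}
	nodeB := []Item{{"1", "x"}, {"2", "y"}}
	res, more := mergeResults([][]Item{nodeA, nodeB}, 2)
	// The last element of res is what the cursor would be derived from.
	fmt.Println(res, more)
}
```

Because every node returns its own first `limit` matches in the same order, the first `limit` items of the merged list are complete even though each input list is truncated.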

Each node does the following:

  • if there are no filters, we just loop over PREFIXA in the DB; the cursor is the OID as is
  • if there are filters, the first one is magic: its key/action/value is used to position the DB cursor at a PREFIXB key if the match is EQUAL/NOT_EQUAL/PREFIX/NUM* (don't forget about negative numbers)
  • other matches on the same key can be checked there as well (key > N && key < M); this can cut the search short for numerics
  • iterating over the DB, we check all the other matches via PREFIXC for as long as the first one holds (when it no longer does, we're done)
  • the search is limited by the number of elements requested or the max (1000), so we return early once we have enough elements
  • we add the requested fields of matching elements using PREFIXC
  • a cursor is returned if we're not yet at the end; it's the KEY_DELIM_VALUE_DELIM_OID of the last element encoded as base64 or base58, which makes continuation easy
  • if the first filter is NOT_PRESENT, we also loop over PREFIXA and check PREFIXC
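
As an illustration of the per-node scan, here is a rough sketch for a single EQUAL filter. A sorted in-memory slice stands in for the DB, OIDs are shortened to 4 bytes for readability, and `searchEqual` with its signature is hypothetical. Secondary filters and extra fields (the PREFIXC lookups) are left out.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

const (
	prefixB byte = 0x02 // hypothetical prefix byte
	delim   byte = 0x00
	oidLen       = 4 // shortened for the example; real object IDs are longer
)

func keyB(attr, val, oid string) []byte {
	k := append([]byte{prefixB}, attr...)
	k = append(k, delim)
	k = append(k, val...)
	k = append(k, delim)
	return append(k, oid...)
}

// searchEqual scans sorted keys for attr == val. cursor is the
// KEY_DELIM_VALUE_DELIM_OID of the last element already returned (nil to
// start from the beginning). A non-nil next means there is more to read.
func searchEqual(db [][]byte, attr, val string, cursor []byte, limit int) (oids []string, next []byte) {
	pref := keyB(attr, val, "")
	start := pref
	if cursor != nil {
		// Smallest possible key strictly greater than the last returned one.
		start = append(append([]byte{prefixB}, cursor...), 0x00)
	}
	// Position the "DB cursor": the first key >= start.
	i := sort.Search(len(db), func(i int) bool { return bytes.Compare(db[i], start) >= 0 })
	for ; i < len(db) && bytes.HasPrefix(db[i], pref); i++ {
		if len(oids) == limit {
			return oids, next // enough elements, more are available
		}
		oids = append(oids, string(db[i][len(db[i])-oidLen:]))
		next = db[i][1:] // key without the prefix byte
	}
	return oids, nil // the first filter no longer matches, we're done
}

func main() {
	db := [][]byte{ // must be sorted, as keys in an ordered KV store would be
		keyB("Type", "jpg", "aaaa"),
		keyB("Type", "jpg", "bbbb"),
		keyB("Type", "jpg", "cccc"),
		keyB("Type", "png", "dddd"),
	}
	page1, cur := searchEqual(db, "Type", "jpg", nil, 2)
	page2, end := searchEqual(db, "Type", "jpg", cur, 2)
	fmt.Println(page1, page2, end == nil)
}
```

The cursor is kept as raw bytes here; on the wire it would be encoded (base64 or base58) as described above.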

Describe alternatives you've considered

SQL, various other types of DBs. But the scheme above should be sufficient for our primary cases now.

Additional context

#2990, #2757, #2989, nspcc-dev/neofs-api#306

roman-khimov added this to the v0.45.0 milestone Dec 18, 2024
roman-khimov added the U2 (Seriously planned), S1 (Highly significant), I1 (High impact) and feature (Completely new functionality) labels Dec 18, 2024
@roman-khimov
Member Author

Caveat: creating a cursor from merged values can be non-trivial if the attribute is not included in the requested list. The cursor can then be degraded to a plain OID (complicating continuation somewhat). In general, most use cases do need attribute values, but still.
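
One possible shape for such a cursor, sketched under assumptions: base64 is picked from the two encodings mentioned, object IDs are taken to be 32 bytes, and an empty attribute name marks the degraded OID-only form. `encodeCursor` and `decodeCursor` are hypothetical helpers, not part of any existing API.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"errors"
	"fmt"
)

const (
	delim  byte = 0x00
	oidLen      = 32 // assumed object ID size
)

// encodeCursor builds KEY DELIM VALUE DELIM OID and encodes it as base64.
// If the attribute is not known (attr == ""), it degrades to the bare OID.
func encodeCursor(attr string, val []byte, oid [oidLen]byte) string {
	var raw []byte
	if attr != "" {
		raw = append(raw, attr...)
		raw = append(raw, delim)
		raw = append(raw, val...)
		raw = append(raw, delim)
	}
	raw = append(raw, oid[:]...)
	return base64.StdEncoding.EncodeToString(raw)
}

// decodeCursor reverses encodeCursor. An empty attr in the result means the
// cursor was degraded to an OID.
func decodeCursor(s string) (attr string, val []byte, oid [oidLen]byte, err error) {
	raw, err := base64.StdEncoding.DecodeString(s)
	if err != nil {
		return "", nil, oid, err
	}
	if len(raw) < oidLen {
		return "", nil, oid, errors.New("cursor too short")
	}
	// The OID has a fixed size, so it can be cut from the tail even if the
	// value itself contains zero bytes (as fixed-width numbers would).
	copy(oid[:], raw[len(raw)-oidLen:])
	head := raw[:len(raw)-oidLen]
	if len(head) == 0 {
		return "", nil, oid, nil
	}
	i := bytes.IndexByte(head, delim) // the key cannot contain DELIM
	if i <= 0 || i == len(head)-1 || head[len(head)-1] != delim {
		return "", nil, oid, errors.New("malformed cursor")
	}
	return string(head[:i]), head[i+1 : len(head)-1], oid, nil
}

func main() {
	var oid [oidLen]byte
	oid[0] = 0xAB
	full := encodeCursor("FileName", []byte("cat.jpg"), oid)
	attr, val, _, err := decodeCursor(full)
	fmt.Println(attr, string(val), err)

	degraded := encodeCursor("", nil, oid)
	attr, _, _, err = decodeCursor(degraded)
	fmt.Println(attr == "", err)
}
```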

@roman-khimov
Member Author

Caveat 2: numeric values might require an additional prefix anyway, since we can have Index=100500 in one object and Index=abcd in another. With the same prefix they'd be mixed, and we can end up treating strings as numbers.

@cthulhu-rider
Contributor

note: the per-container DB describes a virtual structure; physically it is split across the existing metabases

@roman-khimov
Member Author

Yes, we need to limit changes to this specific feature (expose the API as early as possible) and deal with the associated meta code (GC and the like) in the future. Search is still possible with multiple DBs, since results can be merged similarly to the way results from different nodes are merged.

@cthulhu-rider
Contributor

cthulhu-rider commented Jan 10, 2025

Attribute value which is either a string or number, strings are used as is, numbers are converted to fixed-width (256?) BE

The choice is obvious for system fields. For example, owner ID is a string while payload size is an integer.

For user-defined attributes it is not so obvious, like here: #3058 (comment). In the current protocol, there is no way to determine whether a user attribute is numeric or not. So I really doubt that storing them in different formats is legit. But we can resolve this during search query processing. In the original search, any non-integer attribute mismatches any numeric query. Do we want to change this behaviour for SearchV2 somehow?

@roman-khimov you also mentioned some special prefix, could you please elaborate on this thought?

@roman-khimov
Member Author

You can only do this content-based, just like you do this now for old search. The only difference is that the choice is made when processing the object instead of when processing the search request.

Special prefix means splitting PREFIXB into B1 and B2 for numeric and string data.
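
A sketch of what content-based classification at object processing time could look like, together with a fixed-width encoding that keeps negative numbers in order. It assumes 256-bit width (the figure floated above), decimal integer syntax, and made-up prefix bytes `prefixB1`/`prefixB2`; none of this is confirmed by the thread.

```go
package main

import (
	"bytes"
	"fmt"
	"math/big"
)

const (
	prefixB1 byte = 0x02 // hypothetical: numeric values
	prefixB2 byte = 0x03 // hypothetical: string values
	numWidth      = 32   // 256 bits
)

// bias is 2^255. Adding it shifts [-2^255, 2^255) into [0, 2^256).
var bias = new(big.Int).Lsh(big.NewInt(1), 255)

// encodeNum maps an integer in [-2^255, 2^255) to 32 big-endian bytes whose
// bytewise order matches numeric order, so negative numbers sort first.
func encodeNum(n *big.Int) ([]byte, bool) {
	v := new(big.Int).Add(n, bias)
	if v.Sign() < 0 || v.BitLen() > 8*numWidth {
		return nil, false // out of range for the fixed width
	}
	return v.FillBytes(make([]byte, numWidth)), true
}

// classify decides by content alone which index an attribute value goes to
// and how it is stored there.
func classify(val string) (prefix byte, stored []byte) {
	if n, ok := new(big.Int).SetString(val, 10); ok {
		if enc, ok := encodeNum(n); ok {
			return prefixB1, enc
		}
	}
	return prefixB2, []byte(val)
}

func main() {
	for _, s := range []string{"-5", "3", "100500", "abcd"} {
		p, v := classify(s)
		fmt.Printf("%-7s prefix=%#x value=%x\n", s, p, v)
	}
	_, a := classify("-5")
	_, b := classify("3")
	fmt.Println(bytes.Compare(a, b) < 0) // -5 sorts before 3
}
```

With this split, Index=100500 and Index=abcd from the caveat above land under different prefixes, so a numeric range scan never runs into string values. Whether numeric-looking values should additionally be indexed as strings is left open here.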

@cthulhu-rider
Contributor

cthulhu-rider commented Jan 13, 2025

if there are no filters we're just looping over PREFIXA in DB, the cursor is OID as is

Shouldn't the cursor be OID + values of the requested attributes, to sort/continue in PREFIXC in this case?

UPD: seems like no, I missed this requirement: https://github.com/nspcc-dev/neofs-api/blob/9f1f12866a4742adb7778c51bd632cd240f81262/object/service.proto#L554-L555
