Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace dict with new hashtable: hash datatype #1502

Merged
merged 3 commits into from
Jan 13, 2025

Conversation

SoftlyRaining
Copy link
Contributor

@SoftlyRaining SoftlyRaining commented Jan 2, 2025

This PR replaces dict with the new hashtable data structure in the HASH datatype. There is a new struct for hashtable items which contains a pointer to value sds string and the embedded key sds string. These values were previously stored in dictEntry. This structure is kept opaque so we can easily add small value embedding or other optimizations in the future.

closes #1095

Copy link

codecov bot commented Jan 2, 2025

Codecov Report

Attention: Patch coverage is 91.37255% with 22 lines in your changes missing coverage. Please review.

Project coverage is 70.92%. Comparing base (9e02049) to head (70c9d6e).
Report is 14 commits behind head on unstable.

Files with missing lines Patch % Lines
src/defrag.c 73.91% 6 Missing ⚠️
src/module.c 0.00% 6 Missing ⚠️
src/rdb.c 83.33% 5 Missing ⚠️
src/t_hash.c 97.95% 3 Missing ⚠️
src/db.c 92.30% 1 Missing ⚠️
src/debug.c 50.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1502      +/-   ##
============================================
- Coverage     70.93%   70.92%   -0.01%     
============================================
  Files           120      120              
  Lines         64981    64988       +7     
============================================
- Hits          46095    46094       -1     
- Misses        18886    18894       +8     
Files with missing lines Coverage Δ
src/aof.c 80.23% <100.00%> (ø)
src/lazyfree.c 86.11% <100.00%> (ø)
src/object.c 82.04% <100.00%> (-0.11%) ⬇️
src/server.c 87.60% <100.00%> (+0.02%) ⬆️
src/server.h 100.00% <ø> (ø)
src/db.c 89.56% <92.30%> (+0.07%) ⬆️
src/debug.c 52.40% <50.00%> (+0.27%) ⬆️
src/t_hash.c 96.13% <97.95%> (-0.01%) ⬇️
src/rdb.c 76.94% <83.33%> (+0.01%) ⬆️
src/defrag.c 89.93% <73.91%> (-0.04%) ⬇️
... and 1 more

... and 8 files with indirect coverage changes

@SoftlyRaining SoftlyRaining changed the title [draft] Replace dict with new hashtable: sorted set datatype [draft] Replace dict with new hashtable: hash datatype Jan 2, 2025
@zuiderkwast zuiderkwast self-requested a review January 2, 2025 09:45
Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I skimmed through it. I looks strait-forward. The only material thing to focus on seems to be the hashTypeEntry abstraction.

It would be good to have a clear abstraction and separation, even within t_hash.c, so we don't have too many direct accesses into the hashTypeEntry internals. It will help us later when we want to add things like TTL and embedded value.

src/server.h Outdated Show resolved Hide resolved
src/defrag.c Outdated Show resolved Hide resolved
@SoftlyRaining SoftlyRaining force-pushed the hash-datatype branch 2 times, most recently from 61772d3 to d357204 Compare January 4, 2025 00:22
@SoftlyRaining SoftlyRaining marked this pull request as ready for review January 8, 2025 21:34
@SoftlyRaining SoftlyRaining changed the title [draft] Replace dict with new hashtable: hash datatype Replace dict with new hashtable: hash datatype Jan 8, 2025
@zuiderkwast zuiderkwast added release-notes This issue should get a line item in the release notes run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP) labels Jan 9, 2025
Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Just a few nits.

Introducing the sds_const type can preferably be a follow-up. It might trigger a long discussion.

src/t_hash.c Outdated
Comment on lines 39 to 40
/* takes ownership of value, does not take ownership of field */
hashTypeEntry *hashTypeCreateEntry(const sds field, sds value) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • const sds is meaningless. Is it time to introduce the const_sds typedef in sds.h now?

  • Take ownership of field but not value seems asymmetrical, but anything else is suboptimal, right? That's what we do for keys too IIRC. It's OK.

  • Shall we add a comment for the entry API, like the one below for the hash type API?

/*-----------------------------------------------------------------------------
 * Hash entry API
 *----------------------------------------------------------------------------*/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you, let's not introduce const_sds in this change. I'll remove the const here - it does not mean const char* as I intended when I wrote it.

I'll add the comment header, that's a good idea :)

src/t_hash.c Outdated Show resolved Hide resolved
src/defrag.c Show resolved Hide resolved
Comment on lines +51 to +53
sds hashTypeEntryGetField(const hashTypeEntry *entry) {
const unsigned char *field = entry->field_data + entry->field_offset;
return (sds)field;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (and the GetValue) should probably return const_sds if we add that type.

src/t_hash.c Show resolved Hide resolved
src/t_hash.c Show resolved Hide resolved
if (withvalues) value = hashTypeCurrentObjectNewSds(&hi, OBJ_HASH_VALUE);
ret = dictAdd(d, key, value);
ret = dictAdd(d, field, value);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up: Let's replace this sdsReplyDictType with a hashtable.

@@ -1069,25 +1116,25 @@ void hrandfieldWithCountCommand(client *c, long l, int withvalues) {
else {
/* Hashtable encoding (generic implementation) */
unsigned long added = 0;
listpackEntry key, value;
listpackEntry field, value;
dict *d = dictCreate(&hashDictType);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up: Let's replace this dict with a hashtable. Then we can delete that dict type.

Signed-off-by: Rain Valentine <[email protected]>
Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zuiderkwast zuiderkwast merged commit d13aad4 into valkey-io:unstable Jan 13, 2025
57 of 59 checks passed
proost pushed a commit to proost/valkey that referenced this pull request Jan 17, 2025
This PR replaces dict with the new hashtable data structure in the HASH
datatype. There is a new struct for hashtable items which contains a
pointer to value sds string and the embedded key sds string. These
values were previously stored in dictEntry. This structure is kept
opaque so we can easily add small value embedding or other optimizations
in the future.

closes valkey-io#1095

---------

Signed-off-by: Rain Valentine <[email protected]>
Signed-off-by: proost <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-notes This issue should get a line item in the release notes run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use new hashtable for hashes
2 participants