
Offload replication writes to IO threads #1485

Open · uriyage wants to merge 4 commits into base: unstable from replication-offload

Conversation

uriyage (Contributor) commented Dec 24, 2024

This PR offloads writes to replica clients to IO threads.

Main Changes

  • Replica writes are offloaded to IO threads, but only after the replica is in online mode.
  • Replica reads are still handled in the main thread, both to reduce complexity and because read traffic from replicas is negligible.

Implementation Details

To offload the writes, writeToReplica has been split into two parts:

  1. The write itself, performed by either an IO thread or the main thread.
  2. The post-write step, where the replication buffer refcounts are updated. This runs in the main thread after the write job completes in the IO thread (similar to how a regular client is handled).
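The two-phase split above can be sketched roughly as follows. This is a minimal, in-memory illustration, not the actual Valkey code: repl_block_t, fake_client_t, write_phase, and post_write_phase are hypothetical names, and the network write is modeled as a memcpy into an output buffer.

```c
/* Sketch: phase 1 (the write) may run on an IO thread; phase 2 (shared
 * bookkeeping such as refcounts) runs only on the main thread. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    char buf[64];
    size_t used;
    int refcount; /* only the main thread may touch this */
} repl_block_t;

typedef struct {
    repl_block_t *block;
    size_t ref_block_pos; /* how much of the block was already sent */
    size_t nwritten;      /* bytes written by the last write phase */
} fake_client_t;

/* Phase 1: safe on an IO thread -- it only reads the shared buffer and
 * writes into the per-client output sink (stand-in for the socket). */
static void write_phase(fake_client_t *c, char *out, size_t *outlen) {
    size_t pending = c->block->used - c->ref_block_pos;
    memcpy(out + *outlen, c->block->buf + c->ref_block_pos, pending);
    *outlen += pending;
    c->nwritten = pending;
}

/* Phase 2: runs on the main thread after the IO job completes, where it
 * is safe to advance positions and drop the block's refcount. */
static void post_write_phase(fake_client_t *c) {
    c->ref_block_pos += c->nwritten;
    if (c->ref_block_pos == c->block->used) c->block->refcount--;
}
```

The point of the split is that everything touching shared replication state is deferred to the main thread, so the IO thread never races with buffer trimming.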

Additional Changes

  • writeToReplica now uses writev when more than one buffer exists.
  • Changed the client nwritten field to ssize_t, since for a replica nwritten can theoretically exceed the range of int (replica writes are not subject to the NET_MAX_WRITES_PER_EVENT limit).
  • Changed the parsing code to use memchr instead of strchr:
    • While parsing a command, ASAN got stuck for an unknown reason when strchr was called to look for the next \r.
    • Adding an assert that the query buffer is null-terminated did not resolve the issue.
    • Switched to memchr, which is safer (it takes an explicit length) and resolved the issue.
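The memchr point can be illustrated with a small sketch (find_cr is a hypothetical helper, not Valkey code): because memchr takes an explicit length, it never scans past the valid region even when the buffer is not NUL-terminated, whereas strchr keeps reading until it happens to hit a zero byte.

```c
/* Scan a query buffer for the next '\r' without relying on a NUL
 * terminator. Returns the offset of '\r' within [buf, buf+len), or -1
 * if none is found in that range. */
#include <stddef.h>
#include <string.h>

static long find_cr(const char *buf, size_t len) {
    const char *p = memchr(buf, '\r', len);
    return p ? (long)(p - buf) : -1;
}
```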

Testing

  • Added integration tests
  • Added unit tests

Related issue: #761

uriyage force-pushed the replication-offload branch from 3aee49b to 846c816 on Dec 24, 2024

codecov bot commented Dec 24, 2024

Codecov Report

Attention: Patch coverage is 76.00000% with 24 lines in your changes missing coverage. Please review.

Project coverage is 70.85%. Comparing base (980a801) to head (846c816).
Report is 19 commits behind head on unstable.

Files with missing lines Patch % Lines
src/io_threads.c 0.00% 14 Missing ⚠️
src/networking.c 88.23% 10 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1485      +/-   ##
============================================
+ Coverage     70.78%   70.85%   +0.07%     
============================================
  Files           119      119              
  Lines         64691    64928     +237     
============================================
+ Hits          45790    46005     +215     
- Misses        18901    18923      +22     
Files with missing lines Coverage Δ
src/replication.c 87.54% <100.00%> (+0.03%) ⬆️
src/server.h 100.00% <ø> (ø)
src/networking.c 88.39% <88.23%> (+0.09%) ⬆️
src/io_threads.c 6.82% <0.00%> (-0.70%) ⬇️

... and 25 files with indirect coverage changes

Comment on lines +2044 to +2057
    listNode *last_node;
    size_t bufpos;

    serverAssert(c->bufpos == 0 && listLength(c->reply) == 0);
    while (clientHasPendingReplies(c)) {
        replBufBlock *o = listNodeValue(c->ref_repl_buf_node);
        serverAssert(o->used >= c->ref_block_pos);

        /* Send current block if it is not fully sent. */
        if (o->used > c->ref_block_pos) {
            nwritten = connWrite(c->conn, o->buf + c->ref_block_pos, o->used - c->ref_block_pos);
            if (nwritten <= 0) {
                c->write_flags |= WRITE_FLAGS_WRITE_ERROR;
                return;
            }
            c->nwritten += nwritten;
            c->ref_block_pos += nwritten;

    /* Determine the last block and buffer position based on thread context */
    if (inMainThread()) {
        last_node = listLast(server.repl_buffer_blocks);
        if (!last_node) return;
        bufpos = ((replBufBlock *)listNodeValue(last_node))->used;
    } else {
        last_node = c->io_last_reply_block;
        serverAssert(last_node != NULL);
        bufpos = c->io_last_bufpos;
    }
Member
Consider simplifying this code to reduce duplication and improve clarity, eg:

    listNode *last_node = inMainThread() ? listLast(server.repl_buffer_blocks) : c->io_last_reply_block;
    if (!last_node) return;

    size_t bufpos = inMainThread() ? ((replBufBlock *)listNodeValue(last_node))->used : c->io_last_bufpos;

Contributor Author

I believe the current version is clearer since we check inMainThread() only once. Additionally, we handle the !last_node case differently depending on whether we are in the main thread or not.

Comment on lines +212 to +217
    replBufBlock *block = zmalloc(sizeof(replBufBlock) + 128);
    block->size = 128;
    block->used = 100;
    block->refcount = 1;

    listAddNodeTail(server.repl_buffer_blocks, block);
Member

Looks like this code is duplicated. I suggest refactoring it into:

    appendReplBufBlock(size_t size, size_t used);

Contributor Author

The size and used values differ each time.

xbasel (Member) left a comment

FYI: I tested this code and got this error. I did not see it with HEAD~1.

2576783:S 06 Jan 2025 16:06:12.381 # Protocol error (Master using the inline protocol. Desync?) from client: id=6 addr=127.0.0.1:6379 laddr=127.0.0.1:56284 fd=11 name=*redacted* age=43 idle=0 flags=M db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=22870 qbuf-free=18084 argv-mem=0 multi-mem=0 rbs=1024 rbp=42 obl=0 oll=0 omem=0 tot-mem=42880 events=r cmd=set user=*redacted* redir=-1 resp=2 lib-name= lib-ver= tot-net-in=218001645 tot-net-out=1809 tot-cmds=1267450. Query buffer during protocol error: 'SET..$16..key:__rand_int__..$128..VXKeHogKgJ=[5V9_X^b?48OKF2jGA<' (... more 896 bytes ...) 'mcS2^N1J?ELSX@CfKQ7cM5aea\ngY8a3LGgNVa9eRA46XS8>7ABe1>Jl9O\Rm\..'

I ran: src/valkey-benchmark -t set -d 128 -n 5000000 --threads 10 against valkey with 4 IO threads.

The data looks corrupted; look at the second message:
*3\r $3\r SET\r $16\r key:000000630274\r $128\r VXKeHogKgJ=[5V9_X^b?48OKF2jGA<f:iR@50o7dS3JV4Q6L68lC[GTA]0DaMg?_oSmcS2^N1J?ELSX@CfKQ7cM5aea\\ngY8a3LGgNVa9eRA46XS8>7ABe1>Jl9O\\Rm\\\r *3\r $3\r SET\r $16\r key:000000420097\r $128\r VXKeHogKgJ=[5V9_X^b?48OKF2jGA<f:iR@50o7dS3JV4Q6L68lC[GTA]0DaMg?_oSmcS2^N1J?ELSX@CfKQ7cM5aea\\ngY8a3LGgNVa9eRA46XS8>7ABe1>Jl9O\\Rm\\\r o7dS3JV4Q6L68lC[GTA]0DaMg?_oSmcS2^N1J?ELSX@CfKQ7cM5aea\\ngY8a3LGgNVa9eRA46XS8>7ABe1>Jl9O\\Rm\\\r

The payload o7dS3JV4Q6L68lC[GTA]0DaMg?_oSmcS2^N1J?ELSX@CfKQ7cM5aea\\ngY8a3LGgNVa9eRA46XS8>7ABe1>Jl9O\\Rm\\\r is duplicated.
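For illustration only (this is not the actual Valkey code and not the confirmed root cause), here is a hypothetical way such duplication can happen when a drain loop mixes a live used counter with a position snapshot: if the IO thread sends up to the block's live used value while the main thread's bookkeeping only advances to an earlier snapshot, the bytes between the snapshot and used get sent again on the next run.

```c
/* Hypothetical sketch of a replication-buffer drain. 'drain' copies
 * [pos, limit) from the block into out and returns the new position. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    char buf[64];
    size_t used; /* may grow concurrently as the main thread appends */
} block_t;

static size_t drain(const block_t *b, size_t pos, size_t limit,
                    char *out, size_t *outlen) {
    memcpy(out + *outlen, b->buf + pos, limit - pos);
    *outlen += limit - pos;
    return limit;
}
```

With a correct stop condition, each run drains up to the snapshot taken when the job was queued, and the next run picks up exactly where the bookkeeping left off. Without it (draining to the live used but only advancing to the snapshot), the tail bytes are transmitted twice, which would desync the replica's protocol stream just like the error above.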

Signed-off-by: Uri Yagelnik <[email protected]>
uriyage (Contributor Author) commented Jan 8, 2025

fyi - I tested this code and I got this error. I did not see it with HEAD~1

Many thanks @xbasel for finding it. I fixed the issue; it was a stop condition that had been mistakenly removed.
