
Offload replication writes to IO threads #1485

Open · uriyage wants to merge 4 commits into base: unstable from replication-offload

Conversation

uriyage (Contributor) commented Dec 24, 2024

This PR offloads writes to replica clients to IO threads.

Main Changes

  • Replica writes are offloaded to IO threads, but only after the replica is in online mode.
  • Replica reads are still handled in the main thread, both to reduce complexity and because read traffic from replicas is negligible.

Implementation Details

To offload the writes, writeToReplica has been split into two parts:

  1. The write itself, performed by either an IO thread or the main thread.
  2. The post-write step, where the replication buffer refcounts are updated. This runs in the main thread after the write job completes in the IO thread (similar to how a regular client is handled).
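The two-phase split above can be sketched roughly as follows. This is a minimal, in-memory illustration, not the actual Valkey code: repl_block_t, fake_client_t, write_phase, and post_write_phase are hypothetical names, and the network write is modeled as a memcpy into an output buffer.

```c
/* Sketch: phase 1 (the write) may run on an IO thread; phase 2 (shared
 * bookkeeping such as refcounts) runs only on the main thread. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    char buf[64];
    size_t used;
    int refcount; /* only the main thread may touch this */
} repl_block_t;

typedef struct {
    repl_block_t *block;
    size_t ref_block_pos; /* how much of the block was already sent */
    size_t nwritten;      /* bytes written by the last write phase */
} fake_client_t;

/* Phase 1: safe on an IO thread -- it only reads the shared buffer and
 * writes into the per-client output sink (stand-in for the socket). */
static void write_phase(fake_client_t *c, char *out, size_t *outlen) {
    size_t pending = c->block->used - c->ref_block_pos;
    memcpy(out + *outlen, c->block->buf + c->ref_block_pos, pending);
    *outlen += pending;
    c->nwritten = pending;
}

/* Phase 2: runs on the main thread after the IO job completes, where it
 * is safe to advance positions and drop the block's refcount. */
static void post_write_phase(fake_client_t *c) {
    c->ref_block_pos += c->nwritten;
    if (c->ref_block_pos == c->block->used) c->block->refcount--;
}
```

The point of the split is that everything touching shared replication state is deferred to the main thread, so the IO thread never races with buffer trimming.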

Additional Changes

  • writeToReplica now uses writev when more than one buffer exists.
  • Changed the client nwritten field to ssize_t, since for a replica nwritten can theoretically exceed the range of int (replica writes are not subject to the NET_MAX_WRITES_PER_EVENT limit).
  • Changed the parsing code to use memchr instead of strchr:
    • While parsing a command, ASAN got stuck for an unknown reason when strchr was called to look for the next \r.
    • Adding an assert that the query buffer is null-terminated did not resolve the issue.
    • Switched to memchr, which is safer (it takes an explicit length) and resolved the issue.
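The memchr point can be illustrated with a small sketch (find_cr is a hypothetical helper, not Valkey code): because memchr takes an explicit length, it never scans past the valid region even when the buffer is not NUL-terminated, whereas strchr keeps reading until it happens to hit a zero byte.

```c
/* Scan a query buffer for the next '\r' without relying on a NUL
 * terminator. Returns the offset of '\r' within [buf, buf+len), or -1
 * if none is found in that range. */
#include <stddef.h>
#include <string.h>

static long find_cr(const char *buf, size_t len) {
    const char *p = memchr(buf, '\r', len);
    return p ? (long)(p - buf) : -1;
}
```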

Testing

  • Added integration tests
  • Added unit tests

Related issue: #761

uriyage force-pushed the replication-offload branch from 3aee49b to 846c816 on Dec 24, 2024

codecov bot commented Dec 24, 2024

Codecov Report

Attention: Patch coverage is 76.00000% with 24 lines in your changes missing coverage. Please review.

Project coverage is 70.85%. Comparing base (980a801) to head (846c816).
Report is 19 commits behind head on unstable.

Files with missing lines Patch % Lines
src/io_threads.c 0.00% 14 Missing ⚠️
src/networking.c 88.23% 10 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1485      +/-   ##
============================================
+ Coverage     70.78%   70.85%   +0.07%     
============================================
  Files           119      119              
  Lines         64691    64928     +237     
============================================
+ Hits          45790    46005     +215     
- Misses        18901    18923      +22     
Files with missing lines Coverage Δ
src/replication.c 87.54% <100.00%> (+0.03%) ⬆️
src/server.h 100.00% <ø> (ø)
src/networking.c 88.39% <88.23%> (+0.09%) ⬆️
src/io_threads.c 6.82% <0.00%> (-0.70%) ⬇️

... and 25 files with indirect coverage changes

Comment on lines +2044 to +2057
    listNode *last_node;
    size_t bufpos;

    serverAssert(c->bufpos == 0 && listLength(c->reply) == 0);
    while (clientHasPendingReplies(c)) {
        replBufBlock *o = listNodeValue(c->ref_repl_buf_node);
        serverAssert(o->used >= c->ref_block_pos);

        /* Send current block if it is not fully sent. */
        if (o->used > c->ref_block_pos) {
            nwritten = connWrite(c->conn, o->buf + c->ref_block_pos, o->used - c->ref_block_pos);
            if (nwritten <= 0) {
                c->write_flags |= WRITE_FLAGS_WRITE_ERROR;
                return;
            }
            c->nwritten += nwritten;
            c->ref_block_pos += nwritten;

    /* Determine the last block and buffer position based on thread context */
    if (inMainThread()) {
        last_node = listLast(server.repl_buffer_blocks);
        if (!last_node) return;
        bufpos = ((replBufBlock *)listNodeValue(last_node))->used;
    } else {
        last_node = c->io_last_reply_block;
        serverAssert(last_node != NULL);
        bufpos = c->io_last_bufpos;
    }
Member
Consider simplifying this code to reduce duplication and improve clarity, eg:

    listNode *last_node = inMainThread() ? listLast(server.repl_buffer_blocks) : c->io_last_reply_block;
    if (!last_node) return;

    size_t bufpos = inMainThread() ? ((replBufBlock *)listNodeValue(last_node))->used : c->io_last_bufpos;

Contributor Author

I believe the current version is clearer since we check inMainThread() only once. Additionally, we handle the !last_node case differently depending on whether we are in the main thread or not.

Comment on lines +212 to +217
    replBufBlock *block = zmalloc(sizeof(replBufBlock) + 128);
    block->size = 128;
    block->used = 100;
    block->refcount = 1;

    listAddNodeTail(server.repl_buffer_blocks, block);
Member

Looks like this code is duplicated. I suggest refactoring it into:

    appendReplBufBlock(size_t size, size_t used);

Contributor Author

The size and used values differ each time.

xbasel (Member) left a comment

FYI: I tested this code and got this error. I did not see it with HEAD~1.

2576783:S 06 Jan 2025 16:06:12.381 # Protocol error (Master using the inline protocol. Desync?) from client: id=6 addr=127.0.0.1:6379 laddr=127.0.0.1:56284 fd=11 name=*redacted* age=43 idle=0 flags=M db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=22870 qbuf-free=18084 argv-mem=0 multi-mem=0 rbs=1024 rbp=42 obl=0 oll=0 omem=0 tot-mem=42880 events=r cmd=set user=*redacted* redir=-1 resp=2 lib-name= lib-ver= tot-net-in=218001645 tot-net-out=1809 tot-cmds=1267450. Query buffer during protocol error: 'SET..$16..key:__rand_int__..$128..VXKeHogKgJ=[5V9_X^b?48OKF2jGA<' (... more 896 bytes ...) 'mcS2^N1J?ELSX@CfKQ7cM5aea\ngY8a3LGgNVa9eRA46XS8>7ABe1>Jl9O\Rm\..'

I ran: src/valkey-benchmark -t set -d 128 -n 5000000 --threads 10 against valkey with 4 IO threads.

The data looks corrupted; look at the second message:
*3\r $3\r SET\r $16\r key:000000630274\r $128\r VXKeHogKgJ=[5V9_X^b?48OKF2jGA<f:iR@50o7dS3JV4Q6L68lC[GTA]0DaMg?_oSmcS2^N1J?ELSX@CfKQ7cM5aea\\ngY8a3LGgNVa9eRA46XS8>7ABe1>Jl9O\\Rm\\\r *3\r $3\r SET\r $16\r key:000000420097\r $128\r VXKeHogKgJ=[5V9_X^b?48OKF2jGA<f:iR@50o7dS3JV4Q6L68lC[GTA]0DaMg?_oSmcS2^N1J?ELSX@CfKQ7cM5aea\\ngY8a3LGgNVa9eRA46XS8>7ABe1>Jl9O\\Rm\\\r o7dS3JV4Q6L68lC[GTA]0DaMg?_oSmcS2^N1J?ELSX@CfKQ7cM5aea\\ngY8a3LGgNVa9eRA46XS8>7ABe1>Jl9O\\Rm\\\r

The payload o7dS3JV4Q6L68lC[GTA]0DaMg?_oSmcS2^N1J?ELSX@CfKQ7cM5aea\\ngY8a3LGgNVa9eRA46XS8>7ABe1>Jl9O\\Rm\\\r is duplicated.
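For illustration only (this is not the actual Valkey code and not the confirmed root cause), here is a hypothetical way such duplication can happen when a drain loop mixes a live used counter with a position snapshot: if the IO thread sends up to the block's live used value while the main thread's bookkeeping only advances to an earlier snapshot, the bytes between the snapshot and used get sent again on the next run.

```c
/* Hypothetical sketch of a replication-buffer drain. 'drain' copies
 * [pos, limit) from the block into out and returns the new position. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    char buf[64];
    size_t used; /* may grow concurrently as the main thread appends */
} block_t;

static size_t drain(const block_t *b, size_t pos, size_t limit,
                    char *out, size_t *outlen) {
    memcpy(out + *outlen, b->buf + pos, limit - pos);
    *outlen += limit - pos;
    return limit;
}
```

With a correct stop condition, each run drains up to the snapshot taken when the job was queued, and the next run picks up exactly where the bookkeeping left off. Without it (draining to the live used but only advancing to the snapshot), the tail bytes are transmitted twice, which would desync the replica's protocol stream just like the error above.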

Signed-off-by: Uri Yagelnik <[email protected]>
uriyage (Contributor Author) commented Jan 8, 2025

fyi - I tested this code and I got this error. I did not see it with HEAD~1

Many thanks @xbasel for finding it. I fixed the issue; it was a stop condition that had been mistakenly removed.
