Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Valkey Patch Release 7.2.8 #1516

Merged
merged 24 commits into from
Jan 8, 2025
Merged

Valkey Patch Release 7.2.8 #1516

merged 24 commits into from
Jan 8, 2025

Conversation

hpatro
Copy link
Collaborator

@hpatro hpatro commented Jan 6, 2025

Check release notes

enjoy-binbin and others added 6 commits January 6, 2025 22:59
…y-io#1171)

The client that was killed by FUNCTION KILL received a reply of
SCRIPT KILL and the server log also showed SCRIPT KILL.

Signed-off-by: Binbin <[email protected]>
…memory usage (valkey-io#1213)

The command argument strings created while parsing inline commands (see
`processInlineBuffer()`) can contain free capacity. Since some commands
,such as `SET`, store these strings in the database, that free capacity
increases the memory usage. In the worst case, it could double the
memory usage.

This only occurs if the inline command format is used. The argument
strings are built by appending character by character in
`sdssplitargs()`. Regular RESP commands are not affected.

This change trims the strings within `processInlineBuffer()`.

this?

When the command argument string is packed into an object,
`trimStringObjectIfNeeded()` is called.

This does only trim the string if it is larger than
`PROTO_MBULK_BIG_ARG` (32kB), as only strings larger than this would
ever need trimming if the command it sent using the bulk string format.

We could modify this condition, but that would potentially have a
performance impact on commands using the bulk format. Since those make
up for the vast majority of executed commands, limiting this change to
inline commands seems prudent.

* 1 million `SET [key] [value]` commands
* Random keys (16 bytes)
* 600 bytes values

Memory usage without this change:

```
used_memory:1089327888
used_memory_human:1.01G
used_memory_rss:1131696128
used_memory_rss_human:1.05G
used_memory_peak:1089348264
used_memory_peak_human:1.01G
used_memory_peak_perc:100.00%
used_memory_overhead:49302800
used_memory_startup:911808
used_memory_dataset:1040025088
used_memory_dataset_perc:95.55%
```

Memory usage with this change:
```
used_memory:705327888
used_memory_human:672.65M
used_memory_rss:718802944
used_memory_rss_human:685.50M
used_memory_peak:705348256
used_memory_peak_human:672.67M
used_memory_peak_perc:100.00%
used_memory_overhead:49302800
used_memory_startup:911808
used_memory_dataset:656025088
used_memory_dataset_perc:93.13%
```

If the same experiment is repeated using the normal RESP array of bulk
string format (`*3\r\n$3\r\nSET\r\n...`) then the memory usage is 672MB
with and without of this change.

If a replica is attached, its memory usage is 672MB with and without
this change, since the replication link never uses inline commands.

Signed-off-by: Stefan Mueller <[email protected]>
…upKey (valkey-io#1499)

When looking up a key in no-touch mode, `LOOKUP_NOTOUCH` is set to avoid
updating the last access time in `lookupKey`. An exception must be made
for the `TOUCH` command which must always update the key.

When called from a script, `server.executing_client` will point to the
`TOUCH` command, while `server.current_client` will point to e.g. an
`EVAL` command. So, we must use the former to find out the currently
executing command if defined.

This fix addresses the issue where TOUCH wasn't updating key access
times when called from scripts like EVAL.

Fixes valkey-io#1498

Signed-off-by: Simon Baatz <[email protected]>
Co-authored-by: Binbin <[email protected]>
The explanation on the original commit was wrong. Key based access must
have a `~` in order to correctly configure whey key prefixes to apply
the selector to. If this is missing, a server assert will be triggered
later.

Signed-off-by: Madelyn Olson <[email protected]>
Co-authored-by: YaacovHazan <[email protected]>
Reset GC state before closing the lua VM to prevent user data to be
wrongly freed while still might be used on destructor callbacks.

Created and publish by Redis in their OSS branch.

Signed-off-by: Madelyn Olson <[email protected]>
Co-authored-by: YaacovHazan <[email protected]>
@hpatro hpatro requested a review from madolson January 6, 2025 23:47
@hpatro
Copy link
Collaborator Author

hpatro commented Jan 7, 2025

Test failures:

00:12:05> Instance #5 is still a slave after some time (no failover): FAILED: Expected [RI 5 role] eq {master} (context: type eval line 3 cmd {assert {[RI 5 role] eq {master}}} proc ::test)
(Jumping to next unit after error)
FAILED: caught an error in the test 
assertion:Expected [RI 5 role] eq {master} (context: type eval line 3 cmd {assert {[RI 5 role] eq {master}}} proc ::test)
    while executing
"error "assertion:Expected [uplevel 1 [list subst -nocommands $condition]] $context""
    (procedure "assert" line 4)
    invoked from within
"assert {[RI 5 role] eq {master}}"
    ("uplevel" body line 3)
    invoked from within
"uplevel 1 $code"
Testing unit: 11-manual-takeover.tcl
00:12:10> (init) Restart killed instances: OK
00:12:10> Cluster nodes are reachable: FAILED: Node #0 keeps replying '' to PING.
(Jumping to next unit after error)
FAILED: caught an error in the test 
assertion:Node #0 keeps replying '' to PING.
    while executing
"foreach_redis_id id {
        # Every node should be reachable.
        wait_for_condition 1000 50 {
            ([catch {R $id ping} ping_reply] == 0..."
    ("uplevel" body line 2)
    invoked from within
"uplevel 1 $code"
I/O error reading reply
    while executing
"$r set [expr rand()] [expr rand()]"
    (procedure "gen_write_load" line 7)
    invoked from within
"gen_write_load [lindex $argv 0] [lindex $argv 1] [lindex $argv 2] [lindex $argv 3]"
    (file "tests/helpers/gen_write_load.tcl" line 18)
[exception]: Executing test client: ERR FAILOVER target replica is not online..
ERR FAILOVER target replica is not online.
    while executing
"$node_0 failover to $node_1_host $node_1_port"
    ("uplevel" body line 16)
    invoked from within
"uplevel 1 $code"
    (procedure "test" line 58)
    invoked from within
"test {failover command to specific replica works} {
        set initial_psyncs [s -1 sync_partial_ok]
        set initial_syncs [s -1 sync_full]

    ..."
    ("uplevel" body line 61)
    invoked from within
"uplevel 1 $code "
    (procedure "start_server" line 3)
    invoked from within
"start_server {overrides {save {}}} {
    set node_0 [srv 0 client]
    set node_0_host [srv 0 host]
    set node_0_port [srv 0 port]
    set node_0_pi..."
    ("uplevel" body line 2)
    invoked from within
"uplevel 1 $code "
    (procedure "start_server" line 3)
    invoked from within
"start_server {overrides {save {}}} {
start_server {overrides {save {}}} {
    set node_0 [srv 0 client]
    set node_0_host [srv 0 host]
    set node_..."
    ("uplevel" body line 2)
    invoked from within
"uplevel 1 $code "
    (procedure "start_server" line 3)
    invoked from within
"start_server {tags {"failover external:skip"} overrides {save {}}} {
start_server {overrides {save {}}} {
start_server {overrides {save {}}} {
    set..."
    (file "tests/integration/failover.tcl" line 1)
    invoked from within
"source $path"
    (procedure "execute_test_file" line 4)
    invoked from within
"execute_test_file $data"
    (procedure "test_client_main" line 10)
    invoked from within
"test_client_main $::test_server_port "
sock562e4262fbd0 => (IN PROGRESS) Active defrag big keys

@madolson
Copy link
Member

madolson commented Jan 7, 2025

Still waiting on the build changes that I think are needed to make sure the build workflow works as expected.

@hpatro
Copy link
Collaborator Author

hpatro commented Jan 7, 2025

Still waiting on the build changes that I think are needed to make sure the build workflow works as expected.

I was trying to sync the .github directory but seems like it has been outdated for quite some time. Is there any particular workflow changes which is essential as per you @madolson ?

@madolson
Copy link
Member

madolson commented Jan 7, 2025

I was trying to sync the .github directory but seems like it has been outdated for quite some time. Is there any particular workflow changes which is essential as per you @madolson ?

The github directory between unstable and the release branches are different afaik. I also think we changed the secrets to do the upload, so I don't think the previous workflow would work. You can check the delta between 8.0 and 7.2. I can try releasing 8.0 and see how that works.

00-RELEASENOTES Outdated
@@ -10,6 +10,26 @@ HIGH: There is a critical bug that may affect a subset of users. Upgrade!
CRITICAL: There is a critical bug affecting MOST USERS. Upgrade ASAP.
SECURITY: There are security fixes in the release.
--------------------------------------------------------------------------------
================================================================================
Valkey 7.2.8 - Released Mon 06 Jan 2025
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls update the date to released date after it is approved.

@hwware
Copy link
Member

hwware commented Jan 7, 2025

Test failures:

00:12:05> Instance #5 is still a slave after some time (no failover): FAILED: Expected [RI 5 role] eq {master} (context: type eval line 3 cmd {assert {[RI 5 role] eq {master}}} proc ::test)
(Jumping to next unit after error)
FAILED: caught an error in the test 
assertion:Expected [RI 5 role] eq {master} (context: type eval line 3 cmd {assert {[RI 5 role] eq {master}}} proc ::test)
    while executing
"error "assertion:Expected [uplevel 1 [list subst -nocommands $condition]] $context""
    (procedure "assert" line 4)
    invoked from within
"assert {[RI 5 role] eq {master}}"
    ("uplevel" body line 3)
    invoked from within
"uplevel 1 $code"
Testing unit: 11-manual-takeover.tcl
00:12:10> (init) Restart killed instances: OK
00:12:10> Cluster nodes are reachable: FAILED: Node #0 keeps replying '' to PING.
(Jumping to next unit after error)
FAILED: caught an error in the test 
assertion:Node #0 keeps replying '' to PING.
    while executing
"foreach_redis_id id {
        # Every node should be reachable.
        wait_for_condition 1000 50 {
            ([catch {R $id ping} ping_reply] == 0..."
    ("uplevel" body line 2)
    invoked from within
"uplevel 1 $code"
I/O error reading reply
    while executing
"$r set [expr rand()] [expr rand()]"
    (procedure "gen_write_load" line 7)
    invoked from within
"gen_write_load [lindex $argv 0] [lindex $argv 1] [lindex $argv 2] [lindex $argv 3]"
    (file "tests/helpers/gen_write_load.tcl" line 18)
[exception]: Executing test client: ERR FAILOVER target replica is not online..
ERR FAILOVER target replica is not online.
    while executing
"$node_0 failover to $node_1_host $node_1_port"
    ("uplevel" body line 16)
    invoked from within
"uplevel 1 $code"
    (procedure "test" line 58)
    invoked from within
"test {failover command to specific replica works} {
        set initial_psyncs [s -1 sync_partial_ok]
        set initial_syncs [s -1 sync_full]

    ..."
    ("uplevel" body line 61)
    invoked from within
"uplevel 1 $code "
    (procedure "start_server" line 3)
    invoked from within
"start_server {overrides {save {}}} {
    set node_0 [srv 0 client]
    set node_0_host [srv 0 host]
    set node_0_port [srv 0 port]
    set node_0_pi..."
    ("uplevel" body line 2)
    invoked from within
"uplevel 1 $code "
    (procedure "start_server" line 3)
    invoked from within
"start_server {overrides {save {}}} {
start_server {overrides {save {}}} {
    set node_0 [srv 0 client]
    set node_0_host [srv 0 host]
    set node_..."
    ("uplevel" body line 2)
    invoked from within
"uplevel 1 $code "
    (procedure "start_server" line 3)
    invoked from within
"start_server {tags {"failover external:skip"} overrides {save {}}} {
start_server {overrides {save {}}} {
start_server {overrides {save {}}} {
    set..."
    (file "tests/integration/failover.tcl" line 1)
    invoked from within
"source $path"
    (procedure "execute_test_file" line 4)
    invoked from within
"execute_test_file $data"
    (procedure "test_client_main" line 10)
    invoked from within
"test_client_main $::test_server_port "
sock562e4262fbd0 => (IN PROGRESS) Active defrag big keys

Do we have chance to pass them on Ubuntu 22?

bjosv and others added 15 commits January 7, 2025 21:43
This sets the default permission for current CI workflows to only be
able to read from the repository (scope: "contents").
When a used Github Action require additional permissions (like CodeQL)
we grant that permission on job-level instead.

This means that a compromised action will not be able to modify the repo
or even steal secrets since all other permission-scopes are implicit set
to "none", i.e. not permitted. This is recommended by
[OpenSSF](https://github.com/ossf/scorecard/blob/main/docs/checks.md#token-permissions).

This PR includes a small fix for the possibility of missing server logs
artifacts, found while verifying the permission.
The `upload-artifact@v3` action will replace artifacts which already
exists. Since both CI-jobs `test-external-standalone` and
`test-external-nodebug` uses the same artifact name, when both jobs
fail, we only get logs from the last finished job. This can be avoided
by using unique artifact names.

This PR is part of valkey-io#211

More about permissions and scope can be found here:

https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#permissions

---------

Signed-off-by: Björn Svensson <[email protected]>
This PR introduces Codecov to automate code coverage tracking for our
project's tests.

For more information about the Codecov platform, please refer to
https://docs.codecov.com/docs/quick-start

---------

Signed-off-by: Vitah Lin <[email protected]>
Co-authored-by: Madelyn Olson <[email protected]>
Pin the Github Action dependencies to the hash according to secure
software development best practices
recommended by the Open Source Security Foundation (OpenSSF).

When developing a CI workflow, it's common to version-pin dependencies
(i.e. actions/checkout@v4). However, version tags are mutable, so a
malicious attacker could overwrite a version tag to point to a malicious
or vulnerable commit instead.
Pinning workflow dependencies by hash ensures the dependency is
immutable and its behavior is guaranteed.
See
https://github.com/ossf/scorecard/blob/main/docs/checks.md#pinned-dependencies

The `dependabot` supports updating a hash and the version comment so its
update will continue to work as before.

Links to used actions and theit tag/hash for review/validation:
https://github.com/actions/checkout/tags    (v4.1.2 was rolled back)
https://github.com/github/codeql-action/tags
https://github.com/maxim-lobanov/setup-xcode/tags
https://github.com/cross-platform-actions/action/releases/tag/v0.22.0
https://github.com/py-actions/py-dependency-install/tags
https://github.com/actions/upload-artifact/tags
https://github.com/actions/setup-node/tags
https://github.com/taiki-e/install-action/releases/tag/v2.32.2

This PR is part of valkey-io#211.

Signed-off-by: Björn Svensson <[email protected]>
Adds a job that will automatically run at the end of the daily, which
will collect all the failed tests and send them to the developer slack.
It will include a link to the job as well.

Example job that ran on my private repo:
https://github.com/madolson/valkey/actions/runs/9123245899/job/25085418567

Example notification:
<img width="662" alt="image"
src="https://github.com/valkey-io/valkey/assets/34459052/69127db4-e416-4321-bc06-eefcecab1130">
(Note: I removed the sassy text at the bottom from the PR)

Signed-off-by: Madelyn Olson <[email protected]>
Deprecate MacOS 11 build target. End of life June 2024.  Fixes valkey-io#523

---------

Signed-off-by: Siddhartha Mondal <[email protected]>
Co-authored-by: Madelyn Olson <[email protected]>
Co-authored-by: Roshan Khatri <[email protected]>
replace centos 7 with almalinux 8, add almalinux 9, centos stream 9, fedora stable, rawhide

Fixes valkey-io#527

---------

Signed-off-by: Jonathan Wright <[email protected]>
Signed-off-by: Madelyn Olson <[email protected]>
Co-authored-by: Madelyn Olson <[email protected]>
Few CI improvements witch will reduce occupation CI queue and eliminate
stale runs.

1. Kill CI jobs on PRs once PR branch gets a new push. This will prevent
situation happened today - a huge job triggered twice in less than an
hour and occupied all **org** (for all repositories) runners queue for
the rest of the day (see pic). This completely blocked valkey-glide
team.
2. Distribute nightly croned jobs on time to prevent them running
together. Keep in mind, cron's TZ is UTC, so midnight tasks incur
developers located in other timezones.

This must be backported to all release branches (`valkey-x.y` and `x.y`)

![image](https://github.com/user-attachments/assets/923d8237-3cb7-42f5-80c8-5322b3f5187d)

---------

Signed-off-by: Yury-Fridlyand <[email protected]>
In valkey-io#786, we did skip it in the daily, but not for the others.
When running ./runtest on MacOS, we will get the failure.
```
couldn't open socket: host is unreachable (nodename nor servname provided, or not known)
```

The reason is that TCL 8.5 doesn't support ipv6, so we skip tests
tagged with ipv6. This also revert valkey-io#786.

Signed-off-by: Binbin <[email protected]>
The CI job was introduced in valkey-io#1363, we should skip it in forks.

Signed-off-by: Binbin <[email protected]>
…ccess_key` (valkey-io#1363)

This PR fixes valkey-io#1346 where we can get rid of the long term credentials by
using OpenID Connect. OpenID Connect (OIDC) allows your GitHub Actions
workflows to access resources in Amazon Web Services (AWS), without
needing to store the AWS credentials as long-lived GitHub secrets.

---------

Signed-off-by: vudiep411 <[email protected]>
We have set the secret as `AWS_S3_TEST_BUCKET` for test bucket and I
missed it in the initial review.

Signed-off-by: Roshan Khatri <[email protected]>
Introduced in valkey-io#1363, the file name does not match.

Signed-off-by: Binbin <[email protected]>
- Moves `build-config.json` to workflow dir to build old versions with
new configs.
- Enables contributors to test release Wf on private repo by adding
`github.event_name == 'workflow_dispatch' ||`

---------

Signed-off-by: Roshan Khatri <[email protected]>
Our fortify workflow is running on ubuntu lunar container that is EOL
since [January 25, 2024(January 25,
2024](https://lists.ubuntu.com/archives/ubuntu-announce/2024-January/000298.html).
This case cause the workflow to fail during update actions like:
```
apt-get update && apt-get install -y make gcc-13
  update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-1[3](https://github.com/valkey-io/valkey/actions/runs/12021130026/job/33547460209#step:5:3) 100
  make all-with-unit-tests CC=gcc OPT=-O3 SERVER_CFLAGS='-Werror -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3'
  shell: sh -e {0}
Ign:1 http://security.ubuntu.com/ubuntu lunar-security InRelease
Err:2 http://security.ubuntu.com/ubuntu lunar-security Release
  [4](https://github.com/valkey-io/valkey/actions/runs/12021130026/job/33547460209#step:5:4)04  Not Found [IP: 91.189.91.82 80]
Ign:3 http://archive.ubuntu.com/ubuntu lunar InRelease
Ign:4 http://archive.ubuntu.com/ubuntu lunar-updates InRelease
Ign:[5](https://github.com/valkey-io/valkey/actions/runs/12021130026/job/33547460209#step:5:5) http://archive.ubuntu.com/ubuntu lunar-backports InRelease
Err:[6](https://github.com/valkey-io/valkey/actions/runs/12021130026/job/33547460209#step:5:7) http://archive.ubuntu.com/ubuntu lunar Release
  404  Not Found [IP: 185.125.190.81 80]
Err:7 http://archive.ubuntu.com/ubuntu lunar-updates Release
  404  Not Found [IP: 185.125.190.81 80]
Err:8 http://archive.ubuntu.com/ubuntu lunar-backports Release
  404  Not Found [IP: 185.125.190.81 80]
Reading package lists...
E: The repository 'http://security.ubuntu.com/ubuntu lunar-security Release' does not have a Release file.
E: The repository 'http://archive.ubuntu.com/ubuntu lunar Release' does not have a Release file.
E: The repository 'http://archive.ubuntu.com/ubuntu lunar-updates Release' does not have a Release file.
E: The repository 'http://archive.ubuntu.com/ubuntu lunar-backports Release' does not have a Release file.
update-alternatives: error: alternative path /usr/bin/gcc-[13](https://github.com/valkey-io/valkey/actions/runs/12021130026/job/33547460209#step:5:14) doesn't exist
Error: Process completed with exit code 2.
```

example:
https://github.com/valkey-io/valkey/actions/runs/12021130026/job/33547460209

This pr uses the latest stable ubuntu image release
[plucky](https://hub.docker.com/layers/library/ubuntu/plucky/images/sha256-dc4565c7636f006c26d54c988faae576465e825ea349fef6fd3af6bf5100e8b6?context=explore)

Signed-off-by: Ran Shidlansik <[email protected]>
@hpatro
Copy link
Collaborator Author

hpatro commented Jan 7, 2025

I've pushed the latest workflow changes.

@hpatro
Copy link
Collaborator Author

hpatro commented Jan 7, 2025

Do we have chance to pass them on Ubuntu 22?

With the workflow changes, it should now run on Ubuntu plucky, will monitor the runs.

Copy link
Member

@madolson madolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actual functional changes look good to me. Not super sure about the build tooling though.

@madolson
Copy link
Member

madolson commented Jan 8, 2025

If and when you do merge, remember to do a rebase and not a squash.

00-RELEASENOTES Outdated Show resolved Hide resolved
Co-authored-by: Viktor Söderqvist <[email protected]>
Signed-off-by: Harkrishn Patro <[email protected]>
@hpatro hpatro merged commit f685206 into valkey-io:7.2 Jan 8, 2025
50 of 56 checks passed
Copy link

codecov bot commented Jan 8, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (7.2@357191a). Learn more about missing BASE report.

Additional details and impacted files
@@          Coverage Diff          @@
##             7.2   #1516   +/-   ##
=====================================
  Coverage       ?       0           
=====================================
  Files          ?       0           
  Lines          ?       0           
  Branches       ?       0           
=====================================
  Hits           ?       0           
  Misses         ?       0           
  Partials       ?       0           

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.