-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
multipath-tools 0.11.0 #104
base: master
Are you sure you want to change the base?
Conversation
The @v1 releases are deprecated. https://github.blog/changelog/2024-02-13-deprecation-notice-v1-and-v2-of-the-artifact-actions/ Signed-off-by: Martin Wilck <[email protected]>
Signed-off-by: Martin Wilck <[email protected]>
We can't use "container:" any more because upload-artifact@v4 doesn't work on Debian Jessie. Signed-off-by: Martin Wilck <[email protected]>
Running "make" in a container and "make clean" outside doesn't work (access right issues). So just use separate jobs. Signed-off-by: Martin Wilck <[email protected]>
Avoid "text file busy" error on GitHub. dmevents-test: Text file busy Makefile:74: recipe for target 'dmevents.out' failed Signed-off-by: Martin Wilck <[email protected]>
checker_check() is now a void function that stores the path state in the checker struct. It can be retrieved later using checker_get_state(). Right now, this is called immediately after checker_check(), but in the future, it can be deferred to another time. Reviewed-by: Martin Wilck <[email protected]> Signed-off-by: Benjamin Marzinski <[email protected]>
Reviewed-by: Martin Wilck <[email protected]> Signed-off-by: Benjamin Marzinski <[email protected]>
This patch adds a new optional symbol for the dynamic path checker libraries, libcheck_pending. This is currently unused, but can be called on pending checkers to check if they've completed and return their value. The "tur" and "directio" checkers are the only ones which can return PATH_PENDING. They now implement libcheck_pending() as a wrapper around the code that libcheck_check uses to wait for pending paths, which has been broken out into its own function. Reviewed-by: Martin Wilck <[email protected]> Signed-off-by: Benjamin Marzinski <[email protected]>
When the tur and directio checkers start an asynchronous checker, they now immediately return with the path in PATH_PENDING, instead of waiting in checker_check(). Instead the wait now happens in checker_get_state(). Additionally, the directio checker now waits for 1 ms, like the tur checker does. Also like the tur checker it now only waits once. If it is still pending after the first call to checker_get_state(). It will not wait at all on future calls, and will just process the already completed IOs. Reviewed-by: Martin Wilck <[email protected]> Signed-off-by: Benjamin Marzinski <[email protected]>
Make the directio tests work with libcheck_pending() being separate from libcheck_check Reviewed-by: Martin Wilck <[email protected]> Signed-off-by: Benjamin Marzinski <[email protected]>
get_state() is now split into start_checker(), which runs the checker but doesn't wait for async checkers, and get_state(), which returns the state from the checker, after waiting for async checkers if necessary. Reviewed-by: Martin Wilck <[email protected]> Signed-off-by: Benjamin Marzinski <[email protected]>
Instead of pp->offline being a binary value, change it to show the actual result of looking at the sysfs state, and rename it to pp->sysfs_state. Also change the function name from path_offline() to path_sysfs_state(). Adapt the tests for pp->offline. This should not change how multipath currently works. pp->sysfs_state will be used in future patches. Reviewed-by: Martin Wilck <[email protected]> Signed-off-by: Benjamin Marzinski <[email protected]>
check_path_state() is now split into start_path_check(), which calls path_sysfs_state() and if the path is up also calls start_checker(), and get_new_state() which gets the new state from either pp->sysfs_state or get_state(). Reviewed-by: Martin Wilck <[email protected]> Signed-off-by: Benjamin Marzinski <[email protected]>
Move the code that starts the path checker from do_check_path() into check_path(), rename the remainder of do_check_path() to update_path_state() and call that from check_path(). Signed-off-by: Benjamin Marzinski <[email protected]> Reviewed-by: Martin Wilck <[email protected]>
Split out the code that updates a path's state and sets up the next check time into its own function, update_path(). Signed-off-by: Benjamin Marzinski <[email protected]> Reviewed-by: Martin Wilck <[email protected]>
Split handle_uninitialized_path() into check_uninitialized_path, which handles udev retriggers for INIT_MISSING_UDEV paths and starts the path checker for INIT_FAILED and INIT_NEW paths, and update_uninitialized_path() which gets the path checker result and reruns pathinfo if the path is up. Signed-off-by: Benjamin Marzinski <[email protected]> Reviewed-by: Martin Wilck <[email protected]>
Instead of checking and updating each path, the checkerloop now starts the checkers on all the paths in check_paths(), and then goes back and updates all the paths in update_paths(). Since the async checkers use an absolute time to wait for before returning PATH_PENDING, only one checker actually needs to be waited for in update_paths(). The rest will already have reached their timeout when update_path() is called for them. The check_paths() and update_paths() loop over the pathvec instead of looping through the multipath device paths to avoid having to restart checking of the device paths when the multipath device needs to be resynced while updating the paths. Signed-off-by: Benjamin Marzinski <[email protected]> Reviewed-by: Martin Wilck <[email protected]>
Now that multipathd can drop the vecs lock and sleep in-between when the checker runs and when the path gets updated, it is possible for the user to either fail or reinstate the path in this window. If a path gets manually failed in the window between when the checker starts and the path is updated, the checker will have already been started, on the path, and if multipathd read the results like normal, it could simply reinstate the path. To avoid this, when a path checker is disabled, the checker path_state and message are updated, just like they are when checker_check() is run on a disabled path, so that when checker_get_state() is called, it will always return the same results for a disabled checker, regardless of when it was disabled. Reinstating the path doesn't cause many problems, but still can be improved. Since the checker was disabled when it would have been started, it didn't run during this cycle, only the kernel state will get updated. The rest of the path update changes won't happen until the next time the checker runs. This is the case regardless of whether or not the path was reinstated in the window between when the checker starts and the path is updated. To make reinstated paths get updated sooner, pp->tick is now set to 1 when the path is reinstated. Reviewed-by: Martin Wilck <[email protected]> Signed-off-by: Benjamin Marzinski <[email protected]>
Instead of updating the path priorities and possibly reloading the multipath device when each path is updated, wait till all paths have been updated, and then go through the multipath devices updating the priorities once, reloading if necessary. Signed-off-by: Benjamin Marzinski <[email protected]> Reviewed-by: Martin Wilck <[email protected]>
The code needs to check that pp->pgindex equals pp->mpp->bestpg and that they aren't both 0 (which is an invalid pathgroup id). Since we're already checking that they are equal, we only need to check one of them to see if it's non-zero. Instead of checking if pp->pgindex is non-zero for every path, just check mpp->bestpg once for the multipath device and return immediately if it's zero. Reviewed-by: Martin Wilck <[email protected]> Signed-off-by: Benjamin Marzinski <[email protected]>
The way that multipathd was handling priority refreshes had some issues. Some of the multipath prioritizers can hang when run on a failed path. Multipathd was only skipping paths in PATH_DOWN, but there are other states where the prioritizer is also likely to hang, such as PATH_TIMEOUT, PATH_SHAKY, and to a lesser extent, PATH_DELAYED. Also, before the recent patch splitting the priority updating from the path checking, multipathd wasn't consistent with which states would cause paths to get their priorities updated. If a path changed its state to anything other than PATH_UP or PATH_GHOST, it wouldn't get its priority updated. But if a path kept the same state its priority would get updated as long at the state wasn't PATH_DOWN. For safety's sake, a path's priority should only get refreshed when its in the PATH_UP or PATH_GHOST state. This shouldn't cause problems. Only paths that are in the PATH_UP or PATH_GHOST state are usable by the kenel and contibute to the pathgroup's priority. Reviewed-by: Martin Wilck <[email protected]> Signed-off-by: Benjamin Marzinski <[email protected]>
Since we just returned if newstate wasn't PATH_UP or PATH_GHOST, it must obviously be PATH_UP or PATH_GHOST at this point in the code. We don't even wrap all the code for dealing with a path that just came up in this if-block, It's only the persistent resrvation code. Reviewed-by: Martin Wilck <[email protected]> Signed-off-by: Benjamin Marzinski <[email protected]>
If reload_and_sync_map() removes the multipath device, deferred_failback_tick() needs to decrement the counter so that it doesn't skip the following device. Reviewed-by: Martin Wilck <[email protected]> Signed-off-by: Benjamin Marzinski <[email protected]>
Add a new optional checker class function, libcheck_need_wait() and a new function to call it, checker_need_wait(). This can be used to see if a path_checker is currently running. This will be used to determine if there are pending checkers that need to be waited for. Signed-off-by: Benjamin Marzinski <[email protected]> Reviewed-by: Martin Wilck <[email protected]>
Disable waiting in the libcheck_pending() checker class functions. Instead, they now just check if the checker has already completed. A future patch will re-add waiting outside of libcheck_pending(). Reviewed-by: Martin Wilck <[email protected]> Signed-off-by: Benjamin Marzinski <[email protected]>
Multipath again waits for running checkers to complete. In pathinfo(), mutipath will just wait 1ms if the checker is running. In checkerloop(), if there are running checkers, multipathd will drop the vecs lock and wait for 5ms. The difference it wait times is because multipathd cannot drop the vecs lock in pathinfo. Reviewed-by: Martin Wilck <[email protected]> Signed-off-by: Benjamin Marzinski <[email protected]>
Make the directio tests work with no waiting in the libcheck_pending() and the new libcheck_need_wait() call. Reviewed-by: Martin Wilck <[email protected]> Signed-off-by: Benjamin Marzinski <[email protected]>
Useless, automatically handled by the kernel since 4.3. (4.3 was released nine years ago) Cc: Martin Wilck <[email protected]> Cc: Benjamin Marzinski <[email protected]> Cc: Christophe Varoqui <[email protected]> Cc: DM-DEVEL ML <[email protected]> Signed-off-by: Xose Vazquez Perez <[email protected]> Reviewed-by: Martin Wilck <[email protected]>
We import DM_COLDPLUG_SUSPENDED in all code flows below mpath_coldplug_end. Clarify this in the code. Signed-off-by: Martin Wilck <[email protected]> Reviewed-by: Benjamin Marzinski <[email protected]>
Since b22c273 ("11-dm-mpath.rules: Don't force activation while device is suspended"), we've handled the case where a device is suspended while an uevent is processed (e.g. because multipathd is reloading the map again at the same time). But we were missing the case where The device had never been initialized before. If .MPATH_DEVICE_READY_OLD was empty, we'd jump to scan_import without setting MPATH_DEVICE_READY to 0. This can cause a device not to be fully activated at boot time, because in follow-up uevents we'd assume that the device had already been set up. Treat the case in which an uevent is processed for a previously not fully set-up, suspended device like other situations where we set MPATH_DEVICE_READY to 0. Fixes: b22c273 ("11-dm-mpath.rules: Don't force activation while device is suspended") Reviewed-by: Benjamin Marzinski <[email protected]>
Typo at: https://github.com/openSUSE/multipath-tools/blob/4c3ade25dcbdc296bd74ca2ac971610b7ef05e12/README.md?plain=1#L37 |
We need one more fix, for systemd/systemd#35135. I'll add this asap. |
Added some reviewed patches. I am waiting for the final version of @bmarzins' patch set for fixing treatment of empty maps. When that's done, I'll rebase this PR. |
Reported by coverity: "i--" may cause an underflow, which will again cause an overflow when the loop continues. Use a signed int for loops like this to make coverity happy. Signed-off-by: Martin Wilck <[email protected]>
Signed-off-by: Martin Wilck <[email protected]>
The ABI should never change on a stable branch. This workflow asserts that. Signed-off-by: Martin Wilck <[email protected]>
In libmp_mapinfo__(), if is_mpath_part_uuid() succeeded, instead of ending the 'if' statement, is_mpath_uuid() would be called because of the OR operator. This would always fail if is_mpath_part_uuid() passed. This meant that libmp_mapinfo__() could never match partitions with MAPINFO_CHECK_UUID. Fix that by not calling is_mpath_uuid() if MAPINFO_PART_ONLY is set. Fixes: c1aa028 ("libmultipath: make MAPINFO_CHECK_UUID work with partitions") Signed-off-by: Benjamin Marzinski <[email protected]> Reviewed-by: Martin Wilck <[email protected]>
Only multipathd needs to take care of notifying systemd. There's no need to track this information in struct config, or to limit our checker interval to it, as checkerloop() wakes up every second anyway. While at it, fix the watchdog enablement logic: - the watchdog should only be active if WATCHDOG_PID is either unset, or matches the daemon's PID, and if WATCHDOG_USEC is not 0. - the watchdog should trigger twice per systemd-set interval. - if WatchdogSec= is set to an unreasonable value, make a smarter choice than just disabling the watchdog, and print a more meaningful error message. Use timestamp comparison to make sure the watchdog is triggered even if a checkerloop iteration takes more than a second. Fixes: 9366cfb ("multipathd: Implement systemd watchdog integration") Signed-off-by: Martin Wilck <[email protected]> Reviewed-by: Benjamin Marzinski <[email protected]>
Older versions of checkout cause deprecation warnings. Signed-off-by: Martin Wilck <[email protected]>
if libmp_mapinfo() is run on a device that has no active table, it will now return with a new exit code, DMP_EMPTY, to signal this. Currently all code paths treat this identically to DMP_NOT_FOUND. Signed-off-by: Benjamin Marzinski <[email protected]> Reviewed-by: Martin Wilck <[email protected]>
Add tests for the check for empty table introduced in ("libmultipath: signal device with no table in libmp_mapinfo"). Signed-off-by: Martin Wilck <[email protected]> Signed-off-by: Benjamin Marzinski <[email protected]>
dm_flush_nap_nosync() only removes a device if it has a multipath table. On failed removes, there is no table, so this function does nothing. Instead, if libmp_mapinfo() returns DMP_EMPTY, then there is an empty map, and it is removed using dm_device_remove(). Also, since there are no longer any callers of dm_flush_map_nosync(), remove it. Signed-off-by: Benjamin Marzinski <[email protected]> Reviewed-by: Martin Wilck <[email protected]>
Instead of only setting the uuid, name, and dmi on DMP_OK, return them whenever we find a device, even if that device doesn't match. To make the code simpler, we set these values before checks that could possibly return DMP_ERR. The only caller of libmp_mapinfo() that could update a multipath value incorrectly because of this is update_multipath_table(). Fix that. Signed-off-by: Benjamin Marzinski <[email protected]> Reviewed-by: Martin Wilck <[email protected]>
This function is only supposed to work on multipath devices (it adds the multipath uuid prefix to the passed in wwid). Check this, and adapt cli_add_map() to handle the corner case where there is a non-multipath device with a multipath uuid. Signed-off-by: Benjamin Marzinski <[email protected]> Reviewed-by: Martin Wilck <[email protected]>
Multipath already notices if there is any empty device with the same alias and wwid as a device it is trying to create, and switches to a reload. However, if the empty device has a different alias but the same wwid, instead of reloading and renaming the device like it would for a device with a table. multipath will attempt to create a new device and fail. Fix this by checking for these devices, and doing a reload and rename. Signed-off-by: Benjamin Marzinski <[email protected]> Reviewed-by: Martin Wilck <[email protected]>
Make dm_flush_map__() remove empty devices, as long as they have valid multipath uuids. Signed-off-by: Benjamin Marzinski <[email protected]> Reviewed-by: Martin Wilck <[email protected]>
Move the code to check if a device is a valid partition of a multipath device out into its own function. A future patch will make this code more complicated. No functional changes. Signed-off-by: Benjamin Marzinski <[email protected]> Reviewed-by: Martin Wilck <[email protected]>
cancel_remove_partmap() and rename_partmap() both could end up calling do_foreach_partmaps() with the partition instead of the original multipath device. This won't ever do anything, since do_foreach_partmaps() checks that it's called with a multipath device. Signed-off-by: Benjamin Marzinski <[email protected]> Reviewed-by: Martin Wilck <[email protected]>
If dm_get_opencount() fails, open_count will be negative, which means mpath_in_use() will never fail. However, dm_flush_map__() will eventually fail. There are multiple callers of dm_get_opencount() that run when trying to remove a device. All the others treat a failed call as if the device was in use, including one that will later be called directly on this device. So save ourselves the work, and exit early Signed-off-by: Benjamin Marzinski <[email protected]> Reviewed-by: Martin Wilck <[email protected]>
If the partition is empty, is_valid_partmap() only checks that the uuid matches the multipath device, since there is no table to check for mapping over the multipath device. The only partmap_func that needs changes to handle empty devices is count_partitions(), which should only count a partition if it has a table (since otherwise it will not have the multipath device open). Signed-off-by: Benjamin Marzinski <[email protected]> Reviewed-by: Martin Wilck <[email protected]>
multipath makes many calls just to check if a device exists. Don't warn on each of these. Signed-off-by: Benjamin Marzinski <[email protected]> Reviewed-by: Martin Wilck <[email protected]>
This comment has been minimized.
This comment has been minimized.
Force-pushed again. I removed the version bump for now because I want to have it on top of the queue and I've needed multiple force-pushes already. |
Signed-off-by: Martin Wilck <[email protected]>
Based on: https://github.com/SCST-project/scst/blob/master/scst/README#L1905-L1908 Cc: Gleb Chesnokov <[email protected]> Cc: Bart Van Assche <[email protected]> Cc: Vladislav Bolkhovitin <[email protected]> Cc: Martin Wilck <[email protected]> Cc: Benjamin Marzinski <[email protected]> Cc: Christophe Varoqui <[email protected]> Cc: DM-DEVEL ML <[email protected]> Signed-off-by: Xose Vazquez Perez <[email protected]> Reviewed-by: Benjamin Marzinski <[email protected]>
https://support.hpe.com/connect/s/product?kmpmoid=1014856412 Just guessing, confirmation from manufacturer is needed. Cc: Jon Paul <[email protected]> Cc: Martin Wilck <[email protected]> Cc: Benjamin Marzinski <[email protected]> Cc: Christophe Varoqui <[email protected]> Cc: DM-DEVEL ML <[email protected]> Signed-off-by: Xose Vazquez Perez <[email protected]> Acked-by: Jon Paul <[email protected]>
Stable branches and new versioning scheme
Beginning with 0.11, the second digit in the multipath-tools version will be
used for upstream "major" releases. The 3rd and last digit will denote stable
releases in the future, containing only bug fixes on top of the last major
release. These bug fixes will be tracked in stable branches.
See README.md for additional information.
multipath-tools 0.11.0, 2024/11
User-visible changes
multipathd.service
such that multipathd will nowrestart after a failure or crash.
Fixes #100.
Other major changes
Rework of the path checking algorithm
This is a continuation of the checker-related work that went into 0.10.0. For
asynchronous checker algorithms (i.e. tur and directio), the start of the
check and the retrieval of the results are done at separate points in time,
which reduces the time for waiting for the checker results of individual paths
and thus strongly improves the performance of the checker algorithm, in
particular on systems with a large a amount of paths.
The algorithm has the following additional properties:
they have not paths
loop, the multipath device will only get synchronized with the kernel once.
the same time (at least as much as their differing checker intervals will
allow).
out as much as possible so multipathd doesn't do all of it's checking in
a burst.
the uninitialized paths are handled after all the adopted paths are
handled).
Bug fixes
of device mapper devices, in particular devices with multiple DM targets.
The problem was introduced in 0.10.0.
Fixes #102.
activated during boot if a cold plug uevent is processed for a previously
not configured multipath map while this map was suspended. This problem existed
since 0.9.8.
no_path_retry fail
and no settingfor
dev_loss_tmo
might get thedev_loss_tmo
set to 0, causing thedevice to be deleted immediately in the event of a transport disruption.
This bug was introduced in 0.9.6.
(
failback
value > 0 inmultipath.conf
), some maps might fail back laterthan configured. The problem existed since 0.9.6.
WATCHDOG_USEC
environment variable had the value "0", which means that the watchdog is simply disabled. This (minor) problem existed since 0.4.9.but don't contain a device-mapper table. Such maps can occur in very rare
cases if some device-mapper operation has failed, or if a tool has been
killed in the process of map creation. multipathd will now detect such
cases, and either remove these maps or reload them as appropriate.
matching UUID and matching type was already present. multipathd
previously failed to set up such maps. Now it will reload them with the
correct content.
Other
hardware_handler
have been removed from theinternal hardware table. These settings have been obsoleted by the Linux
kernel 4.3 and later, which automatically selects hardware handlers when
SCSI devices are added. See the notes about
SCSI_DH_MODULES_PRELOAD
inREADME.md.
Free Software Foundation.
Internal
7
libmp_mapinfo()
now fills in thename
,uuid
, anddmi
fieldsif requested by the caller, even if it encounters an error or an empty map.
Shortlog
@bmarzins (40):
libmultipath: store checker_check() result in checker struct
libmultipath: add missing checker function prototypes
libmultipath: split out the code to wait for pending checkers
libmultipath: remove pending wait code from libcheck_check calls
multipath-tools tests: fix up directio tests
libmultipath: split get_state into two functions
libmultipath: change path_offline to path_sysfs_state
multipathd: split check_path_state into two functions
multipathd: rename do_check_path to update_path_state
multipathd: split check_path into two functions
multipathd: split handle_uninitialized_path into two functions
multipathd: split check_paths into two functions
multipathd: fix "fail path" and "reinstate path" commands
multipathd: update priority once after updating all paths
multipathd: simplify checking for followover_should_failback
multipathd: only refresh prios on PATH_UP and PATH_GHOST
multipathd: remove pointless check
multipathd: fix deferred_failback_tick for reload removes
libmultipath: add libcheck_need_wait checker function
libmultipath: don't wait in libcheck_pending
multipathd: wait for checkers to complete
multipath-tools tests: fix up directio tests
multipathd: checker port_state before setting it.
multipathd: use timestamps to tell when the directio checker timed out
libmultipath: check DM UUID earlier in libmp_mapinfo__
libmultipath: use MAPINFO_CHECK_UUID in dm_get_multipath
multipathd: print an error when failing to connect to multipathd
multipathd.service: restart multipathd on failure
libmultipath: Fix MAPINFO_CHECK_UUID with partitions
libmultipath: signal device with no table in libmp_mapinfo
libmultipath: fix removing device after failed creation
libmultipath: set uuid, name, and dmi if a device is found
libmultipath: check table type in dm_find_map_by_wwid
libmultipath: handle a create corner case for empty devices
libmultipath: handle empty maps in dm_flush_map__
libmultipath: factor out code to check if a device is a partition
libmultipath: remove recursive calls in partmap_funcs
libmultipath: assume device is in use if dm_get_opencount fails
libmultipath: accept empty partitions as valid in do_foreach_partmaps
libmultipath: reduce log level for DMP_NOT_FOUND
@mwilck (32):
GitHub workflows: use {upload,download}-artifact@v4
GitHub workflows: update dawidd6/action-download-artifact
Github workflows: native.yml: use mosteo-actions/docker-run
GitHub workflows: native.yml: use extra job for clang
GitHub workflows: native.yaml: make test and archive separately
11-dm-mpath.rules.in: import DM_COLDPLUG_SUSPENDED only once
11-dm-mpath.rules.in: handle inactive suspended devices correctly
11-dm-mpath.rules.in: clarify DM_ACTIVATION logic
11-dm-mpath-rules.in: skip one .DM_NOSCAN check
11-dm-mpath.rules.in: set .DM_NOSCAN if MPATH_UNCHANGED is set
libmpathutil: avoid -Wcast-function-type-mismatch error with clang 19
multipath-tools tests: actually sleep in directio test
libmultipath: dm_get_maps(): don't bail out for single-map failures
libmultipath: libmp_mapinfo(): return DMP_NO_MATCH for multi-target maps
libmultipath: make MAPINFO_CHECK_UUID work with partitions
libmultipath: check map UUID in do_foreach_partmaps
libmultipath: increase log level for removing partitions
libmultipath: reduce log level of libmp_mapinfo() messages
libmultipath: don't log boring state messages at level 3
libmultipath: don't set dev_loss_tmo to 0 for NO_PATH_RETRY_FAIL
README.md: mention NEWS.md in Changelog section
NEWS.md: add section for 0.11.0
README.md: mention stable branches
NEWS.md: mention stable branches
libmultipath, libmpathpersist: bump ABI versions
multipathd: fix an unsigned int ovwerflow
GitHub workflows: enable unit tests for stable branches
GitHub Workflows: add abi check for stable branches
multipathd: move systemd watchdog handling into daemon
GitHub workflows: update checkout to @v4
multipath-tools tests: fix mapinfo tests
NEWS.md: describe latest changes
@xosevp (4):
multipath-tools: remove hwhandler from all hwtable configs
multipath-tools: hwtable housekeeping
multipath-tools: update comments in hwtable
multipath-tools: refresh fsf licences