Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multipath-tools 0.11.0 #104

Merged
merged 82 commits into from
Jan 23, 2025
Merged

multipath-tools 0.11.0 #104

merged 82 commits into from
Jan 23, 2025

Conversation

mwilck
Copy link
Collaborator

@mwilck mwilck commented Nov 13, 2024

Stable branches and new versioning scheme

Beginning with 0.11, the second digit in the multipath-tools version will be
used for upstream "major" releases. The 3rd and last digit will denote stable
releases in the future, containing only bug fixes on top of the last major
release. These bug fixes will be tracked in stable branches.

See README.md for additional information.

multipath-tools 0.11.0, 2025/01

User-visible changes

  • Modified the systemd unit multipathd.service such that multipathd will now
    restart after a failure or crash.
    Fixes #100.
  • Logging changes for verbosity level 3:
    • silenced logging of path status if the status is unchanged
    • silenced some unhelpful messages from scanning of existing maps
    • added a message when partition mappings are removed.

Other major changes

Rework of the path checking algorithm

This is a continuation of the checker-related work that went into 0.10.0. For
asynchronous checker algorithms (i.e. tur and directio), the start of the
check and the retrieval of the results are done at separate points in time,
which reduces the time for waiting for the checker results of individual paths
and thus strongly improves the performance of the checker algorithm, in
particular on systems with a large a amount of paths.

The algorithm has the following additional properties:

  1. multipath devices get synchronized with the kernel occasionally, even if
    they have not paths
  2. If multiple paths from a multipath device are checked in the same
    loop, the multipath device will only get synchronized with the kernel once.
  3. All the paths of a multipath device will converge to being checked at
    the same time (at least as much as their differing checker intervals will
    allow).
  4. The times for checking the paths of each multipath device will spread
    out as much as possible so multipathd doesn't do all of it's checking in
    a burst.
  5. path checking is done by multipath device (for initialized paths,
    the uninitialized paths are handled after all the adopted paths are
    handled).

Bug fixes

  • Fixed the problem that multipathd wouldn't start on systems with certain types
    of device mapper devices, in particular devices with multiple DM targets.
    The problem was introduced in 0.10.0.
    Fixes #102.
  • Fixed a corner case in the udev rules which could cause a device not to be
    activated during boot if a cold plug uevent is processed for a previously
    not configured multipath map while this map was suspended. This problem existed
    since 0.9.8.
  • Fixed the problem that devices with no_path_retry fail and no setting
    for dev_loss_tmo might get the dev_loss_tmo set to 0, causing the
    device to be deleted immediately in the event of a transport disruption.
    This bug was introduced in 0.9.6.
  • Fixed the problem that, if there were multiple maps with deferred failback
    (failback value > 0 in multipath.conf), some maps might fail back later
    than configured. The problem existed since 0.9.6.
  • Removed a warning message that multipathd would print if systemd's WATCHDOG_USEC environment variable had the value "0", which means that the watchdog is simply disabled. This (minor) problem existed since 0.4.9.
  • Fixed handling of empty maps, i.e. multipath maps that have a multipath UUID
    but don't contain a device-mapper table. Such maps can occur in very rare
    cases if some device-mapper operation has failed, or if a tool has been
    killed in the process of map creation. multipathd will now detect such
    cases, and either remove these maps or reload them as appropriate.
  • During map creation, fixed the case where a map with different name, but
    matching UUID and matching type was already present. multipathd
    previously failed to set up such maps. Now it will reload them with the
    correct content.
  • Fixed the logic for enabling the systemd watchdog (WatchdogSec= in the
    systemd unit file for multipathd).
  • Fixed a memory leak in the nvme foreign library. The bug existed since
    0.7.8.
  • Fixed a problem in the marginal path detection algorithm that could cause
    the io error check for a recently failed path to be delayed. This bug
    existed since 0.7.4.

Other

  • Default settings for hardware_handler have been removed from the
    internal hardware table. These settings have been obsoleted by the Linux
    kernel 4.3 and later, which automatically selects hardware handlers when
    SCSI devices are added. See the notes about SCSI_DH_MODULES_PRELOAD in
    README.md.
  • Added a hardware table entry for the SCSI Target Subsystem for Linux (SCST).
  • The text of the licenses has been updated to the latest versions from the
    Free Software Foundation.

Internal

  • libmp_mapinfo() now fills in the name, uuid, and dmi fields
    if requested by the caller, even if it encounters an error or an empty map.

Shortlog

@bmarzins (41):
libmultipath: store checker_check() result in checker struct
libmultipath: add missing checker function prototypes
libmultipath: split out the code to wait for pending checkers
libmultipath: remove pending wait code from libcheck_check calls
multipath-tools tests: fix up directio tests
libmultipath: split get_state into two functions
libmultipath: change path_offline to path_sysfs_state
multipathd: split check_path_state into two functions
multipathd: rename do_check_path to update_path_state
multipathd: split check_path into two functions
multipathd: split handle_uninitialized_path into two functions
multipathd: split check_paths into two functions
multipathd: fix "fail path" and "reinstate path" commands
multipathd: update priority once after updating all paths
multipathd: simplify checking for followover_should_failback
multipathd: only refresh prios on PATH_UP and PATH_GHOST
multipathd: remove pointless check
multipathd: fix deferred_failback_tick for reload removes
libmultipath: add libcheck_need_wait checker function
libmultipath: don't wait in libcheck_pending
multipathd: wait for checkers to complete
multipath-tools tests: fix up directio tests
multipathd: checker port_state before setting it.
multipathd: use timestamps to tell when the directio checker timed out
libmultipath: check DM UUID earlier in libmp_mapinfo__
libmultipath: use MAPINFO_CHECK_UUID in dm_get_multipath
multipathd: print an error when failing to connect to multipathd
multipathd.service: restart multipathd on failure
libmultipath: Fix MAPINFO_CHECK_UUID with partitions
libmultipath: signal device with no table in libmp_mapinfo
libmultipath: fix removing device after failed creation
libmultipath: set uuid, name, and dmi if a device is found
libmultipath: check table type in dm_find_map_by_wwid
libmultipath: handle a create corner case for empty devices
libmultipath: handle empty maps in dm_flush_map__
libmultipath: factor out code to check if a device is a partition
libmultipath: remove recursive calls in partmap_funcs
libmultipath: assume device is in use if dm_get_opencount fails
libmultipath: accept empty partitions as valid in do_foreach_partmaps
libmultipath: reduce log level for DMP_NOT_FOUND
libmultipath/foreign: fix memory leak in nvme foreign handler

@mwilck (34):
GitHub workflows: use {upload,download}-artifact@v4
GitHub workflows: update dawidd6/action-download-artifact
Github workflows: native.yml: use mosteo-actions/docker-run
GitHub workflows: native.yml: use extra job for clang
GitHub workflows: native.yaml: make test and archive separately
11-dm-mpath.rules.in: import DM_COLDPLUG_SUSPENDED only once
11-dm-mpath.rules.in: handle inactive suspended devices correctly
11-dm-mpath.rules.in: clarify DM_ACTIVATION logic
11-dm-mpath-rules.in: skip one .DM_NOSCAN check
11-dm-mpath.rules.in: set .DM_NOSCAN if MPATH_UNCHANGED is set
libmpathutil: avoid -Wcast-function-type-mismatch error with clang 19
multipath-tools tests: actually sleep in directio test
libmultipath: dm_get_maps(): don't bail out for single-map failures
libmultipath: libmp_mapinfo(): return DMP_NO_MATCH for multi-target maps
libmultipath: make MAPINFO_CHECK_UUID work with partitions
libmultipath: check map UUID in do_foreach_partmaps
libmultipath: increase log level for removing partitions
libmultipath: reduce log level of libmp_mapinfo() messages
libmultipath: don't log boring state messages at level 3
libmultipath: don't set dev_loss_tmo to 0 for NO_PATH_RETRY_FAIL
README.md: mention NEWS.md in Changelog section
NEWS.md: add section for 0.11.0
README.md: mention stable branches
NEWS.md: mention stable branches
libmultipath, libmpathpersist: bump ABI versions
multipathd: fix an unsigned int ovwerflow
GitHub workflows: enable unit tests for stable branches
GitHub Workflows: add abi check for stable branches
multipathd: move systemd watchdog handling into daemon
GitHub workflows: update checkout to @v4
multipath-tools tests: fix mapinfo tests
NEWS.md: describe latest changes
Update NEWS.md
libmultipath: bump version to 0.11.0

@xosevp (6):
multipath-tools: remove hwhandler from all hwtable configs
multipath-tools: hwtable housekeeping
multipath-tools: update comments in hwtable
multipath-tools: refresh fsf licences
multipath-tools: add SCST to hwtable
multipath-tools: add HPE MSA Gen7 (2070/2072) to hwtable

[email protected] (1):
libmultipath: add condition for enqueueing path to io error check

mwilck and others added 30 commits September 13, 2024 10:06
We can't use "container:" any more because upload-artifact@v4 doesn't
work on Debian Jessie.

Signed-off-by: Martin Wilck <[email protected]>
Running "make" in a container and "make clean" outside doesn't
work (access right issues). So just use separate jobs.

Signed-off-by: Martin Wilck <[email protected]>
Avoid "text file busy" error on GitHub.

dmevents-test: Text file busy
Makefile:74: recipe for target 'dmevents.out' failed

Signed-off-by: Martin Wilck <[email protected]>
checker_check() is now a void function that stores the path state in
the checker struct. It can be retrieved later using checker_get_state().
Right now, this is called immediately after checker_check(), but in
the future, it can be deferred to another time.

Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Benjamin Marzinski <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Benjamin Marzinski <[email protected]>
This patch adds a new optional symbol for the dynamic path checker
libraries, libcheck_pending. This is currently unused, but can be called
on pending checkers to check if they've completed and return their value.
The "tur" and "directio" checkers are the only ones which can return
PATH_PENDING. They now implement libcheck_pending() as a wrapper around
the code that libcheck_check uses to wait for pending paths, which has
been broken out into its own function.

Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Benjamin Marzinski <[email protected]>
When the tur and directio checkers start an asynchronous checker, they
now immediately return with the path in PATH_PENDING, instead of
waiting in checker_check(). Instead the wait now happens in
checker_get_state().

Additionally, the directio checker now waits for 1 ms, like the tur
checker does. Also like the tur checker it now only waits once. If it
is still pending after the first call to checker_get_state(). It will
not wait at all on future calls, and will just process the already
completed IOs.

Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Benjamin Marzinski <[email protected]>
Make the directio tests work with libcheck_pending() being separate from
libcheck_check

Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Benjamin Marzinski <[email protected]>
get_state() is now split into start_checker(), which runs the checker
but doesn't wait for async checkers, and get_state(), which returns the
state from the checker, after waiting for async checkers if necessary.

Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Benjamin Marzinski <[email protected]>
Instead of pp->offline being a binary value, change it to show the
actual result of looking at the sysfs state, and rename it to
pp->sysfs_state. Also change the function name from path_offline() to
path_sysfs_state(). Adapt the tests for pp->offline. This should not
change how multipath currently works. pp->sysfs_state will be used in
future patches.

Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Benjamin Marzinski <[email protected]>
check_path_state() is now split into start_path_check(), which calls
path_sysfs_state() and if the path is up also calls start_checker(), and
get_new_state() which gets the new state from either pp->sysfs_state
or get_state().

Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Benjamin Marzinski <[email protected]>
Move the code that starts the path checker from do_check_path() into
check_path(), rename the remainder of do_check_path() to
update_path_state() and call that from check_path().

Signed-off-by: Benjamin Marzinski <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
Split out the code that updates a path's state and sets up the next
check time into its own function, update_path().

Signed-off-by: Benjamin Marzinski <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
Split handle_uninitialized_path() into check_uninitialized_path, which
handles udev retriggers for INIT_MISSING_UDEV paths and starts the path
checker for INIT_FAILED and INIT_NEW paths, and
update_uninitialized_path() which gets the path checker result and
reruns pathinfo if the path is up.

Signed-off-by: Benjamin Marzinski <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
Instead of checking and updating each path, the checkerloop now starts
the checkers on all the paths in check_paths(), and then goes back and
updates all the paths in update_paths(). Since the async checkers use an
absolute time to wait for before returning PATH_PENDING, only one
checker actually needs to be waited for in update_paths(). The rest will
already have reached their timeout when update_path() is called for
them.

The check_paths() and update_paths() loop over the pathvec instead of
looping through the multipath device paths to avoid having to restart
checking of the device paths when the multipath device needs to be
resynced while updating the paths.

Signed-off-by: Benjamin Marzinski <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
Now that multipathd can drop the vecs lock and sleep in-between when the
checker runs and when the path gets updated, it is possible for the
user to either fail or reinstate the path in this window.

If a path gets manually failed in the window between when the checker
starts and the path is updated, the checker will have already been
started, on the path, and if multipathd read the results like normal,
it could simply reinstate the path. To avoid this, when a path checker
is disabled, the checker path_state and message are updated, just
like they are when checker_check() is run on a disabled path, so that
when checker_get_state() is called, it will always return the same
results for a disabled checker, regardless of when it was disabled.

Reinstating the path doesn't cause many problems, but still can be
improved. Since the checker was disabled when it would have been
started, it didn't run during this cycle, only the kernel state will get
updated. The rest of the path update changes won't happen until the next
time the checker runs. This is the case regardless of whether or not the
path was reinstated in the window between when the checker starts and
the path is updated. To make reinstated paths get updated sooner,
pp->tick is now set to 1 when the path is reinstated.

Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Benjamin Marzinski <[email protected]>
Instead of updating the path priorities and possibly reloading the
multipath device when each path is updated, wait till all paths
have been updated, and then go through the multipath devices updating
the priorities once, reloading if necessary.

Signed-off-by: Benjamin Marzinski <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
The code needs to check that pp->pgindex equals pp->mpp->bestpg and that
they aren't both 0 (which is an invalid pathgroup id). Since we're
already checking that they are equal, we only need to check one of them
to see if it's non-zero. Instead of checking if pp->pgindex is non-zero
for every path, just check mpp->bestpg once for the multipath device and
return immediately if it's zero.

Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Benjamin Marzinski <[email protected]>
The way that multipathd was handling priority refreshes had some issues.
Some of the multipath prioritizers can hang when run on a failed path.
Multipathd was only skipping paths in PATH_DOWN, but there are other
states where the prioritizer is also likely to hang, such as
PATH_TIMEOUT, PATH_SHAKY, and to a lesser extent, PATH_DELAYED.

Also, before the recent patch splitting the priority updating from the
path checking, multipathd wasn't consistent with which states would
cause
paths to get their priorities updated. If a path changed its state to
anything other than PATH_UP or PATH_GHOST, it wouldn't get its priority
updated.  But if a path kept the same state its priority would get
updated as long at the state wasn't PATH_DOWN.

For safety's sake, a path's priority should only get refreshed when its
in the PATH_UP or PATH_GHOST state. This shouldn't cause problems. Only
paths that are in the PATH_UP or PATH_GHOST state are usable by the
kenel and contibute to the pathgroup's priority.

Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Benjamin Marzinski <[email protected]>
Since we just returned if newstate wasn't PATH_UP or PATH_GHOST, it
must obviously be PATH_UP or PATH_GHOST at this point in the code. We
don't even wrap all the code for dealing with a path that just came up
in this if-block, It's only the persistent resrvation code.

Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Benjamin Marzinski <[email protected]>
If reload_and_sync_map() removes the multipath device,
deferred_failback_tick() needs to decrement the counter so that it
doesn't skip the following device.

Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Benjamin Marzinski <[email protected]>
Add a new optional checker class function, libcheck_need_wait() and a
new function to call it, checker_need_wait(). This can be used to see if
a path_checker is currently running. This will be used to determine if
there are pending checkers that need to be waited for.

Signed-off-by: Benjamin Marzinski <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
Disable waiting in the libcheck_pending() checker class functions.
Instead, they now just check if the checker has already completed.
A future patch will re-add waiting outside of libcheck_pending().

Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Benjamin Marzinski <[email protected]>
Multipath again waits for running checkers to complete. In pathinfo(),
mutipath will just wait 1ms if the checker is running. In checkerloop(),
if there are running checkers, multipathd will drop the vecs lock and
wait for 5ms. The difference it wait times is because multipathd cannot
drop the vecs lock in pathinfo.

Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Benjamin Marzinski <[email protected]>
Make the directio tests work with no waiting in the libcheck_pending()
and the new libcheck_need_wait() call.

Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Benjamin Marzinski <[email protected]>
Useless, automatically handled by the kernel since 4.3.
(4.3 was released nine years ago)

Cc: Martin Wilck <[email protected]>
Cc: Benjamin Marzinski <[email protected]>
Cc: Christophe Varoqui <[email protected]>
Cc: DM-DEVEL ML <[email protected]>
Signed-off-by: Xose Vazquez Perez <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
We import DM_COLDPLUG_SUSPENDED in all code flows below mpath_coldplug_end.
Clarify this in the code.

Signed-off-by: Martin Wilck <[email protected]>
Reviewed-by: Benjamin Marzinski <[email protected]>
Since b22c273 ("11-dm-mpath.rules: Don't force activation while device is
suspended"), we've handled the case where a device is suspended while
an uevent is processed (e.g. because multipathd is reloading the
map again at the same time). But we were missing the case where
The device had never been initialized before. If .MPATH_DEVICE_READY_OLD
was empty, we'd jump to scan_import without setting MPATH_DEVICE_READY
to 0. This can cause a device not to be fully activated at boot time,
because in follow-up uevents we'd assume that the device had already
been set up.

Treat the case in which an uevent is processed for a previously not
fully set-up, suspended device like other situations where we set
MPATH_DEVICE_READY to 0.

Fixes: b22c273 ("11-dm-mpath.rules: Don't force activation while device is
suspended")
Reviewed-by: Benjamin Marzinski <[email protected]>

This comment has been minimized.

mwilck and others added 19 commits January 17, 2025 22:11
Only multipathd needs to take care of notifying systemd. There's
no need to track this information in struct config, or to limit our
checker interval to it, as checkerloop() wakes up every second anyway.

While at it, fix the watchdog enablement logic:

- the watchdog should only be active if WATCHDOG_PID is either unset,
  or matches the daemon's PID, and if WATCHDOG_USEC is not 0.
- the watchdog should trigger twice per systemd-set interval.
- if WatchdogSec= is set to an unreasonable value, make a smarter
  choice than just disabling the watchdog, and print a more meaningful
  error message.

Use timestamp comparison to make sure the watchdog is triggered even
if a checkerloop iteration takes more than a second.

This patch changes the ABI, because a member is deleted from struct config.

Fixes: 9366cfb ("multipathd: Implement systemd watchdog integration")
Signed-off-by: Martin Wilck <[email protected]>
Reviewed-by: Benjamin Marzinski <[email protected]>
Older versions of checkout cause deprecation warnings.

Signed-off-by: Martin Wilck <[email protected]>
if libmp_mapinfo() is run on a device that has no active table,
it will now return with a new exit code, DMP_EMPTY, to signal this.
Currently all code paths treat this identically to DMP_NOT_FOUND.

Signed-off-by: Benjamin Marzinski <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
Add tests for the check for empty table introduced in ("libmultipath:
signal device with no table in libmp_mapinfo").

Signed-off-by: Martin Wilck <[email protected]>
Signed-off-by: Benjamin Marzinski <[email protected]>
dm_flush_nap_nosync() only removes a device if it has a multipath table.
On failed removes, there is no table, so this function does nothing.
Instead, if libmp_mapinfo() returns DMP_EMPTY, then there is an empty map,
and it is removed using dm_device_remove().

Also, since there are no longer any callers of dm_flush_map_nosync(),
remove it.

Signed-off-by: Benjamin Marzinski <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
Instead of only setting the uuid, name, and dmi on DMP_OK, return them
whenever we find a device, even if that device doesn't match. To make the
code simpler, we set these values before checks that could possibly return
DMP_ERR.

The only caller of libmp_mapinfo() that could update a multipath value
incorrectly because of this is update_multipath_table(). Fix that.

Signed-off-by: Benjamin Marzinski <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
This function is only supposed to work on multipath devices (it adds the
multipath uuid prefix to the passed in wwid). Check this, and adapt
cli_add_map() to handle the corner case where there is a non-multipath
device with a multipath uuid.

Signed-off-by: Benjamin Marzinski <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
Multipath already notices if there is any empty device with the same
alias and wwid as a device it is trying to create, and switches to a
reload. However, if the empty device has a different alias but the same
wwid, instead of reloading and renaming the device like it would for
a device with a table. multipath will attempt to create a new device
and fail.

Fix this by checking for these devices, and doing a reload and rename.

Signed-off-by: Benjamin Marzinski <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
Make dm_flush_map__() remove empty devices, as long as they have
valid multipath uuids.

Signed-off-by: Benjamin Marzinski <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
Move the code to check if a device is a valid partition of a multipath
device out into its own function. A future patch will make this code
more complicated. No functional changes.

Signed-off-by: Benjamin Marzinski <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
cancel_remove_partmap() and rename_partmap() both could end up calling
do_foreach_partmaps() with the partition instead of the original
multipath device. This won't ever do anything, since
do_foreach_partmaps() checks that it's called with a multipath device.

Signed-off-by: Benjamin Marzinski <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
If dm_get_opencount() fails, open_count will be negative, which means
mpath_in_use() will never fail. However, dm_flush_map__() will
eventually fail. There are multiple callers of dm_get_opencount() that
run when trying to remove a device. All the others treat a failed call
as if the device was in use, including one that will later be called
directly on this device. So save ourselves the work, and exit early

Signed-off-by: Benjamin Marzinski <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
If the partition is empty, is_valid_partmap() only checks that the uuid
matches the multipath device, since there is no table to check for
mapping over the multipath device.

The only partmap_func that needs changes to handle empty devices is
count_partitions(), which should only count a partition if it has a
table (since otherwise it will not have the multipath device open).

Signed-off-by: Benjamin Marzinski <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
multipath makes many calls just to check if a device exists. Don't warn
on each of these.

Signed-off-by: Benjamin Marzinski <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
Based on: https://github.com/SCST-project/scst/blob/master/scst/README#L1905-L1908

Cc: Gleb Chesnokov <[email protected]>
Cc: Bart Van Assche <[email protected]>
Cc: Vladislav Bolkhovitin <[email protected]>
Cc: Martin Wilck <[email protected]>
Cc: Benjamin Marzinski <[email protected]>
Cc: Christophe Varoqui <[email protected]>
Cc: DM-DEVEL ML <[email protected]>
Signed-off-by: Xose Vazquez Perez <[email protected]>
Reviewed-by: Benjamin Marzinski <[email protected]>
https://support.hpe.com/connect/s/product?kmpmoid=1014856412

Just guessing, confirmation from manufacturer is needed.

Cc: Jon Paul <[email protected]>
Cc: Martin Wilck <[email protected]>
Cc: Benjamin Marzinski <[email protected]>
Cc: Christophe Varoqui <[email protected]>
Cc: DM-DEVEL ML <[email protected]>
Signed-off-by: Xose Vazquez Perez <[email protected]>
Acked-by: Jon Paul <[email protected]>
In function io_err_stat_handle_pathfail(), path->io_err_dis_reinstate_time
is set to 0 to enqueue path to io error check as soon as possible. But
multipathd can not do it within marginal_path_err_recheck_gap_time seconds
after power-on, because curr_time is less than
marginal_path_err_recheck_gap_time.

To handle the early marginal path, we can enqueue path when
io_err_dis_reinstate_time is 0.

Signed-off-by: chenrenhui <[email protected]>
Reviewed-by: Benjamin Marzinski <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>

>
_find_controllers() needs to free the udev device if it doesn't get
added to a path. Otherwise it can leak memory whenever check_foreign()
is called, causing multipathd's memory usage to continually grow.

Fixes: 7b47762 ("libmultipath: nvme: fix path detection for kernel 4.16")
Signed-off-by: Benjamin Marzinski <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
@mwilck mwilck force-pushed the queue branch 2 times, most recently from 4fb9e7d to b24ce92 Compare January 17, 2025 21:21
@mwilck
Copy link
Collaborator Author

mwilck commented Jan 17, 2025

Sorry for the multiple rebases. I think this PR is good now. There are pending reviewed patches on the mailing list (among other things, fixes for #105), but I'd rather delay them to another release.

Ben's mentioned patch set for empty maps is now included, and also a few minor additional bug fixes and hwtable additions that came in since I first drafted this PR.

I also bumped the libmultipath ABI version once more to account for the watchdog patch.

@mwilck mwilck marked this pull request as ready for review January 17, 2025 21:30
@mwilck
Copy link
Collaborator Author

mwilck commented Jan 23, 2025

@cvaroqui, @bmarzins: I believe we should move forward with this release. Unless you have strong objections, please let's get it merged. There's a rather long queue of patches waiting to get into 0.12.0.

@cvaroqui cvaroqui merged commit de16cf8 into opensvc:master Jan 23, 2025
61 of 62 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants