-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Port scripts to strict POSIX implementations #41
Conversation
The script stops at these two lines 257-258
When commenting them out, it proceeds, but it stops here. EDIT: by “proceeds” I mean it moves further. Yes the process fails entirely without them. |
It won't work at all if those are commented out, hrm it works here. Maybe something specific about FreeBSD EDIT: You can help debug by editing the first line of the script from |
Yes I'm testing that way. The function works because it shows srcSnaps=' ' on the last line of the script. |
848baa9
to
f3fc85d
Compare
This is working from my FreeBSD host with a basic test ...
|
Did you remove all snapshots from src and dst? |
No, I didn't but thinking about it that might make sense. In a POSIX world that could be an error - no count when it comes to the next few commands. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aaronhurt This commit isn't signed. Can you please sign before merging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
zfs-replicate-posix.sh
ran without error, but there's nothing in the destination dataset.
$ cat config.sh
REPLICATE_SETS="zroot/test:zroot/fun@localhost"
Fixed the issue with no snaps on either side. It was a counting issue. The POSIX way of using |
When replicating locally you dont really need the @localhost |
I recommend this small change: f302e68 - (HEAD -> pr-41) Update example config script to use sh. (2 minutes ago) <Yonas> [email protected] SHA256:5ZC2NLyr/ExZaQKf7DPl2f6V65LX8roqeB01qtT4izE
diff --git a/config.sample.sh b/config.sample.sh
index afadb6d..a6342bd 100644
--- a/config.sample.sh
+++ b/config.sample.sh
@@ -1,4 +1,5 @@
-#!/usr/bin/env bash
+#!/usr/bin/sh
+
## zfs-replicate configuration file
# shellcheck disable=SC2034 |
Thanks 👍 |
Yes, that should be changed if/when the Bash script is completely replaced with the POSIX version. Although, it probably doesn't hurt now since it's just an ENV file. |
Replication works for me now 🎉 |
It should still be |
If @tschettervictor is happy I'll work on re-working the test cases and I guess replace the bash version with this POSIX version on merge. |
Yes, I agree. I just checked Ubuntu, and it was in |
Ubuntu |
I'm going to call it a night, but feel free to leave comments and I'll take a look tomorrow. |
I will test with all combos and report back. |
So. However, I ran into one problem. The way I'm running this test is by a REPLICATE_SETS="tank/local/src:tank/local/dst tank/src/1/2:rmttank/dst/1/[email protected] rmttank/src/1/[email protected]:tank/dst/1/2" For some reason it is kind of messing up the third set. I believe it might be due to the fact that we are doing a pull and push in the same run, but still, it needs to be fixed. When doing a pull separate from any other sets, it does work. Also, when the above happens it doesn't error out with a fail (probably due to the fact that it failed to list the dataset and never made it to the skip string). There are other cases where the script also did not error out, like the earlier issue we had, but that is probably a POSIX thing. It would be nice though to add some more error checking. The log did show that it didn't skip any datasets when using the above example, even though the third set failed at listing the dataset. |
Looks like clearing the variable right before doing the "contains" function solves the issue. See my commit. EDIT I've ran all the tests I usually do, and all passed. Just for reference, here they are.
I have not tested root datasets, as I don't use that at all, and wouldn't recommend it. As of right now, with the below commit about resetting the host right before checking the datasets, I'm pretty happy to accept this PR. |
@tschettervictor Very nice. I have just one last suggestion, since diff --git a/zfs-replicate-posix.sh b/zfs-replicate-posix.sh
index 9c040db..6693e10 100755
--- a/zfs-replicate-posix.sh
+++ b/zfs-replicate-posix.sh
@@ -1,6 +1,12 @@
#!/usr/bin/env sh
+
## zfs-replicate.sh
set -e
+if [ -n "$(which freebsd-version)" -a -n "$(which bc)" ]; then
+ if [ 1 -eq $(echo $(freebsd-version | cut -d'-' -f1)" >= 11.3" | bc) ]; then
+ set -o pipefail
+ fi
+fi
############################################
##### warning gremlins live below here ##### |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @yonas, it's nice to see that |
Do we want to test some more? From my end it looks good. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ve tested all combinations using the new POSIX script. All have passed.
Done, and the test case covers just that example now.
|
Initial run passed. Testing commencing. Stand by… |
Testing completed. Just for reference, here are my test runs again. My REPLICATE_SETS include one local set, one remote push set, and one remote pull set.
Looks good from my end. |
I do notice though (and this might be a good thing) that when the script encounters an error, it will exit even if it successfully completed replicating the first 2 pairs. This makes it so the end message is ERROR: etc... But this has only happened when a bug has popped up. Otherwise it will show the "skipped" part at the end. |
Definitely document those in issues if you can reproduce them. They’ll make great test cases to add to further improve the script. |
|
68f5070
to
9accb04
Compare
I’m good to merge. Test have all passed. I’ll try some more intentional errors. |
Sounds good, let me know if you find anything. Otherwise I'll merge this tonight or tomorrow morning, but I want to give it a little time to bake. |
But: when it skips a dataset that failed the “checkDataset “ function, it should stop processing that dataset, otherwise it tries to delete it. The reason is the snapDestroy function. It tries to delete the snapshot, but since none is found, there is no @autorep- suffix. So it tries to delete the dataset entirely.
It does fail though… An if, for, continue, would probably solve it inside the createSnap function right when it does the checkDataset |
I'm not seeing this same behavior for the host check failure. Let me try to simulate a dataset failure.
|
EDIT: My bad. When I copied the new config.sample.sh file I forgot to reset the RECURSE_CHILDREN variable. |
It shouldn't matter, ssh should return non-zero if the remote command returns non-zero.
|
See above. User error. |
Let me see if I can mock up a test, and then handle it. |
I wasn't able to reproduce the same behavior. Would you happen to have a longer log showing all the steps that happened when it tried to remove a dataset? |
Looking at the main scripts, All jokes aside though, that's why it's good to let these things bake for a while. Things you didn't see at first come to light after you stop looking. |
Here is the log. I first did a replication with RECURSE_CHILDREN=1, the tried doing the exact same replication with RECURSE_CHILDREN=0
You should really never go from recursive to non-recursive, but this is what happens. It tries to delete all the leftover snapshots from the recursive replication, and it succeeds, but it also tries to rename and destroy existing datasets. Notice the rename to "receive-xxxx" then destroying that same one. EDIT I don't think it actually deletes anything of value. It just removes the ones that aren't supposed to be recursive on the destination. So this might not even be a bug. |
Hrm, yeah that's definitely odd but I see what it's trying to do. That's all zfs itself trying to reconcile state for the replication stream. The script isn't sending those |
Before it can complete the send, the destination has to match up. So it destroys the existing ones, then completes the send. |
In any case I just switched back and forth between RECURSE_CHILDREN=1/0 and it seemed to work fine. |
It skips because it is doing a non-recursive replication with nested datasets. It’s seeing the datasets, but also seeing it’s not supposed to do a recursive send. So it’s skips the nested ones (which are included in the send anyway, except as directories I suppose) |
There is a new script in this PR
zfs-replicate-posix.sh
that adds POSIX shell support. This script should function via standard Bourne Shell (sh) and/or strict POSIX shells like Dash. This script is currently considered experimental. Feedback is welcome.