-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add gathering of resources from last migrated namespaces, gather PVs, StorageClasses (cluster-scoped), Routes, Service (just registry namespaces) #34
Changes from all commits
ca5a3fc
f58e298
f5dc371
aff73ac
cd9c64b
7e9c055
e380083
5ee727b
6e8cc97
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -16,6 +16,12 @@ for localns in $(/usr/bin/oc get migrationcontrollers.migration.openshift.io --a | |
done | ||
echo "Will collect debug info from migclusters [${clusters[@]}]" | ||
|
||
# Find the latest migration, plan, and associated namespaces so we can gather additional info from those | ||
latest_migration=$(oc -n ${localns} get migmigration -o json | jq -r '.items|=sort_by(.metadata.creationTimestamp)' | jq -r '.items[-1].metadata.name') | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am not familliar with jq. Does items[-1] fail when there are no migmigrations present? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looks like it results in "null" getting set to the var and a 0 return code. Will try a full must-gather in this scenario and make sure it doesn't crash.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, so basically the end-result is that all of the Output from must-gather
So I think this is harmless and works as-is, however I'm open to input on whether we should squash the jq stderr and send it to /dev/null so that the error never shows up. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @djwhatle I was under impression that if a script in must-gather returns non-zero exit code, the temporaray must-gather pod exits with non-zero status. But maybe that is incorrect. I think it'd be OK to have stderr printed in this way since this doesn't really break the must-gather container. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You're correct, it's just that the RC is always 0 even when |
||
latest_migration_plan=$(oc -n ${localns} get migmigration ${latest_migration} -o jsonpath={.spec.migPlanRef.name}) | ||
latest_migration_namespaces=$(oc -n ${localns} get migplan ${latest_migration_plan} -o json | jq -r '.spec.namespaces[]') | ||
latest_migration_clusters=$(oc -n ${localns} get migplan ${latest_migration_plan} -o json | jq -r '.spec.destMigClusterRef.name, .spec.srcMigClusterRef.name') | ||
|
||
# Iterate over all connected non-host OpenShift clusters | ||
for cluster in ${clusters[@]}; do | ||
unset KUBECONFIG | ||
|
@@ -41,12 +47,29 @@ for localns in $(/usr/bin/oc get migrationcontrollers.migration.openshift.io --a | |
namespaces+=(${ns}) | ||
done | ||
|
||
# Check if this cluster has a migrated namespace we want to oc adm inspect | ||
for ns in ${latest_migration_namespaces}; do | ||
echo "[cluster=${cluster}][namespace=${ns}][migration=${latest_migration}][plan=${latest_migration_plan}] Running oc adm inspect on namespace that is part of latest migration and plan" | ||
mkdir -p must-gather/clusters/${cluster}/migrated-namespaces | ||
/usr/bin/oc adm inspect --dest-dir must-gather/clusters/${cluster}/migrated-namespaces ns/${ns} & | ||
done | ||
|
||
# Collect all resources in MTC namespaces with must-gather | ||
for ns in ${namespaces[@]}; do | ||
echo "[cluster=${cluster}][namespace=${ns}] Running oc adm inspect" | ||
/usr/bin/oc adm inspect --dest-dir must-gather/clusters/${cluster} --all-namespaces ns/${ns} & | ||
done | ||
|
||
# Collect PV and StorageClass info for cluster | ||
echo "[cluster=${cluster}] Running oc adm inspect storageclasses,persistentvolumes" | ||
usr/bin/oc adm inspect --dest-dir must-gather/clusters/${cluster} storageclasses,persistentvolumes & | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. +1, any specific reason for not including PVCs here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. PVCs are namespace scoped resources, so they are gathered by the other invocations of |
||
|
||
# Collect Route and Service info needed to troubleshoot direct image migration connection | ||
for ns in default openshift-image-registry; do | ||
echo "[cluster=${cluster}][namespace=${ns}] Running oc adm inspect route,service" | ||
/usr/bin/oc adm inspect --dest-dir must-gather/clusters/${cluster} -n ${ns} route,service & | ||
done | ||
|
||
# Collect the migration and velero CRs | ||
echo "[cluster=${cluster}] Gathering MTC and Velero CRs for namespaces [${namespaces[@]}]" | ||
/usr/bin/gather_crs ${cluster} ${namespaces} & | ||
|
@@ -84,6 +107,16 @@ else | |
find /must-gather/clusters/*/namespaces/*/pods/ -name '*.log' -delete | ||
fi | ||
|
||
# Wipe secrets from migrated-namespaces data | ||
echo "Scrubbing secrets collected from migrated namespaces" | ||
find must-gather/clusters/*/migrated-namespaces/namespaces/*/*/ -name *secrets.yaml* -delete | ||
|
||
# Shorten logs from migrated-namespaces to last 100 lines | ||
echo "Shortening logs collected from migrated namespaces to last 100 lines" | ||
for logfile in $(find must-gather/clusters/*/migrated-namespaces/namespaces/*/*/ -name *.log); do | ||
tail -100 "${logfile}" > tmp.log && mv tmp.log "${logfile}" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am not sure I understand where these logs are coming from. Are these logs from the Rsync/Stunnel pods? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is any Pod logs gathered from the The amount of logs that could be in these namespaces is potentially unbounded since this is where user apps run, so I thought it would make sense to only capture the last bits which would probably contain errors for Rsync or Stunnel or whatever. Do you think 100 lines is enough? I also considered if we should only keep the logs for Rsync, Stunnel, Stage Pods, but since There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @djwhatle Agreed. I am unsure about gathering user application logs though. Are we supposed to be doing that at all? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There's not a great reason for us to. I think you make a good point, I should refine this further so that it only gathers logs for Pods belonging to us. This would also make me more comfortable keeping the full logs, which would be more useful. |
||
done | ||
|
||
# Tar all must-gather artifacts for faster transmission | ||
echo "Tarring must-gather artifacts..." | ||
archive_path="/must-gather-archive" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is "sh" intentional here? It won't show up on README I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, so this bit just signals to the markdown renderer that this block of code should be rendered with syntax highlighting for shell scripts.
I believe this is the full list of supported syntax highlighting.