-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unpacking to Singularity #305
base: 1.x
Are you sure you want to change the base?
Conversation
|
||
def extract_reprozip_file(filename): | ||
bashCommand = " tar -xf {}".format(filename) | ||
process = subprocess.Popen(bashCommand.split(), stdout=subprocess.PIPE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should probably pass a list directly here, rather than format()
then split()
if data_file: | ||
bashCommand = " tar -xf {}".format(data_file) | ||
process = subprocess.Popen(bashCommand.split(), stdout=subprocess.PIPE) | ||
output, error = process.communicate() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use --strip-components=1
to get rid of the DATA
name.
But using tarfile
is the way you should go eventually, so that you don't have to write DATA.tar.gz
to disk, and because eventually you are writing into a new tar file instead of a disk folder.
def make_environment_file(): | ||
env_files = [run_env_file,apps_file,base_file] | ||
for file in env_files: | ||
open(os.path.join(ENV_DIR,file), 'a').close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do those files really need to exist?
print("no sh in bin") | ||
bashCommand = "cp ../bin/sh {}".format(SHELL_DIR) | ||
process = subprocess.Popen(bashCommand.split(), stdout=subprocess.PIPE) | ||
output, error = process.communicate() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should probably always copy busybox, for example to /busybox
(that's what reprounzip-docker does).
It can be that the /bin/sh
in the package does not work, or does not support some things we need (like exec -a
which we should use eventually)
home = os.environ['HOME'] | ||
print(home) | ||
bashCommand = "singularity run -C -H {}:/something DATA".format(home) | ||
process = subprocess.Popen(bashCommand.split(), stdout=subprocess.PIPE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can just use check_call()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job @khushnaseeb! I finally went over the code. I can't run it because you depend on additional files to create the singularity.d and app file structure.
Overall, try to be more Pythonic:
- don't use programs to do operations like chmod
- don't rely so much on file existing in a particular location, especially relative to the CWD -- use function arguments
- try not to rely on files or directory structures existing on disk to copy them in the TAR (ex: empty dirs), and definitely don't mutate those (in a real deployment, those would be part of the Python package, and read-only)
I know that using tarfile
is difficult, the API is inconsistent and they don't really expose a lot of useful internals (e.g. Tarfile.type). But this can be done I promise!
# Add the missing folders - proc,run, sys and temp_home | ||
folders = ['proc','dev','sys','temp_home','mnt'] | ||
for folder in folders: | ||
new.add("../../missing_folders/"+folder,folder) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can add an empty directory to the tar using:
new_info = tarfile.TarInfo(folder)
new_info.type = tarfile.DIRTYPE
new.addfile(new_info)
new.add("../../missing_folders/"+folder,folder) | ||
print("added missing folder") | ||
|
||
rpz.extractall() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole point of the code above that I sent you is to not extract the files to disk. Remember that file permissions will not round-trip (also it uses at least twice as much disk space).
print("multiple runs") | ||
# App structure is present | ||
# First add the SCIF folder | ||
tar.add(SCIF_DIR, arcname="scif") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can create this without copying this app_structure
folder, see TarInfo
trick below 😉
runs = my_dict['runs'] | ||
copy_busybox(tar) | ||
deleteContent(os.path.join(ENV_DIR, "94-appsbase.sh")) | ||
if len(runs) > 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just always create the apps? I don't see a problem with there only being one.
|
||
|
||
|
||
def create_overlay_image(OVERLAY_IMAGE): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that upload()
(and download()
) also need the overlay image to exist, so you can probably just create it during setup.
f.write(run_cmd) | ||
f.write(upload_download_cmd) | ||
|
||
def make_main_app_base_script(app_name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might need to "sanitize" the app_name
since it might not be suitable for a filename or (unquoted) variable name 😉 People might get very descriptive now that they'll have Binal's GUI.
else: | ||
write_env_file(runs[0].get('environ'),RUN_ENV_FILE) | ||
make_runscript(runs[0]['workingdir'], runs[0]['binary'], runs[0]['argv'][1]) | ||
tar.add(SINGULARITY_DIR, arcname=".singularity.d") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a .singularity.d/runscript
in the app case too? Does it run all the apps in order?
env_file = os.path.join(ENV_DIR, env_file) | ||
with open(env_file, 'w+') as f: | ||
for key,value in env.items(): | ||
f.write(key+"='"+value+"'\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are going to need more robust quoting for values (and possibly keys). Simply enclosing in ' will break if the value contains ' (and backslashes, ", $, etc).
You can use https://github.com/ViDA-NYU/reprozip/blob/1.0.13/reprounzip/reprounzip/unpackers/common/misc.py#L94-L111
This is a draft. PR opened for code review.
This is a script that reads an RPZ file and generates a Singularity container. It will eventually be turned into an unpacker.
A restriction is that we don't want to require root access, so we can't use
singularity build
. We want to write an image in single-file format (TAR, because squashfs is hard to write) directly.cc @khushnaseeb
see #267