Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unpacking to Singularity #305

Draft
wants to merge 8 commits into
base: 1.x
Choose a base branch
from
Draft

Unpacking to Singularity #305

wants to merge 8 commits into from

Conversation

remram44
Copy link
Member

@remram44 remram44 commented Jun 21, 2018

This is a draft. PR opened for code review.

This is a script that reads an RPZ file and generates a Singularity container. It will eventually be turned into an unpacker.

A restriction is that we don't want to require root access, so we can't use singularity build. We want to write an image in single-file format (TAR, because squashfs is hard to write) directly.

cc @khushnaseeb
see #267

@remram44 remram44 added R-invalid Resolution: This is not an issue for this project C-unpacker Component: New unpacker or common unpacker functionalities labels Jun 21, 2018

def extract_reprozip_file(filename):
bashCommand = " tar -xf {}".format(filename)
process = subprocess.Popen(bashCommand.split(), stdout=subprocess.PIPE)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should probably pass a list directly here, rather than format() then split()

if data_file:
bashCommand = " tar -xf {}".format(data_file)
process = subprocess.Popen(bashCommand.split(), stdout=subprocess.PIPE)
output, error = process.communicate()
Copy link
Member Author

@remram44 remram44 Jun 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use --strip-components=1 to get rid of the DATA name.

But using tarfile is the way you should go eventually, so that you don't have to write DATA.tar.gz to disk, and because eventually you are writing into a new tar file instead of a disk folder.

def make_environment_file():
env_files = [run_env_file,apps_file,base_file]
for file in env_files:
open(os.path.join(ENV_DIR,file), 'a').close()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do those files really need to exist?

print("no sh in bin")
bashCommand = "cp ../bin/sh {}".format(SHELL_DIR)
process = subprocess.Popen(bashCommand.split(), stdout=subprocess.PIPE)
output, error = process.communicate()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should probably always copy busybox, for example to /busybox (that's what reprounzip-docker does).

It can be that the /bin/sh in the package does not work, or does not support some things we need (like exec -a which we should use eventually)

home = os.environ['HOME']
print(home)
bashCommand = "singularity run -C -H {}:/something DATA".format(home)
process = subprocess.Popen(bashCommand.split(), stdout=subprocess.PIPE)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just use check_call()

@remram44 remram44 changed the title Unpacking to Singularity Code review - Unpacking to Singularity Jun 21, 2018
Copy link
Member Author

@remram44 remram44 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job @khushnaseeb! I finally went over the code. I can't run it because you depend on additional files to create the singularity.d and app file structure.

Overall, try to be more Pythonic:

  • don't use programs to do operations like chmod
  • don't rely so much on file existing in a particular location, especially relative to the CWD -- use function arguments
  • try not to rely on files or directory structures existing on disk to copy them in the TAR (ex: empty dirs), and definitely don't mutate those (in a real deployment, those would be part of the Python package, and read-only)

I know that using tarfile is difficult, the API is inconsistent and they don't really expose a lot of useful internals (e.g. Tarfile.type). But this can be done I promise!

# Add the missing folders - proc,run, sys and temp_home
folders = ['proc','dev','sys','temp_home','mnt']
for folder in folders:
new.add("../../missing_folders/"+folder,folder)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add an empty directory to the tar using:

new_info = tarfile.TarInfo(folder)
new_info.type = tarfile.DIRTYPE
new.addfile(new_info)

new.add("../../missing_folders/"+folder,folder)
print("added missing folder")

rpz.extractall()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole point of the code above that I sent you is to not extract the files to disk. Remember that file permissions will not round-trip (also it uses at least twice as much disk space).

print("multiple runs")
# App structure is present
# First add the SCIF folder
tar.add(SCIF_DIR, arcname="scif")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can create this without copying this app_structure folder, see TarInfo trick below 😉

runs = my_dict['runs']
copy_busybox(tar)
deleteContent(os.path.join(ENV_DIR, "94-appsbase.sh"))
if len(runs) > 1:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just always create the apps? I don't see a problem with there only being one.




def create_overlay_image(OVERLAY_IMAGE):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that upload() (and download()) also need the overlay image to exist, so you can probably just create it during setup.

f.write(run_cmd)
f.write(upload_download_cmd)

def make_main_app_base_script(app_name):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might need to "sanitize" the app_name since it might not be suitable for a filename or (unquoted) variable name 😉 People might get very descriptive now that they'll have Binal's GUI.

else:
write_env_file(runs[0].get('environ'),RUN_ENV_FILE)
make_runscript(runs[0]['workingdir'], runs[0]['binary'], runs[0]['argv'][1])
tar.add(SINGULARITY_DIR, arcname=".singularity.d")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a .singularity.d/runscript in the app case too? Does it run all the apps in order?

env_file = os.path.join(ENV_DIR, env_file)
with open(env_file, 'w+') as f:
for key,value in env.items():
f.write(key+"='"+value+"'\n")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are going to need more robust quoting for values (and possibly keys). Simply enclosing in ' will break if the value contains ' (and backslashes, ", $, etc).

You can use https://github.com/ViDA-NYU/reprozip/blob/1.0.13/reprounzip/reprounzip/unpackers/common/misc.py#L94-L111

@remram44 remram44 changed the title Code review - Unpacking to Singularity Unpacking to Singularity Sep 12, 2018
@remram44 remram44 added this to the 1.0.x milestone Nov 16, 2018
@remram44 remram44 marked this pull request as draft January 29, 2021 18:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-unpacker Component: New unpacker or common unpacker functionalities R-invalid Resolution: This is not an issue for this project
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant