Run ReproZip for part of script #358
Unfortunately, ReproZip is meant to track a process from its creation. When you reach the section of interest to you, there is no way to find out which of the already-loaded files are required for this new section. For example, numpy might already have been loaded because it's a requirement of your UI package, so you won't see it getting loaded when you import pandas at the start of your simulation code. ReproZip can't automatically determine that you want numpy but not the UI package. Would it be possible for you to split this script into two separate scripts? You could have a first script set everything up through a UI, then call the simulation script, passing the simulation parameters on the command line or via a file. Then you can easily interpose ReproZip to trace this second process.
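The two-script split described above could look roughly like the sketch below, written from the UI script's side. The file names `params.json` and `simulate.py`, the `--params` flag, and the helper names are all hypothetical; the only ReproZip-specific piece is the `reprozip trace <command>` invocation, which traces the command it is given.

```python
import json
import subprocess
import sys
from pathlib import Path


def write_params(params, path):
    """Save the parameters chosen in the UI so the simulation can read them."""
    path = Path(path)
    path.write_text(json.dumps(params, indent=2))
    return path


def build_trace_command(sim_script, params_path):
    """Build the argv that runs the simulation script under `reprozip trace`.

    `--params` is a hypothetical flag that the simulation script would have
    to parse itself; it is not a ReproZip option.
    """
    return [
        "reprozip", "trace",
        sys.executable, str(sim_script),
        "--params", str(params_path),
    ]


def launch_traced_simulation(params, sim_script="simulate.py",
                             params_path="params.json"):
    """Write the parameters, then start the simulation as a new traced process."""
    params_file = write_params(params, params_path)
    # Only this child process is traced, so packages loaded by the UI
    # script are not mixed into the trace.
    return subprocess.run(build_trace_command(sim_script, params_file),
                          check=True)
```

Because the simulation starts as a fresh interpreter, everything it needs (including numpy in the example above) is loaded again inside the traced process and shows up in the trace.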
Thanks for the super quick response. Your suggestion about splitting into two scripts was my "plan B" and I intend to try it out soon. Will update you on this shortly. p.s. Is there a provision for recording package versioning info (wherever possible)... like a |
Yes! I'm hoping to add support for common interpreters (Python, R, Ruby) so that version information can be recorded. I completely agree that this information should be in the bundle.
Out of curiosity.... do you have an idea by when this feature might be available? For now, I have been planning to include certain parts of "Sumatra" package to do this version tracking. If this is expected within ReproZip in the near future, I would be inclined to wait :-) This might be useful for the Python implementation: |
Unfortunately, ReproZip is not hooked into the experiment's Python interpreter, so I have to take a different approach. Probably simply reading the |
[edit: moved to #359] |
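As a stopgap on the user's side, the traced script can write out the versions of the installed Python packages itself. The sketch below uses the standard-library `importlib.metadata` module; it is not how ReproZip records or plans to record versions, and the function names are hypothetical.

```python
import json
from importlib import metadata


def installed_versions():
    """Return a {distribution name: version} mapping for the current environment."""
    versions = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken or missing metadata
            versions[name] = dist.version
    return dict(sorted(versions.items()))


def write_versions(path):
    """Dump the version mapping to a JSON file next to the experiment outputs."""
    with open(path, "w") as fh:
        json.dump(installed_versions(), fh, indent=2)
```

Calling `write_versions(...)` at the start of the simulation script leaves a plain-text record that can be kept alongside the results.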
I have been attempting to split my script into two parts, one of which would be invoked through reprozip. The workflow seems to work in general, but I have the following concerns:
For
That's currently not possible, sorry. We would need a lot of different commands to support every use case. You can, however, change this file from Python using PyYAML. Note that changing the environment might prevent the experiment from running, though, since some variables are necessary for the reproduction (I'm thinking
Some things might not strictly be needed, like fonts (#360), but usually everything that gets packed is required for the experiment to run. You can omit your data if it's repeated between all the experiments you trace; there is no automated way to put the data in, but running the
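Editing the file from Python with PyYAML, as suggested above, might be sketched as follows. This assumes the file in question is the `config.yml` that ReproZip writes in the trace directory, and that each entry under `runs` carries an `environ` mapping; both the path and those keys are assumptions here, so check them against your own file first. The helper names are hypothetical.

```python
import yaml  # PyYAML


def update_config(path, mutate):
    """Load a YAML config, apply `mutate(config)` in place, and write it back."""
    with open(path) as fh:
        config = yaml.safe_load(fh)
    mutate(config)
    with open(path, "w") as fh:
        yaml.safe_dump(config, fh, default_flow_style=False)
    return config


def drop_env_var(name):
    """Build a mutation that removes one variable from every run's environment.

    Assumes a `runs` list whose items have an `environ` mapping.
    """
    def mutate(config):
        for run in config.get("runs", []):
            run.get("environ", {}).pop(name, None)
    return mutate
```

For example, `update_config(".reprozip-trace/config.yml", drop_env_var("SOME_VAR"))` would strip a hypothetical `SOME_VAR` from the recorded environment. Round-tripping through PyYAML discards the comments in the file, and, as noted above, removing variables the experiment relies on can break the reproduction.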
Thanks for the quick reply. I have implemented your suggestions and have a working prototype ready. I will start testing it and collecting feedback from other users, and will get back to you with any further developments.
Glad I could help! I am very interested in your feedback and experience as you attempt this, so don't hesitate to share what you can. Closing this ticket in favor of #359.
Would it be possible to run ReproZip for part of a Python script?
To elaborate a bit: we are developing a tool whereby users specify multiple parameters, and based on these, different models and protocols are employed for the simulations (i.e. it involves user interactivity). Naturally, the packages and files that are invoked also vary with these choices, and the 'environment' I wish to save should exclude these initial parts and other housekeeping tasks, and focus solely on the loading and execution of the model.
With this in mind, is it possible to invoke ReproZip from within a Python script (as opposed to calling it from the terminal CLI) so that I can track (and save) the files/packages that are required between, e.g., line numbers x and y of my script (i.e. to be able to enable/disable ReproZip tracing inside a Python script)? I suppose ReproZip wasn't intended to run in this fashion, but I am curious to know if I could employ certain sub-modules or methods to achieve this. I also took a look at the Jupyter plugin to see if some bits might be useful.
I intend to dig deeper, but felt it was much better to ask here to get a better idea of the lay of the land. Thanks in advance.
(apologies if a similar question has been answered previously elsewhere)