Commit

more troubleshoot
ErinWeisbart committed Mar 27, 2024
1 parent a2d69a8 commit 73d6c56
Showing 1 changed file with 1 addition and 1 deletion.
documentation/DCP-documentation/troubleshooting_runs.md (2 changes: 1 addition & 1 deletion)
@@ -15,7 +15,7 @@
| Jobs completing (total messages decreasing) much more quickly than expected. | "==OUT, SUCCESS" | No outcome/saved files on S3 | | There is a mismatch in your metadata somewhere. | Check the Metadata_ columns in your LoadData.csv for typos or a mismatch with your jobs file. The most common sources of mismatch are case and zero padding (e.g. A01 vs a01 vs A1). Check for these mismatches and edit the jobs file accordingly. If you use pe2loaddata to create your CSVs and the plate was imaged multiple times, pay particular attention to the Metadata_Plate column, as numbering reflecting the repeat imaging will be automatically passed into the LoadData.csv. |
| | Your specified output structure does not match the Metadata passed. | Expected output is seen. | | This is not necessarily an error. If the input grouping is different from the output grouping (e.g. jobs are run by Plate-Well-Site but are all output to a single Plate folder), then messages will print in the CloudWatch log that matches the input structure, while actual job progress will print in the CloudWatch log that matches the output structure. | |
| | Your perinstance logs have an IOError indicating that an .h5 batchfile does not exist | No outcome/saved files on S3 | | No batchfiles exist for your project. | Either create the batch files and make sure that they are in the appropriate directory, OR re-start and use `MakeAnalysisJobs()` instead of `MakeAnalysisJobs(mode='batch')` in `run_batch_general.py`. |
| | | | Machines are made in EC2 and dockers are made in ECS but the dockers are not placed on the machines | 1) There is a mismatch in your DCP config file. OR 2) You haven't set up permissions correctly. | 1) Confirm that the `MEMORY` matches the `MACHINE_TYPE` set in your config. Confirm that there are no typos in the `DOCKERHUB_TAG` set in your config. 2) Check that you have set up permissions correctly for the user or role that you have set in your config under `AWS_PROFILE`. |
| | | | Machines are made in EC2 and dockers are made in ECS but the dockers are not placed on the machines | 1) There is a mismatch in your DCP config file. OR 2) You haven't set up permissions correctly. | 1) Confirm that the `MEMORY` matches the `MACHINE_TYPE` set in your config. Confirm that there are no typos in the `DOCKERHUB_TAG` set in your config. 2) Check that you have set up permissions correctly for the user or role that you have set in your config under `AWS_PROFILE`. Confirm that your `ecsInstanceRole` is able to access the S3 bucket where your `ecsconfigs` have been uploaded. |
| | Your perinstance logs have an IOError indicating that CellProfiler cannot open your pipeline | | | You have a corrupted pipeline. | Check if you can open your pipeline locally. It may have been corrupted on upload or it may have an error within the pipeline itself. |
| | "== ERR move failed: An error occurred (SlowDown) when calling the PutObject operation (reached max retries: 4): Please reduce your request rate." The error may not show initially and may become more prevalent with time. | | | Too many jobs are finishing too quickly, creating a backlog of jobs waiting to upload to S3. | You can 1) check out fewer machines at a time, 2) check out smaller machines and run fewer copies of DCP at the same time, or 3) group jobs in larger groupings (e.g. by Plate instead of Well or Site). If this happens because many jobs finish at the same time (but not so rapidly that the backlog keeps growing), you can increase `SECONDS_TO_START` in `config.py` so there is more separation between jobs finishing. |
| | "/home/ubuntu/bucket: Transport endpoint is not connected" | Cannot be accessed by fleet. | | S3FS has stochastically dropped/failed to connect. | Perform your run without using S3FS by setting `DOWNLOAD_FILES = TRUE` in your `config.py`. Note that, depending upon your job and machine setup, you may need to increase the size of your EBS volume to account for the files being downloaded. |
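The case and zero-padding mismatches described in the first row above (A01 vs a01 vs A1) can be checked mechanically before editing the jobs file. A minimal sketch; the `normalize_well` helper and the example well sets are hypothetical, not part of Distributed-CellProfiler:

```python
import re

def normalize_well(well):
    """Return a canonical form like 'A01' for 'a1', 'A1', or 'a01'."""
    m = re.fullmatch(r"([A-Za-z]+)0*(\d+)", well.strip())
    if m is None:
        raise ValueError(f"Unrecognized well name: {well!r}")
    row, col = m.group(1).upper(), int(m.group(2))
    return f"{row}{col:02d}"

# Hypothetical example: wells as they appear in a LoadData.csv vs a jobs file.
csv_wells = {"a1", "B02", "c11"}
job_wells = {"A01", "B2", "C11"}

# Wells that differ even after normalization are true mismatches; an empty set
# here means the names differ only in case or zero padding.
csv_normalized = {normalize_well(w) for w in csv_wells}
mismatches = {w for w in job_wells if normalize_well(w) not in csv_normalized}
print(mismatches)  # set()
```

If the printed set is empty but jobs still fail to match, the discrepancy is likely in another Metadata_ column (e.g. Metadata_Plate) rather than in the well names.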