-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Error in rule makefit_combined #4
Comments
Dear Colin, |
Thanks Cèdric, I’ll hold off rerunning until your next version is pushed. I have another question about the experimental design. In our data we quadruplicate CHX/RNASeq pairs for two conditions. When I try include all 8 samples - snakemake throws and error I presume this is because there is a limitation for triplicates? Also for the first column of the samples tsv how should this be set and should the sample I’d match between the chx and RNASeq or is it sufficient to designate the replicate and columns Thank you, |
Dear Colin, Best, Cédric Example 1 (one fit per sample, 2 output): Example 2 (merged replicates, 1 output): |
Thanks Cedric, that's clear on the replicates! Looking forward to working with the new version. Colin |
Thanks Cedric, I've pulled the latest version. Just a note on the definition of the A site position in config.yaml - the comments says:
I presume this should be 15 or 16? I'm also having a problem when downloading from local fastq files; looking at the Snakemake file - it seems to alway expect the data to be on SRA. Colin |
Hi Colin, Position of the A site: Because the reads are shifted according to the CDS reading frame, we are using codon coordinates and not nucleotide. Depending of the libary type and species, it is either set to 5 or 6. The best is to run the pipeline and check that the pattern matches the position of the A site. Fastq files (see readme): In case fastq files are directly provided, files names must be specified under the "SRR" field without the ".fastq" extension and placed in the "/workdir/data/raw/" output folder. Tell me if it's working. I hope this was helpful. Best, Cédric |
Thanks Cedric, sorry for confusion on the A site. I have included the fastq files in the data/raw output folder - the pipe line still tries to download from sra using the sample name. on my side when I look at the Snakemake file there's no option to bypass SRA I also see the following error when I include more than 1 line in the sample.tsv file. Colin |
There is no defined rule in the snakefile to bypass SRA but because snakemake rules are based on input file names and wildcards, it's working. Snakemake will identify that the fastq files are already in the folder and will proceed with the mergefastq rule. You have to be sure that the SRR field is not the SRA anymore but the fastq name without extension. Example Here is my samples.tsv: I hope this is working. Cédric |
Which command do you use to run snakemake ? Can you try to add "--reason" to understand while snakemake is triggering the download rule while ntsr1.fastq is existing. And can you try "--touch" as well - when the output file is newer that the input file, snakemake will rerun the rule. |
Thanks Cedric, you were right - the mistake was on my side I used "--touch" and everything runs smoothy now up until the combined fit of the Ribo and RNA. I hit another error there - the glm doesn't converge after 200 iterations. I've increased the number to 1,000 and started again. I suspect this is specific to my data though and it won't converge with the increased MAXITER. For this run, I've preprocessed my Ribo data by removing reads mapping to rRNA, snoRNA and tRNA. I then filter the reads to the lengths that are (>60%) in phase. Do you recommend using raw data or preprocessed? |
Hi Cedric, Just an update, the problem with the MAXITER was on my side, I restricted the upper and lower bounds of the read length too stringently, when I used the defaults everything worked smoothly. I was wondering could you explain a little more about those please? A minor error occurs when loading the Ensembl.cds.parse.fa.rData file - when running multiple jobs that overlap that file is incomplete. When I run the pipeline there is no problem. Colin |
Hi Colin, I'm not recommending to preprocess the data but to directly start with the raw data. Which organism are you analysing? Best, Cédric |
Hi Cedric, The read coverage is > 100million RPFs (28-31nt) post filtering. For run I completed, I didn't select read lengths by phasing - I used the default values in the config.yaml. When I restricted the by read length to the narrow region the glm did not converge. I haven't gone through the results in detail as of yet, but intend to in the coming days. The organism is a cell line from the Chinese hamster. At the moment I'm using an Ensembl GTF, however a better annotation is available through NCBI -do you plan on incorporating different GTFs in the future? Regards, |
Your coverage is indeed very high. |
Hi Cyril,
Thanks for your work - codonDT has the potential to be quite useful for our work.
I have matched riboseq and rnaseq data. I've run a test sample with only the riboseq to completion. Now I'm trying to run both the RNA and Riboseq. And hit this R error.
Error in sparse.model.matrix(data = ncount, formu) : object 'formu' not found Calls: MakeFit -> sparse.model.matrix -> terms Execution halted [Fri Apr 30 19:44:11 2021] Error in rule makefit_combined: jobid: 138 output: Data/Fit/CHX7_RIBO_fit_24:25.RData, Data/Fit/CHX7_RIBO_fit_24:25.RData.pred, Data/Fit/CHX7_RIBO_fit_24:25.RData.cor
Best,
Colin
The text was updated successfully, but these errors were encountered: