Segmentation fault // Connecting to master: Connection refused #37

Open
fuesseler opened this issue Sep 2, 2021 · 0 comments

Hello!
I am trying to run SatsumaSynteny2 on SLURM and keep running into problems. My target is a 2 GB assembly, and I am trying to identify sex-linked scaffolds in it by aligning a 17 MB query against it. I hope you can help me figure out how to make it work!

The command:
/hpc-cloud/.conda/envs/environment-satsuma2/bin/SatsumaSynteny2 -q SceUnd_NC_056531.fasta -t ref_normalized.fasta -o workdir-satsuma
The resources I supplied:

#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=200GB
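
For completeness, the full submission script (slightly simplified: module loads and paths other than the binary are omitted, and the partition line is shown only for illustration) looks roughly like this:

#!/bin/sh
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=200GB
#SBATCH -p Spc   # partition, same name as used in satsuma_run.sh

# run SatsumaSynteny2 with the 17 MB query against the 2 GB target assembly
/hpc-cloud/.conda/envs/environment-satsuma2/bin/SatsumaSynteny2 \
  -q SceUnd_NC_056531.fasta \
  -t ref_normalized.fasta \
  -o workdir-satsuma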

The error message:
/var/spool/slurmd.spool/job149759/slurm_script: line 45: 128713 Segmentation fault
The Kmer log files are created without problems, as far as I can see.

The slave gets launched, but it looks like the master and slave cannot connect. The SL1.log file shows this message:

Loading query sequence:  SceUnd_NC_056531.fasta
 - Creating query chunks...
select=0        chunks=4116
chunks: 4116
DONE
Loading target sequence:  ref_normalized.fasta
 - Creating target chunks...
select=0        chunks=666370
chunks: 666370
DONE
TIME SPENT ON LOADING: 30
== launching workers ==
== Entering communication loop ==
comm loop for CompNode02.hpc-cloud 3491
worker created, now to work!!!
ERROR connecting to master: Connection refused
ERROR connecting to master: Connection refused
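
To check whether that port is reachable at all from the worker node, I suppose one could test it directly (hostname and port taken from the log above; this assumes nc is available on the compute nodes):

# hypothetical connectivity test, run from the node where the slave job lands
nc -zv CompNode02.hpc-cloud 3491
# "Connection refused" here too would suggest that connections between nodes
# are blocked, or that the master is no longer listening (e.g. after the segfault).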

The contents of slurm_tmp.sh before the job fails show this command, which leads me to believe the problem occurs in HomologyByXCorrSlave:

srun /hpc-cloud/.conda/envs/environment-satsuma2/bin/HomologyByXCorrSlave -master CompNode02.hpc-cloud -port 3491 -sid 1 -p 1 -q SceUnd_NC_056531.fasta -t ref_normalized.fasta -l 0 -q_chunk 4096 -t_chunk 4096 -min_prob 0.99999 -cutoff 1.8

I am not sure whether I configured satsuma_run.sh correctly:
For "QueueName" I set the name of the partition that I was planning to run everything on. Is there something else I should have configured so that the master and slave can communicate (one guess is sketched after the script below)? I have never used a program with master and slave processes before.

#!/bin/sh

# Script for starting Satsuma jobs on different job submission environments
# One section only should be active, ie. not commented out

# Usage: satsuma_run.sh <current_path> <kmatch_cmd> <ncpus> <mem> <job_id> <run_synchronously>
# mem should be in Gb, ie. 100Gb = 100

# no submission system, processes are run locally either synchronously or asynchronously
#if [ "$6" -eq 1 ]; then
#  eval "$2"
#else
#  eval "$2" &
#fi

##############################################################################################################
## For the sections below you will need to change the queue name (QueueName) to one existing on your system ##
##############################################################################################################

# qsub (PBS systems)
#echo "cd $1; $2" | qsub -V -qQueueName -l ncpus=$3,mem=$4G -N $5

# bsub (LSF systems)
#mem=`expr $4 + 1000`
#bsub -o ${5}.log -J $5 -n $3 -q QueueName -R "rusage[mem=$mem]" "$2"

# SLURM systems
echo "#!/bin/sh" > slurm_tmp.sh
echo srun $2 >> slurm_tmp.sh
sbatch -p Spc -c $3 -J $5 -o ${5}.log --mem ${4}G slurm_tmp.sh
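
One thing I wondered about is whether the worker jobs have to land on the same node as the master. Purely as a guess (and assuming SLURM knows the node by its short hostname), the sbatch line could be changed to pin the workers to the master's node:

# guess: pin each worker job to the node running the master,
# in case TCP connections between compute nodes are blocked
echo "#!/bin/sh" > slurm_tmp.sh
echo srun $2 >> slurm_tmp.sh
sbatch -p Spc -c $3 -J $5 -o ${5}.log --mem ${4}G --nodelist=$(hostname -s) slurm_tmp.sh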

The test dataset behaves the same way. I would appreciate your help! Thanks,
