Home
Merly Escalona edited this page Oct 17, 2017
Documentation v. 20170920
© 2017 Merly Escalona ([email protected])
University of Vigo, Spain, http://darwin.uvigo.es
This program was developed for simulations of targeted-sequencing experiments under a known species/gene tree distribution. It extracts the reference sequences that would have been used as targets in the probe design.
- The program assumes a SimPhy - NGSphy simulation pipeline scenario; that is, it follows SimPhy's hierarchical folder structure and sequence labeling.
- [SimPhy](https://github.com/adamallo/simphy) folder path
- prefix of the existing [FASTA](https://en.wikipedia.org/wiki/FASTA_format) files
- prefix for the output files
- method indicating how to obtain the reference sequences
- (optional) file with the description of the sequences that will be used as reference
- (optional) length of the N sequence that will be used to separate the sequences when concatenated
-
- The output will be a directory of FASTA files
- There should be as many FASTA files as replicates generated for the current SimPhy project
- Each file will contain all the selected loci, either concatenated or as a multiple alignment file
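
The two output layouts mentioned above can be illustrated with a minimal Python sketch. It is not the program's own code: the function names `as_concatenated` and `as_multi_fasta`, the record label `reference`, and the toy sequences are hypothetical and chosen only for illustration.

```python
def as_concatenated(loci, n_size, label="reference"):
    """Join the selected loci into one FASTA record, separated by runs of N.

    `loci` is a list of (name, sequence) pairs. `label` is a hypothetical
    record name, not necessarily the label refselector writes.
    """
    spacer = "N" * n_size
    joined = spacer.join(seq for _, seq in loci)
    return ">{}\n{}\n".format(label, joined)


def as_multi_fasta(loci):
    """Write one FASTA record per selected locus."""
    return "".join(">{}\n{}\n".format(name, seq) for name, seq in loci)


# Toy data (assumed): two selected loci for a single replicate.
loci = [("locus_1", "ACGT"), ("locus_2", "GGCC")]
concatenated = as_concatenated(loci, n_size=3)
multi = as_multi_fasta(loci)
print(concatenated)
print(multi)
```

With an N-sequence length of 3, the two toy loci end up in a single record whose sequence is `ACGTNNNGGCC`; without concatenation, each locus keeps its own record.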
# 1. Clone repository
git clone https://github.com/merlyescalona/refselector.git
# 2. Move to folder
cd refselector
# 3. Install
python setup.py install --user
The SimPhy/NGSphy reference selector does not have a Graphical User Interface (GUI) and works on the Linux/Mac command line in a non-interactive fashion.
usage: refselector -s <path> -ip <input_prefix>
-op <output_prefix> -o <output_path>
-m <method_code> [ -n <N_seq_size> ]
[ -sdf <sequence_descriptions_file_path> ]
[-l <log_level>] [-v] [-h]
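
When the program is driven from a script, the argument list can be assembled programmatically. The following is a minimal sketch, assuming the short flags documented on this page (with `-s` for the SimPhy path, as in the `-s`/`--simphy-path` option description). The helper name `build_refselector_command` and all example values are hypothetical, and the sketch only builds and prints the command; it does not run it.

```python
import shlex


def build_refselector_command(simphy_path, input_prefix, output_prefix,
                              output_path, method=0, n_size=None,
                              seq_desc_file=None):
    """Assemble a refselector argument list from the documented options."""
    if method not in range(5):
        raise ValueError("method must be an integer in [0, 4]")
    if method == 1 and seq_desc_file is None:
        raise ValueError("method 1 needs a sequence description file (-sdf)")
    cmd = [
        "refselector",
        "-s", simphy_path,
        "-ip", input_prefix,
        "-op", output_prefix,
        "-o", output_path,
        "-m", str(method),
    ]
    if n_size is not None:
        if n_size < 0:
            raise ValueError("the N-sequence length must be >= 0")
        cmd += ["-n", str(n_size)]
    if seq_desc_file is not None:
        cmd += ["-sdf", seq_desc_file]
    return cmd


# Hypothetical project path and prefixes, for illustration only.
cmd = build_refselector_command("simphy_project", "data", "reference",
                                "output", method=0, n_size=100)
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list can be passed to `subprocess.run` once refselector is installed. Keeping the arguments as a list avoids shell-quoting problems with paths that contain spaces.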
- -s <path>, --simphy-path <path>
  - description: Path of the SimPhy folder.
  - type: string (path)
- -ip <input_prefix>, --input-prefix <input_prefix>
  - description: Prefix of the FASTA filenames.
  - type: string
- -p <ploidy>, --ploidy <ploidy>
  - description: Ploidy of the dataset.
  - type: number (integer)
  - values: [1,2] (default: 1)
- -op <output_prefix>, --outuput-prefix <output_prefix>
  - description: Prefix for the output filename.
  - type: string
- -o <output_path>, --output <output_path>
  - description: Path where output will be written.
  - type: string (path)
- -m <method_code>, --method <method_code>
  - description: Method used to obtain the reference loci for the design of probes.
  - type: number (int) in the closed interval [0,4]
  - values:
    - [0] Considers the outgroup sequence as the reference loci (default).
    - [1] Extracts a specific sequence per locus. Needs parameter -sdf/--seq-desc-file.
    - [2] Selects a random sequence from any of the ingroups.
    - [3] Randomly selects a species and generates a consensus sequence from the sequences belonging to that species.
    - [4] Generates a consensus sequence from all the sequences involved.
  - NOTE: The higher the method number, the longer it will take to generate the reference loci.
- -n <N_seq_size>, --nsize <N_seq_size>
  - description: Number of N's introduced to separate the selected reference sequences. If the parameter is not set, the output file for each replicate is a multiple alignment file. Otherwise, the output is a single-sequence file per replicate, consisting of the selected reference sequences concatenated and separated by as many N's as set for this parameter.
  - type: number (int) x, where x >= 0
- -sdf <sequence_descriptions_file_path>, --seq-desc-file <sequence_descriptions_file_path>
  - description: Required when method = 1 is selected. Tab-separated file that identifies which sequence will be selected per locus per replicate.
  - type: string (path)
  - format: replicate_ID locus_ID sequence_description_locus
  - Example:

        1 1 1_0_0 # Replicate 1, locus 1, sequence 1_0_0
        1 2 2_0_0 # Replicate 1, locus 2, sequence 2_0_0
        2 1 1_0_1 # Replicate 2, locus 1, sequence 1_0_1
        2 2 1_0_3 # Replicate 2, locus 2, sequence 1_0_3

- -l <log_level>, --log <log_level>
  - description: Level of logging shown on the standard output. The entire log is stored in a separate file when the level is DEBUG.
  - type: enumerate
  - values:
    - DEBUG: shows very detailed information about the program's process.
    - INFO (default): shows only information about the state of the program.
    - WARNING: shows only system warnings.
    - ERROR: shows only execution errors.
- -v, --version: Show program's version number and exit.
- -h, --help: Show help message and exit.
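
The sequence description file passed with -sdf has three tab-separated columns (replicate_ID, locus_ID, sequence description). A minimal sketch of reading such a file into a lookup table follows. It assumes integer replicate and locus identifiers and no trailing comments in the data rows; the function name `read_sequence_descriptions` is hypothetical and is not part of refselector.

```python
import csv
import io


def read_sequence_descriptions(handle):
    """Map (replicate_ID, locus_ID) to the selected sequence description."""
    selection = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        replicate_id, locus_id, description = row[0], row[1], row[2]
        selection[(int(replicate_id), int(locus_id))] = description.strip()
    return selection


# In-memory stand-in for a file with the four example rows shown on this page.
example = io.StringIO(
    "1\t1\t1_0_0\n"
    "1\t2\t2_0_0\n"
    "2\t1\t1_0_1\n"
    "2\t2\t1_0_3\n"
)
selection = read_sequence_descriptions(example)
print(selection[(2, 2)])
```

Keying the table on the (replicate, locus) pair makes it straightforward to check that every locus of every replicate has exactly one selected sequence before running method 1.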
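
Methods 3 and 4 build a consensus sequence. This page does not state how the consensus is computed, so the sketch below is only an illustration under stated assumptions: the sequences are aligned and of equal length, the most frequent character in each column wins, and ties are broken alphabetically. The function name `majority_consensus` is hypothetical.

```python
from collections import Counter


def majority_consensus(sequences):
    """Column-wise majority-rule consensus of equal-length sequences."""
    if not sequences:
        raise ValueError("at least one sequence is required")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned (same length)")
    consensus = []
    for column in zip(*sequences):
        counts = Counter(column)
        # Most frequent character; ties go to the alphabetically first one.
        consensus.append(max(sorted(counts), key=counts.get))
    return "".join(consensus)


# Toy alignment (assumed data).
aligned = ["ACGT", "ACGA", "TCGA"]
consensus = majority_consensus(aligned)
print(consensus)
```

Because every column of every selected sequence has to be inspected, a consensus over all sequences involves more work than picking a single sequence, which is consistent with the note that higher method numbers take longer.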