Nelly Sélem edited this page Jun 14, 2016 · 15 revisions

Site under development.

CORASON Manual

CORe Analysis of Syntenic Orthologs Natural Product BGC

Index: Required Files, Execute ClusterTools

TUTORIAL

This is a detailed CORASON tutorial for finding syntenic clusters and sorting them phylogenetically.

Genome Database

The CORASON genome database has 1233 genomes annotated by RAST. Additionally, MIBIG natural product clusters have been added.

  1. Genomic Database

To save time, manually choose just a few organisms from the genome database by setting $LIST="num1,num2,...,numn".

Installing a new genome database

Dependencies: MyRast (tested on Ubuntu)

Creating the database. Input: RAST.IDs, RAST user, password

The user name and password of the RAST account from which the genomes will be downloaded must be set in globals, in the variables USER and PASS. Downloaded files are numbered according to their line order in the RAST.IDs file; e.g. the annotation files for JobId 288178 are downloaded as 1.faa and 1.txt, and the files for JobId 288231 as 3.faa and 3.txt.
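The numbering rule can be previewed with a small shell sketch. It assumes that RAST.IDs holds one genome per line and that the JobId is the first whitespace-separated field (an assumption based on the grep example in this manual). The function name expected_files is hypothetical and not part of CORASON.

```shell
# Print, for each non-empty line of a RAST.IDs-style file, the JobId and
# the file names its annotations are expected to get
# (line number + .faa / .txt).
# Assumption: the JobId is the first whitespace-separated field.
expected_files() {
    awk 'NF { printf "%s\t%d.faa\t%d.txt\n", $1, NR, NR }' "$1"
}

# Usage: expected_files RAST.IDs
```

Each output line shows a JobId followed by the two file names expected for it in the genome folder.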

To add genomes to the database:
Add the genome's description to the RAST.IDs file.
Save the RAST faa and txt files in the GENOMAS folder, named with the genome's line number in the RAST.IDs file.

CORASON input required files

To use CORASON you have to modify a file named globals.pm. It is a plain text file, so you can edit it with your preferred text editor, for example nano.

file.query, globals.pm

2.1 file.query

A single protein fasta file that contains your query protein. Filename extension .query is mandatory.
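A quick way to check a query file before running CORASON is sketched below. It tests the two requirements stated above: the .query extension and a single FASTA record. The function name check_query is hypothetical, and counting header lines that start with ">" is an assumption about what "single protein" means in practice.

```shell
# Check that a query file exists, has the mandatory .query extension and
# contains exactly one FASTA record (one line starting with ">").
check_query() {
    [ -f "$1" ] || { echo "error: $1 not found" >&2; return 1; }
    case "$1" in
        *.query) ;;
        *) echo "error: $1 does not end in .query" >&2; return 1 ;;
    esac
    # grep -c exits non-zero when the count is 0, hence "|| true"
    n=$(grep -c '^>' "$1" || true)
    if [ "$n" -ne 1 ]; then
        echo "error: $1 has $n FASTA headers, expected 1" >&2
        return 1
    fi
}

# Usage: check_query enediyene.query && echo "query file looks fine"
```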

2.2 File globals.pm

Reference genome and query

$SPECIAL_ORG="1235"; ## Reference organism having a known BGC; it will be used as the reference for BGC homology
$QUERIES="enediyene.query";

Homology search parameters

$e="0.000000000000000000001"; #sss1E-15 # E-value cutoff for a gene to be considered a hit.
$BITSCORE="2000"; ## Check the .BLAST.pre file to get an idea of a suitable value for this parameter.
$CLUSTER_RATIO="15"; # Number of genes in the neighborhood to be analyzed
$eCluster="0.00001"; # E-value for the search of homologs of the queries (from the reference organism); values above this will be colored
$eCore="0.00001"; # E-value for the search of ortholog groups within the collection of BGCs

db management

$RAST_IDs="RAST.IDs";
$BLAST_CALL="";
$DOWNLOAD="0"; # 1 if you need to download the files needed by the script from RAST; 0 if you have already downloaded your genome database
$USER="nselem35"; # RAST user name, needed if you are going to download files
$PASS="q8Vf6ib"; # Password for the RAST account
$FORMAT_DB="1"; # 0 if the genome DB is already formatted; 1 if you want to reformat the whole DB

##### Working directory; in most cases do not touch
$NAME="ClusterTools4"; ## Name of the group (taxon, genus, etc.)
$BLAST="$NAME.blast";
#$dir="/Users/FBG/Desktop/ClusterTools1/$NAME"; ## The path of your directory
$dir="/Users/FBG/Desktop/$NAME"; ## The path of your directory

##### For a second round of analysis with selected genomes
$LIST = "1235,310,318,487,515,520,522,604,705,840"; ## Which genomes to process; leave empty to search the whole DB
#$LIST= "983,1013,984,985,1016,413,411,946,1005,924,1007,1243,1022,408,1246,1245,68,69,70,928,328,529,586,542,759,609,824,50,12,312,415,414,425,418,416,417,814,478,531,490,484,485,491,506,536,533,762,477,516,780,546,495,555,526,472,763,672,494,838,789,492,708,812,520,840,515,501,604";
$NUM = `wc -l < $RAST_IDs`;
chomp $NUM;
$NUM=int($NUM); # The number of genomes to be analyzed if you use the option $LIST; comment out if $LIST is empty
#$NUM="";

# Window size
$RESCALE=85000; ## Adjusts the horizontal size of the arrows (genes); with a greater value the arrows are smaller and you will see more genes.

3. Execute ClusterTools

Script: perl CoreCluster.pl

Once you have written your preferences in the globals.pm file, just run in a terminal:

$ perl CoreCluster.pl
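Before launching the script it may help to confirm that every number in $LIST is a valid line number in RAST.IDs. The following is a minimal sketch, not part of CORASON; the function name check_list is hypothetical. It counts lines with wc -l, the same way globals.pm computes $NUM, so a last line without a trailing newline is not counted.

```shell
# Report entries of a comma-separated genome list (first argument) that
# are not valid line numbers of a RAST.IDs-style file (second argument).
# Returns non-zero if any entry is invalid; an empty list is accepted.
check_list() {
    list=$1
    total=$(( $(wc -l < "$2") ))   # arithmetic strips wc's padding
    rc=0
    old_ifs=$IFS
    IFS=,
    for id in $list; do
        case "$id" in
            ''|*[!0-9]*)
                echo "not a number: '$id'" >&2
                rc=1
                continue ;;
        esac
        if [ "$id" -lt 1 ] || [ "$id" -gt "$total" ]; then
            echo "out of range (1-$total): $id" >&2
            rc=1
        fi
    done
    IFS=$old_ifs
    return "$rc"
}

# Usage: check_list "1235,310,318" RAST.IDs && perl CoreCluster.pl
```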

  4. Outputs

Files: RightNames.txt, Concatenados.svg, NAME/FUNCTION

RightNames.txt This file is the core-cluster concatenated fasta file.

Concatenados.svg This is the browsable graphic of your related clusters.

NAME/FUNCTION

  5. Installation

Dependencies: the Perl module SVG. To install it on Mac OS X 10.6.8, open the CPAN shell and run the install command at its prompt:

$ sudo perl -MCPAN -e shell
$ install SVG
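Whether the SVG module is already available can be checked from the terminal. The sketch below uses Perl's standard -M switch to try loading a module; the helper name have_perl_module is hypothetical.

```shell
# Succeeds (exit status 0) if the given Perl module can be loaded.
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

if have_perl_module SVG; then
    echo "SVG module found"
else
    echo "SVG module missing; install it from CPAN as described above"
fi
```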

  6. CORASON architecture

GENOMES
RAST.IDs

CoreCluster.pl
1_Context_text.pl
Concatenador.pl
ReadingInputs.pl
header.pl
1_MakeBlast.pl
Rename_Ids_Star_Tree.pl
header2.pl
2.Batch_RetrieveFiles.pl
EliminadorLineas.pl
SearchAminoacidsFromCore.pl
multiAlign_gb.pl
2_OrthoGroups.pl
allvsall.pl
ChangeName.pl
ReadReaccion.pl
changeNamesWC.pl
RenamePrincipalHits.pl
readTree.pl
converter.pl
3_Draw.pl

To get the line number in RAST.IDs of the organism in the query file:

$ grep -n 'org' RAST.IDs

Example:

$ grep -n 'coelicolor' RAST.IDs
515:242137 6666666.112876 Streptomyces coelicolor A3(2) NC_003888.3 515
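The same grep call can be extended to build a value for $LIST. This is a sketch under the assumption that the line numbers reported by grep -n are the genome numbers CORASON expects; the function name list_for is hypothetical.

```shell
# Print the RAST.IDs line numbers of all entries matching a pattern,
# joined by commas, ready to paste into $LIST in globals.pm.
list_for() {
    grep -n -- "$1" "$2" | cut -d: -f1 | paste -sd, -
}

# Usage: list_for 'Streptomyces' RAST.IDs
```

The output is a single comma-separated line. Review it before copying it into globals.pm, since a broad pattern can match many genomes.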

Pending:

Warning if some gene of the cluster does not exist. If special Org=" " and there is no
Be able to search Plan with Vero, test it, compare it, clean it, write the paper.

Examples

Core Genome cycamide. If no core is obtained: reasons why a core might not be obtained.
