-
Notifications
You must be signed in to change notification settings - Fork 4
/
Copy pathREADME.rmd
139 lines (99 loc) · 4.07 KB
/
README.rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
---
output:
github_document:
html_preview: false
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(knitr)
library(badger)
library(magrittr)
devtools::load_all()
```
# homologene
[![Build Status](https://travis-ci.org/oganm/homologene.svg?branch=master)](https://travis-ci.org/oganm/homologene) [![codecov](https://codecov.io/gh/oganm/homologene/branch/master/graph/badge.svg)](https://codecov.io/gh/oganm/homologene) `r badge_cran_release('homologene',color = '#32BD36')` `r badge_devel("oganm/homologene", "blue")`
An r package that works as a wrapper to homologene
Available species are
```{r}
homologene::taxData
```
Installation
============
```r
install.packages('homologene')
```
or
```r
devtools::install_github('oganm/homologene')
```
Usage
===========
Basic homologene function requires a list of gene symbols or NCBI ids, and an `inTax` and an `outTax`. In this example, `inTax` is the taxon id of *mus musculus* while `outTax` is for humans.
```{r}
homologene(c('Eno2','Mog'), inTax = 10090, outTax = 9606)
homologene(c('Eno2','17441'), inTax = 10090, outTax = 9606)
```
For mouse and humans two convenience functions exist that removes the need to provide taxonomic identifiers. Note that the column names are not the same as the `homologene` output.
```{r}
mouse2human(c('Eno2','Mog'))
human2mouse(c('ENO2','MOG','GZMH'))
```
homologeneData2
=================
Original homologene database has not been updated since 2014.
This package also includes an updated version of the homologene database that
replaces gene symbols and identifiers with the their latest version. For the procedure followed for updating,
see [this blog post](https://oganm.com/homologene-update/) and/or see the [processing code](R/updateHomologene.R).
Using the updated version can help you match genes that cannot matched due to out of date annotations.
```{r}
mouse2human(c('Mesd',
'Trp53rka',
'Cstdc4',
'Ifit3b'))
mouse2human(c('Mesd',
'Trp53rka',
'Cstdc4',
'Ifit3b'),
db = homologeneData2)
```
The `homologeneData2` object that comes with the GitHub version of this package
is updated weekly but if you are using the CRAN version and want the latest
annotations, or if you want to keep
a frozen version homologene, you can use the `updateHomologene` function.
```r
homologeneDataVeryNew = updateHomologene() # update the homologene database with the latest identifiers
mouse2human(c('Mesd',
'Trp53rka',
'Cstdc4',
'Ifit3b'),
db = homologeneDataVeryNew)
```
Gene ID syncronization
=========================
The package also includes functions that were used to create the `homologeneData2`, for updating outdated gene symbols and identifiers.
```{r, cache = TRUE}
library(dplyr)
gene_history = getGeneHistory()
oldIds = c(4340964, 4349034, 4332470, 4334151, 4323831)
newIds = updateIDs(oldIds,gene_history)
print(newIds)
# get the latest gene symbols for the ids
gene_info = getGeneInfo()
gene_info %>%
dplyr::filter(GeneID %in% as.integer(newIds)) # faster to match integers
```
Querying DIOPT
==============
Instead of using just homologene, one can also make queries into the [DIOPT database](https://www.flyrnai.org/cgi-bin/DRSC_orthologs.pl). Diopt uses multiple databases
to find gene homolog/orthologues. Note that this function has a `delay` parameter
that is set to 10 seconds by default. This was done to obey the `robots.txt` of their website.
```{r, cache = TRUE}
diopt(c('GZMH'),inTax = 9606, outTax = 10090) %>%
knitr::kable()
diopt(c('Eno2','Mog'),inTax = 10090, outTax =9606) %>%
knitr::kable()
```
Mishaps
=================
As of version version 1.1.68, the output now includes NCBI ids. Since it doesn't change any of the existing column names or their order, this shouldn't cause problems in most use cases.
If a you can't find a gene you are looking for it may have synonyms. See [geneSynonym](https://github.com/oganm/geneSynonym.git) package to find them. If you have other problems open an issue or send a mail.