---
title: "README"
output: md_document
---
After working with <https://github.com/john-kurkowski/tldextract> in Python, I wanted the same functionality within R. The list of top-level domains can be loaded automatically from <https://publicsuffix.org/list/effective_tld_names.dat>, and a cached version of that data is stored in the package.
### Installation
To install this package, use the devtools package:
```{r eval=FALSE}
devtools::install_github("jayjacobs/tldextract")
```
### Usage
```{r}
library(tldextract)
# use the cached lookup data, simple call
tldextract("www.google.com")
# it can take multiple domains at the same time
tldextract(c("www.google.com", "www.google.com.ar", "googlemaps.ca", "tbn0.google.cn"))
```
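`tldextract` returns a data frame with one row per host. Assuming the columns are named `host`, `subdomain`, `domain`, and `tld` (run `str()` on the result to confirm), the parts can be recombined with ordinary data frame indexing; a minimal sketch:
```{r eval=FALSE}
# rebuild the registered domain from the extracted parts
# (column names host/subdomain/domain/tld are assumed here)
parts <- tldextract("maps.google.co.uk")
paste(parts$domain, parts$tld, sep = ".")
```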
The list of top-level domain rules is cached in the package and can be viewed directly.
```{r}
# view the cached TLD list shipped in the tldnames data
data(tldnames)
head(tldnames)
```
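Assuming `tldnames` is a plain character vector of public suffix rules, base R tools work on it directly; for example, to inspect all of the rules for one country code:
```{r eval=FALSE}
# list every suffix rule ending in .uk
# (assumes tldnames is a character vector of rules)
grep("\\.uk$", tldnames, value = TRUE)
```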
If the cached version is out of date and the package hasn't been updated, the current data can be loaded manually and then passed into the `tldextract` function.
```{r}
# get most recent TLD listings
tld <- getTLD() # optionally pass in a different URL than the default
manyhosts <- c("pages.parts.marionautomotive.com", "www.embroiderypassion.com",
"fsbusiness.co.uk", "www.vmm.adv.br", "ttfc.cn", "carole.co.il",
"visiontravail.qc.ca", "mail.space-hoppers.co.uk", "chilton.k12.pa.us")
tldextract(manyhosts, tldnames=tld)
```
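Since `getTLD()` fetches the list over the network, it can be worth caching the result locally between sessions. A minimal sketch using `saveRDS()`/`readRDS()` (the file name here is arbitrary):
```{r eval=FALSE}
# fetch the suffix list once and reuse it from a local cache
cache <- "tldnames.rds"  # example file name
if (file.exists(cache)) {
  tld <- readRDS(cache)
} else {
  tld <- getTLD()
  saveRDS(tld, cache)
}
tldextract(manyhosts, tldnames = tld)
```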