---
title: "README"
output: md_document
---
After working with <https://github.com/john-kurkowski/tldextract> in Python, I wanted the same functionality within R. The list of top-level domains can be loaded automatically from <https://publicsuffix.org/list/effective_tld_names.dat>, and a cached version of that data is stored in the package.
### Installation
To install this package, use the devtools package:
```{r eval=FALSE}
devtools::install_github("jayjacobs/tldextract")
```
### Usage
```{r}
library(tldextract)
# use the cached lookup data, simple call
tldextract("www.google.com")
# it can take multiple domains at the same time
tldextract(c("www.google.com", "www.google.com.ar", "googlemaps.ca", "tbn0.google.cn"))
```
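`tldextract` returns a data frame with one row per host. Assuming the columns are named `host`, `subdomain`, `domain`, and `tld` (run `str()` on the result to confirm), the parts can be recombined with ordinary data frame indexing; a minimal sketch:
```{r eval=FALSE}
# rebuild the registered domain from the extracted parts
# (column names host/subdomain/domain/tld are assumed here)
parts <- tldextract("maps.google.co.uk")
paste(parts$domain, parts$tld, sep = ".")
```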
The list of top-level domain rules is cached in the package and can be viewed directly.
```{r}
# view the cached TLD list shipped in the tldnames data
data(tldnames)
head(tldnames)
```
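Assuming `tldnames` is a plain character vector of public suffix rules, base R tools work on it directly; for example, to inspect all of the rules for one country code:
```{r eval=FALSE}
# list every suffix rule ending in .uk
# (assumes tldnames is a character vector of rules)
grep("\\.uk$", tldnames, value = TRUE)
```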
If the cached version is out of date and the package hasn't been updated, the current data can be loaded manually and then passed into the `tldextract` function.
```{r}
# get most recent TLD listings
tld <- getTLD() # optionally pass in a different URL than the default
manyhosts <- c("pages.parts.marionautomotive.com", "www.embroiderypassion.com",
"fsbusiness.co.uk", "www.vmm.adv.br", "ttfc.cn", "carole.co.il",
"visiontravail.qc.ca", "mail.space-hoppers.co.uk", "chilton.k12.pa.us")
tldextract(manyhosts, tldnames=tld)
```
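Since `getTLD()` fetches the list over the network, it can be worth caching the result locally between sessions. A minimal sketch using `saveRDS()`/`readRDS()` (the file name here is arbitrary):
```{r eval=FALSE}
# fetch the suffix list once and reuse it from a local cache
cache <- "tldnames.rds"  # example file name
if (file.exists(cache)) {
  tld <- readRDS(cache)
} else {
  tld <- getTLD()
  saveRDS(tld, cache)
}
tldextract(manyhosts, tldnames = tld)
```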