Web Scraping With R


This tutorial covers the basics of web scraping with R. We'll begin by scraping static pages and then shift focus to techniques for scraping data from dynamic websites that use JavaScript to render their content.

For a detailed explanation, see this blog post.

Installing requirements

For macOS, run the following:

brew install r
brew install --cask r-studio

For Windows, run the following:

choco install r.project
choco install r.studio
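
To verify that R installed correctly, you can check the version from a terminal:

R --version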

Installing required libraries

install.packages("rvest")
install.packages("dplyr")

Web scraping with rvest

library(rvest)
link <- "https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes"
page <- read_html(link)

Parsing HTML Content

page %>% html_elements(css = "")
page %>% html_elements(xpath = "")

For the above page, use the following:

htmlElement <- page %>% html_element("table.sortable")
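
The same table can also be selected with an equivalent XPath expression; the selector below is an illustrative alternative, not part of the tutorial's main flow:

htmlElement <- page %>% html_element(xpath = "//table[contains(@class, 'sortable')]")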

Saving data to a data frame

# The table uses two header rows; take column names from the second row
df <- html_table(htmlElement, header = FALSE)
names(df) <- df[2, ]
df <- df[-1:-2, ]
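
You can inspect the first few rows to confirm the header cleanup worked:

head(df)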

Exporting data frame to a CSV file

write.csv(df, "iso_codes.csv")

Downloading Images

# Assuming url holds the address of a page containing a thumbnail image,
# e.g. a Wikipedia article
page <- read_html(url)
image_element <- page %>% html_element(".thumbborder")
image_url <- image_element %>% html_attr("src")
download.file(image_url, destfile = "paris.jpg")
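
Note that on some sites the src attribute is protocol-relative (it begins with //). A minimal sketch that prepends the scheme before downloading, assuming https:

# Prepend the scheme if the URL is protocol-relative
if (startsWith(image_url, "//")) {
  image_url <- paste0("https:", image_url)
}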

Scraping dynamic pages with rvest

Find the API endpoint and query it directly; this requires the httr package for GET():

library(httr)

page <- read_html(GET(api_url, timeout(10)))
jsontext <- page %>% html_element("p") %>% html_text()
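
If the endpoint returns JSON, the extracted string can then be parsed into an R object, for example with the jsonlite package (an extra dependency, not installed above):

library(jsonlite)

# Parse the JSON string into an R list/data frame
data <- fromJSON(jsontext)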

For a complete example, see dynamic_rvest.R.

Web scraping with RSelenium

install.packages("RSelenium")
library(RSelenium)

Starting Selenium

Method 1

# Start a Selenium server and open a browser locally
rD <- rsDriver(browser = "chrome", port = 9515L, verbose = FALSE)
remDr <- rD[["client"]]

Method 2

docker run -d -p 4445:4444 selenium/standalone-firefox
# Connect to the Selenium instance running in the Docker container
remDr <- remoteDriver(
  remoteServerAddr = "localhost",
  port = 4445L,
  browserName = "firefox"
)
remDr$open()
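
To confirm the connection is working, you can query the server status:

remDr$getStatus()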

Working with elements in Selenium

remDr$navigate("https://books.toscrape.com/catalogue/category/books/science-fiction_16")

# Book titles are stored in the alt attribute of each cover image
titleElements <- remDr$findElements(using = "xpath", "//article//img")
titles <- sapply(titleElements, function(x) {x$getElementAttribute("alt")[[1]]})

# Prices and stock availability are plain text elements
pricesElements <- remDr$findElements(using = "xpath", "//*[@class='price_color']")
prices <- sapply(pricesElements, function(x) {x$getElementText()[[1]]})

stockElements <- remDr$findElements(using = "xpath", "//*[@class='instock availability']")
stocks <- sapply(stockElements, function(x) {x$getElementText()[[1]]})

Creating a data frame

df <- data.frame(titles, prices, stocks)
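
The prices come back as strings such as "£51.77". If you need numeric values, a quick base R sketch (assuming every price carries the £ prefix):

# Strip the currency symbol and convert to numeric
df$prices <- as.numeric(gsub("£", "", df$prices))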

Save CSV

write.csv(df, "books.csv")

If you wish to find out more about web scraping with R, see our blog post.