Type: Package
Title: Easy Web Scraping
Version: 2.3.0
Maintainer: Mohamed El Fodil Ihaddaden <ihaddaden.fodeil@gmail.com>
Description: The goal of 'ralger' is to facilitate web scraping in R.
License: MIT + file LICENSE
Encoding: UTF-8
URL: https://github.com/feddelegrand7/ralger
BugReports: https://github.com/feddelegrand7/ralger/issues
VignetteBuilder: knitr
Imports: rvest, xml2, tidyr, dplyr, stringr, robotstxt, crayon, curl, stringi, urltools (≥ 1.7.3), purrr (≥ 1.0.2)
Suggests: knitr, testthat, rmarkdown, covr
RoxygenNote: 7.3.1
NeedsCompilation: no
Packaged: 2025-07-12 08:56:52 UTC; mohamedelfodilihaddaden
Author: Mohamed El Fodil Ihaddaden [aut, cre], Ezekiel Ogundepo [ctb], Romain François [ctb]
Repository: CRAN
Date/Publication: 2025-07-12 09:10:02 UTC
Scraping attributes from HTML elements
Description
This function is used to scrape attributes from HTML elements.
Usage
attribute_scrap(link, node, attr, askRobot = FALSE)
Arguments
link: the link of the web page to scrape
node: the HTML element to consider
attr: the attribute to scrape
askRobot: logical. Should the function check robots.txt before scraping? Defaults to FALSE.
Value
a character vector.
Examples
# Scraping the "class" attribute of every anchor element
# on the rOpenSci home page
link <- "https://ropensci.org/"
attribute_scrap(link = link, node = "a", attr = "class")
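# A further illustrative sketch (not from the original examples):
# scraping the "src" attribute of every image element on the same page
attribute_scrap(link = link, node = "img", attr = "src")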
Scrape HTML comments from a web page
Description
Extracts HTML comments (<!-- comment -->) from a web page. Useful for detecting hidden notes, debug info, or developer messages.
Usage
comments_scrap(link, askRobot = FALSE)
Arguments
link: Character. The URL of the web page to scrape.
askRobot: Logical. Should the function check robots.txt before scraping? Defaults to FALSE.
Value
A character vector of HTML comments found on the page.
Examples
link <- "https://example.com"
comments_scrap(link)
Scrape and download CSV files from a Web Page
Description
Scrape and download CSV files from a Web Page
Usage
csv_scrap(link, path = getwd(), askRobot = FALSE)
Arguments
link: the link of the web page
path: the path where to save the CSV files. Defaults to the current directory.
askRobot: logical. Should the function check robots.txt before scraping? Defaults to FALSE.
Value
called for the side effect of downloading CSV files from a website
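Examples
## Not run:
# Illustrative sketch only: the URL below is a placeholder; point the
# function at any page that links to CSV files. Saving to tempdir()
# keeps the working directory clean.
csv_scrap(
  link = "https://example.com/open-data",
  path = tempdir()
)
## End(Not run)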
Scrape image URLs that don't have an 'alt' attribute
Description
Scrape image URLs that don't have an 'alt' attribute
Usage
images_noalt_scrap(link, askRobot = FALSE)
Arguments
link: the URL of the web page
askRobot: logical. Should the function check robots.txt before scraping? Defaults to FALSE.
Value
a character vector of the URLs of images lacking an "alt" attribute.
Examples
images_noalt_scrap(link = "https://www.r-consortium.org/")
Scrape image URLs
Description
Scrape image URLs
Usage
images_preview(link, askRobot = FALSE)
Arguments
link: the link of the web page
askRobot: logical. Should the function check robots.txt before scraping? Defaults to FALSE.
Value
the URLs of the images on the web page.
Examples
images_preview(link = "https://posit.co/")
Scrape Images from a Web Page
Description
Scrape Images from a Web Page
Usage
images_scrap(link, imgpath = getwd(), extn, askRobot = FALSE)
Arguments
link: the link of the web page
imgpath: the path where to save the images. Defaults to the current directory.
extn: the extension of the images to download: "png", "jpeg", etc.
askRobot: logical. Should the function check robots.txt before scraping? Defaults to FALSE.
Value
called for the side effect of downloading images
Examples
## Not run: 
images_scrap(link = "https://posit.co/", extn = "jpg")
## End(Not run)
Website text paragraph scraping
Description
This function is used to scrape text paragraphs from a website.
Usage
paragraphs_scrap(
  link,
  contain = NULL,
  case_sensitive = FALSE,
  collapse = FALSE,
  askRobot = FALSE
)
Arguments
link: the link of the web page to scrape
contain: filter the paragraphs according to the character string provided.
case_sensitive: logical. Should the contain argument be case sensitive? Defaults to FALSE.
collapse: logical. If TRUE, the paragraphs are collapsed into one element and the contain argument is ignored.
askRobot: logical. Should the function check robots.txt before scraping? Defaults to FALSE.
Value
a character vector.
Examples
# Extracting the paragraphs displayed on the health page of the New York Times
link     <- "https://www.nytimes.com/section/health"
paragraphs_scrap(link)
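# An additional illustrative sketch: keep only the paragraphs that mention
# a given word (the filter string "covid" is an arbitrary choice), or
# collapse all paragraphs into a single element
paragraphs_scrap(link, contain = "covid")
paragraphs_scrap(link, collapse = TRUE)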
Scrape and download PDF files from a Web Page
Description
Scrape and download PDF files from a Web Page
Usage
pdf_scrap(link, path = getwd(), askRobot = FALSE)
Arguments
link: the link of the web page
path: the path where to save the PDF files. Defaults to the current directory.
askRobot: logical. Should the function check robots.txt before scraping? Defaults to FALSE.
Value
called for the side effect of downloading PDF files from a website
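Examples
## Not run:
# Illustrative sketch only: the URL below is a placeholder; point the
# function at any page that links to PDF files
pdf_scrap(
  link = "https://example.com/reports",
  path = tempdir()
)
## End(Not run)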
Simple website scraping
Description
This function is used to scrape one element from a website.
Usage
scrap(link, node, clean = FALSE, askRobot = FALSE)
Arguments
link: the link of the web page to scrape
node: the HTML or CSS element to consider; the SelectorGadget tool is highly recommended for identifying it.
clean: logical. Should the function clean the extracted vector? Defaults to FALSE.
askRobot: logical. Should the function check robots.txt before scraping? Defaults to FALSE.
Value
a character vector.
Examples
# Extracting the IMDb Top 250 movie titles
link <- "https://www.imdb.com/chart/top/"
node <- "h3.ipc-title__text"
scrap(link, node)
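# An additional illustrative sketch: ask the function to clean the
# extracted vector before returning it
scrap(link, node, clean = TRUE)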
HTML table scraping
Description
This function is used to scrape an HTML table from a website.
Usage
table_scrap(link, choose = 1, header = TRUE, askRobot = FALSE)
Arguments
link: the link of the web page containing the table to scrape
choose: an integer indicating which table to scrape. Defaults to 1.
header: logical. Should the first row be treated as the table header? Defaults to TRUE.
askRobot: logical. Should the function check robots.txt before scraping? Defaults to FALSE.
Value
a data frame object.
Examples
# Extracting the Premier League 2019/2020 top scorers
link <- "https://www.topscorersfootball.com/premier-league"
table_scrap(link)
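# An additional illustrative sketch: on a page holding several tables,
# choose selects which one to extract (here, hypothetically, the second)
table_scrap(link, choose = 2)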
Website Tidy scraping
Description
This function is used to scrape a tibble from a website.
Usage
tidy_scrap(link, nodes, colnames, clean = FALSE, askRobot = FALSE)
Arguments
link: the link of the web page to scrape
nodes: the vector of HTML or CSS elements to consider; the SelectorGadget tool is highly recommended.
colnames: the names of the expected columns.
clean: logical. Should the function clean the extracted tibble? Defaults to FALSE.
askRobot: logical. Should the function check robots.txt before scraping? Defaults to FALSE.
Value
a tidy data frame.
Examples
# Extracting IMDb movie titles and ratings
link     <- "https://www.imdb.com/chart/top/"
my_nodes <- c("a > h3.ipc-title__text", "span.ratingGroup--imdb-rating")
names    <- c("title", "rating")
tidy_scrap(link, my_nodes, names)
Website title scraping
Description
This function is used to scrape titles (h1, h2, and h3 HTML tags) from a website. Useful for scraping the daily headlines of online newspapers.
Usage
titles_scrap(link, contain = NULL, case_sensitive = FALSE, askRobot = FALSE)
Arguments
link: the link of the web page to scrape
contain: filter the titles according to the character string provided.
case_sensitive: logical. Should the contain argument be case sensitive? Defaults to FALSE.
askRobot: logical. Should the function check robots.txt before scraping? Defaults to FALSE.
Value
a character vector.
Examples
# Extracting the current titles of the New York Times
link     <- "https://www.nytimes.com/"
titles_scrap(link)
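# An additional illustrative sketch: keep only the titles that contain a
# given word (the filter string "health" is an arbitrary choice)
titles_scrap(link, contain = "health")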
Website web links scraping
Description
This function is used to scrape web links from a website.
Usage
weblink_scrap(link, contain = NULL, case_sensitive = FALSE, askRobot = FALSE)
Arguments
link: the link of the web page to scrape
contain: filter the web links according to the character string provided.
case_sensitive: logical. Should the contain argument be case sensitive? Defaults to FALSE.
askRobot: logical. Should the function check robots.txt before scraping? Defaults to FALSE.
Value
a character vector.
Examples
# Extracting the web links within the World Bank research and publications page
link <- "https://www.worldbank.org/en/research"
weblink_scrap(link)
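# An additional illustrative sketch: keep only the links that contain a
# given substring (the filter string "publication" is an arbitrary choice)
weblink_scrap(link, contain = "publication")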
Scrape and download Excel xls files from a Web Page
Description
Scrape and download Excel xls files from a Web Page
Usage
xls_scrap(link, path = getwd(), askRobot = FALSE)
Arguments
link: the link of the web page
path: the path where to save the Excel xls files. Defaults to the current directory.
askRobot: logical. Should the function check robots.txt before scraping? Defaults to FALSE.
Value
called for the side effect of downloading Excel xls files from a website
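Examples
## Not run:
# Illustrative sketch only: the URL below is a placeholder; point the
# function at any page that links to Excel xls files
xls_scrap(
  link = "https://example.com/legacy-spreadsheets",
  path = tempdir()
)
## End(Not run)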
Scrape and download Excel xlsx files from a Web Page
Description
Scrape and download Excel xlsx files from a Web Page
Usage
xlsx_scrap(link, path = getwd(), askRobot = FALSE)
Arguments
link: the link of the web page
path: the path where to save the Excel xlsx files. Defaults to the current directory.
askRobot: logical. Should the function check robots.txt before scraping? Defaults to FALSE.
Value
called for the side effect of downloading Excel xlsx files from a website
Examples
## Not run: 
xlsx_scrap(
  link = "https://www.rieter.com/investor-relations/results-and-presentations/financial-statements"
)
## End(Not run)