The text package gives R users access to HuggingFace Transformers through the R package reticulate, which serves as an interface to Python, and the Python packages torch and transformers.
You therefore need to install both the text package and a Python environment containing the Python packages that text requires.
The recommended way is to use textrpp_install() to install a conda environment with the required Python packages, and textrpp_initialize() to initialize it.
library(text)
library(reticulate)
# Install the Python packages that text requires in a conda environment (with defaults).
text::textrpp_install()
# Show available conda environments.
reticulate::conda_list()
# Initialize the installed conda environment.
# save_profile = TRUE saves the settings so that you don't have to run textrpp_initialize() after restarting R.
text::textrpp_initialize(save_profile = TRUE)
# Test that the text package works.
textEmbed("hello")
Recently, some text users (mainly on Mac) have experienced OMP errors that cause RStudio and R to crash. When this happens, the following workarounds have helped so far:
Sys.setenv(OMP_NUM_THREADS = "1") # Limit the number of threads to prevent conflicts.
Sys.setenv(OMP_MAX_ACTIVE_LEVELS = "1")
# You might also have to restart R.
.rs.restartR()
# If the above does not work, you can also try this, although this solution might have some risks associated with it (for more information, see https://github.com/dmlc/xgboost/issues/1715).
Sys.setenv(KMP_DUPLICATE_LIB_OK = "TRUE") # Temporarily allows execution despite duplicate OpenMP libraries.
# This is how you can unset the settings.
Sys.unsetenv("OMP_NUM_THREADS")
Sys.unsetenv("OMP_MAX_ACTIVE_LEVELS")
Sys.unsetenv("KMP_DUPLICATE_LIB_OK")
# This is how you can verify the settings
print(Sys.getenv("DYLD_LIBRARY_PATH"))
# Please let us know if you find any other solutions.
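
The workaround above changes environment variables for the whole R session. As a minimal sketch in base R, the previous values can be recorded before applying the workaround so that they can be restored afterwards. The helper names `set_omp_workaround()` and `restore_env()` are hypothetical and are not part of the text package.

```r
# Hypothetical helpers (not part of the text package).
# Apply the OpenMP workaround and return the previous values of the variables.
set_omp_workaround <- function(allow_duplicate_lib = FALSE) {
  vars <- c("OMP_NUM_THREADS", "OMP_MAX_ACTIVE_LEVELS", "KMP_DUPLICATE_LIB_OK")
  # unset = NA marks variables that were not set before the call.
  previous <- Sys.getenv(vars, unset = NA, names = TRUE)
  Sys.setenv(OMP_NUM_THREADS = "1", OMP_MAX_ACTIVE_LEVELS = "1")
  if (allow_duplicate_lib) {
    # Riskier option; see the xgboost issue linked above.
    Sys.setenv(KMP_DUPLICATE_LIB_OK = "TRUE")
  }
  invisible(previous)
}

# Put every variable back to the state recorded by set_omp_workaround().
restore_env <- function(previous) {
  for (name in names(previous)) {
    if (is.na(previous[[name]])) {
      Sys.unsetenv(name)
    } else {
      do.call(Sys.setenv, as.list(previous[name]))
    }
  }
  invisible(NULL)
}

previous <- set_omp_workaround()
# Inspect the values that are now active.
print(Sys.getenv(c("OMP_NUM_THREADS", "OMP_MAX_ACTIVE_LEVELS")))
restore_env(previous)
```

Restoring recorded values, rather than unsetting unconditionally, avoids discarding a setting that was already configured before the workaround was applied.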
If running textrpp_install() results in this error:
Failed to build tokenizers
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects
In the terminal run:
The success of the installation depends on using conda, Python, and package versions that work together. The installation of the text package with its required Python packages is tested on Linux, Mac OS, and Windows using GitHub Actions. The installation procedure and details can be seen at GitHub Actions (look at the workflow runs called System specific installation NoPy).
The table below shows combinations of Python and package versions that have been tested, most of which worked (it is not an exhaustive list).
| OS | Miniconda | Python | torch | transformers | Success |
|---|---|---|---|---|---|
| Mac OS | - | 3.9.0 | torch==1.11.0 | transformers==4.19.2 | Pass |
| Linux | - | 3.9.0 | torch==1.11.0 | transformers==4.19.2 | Pass |
| Windows | - | 3.9.0 | torch==1.11.0 | transformers==4.19.2 | Pass |
| Windows | 4.10.1 | 3.9.0 | torch==1.7.1 | transformers==4.12.5 | FAIL |
| Mac OS | 4.10.3 | 3.9.0 | torch==1.7.1 | transformers==4.12.5 | Pass |
| Linux | 4.10.3 | 3.9.0 | torch==1.7.1 | transformers==4.12.5 | Pass |
| Windows | 4.10.3 | 3.9.0 | torch==1.7.1 | transformers==4.12.5 | Pass |
| Mac OS | 4.10.3 | 3.8.10 | torch==1.7.1 | transformers==4.12.5 | Pass |
| Linux | 4.10.3 | 3.8.10 | torch==1.7.1 | transformers==4.12.5 | Pass |
| Windows | 4.10.3 | 3.8.10 | torch==1.7.1 | transformers==4.12.5 | Pass |
| Mac OS | 4.10.3 | 3.7.0 | torch==0.4.1 | transformers==3.3.1 | Pass |
| Linux | 4.10.3 | 3.7.0 | torch==0.4.1 | transformers==3.3.1 | Pass |
| Windows | 4.10.3 | 3.6.13 | torch==1.10 | transformers==3.3.1 | Pass |
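
The torch and transformers columns use pip-style version pins. As a small sketch, a row from the table can be turned into the kind of character vector that the rpp_version argument of textrpp_install_virtualenv() takes later in this document. The helper name `build_rpp_version()` is hypothetical, and whether textrpp_install() accepts the same vector is an assumption to check against the package documentation.

```r
# Hypothetical helper (not part of the text package): build pip-style
# requirement strings from pinned torch and transformers versions.
build_rpp_version <- function(torch, transformers, extra = c("numpy", "nltk")) {
  c(
    paste0("torch==", torch),
    paste0("transformers==", transformers),
    extra
  )
}

# Versions taken from the first rows of the table above.
rpp_version <- build_rpp_version(torch = "1.11.0", transformers = "4.19.2")
print(rpp_version)
```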
It is also possible to use virtual environments (although it is currently only tested on MacOS).
# Create a virtual environment with text required python packages.
# Note that you have to provide a python path.
text::textrpp_install_virtualenv(
  rpp_version = c("torch==1.7.1", "transformers==4.12.5", "numpy", "nltk"),
  python_path = "/usr/local/bin/python3.9",
  envname = "textrpp_virtualenv"
)
# Initialize the virtual environment.
text::textrpp_initialize(
  virtualenv = "textrpp_virtualenv",
  condaenv = NULL,
  save_profile = TRUE
)
Virtual environments work on MacOS, whereas the GitHub Actions runs do not currently work for Linux and Windows. For more information, look at the GitHub Actions workflow run called Virtual environment.
| OS | Python version | torch | transformers | Success |
|---|---|---|---|---|
| Mac OS | 3.9.8 | torch==1.11.0 | transformers==4.19.2 | Pass |
| Linux | 3.9.8 | torch==1.11.0 | transformers==4.19.2 | Pass |
| Mac OS | 3.9.8 | torch==1.7.1 | transformers==4.12.5 | Pass |
| Linux | - | - | - | - |
| Windows | - | - | - | - |
Below are the instructions for installing earlier versions of text (0.9.10 and before); these should also work for newer versions of text as long as correct versions of Python and the required packages are used.
library(text)
# To install the python packages torch, transformers, numpy and nltk through R, run:
library(reticulate)
install_miniconda()
conda_install(envname = 'r-reticulate', c('torch==0.4.1', 'transformers==3.3.1', 'numpy', 'nltk'), pip = TRUE)
# Windows 10
conda_install(envname = 'r-reticulate', c('torch==0.4.1', 'transformers==3.3.1', 'numpy', 'nltk'))
If something isn’t working right, a good start is to examine what is installed and running on your system, for example to make sure that your R and Python versions are up to date.
# First, check the R version and which packages are attached and loaded.
sessionInfo()
# Second, check the Python version and make sure that you have at least version 3.6.10.
library(reticulate)
py_config()
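
The minimum-version check mentioned above can be sketched in base R. The helper name `python_version_ok()` is hypothetical; the version string passed to it is assumed to be copied from the output of py_config().

```r
# Hypothetical helper (not part of text or reticulate): TRUE when `version`
# is at least `minimum`. numeric_version() compares component by component,
# so "3.6.9" is correctly treated as older than "3.6.10".
python_version_ok <- function(version, minimum = "3.6.10") {
  numeric_version(version) >= numeric_version(minimum)
}

python_version_ok("3.9.0") # TRUE
python_version_ok("3.6.9") # FALSE
```

Comparing the versions as plain strings would get the second case wrong, because "3.6.9" sorts after "3.6.10" alphabetically.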
After a new install or update of text, RStudio can crash (Abort session) when running functions that fetch word embeddings (i.e., textEmbedLayersOutput or textEmbed).
To solve the issue, re-install reticulate (development version), then uninstall and re-install r-miniconda.
Uninstall r-miniconda by removing its entire folder, which by default on Mac is at Users/YOUR_USER_NAME/Library/r-miniconda.
Note that on Mac the Library folder is hidden. To make it visible, go to Finder, open the path Users/YOUR_USER_NAME/, and press the three keys COMMAND + SHIFT + . (period). The Library folder should then appear, and you can find and remove r-miniconda.
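
As a minimal sketch, the default Mac location described above can also be checked from R without making the hidden folder visible in Finder. The helper name `find_r_miniconda()` is hypothetical, and it only looks in the default location; it does not delete anything.

```r
# Hypothetical helper (not part of text or reticulate): return the path of the
# r-miniconda folder under <home>/Library if it exists, otherwise NA.
find_r_miniconda <- function(home = path.expand("~")) {
  candidate <- file.path(home, "Library", "r-miniconda")
  if (dir.exists(candidate)) candidate else NA_character_
}

miniconda_dir <- find_r_miniconda()
if (is.na(miniconda_dir)) {
  message("No r-miniconda folder found in the default Mac location.")
} else {
  message("r-miniconda folder found at: ", miniconda_dir)
}
```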
library(text)
# To re-install packages start with a fresh session by restarting R and RStudio
# Install the development version of reticulate (might not be necessary)
devtools::install_github("rstudio/reticulate")
# After having manually removed the r-miniconda folder, install it again:
library(reticulate)
install_miniconda()
# Subsequently re-install torch, transformers, numpy and nltk by running:
conda_install(envname = 'r-reticulate', c('torch==0.4.1', 'transformers==3.3.1', 'numpy', 'nltk'), pip = TRUE)
The exact way to install these packages may differ across systems.
Please see the installation instructions for Python, torch, and transformers.