--- title: "a - Getting started" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{a-getting-started} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r, include=FALSE} library(rix) ``` ## Non-technical intro to Nix *If you are familiar with the concept of package managers, you can skip to the next section.* To ensure that a project is reproducible you need to deal with at least four things: - Make sure that the required/correct version of R (or any other language) is installed; - Make sure that the required versions of packages are installed; - Make sure that system dependencies are installed (for example, you'd need a working Java installation to install the `{rJava}` R package on Linux); - Make sure that you can install all of this for the hardware you have on hand. For the three first bullet points, the consensus seems to be a mixture of Docker to deal with system dependencies, `{renv}` for the packages (or `{groundhog}`, or a fixed CRAN snapshot like those [Posit provides](https://packagemanager.posit.co/__docs__/user/get-repo-url/#ui-frozen-urls)) and the [R installation manager](https://github.com/r-lib/rig) to install the correct version of R (unless you use a Docker image as base that already ships the required version by default). As for the last bullet point, the only way out is to be able to compile the software for the target architecture. There's a lot of moving pieces, and knowledge that you need to have to get it right. But it turns out that this is not the only solution. Docker + `{renv}` (or some other way to deal with R packages) is likely the most popular way to ensure reproducibility of your projects, but there are other tools to achieve this. One such tool is called Nix. Nix is a package manager for Linux distributions, macOS and it even works on Windows if you enable WSL2. What's a package manager? If you're not a Linux user, you may not be aware. Let me explain it this way: in R, if you want to install a package to provide some functionality not included with a vanilla installation of R, you'd run this: ``` install.packages("dplyr") ``` It turns out that Linux distributions, like Ubuntu for example, work in a similar way, but for software that you'd usually install using an installer (at least on Windows). For example you could install Firefox on Ubuntu using: ``` sudo apt-get install firefox ``` (there's also graphical interfaces that make this process "more user-friendly"). In Linux jargon, `packages` are simply what we call software (or I guess it's all "apps" these days). These packages get downloaded from so-called repositories (think of CRAN, the repository of R packages) but for any type of software that you might need to make your computer work: web browsers, office suites, multimedia software and so on. So Nix is just another package manager that you can use to install software. But what interests us is not using Nix to install Firefox, but instead to install R and the R packages that we require for our analysis (or any other programming language that we need). But why use Nix instead of the usual ways to install software on our operating systems? The first thing that you should know is that Nix's repository, `nixpkgs`, is huge. Humongously huge. As I'm writing these lines, [there's more than 120'000 pieces of software available](https://search.nixos.org/packages), and the *entirety of CRAN and Bioconductor* is also available through `nixpkgs`. So instead of installing R as you usually do and then use `install.packages()` to install packages, you could use Nix to handle everything. But still, why use Nix at all? Nix has an interesting feature: using Nix, it is possible to install software in (relatively) isolated environments. So using Nix, you can install as many versions of R and R packages that you need. Suppose that you start working on a new project. As you start the project, with Nix, you would install a project-specific version of R and R packages that you would only use for that particular project. If you switch projects, you'd switch versions of R and R packages. If you are familiar with `{renv}`, you should see that this is exactly the same thing: the difference is that not only will you have a project-specific library of R packages, you will also have a project-specific R version. So if you start a project now, you'd have R version 4.2.3 installed (the latest version available in `nixpkgs` but not the latest version available, more on this later), with the accompagnying versions of R packages, for as long as the project lives (which can be a long time). If you start a project next year, then that project will have its own R, maybe R version 4.4.2 or something like that, and the set of required R packages that would be current at that time. This is because Nix always installs the software that you need in separate, (isolated) environments on your computer. So you can define an environment for one specific project. But Nix goes even further: not only can you install R and R packages using Nix (in isolated) project-specific environments, Nix even installs the required system dependencies. So for example if I need `{rJava}`, Nix will make sure to install the correct version of Java as well, always in that project-specific environment (so if you already some Java version installed on your system, there won't be any interference). What's also pretty awesome, is that you can use a specific version of `nixpkgs` to *always* get *exactly* the same versions of **all** the software whenever you build that environment to run your project's code. The environment gets defined in a simple plain-text file, and anyone using that file to build the environment will get exactly, byte by byte, the same environment as you when you initially started the project. And this also regardless of the operating system that is used. ## The Nix package manager Nix is a package manager that can be installed on your computer (regardless of OS) and can be used to install software like with any other package manager. If you're familiar with the Ubuntu Linux distribution, you likely have used `apt-get` to install software. On macOS, you may have used `homebrew` for similar purposes. Nix functions in a similar way, but has many advantages over classic package managers, as it focuses on reproducible builds and downloads packages from `nixpkgs`, [currently the largest software repository](https://repology.org/repositories/graphs). This means that using Nix, it is possible to install not only R, but also all the packages required for your project. The obvious question is why use Nix instead of simply installing R and R packages as usual. The answer is that Nix makes sure to install every dependency of any package, up to required system libraries. For example, the `{xlsx}` package requires the Java programming language to be installed on your computer to successfully install. This can be difficult to achieve, and `{xlsx}` bullied many R developers throughout the years (especially those using a Linux distribution, `sudo R CMD javareconf` still plagues my nightmares). But with Nix, it suffices to declare that we want the `{xlsx}` package for our project, and Nix figures out automatically that Java is required and installs and configures it. It all just happens without any required intervention from the user. The second advantage of Nix is that it is possible to *pin* a certain *revision* of the Nix packages' repository (called `nixpkgs`) for our project. Pinning a revision ensures that every package that Nix installs will always be at exactly the same versions, regardless of when in the future the packages get installed. ## rix workflow The idea of `{rix}` is for you to declare the environment you need using the provided `rix()` function. `rix()` is the package's main function and generates a file called `default.nix` which is then used by the Nix package manager to build that environment. Ideally, you would set up such an environment for each of your projects. You can then use this environment to either work interactively, or run R scripts. It is possible to have as many environments as projects, and software that is common to environments will simply be re-used and not get re-installed to save space. Environments are isolated for each other, but can still interact with your system's files, unlike with Docker where a volume must be mounted. Environments can also interact with the software installed on your computer through the usual means, which can sometimes lead to issues. For example, if you already have R installed, and a user library of R packages, more caution is required to properly use environments managed by Nix. It is important at this stage to understand that you should not call `install.packages()` from a running Nix environment. If you want to add packages to a Nix environment while analyzing data, you need to add it the `default.nix` expression and rebuild the environment. This is explained in greater detail in `vignette("d1-installing-r-packages-in-a-nix-environment")`. To avoid interference between your main R library of packages and your Nix environments, calling `rix()` will also run `rix_init()`, which will create a custom `.Rprofile` in the project's directory. This `.Rprofile` will ensure that if you have a user library of packages, these won't get loaded by an R version running in a Nix shell. It will also redefine `install.packages()` to throw an error if you try to use it. `rix()` has several arguments: - the R version you need for your project or a snapshot date; - a list of R packages that your project needs; - an optional list of additional software (for example a Python interpreter, or Quarto); - an optional list with packages to install from GitHub; - an optional list of local packages in `.tar.gz` format to install; - an optional list of LaTeX packages; - whether you want to use RStudio as an IDE for your project (or VS Code, or another environment); - the path to save the `default.nix` file (by default the current working directory) For example: ```{r parsermd-chunk-1, eval = FALSE} rix( r_ver = "latest-upstream", r_pkgs = c("dplyr", "chronicler"), ide = "other" ) ``` The call above writes a `default.nix` file in the current working directory. This `default.nix` can in turn be used by Nix to build an environment containing the latest version of R included in the upstream `nixpkgs`, with the `{dplyr}` and `{chronicler}` packages. Take note of the `ide = "other"` argument: this argument, and the values it can take, are further discussed in the vignette `vignette("e-interactive-use")` but continue reading this vignette and then vignettes numbered by a "d". You can instead provide a specific R version: ```{r, eval = FALSE} rix( r_ver = "4.4.1", r_pkgs = c("dplyr", "chronicler"), ide = "other" ) ``` or a date: ```{r, eval = FALSE} rix( date = "2024-12-14", r_pkgs = c("dplyr", "chronicler"), ide = "other" ) ``` Providing a specific R version or a date will not use the upstream/official `nixpkgs` package repository, but our `rstats-on-nix` fork. We decided to use a fork instead of the official `nixpkgs` for older releases, because it allows us to backport fixes and improve package compatibility, which is espcially important for Apple Silicon. It also allows us to provide newer releases of R more quickly than through the official channels. For example, as of writing (December 18th 2024), R version 4.4.2 is not yet included in upstream `nixpkgs` but is already available through our fork. Thanks to our fork, we are also able to snapshot CRAN more frequently, and thus include more CRAN and Bioconductor package versions than in the upstream `nixpkgs` repository. ### Converting from an renv.lock file `{rix}` also includes an `renv2nix()` function that converts an `renv.lock` file into a valid Nix expression. Read the vignette `vignette("f-renv2nix")` to learn more. ### Using default.nix files The Nix package manager can be used to build reproducible development environments according to the specifications found in the generated `default.nix` files, which contain a Nix *expression*. An *expression* is Nix jargon for a function with multiple inputs and one output, this output being our development environment. `{rix}` does not require Nix to be installed to generate valid expressions (but does require an internet connection), so you could generate expressions and use them on other machines. To actually build an environment using a `default.nix` file, go to where you chose to write it (ideally in a new, empty folder that will be the root folder of your project) and use the Nix package manager to build the environment. Call the following function in a terminal: ``` nix-build ``` Nix install packages in a dedicated folder on your computer, called the *Nix store*. Once Nix is done building the environment, you can start working on it interactively by using the following command in a terminal emulator (not the R console): ``` nix-shell ``` You will *drop* into a Nix shell which provides the installed software. It is not mandatory to call `nix-build` first: you can immediately call `nix-shell`. The advantage of using `nix-build` first is that it creates a file called `result` which will prevent the environment to get garbage collected if you clean the Nix store. If you want to build an environment for an older version of R, you might get a warning telling you that you cannot build the expression, but that you can directly drop into it. If you want to completely isolate your Nix environment from the rest of the system, we recommend using `nix-shell --pure` to drop into the environment, as described in the documentation of `rix_init()`. Finally, if you want to delete an environment, delete the `result` file first (if you used `nix-build`) and then call `nix-store --gc`, which will delete all the orphaned packages. Now that you know more about Nix and `{rix}`, it is time to get these tools installed on your system. - If you're running either Linux or Windows, read the Linux or Windows vignette: `vignette("b1-setting-up-and-using-rix-on-linux-and-windows")` - If you're running macOS, read the macOS vignette: `vignette("b2-setting-up-and-using-rix-on-macos")`