% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/deduplicate.R
\name{orderly_deduplicate}
\alias{orderly_deduplicate}
\title{Deduplicate an orderly archive}
\usage{
orderly_deduplicate(root = NULL, locate = TRUE, dry_run = TRUE, quiet = FALSE)
}
\arguments{
\item{root}{The path to an orderly root directory, or \code{NULL}
(the default) to search for one from the current working
directory if \code{locate} is \code{TRUE}.}

\item{locate}{Logical, indicating if the configuration should be
searched for.  If \code{TRUE} and \code{root} is not given,
then orderly looks in the working directory and up through its
parents until it finds an \code{orderly_config.yml} file.}

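\item{dry_run}{Logical, indicating if the deduplication should be
planned but not run.  The default is \code{TRUE}, so the archive is
only altered if this is explicitly set to \code{FALSE}.}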

\item{quiet}{Logical, indicating if printing of the status should be
suppressed.}
}
\value{
Invisibly, information about the duplication status of the
archive before deduplication was run.
}
\description{
Deduplicate an orderly archive.  Deduplicating an orderly archive
will replace all files that have the same content with "hard
links".  This requires hard link support in the underlying
operating system, which is available on all unix-like systems
(e.g. macOS and Linux) and on Windows since Vista.  However, on
Windows systems this might require elevated privileges.
If you use this feature, it is \emph{very important} that you
treat your orderly archive as read-only (though you should be
anyway), as changing one copy of a linked file changes all the
other instances of it: the files are literally the same file.
}
\details{
This function will alter your orderly archive.  Ordinarily this is
not something that should be done, so we try to be careful.  For
this to be safe, it is \emph{very important} to treat your
orderly archive as read-only.  If your canonical orderly
archive is behind OrderlyWeb this will almost certainly be the
case already.

With "hard linking", two files with the same content can be
updated so that both files point at the same physical bit of data.
This is useful because, if the file is large, only one copy needs
to be stored.  However, it also means that a change made to one
copy of the file is immediately reflected in the other, and
there is nothing to indicate that the files are linked!

This approach is worth exploring if you have large files that are
outputs of one report and inputs to another, or large inputs
repeatedly used in different reports, or outputs that end up being
the same in multiple reports.  If you run the deduplication with
\code{dry_run = TRUE}, an indication of the savings will be
printed.
}
\examples{

path <- orderly::orderly_example("demo")
id1 <- orderly::orderly_run("minimal", root = path)
id2 <- orderly::orderly_run("minimal", root = path)
orderly::orderly_commit(id1, root = path)
orderly::orderly_commit(id2, root = path)
tryCatch(
  orderly::orderly_deduplicate(path, dry_run = TRUE),
  error = function(e) NULL)
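
# Illustrative sketch only, not part of the orderly implementation: base R
# file.link() is used here to show why a deduplicated archive must be
# treated as read-only.  The names tmp, a and b are hypothetical and exist
# only for this demonstration.
tmp <- tempfile()
dir.create(tmp)
a <- file.path(tmp, "a.txt")
b <- file.path(tmp, "b.txt")
writeLines("original", a)
# file.link() returns FALSE (with a warning) if a hard link cannot be made
if (isTRUE(suppressWarnings(file.link(a, b)))) {
  # Write through one name; the other name should now show the new content
  writeLines("modified", a)
  print(readLines(b))
}
unlink(tmp, recursive = TRUE)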
}
