% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/s3_drake_settings.R
\name{new_drake_settings}
\alias{new_drake_settings}
\title{\code{drake_settings} constructor}
\usage{
new_drake_settings(
  cache_log_file = NULL,
  curl_handles = NULL,
  garbage_collection = NULL,
  jobs = NULL,
  jobs_preprocess = NULL,
  keep_going = NULL,
  lazy_load = NULL,
  lib_loc = NULL,
  lock_envir = NULL,
  lock_cache = NULL,
  log_build_times = NULL,
  log_progress = NULL,
  memory_strategy = NULL,
  parallelism = NULL,
  recover = NULL,
  recoverable = NULL,
  seed = NULL,
  session_info = NULL,
  skip_imports = NULL,
  skip_safety_checks = NULL,
  skip_targets = NULL,
  sleep = NULL,
  template = NULL,
  log_worker = NULL
)
}
\arguments{
\item{cache_log_file}{Name of the CSV cache log file to write.
If \code{TRUE}, the default file name is used (\code{drake_cache.csv}).
If \code{NULL}, no file is written.
If activated, this option writes a flat text file
to represent the state of the cache
(fingerprints of all the targets and imports).
If you put the log file under version control, your commit history
will give you an easy representation of how your results change
over time as the rest of your project changes. Hopefully,
this is a step in the right direction for data reproducibility.}

\item{curl_handles}{A named list of curl handles. Each value is an
object from \code{curl::new_handle()}, and each name is a URL
(and should start with "http", "https", or "ftp").
Example:
\preformatted{
list(
  `http://httpbin.org/basic-auth` = curl::new_handle(
    username = "user", password = "passwd"
  )
)
}
Then, if your plan has
\code{file_in("http://httpbin.org/basic-auth/user/passwd")},
\code{drake} will authenticate using the username and password of the handle
for \verb{http://httpbin.org/basic-auth/}.

\code{drake} uses partial matching on text to
find the right handle of the \code{file_in()} URL, so the name of the handle
could be the complete URL (\code{"http://httpbin.org/basic-auth/user/passwd"})
or a part of the URL (e.g. \code{"http://httpbin.org/"} or
\code{"http://httpbin.org/basic-auth/"}). If you have multiple handles
whose names match your URL, \code{drake} will choose the closest match.}

\item{garbage_collection}{Logical, whether to call \code{gc()} each time
a target is built during \code{\link[=make]{make()}}.}

\item{jobs}{Maximum number of parallel workers for processing the targets.
You can experiment with \code{\link[=predict_runtime]{predict_runtime()}}
to help decide on an appropriate number of jobs.
For details, visit
\verb{https://books.ropensci.org/drake/time.html}.}

\item{jobs_preprocess}{Number of parallel jobs for processing the imports
and doing other preprocessing tasks.}

\item{keep_going}{Logical, whether to keep running \code{\link[=make]{make()}}
when targets fail.}

\item{lazy_load}{An old feature, currently being questioned.
For the current recommendations on memory management, see
\verb{https://books.ropensci.org/drake/memory.html#memory-strategies}.
The \code{lazy_load} argument is either a character vector or a logical.
For dynamic targets, the behavior is always \code{"eager"} (see below).
So the \code{lazy_load} argument is for static targets only.
Choices for \code{lazy_load}:
\itemize{
\item \code{"eager"}: no lazy loading. The target is loaded right away
with \code{\link[=assign]{assign()}}.
\item \code{"promise"}: lazy loading with \code{\link[=delayedAssign]{delayedAssign()}}
\item \code{"bind"}: lazy loading with active bindings:
\code{bindr::populate_env()}.
\item \code{TRUE}: same as \code{"promise"}.
\item \code{FALSE}: same as \code{"eager"}.
}

If \code{lazy_load} is \code{"eager"},
drake prunes the execution environment before each target/stage,
removing all superfluous targets
and then loading any dependencies it will need for building.
In other words, drake prepares the environment in advance
and tries to be memory efficient.
If \code{lazy_load} is \code{"bind"} or \code{"promise"}, drake assigns
promises to load any dependencies at the last minute.
Lazy loading may be more memory efficient in some use cases, but
it may duplicate the loading of dependencies, costing time.}
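
The difference between \code{"eager"} and \code{"promise"} loading for
\code{lazy_load} can be sketched in base R. The \code{load_target()} function
and the \code{load_count} counter are hypothetical stand-ins for reading a
target from the cache; this is an illustration of the two base R mechanisms
named above, not \code{drake}'s internal code.
\preformatted{
# Hypothetical stand-in for reading a target from the cache.
load_count <- 0
load_target <- function() {
  load_count <<- load_count + 1
  "value"
}

envir <- new.env()

# "eager": the value is loaded right away with assign().
assign("x_eager", load_target(), envir = envir)
load_count # 1

# "promise": delayedAssign() defers the load until first use.
delayedAssign("x_promise", load_target(), assign.env = envir)
load_count # still 1, nothing has been loaded yet

envir$x_promise # first use forces the promise
load_count # 2
}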

\item{lib_loc}{Character vector, optional.
Same as in \code{library()} or \code{require()}.
Applies to the \code{packages} argument of \code{make()}.}

\item{lock_envir}{Logical, whether to lock \code{config$envir} during \code{make()}.
If \code{TRUE}, \code{make()} quits in error whenever a command in your
\code{drake} plan (or \code{prework}) tries to add, remove, or modify
non-hidden variables in your environment/workspace/R session.
This is extremely important for ensuring the purity of your functions
and the reproducibility/credibility/trust you can place in your project.
\code{lock_envir} will be set to a default of \code{TRUE} in \code{drake} version
7.0.0 and higher. Namespaces are never locked, e.g.
if \code{envir} is \code{getNamespace("packagename")}.}
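
As a rough illustration of what \code{lock_envir = TRUE} protects against,
the sketch below locks a plain environment with base R's
\code{lockEnvironment()} and shows that adding a variable then fails.
It assumes the locking behaves like \code{lockEnvironment()} with
\code{bindings = TRUE}; the environment and variable names are made up
for the example.
\preformatted{
# Illustration only: a locked environment rejects new variables.
envir <- new.env()
assign("x", 1, envir = envir)
lockEnvironment(envir, bindings = TRUE)

result <- tryCatch(
  {
    assign("y", 2, envir = envir) # adding a variable is an error
    "modified"
  },
  error = function(e) "blocked"
)
result # "blocked"
}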

\item{lock_cache}{Logical, whether to lock the cache before running \code{make()}
etc. It is usually recommended to keep cache locking on.
However, if you interrupt \code{make()} before it can clean itself up,
then the cache will stay locked,
and you will need to manually unlock it with
\code{drake::drake_cache("xyz")$unlock()}. Repeatedly unlocking the cache
by hand is annoying, and \code{lock_cache = FALSE} prevents the cache
from locking in the first place.}

\item{log_build_times}{Logical, whether to record build times for targets.
Mac users may notice a 20\% speedup in \code{make()}
with \code{log_build_times = FALSE}.}

\item{log_progress}{Logical, whether to log the progress
of individual targets as they are being built. Progress logging
creates extra files in the cache (usually the \verb{.drake/} folder)
and slows down \code{make()} a little.
If you need to reduce or limit the number of files in the cache,
call \code{make(log_progress = FALSE, recover = FALSE)}.}

\item{memory_strategy}{Character scalar, name of the
strategy \code{drake} uses to load/unload a target's dependencies in memory.
You can give each target its own memory strategy,
(e.g. \code{drake_plan(x = 1, y = target(f(x), memory_strategy = "lookahead"))})
to override the global memory strategy. Choices:
\itemize{
\item \code{"speed"}: Once a target is newly built or loaded in memory,
just keep it there.
This choice maximizes speed and hogs memory.
\item \code{"autoclean"}: Just before building each new target,
unload everything from memory except the target's direct dependencies.
After a target is built, discard it from memory.
(Set \code{garbage_collection = TRUE} to make sure it is really gone.)
This option conserves memory, but it sacrifices speed because
each new target needs to reload
any previously unloaded targets from storage.
\item \code{"preclean"}: Just before building each new target,
unload everything from memory except the target's direct dependencies.
After a target is built, keep it in memory until \code{drake} determines
it can be unloaded.
This option conserves memory, but it sacrifices speed because
each new target needs to reload
any previously unloaded targets from storage.
\item \code{"lookahead"}: Just before building each new target,
search the dependency graph to find targets that will not be
needed for the rest of the current \code{make()} session.
After a target is built, keep it in memory until the next
memory management stage.
In this mode, targets are only in memory if they need to be loaded,
and we avoid superfluous reads from the cache.
However, searching the graph takes time,
and it could even double the computational overhead for large projects.
\item \code{"unload"}: Just before building each new target,
unload all targets from memory.
After a target is built, \strong{do not} keep it in memory.
This mode aggressively optimizes for both memory and speed,
but in commands and triggers,
you have to manually load any dependencies you need using \code{readd()}.
\item \code{"none"}: Do not manage memory at all.
Do not load or unload anything before building targets.
After a target is built, \strong{do not} keep it in memory.
This mode aggressively optimizes for both memory and speed,
but in commands and triggers,
you have to manually load any dependencies you need using \code{readd()}.
}

For even more direct
control over which targets \code{drake} keeps in memory, see the
help file examples of \code{\link[=drake_envir]{drake_envir()}}.
Also see the \code{garbage_collection} argument of \code{make()} and
\code{drake_config()}.}

\item{parallelism}{Character scalar, type of parallelism to use.
For detailed explanations, see
\verb{https://books.ropensci.org/drake/hpc.html}.

You could also supply your own scheduler function
if you want to experiment or aggressively optimize.
The function should take a single \code{config} argument
(produced by \code{\link[=drake_config]{drake_config()}}). Existing examples
from \code{drake}'s internals are the \verb{backend_*()} functions:
\itemize{
\item \code{backend_loop()}
\item \code{backend_clustermq()}
\item \code{backend_future()}
}

However, this functionality is really a back door
and should not be used for production purposes unless you really
know what you are doing and you are willing to suffer setbacks
whenever \code{drake}'s unexported core functions are updated.}

\item{recover}{Logical, whether to activate automated data recovery.
The default is \code{FALSE} because
\enumerate{
\item Automated data recovery is still experimental.
\item It has reproducibility issues.
Targets recovered from the distant past may have been generated
with earlier versions of R and earlier package environments
that no longer exist.
\item It is not always possible, especially when dynamic files
are combined with dynamic branching
(e.g. \code{dynamic = map(stuff)} and \code{format = "file"} etc.)
since behavior is harder to predict in advance.
}

How it works: if \code{recover} is \code{TRUE},
\code{drake} tries to salvage old target values from the cache
instead of running commands from the plan.
A target is recoverable if
\enumerate{
\item There is an old value somewhere in the cache that
shares the command, dependencies, etc.
of the target about to be built.
\item The old value was generated with \code{make(recoverable = TRUE)}.
}

If both conditions are met, \code{drake} will
\enumerate{
\item Assign the most recently-generated admissible data to the target, and
\item skip the target's command.
}

Functions \code{\link[=recoverable]{recoverable()}} and \code{\link[=r_recoverable]{r_recoverable()}} show the most upstream
outdated targets that will be recovered in this way in the next
\code{\link[=make]{make()}} or \code{\link[=r_make]{r_make()}}.}

\item{recoverable}{Logical, whether to make target values recoverable
with \code{make(recover = TRUE)}.
This requires writing extra files to the cache,
and it prevents old metadata from being removed with garbage collection
(\code{clean(garbage_collection = TRUE)}, \code{gc()} in \code{storr}s).
If you need to limit the cache size or the number of files in the cache,
consider \code{make(recoverable = FALSE, log_progress = FALSE)}.
Recovery is not always possible, especially when dynamic files
are combined with dynamic branching
(e.g. \code{dynamic = map(stuff)} and \code{format = "file"} etc.)
since behavior is harder to predict in advance.}

\item{seed}{Integer, the root pseudo-random number generator
seed to use for your project.
In \code{\link[=make]{make()}}, \code{drake} generates a unique
local seed for each target using the global seed
and the target name. That way, different pseudo-random numbers
are generated for different targets, and this pseudo-randomness
is reproducible.

To ensure reproducibility across different R sessions,
\code{set.seed()} and \code{.Random.seed} are ignored and have no effect on
\code{drake} workflows. Conversely, \code{make()} does not usually
change \code{.Random.seed},
even when pseudo-random numbers are generated.
The exception to this last point is
\code{make(parallelism = "clustermq")}
because the \code{clustermq} package needs to generate random numbers
to set up ports and sockets for ZeroMQ.

On the first call to \code{make()} or \code{drake_config()}, \code{drake}
uses the random number generator seed from the \code{seed} argument.
Here, if the \code{seed} is \code{NULL} (default), \code{drake} uses a \code{seed} of \code{0}.
On subsequent \code{make()}s for existing projects, the project's
cached seed will be used in order to ensure reproducibility.
Thus, the \code{seed} argument must either be \code{NULL} or the same
seed from the project's cache (usually the \verb{.drake/} folder).
To reset the random number generator seed for a project,
use \code{clean(destroy = TRUE)}.}
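
The per-target seeding described for \code{seed} can be sketched as follows.
The \code{target_seed()} and \code{draw()} helpers are hypothetical and use a
toy character-code sum, not \code{drake}'s actual hashing. Unlike
\code{make()}, the sketch makes no attempt to leave \code{.Random.seed}
untouched.
\preformatted{
# Hypothetical helper: combine a root seed and a target name
# into a local seed. NOT drake's real scheme, only the idea.
target_seed <- function(root_seed, target) {
  codes <- utf8ToInt(target)
  root_seed + sum(codes * seq_along(codes))
}

# Hypothetical helper: draw one number under the target's local seed.
draw <- function(root_seed, target) {
  set.seed(target_seed(root_seed, target))
  runif(1)
}

a1 <- draw(0, "a")
a2 <- draw(0, "a")
b1 <- draw(0, "b")

identical(a1, a2) # TRUE: same root seed and target name, same draw
}
Because the local seed depends on the target name, targets \code{"a"} and
\code{"b"} get different seeds even though they share the root seed.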

\item{session_info}{Logical, whether to save the \code{sessionInfo()}
to the cache. Defaults to \code{TRUE}.
This behavior is recommended for serious \code{\link[=make]{make()}}s
for the sake of reproducibility. This argument only exists to
speed up tests. Apparently, \code{sessionInfo()} is a bottleneck
for small \code{\link[=make]{make()}}s.}

\item{skip_imports}{Logical, whether to totally neglect to
process the imports and jump straight to the targets. This can be useful
if your imports are massive and you just want to test your project,
but it is bad practice for reproducible data analysis.
This argument is overridden if you supply your own \code{graph} argument.}

\item{skip_safety_checks}{Logical, whether to skip the safety checks
on your workflow. Use at your own peril.}

\item{skip_targets}{Logical, whether to skip building the targets
in \code{plan} and just import objects and files.}

\item{sleep}{Optional function on a single numeric argument \code{i}.
Default: \code{function(i) 0.01}.

To conserve memory, \code{drake} assigns a brand new closure to
\code{sleep}, so your custom function should not depend on in-memory data
except from loaded packages.

For parallel processing, \code{drake} uses
a central main process to check what the parallel
workers are doing, and for the affected high-performance
computing workflows, wait for data to arrive over a network.
In between loop iterations, the main process sleeps to avoid throttling.
The \code{sleep} argument to \code{make()} and \code{drake_config()}
allows you to customize how much time the main process spends
sleeping.

The \code{sleep} argument is a function that takes an argument
\code{i} and returns a numeric scalar, the number of seconds to
supply to \code{Sys.sleep()} after iteration \code{i} of checking.
(Here, \code{i} starts at 1.)
If the checking loop does something other than sleeping
on iteration \code{i}, then \code{i} is reset back to 1.

To sleep for the same amount of time between checks,
you might supply something like \code{function(i) 0.01}.
But to avoid consuming too many resources during heavier
and longer workflows, you might use an exponential
back-off: say,
\code{function(i) { 0.1 + 120 * pexp(i - 1, rate = 0.01) }}.}
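
To compare the two \code{sleep} choices mentioned above, the sketch below
evaluates both functions over a few iteration counts. The names
\code{constant_sleep}, \code{backoff_sleep}, and \code{iterations} are
chosen here for illustration only.
\preformatted{
# Constant sleep: the same delay after every check.
constant_sleep <- function(i) 0.01

# Exponential back-off from the text: starts at 0.1 seconds for i = 1
# and levels off just below 120.1 seconds as i grows.
backoff_sleep <- function(i) {
  0.1 + 120 * pexp(i - 1, rate = 0.01)
}

iterations <- c(1, 10, 100, 1000)
data.frame(
  i = iterations,
  constant = constant_sleep(iterations),
  backoff = round(backoff_sleep(iterations), 2)
)
}
Since \code{i} is reset to 1 whenever the checking loop does real work,
the back-off only grows during stretches where the main process is idle.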

\item{template}{A named list of values to fill in the \code{{{ ... }}}
placeholders in template files (e.g. from \code{\link[=drake_hpc_template_file]{drake_hpc_template_file()}}).
Same as the \code{template} argument of \code{clustermq::Q()} and
\code{clustermq::workers()}.
Enabled for \code{clustermq} only (\code{make(parallelism = "clustermq")}),
not \code{future} or \code{batchtools} so far.
For more information, see the \code{clustermq} package:
\verb{https://github.com/mschubert/clustermq}.
Some template placeholders such as \code{{{ job_name }}} and \code{{{ n_jobs }}}
cannot be set this way.}

\item{log_worker}{Logical, same as the \code{log_worker} argument of
\code{clustermq::workers()} and \code{clustermq::Q()}. Only relevant
if \code{parallelism} is \code{"clustermq"}.}
}
\value{
A \code{drake_settings} object.
}
\description{
List of class \code{drake_settings}.
}
\examples{
if (FALSE) { # stronger than roxygen dontrun
new_drake_settings()
}
}
\keyword{internal}
