% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/config.R
\name{drake_config}
\alias{drake_config}
\title{Create the runtime parameter list
used internally in \code{\link[=make]{make()}}.}
\usage{
drake_config(plan = drake::read_drake_plan(), targets = NULL,
  envir = parent.frame(), verbose = drake::default_verbose(),
  hook = NULL, cache = drake::get_cache(verbose = verbose,
  console_log_file = console_log_file), fetch_cache = NULL,
  parallelism = drake::default_parallelism(), jobs = 1,
  packages = rev(.packages()), prework = character(0),
  prepend = character(0), command = drake::default_Makefile_command(),
  args = drake::default_Makefile_args(jobs = jobs, verbose = verbose),
  recipe_command = drake::default_recipe_command(), timeout = NULL,
  cpu = Inf, elapsed = Inf, retries = 0, force = FALSE,
  log_progress = FALSE, graph = NULL, trigger = drake::trigger(),
  skip_targets = FALSE, skip_imports = FALSE,
  skip_safety_checks = FALSE, lazy_load = "eager",
  session_info = TRUE, cache_log_file = NULL, seed = NULL,
  caching = c("master", "worker"), keep_going = FALSE,
  session = NULL, imports_only = NULL, pruning_strategy = NULL,
  makefile_path = "Makefile", console_log_file = NULL,
  ensure_workers = TRUE, garbage_collection = FALSE,
  template = list(), sleep = function(i) 0.01,
  hasty_build = drake::default_hasty_build,
  memory_strategy = c("speed", "memory", "lookahead"), layout = NULL)
}
\arguments{
\item{plan}{workflow plan data frame.
A workflow plan data frame is a data frame
with a \code{target} column and a \code{command} column.
(See the details in the \code{\link[=drake_plan]{drake_plan()}} help file
for descriptions of the optional columns.)
Targets are the objects and files that drake generates,
and commands are the pieces of R code that produce them.
Use the function \code{\link[=drake_plan]{drake_plan()}} to generate workflow plan
data frames easily, and see functions \code{\link[=plan_analyses]{plan_analyses()}},
\code{\link[=plan_summaries]{plan_summaries()}}, \code{\link[=evaluate_plan]{evaluate_plan()}},
\code{\link[=expand_plan]{expand_plan()}}, and \code{\link[=gather_plan]{gather_plan()}} for
easy ways to generate large workflow plan data frames.}

\item{targets}{character vector, names of targets to build.
Dependencies are built too. Together, the \code{plan} and
\code{targets} comprise the workflow network
(i.e. the \code{graph} argument).
Changing either will change the network.}

\item{envir}{environment to use. Defaults to the current
workspace, so you should not need to worry about this
most of the time. A deep copy of \code{envir} is made,
so you don't need to worry about your workspace being modified
by \code{\link[=make]{make()}}. The deep copy inherits from the global environment.
Wherever necessary, objects and functions are imported
from \code{envir} and the global environment and
then reproducibly tracked as dependencies.}

\item{verbose}{logical or numeric, control printing to the console.
Use \code{pkgconfig} to set the default value of \code{verbose}
for your R session:
for example, \code{pkgconfig::set_config("drake::verbose" = 2)}.
\itemize{
\item \code{0} or \code{FALSE}: print nothing.
\item \code{1} or \code{TRUE}: print only targets to build.
\item \code{2}: also print checks and cache info.
\item \code{3}: also print any potentially missing items.
\item \code{4}: also print imports and writes to the cache.
}}

\item{hook}{Deprecated.}

\item{cache}{drake cache as created by \code{\link[=new_cache]{new_cache()}}.
See also \code{\link[=get_cache]{get_cache()}} and \code{\link[=this_cache]{this_cache()}}.}

\item{fetch_cache}{character vector containing lines of code.
The purpose of this code is to fetch the \code{storr} cache
with a command like \code{\link[=storr_rds]{storr_rds()}} or \code{\link[=storr_dbi]{storr_dbi()}},
but customized. This feature is experimental. It is
necessary if you are using both custom non-RDS caches
and distributed parallelism (\code{parallelism = "future_lapply"}
or \code{"Makefile"}) because the distributed R sessions
need to know how to load the cache.}

\item{parallelism}{character, type of parallelism to use.
To list the options, call \code{\link[=parallelism_choices]{parallelism_choices()}}.
For detailed explanations, see the
\href{https://ropenscilabs.github.io/drake-manual/hpc.html}{high-performance computing chapter}
of the user manual.}

\item{jobs}{maximum number of parallel workers for processing the targets.
If you wish to parallelize the imports and preprocessing as well, you can
use a named numeric vector of length 2, e.g.
\code{make(jobs = c(imports = 4, targets = 8))}.
\code{make(jobs = 4)} is equivalent to \code{make(jobs = c(imports = 1, targets = 4))}.

Windows users should not set \code{jobs > 1} if
\code{parallelism} is \code{"mclapply"} because
\code{\link[=mclapply]{mclapply()}} is based on forking. Windows users
who use \code{parallelism = "Makefile"} will need to
download and install Rtools.

You can experiment with \code{\link[=predict_runtime]{predict_runtime()}}
to help decide on an appropriate number of jobs.
For details, visit
\url{https://ropenscilabs.github.io/drake-manual/time.html}.}

\item{packages}{character vector of packages to load, in the order
they should be loaded. Defaults to \code{rev(.packages())}, so you
should not usually need to set this manually. Just call
\code{\link[=library]{library()}} to load your packages before \code{make()}.
However, sometimes packages need to be strictly forced to load
in a certain order, especially if \code{parallelism} is
\code{"Makefile"}. To do this, do not use \code{\link[=library]{library()}}
or \code{\link[=require]{require()}} or \code{\link[=loadNamespace]{loadNamespace()}} or
\code{\link[=attachNamespace]{attachNamespace()}} to load any libraries beforehand.
Just list your packages in the \code{packages} argument in the order
you want them to be loaded.
If \code{parallelism} is \code{"mclapply"},
the necessary packages
are loaded once before any targets are built. If \code{parallelism} is
\code{"Makefile"}, the necessary packages are loaded once on
initialization and then once again for each target right
before that target is built.}

\item{prework}{character vector of lines of code to run
before build time. This code can be used to
load packages, set options, etc., although the packages in the
\code{packages} argument are loaded before any prework is done.
If \code{parallelism} is \code{"mclapply"}, the \code{prework}
is run once before any targets are built. If \code{parallelism} is
\code{"Makefile"}, the prework is run once on initialization
and then once again for each target right before that target is built.}

\item{prepend}{lines to prepend to the Makefile if \code{parallelism}
is \code{"Makefile"}. See the \href{https://ropenscilabs.github.io/drake-manual/store.html}{high-performance computing guide}
to learn how to use \code{prepend}
to distribute work on a cluster.}

\item{command}{character scalar, command to call the Makefile
generated for distributed computing.
Only applies when \code{parallelism} is \code{"Makefile"}.
Defaults to the usual \code{"make"}
(\code{\link[=default_Makefile_command]{default_Makefile_command()}}),
but it could also be
\code{"lsmake"} on supporting systems, for example.
\code{command} and \code{args} are executed via
\code{system2(command, args)} to run the Makefile.
If \code{args} has something like \code{"--jobs=2"}, or if
\code{jobs >= 2} and \code{args} is left alone, targets
will be distributed over independent parallel R sessions
wherever possible.}

\item{args}{command line arguments to call the Makefile for
distributed computing. For advanced users only. If set,
\code{jobs} and \code{verbose} are overwritten as they apply to the
Makefile.
\code{command} and \code{args} are executed via
\code{system2(command, args)} to run the Makefile.
If \code{args} has something like \code{"--jobs=2"}, or if
\code{jobs >= 2} and \code{args} is left alone, targets
will be distributed over independent parallel R sessions
wherever possible.}

\item{recipe_command}{Character scalar, command for the
Makefile recipe for each target.}

\item{timeout}{Deprecated. Use \code{elapsed} and \code{cpu} instead.}

\item{cpu}{Same as the \code{cpu} argument of \code{setTimeLimit()}.
Seconds of cpu time before a target times out.
Assign target-level cpu timeout times with an optional \code{cpu}
column in \code{plan}.}

\item{elapsed}{Same as the \code{elapsed} argument of \code{setTimeLimit()}.
Seconds of elapsed time before a target times out.
Assign target-level elapsed timeout times with an optional \code{elapsed}
column in \code{plan}.}

\item{retries}{Number of retries to execute if the target fails.
Assign target-level retries with an optional \code{retries}
column in \code{plan}.}

\item{force}{Logical. If \code{FALSE} (default) then \code{drake} will stop you
if the cache was created with an old
and incompatible version of drake.
This gives you an opportunity to
downgrade \code{drake} to a compatible version
rather than rerun all your targets from scratch.
If \code{force} is \code{TRUE}, then \code{make()} executes your workflow
regardless of the version of \code{drake} that last ran \code{make()} on the cache.}

\item{log_progress}{logical, whether to log the progress
of individual targets as they are being built. Progress logging
creates a lot of little files in the cache, and it may make builds
a tiny bit slower. So you may see gains in storage efficiency
and speed with
\code{make(..., log_progress = FALSE)}. But be warned that
\code{\link[=progress]{progress()}} and \code{\link[=in_progress]{in_progress()}}
will no longer work if you do that.}

\item{graph}{An \code{igraph} object from the previous \code{make()}.
Supplying a pre-built graph could save time.}

\item{trigger}{Name of the trigger to apply to all targets.
Ignored if \code{plan} has a \code{trigger} column.
See \code{\link[=trigger]{trigger()}} for details.}

\item{skip_targets}{logical, whether to skip building the targets
in \code{plan} and just import objects and files.}

\item{skip_imports}{logical, whether to skip processing the imports
entirely and jump straight to the targets. This can be useful
if your imports are massive and you just want to test your project,
but it is bad practice for reproducible data analysis.
This argument is overridden if you supply your own \code{graph} argument.}

\item{skip_safety_checks}{logical, whether to skip the safety checks
on your workflow. Use at your own peril.}

\item{lazy_load}{either a character vector or a logical. Choices:
\itemize{
\item \code{"eager"}: no lazy loading. The target is loaded right away
with \code{\link[=assign]{assign()}}.
\item \code{"promise"}: lazy loading with \code{\link[=delayedAssign]{delayedAssign()}}
\item \code{"bind"}: lazy loading with active bindings:
\code{bindr::populate_env()}.
\item \code{TRUE}: same as \code{"promise"}.
\item \code{FALSE}: same as \code{"eager"}.
}

\code{lazy_load} should not be \code{"promise"}
for \code{"parLapply"} parallelism combined with \code{jobs} greater than 1.
For local multi-session parallelism and lazy loading, try
\code{library(future); future::plan(multisession)} and then
\code{make(..., parallelism = "future_lapply", lazy_load = "bind")}.

If \code{lazy_load} is \code{"eager"},
drake prunes the execution environment before each target/stage,
removing all superfluous targets
and then loading any dependencies it will need for building.
In other words, drake prepares the environment in advance
and tries to be memory efficient.
If \code{lazy_load} is \code{"bind"} or \code{"promise"}, drake assigns
promises to load any dependencies at the last minute.
Lazy loading may be more memory efficient in some use cases, but
it may duplicate the loading of dependencies, costing time.}

\item{session_info}{logical, whether to save the \code{sessionInfo()}
to the cache. This behavior is recommended for serious \code{\link[=make]{make()}}s
for the sake of reproducibility. This argument only exists to
speed up tests. Apparently, \code{sessionInfo()} is a bottleneck
for small \code{\link[=make]{make()}}s.}

\item{cache_log_file}{Name of the cache log file to write.
If \code{TRUE}, the default file name is used (\code{drake_cache.log}).
If \code{NULL}, no file is written.
If activated, this option uses
\code{\link[=drake_cache_log_file]{drake_cache_log_file()}} to write a flat text file
to represent the state of the cache
(fingerprints of all the targets and imports).
If you put the log file under version control, your commit history
will give you an easy representation of how your results change
over time as the rest of your project changes. Hopefully,
this is a step in the right direction for data reproducibility.}

\item{seed}{integer, the root pseudo-random number generator
seed to use for your project.
In \code{\link[=make]{make()}}, \code{drake} generates a unique
local seed for each target using the global seed
and the target name. That way, different pseudo-random numbers
are generated for different targets, and this pseudo-randomness
is reproducible.

To ensure reproducibility across different R sessions,
\code{set.seed()} and \code{.Random.seed} are ignored and have no effect on
\code{drake} workflows. Conversely, \code{make()} does not usually
change \code{.Random.seed},
even when pseudo-random numbers are generated.
The exceptions to this last point are
\code{make(parallelism = "clustermq")} and
\code{make(parallelism = "clustermq_staged")},
because the \code{clustermq} package needs to generate random numbers
to set up ports and sockets for ZeroMQ.

On the first call to \code{make()} or \code{drake_config()}, \code{drake}
uses the random number generator seed from the \code{seed} argument.
Here, if the \code{seed} is \code{NULL} (default), \code{drake} uses a \code{seed} of \code{0}.
On subsequent \code{make()}s for existing projects, the project's
cached seed will be used in order to ensure reproducibility.
Thus, the \code{seed} argument must either be \code{NULL} or the same
seed from the project's cache (usually the \code{.drake/} folder).
To reset the random number generator seed for a project,
use \code{clean(destroy = TRUE)}.}

\item{caching}{character string, only applies to
\code{"clustermq"}, \code{"clustermq_staged"}, and \code{"future"} parallel backends.
The \code{caching} argument can be either \code{"master"} or \code{"worker"}.
\itemize{
\item \code{"master"}: Targets are built by remote workers and sent back to
the master process. Then, the master process saves them to the
cache (\code{config$cache}, usually a file system \code{storr}).
Appropriate if remote workers do not have access to the file system
of the calling R session. Targets are cached one at a time,
which may be slow in some situations.
\item \code{"worker"}: Remote workers not only build the targets, but also
save them to the cache. Here, caching happens in parallel.
However, remote workers need to have access to the file system
of the calling R session. Transferring target data across
a network can be slow.
}}

\item{keep_going}{logical, whether to still keep running \code{\link[=make]{make()}}
if targets fail.}

\item{session}{An optional \code{callr} function if you want to
build all your targets in a separate master session:
for example, \code{make(plan = my_plan, session = callr::r_vanilla)}.
Running \code{make()} in a clean, isolated
session can enhance reproducibility.
But be warned: if you do this, \code{\link[=make]{make()}} will take longer to start.
If \code{session} is \code{NULL} (default), then \code{\link[=make]{make()}} will just use
your current R session as the master session. This is slightly faster,
but it causes \code{\link[=make]{make()}} to populate your workspace/environment
with the last few targets it builds.}

\item{imports_only}{deprecated. Use \code{skip_targets} instead.}

\item{pruning_strategy}{deprecated. See \code{memory_strategy}.}

\item{makefile_path}{Path to the \code{Makefile} for
\code{make(parallelism = "Makefile")}. If you set this argument to a
non-default value, you are responsible for supplying this same
path to the \code{args} argument so \code{make} knows where to find it.
Example: \code{make(parallelism = "Makefile", makefile_path = ".drake/.makefile", command = "make", args = "--file=.drake/.makefile")}}

\item{console_log_file}{character scalar,
connection object (such as \code{stdout()}) or \code{NULL}.
If \code{NULL}, console output will be printed
to the R console using \code{message()}.
If a character scalar, \code{console_log_file}
should be the name of a flat file, and
console output will be appended to that file.
If a connection object (e.g. \code{stdout()}),
warnings and messages will be sent to the connection.
For example, if \code{console_log_file} is \code{stdout()},
warnings and messages are printed to the console in real time
(in addition to the usual in-bulk printing
after each target finishes).}

\item{ensure_workers}{logical, whether the master process
should wait for the workers to post before assigning them
targets. Should usually be \code{TRUE}. Set to \code{FALSE}
for \code{make(parallelism = "future_lapply", jobs = n)}
(\code{n > 1}) when combined with \code{future::plan(future::sequential)}.
This argument only applies to parallel computing with persistent workers
(\code{make(parallelism = x)}, where \code{x} could be \code{"mclapply"},
\code{"parLapply"}, or \code{"future_lapply"}).}

\item{garbage_collection}{logical, whether to call \code{gc()} each time
a target is built during \code{\link[=make]{make()}}.}

\item{template}{a named list of values to fill in the \code{{{ ... }}}
placeholders in template files (e.g. from \code{\link[=drake_hpc_template_file]{drake_hpc_template_file()}}).
Same as the \code{template} argument of \code{clustermq::Q()} and
\code{clustermq::workers()}.
Enabled for \code{clustermq} only (\code{make(parallelism = "clustermq_staged")}),
not \code{future} or \code{batchtools} so far.
For more information, see the \code{clustermq} package:
\url{https://github.com/mschubert/clustermq}.
Some template placeholders such as \code{{{ job_name }}} and \code{{{ n_jobs }}}
cannot be set this way.}

\item{sleep}{In its parallel processing, \code{drake} uses
a central master process to check what the parallel
workers are doing, and for the affected high-performance
computing workflows, wait for data to arrive over a network.
In between loop iterations, the master process sleeps to avoid throttling.
The \code{sleep} argument to \code{make()} and \code{drake_config()}
allows you to customize how much time the master process spends
sleeping.

The \code{sleep} argument is a function that takes an argument
\code{i} and returns a numeric scalar, the number of seconds to
supply to \code{Sys.sleep()} after iteration \code{i} of checking.
(Here, \code{i} starts at 1.)
If the checking loop does something other than sleeping
on iteration \code{i}, then \code{i} is reset back to 1.

To sleep for the same amount of time between checks,
you might supply something like \code{function(i) 0.01}.
But to avoid consuming too many resources during heavier
and longer workflows, you might use an exponential
back-off: say,
\code{function(i) { 0.1 + 120 * pexp(i - 1, rate = 0.01) }}.}

\item{hasty_build}{a user-defined function.
In "hasty mode" (\code{make(parallelism = "hasty")})
this is the function that evaluates a target's command
and returns the resulting value. The \code{hasty_build} argument
has no effect if \code{parallelism} is any value other than "hasty".

The function you pass to \code{hasty_build} must have arguments \code{target}
and \code{config}. Here, \code{target} is a character scalar naming the
target being built, and \code{config} is a configuration list of
runtime parameters generated by \code{\link[=drake_config]{drake_config()}}.}

\item{memory_strategy}{Character scalar, name of the
strategy \code{drake} uses to manage targets in memory. For more direct
control over which targets \code{drake} keeps in memory, see the
help file examples of \code{\link[=drake_envir]{drake_envir()}}. The \code{memory_strategy} argument
to \code{make()} and \code{drake_config()} is an attempt at an automatic
catch-all solution. These are the choices.
\itemize{
\item \code{"speed"} (default): Once a target is loaded in memory, just keep it there.
Maximizes speed, but hogs memory.
\item \code{"memory"}: For each target, unload everything from memory
except the target's direct dependencies. Conserves memory,
but sacrifices speed because each new target needs to reload
any previously unloaded targets from the cache.
\item \code{"lookahead"}: keep loaded targets in memory until they are
no longer needed as dependencies in downstream build steps.
Then, unload them from the environment. This step avoids
keeping unneeded data in memory and minimizes expensive
reads from the cache. However, it requires looking ahead
in the dependency graph, which could add overhead for every
target of projects with lots of targets.
}

Each strategy has a weakness.
\code{"speed"} is memory-hungry, \code{"memory"} wastes time reloading
targets from storage, and \code{"lookahead"} wastes time
traversing the entire dependency graph on every \code{\link[=make]{make()}}. For a better
compromise and more control, see the examples in the help file
of \code{\link[=drake_envir]{drake_envir()}}.}

\item{layout}{\code{config$layout}, where \code{config} is the return value
from a prior call to \code{drake_config()}. If your plan or environment
has changed since the last \code{make()}, do not supply a \code{layout} argument.
Otherwise, supplying one could save time.}
}
\value{
The master internal configuration list of a project.
}
\description{
This configuration list
is also required for functions such as \code{\link[=outdated]{outdated()}}.
It is meant to be specific to
a single call to \code{\link[=make]{make()}}, and you should not modify
it by hand afterwards. If you later plan to call \code{\link[=make]{make()}}
with different arguments (especially \code{targets}),
you should refresh the config list with another call to
\code{\link[=drake_config]{drake_config()}}. For changes to the
\code{targets} argument
specifically, it is important to recompute the config list
to make sure the internal workflow network has all the targets you need.
Modifying the \code{targets} element afterwards will have no effect,
and it could lead to false negative results from
\code{\link[=outdated]{outdated()}}.
}
\examples{
\dontrun{
test_with_dir("Quarantine side effects.", {
load_mtcars_example() # Get the code with drake_example("mtcars").
# Construct the master internal configuration list.
config <- drake_config(my_plan)
vis_drake_graph(config) # See the dependency graph.
sankey_drake_graph(config) # See the dependency graph as a Sankey diagram.
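# A sketch of the named jobs vector described for the jobs argument:
# separate worker counts for the imports and for the targets.
# The counts 2 and 4 and the name config_jobs are arbitrary choices
# made for this illustration.
config_jobs <- drake_config(my_plan, jobs = c(imports = 2, targets = 4))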
# These functions are faster than otherwise
# because they use the configuration list.
outdated(config) # Which targets are out of date?
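# A sketch of the advice in the description: if the targets change,
# build a fresh configuration list instead of editing config$targets.
# This assumes the mtcars example plan has a target named "small".
# The name config_small is hypothetical.
config_small <- drake_config(my_plan, targets = "small")
outdated(config_small) # Which of the requested targets are out of date?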
missed(config) # Which imports are missing?
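# A sketch of the exponential back-off suggested for the sleep argument.
# The name backoff is hypothetical and not part of drake.
backoff <- function(i) 0.1 + 120 * pexp(i - 1, rate = 0.01)
backoff(1) # 0.1 seconds, because pexp(0, rate = 0.01) is 0.
backoff(1000) # Close to the ceiling of 120.1 seconds.
# To use it, supply sleep = backoff to drake_config() or make().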
})
}
}
\seealso{
\code{\link[=make]{make()}}, \code{\link[=drake_plan]{drake_plan()}}, \code{\link[=vis_drake_graph]{vis_drake_graph()}}
}
