% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/layers-normalization.R
\name{layer_layer_normalization}
\alias{layer_layer_normalization}
\title{Layer normalization layer (Ba et al., 2016).}
\usage{
layer_layer_normalization(
  object,
  axis = -1L,
  epsilon = 0.001,
  center = TRUE,
  scale = TRUE,
  rms_scaling = FALSE,
  beta_initializer = "zeros",
  gamma_initializer = "ones",
  beta_regularizer = NULL,
  gamma_regularizer = NULL,
  beta_constraint = NULL,
  gamma_constraint = NULL,
  ...
)
}
\arguments{
\item{object}{Object to compose the layer with. A tensor, array, or sequential model.}

\item{axis}{Integer, or an integer vector or list. The axis or axes to
normalize across, typically the feature axis or axes; the axes left out
are typically the batch axis or axes. \code{-1} is the last dimension of
the input. Defaults to \code{-1}.}

\item{epsilon}{Small float added to the variance to avoid dividing by zero.
Defaults to \code{1e-3}.}

\item{center}{If \code{TRUE}, add the offset \code{beta} to the normalized tensor. If \code{FALSE},
\code{beta} is ignored. Defaults to \code{TRUE}.}

\item{scale}{If \code{TRUE}, multiply by \code{gamma}. If \code{FALSE}, \code{gamma} is not used.
When the next layer is linear (this also applies to, e.g., \code{layer_activation_relu()}),
this can be disabled, since the scaling will be done by the next layer.
Defaults to \code{TRUE}.}

\item{rms_scaling}{If \code{TRUE}, \code{center} and \code{scale} are ignored, and the
inputs are scaled by \code{gamma} and the inverse square root
of the square of all inputs. This is an approximate and faster
approach that avoids ever computing the mean of the input. Note that
this \emph{isn't} equivalent to the computation that the
\code{layer_rms_normalization()} layer performs.}

\item{beta_initializer}{Initializer for the beta weight. Defaults to zeros.}

\item{gamma_initializer}{Initializer for the gamma weight. Defaults to ones.}

\item{beta_regularizer}{Optional regularizer for the beta weight.
\code{NULL} by default.}

\item{gamma_regularizer}{Optional regularizer for the gamma weight.
\code{NULL} by default.}

\item{beta_constraint}{Optional constraint for the beta weight.
\code{NULL} by default.}

\item{gamma_constraint}{Optional constraint for the gamma weight.
\code{NULL} by default.}

\item{...}{Base layer keyword arguments (e.g. \code{name} and \code{dtype}).}
}
\value{
The return value depends on the value provided for the first argument.
If \code{object} is:
\itemize{
\item a \code{keras_model_sequential()}, then the layer is added to the sequential model
(which is modified in place). To enable piping, the sequential model is also
returned, invisibly.
\item a \code{keras_input()}, then the output tensor from calling \code{layer(input)} is returned.
\item \code{NULL} or missing, then a \code{Layer} instance is returned.
}
}
\description{
Normalize the activations of the previous layer for each example in a
batch independently, rather than across the batch as Batch Normalization
does. That is, the layer applies a transformation that keeps the mean
activation within each example close to 0 and the activation standard
deviation close to 1.
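
As a minimal base-R sketch of this difference (an illustration, not the Keras implementation), the code below standardizes a small hypothetical matrix \code{x} of 4 examples by 3 features once per row, as layer normalization does, and once per column, as batch normalization does with its training-time batch statistics. It assumes the population variance (dividing by n) and omits \code{gamma}, \code{beta}, and batch normalization's moving averages.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# Hypothetical batch: 4 examples (rows) x 3 features (columns).
x <- matrix(c(1, 2, 3,
              4, 6, 8,
              0, 5, 10,
              2, 2, 5), nrow = 4, byrow = TRUE)
epsilon <- 1e-3
# Population variance (divides by n), unlike var(), which divides by n - 1.
pop_var <- function(v) mean((v - mean(v))^2)
standardize <- function(v) (v - mean(v)) / sqrt(pop_var(v) + epsilon)

# Layer normalization: statistics computed per example (row).
# apply() returns one column per row, so transpose back.
layer_normed <- t(apply(x, 1, standardize))
# Batch normalization (training-time statistics): per feature (column).
batch_normed <- apply(x, 2, standardize)

round(rowMeans(layer_normed), 10) # each example now has mean 0
round(colMeans(batch_normed), 10) # each feature now has mean 0
}\if{html}{\out{</div>}}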

If \code{scale} is enabled, the layer multiplies the normalized outputs by a
trainable variable \code{gamma}, broadcast to the input shape; if
\code{center} is enabled, it adds a trainable variable \code{beta}, broadcast
the same way. \code{gamma} defaults to a tensor of ones and \code{beta} to a
tensor of zeros, so centering and scaling are no-ops before training begins.
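
A minimal base-R sketch of this broadcasting (an illustration, not the Keras implementation), assuming a hypothetical 2-D input whose last axis holds the features, so that \code{gamma} and \code{beta} are vectors with one entry per feature. With the values produced by the default \code{"ones"} and \code{"zeros"} initializers, the affine step leaves the normalized values unchanged:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# Hypothetical batch: 2 examples (rows) x 3 features (columns).
x <- matrix(c(1, 2, 3,
              4, 6, 8), nrow = 2, byrow = TRUE)
epsilon <- 1e-3
standardize <- function(v) (v - mean(v)) / sqrt(mean((v - mean(v))^2) + epsilon)
normalized <- t(apply(x, 1, standardize))

# Initial values from the default initializers: one entry per feature.
gamma <- rep(1, ncol(x))
beta <- rep(0, ncol(x))

# Broadcast across the batch: column j is scaled by gamma[j], shifted by beta[j].
outputs <- sweep(sweep(normalized, 2, gamma, "*"), 2, beta, "+")
all.equal(outputs, normalized) # TRUE: scaling and centering are no-ops
}\if{html}{\out{</div>}}

\code{sweep()} is used so that column \code{j} is paired with \code{gamma[j]}; writing \code{normalized * gamma} would instead recycle \code{gamma} down the columns in column-major order, which is not per-feature broadcasting.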

With scaling and centering enabled, the normalization works as follows.

Let the intermediate activations for a mini-batch be the \code{inputs}.
For each sample \code{x} in a batch of \code{inputs}, compute the mean and
variance of the sample, normalize each value in the sample (adding a small
constant \code{epsilon} to the variance for numerical stability), and
finally transform the normalized output by the learned parameters
\code{gamma} and \code{beta}:

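\if{html}{\out{<div class="sourceCode r">}}\preformatted{# `inputs`: a (batch, features) matrix; `gamma`, `beta`: per-feature vectors
outputs <- inputs |> apply(1, function(x) \{
  # population variance: mean squared deviation (var() would divide by n - 1)
  x_normalized <- (x - mean(x)) /
                  sqrt(mean((x - mean(x))^2) + epsilon)
  x_normalized * gamma + beta
\}) |> t() # apply() returns one column per sample, so transpose back
}\if{html}{\out{</div>}}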
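
The following self-contained base-R version of this computation is a sketch for illustration, not the Keras implementation. It uses hypothetical random inputs and hypothetical values for the learned \code{gamma} and \code{beta}. Because \code{epsilon} is added to the variance, each normalized sample has mean 0 and a population variance of \code{v / (v + epsilon)}, where \code{v} is the sample's own population variance, so the result is slightly below 1:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{set.seed(1)
# Hypothetical inputs: 3 samples (rows) x 4 features (columns).
inputs <- matrix(rnorm(12, mean = 5, sd = 3), nrow = 3)
epsilon <- 1e-3
gamma <- c(1, 2, 0.5, 1) # hypothetical learned scales, one per feature
beta <- c(0, 1, -1, 0)   # hypothetical learned offsets, one per feature

pop_var <- function(v) mean((v - mean(v))^2)
x_normalized <- t(apply(inputs, 1, function(v) (v - mean(v)) / sqrt(pop_var(v) + epsilon)))
# Per-feature broadcasting of gamma and beta across the batch.
outputs <- sweep(sweep(x_normalized, 2, gamma, "*"), 2, beta, "+")

rowMeans(x_normalized)          # approximately 0 for every sample
apply(x_normalized, 1, pop_var) # slightly below 1 because of epsilon
}\if{html}{\out{</div>}}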

\code{gamma} and \code{beta} will span the axes of \code{inputs} specified in \code{axis}, and
this part of the inputs' shape must be fully defined.

For example:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{layer <- layer_layer_normalization(axis = c(2, 3, 4))

layer(op_ones(c(5, 20, 30, 40))) |> invisible() # build()
shape(layer$beta)
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode">}}\preformatted{## shape(20, 30, 40)

}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode r">}}\preformatted{shape(layer$gamma)
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode">}}\preformatted{## shape(20, 30, 40)

}\if{html}{\out{</div>}}
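
The same idea extends to several axes. Here is a minimal base-R sketch (not the Keras implementation), assuming a hypothetical input of shape \code{(2, 3, 4)} normalized over axes 2 and 3: each sample's statistics are computed over all 12 of its values, and \code{gamma} and \code{beta} have shape \code{(3, 4)}, matching the non-batch part of the input shape in the same way as the example above.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# Hypothetical input: batch of 2 samples, each of shape (3, 4).
x <- array(as.numeric(1:24)^2, dim = c(2, 3, 4))
epsilon <- 1e-3
# gamma and beta span the normalized axes (2 and 3), so both have shape (3, 4).
gamma <- array(1, dim = c(3, 4))
beta <- array(0, dim = c(3, 4))

pop_var <- function(v) mean((v - mean(v))^2)
# Statistics are taken over all 3 * 4 values of one sample at once.
normalize_sample <- function(s) (s - mean(s)) / sqrt(pop_var(s) + epsilon) * gamma + beta

outputs <- array(NA_real_, dim = dim(x))
for (i in seq_len(dim(x)[1])) outputs[i, , ] <- normalize_sample(x[i, , ])

dim(gamma)              # 3 4
apply(outputs, 1, mean) # approximately 0 for each sample
}\if{html}{\out{</div>}}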

Note that other implementations of layer normalization may define
\code{gamma} and \code{beta} over a different set of axes from the axes being
normalized across. For example, Group Normalization
(\href{https://arxiv.org/abs/1803.08494}{Wu et al. 2018}) with a single group
corresponds to a \code{layer_layer_normalization()} that normalizes across height, width,
and channel, but whose \code{gamma} and \code{beta} span only the channel dimension.
As a result, this \code{layer_layer_normalization()} implementation will not match a
\code{layer_group_normalization()} layer with the number of groups set to 1.
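
To see why the two layers differ, here is a minimal base-R sketch (not the Keras implementation) for one sample of a hypothetical channels-last input of shape \code{(batch, height, width, channels) = (2, 4, 4, 3)}. Both layers compute the same per-sample statistics over height, width, and channels, but a single-group Group Normalization \code{gamma} has one entry per channel, broadcast over height and width, whereas a layer normalization \code{gamma} has one entry per (height, width, channel) position. The \code{gamma} values used here are hypothetical.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{set.seed(42)
# Hypothetical input: (batch, height, width, channels) = (2, 4, 4, 3).
x <- array(rnorm(2 * 4 * 4 * 3), dim = c(2, 4, 4, 3))
epsilon <- 1e-3
pop_var <- function(v) mean((v - mean(v))^2)

# Shared step: normalize one sample over height, width, and channels.
sample_1 <- x[1, , , ]
normalized <- (sample_1 - mean(sample_1)) / sqrt(pop_var(sample_1) + epsilon)

# Single-group Group Normalization: gamma has 3 values (one per channel),
# broadcast over height and width.
gamma_group_norm <- c(0.5, 1, 2)
out_group_norm <- sweep(normalized, 3, gamma_group_norm, "*")

# Layer normalization over axes c(2, 3, 4): gamma has shape (4, 4, 3),
# i.e. 48 values, one per position. Repeating the per-channel values
# reproduces the result above, but a general (4, 4, 3) gamma has no
# per-channel equivalent.
gamma_layer_norm <- array(rep(gamma_group_norm, each = 4 * 4), dim = c(4, 4, 3))
out_layer_norm <- normalized * gamma_layer_norm
all.equal(out_layer_norm, out_group_norm) # TRUE
}\if{html}{\out{</div>}}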
}
\section{Reference}{
\itemize{
\item \href{https://arxiv.org/abs/1607.06450}{Ba et al., 2016}.
}
}

\seealso{
\itemize{
\item \url{https://keras.io/api/layers/normalization_layers/layer_normalization#layernormalization-class}
}

Other normalization layers: \cr
\code{\link{layer_batch_normalization}()} \cr
\code{\link{layer_group_normalization}()} \cr
\code{\link{layer_rms_normalization}()} \cr
\code{\link{layer_spectral_normalization}()} \cr
\code{\link{layer_unit_normalization}()} \cr

Other layers: \cr
\code{\link{Layer}()} \cr
\code{\link{layer_activation}()} \cr
\code{\link{layer_activation_elu}()} \cr
\code{\link{layer_activation_leaky_relu}()} \cr
\code{\link{layer_activation_parametric_relu}()} \cr
\code{\link{layer_activation_relu}()} \cr
\code{\link{layer_activation_softmax}()} \cr
\code{\link{layer_activity_regularization}()} \cr
\code{\link{layer_add}()} \cr
\code{\link{layer_additive_attention}()} \cr
\code{\link{layer_alpha_dropout}()} \cr
\code{\link{layer_attention}()} \cr
\code{\link{layer_aug_mix}()} \cr
\code{\link{layer_auto_contrast}()} \cr
\code{\link{layer_average}()} \cr
\code{\link{layer_average_pooling_1d}()} \cr
\code{\link{layer_average_pooling_2d}()} \cr
\code{\link{layer_average_pooling_3d}()} \cr
\code{\link{layer_batch_normalization}()} \cr
\code{\link{layer_bidirectional}()} \cr
\code{\link{layer_category_encoding}()} \cr
\code{\link{layer_center_crop}()} \cr
\code{\link{layer_concatenate}()} \cr
\code{\link{layer_conv_1d}()} \cr
\code{\link{layer_conv_1d_transpose}()} \cr
\code{\link{layer_conv_2d}()} \cr
\code{\link{layer_conv_2d_transpose}()} \cr
\code{\link{layer_conv_3d}()} \cr
\code{\link{layer_conv_3d_transpose}()} \cr
\code{\link{layer_conv_lstm_1d}()} \cr
\code{\link{layer_conv_lstm_2d}()} \cr
\code{\link{layer_conv_lstm_3d}()} \cr
\code{\link{layer_cropping_1d}()} \cr
\code{\link{layer_cropping_2d}()} \cr
\code{\link{layer_cropping_3d}()} \cr
\code{\link{layer_cut_mix}()} \cr
\code{\link{layer_dense}()} \cr
\code{\link{layer_depthwise_conv_1d}()} \cr
\code{\link{layer_depthwise_conv_2d}()} \cr
\code{\link{layer_discretization}()} \cr
\code{\link{layer_dot}()} \cr
\code{\link{layer_dropout}()} \cr
\code{\link{layer_einsum_dense}()} \cr
\code{\link{layer_embedding}()} \cr
\code{\link{layer_equalization}()} \cr
\code{\link{layer_feature_space}()} \cr
\code{\link{layer_flatten}()} \cr
\code{\link{layer_flax_module_wrapper}()} \cr
\code{\link{layer_gaussian_dropout}()} \cr
\code{\link{layer_gaussian_noise}()} \cr
\code{\link{layer_global_average_pooling_1d}()} \cr
\code{\link{layer_global_average_pooling_2d}()} \cr
\code{\link{layer_global_average_pooling_3d}()} \cr
\code{\link{layer_global_max_pooling_1d}()} \cr
\code{\link{layer_global_max_pooling_2d}()} \cr
\code{\link{layer_global_max_pooling_3d}()} \cr
\code{\link{layer_group_normalization}()} \cr
\code{\link{layer_group_query_attention}()} \cr
\code{\link{layer_gru}()} \cr
\code{\link{layer_hashed_crossing}()} \cr
\code{\link{layer_hashing}()} \cr
\code{\link{layer_identity}()} \cr
\code{\link{layer_integer_lookup}()} \cr
\code{\link{layer_jax_model_wrapper}()} \cr
\code{\link{layer_lambda}()} \cr
\code{\link{layer_lstm}()} \cr
\code{\link{layer_masking}()} \cr
\code{\link{layer_max_num_bounding_boxes}()} \cr
\code{\link{layer_max_pooling_1d}()} \cr
\code{\link{layer_max_pooling_2d}()} \cr
\code{\link{layer_max_pooling_3d}()} \cr
\code{\link{layer_maximum}()} \cr
\code{\link{layer_mel_spectrogram}()} \cr
\code{\link{layer_minimum}()} \cr
\code{\link{layer_mix_up}()} \cr
\code{\link{layer_multi_head_attention}()} \cr
\code{\link{layer_multiply}()} \cr
\code{\link{layer_normalization}()} \cr
\code{\link{layer_permute}()} \cr
\code{\link{layer_rand_augment}()} \cr
\code{\link{layer_random_brightness}()} \cr
\code{\link{layer_random_color_degeneration}()} \cr
\code{\link{layer_random_color_jitter}()} \cr
\code{\link{layer_random_contrast}()} \cr
\code{\link{layer_random_crop}()} \cr
\code{\link{layer_random_erasing}()} \cr
\code{\link{layer_random_flip}()} \cr
\code{\link{layer_random_gaussian_blur}()} \cr
\code{\link{layer_random_grayscale}()} \cr
\code{\link{layer_random_hue}()} \cr
\code{\link{layer_random_invert}()} \cr
\code{\link{layer_random_perspective}()} \cr
\code{\link{layer_random_posterization}()} \cr
\code{\link{layer_random_rotation}()} \cr
\code{\link{layer_random_saturation}()} \cr
\code{\link{layer_random_sharpness}()} \cr
\code{\link{layer_random_shear}()} \cr
\code{\link{layer_random_translation}()} \cr
\code{\link{layer_random_zoom}()} \cr
\code{\link{layer_repeat_vector}()} \cr
\code{\link{layer_rescaling}()} \cr
\code{\link{layer_reshape}()} \cr
\code{\link{layer_resizing}()} \cr
\code{\link{layer_rms_normalization}()} \cr
\code{\link{layer_rnn}()} \cr
\code{\link{layer_separable_conv_1d}()} \cr
\code{\link{layer_separable_conv_2d}()} \cr
\code{\link{layer_simple_rnn}()} \cr
\code{\link{layer_solarization}()} \cr
\code{\link{layer_spatial_dropout_1d}()} \cr
\code{\link{layer_spatial_dropout_2d}()} \cr
\code{\link{layer_spatial_dropout_3d}()} \cr
\code{\link{layer_spectral_normalization}()} \cr
\code{\link{layer_stft_spectrogram}()} \cr
\code{\link{layer_string_lookup}()} \cr
\code{\link{layer_subtract}()} \cr
\code{\link{layer_text_vectorization}()} \cr
\code{\link{layer_tfsm}()} \cr
\code{\link{layer_time_distributed}()} \cr
\code{\link{layer_torch_module_wrapper}()} \cr
\code{\link{layer_unit_normalization}()} \cr
\code{\link{layer_upsampling_1d}()} \cr
\code{\link{layer_upsampling_2d}()} \cr
\code{\link{layer_upsampling_3d}()} \cr
\code{\link{layer_zero_padding_1d}()} \cr
\code{\link{layer_zero_padding_2d}()} \cr
\code{\link{layer_zero_padding_3d}()} \cr
\code{\link{rnn_cell_gru}()} \cr
\code{\link{rnn_cell_lstm}()} \cr
\code{\link{rnn_cell_simple}()} \cr
\code{\link{rnn_cells_stack}()} \cr
}
\concept{layers}
\concept{normalization layers}
