- `Rcpp::asis` vignette engine; no rendering on CRAN runners.
- `rmarkdown` dropped from Suggests (no longer needed).
- `ggml_graph_print()` output captured in `test-graph-utils.R`; C-level broadcast warnings captured in ONNX broadcast and resize-broadcast tests.
- `gguf_load(path)` — opens a GGUF file (v2/v3) and reads all metadata and tensor descriptors. Returns an S3 object of class `"gguf"`.
- `gguf_metadata(x)` — returns all key-value metadata pairs as a named list (architecture, tokenizer config, quantization info, etc.).
- `gguf_tensor_names(x)` — lists all tensor names in the file.
- `gguf_tensor_info(x, name)` — returns shape, type, and size in bytes for a single tensor.
- `gguf_tensor_data(x, name)` — dequantizes (if needed) and returns tensor weights as an R numeric array with correct dimensions.
- `gguf_free(x)` — explicitly frees the GGUF context (also called by GC).
- `print.gguf()` method shows file version, tensor count, and metadata count.
- `VK_KHR_push_descriptor`: unchanged — when the extension is available and `maxPushDescriptors >= 12`, descriptor sets are pushed directly into the command buffer via `pushDescriptorSetKHR()`, eliminating descriptor pool overhead. Falls back to the traditional descriptor pool path on hardware without the extension.
- `fit()` now accepts a `callbacks` parameter for sequential models (passed through to `ggml_fit_sequential()`).
- Test files: `test-gguf.R`, `test-graph-utils.R`, `test-inplace-ops.R`, `test-keras-api.R`, `test-misc-ops.R`, `test-model-ops.R`, `test-print-methods.R`, `test-tensor-utils.R`, `test-threading.R`, `test-autograd-missing.R`, `test-nn-functional-missing.R`, `test-quants-missing.R`.
- `src/` and `inst/include/` headers: `configure` and `configure.win` now automatically sync all public headers from `src/` to `inst/include/` at install time. Previously, changes to `GGML_MAX_DIMS` (4→5) and other structs in `src/ggml.h` were not propagated to the exported headers, causing segfaults in downstream packages (e.g. sd2R).
- Added `tests/testthat/test-headers-sync.R` to verify that `inst/include/` headers remain in sync with `src/` headers and that `GGML_MAX_DIMS` is consistent.
- `ggml_view_5d()` — new API function for creating 5D views with explicit strides, extending the existing 1D–4D view family. Uses the existing `ggml_view_impl()` internally.
- `ggml_repeat_5d()` — new API function for tiling tensors up to 5D. CPU kernels (`ggml_compute_forward_repeat_f32`, `ggml_compute_forward_repeat_f16`) updated with a 5th loop dimension. Vulkan dispatch collapses dim3×dim4 into push constants transparently (no shader changes needed — push constants remain at 128 bytes).
- `onnx_ggml.c` (~20 sites): `ne[GGML_MAX_DIMS]` arrays, `switch` with `case 5: new_tensor_5d`.
- `onnx_broadcast_align`: all reshape/new_tensor calls use dimension-aware helpers.
- `onnx_reshape_nd()`.
- `ggml_repeat_5d()`.
- `tmap_put_nd()` and `slice_fill` arrays updated to `GGML_MAX_DIMS`.
- `onnx_reshape_nd()`, `onnx_new_tensor_nd()`, `ne_product()` — eliminate switch/case duplication.
- (`ggml_permute` API limitation).
- `ConstantOfShape` read the `value` TensorProto attribute as float regardless of `data_type`. When `data_type=7` (INT64), the 8-byte int64 was reinterpreted as a 4-byte float, producing garbage values (~1.4e-45 instead of 1). This broke attention mask generation (fill=0 instead of 1) and position ID generation (NonZero on zeros = empty).
- `ConstantOfShape` now checks `data_type` and correctly handles INT64, INT32, DOUBLE, and FLOAT `value` attributes.
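The INT64 misread described above can be reproduced in plain base R with `readBin()` (a self-contained sketch; no package code involved):

```r
# The value attribute held int64 1, little-endian: 01 00 00 00 00 00 00 00.
raw_i64 <- as.raw(c(1, 0, 0, 0, 0, 0, 0, 0))

# Old, buggy interpretation: the first 4 bytes read as a float32.
bad <- readBin(raw_i64, what = "double", size = 4, n = 1, endian = "little")
# bad is ~1.401298e-45 (the smallest denormal float32), not 1

# Correct interpretation: int64, assembled here from two 32-bit halves.
halves <- readBin(raw_i64, what = "integer", size = 4, n = 2, endian = "little")
good <- halves[1] + halves[2] * 2^32
# good is 1
```

Bytes `01 00 00 00` viewed as a float32 give the smallest denormal, which is exactly the ~1.4e-45 garbage value noted above.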
- `ggml_get_rows` only supports 2D data: for `axis=0` on rank>2 (e.g. CaiT QKV split on `[48,576,6,3]`), the tensor is now reshaped to 2D, gathered, and reshaped back.
- `GGML_OP_SCATTER_ELEMENTS` added to the ggml engine with both CPU kernel and Vulkan compute shader.
- `scatter_elements.comp`: two variants compiled at install time — `scatter_elements_none` (overwrite) and `scatter_elements_add` (atomicAdd via `GL_EXT_shader_atomic_float`). Data is copied to output via `vkCmdCopyBuffer` with a pipeline barrier before the scatter dispatch.
- `ScatterElements` op with `axis=0` and `reduction="none"/"add"` attributes. Indices are cast to I32, updates/data to F32 automatically.
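For reference, the semantics of the two `ScatterElements` variants can be modelled in a few lines of plain R (an illustrative sketch of the 1D, `axis=0` case; the real op runs as a CPU kernel or Vulkan shader):

```r
# Plain-R model of ScatterElements for a vector along axis 0.
scatter_elements <- function(data, indices, updates, reduction = "none") {
  out <- data
  for (i in seq_along(indices)) {
    j <- indices[i] + 1L              # ONNX indices are 0-based
    if (reduction == "add") {
      out[j] <- out[j] + updates[i]   # the atomicAdd variant
    } else {
      out[j] <- updates[i]            # the overwrite variant
    }
  }
  out
}
scatter_elements(c(0, 0, 0, 0), c(1L, 3L), c(5, 7))                    # 0 5 0 7
scatter_elements(c(1, 1, 1, 1), c(0L, 0L), c(2, 3), reduction = "add") # 6 1 1 1
```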
- `ggml_map_custom3` op: the CPU kernel computes the 2D relative position bias directly: `bias[b,hq,wq,hk,wk] = dot(x, W_h) + dot(x_transposed, W_w)`.
- `detect_pos_embed_blocks()` identifies contiguous node ranges with `/pos_embed/` in output names, extracts W_h/W_w initializer shapes to determine H, W, C, and validates the F32 data type.
- In `onnx_ggml_run()`, input data is copied into pinned memory before `ggml_backend_tensor_set()` — the Vulkan driver detects the pinned source pointer and performs a direct DMA transfer to VRAM, bypassing the internal staging copy.
- If `ggml_backend_vk_host_buffer_type()` returns NULL or the buffer is too small, the standard staging path is used transparently.
- `onnx_device_info()`: added NULL guards for `ctx->graph` and `n_nodes == 0` edge cases that caused a segfault when called on models before the first inference run.
- `ggml_predict()` with stochastic dropout: `nn_build_graph()` now receives `training = FALSE` during inference, so stochastic Bernoulli dropout is disabled at predict time. Previously, `stochastic = TRUE` dropout layers applied random masks during inference, degrading accuracy.
- `ggml_fit()` return value: the return value of `ggml_fit()` must be assigned back to `model` to obtain the trained weights (`model <- ggml_fit(...)`). This is now clarified in all examples and documentation. Using `history <- ggml_fit(...)` without reassigning `model` leaves the model with untrained weights.
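The reassignment requirement follows from R's copy-on-modify semantics; a minimal plain-R illustration (the `train_step()` function here is hypothetical, not package code):

```r
# R functions work on copies, so an updated model must be returned
# and reassigned by the caller.
train_step <- function(model) {
  model$weights <- model$weights + 1   # stand-in for a real weight update
  model
}
model <- list(weights = 0)

history <- train_step(model)   # like history <- ggml_fit(...): model untouched
untrained <- model$weights     # still 0

model <- train_step(model)     # like model <- ggml_fit(...): update kept
trained <- model$weights       # now 1
```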
- `ggml_evaluate()` return value: now includes `n_samples` in addition to `loss` and `accuracy`. Metrics are computed on all samples without truncation (via `ggml_predict()` internally).
- `inst/examples/titanic_classification.R` — new end-to-end binary classification example on the Titanic dataset. Demonstrates feature engineering (Title, FamilySize, IsAlone), stratified train/val split, one-hot encoding, dropout regularization, and manual validation metrics (accuracy, precision, recall, F1, confusion matrix). Achieves ~82% validation accuracy.
- Weights live in `weight_buf` and are never re-transferred between runs. The previous architecture reloaded all weights before every `onnx_run()` call — eliminated entirely.
- `ctx_weight` / `ctx` contexts: weight tensors live in a permanent GPU buffer that the scheduler never aliases; compute tensors are managed by `ggml_backend_sched` independently.
- `onnx_device_info()` — scheduler diagnostic: number of splits, GPU/CPU op counts, CPU-only op list.
- `inst/examples/benchmark_onnx.R`: proper VRAM cleanup between models via `rm()` + `gc()`.
- `onnx_load(path, device, input_shapes)` — load an ONNX model file, build a ggml computation graph, and allocate tensors on a Vulkan GPU or the CPU. Weights are loaded via memory-mapped file (zero-copy where possible).
- `onnx_run(model, inputs)` — run inference on a loaded ONNX model with named input data.
- `onnx_inputs(model)` — list expected input tensor names and shapes.
- `onnx_summary(model)` — return model metadata (IR version, opset, producer, ops used).
- `print.onnx_model()` — formatted summary of a loaded ONNX model.
- `input_shapes` parameter for models with dynamic dimensions: specify fixed shapes at load time (e.g. `input_shapes = list(image = c(1L, 3L, 224L, 224L))`).
- `auto_pad` attribute (SAME_UPPER, SAME_LOWER) supported for Conv and pooling ops.
- `input_shapes` (Conv, Reshape, Transpose)
- `input_shapes` (1180 nodes)
- `input_shapes` (482 nodes: MatMul, LayerNorm, GELU, Softmax)
- `inst/lib/libggml.a`, breaking static linking from dependent packages (e.g. llamaR).
- `dp_train(make_model, data, loss_fn, forward_fn, target_fn, n_gpu, n_iter, lr, max_norm, verbose)` — data-parallel training across multiple replicas. Weights are broadcast from replica 0 before the first step; gradients are averaged across replicas each iteration; weights are re-broadcast after each optimizer update. Returns `list(params, loss_history, model)`.
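The broadcast/average/re-broadcast cycle above can be sketched in plain R (illustrative numbers, not real model gradients):

```r
# One data-parallel step: average per-replica gradients, update on replica 0.
average_gradients <- function(grads_per_replica) {
  Reduce(`+`, grads_per_replica) / length(grads_per_replica)
}
g <- list(c(1, 2), c(3, 4), c(5, 6))  # gradients from 3 replicas
avg <- average_gradients(g)           # c(3, 4)

w <- c(0.5, 0.5)
lr <- 0.1
w <- w - lr * avg                     # optimizer update
# w is then re-broadcast to all replicas, as dp_train() does after each update
```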
- `ag_mul` and `ag_sub` now support CPU broadcast: `[d×s] * [1×s]` and `[d×s] * [d×1]` shapes work correctly with proper gradient reduction.
- `ag_softmax_cross_entropy_loss` accepts integer target vectors (0-based class indices) and converts them to one-hot automatically.
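The conversion is equivalent to this plain-R sketch (`to_one_hot()` is an assumed helper for illustration; note the +1 shift from 0-based class indices to R's 1-based indexing):

```r
# 0-based integer class indices -> one-hot matrix, one row per sample.
to_one_hot <- function(targets, n_classes) {
  one_hot <- matrix(0, nrow = length(targets), ncol = n_classes)
  one_hot[cbind(seq_along(targets), targets + 1L)] <- 1  # +1: R is 1-based
  one_hot
}
to_one_hot(c(0L, 2L, 1L), 3L)
# row 1 encodes class 0, row 2 class 2, row 3 class 1
```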
- `ggml_sum_rows` f16 on Vulkan: F16→F16 dispatch now supported natively (no CPU fallback).
- `ag_tensor()` / `ag_param()` — environment-backed tensors with reference semantics; in-place optimizer updates are visible to all references.
- `with_grad_tape({ ... })` — enables the global gradient tape for the enclosed forward pass.
- `backward(loss)` — reverse-mode automatic differentiation; returns a gradient environment keyed by tensor id.
- `ag_matmul`, `ag_add` (with bias broadcast), `ag_sub`, `ag_mul`, `ag_scale`.
- `ag_relu`, `ag_sigmoid`, `ag_tanh`, `ag_softmax`.
- `ag_sum`, `ag_mean`, `ag_log`, `ag_exp`, `ag_pow`, `ag_clamp`.
- `ag_reshape`, `ag_transpose`.
- `ag_mse_loss`, `ag_cross_entropy_loss`, `ag_softmax_cross_entropy_loss` (numerically-stable fused).
- `optimizer_sgd()` — SGD with optional momentum.
- `optimizer_adam()` — Adam with bias-corrected moment estimates.
- `ag_linear()` — Glorot-initialised dense layer (closure-based, returns `$forward`, `$params()`).
- `ag_gradcheck()` — central finite-difference gradient checker (like `torch.autograd.gradcheck`).
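The central finite-difference scheme behind such a checker, in plain R (an illustrative sketch, not the package implementation):

```r
# Compare (f(x+h) - f(x-h)) / (2h) per coordinate with an analytic gradient.
numeric_grad <- function(f, x, h = 1e-5) {
  sapply(seq_along(x), function(i) {
    xp <- x; xp[i] <- xp[i] + h
    xm <- x; xm[i] <- xm[i] - h
    (f(xp) - f(xm)) / (2 * h)
  })
}
f <- function(x) sum(x^2)             # analytic gradient: 2 * x
x <- c(1.5, -0.5, 2.0)
max(abs(numeric_grad(f, x) - 2 * x))  # tiny: central differences are O(h^2)
```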
- `ag_sequential(...)` — ordered layer container; collects all parameters for the optimizer.
- `ag_dropout(rate)` — inverted dropout; identity in eval mode.
- `ag_batch_norm(num_features)` — batch normalisation with running statistics and learnable γ/β.
- `ag_embedding(vocab_size, dim)` — token lookup with scatter-add backward.
- `ag_train(model)` / `ag_eval(model)` — switch all sub-layers between train and eval mode.
- `ag_dataloader(x, y, batch_size, shuffle, col_major)` — mini-batch iterator with shuffle and `$epoch()` helper.
- `lr_scheduler_step(optimizer, step_size, gamma)` — step-decay learning rate.
- `lr_scheduler_cosine(optimizer, T_max, lr_min, restart)` — cosine annealing (with optional SGDR warm restarts).
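A plain-R sketch of the standard cosine-annealing formula such a scheduler evaluates (parameter names here are assumptions for illustration):

```r
# Cosine annealing: lr decays from lr0 to lr_min over T_max steps.
cosine_lr <- function(t, T_max, lr0, lr_min = 0) {
  lr_min + 0.5 * (lr0 - lr_min) * (1 + cos(pi * t / T_max))
}
cosine_lr(0, 100, 0.1)    # 0.1  (start of schedule)
cosine_lr(50, 100, 0.1)   # 0.05 (halfway)
cosine_lr(100, 100, 0.1)  # ~0   (end of schedule)
```

SGDR warm restarts simply reset `t` to 0 at each restart boundary.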
- `clip_grad_norm(params, grads, max_norm)` — clips all gradients by global L2 norm in-place.
- `ggml_layer_lstm()` — LSTM recurrent layer (unrolled BPTT).
- `ggml_layer_gru()` — GRU recurrent layer (unrolled BPTT).
- `ggml_layer_global_max_pooling_2d()` — reduces `[H,W,C]` to `[C]` via max pooling.
- `ggml_layer_global_average_pooling_2d()` — reduces `[H,W,C]` to `[C]` via average pooling.
- `ggml_save_model()` — saves a full model (architecture + weights) to an RDS file.
- `ggml_load_model()` — restores a model saved with `ggml_save_model()`.
- `ggml_dense()`, `ggml_conv_2d()`, `ggml_conv_1d()`, `ggml_batch_norm()`, `ggml_embedding()`, `ggml_lstm()`, `ggml_gru()` — layer object constructors returning a reusable `ggml_layer` object.
- `ggml_apply(tensor, layer)` — applies a `ggml_layer` object to a tensor node; weights are shared by object identity.
- `ggml_layer_dropout()` — dropout with deterministic or stochastic (per-epoch Bernoulli mask) mode.
- `ggml_layer_embedding()` — token embedding lookup for integer inputs.
- `ggml_input()` gains a `dtype` argument (`"float32"` or `"int32"`).
- `ggml_model()` and `ggml_predict()`.
- `ggml_input()` — declare a symbolic input tensor node (Functional API).
- `ggml_model()` — assemble a `ggml_functional_model` from input/output nodes.
- `ggml_layer_add()` — element-wise addition of tensor nodes (residual connections).
- `ggml_layer_concatenate()` — concatenate tensor nodes along an axis.
- `ggml_layer_*()` functions now accept a `ggml_tensor_node` as first argument (Functional API mode).
- `ggml_compile()`, `ggml_fit()`, `ggml_evaluate()`, `ggml_predict()` are now S3 generics with methods for `ggml_functional_model`.
- `ggml_fit_opt()` — low-level optimizer loop with callbacks and learning-rate control.
- `ggml_callback_early_stopping()` — stops training when a metric stagnates.
- `ggml_schedule_step_decay()` — step learning-rate decay.
- `ggml_schedule_cosine_decay()` — cosine learning-rate annealing.
- `ggml_schedule_reduce_on_plateau()` — reduces the LR when a metric stops improving.
- `ggml_opt_init_for_fit()`, `ggml_opt_set_lr()`, `ggml_opt_get_lr()` — learning-rate control without recreating the optimizer context.
- `configure.win`.
- `ggml_layer_conv_1d()` — 1D convolution layer.
- `ggml_layer_batch_norm()` — batch normalization layer.
- `ggml_predict_classes()` — argmax wrapper returning 1-based class indices.
- `summary.ggml_sequential_model()` — detailed model summary with parameter counts.
- `ggml_fit()` now returns `model$history` (class `ggml_history`) with `print` and `plot` methods.
- `ggml_model_sequential()`, `ggml_layer_dense()`, `ggml_layer_conv_2d()`, `ggml_layer_max_pooling_2d()`, `ggml_layer_flatten()`, `ggml_compile()`, `ggml_fit()`, `ggml_evaluate()`, `ggml_predict()`, `ggml_save_weights()`, `ggml_load_weights()`.
- `ggml_timestep_embedding()` — sinusoidal timestep embeddings.
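Sinusoidal timestep embeddings follow a standard construction; a plain-R sketch (the half-cos/half-sin layout here is an assumption for illustration, not necessarily the exact layout `ggml_timestep_embedding()` uses):

```r
# Geometric frequency ladder from 1 down to 1/max_period, then cos/sin of t.
timestep_embedding <- function(t, dim, max_period = 10000) {
  half <- dim %/% 2
  freqs <- exp(-log(max_period) * (0:(half - 1)) / half)
  args <- t * freqs
  c(cos(args), sin(args))
}
emb <- timestep_embedding(0, 8)
# at t = 0 all cos terms are 1 and all sin terms are 0
```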
- `ggml_set_f32_nd()`, `ggml_get_f32_nd()`, `ggml_set_i32_nd()`, `ggml_get_i32_nd()`.
- `ggml_tensor_nb()`, `ggml_tensor_num()`, `ggml_tensor_copy()`, `ggml_tensor_set_f32_scalar()`, `ggml_get_first_tensor()`, `ggml_get_next_tensor()`.
- `libggml.a` exported for linking by dependent packages.
- `gguf.cpp` added for GGUF file format support.
- `inst/include/` for `LinkingTo`.
- `ggml_opt_init()`, `ggml_opt_free()`, `ggml_opt_fit()`, `ggml_opt_epoch()`, `ggml_opt_eval()`.
- `ggml_opt_dataset_init()`, `ggml_opt_dataset_data()`, `ggml_opt_dataset_labels()`, `ggml_opt_dataset_shuffle()`.
- `ggml_opt_result_init()`, `ggml_opt_result_loss()`, `ggml_opt_result_accuracy()`, `ggml_opt_result_pred()`.