Flash attention support: Enabled by default in
edge_load_model() via flash_attn = TRUE.
Reduces memory usage and improves attention computation speed on
CPU.
Full hardware thread utilization: Removed the
4-thread cap for small contexts. edge_load_model() now uses
all available CPU threads by default, and n_threads_batch
is set to the same maximum for prompt processing.
User-configurable threading: New
n_threads parameter in edge_load_model()
allows explicit control over CPU thread count. Pass NULL
(default) for auto-detect or an integer to limit cores.
Apple Accelerate framework (macOS):
Automatically links the Accelerate framework on macOS builds, enabling
hardware-accelerated vDSP vector operations for faster matrix
math.
Compiler auto-vectorization: Added
-ftree-vectorize to GGML compilation flags on all
platforms, allowing GCC/Clang to generate SIMD instructions for eligible
loops beyond the hand-tuned GGML kernels.
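As a rough sketch of how the loading options above might be passed: the parameter names n_threads and flash_attn come from the entries above, while the model path is a placeholder and passing it as the first positional argument is an assumption. The call itself only runs if the package is installed and the file exists.

```r
# Hypothetical model path; replace with a real GGUF file.
model_path <- file.path(tempdir(), "example-model.gguf")

# n_threads = NULL asks for auto-detection of the thread count.
auto_args <- list(model_path, n_threads = NULL, flash_attn = TRUE)

# An integer limits the number of CPU threads instead (4 is illustrative).
limited_args <- list(model_path, n_threads = 4L, flash_attn = TRUE)

# Only attempt the load when the package and the model file are available.
if (requireNamespace("edgemodelr", quietly = TRUE) && file.exists(model_path)) {
  ctx <- do.call(edgemodelr::edge_load_model, limited_args)
  edgemodelr::edge_free_model(ctx)
}
```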
Existing Features
SIMD-optimized build system: Replaced generic
scalar fallback with architecture-aware SIMD detection in both
Makevars (Unix) and Makevars.win (Windows):
x86_64: Enables an SSE4.2 baseline by default (supported by Intel
CPUs since Nehalem, 2008)
aarch64/arm64: NEON support built into the ABI (no extra flags
needed)
Other architectures: Automatic generic fallback
User-configurable SIMD levels: Set
EDGEMODELR_SIMD environment variable before install to
select optimization level:
NATIVE: Uses -march=native for maximum
performance on the build machine
edge_simd_info(): New function to
query compile-time SIMD status including architecture, compiler
features, and GGML optimization flags
x86 architecture-specific quantization: Enabled
optimized x86 quantization kernels (arch/x86/quants.c,
arch/x86/repack.cpp) with SIMD-accelerated dot products and
matrix operations
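A minimal sketch of selecting a SIMD level and checking the result, assuming a source install from a configured repository. EDGEMODELR_SIMD, the NATIVE value, and edge_simd_info() come from the entries above; the reinstall_from_source() helper is hypothetical and is defined here but not called.

```r
# Select the SIMD level before installing; NATIVE is the value listed above.
Sys.setenv(EDGEMODELR_SIMD = "NATIVE")

# Hypothetical helper: rebuild from source so the variable is read at compile time.
reinstall_from_source <- function() {
  install.packages("edgemodelr", type = "source")
}

# Query the compile-time SIMD status if this version of the package provides it.
if (requireNamespace("edgemodelr", quietly = TRUE) &&
    "edge_simd_info" %in% getNamespaceExports("edgemodelr")) {
  print(edgemodelr::edge_simd_info())
}
```

A build made with -march=native targets the build machine's CPU, so the resulting binary may not run on machines with older processors.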
Performance
15-40% faster inference on x86_64 with SSE4.2 baseline vs generic
scalar
Up to 2-3x faster with AVX2 for quantized model operations
SSSE3-accelerated integer multiply-accumulate for quantized dot
products
edgemodelr 0.1.5
CRAN Policy Fixes
Bug Fixes
Fixed donttest examples: Changed
resource-intensive examples from \donttest{} to
\dontrun{} to prevent downloading multi-GB models during
CRAN checks
Fixed M1 Mac compiler warnings: Added explicit
static_cast<> for:
double to float conversions for
temperature/top_p parameters
size_type to int32_t conversions for
buffer size parameters
Fixed connection handling: Replaced
on.exit() with tryCatch/finally for proper
connection cleanup in loops (thanks @eddelbuettel)
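An on.exit() handler fires only when the enclosing function returns, so connections opened inside a loop stay open until the whole loop is done. With tryCatch(..., finally = ) each connection is closed at the end of its own iteration. A minimal sketch of the pattern with temporary files (the file names and contents are illustrative, not the package's code):

```r
# Create three small temporary files to read back.
paths <- tempfile(pattern = paste0("edge_example_", 1:3, "_"), fileext = ".txt")
for (i in seq_along(paths)) writeLines(paste("file", i), paths[i])

first_lines <- character(length(paths))
for (i in seq_along(paths)) {
  con <- file(paths[i], open = "r")
  first_lines[i] <- tryCatch(
    readLines(con, n = 1L),
    finally = close(con)  # runs on success and on error, once per iteration
  )
}

# Remove the temporary files.
unlink(paths)
```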
edgemodelr 0.1.4
Performance Optimizations for Small Language Models
New Features
Small Model Configuration Helper: New
edge_small_model_config() function provides optimized
settings for small models (1B-3B parameters)
Device-specific presets: mobile, laptop, desktop, and server
Adaptive configuration based on model size and available RAM
Built-in performance tips and recommendations
Automatic parameter tuning for optimal inference speed
Adaptive Batch Processing: Intelligent batch size
optimization based on context length
Small contexts (≤512): Uses up to full context for batching
Medium contexts (512-2048): Uses 1/2 context for optimal
throughput
Large contexts (2048-4096): Uses 1/4 context to balance speed and
memory
Very large contexts (>4096): Caps at 2048 tokens for
stability
Smart Thread Allocation: Context-aware CPU thread
management
Small models automatically limit threads to avoid overhead
Reduces CPU contention on resource-constrained devices
Improves inference speed for models with contexts ≤2048 tokens
Automatic Context Optimization: Model size-based
context tuning
Small models (<1GB): Optimized to 1024 tokens for faster
inference
Medium models (1-2GB): Set to 1536 tokens for balanced
performance
Large models (>2GB): Maintains 2048+ tokens for quality
User override available via n_ctx parameter
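The batch-size and context tiers above can be sketched as two small lookup functions. Both helper names are hypothetical and this is not the package's internal code. Which tier the exact boundary values (512, 2048, 4096 tokens; 1 GB and 2 GB) fall into is an assumption, since the ranges listed above overlap at their edges.

```r
# Hypothetical helper mirroring the batch-size tiers listed above.
# Assumption: each boundary value belongs to the lower tier.
adaptive_batch_size <- function(n_ctx) {
  if (n_ctx <= 512) {
    n_ctx            # small: up to the full context
  } else if (n_ctx <= 2048) {
    n_ctx %/% 2      # medium: half the context
  } else if (n_ctx <= 4096) {
    n_ctx %/% 4      # large: a quarter of the context
  } else {
    2048             # very large: capped at 2048 tokens
  }
}

# Hypothetical helper mirroring the model-size tiers.
# An explicit n_ctx from the user always takes precedence.
default_context <- function(model_size_gb, n_ctx = NULL) {
  if (!is.null(n_ctx)) {
    return(n_ctx)
  }
  if (model_size_gb < 1) {
    1024
  } else if (model_size_gb <= 2) {
    1536
  } else {
    2048
  }
}
```

For example, adaptive_batch_size(1024) falls in the medium tier and returns half the context, while default_context(0.6, n_ctx = 4096) returns the user's value rather than the small-model default.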
Performance Improvements
Faster Small Model Inference: 15-30% speed
improvement for small models through optimized batch and thread
settings
Reduced Memory Footprint: Better memory efficiency
for resource-constrained environments
01_basic_usage.R: Fundamental operations including
model loading, text generation, parameter tuning, and error
handling
02_ollama_integration.R: Complete Ollama workflow
with model discovery, hash-based loading, and compatibility testing
03_streaming_generation.R: Real-time streaming text
generation with interactive chat interfaces and callback processing
04_performance_optimization.R: Advanced performance
tuning including GPU acceleration, benchmarking, memory management, and
batch processing
examples/README.md: Comprehensive documentation
with learning paths, troubleshooting guide, and customization
instructions
Package Structure Improvements
Organized File Structure: Consolidated all examples
into structured examples/ directory with consistent formatting
Enhanced Documentation: Improved inline
documentation and example comments throughout
edgemodelr 0.1.1
Bug Fixes and Improvements
Compilation Fixes
macOS Boolean Conflicts: Completely resolved
Boolean enum conflicts by avoiding problematic system headers and using
direct function declarations
Filesystem Compatibility: Added comprehensive
fallback implementation for disabled std::filesystem on
macOS builds
Header Protection: Implemented robust
cross-platform header inclusion strategy that works with R, Rcpp, and
system headers
System Header Workarounds: Replaced
<mach-o/dyld.h> inclusion with direct function
declarations to avoid enum conflicts
Format Attribute Warnings: Suppressed unsupported
printf format attribute warnings on macOS Apple Clang compiler
CRAN Compliance: Removed non-portable optimization
flags (-march=native, -mtune=native, etc.)
from Makevars for CRAN compatibility
Cross-platform Build: Enhanced Makevars
configuration for better macOS compatibility with R package
requirements
Demo and Documentation Updates
Modern UI: Updated streaming chat demo with modern
bslib interface for enhanced user experience
Documentation: Improved documentation for
edge_clean_cache() function
Examples: Enhanced streaming chat example with
better UI components
Technical Improvements
Build System: Updated Makevars files for improved
compilation on Windows and Unix systems
Core Bindings: Enhanced C++ bindings for better
performance and stability
edgemodelr 0.1.0
Initial CRAN Release
New Features
Local LLM Inference: Complete R interface for
running large language models locally using llama.cpp and GGUF model
files
Model Management: Built-in functions for
downloading and managing popular models from Hugging Face
Text Generation: Support for both blocking and
streaming text completion
Interactive Chat: Real-time streaming chat
interface with conversation history
Privacy-First: All processing happens locally
without external API calls
Core Functions
edge_load_model() - Load GGUF model files for
inference
edge_completion() - Generate text completions
edge_stream_completion() - Stream text generation with
real-time callbacks
edge_chat_stream() - Interactive chat session with
streaming responses
edge_free_model() - Memory management and cleanup
is_valid_model() - Model context validation
Model Management
edge_list_models() - List pre-configured popular
models
edge_download_model() - Download models from Hugging
Face Hub
edge_quick_setup() - One-line model download and
setup
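A hedged sketch of how the core functions listed above might fit together. The function names come from this list, but the complete_once() wrapper is hypothetical, and the argument order (model path for edge_load_model(), then context followed by prompt for edge_completion()) is an assumption to check against the package documentation.

```r
# Hypothetical wrapper: load a GGUF file, run one completion, always free the model.
# Assumption: edge_completion() takes the model context first, then the prompt.
complete_once <- function(model_path, prompt) {
  ctx <- edgemodelr::edge_load_model(model_path)
  # Release the model even if validation or generation fails.
  on.exit(edgemodelr::edge_free_model(ctx), add = TRUE)
  stopifnot(edgemodelr::is_valid_model(ctx))
  edgemodelr::edge_completion(ctx, prompt)
}
```

The wrapper is only defined here. Calling it requires the package to be installed and a GGUF model file on disk.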
System Support
Self-contained: Includes complete llama.cpp
implementation
Cross-platform: Works on Windows, macOS, and
Linux
CPU optimized: Runs efficiently on standard
hardware
Memory efficient: Support for quantized models
Documentation
Comprehensive getting started vignette
Complete API documentation with examples
README with extensive usage examples
Test coverage for all major functionality
Technical Implementation
C++17 integration via Rcpp
Real-time token streaming with callback support
Automatic memory management with RAII
Robust error handling and validation
Thread-safe model operations
This release provides a complete, production-ready engine for local
large language model inference in R, enabling private, offline
text generation workflows.