BenchmarkPerformance

The purpose of this vignette is to demonstrate the performance of onnxruntime inference with nativeORT, including its CoreML capabilities. It shows that nativeORT can run inference in real time, that is, at 29.97 fps or faster (under roughly 33.4 ms per frame).

The benchmark runs on an Apple M1 machine and times 100 inferences per provider on a 256x256 RGB array, simulating an incoming video stream.
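A 29.97 fps stream leaves a fixed time budget per frame, namely 1000 / 29.97 ms. A minimal sketch of that arithmetic follows; `frame_budget_ms` and `meets_realtime` are illustrative names and are not part of nativeORT.

```r
# Per-frame time budget for a 29.97 fps video stream
target_fps <- 29.97
frame_budget_ms <- 1000 / target_fps  # milliseconds available per frame

# Illustrative helper (not a nativeORT function):
# TRUE where a latency fits inside the frame budget
meets_realtime <- function(latency_ms, budget_ms = frame_budget_ms) {
  latency_ms <= budget_ms
}

round(frame_budget_ms, 2)   # 33.37
meets_realtime(c(8, 40))    # TRUE FALSE
```

Any inference plus post-processing that finishes inside this budget keeps up with the stream.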

nativeORT CPU & CoreML

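library(ggplot2)  # used for the latency plot at the end

# model_path is assumed to hold the path to the ONNX model file (YOLOv11n here)
# typical RGB 256x256 image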
input <- array(
  runif(1 * 3 * 256 * 256),
  dim=c(1L, 3L, 256L, 256L)
)

session <- nativeORT::ort_session(model_path,
                                  threads=0L,
                                  opt_level=99L)

times_cpu <- numeric(100)
for (i in 1:100){
  times_cpu[i] <- system.time(
    nativeORT::ort_infer_raw(session, input)
  )["elapsed"] * 1000
}

# CoreML
dir.create(path.expand("~/.nativeORT/cache"),
           recursive = TRUE, showWarnings = FALSE
           )
session <- nativeORT::ort_session(model_path,
                                  provider='coreml',
                                  cache_dir=path.expand("~/.nativeORT/cache"),
                                  threads=0L,
                                  opt_level=99L
           )

times_coreml <- numeric(100)
for (i in 1:100){
  times_coreml[i] <- system.time(
    nativeORT::ort_infer_raw(session, input)
  )["elapsed"] * 1000
}

# Combine both runs into one long-format data frame for plotting
results <- data.frame(
  run=c(seq_along(times_cpu), seq_along(times_coreml)),
  provider=c(
    rep("CPU (nativeORT)", length(times_cpu)),
    rep("CoreML (nativeORT)", length(times_coreml))
  ),
  latency_ms=c(times_cpu, times_coreml)
)

ggplot(results, aes(x=run, y=latency_ms, color=provider)) +
  geom_line() + 
  geom_hline(yintercept=1000 / 29.97, linetype="dashed", color="red") +
  annotate("text", x=85, y=40, label="29.97 fps threshold") +
  labs(
    title="Inference Latency Across Inference Engines",
    subtitle="YOLOv11n, 256x256 Images, Apple M1",
    x="Run",
    y="Latency (ms)"
  ) +
  theme_minimal()

Results

Notably, nativeORT runs well within the real-time requirement. Thanks to optimization in the C++ bindings, CPU and CoreML latencies are near parity. The CoreML runs are more stable, however, because they execute on dedicated hardware, whereas the CPU is subject to slowdowns when other processes compete for it.
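One way to put numbers on the stability claim is to compare spread and tail statistics per provider. The sketch below assumes the latency vectors from the benchmark; `latency_spread` is a hypothetical helper, and the toy values are placeholders for illustration only, not measured results.

```r
# Hypothetical helper: spread statistics for a vector of latencies (ms)
latency_spread <- function(times_ms) {
  c(
    median = median(times_ms),
    iqr    = IQR(times_ms),                     # spread of the middle 50%
    p99    = unname(quantile(times_ms, 0.99)),  # tail latency
    max    = max(times_ms)
  )
}

# Placeholder values for illustration only, not measured results
toy_times <- c(8, 7, 9, 8, 30, 8, 7, 8)
latency_spread(toy_times)

# Applied to the benchmark vectors from above:
# rbind(cpu = latency_spread(times_cpu), coreml = latency_spread(times_coreml))
```

A provider with a similar median but a smaller IQR and p99 is the more predictable choice for a live stream.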

CoreML does require a warmup (visible as the initial spike in the plot), but after one or two inferences it performs in real time. At a median latency of 7-8 milliseconds on Apple M1 silicon, there is still time to run post-processing and remain under the target latency.
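Because the first one or two CoreML runs include warmup, a steady-state summary can drop them before computing the median and the time left for post-processing. This is a sketch under stated assumptions: `steady_state_summary` and its `warmup` argument are hypothetical, and the toy vector is a placeholder, not benchmark data.

```r
# Hypothetical helper: summarise latency after discarding warmup runs
steady_state_summary <- function(times_ms, warmup = 2L,
                                 budget_ms = 1000 / 29.97) {
  stopifnot(length(times_ms) > warmup)
  # Keep only runs after the first `warmup` iterations
  steady <- times_ms[seq_along(times_ms) > warmup]
  c(
    median_ms       = median(steady),
    headroom_ms     = budget_ms - median(steady),  # left for post-processing
    share_in_budget = mean(steady <= budget_ms)    # fraction of runs in budget
  )
}

# Placeholder values for illustration only, not measured results
toy_times <- c(120, 40, 8, 7, 8, 9, 8)
steady_state_summary(toy_times)

# Applied to the benchmark vector from above:
# steady_state_summary(times_coreml)
```

Selecting by index rather than with a negative subscript keeps the helper correct when `warmup = 0`, where `times_ms[-seq_len(0)]` would return an empty vector.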
