Benchmarks
The CI-generated table below compares common small-tensor operations for Tensor, Array, and SArray (from StaticArrays.jl). It is meant to show that Tensorial performs on par with SArray for small tensors while operating directly on tensor types.
For fourth-order tensor baselines, the Array and SArray implementations use the classical Voigt form so that symmetries are handled correctly. Tensorial performs the same operations on tensor types directly, without manually converting formulas to Voigt form.
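To make the Voigt form concrete, here is a minimal sketch in plain Julia arrays, not the code used by the benchmark baselines. The helper names `voigt_vector` and `voigt_matrix`, the index ordering in `VOIGT`, and the `doubleshear` keyword are all assumptions chosen for illustration. It checks that a double contraction of a fourth-order tensor with minor symmetries and a symmetric second-order tensor agrees with a 6×6 matrix times a 6-vector, provided the shear components of the second-order tensor are doubled.

```julia
# One common Voigt ordering of index pairs in 3D (an assumption; other orderings exist)
const VOIGT = [(1, 1), (2, 2), (3, 3), (2, 3), (1, 3), (1, 2)]

# Symmetric second-order tensor -> 6-vector.
# With `doubleshear = true` the off-diagonal entries are doubled (strain-like convention).
function voigt_vector(S; doubleshear = false)
    return [(doubleshear && i != j ? 2 : 1) * S[i, j] for (i, j) in VOIGT]
end

# Fourth-order tensor with minor symmetries -> 6x6 matrix
function voigt_matrix(C)
    return [C[i, j, k, l] for (i, j) in VOIGT, (k, l) in VOIGT]
end

# Random fourth-order tensor with C[i,j,k,l] == C[j,i,k,l] == C[i,j,l,k]
R = rand(3, 3, 3, 3)
C = [(R[i, j, k, l] + R[j, i, k, l] + R[i, j, l, k] + R[j, i, l, k]) / 4
     for i in 1:3, j in 1:3, k in 1:3, l in 1:3]

# Random symmetric second-order tensor
M = rand(3, 3)
E = (M + M') / 2

# Reference: full index contraction full[i,j] = sum over k,l of C[i,j,k,l] * E[k,l]
full = [sum(C[i, j, k, l] * E[k, l] for k in 1:3, l in 1:3) for i in 1:3, j in 1:3]

# Voigt form: one 6x6 matrix-vector product
voigt = voigt_matrix(C) * voigt_vector(E; doubleshear = true)

println(voigt ≈ voigt_vector(full))
```

The factor of 2 on the shear entries is the kind of bookkeeping that has to be carried through every formula by hand in the Voigt-based baselines.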
```julia
using Tensorial

a = rand(Vec{3})
A = rand(SecondOrderTensor{3})
S = rand(SymmetricSecondOrderTensor{3})
AA = rand(FourthOrderTensor{3})
SS = rand(SymmetricFourthOrderTensor{3})
```

| Operation | Tensor | Array | Speedup vs. Array | SArray | Speedup vs. SArray |
|---|---|---|---|---|---|
| **Single contraction** | | | | | |
| `a ⊡ a` | 3.134 ns | 8.071 ns | ×2.6 | 3.475 ns | ×1.1 |
| `A ⊡ a` | 3.485 ns | 48.383 ns | ×14.0 | 3.485 ns | ×1.0 |
| `S ⊡ a` | 3.485 ns | 48.383 ns | ×14.0 | 3.485 ns | ×1.0 |
| **Double contraction** | | | | | |
| `A ⊡₂ A` | 3.835 ns | 10.837 ns | ×2.8 | 3.485 ns | ×0.91 |
| `S ⊡₂ S` | 3.485 ns | 9.183 ns | ×2.6 | 3.485 ns | ×1.0 |
| `AA ⊡₂ A` | 8.562 ns | 68.279 ns | ×8.0 | 8.572 ns | ×1.0 |
| `SS ⊡₂ S` | 4.486 ns | 61.569 ns | ×14.0 | 4.496 ns | ×1.0 |
| **Tensor product** | | | | | |
| `a ⊗ a` | 3.835 ns | 30.146 ns | ×7.9 | 3.835 ns | ×1.0 |
| **Cross product** | | | | | |
| `a × a` | 3.485 ns | 18.574 ns | ×5.3 | 3.825 ns | ×1.1 |
| **Determinant** | | | | | |
| `det(A)` | 3.485 ns | 179.357 ns | ×51.0 | 3.825 ns | ×1.1 |
| `det(S)` | 3.485 ns | 172.183 ns | ×49.0 | 3.825 ns | ×1.1 |
| **Inverse** | | | | | |
| `inv(A)` | 6.479 ns | 426.422 ns | ×66.0 | 8.622 ns | ×1.3 |
| `inv(S)` | 4.556 ns | 409.147 ns | ×90.0 | 8.622 ns | ×1.9 |
| `inv(AA)` | 1.080 μs | 1.541 μs | ×1.4 | 1.085 μs | ×1.0 |
| `inv(SS)` | 422.749 ns | 903.222 ns | ×2.1 | 414.825 ns | ×0.98 |
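The speedup entries in the table appear consistent with dividing the baseline time (Array or SArray) by the Tensor time and rounding to two significant digits, so values above 1 mean Tensor is faster. The sketch below checks that reading against three rows. The `speedup` helper is hypothetical and the formula is inferred from the numbers, not taken from the benchmark script.

```julia
# Assumed definition: baseline time divided by Tensor time, two significant digits
speedup(t_baseline, t_tensor) = round(t_baseline / t_tensor; sigdigits = 2)

# Times in ns, copied from the table
println(speedup(8.071, 3.134))    # a ⊡ a, Array vs. Tensor    → 2.6
println(speedup(3.485, 3.835))    # A ⊡₂ A, SArray vs. Tensor  → 0.91
println(speedup(426.422, 6.479))  # inv(A), Array vs. Tensor   → 66.0
```

Because most timings are only a few nanoseconds, differences of a few percent between Tensor and SArray (for example ×0.91 or ×1.1) are probably within measurement noise.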
The benchmarks are generated by runbenchmarks.jl on the following system:
```julia-repl
julia> versioninfo()
Julia Version 1.12.6
Commit 15346901f00 (2026-04-09 19:20 UTC)
Build Info:
  Official https://julialang.org release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × AMD EPYC 9V74 80-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, znver4)
  GC: Built with stock GC
Threads: 1 default, 1 interactive, 1 GC (on 4 virtual cores)
```