Automatic differentiation

Automatic differentiation is provided by the callable operator ∂.

For a function f and arguments args..., ∂{N}(f, args...) computes the Nth-order partial derivative of f with respect to args.... ∂(f, args...) is equivalent to ∂{1}(f, args...).

The basic usage is:

  • ∂{N}(f, args...) returns only the highest-order derivative.
  • ∂{N}(f, args..., :all) returns all derivatives up to order N, together with the function value.

When the pseudo-keyword :all is given, the returned tuple is ordered from highest to lowest order, ending with the function value:

(∂{N}(f, args...), ..., ∂{2}(f, args...), ∂(f, args...), f(args...))

For multiple inputs, mixed partial derivatives are grouped by input blocks. If f returns a tuple, each component is differentiated separately.

Warning

The user must provide the appropriate tensor symmetry information; otherwise, automatic differentiation may not preserve the expected tensor symmetry. In the following example, even with identical tensor values, the results differ depending on the Tensor type.

julia> A = rand(Mat{3,3})
3×3 Tensor{Tuple{3, 3}, Float64, 2, 9}:
 0.325977  0.894245  0.953125
 0.549051  0.353112  0.795547
 0.218587  0.394255  0.49425

julia> S = A * A' # symmetric in value, but not typed as `SymmetricSecondOrderTensor`
3×3 Tensor{Tuple{3, 3}, Float64, 2, 9}:
 1.81438   1.253     0.894897
 1.253     1.05904   0.65243
 0.894897  0.65243   0.4475

julia> ∂(identity, S) ≈ one(FourthOrderTensor{3})
true

julia> ∂(identity, symmetric(S)) ≈ one(SymmetricFourthOrderTensor{3})
true
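The difference comes from the identity tensors themselves: for symmetric types the "identity" is the symmetrizing fourth-order tensor, whose off-diagonal pair entries are 1/2 rather than 1. A minimal sketch of this distinction (assuming only the `one` constructors and indexing shown above):

```julia
using Tensorial

# Full fourth-order identity: I[i,j,k,l] = δᵢₖδⱼₗ
I4 = one(FourthOrderTensor{3})

# Symmetric fourth-order identity: Iˢ[i,j,k,l] = (δᵢₖδⱼₗ + δᵢₗδⱼₖ)/2
I4s = one(SymmetricFourthOrderTensor{3})

# They agree on diagonal components but differ on off-diagonal pairs:
I4[1,2,1,2], I4s[1,2,1,2]  # the symmetric identity carries 1/2 here
```

This is why ∂(identity, S) and ∂(identity, symmetric(S)) give different results even though S is symmetric in value.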
Tensorial.∂ — Type

∂{N}

An operator representing the Nth-order partial derivative.

∂{N} is callable. Applying it to a function f and its arguments computes Nth-order partial derivatives of f by automatic differentiation. ∂(f, args...) is equivalent to ∂{1}(f, args...).

If the pseudo-keyword :all is given as the last argument, derivatives of all orders up to N are returned together with the function value. For example, ∂{N}(f, x, :all) returns (∂{N}(f,x), ..., ∂{2}(f,x), ∂{1}(f,x), f(x)).


Basic examples

We begin with the simplest case: a scalar-valued function of a single scalar variable.

julia> x = 2.0
2.0

julia> ∂(x -> x^3, x)
12.0

julia> ∂(x -> x^3, x, :all)
(12.0, 8.0)

Here, ∂(x -> x^3, x) returns only the first derivative, while ∂(x -> x^3, x, :all) returns

(∂f/∂x, f(x))

Higher-order derivatives are obtained by specifying the order in braces.

julia> ∂{2}(x -> x^3, x)
12.0

julia> ∂{2}(x -> x^3, x, :all)
(12.0, 12.0, 8.0)

In this case,

∂{2}(x -> x^3, x, :all)

returns

(∂²f/∂x², ∂f/∂x, f(x))
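As a sanity check, these values can be reproduced with plain central differences, without any automatic differentiation; the step size h below is an arbitrary choice:

```julia
f(x) = x^3
x = 2.0
h = 1e-5

# Central-difference approximations of the first and second derivatives.
d1 = (f(x + h) - f(x - h)) / 2h            # ≈ ∂f/∂x  = 3x² = 12
d2 = (f(x + h) - 2f(x) + f(x - h)) / h^2   # ≈ ∂²f/∂x² = 6x  = 12

(d2, d1, f(x))  # ≈ (12.0, 12.0, 8.0), matching ∂{2}(x -> x^3, x, :all)
```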

The same interface also works for tensor inputs.

julia> a = rand(Vec{2})
2-element Vec{2, Float64}:
 0.7484150218874741
 0.5782319465613976

julia> ∂(norm, a)
2-element Vec{2, Float64}:
 0.7913304024086155
 0.6113887423103394

julia> ∂(norm, a, :all)
([0.7913304024086155, 0.6113887423103394], 0.9457680630107)

julia> ∂{2}(norm, a)
2×2 SymmetricSecondOrderTensor{2, Float64, 3}:
  0.39523   -0.511553
 -0.511553   0.662111

julia> ∂{2}(norm, a, :all)
([0.3952302988894536 -0.5115530100904504; -0.5115530100904504 0.6621113888988412], [0.7913304024086156, 0.6113887423103395], 0.9457680630106999)

Multiple inputs

For multiple inputs, the first-order derivative is returned as a tuple whose entries follow the order of the inputs.

julia> ∂((x, y) -> x^2 + 3x*y + y^2, 2.0, 4.0)
(16.0, 14.0)

julia> ∂((x, y) -> x^2 + 3x*y + y^2, 2.0, 4.0, :all)
((16.0, 14.0), 44.0)

The first result is interpreted as

(∂f/∂x, ∂f/∂y)

and the second as

((∂f/∂x, ∂f/∂y), f(x, y))

Second-order derivatives for multiple inputs are returned as a block Hessian.

julia> ∂{2}((x, y) -> x^2 + x*y + y^3, 2.0, 3.0)
((2.0, 1.0), (1.0, 18.0))

julia> ∂{2}((x, y) -> x^2 + x*y + y^3, 2.0, 3.0, :all)
(((2.0, 1.0), (1.0, 18.0)), (7.0, 29.0), 37.0)

The first result is interpreted as

(
    (∂²f/∂x², ∂²f/∂x∂y),
    (∂²f/∂y∂x, ∂²f/∂y²),
)

and the second as

(
    (
        (∂²f/∂x², ∂²f/∂x∂y),
        (∂²f/∂y∂x, ∂²f/∂y²),
    ),
    (∂f/∂x, ∂f/∂y),
    f(x, y),
)

The same block structure is used even when the input types differ.

julia> x = 2.0
2.0

julia> A = rand(SymmetricSecondOrderTensor{2})
2×2 SymmetricSecondOrderTensor{2, Float64, 3}:
 0.727935    0.00744801
 0.00744801  0.199377

julia> ∂((x, A) -> x * tr(A), x, A)
(0.9273116153257611, [2.0 0.0; 0.0 2.0])

julia> ∂((x, A) -> x * tr(A), x, A, :all)
((0.9273116153257611, [2.0 0.0; 0.0 2.0]), 1.8546232306515222)

julia> ∂{2}((x, A) -> x * tr(A), x, A)
((0.0, [1.0 0.0; 0.0 1.0]), ([1.0 0.0; 0.0 1.0], [0.0 0.0; 0.0 0.0;;; 0.0 0.0; 0.0 0.0;;;; 0.0 0.0; 0.0 0.0;;; 0.0 0.0; 0.0 0.0]))

julia> ∂{2}((x, A) -> x * tr(A), x, A, :all)
(((0.0, [1.0 0.0; 0.0 1.0]), ([1.0 0.0; 0.0 1.0], [0.0 0.0; 0.0 0.0;;; 0.0 0.0; 0.0 0.0;;;; 0.0 0.0; 0.0 0.0;;; 0.0 0.0; 0.0 0.0])), (0.9273116153257611, [2.0 0.0; 0.0 2.0]), 1.8546232306515222)

Multiple outputs

If f returns a tuple, each component is differentiated separately. The outer tuple follows the outputs.

julia> ∂(x -> (x^2, x^3), 2.0)
(4.0, 12.0)

julia> ∂(x -> (x^2, x^3), 2.0, :all)
((4.0, 12.0), (4.0, 8.0))

The first result is interpreted as

(∂f₁/∂x, ∂f₂/∂x)

and the second as

((∂f₁/∂x, ∂f₂/∂x), (f₁(x), f₂(x)))

Second-order derivatives are handled in the same way.

julia> ∂{2}(x -> (x^2, x^3), 2.0)
(2.0, 12.0)

julia> ∂{2}(x -> (x^2, x^3), 2.0, :all)
((2.0, 12.0), (4.0, 12.0), (4.0, 8.0))

The first result is interpreted as

(∂²f₁/∂x², ∂²f₂/∂x²)

and the second as

(
    (∂²f₁/∂x², ∂²f₂/∂x²),
    (∂f₁/∂x, ∂f₂/∂x),
    (f₁(x), f₂(x)),
)

Multiple inputs and multiple outputs

When there are both multiple inputs and multiple outputs, the outer tuple follows the outputs, and the inner tuple follows the inputs.

julia> ∂((x, y) -> (x + y, x * y), 2.0, 3.0)
((1.0, 1.0), (3.0, 2.0))

julia> ∂((x, y) -> (x + y, x * y), 2.0, 3.0, :all)
(((1.0, 1.0), (3.0, 2.0)), (5.0, 6.0))

The first result is interpreted as

(
    (∂f₁/∂x, ∂f₁/∂y),
    (∂f₂/∂x, ∂f₂/∂y),
)

and the second as

(
    (
        (∂f₁/∂x, ∂f₁/∂y),
        (∂f₂/∂x, ∂f₂/∂y),
    ),
    (f₁(x, y), f₂(x, y)),
)

For second-order derivatives, each output carries its own block Hessian.

julia> ∂{2}((x, y) -> (x + y, x * y), 2.0, 3.0)
(((0.0, 0.0), (0.0, 0.0)), ((0.0, 1.0), (1.0, 0.0)))

julia> ∂{2}((x, y) -> (x + y, x * y), 2.0, 3.0, :all)
((((0.0, 0.0), (0.0, 0.0)), ((0.0, 1.0), (1.0, 0.0))), ((1.0, 1.0), (3.0, 2.0)), (5.0, 6.0))

The first result is interpreted as

(
    (
        (∂²f₁/∂x², ∂²f₁/∂x∂y),
        (∂²f₁/∂y∂x, ∂²f₁/∂y²),
    ),
    (
        (∂²f₂/∂x², ∂²f₂/∂x∂y),
        (∂²f₂/∂y∂x, ∂²f₂/∂y²),
    ),
)

and the second as

(
    (
        (
            (∂²f₁/∂x², ∂²f₁/∂x∂y),
            (∂²f₁/∂y∂x, ∂²f₁/∂y²),
        ),
        (
            (∂²f₂/∂x², ∂²f₂/∂x∂y),
            (∂²f₂/∂y∂x, ∂²f₂/∂y²),
        ),
    ),
    (
        (∂f₁/∂x, ∂f₁/∂y),
        (∂f₂/∂x, ∂f₂/∂y),
    ),
    (f₁(x, y), f₂(x, y)),
)

Aliases

gradient and hessian are aliases for first- and second-order partial derivatives:
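A minimal sketch of the equivalence, assuming both aliases accept the same arguments (including the :all pseudo-keyword) as ∂:

```julia
using Tensorial, LinearAlgebra

a = rand(Vec{2})

# `gradient` corresponds to the first-order operator ∂:
gradient(norm, a) == ∂(norm, a)

# `hessian` corresponds to the second-order operator ∂{2}:
hessian(norm, a) == ∂{2}(norm, a)
```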