This repository has been archived by the owner on Nov 4, 2024. It is now read-only.

docs: add docs for loading packages
avik-pal committed Oct 17, 2024
1 parent c75af09 commit c8e5006
Showing 3 changed files with 11 additions and 1 deletion.
2 changes: 1 addition & 1 deletion src/api/activation.jl
@@ -10,7 +10,7 @@ generic implementation.
This function doesn't replace `σ` with `NNlib.fast_act(σ, ...)`, that needs to be
done by the user if needed.
-!!! tip
+!!! tip "Load `SLEEFPirates.jl` to get faster activations"
Certain activation functions are replaced with specialized implementations from
[SLEEFPirates.jl](https://github.com/JuliaSIMD/SLEEFPirates.jl) for FP32. This might
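The tip above can be sketched as follows. This is a hedged usage example, not part of the commit: it assumes `LuxLib.jl` and `SLEEFPirates.jl` are installed, and that `fast_activation` is the function this docstring documents.

```julia
using LuxLib
using SLEEFPirates  # loading this swaps in specialized FP32 kernels for some activations

x = rand(Float32, 128, 32)

# For Float32 inputs, certain activations (e.g. tanh) now dispatch to the
# SLEEFPirates-backed implementation; the call site is unchanged.
y = fast_activation(tanh, x)
```

Note that, per the docstring, `σ` is not replaced with `NNlib.fast_act(σ, ...)` automatically; do that yourself if you want it.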
5 changes: 5 additions & 0 deletions src/api/batched_mul.jl
@@ -4,6 +4,11 @@
Computes the batched matrix multiplication of `x` and `y`. For more details see the NNlib
documentation on `NNlib.batched_mul`. This function is mostly a wrapper around `batched_mul`
but attempts to be faster on CPUs.
!!! tip "Load `LoopVectorization.jl` to get faster batched matrix multiplication"
On CPUs, loading `LoopVectorization.jl` provides faster implementations of batched
matrix multiplication.
"""
function batched_matmul(x::AbstractMatrix, y::AbstractArray{yT, 3}) where {yT}
return batched_matmul(expand_batchdim(x), y)
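A short usage sketch for the docstring above (an illustration, not part of the commit; it assumes `LuxLib.jl` and `LoopVectorization.jl` are installed and that `batched_matmul` is exported as documented):

```julia
using LuxLib
using LoopVectorization  # enables the faster CPU batched-matmul path

x = rand(Float32, 4, 5, 3)  # 3 batches of 4×5 matrices
y = rand(Float32, 5, 6, 3)  # 3 batches of 5×6 matrices

# Multiplies matching batch slices, like NNlib.batched_mul:
# z[:, :, k] == x[:, :, k] * y[:, :, k] for each k.
z = batched_matmul(x, y)    # size (4, 6, 3)
```

The matrix-input method shown in the diff (`batched_matmul(x::AbstractMatrix, y::AbstractArray{yT, 3})`) broadcasts the single matrix across all batches by expanding its batch dimension first.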
5 changes: 5 additions & 0 deletions src/api/dense.jl
@@ -24,6 +24,11 @@ multiple operations.
- For small CPU Arrays, we use LoopVectorization.jl. On `x86_64` we use Octavian for
medium sized matrices. This is overridden if special BLAS implementations are loaded
(currently `MKL`, `AppleAccelerate`, and `BLISBLAS`).
!!! tip "Load `Octavian.jl`"
Loading `Octavian.jl` enables a polyalgorithm that uses different backends based on the
input sizes.
"""
function fused_dense_bias_activation(σ::F, weight::AbstractMatrix, x::AbstractMatrix,
b::Optional{<:AbstractVector}) where {F}
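A hedged sketch of the call documented above (not part of the commit; assumes `LuxLib.jl` and `Octavian.jl` are installed, and uses Base `tanh` to avoid assuming which activations are re-exported):

```julia
using LuxLib
using Octavian  # on x86_64, enables the size-based polyalgorithm for medium matrices

weight = rand(Float32, 32, 64)
x      = rand(Float32, 64, 16)
b      = rand(Float32, 32)

# Fuses the matmul, bias add, and activation into fewer passes;
# semantically equivalent to tanh.(weight * x .+ b).
y = fused_dense_bias_activation(tanh, weight, x, b)
```

Pass `nothing` for `b` (the `Optional{<:AbstractVector}` slot) when there is no bias.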
