Apache TVM v0.18.0

@ysh329 released this 17 Oct 15:36

Introduction

Since the last release, the TVM community has worked to deliver the following exciting new improvements!

The main tags are listed below (bold text marks areas with significant progress):

  • Frontend: PyTorch's ExportedProgram is supported in the Relax frontend (#17346)
  • Community, RFCs
  • AOT, Hexagon, OpenCL & CLML, Web, Metal
  • Relax, Dlight, Disco
  • TIR, TVMScript
  • Docs, Docker, CI, Misc, BugFix

Please visit the full listing of commits for a complete view: v0.18.dev0...v0.18.0.rc0.

Community

  • #17450 - update contributors

RFCs

The new RFC proposes the Android Neural Networks API (NNAPI) as a new backend for BYOC. NNAPI is a graph-level neural network inference API provided by the Android runtime. Until now, TVM on Android mobile devices has relied mainly on OpenCL for GPU acceleration. The RFC adds a new codegen and a runtime through the BYOC framework, enabling execution on the custom accelerators that SoC vendors ship in mobile devices.

  • #109 - [RFC] NNAPI Integration via BYOC

BYOC

  • #17385 - [NNAPI] Add NNAPI backend for BYOC

BugFix

  • #17440 - [TIR][Schedule] TileWithTensorIntrin skip ComputeInline if bu…
  • #17419 - [FFI]Grab GIL when check env signals
  • #17403 - [Fix][LLVM] Fix getHostCPUFeatures LLVM version cutoff
  • #17383 - [ONNX] Skip constant If node generated by PyTorch
  • #17360 - [FIX] fix bug when normalize iter with different lower bounds
  • #17148 - [Relax] Preserve existing DataflowBlock in ConvertToDataflow
  • #17345 - [Fix][Relax] Add the missing tree-attn func arg for KV cache creation
  • #17073 - [Relax]FCallPacked not checked in CodegenVMTIR
  • #17315 - [MSC]Bugfix for strided_slice op
  • #17335 - [Relax][PyTorch][Fix] use _convert_torch_tensor_to_relax() where possible
  • #17330 - [Relax][PyTorch]Update layer_norm converter to support immutable_list for normalized_shape
  • #17324 - [Fix] Remove tvm. prefix from image name when ./docker/build.sh
  • #17308 - [TVM4J]Fix unhandled return type in JNI
  • #17307 - [Fix][TIR] LowerThreadAllreduce warp reduction mask
  • #17312 - [Relax]Infer TIR values from shapes inside a tuple
  • #17292 - [Relax]Support torch.unbind op and fix bugs for expand && split
  • #17263 - [Relax]Preserve dtype in ToMixedPrecision for kNever ops
  • #17229 - [Cutlass] fix cutlass instantiate attention template bugs
  • #17121 - [Relax]Fix a bug about the IR construction in test file
  • #17142 - Allow import of TVM when current directory is read-only

CI

  • #17444 - [Docs] Upgrade Sphinx
  • #17425 - Upgrade CI to Python 3.9
  • #17410 - Upgrade unity image tag to 20240917-153130-9f281758
  • #17409 - [Windows] Workaround for error in FindLLVM
  • #17397 - Update image tag to 20240917-153130-9f281758
  • #17338 - Upgrade PyTorch to 2.4.1
  • #17337 - Disable NNPACK build and fix error on Android SDK installation
  • #17355 - Upgrade github upload-artifact action
  • #17334 - [Hexagon] Forward gtest tests into pytest as separate tests
  • #17271 - Resolve CI compilation failures on MacOSX
  • #17221 - Reduce logging level when checking if docker image exists
  • #17206 - Update dummy-variable regex for pylint
  • #17117 - [CLML]Fix for few clml regression issues
  • #17155 - Remove lint step from unity/pr-head step

Disco

  • #17398 - Enable float8 data type in disco
  • #17275 - Fix double free of nccl communicator
  • #17264 - Disable splitting nccl communicator in single-group
  • #17182 - Implement SocketSession
  • #17191 - Cross-group and p2p send/receive primitives
  • #17180 - Group-wise operation

Dlight

  • #17430 - [GPU] Improve matmul schedule for adreno
  • #17363 - Fix Matmul rule for Conv3D
  • #17259 - [ADRENO] Fix for opencl adreno matmul schedule
  • #17187 - [GPU] Add OpenCL dequant matmul schedule

Docker

  • #17433 - [CI] Add NNEF dependency to CI images

Docs

  • #17436 - [Relax][PyTorch]Use torch.export instead of fx.symbolic_trace for tutorial
  • #17402 - [Doc] Update Architecture Overview
  • #17382 - More clarity on security model of RPC server
  • #17380 - [Doc] Relax Deep Dive
  • #17377 - Update document to include security model of RPC server
  • #17378 - Link to project-specific security page
  • #17352 - TVM pip Installation fix
  • #17343 - Minor fix typo in developer howto guide
  • #17328 - [Doc] Deep Dive TensorIR
  • #17327 - [Doc] How to Optimize a Language Model
  • #17320 - [Doc] Customize Optimization
  • #17319 - [Doc] Fix doc build error in e2e_opt_model.py
  • #17306 - [Doc] Refactor How-To
  • #17296 - [Doc] Overview
  • #17298 - [Doc] IRModule
  • #17286 - Introduce Relax API and move legacy part to standalone page
  • #17289 - [Doc] Quick Start
  • #17287 - [Doc] Refactor install docs

Frontend

  • #17431 - [Relax][Onnx] Add support for pad-2
  • #17447 - [ONNX] Move relax related tests to the correct file
  • #17427 - [Relax][ONNX] Expand op support for ONNX frontend
  • #17429 - [Relax][PyTorch] Support tensor manipulation and creation ops for ExportedProgram importer
  • #17426 - [Relax][PyTorch] Support neural network ops for ExportedProgram importer
  • #17424 - [Relax][PyTorch] Support binary, statistical and search ops for ExportedProgram importer
  • #17421 - [Relax][PyTorch] Support more unary ops for ExportedProgram importer
  • #17396 - [Relax][PyTorch] Add support for torch.export.ExportedProgram in Relax PyTorch Frontend
  • #17379 - [Relax][PyTorch] Fix output shape of torch.nn.functional.scaled_dot_product_attention
  • #17376 - [Relax][PyTorch] Cleanup Tensor Manipulation and Creation op converters
  • #17372 - [Relax][PyTorch] Cleanup Statistical, Search and DataType op converters
  • #17369 - [Relax][PyTorch] Cleanup Neural Network op converters
  • #17366 - [Relax][PyTorch] Cleanup binary op converters
  • #17356 - [Relax][PyTorch] Cleanup unary op converters
  • #17350 - [Relax][Onnx] fix params name bug in onnx frontend
  • #17342 - [Relax][PyTorch] Add support for torch.ops.aten.sym_size.int
  • #17300 - [Relax][PyTorch] Add support for torchvision.ops.stochastic_depth
  • #17325 - [Relax][PyTorch] Add support for torch.nn.functional.conv*
  • #17309 - [Relax][Onnx] fix expand bug in onnx frontend
  • #17304 - [Relax][PyTorch] Add support for torch.repeat
  • #17291 - [Relax][PyTorch] Add support for torch.tile
  • #17277 - [Relay][Pytorch] Add support for aten::tile
  • #17228 - [Unity]Add Sqrt Op
  • #17189 - [Relax][PyTorch] Add support for torch.nn.functional.max_pool2d
  • #17186 - [Relax][PyTorch] Add support for torch.einsum
  • #17184 - [Relax][PyTorch] Add support for torch.permute
  • #17167 - [Relax] [ONNX] Add support for Sign and Not
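
As a rough illustration of the ExportedProgram path added in #17396, the sketch below exports a toy PyTorch module with torch.export and imports it into Relax through from_exported_program. It assumes that torch and a TVM build containing the Relax PyTorch frontend are installed. The TinyModel class, its input shape, and the helper name import_to_relax are made up for this example and do not come from the release.

```python
def import_to_relax():
    """Export a toy PyTorch module and import it as a Relax IRModule."""
    # Heavy dependencies are imported lazily, so defining this helper
    # does not require torch or tvm to be installed.
    import torch
    from torch.export import export
    from tvm.relax.frontend.torch import from_exported_program

    class TinyModel(torch.nn.Module):  # hypothetical toy model
        def forward(self, x):
            return torch.relu(x)

    # Example inputs fix the shapes and dtypes that torch.export traces with.
    example_args = (torch.randn(2, 8),)

    with torch.no_grad():
        exported_program = export(TinyModel().eval(), example_args)
        # Translate the ExportedProgram graph into a Relax IRModule.
        mod = from_exported_program(exported_program)
    return mod
```

With both packages available, import_to_relax().show() should print the imported module as TVMScript.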

Hexagon

  • #17204 - Fix LWP assembly handler (predicate register)
  • #17169 - [CMake] Fix v66 build issue
  • #17162 - Support RPC execution of existing shared lib

LLVM

  • #17347 - [RUNTIME] Fix RISC-V CodeModel propagation to ORCJIT runtime executor
  • #17199 - Fix for getHostCPUFeatures API change

MetaSchedule

  • #17166 - Replace xgboost.rabit with xgboost.collective because it's deprecated
  • #17171 - Add a testcase for padded conv2d in meta_schedule

OpenCL & CLML

  • #17273 - [CODEGEN][OPENCL] Fix opencl codegen for few ops

ROCm

  • #17295 - Fix non-standard rocm path
  • #17290 - hipBLAS integration
  • #17256 - Support ROCm 6

Relax

  • #17449 - Add scatter_nd op support
  • #17453 - Add NonZero op
  • #17448 - Support left_shift and right_shift op
  • #17432 - [KVCACHE] Improved schedule for prefill attention
  • #17428 - Introduce static shape tuning pipeline
  • #17401 - [KVCache] Attention func accepting over-padded qkv and output NDArray
  • #17331 - Validate StructInfo annotations in well-formed check
  • #17368 - [Transform] Add SelectNode handling in SymbolicMatcher
  • #17353 - Fix BYOC removing existing ext mods
  • #17359 - Add new NN allgather operator
  • #17362 - [KV Cache] Refactor _attention_sequence_prefill function to …
  • #17332 - Validate StructInfo of variable bindings
  • #17354 - Fix inline source module cause path too long error
  • #17213 - Refactor RealizeVDevice to remove in-place mutation
  • #17253 - [Transform] Handle tuple return in RemoveUnusedOutputs
  • #17285 - Require correct input/output shapes R.call_tir
  • #17202 - Update GlobalVar name in AttachGlobalSymbol
  • #17218 - Allow dynamic shape argument to R.reshape
  • #17326 - [KVCache] Add tree attention with paged cache support
  • #17314 - [Transform] Compose preproc functions in LiftTransformParams
  • #17313 - Identify tuple unpack/repack in CanonicalizeBindings
  • #17305 - [Python]Rotary positional embedding scaling
  • #17243 - Avoid wrapping TupleStructInfo into a Tuple for R.call_tir
  • #17224 - [Analysis] Handle recursive functions in CollectVarUsage
  • #17280 - [KVCache] Increase coalesce threshold
  • #17261 - Add KVCache Interface for Relax NNModule
  • #17145 - Implement R.ensure_zero_offset and update memory planning for R.view
  • #17242 - Remove segfault in R.call_tir_inplace validation
  • #17234 - FuseTransposeMatmul Pass
  • #17226 - Fix segfault in rewrite_bindings for MatchCast node
  • #17220 - Handle presence of R.call_tir in MergeCompositeFunctions
  • #17201 - [Transform]Handle is_group argument in IPC AllReduce
  • #17198 - Disable fusion for fetching from the packed params in FuseOps
  • #17149 - Implement Rewriter class for pattern-rewrite
  • #17192 - [KVCache] Partial layers support
  • #17157 - Integrate cuDNN attention
  • #17160 - Fix fuseOps via pattern

Relay

  • #17339 - [qnn]: Fix qnn.avg_pool2d layout inference
  • #17177 - [FQ2I]: Use appropriate dtype while quantizing relay.op.nn.pad…

Runtime

  • #17407 - Add property Module.is_device_module
  • #17294 - Support KV cache with RoPE extension factor array
  • #17240 - [FFI]Use TVMValue::v_int64 to represent boolean values
  • #17252 - Revert "[FFI]Introduce runtime boxed types for int/float/bool"
  • #16183 - [FFI]Introduce runtime boxed types for int/float/bool
  • #17237 - Reorganize PagedKVCache attn kernel invocation
  • #17227 - Allow aborting fetchWithCache through AbortSignal
  • #17208 - Allow aborting fetchNDArray through AbortSignal

TIR

  • #17443 - Add is_vector Method to DataType class and update usages across Codebase
  • #17411 - [NarrowDataType] Bufferload's index should not inherit bits constraint of value
  • #17219 - Validate tir::Buffer axis_separators on construction
  • #17158 - [Analyzer] Simplify x==x expressions for all dtypes

TOPI

  • #17274 - [ADRENO] Add Group Conv2d texture schedule

TVMScript

  • #17435 - Enable T.macro decorating class method
  • #17434 - [TIR] Add source kernel integration via call_kernel
  • #17395 - [TIR, TVMScript] Add TIR - Triton integration
  • #17131 - [Relax] Allow return statement in DataflowBlock
  • #17373 - Avoid segfault from invalid TVMScript

cuda & cutlass & tensorrt

  • #17408 - [CUTLASS] Add FP8 gemm kernels

web

  • #17420 - Allow deprecated API requestAdapterInfo with any cast
  • #17404 - [WASM] Implement concat embeddings
  • #17251 - Add TVMArgBool to ArgTypeCode

Misc

  • #17457 - Try to fix windows CI conda build issue
  • #17415 - [NVSHMEM] Enable nvshmem memory allocation
  • #17422 - [CMake] Add NCCL/RCCL header directory to include path
  • #17405 - [TVMjs] Modify web package description
  • #17400 - [3rdparty] Bump FlashInfer for tmp workspace reduction
  • #17394 - [MSC] Support concat with constant inputs
  • #17351 - [MSC][Refactor] Support dynamic shape
  • #17371 - [WEBGPU] Update runtime to remove deprecated API
  • #17361 - [IR] Expose ReplaceGlobalVars utility in the Python API
  • #17358 - Update tvmc_command_line_driver.py, modify the sentence, remove the duplicate "as"
  • #17344 - [MSC] Reconstruct tensorrt module
  • #17297 - [Apps] Remove mxnet dependency from /apps/android_camera/models
  • #17299 - [Apps] Remove mxnet dependency from /apps/ios_rpc
  • #17293 - [Rust] Remove mxnet dependency and re-enable rust example
  • #17321 - [Target] Refine equality check on TargetKind instances
  • #17317 - Add NVSHMEM support
  • #17301 - [TE][CreatePrimFunc] Fix create reduce block with spatial iter dependent init value
  • #17284 - [Support] Fix the Read/Write of socket stream
  • #17302 - [Codegen][WebGPU] LetNode common subexpr override
  • #17246 - [Cleanup] Remove using namespace tvm::runtime from headers
  • #17278 - [Codegen] Emit tir::Let as var assignment explicitly
  • #17260 - [WINDOWS] Compiler options for non x86 targets
  • #17249 - [IR] Handle NaN in StructuralEqual and StructuralHash
  • #17257 - [FFI] Re-introduce the boxed primitive values
  • #17265 - [CompileBugfix][contrib] meet 'base64.h: No such file or directory' and '‘tvm::runtime::vm::AllocatorType’ has not been declared' while compiling
  • #17214 - Replacing unary ops with LookUpTable and Take op to improve performance
  • #17250 - [WebGPU] Fix unexpected device lost error when intentional dispose
  • #17236 - [3rdparty] Bump FlashInfer
  • #17233 - [Runtime Patch] Add AbortSignal to fetchWithCache in ArtifactCacheTemplate interface
  • #17190 - [Cython][FFI] Fix crash when call del operator for handle
  • #17170 - Pass to eliminate redundant branch and overcompute
  • #17185 - Remove and replace deprecated distutils.util.strtobool()
  • #17188 - Add packaging to python/gen_requirements.py
  • #17181 - [FFI] Add python signal handler for ctypes FFI
  • #17173 - Use packaging.version.parse instead of distutils.version.LooseVersion
  • #17174 - [TVMJS] Check DataType.NUMPY2STR when saving array
  • #17168 - [Meta Schedule][XGBoost] enable custom callback func test with xgboost>=1.6.0