[NVIDIA GPU SPMD] Add runtime support to run windowed einsum in multiple streams #3

Closed
wants to merge 2,324 commits
Commits
836c164
Consolidate TilingScheme functions.
jreiffers Jan 12, 2024
94a6cfb
[XLA:GPU][NFC] Add a util to derive an IndexingMapSimplifier from an …
bchetioui Jan 12, 2024
09062c1
Added an api_version field to PJRT_Gpu_Register_Custom_Call*
superbobry Jan 12, 2024
e527933
Fix broken build.
dimitar-asenov Jan 12, 2024
e7d9d99
[xla:gpu][NFC] Remove unnecessary static helper for BuildConstantInit…
tyb0807 Jan 12, 2024
f36492e
[xla:gpu] Handle HLO-only case in BuildConstantInitializerThunk
tyb0807 Jan 12, 2024
6518a7d
Import openai/triton from GitHub.
Moerafaat Jan 12, 2024
5ec5b7c
PR #7843: [GPU] NCCL User Buffer registration (1/3)
trevor-m Jan 12, 2024
b41c906
Consider `scratch_bytes` for top autotuning results whose times are w…
dimitar-asenov Jan 12, 2024
63a9e97
[XLA] Require that allocations use the same offset for their aliased …
berkinilbeyi Jan 12, 2024
27efd18
[xla:gpu] Add support for HLO-only BuildInitializerThunk
tyb0807 Jan 12, 2024
d9e1cb3
Accelerate and optimize MergeShardingIfCompatible.
ZixuanJiang Jan 12, 2024
5850c1c
Roll back previous change "Cleans up wrong callers of FindNonTrivialH…
haoyuz Jan 12, 2024
9696bb9
[XLA] Purge AsyncOpCanonicalizer
vsytch Jan 12, 2024
d3eda6f
[xla:gpu] Clean up NCCL utils library and extract a separate nccl_cli…
ezhulenev Jan 12, 2024
cf54f99
Remove multi_platform_manager_listener module initializer sequencing.…
klucke Jan 12, 2024
5092a84
[xla:gpu] Support adding cu_threefry2x32 custom kernel to command buffer
anlunx Jan 12, 2024
1bb2a74
[xla:gpu] Add support for emitting SelectAndScatter from HLO
tyb0807 Jan 12, 2024
93e8599
Reverts 093cd67c14599699b7971f7c8ccda838627f37ee
tyb0807 Jan 12, 2024
4132f14
[xla:gpu] Support emitting ConditionalThunk for command buffers
tyb0807 Jan 12, 2024
4ee8bd4
Migrate wasm32 constraint to emscripten
tensorflower-gardener Jan 12, 2024
97a4eed
Replace `std::vector<int64_t>` with `DimensionVector` in `hlo_shardin…
ZixuanJiang Jan 12, 2024
ce1e5b6
[xla:gpu] Add support for scheduling conditional into a command buffer
tyb0807 Jan 12, 2024
e9502be
[xla:gpu] Add GpuExecutable post-initialization rendezvous to avoid d…
ezhulenev Jan 12, 2024
ac46dcd
[XLA] Remove slice to dynamic custom call with pad to static operand
blakehechtman Jan 12, 2024
1e5f535
Introduce some macros to define XLA errors.
klucke Jan 13, 2024
b1f2f98
Increase maxZoom factor for hlo graph explorer
zzzaries Jan 13, 2024
e69cbb2
Reverts 93e8599d9a3c5c70e89e74eae2bccc41e7ba4d19
tensorflower-gardener Jan 13, 2024
a83c6e1
[xla:gpu][NFC] AppendCommands in thunk kind alphabetical order
tyb0807 Jan 13, 2024
c7d4559
Improves logging/debuggability in the hlo verifier.
tensorflower-gardener Jan 13, 2024
719f609
Reverts 5092a84d17bc9ef9e455ff893e2509601e3b071c
haoyuz Jan 13, 2024
5a32eee
Use xla::Internal rather than the duplicate xla::InternalError to mat…
klucke Jan 13, 2024
78ca99b
Move deps to OSS-only section.
DrMarcII Jan 13, 2024
9d95710
[XLA] We only require statically bound portions of shapes to match
majnemer Jan 14, 2024
55c1f87
[TileAnalysis] Move IndexingMap together with IndexingMapSimplifier.
pifon2a Jan 14, 2024
5b71830
[TileAnalysis][NFC] Clean-up binary affine exprs in indexing_map_test.cc
pifon2a Jan 14, 2024
fbd57d0
[XLA:GPU] Simplify SymbolicTile to store offset/size/stride affine ma…
bchetioui Jan 14, 2024
cbba4c3
Fix a bug due to lazy initialization in debugging model.
tensorflower-gardener Jan 15, 2024
6155be2
Fix container-overflow in MHLO-to-LHLO
tensorflower-gardener Jan 15, 2024
fe38900
Reland PR #55394: tanh float32 over 1.
akuegel Jan 15, 2024
eb199ff
PR #8313: [XLA:GPU] add flash attention dropout in XLA runtime
Cjkkkk Jan 15, 2024
7c262e7
[TileAnalysis] Add a custom printer for indexing maps.
pifon2a Jan 15, 2024
9babbb6
PR #6872: [XLA:GPU] add cuDNN flash attention support in XLA (3rd PR …
Cjkkkk Jan 15, 2024
2d3b9e0
PR #8073: ReplaceInstructionWithDifferentShape should return false if…
apivovarov Jan 15, 2024
3f5c73e
PR #8452: Move stream attributes from hlo.proto to gpu backend config
Tixxx Jan 15, 2024
cfc12d5
Improve exhaustive test for tanh.
akuegel Jan 15, 2024
fc3c62f
Integrate LLVM at llvm/llvm-project@c230138011cb
tensorflower-gardener Jan 15, 2024
dbf5b30
[xla:gpu][NFC] Rename command buffer options
tyb0807 Jan 15, 2024
de7c152
Fix test to not require C++ 20.
dimitar-asenov Jan 15, 2024
0b543c4
[NCCL] Persistent allocators usage.
tensorflower-gardener Jan 15, 2024
e3bf20b
[xla:gpu] CheckImplementable should not return prematurely on error
tyb0807 Jan 15, 2024
b8f07cf
PR #7940: LLVM codegen changes for SPIR backend
ShengYang1 Jan 15, 2024
404d544
Refactor NVPTX Compilation Caching
tensorflower-gardener Jan 15, 2024
e230a0f
For XLA:CPU still use the range [-9, 9] for Tanh clamping.
akuegel Jan 15, 2024
0eb366b
[xla:gpu] Remove impl namespace in nccl_all_to_all_thunk.cc
tyb0807 Jan 15, 2024
975a85a
[xla:gpu] Unify IsSyncCollective implementation
tyb0807 Jan 15, 2024
7c41fdd
[TileAnalysis] Don't fail to compute indexing for fusions with unknow…
pifon2a Jan 15, 2024
88610cd
[xla:gpu] Add NcclAllToAllThunk util functions for HLO
tyb0807 Jan 15, 2024
c3a069d
PR #8490: [ROCM] Fixing build break 15/01/24
pemeliya Jan 15, 2024
40daa89
[XLA:GPU] Add a test for IndexingMapSimplifier's methods for sign ana…
bchetioui Jan 15, 2024
119f193
[xla:gpu] Replace std::string with NcclCliqueId type to pass clique ids
ezhulenev Jan 15, 2024
d9a034a
[xla:gpu] Add support for emitting AllToAllStart from HLO
tyb0807 Jan 15, 2024
6fd14bf
[xla:gpu] Add support for emitting AllToAllDone from HLO
tyb0807 Jan 15, 2024
e237575
Use xla::Internal rather than the duplicate xla::InternalError to mat…
klucke Jan 15, 2024
f32ee83
PR #8289: [XLA:CPU][Perf] Replace MatMul 2D BIAS with binary_add
rfsaliev Jan 15, 2024
ea6abf7
Update dlpack, in preparation for supporting newer dlpack features.
hawkinsp Jan 16, 2024
ff3f5a2
[TileAnalysis] Do not use c++20 default equality comparison operator.
pifon2a Jan 16, 2024
8842c58
[XLA:CPU] Ensure that tanh never returns values outside [-1, 1].
akuegel Jan 16, 2024
b0bc40d
Integrate LLVM at llvm/llvm-project@baba0a4cb431
tensorflower-gardener Jan 16, 2024
65916f0
PR #8332: Store device-side time in ExecutionProfile and use it for a…
olupton Jan 16, 2024
d632353
[XLA] Fix copy insertion with region analysis for multi-output instru…
olegshyshkov Jan 16, 2024
bfae997
[xla:gpu] Add profiling annotations for thunks executed as command bu…
ezhulenev Jan 16, 2024
4aa3024
[Cleanup] Combine RemoveUseShardings and SaveUserShardings, and add m…
tensorflower-gardener Jan 16, 2024
dfae0a2
Use xla::Internal rather than the duplicate xla::InternalError to mat…
klucke Jan 16, 2024
cd1a952
[XLA][NFC] Tidy up debug logs.
olegshyshkov Jan 16, 2024
db2ac36
[XLA] Remove CHECK from HloSchedule::remove_computation.
olegshyshkov Jan 16, 2024
b14eb93
[XLA] Add new HLO matcher for control dependencies
jurahul Jan 16, 2024
1bacf1c
Fix an issue in `IsSubTilingOrEqualSharding` and add more test cases f…
ZixuanJiang Jan 16, 2024
5834c40
[XLA:GPU] Move HloOpProfile-related logic into a class.
olegshyshkov Jan 16, 2024
82e04a8
PR #8429: [ROCM] adding occupancy calculation & renamed to rocm_executor
pemeliya Jan 16, 2024
793b8f3
Fix a bug in ShapeLegalizeToHLO
tensorflower-gardener Jan 16, 2024
34142e1
Use xla::Internal rather than the duplicate xla::InternalError to mat…
klucke Jan 16, 2024
5780c78
Add license files back, fix some missing files on mac/win for wheel v2
tensorflower-gardener Jan 16, 2024
e125c9f
[xla:gpu] Start moving all NCCL API users to NcclApi + stub
ezhulenev Jan 16, 2024
d9e90a0
Integrate LLVM at llvm/llvm-project@076eb4c79ec7
tensorflower-gardener Jan 16, 2024
b1253fa
Add a shortcut in IsSubTilingOrEqualSharding to accelerate the function.
ZixuanJiang Jan 16, 2024
710d1d4
[xla:gpu] Move NCCL communicator initialization into NcclApi
ezhulenev Jan 16, 2024
5250a06
[xla:pjrt] Do not include nccl.h directly and use NcclApi instead
ezhulenev Jan 16, 2024
5d2a56d
[XLA] Remove dead field HloInstruction::operation_queue_id_ after htt…
cezheng Jan 16, 2024
ddf0296
[xla:gpu] Support adding cu_threefry2x32 custom kernel to command buffer
anlunx Jan 16, 2024
e4fc329
PR #8306: Add instructions to generate `compile_commands.json`
andportnoy Jan 16, 2024
60b559e
Add `.tf_configure.bazelrc` to XLA's .gitignore
ddunl Jan 17, 2024
6308ac6
Integrate LLVM at llvm/llvm-project@fdbf255c96cb
tensorflower-gardener Jan 17, 2024
971ccc3
[xla:gpu] NFC: Remove direct uses of nccl group operations
ezhulenev Jan 17, 2024
9ef9476
[xla:gpu] Remove direct uses of ncclCommCount
ezhulenev Jan 17, 2024
2c4a90d
[xla:gpu] Do not use ncclAllGather directly and use NcclApi
ezhulenev Jan 17, 2024
e8555d7
Make :nccl_id_store dependency conditional on cuda/rocm enabled
ezhulenev Jan 17, 2024
f6d2eea
[XLA] Introduce user-annotated host memory offloading.
SandSnip3r Jan 17, 2024
c8adbc4
[XLA:GPU] Turn off cuDNN fmha flag by default
tensorflower-gardener Jan 17, 2024
14a3eb8
[xla:gpu] Do not use ncclAllReduce directly and use NcclApi
ezhulenev Jan 17, 2024
87dd8fe
[xla:gpu] Do not use ncclReduceScatter directly and use NcclApi
ezhulenev Jan 17, 2024
f842a3e
[xla:gpu] Do not use ncclSend and ncclRecv directly and use NcclApi
ezhulenev Jan 17, 2024
588171c
[xla:gpu] Do not use ncclSend and ncclRecv directly and use NcclApi p…
ezhulenev Jan 17, 2024
48a80dd
[xla:gpu] Do not use ncclSend and ncclRecv directly and use NcclApi p…
ezhulenev Jan 17, 2024
47c7481
For some while loops, XLA can statically determine (an upper bound on…
tensorflower-gardener Jan 17, 2024
ee60d65
[xla:gpu] NFC: Remove unused ToNcclReduction from nccl_utils
ezhulenev Jan 17, 2024
8d3a1c7
[xla:gpu] NFC: Remove nccl_types and nccl_errors
ezhulenev Jan 17, 2024
72f2a7d
[XLA:GPU] Dump priority fusion proto only when the priority pass dump…
olegshyshkov Jan 17, 2024
fa8ce5f
[xla:gpu] Move persistent plan allocator into NcclApi
ezhulenev Jan 17, 2024
36f8a5a
PR #8033: Add dropped changes from [ROCm] Add command buffer tests
draganmladjenovic Jan 17, 2024
19dbd1b
Reland clean up wrong callers of FindNonTrivialHero.
akuegel Jan 17, 2024
1ecfc4f
PR #8402: [XLA:CPU] [oneDNN] Enable Dot op (MatMul) in BF16 Type
mahmoud-abuzaina Jan 17, 2024
569f1ca
Fix pybind11 linker version script in TSL and Tensorflow
tensorflower-gardener Jan 17, 2024
64c6dea
PR #8506: [ROCm] Move device libs path initialization into CompileToH…
draganmladjenovic Jan 17, 2024
8804c8a
Make NVPTXCompiler::CompileWithPtxAs a free function
tensorflower-gardener Jan 17, 2024
55d2e1b
Integrate LLVM at llvm/llvm-project@f3d534c4251b
tensorflower-gardener Jan 17, 2024
840df82
Disable sanitizers for some Triton-related tests
tensorflower-gardener Jan 17, 2024
f6a6f97
PR #8151: [mhlo] AllGather variadic operands support
apivovarov Jan 17, 2024
7909304
[XLA/GPU] Fix AllReduceBlueConnect to handle all-reduce with control …
jurahul Jan 17, 2024
19dc13a
Add support for bool dlpack values.
hawkinsp Jan 17, 2024
6645aa6
Change MSA memory-bound loop optimization to only do pinned allocatio…
tensorflower-gardener Jan 17, 2024
05abb09
Convert op fields using arrays to elements before serialization to av…
Jan 17, 2024
d06a70e
[xla:gpu] NFC: Remove nccl_utils
ezhulenev Jan 17, 2024
11a0282
Fix memory alignment bug
Jan 17, 2024
b147ce7
Mark `xla/tests:onednn_matmul_test` as "no_oss" due to timeouts on CI…
ddunl Jan 17, 2024
9708d90
Optimize PtrVec::empty() by using a unique representation of empty ve…
tensorflower-gardener Jan 17, 2024
c721368
Use xla::Internal rather than the duplicate xla::InternalError to mat…
klucke Jan 17, 2024
f471590
Use correct public stream_executor dependency.
klucke Jan 17, 2024
93994d7
[stream_executor] NFC: Split GpuCollectives from GpuDriver
ezhulenev Jan 17, 2024
55dcf4d
PR #62779: [XLA:TSL] add gil acquire when entering catch region
qingyunqu Jan 17, 2024
83c67ed
[xla:gpu] Move nccl buffer registration to NcclApi
ezhulenev Jan 17, 2024
93e496f
Make xla::ErrorStrCat error calls consistent, and include source loca…
klucke Jan 17, 2024
731ed6e
[XLA:Runtime] Cleaning dependencies from gpu_executable after removin…
tensorflower-gardener Jan 18, 2024
c6c1702
Use correct public stream_executor dependency within stream_executor …
klucke Jan 18, 2024
7e51810
Integrate StableHLO at openxla/stablehlo@9c8a1b7d
sdasgup3 Jan 18, 2024
8aeccaf
[xla:gpu] Pass buffer memory space to collective thunks directly inst…
ezhulenev Jan 18, 2024
4468b6e
Refactor unbounded dynamism unary op tests.
ghpvnist Jan 18, 2024
94c1c7a
Remove unused multi_platform_manager MODULE_INITIALIZER.
klucke Jan 18, 2024
8cdecfd
Stop using stream_executor_headers target in blas cc_library.
klucke Jan 18, 2024
6925139
Stop using stream_executor_headers in dnn cc_library.
klucke Jan 18, 2024
171d8a7
In GpuSanitizeConstantNames pass, register new names generated with t…
yifjiang Jan 18, 2024
90a45af
PR #8546: Fix ambiguity with explicit cast
elfringham Jan 18, 2024
c307281
[XLA:GPU] Improve the error message when autotuning is given no algor…
dimitar-asenov Jan 18, 2024
fdc3549
Avoid duplicate codegen if transpose hero has multiple root users.
akuegel Jan 18, 2024
706fdba
Enable sharding for gpu_fused_mha_test
tensorflower-gardener Jan 18, 2024
f685679
[XLA:cpu] allow overriding of MLIR pass pipeline via debug flags
ftynse Jan 18, 2024
f6471e7
Stop Triton debug info pass from triggering LLVM verifier
tensorflower-gardener Jan 18, 2024
da7781e
[XLA] [NFC] Unify and simplify LIT configuration
cheshire Jan 18, 2024
9415074
Integrate LLVM at llvm/llvm-project@3a82a1c3f6bd
tensorflower-gardener Jan 18, 2024
28d5549
[XLA:GPU] Consider the size of all spatial dimensions in the filter a…
dimitar-asenov Jan 18, 2024
d206ccb
Add missing patch files to workspace.bzl.
bchetioui Jan 18, 2024
8f06c2a
Fix a bug in ReplaceParameter interface in hlo_computation which prev…
tensorflower-gardener Jan 18, 2024
c1de3dc
Use correct public stream_executor dependency.
klucke Jan 18, 2024
d81fe8c
[XLA] Allow host_offloader to offload memory to the host even if one …
SandSnip3r Jan 18, 2024
ca53281
[xla:gpu][NFC] Move IsSliceWithUnitStrides to ir_emission_utils for r…
tyb0807 Jan 18, 2024
6cae7f8
[xla:gpu] NFC: Do not use ncclComm_t directly in collective thunks
ezhulenev Jan 18, 2024
603fa34
TargetConfig: Support all fields in the StreamExecutor constructor.
pizzud Jan 18, 2024
39a76ad
[xla:gpu][NFC] Move GetAllocationSliceForHlo to ir_emission_utils for…
tyb0807 Jan 18, 2024
e5b88b4
[xla:gpu] NFC: Rename mock_nccl_sleep_kernel to a more generic sleep_…
ezhulenev Jan 18, 2024
41357e2
[XLA] Fix HostMemoryTransferAsyncifier to create the legacy asyncrono…
SandSnip3r Jan 18, 2024
4346a17
[XLA] Don't assume all async wrapped ops are collectives
vsytch Jan 18, 2024
8b13620
[xla:gpu] Convert nccl_clique to a regular cc_library
ezhulenev Jan 18, 2024
3bfbbd8
Update configure.py to use clang by default
ddunl Jan 18, 2024
6509f60
Disable tests that fail in cross-compile setting for macOS x86
nitins17 Jan 18, 2024
23febb0
[XLA] Add new helper to copy control deps from inst to start/end pair.
jurahul Jan 18, 2024
16bfdd6
[XLA] Adapt HloComputation::CreateAsyncInstructions to also support c…
SandSnip3r Jan 18, 2024
a7754a2
Add unbounded dynamism tests for StableHLO supported unary ops.
ghpvnist Jan 18, 2024
c1bdcd3
Add checkpoint sharding metrics.
BlaziusMaximus Jan 18, 2024
dff777b
Change header to attribute copyright to the OpenXLA authors
ddunl Jan 18, 2024
e0eb857
[xla:gpu] NFC: Add details to error message
ezhulenev Jan 19, 2024
f7bfe22
Increase test timeouts for the cross-compile Mac build
nitins17 Jan 19, 2024
a385012
[xla:gpu] Remove old BF16 ifdef for CUDA < 11
ezhulenev Jan 19, 2024
b86a51e
Integrate LLVM at llvm/llvm-project@05e85e4fc5ac
tensorflower-gardener Jan 19, 2024
c2fa9bf
[xla:gpu] Pass NcclApi to all collective thunks
ezhulenev Jan 19, 2024
4af07d5
[stream_executor] NFC: Replace XLA_ENABLE_XCCL with a stream executor…
ezhulenev Jan 19, 2024
8a583ef
[xla:gpu] NFC: Remove almost all of XLA_ENABLE_XCCL from XLA
ezhulenev Jan 19, 2024
fd4c7fe
[XLA] hlo-opt tool: add option for generating hlo after RunBackend
anlunx Jan 19, 2024
72788f1
Remove column reduction vectorization heuristic.
jreiffers Jan 19, 2024
1707cb3
Allow intermediate ops with multiple users in epilogues.
akuegel Jan 19, 2024
b5fbd3e
Clean up dead code in reduction emitter.
jreiffers Jan 19, 2024
cc0422c
Remove shared memory usage check from vectorization logic.
jreiffers Jan 19, 2024
1622284
Avoid indirect parameter replacements in lit tests.
akuegel Jan 19, 2024
cdcbc1e
Add repr to PyDeviceList
yashk2810 Jan 19, 2024
7b18119
PR #8610: added AddCustomKernelReplacementPasses to amdgpu_compiler
zstreet87 Jan 19, 2024
f2f12ae
Fix tests that are too sensitive to naming.
jreiffers Jan 19, 2024
ebf916b
Fix ir_emitter_triton_test with priority fusion.
jreiffers Jan 19, 2024
e964d67
Fix auto_sharding_gpu_compiler_test with priority fusion.
jreiffers Jan 19, 2024
6b4baf8
Integrate LLVM at llvm/llvm-project@9299ca797ae6
tensorflower-gardener Jan 19, 2024
cd34b9a
[XLA:GPU] [NFC] Simplify scheduling API call + API call to get GPU ta…
cheshire Jan 19, 2024
9ae65c7
Make gemm_rewriter_test less sensitive to fusion algorithm.
jreiffers Jan 19, 2024
692a8f3
PR #8545: The CHECK macros can give warnings or errors on redefinition
elfringham Jan 19, 2024
e3006f2
Mark tsl::Status and tsl::errors::Code as DEPRECATE_INLINE, i.e. new …
tkoeppe Jan 19, 2024
a35c2e9
[XLA:GPU] Expose flag xla_gpu_enable_llvm_module_compilation_parallelism
tdanyluk Jan 19, 2024
6207749
Don't crash in GetAllocatorStats when using platform allocator.
jreiffers Jan 19, 2024
e7414e6
Integrate LLVM at llvm/llvm-project@340054e561bc
tensorflower-gardener Jan 19, 2024
96bd439
[xla:gpu] Convert NcclApi to a virtual base class
ezhulenev Jan 19, 2024
ef697e9
Rollback of 1493218625e3f2a2e239930171373c0429b6abe3, broke windows T…
BrianWieder Jan 19, 2024
7253636
Clarify that the ids in Ifrt_DevicesAttr are logical device IDs.
ICGog Jan 19, 2024
2a5579a
[xla:gpu] Fix tsan race in rendezvous with values
ezhulenev Jan 19, 2024
c3687fc
Use xla::Internal rather than the duplicate xla::InternalError to mat…
klucke Jan 19, 2024
17d3249
[PJRT C API] Add StableHLO forward/backward compatibility infrastruct…
GleasonK Jan 19, 2024
34c3bc8
Fix a typo in auto_sharding_dot_handler.cc.
tensorflower-gardener Jan 19, 2024
6ed94a2
Return mlir modules instead of XlaComputation from custom_partitioning.
pschuh Jan 19, 2024
4d292ae
1. In preparation for additional SliceTimePermutationIterators, we in…
sparc1998 Jan 19, 2024
65fd6a5
Integrate StableHLO at openxla/stablehlo@20255865
sdasgup3 Jan 19, 2024
cdec136
Support more types of conversions between HloSharding and ShardingParam.
ICGog Jan 19, 2024
bfa9bf2
[xla:gpu] Add a specialization for rendezvous returning absl::StatusOr
ezhulenev Jan 19, 2024
ef3a533
Replace manual creation of copystart/copydone with HloComputation::Cr…
SandSnip3r Jan 19, 2024
9d2e3d0
Create a SliceTimePermutationIterator that iterates over preferred sh…
sparc1998 Jan 19, 2024
f5334f5
[xla:gpu] NFC: Add error handling to factory methods for NcclExecuteP…
ezhulenev Jan 19, 2024
ae7572d
[NFC] Update PJRT ID related APIs in subclasses and callsites.
changhuilin Jan 20, 2024
c155aaf
Integrate LLVM at llvm/llvm-project@0784b1eefa36
tensorflower-gardener Jan 20, 2024
c5ff6e3
Automated Code Change
tensorflower-gardener Jan 20, 2024
3bcf05e
PR #8285: [ROCm] Fix //xla/service/gpu/runtime:topk_test
draganmladjenovic Jan 20, 2024
247280a
Change the HloComputation representation for the instructions from a
jeffreyadean Jan 20, 2024
023c964
[xla:gpu] Add support for human readable names to lockable
ezhulenev Jan 21, 2024
b004317
[xla:gpu] Make NcclClique public and rewrite NcclClique initialization
ezhulenev Jan 21, 2024
9f0a829
[xla:gpu] Add util function to test slices on leading dimension only
tyb0807 Jan 22, 2024
191cf78
[xla:gpu] Add support for AddressComputationFusion
tyb0807 Jan 22, 2024
e6a0bb6
[TileAnalysis] Move ComposeIndexingMaps to indexing_map.h.
pifon2a Jan 22, 2024
60f30d7
[TileAnalysis] Consistently use closed intervals in IndexingSimplifie…
pifon2a Jan 22, 2024
c4dc37b
Adjust FindNonTrivialHero check for reductions.
akuegel Jan 22, 2024
5c4527e
PR #8633: [ROCm] Fixed head file in rocm_collectives
i-chaochen Jan 22, 2024
eed1a7b
[XLA:GPU] Add ability to generate LLVM-before-optimizations to hlo-opt
cheshire Jan 22, 2024
51abfed
hlo-opt --stage=hlo-backend: check for scheduling in test.
jreiffers Jan 22, 2024
ae717d1
Internal change.
jreiffers Jan 22, 2024
5f135c6
[XLA:GPU] If dumping is enabled, dump used Compiler::TargetConfig und…
cheshire Jan 22, 2024
e82b8d5
Make padding tests less sensitive to fusion.
jreiffers Jan 22, 2024
7303823
PR #8545: The CHECK macros can give warnings or errors on redefinition
elfringham Jan 22, 2024
f61f472
Remove obsolete TODO.
jreiffers Jan 22, 2024
60f5032
PR #7963: [GPU] NCCL User Buffer Registration - Create CollectiveBFCA…
trevor-m Jan 22, 2024
43081b5
Add support for unbounded dynamism for Broadcast and BroadcastInDim op.
GleasonK Jan 22, 2024
ae11a78
Add LibNVPTXCompiler-based compilation
tensorflower-gardener Jan 22, 2024
d7be06e
[XLA:GPU] Add "NVIDIA H100 PCIe" target config (h100_pcie.txtpb)
tdanyluk Jan 22, 2024
324b9dd
Integrate LLVM at llvm/llvm-project@21830c913505
tensorflower-gardener Jan 22, 2024
350c766
Refactor PartialReplicateReshardCompatibleSharding and MakePartitionO…
ZixuanJiang Jan 22, 2024
ab60a40
[xla:gpu] Add new emitter for concatenate fusions.
chsigg Jan 22, 2024
8ac05e1
Add unbounded dynamism tests for AndOp and CompareOp.
ghpvnist Jan 22, 2024
4549a23
Rename xla::InternalErrorStrCat to be xla::InternalStrCat for consist…
klucke Jan 22, 2024
c3b4145
[XLA:WHILE_LOOP_FUSIBLE] Do not sync sliced invariant values into whi…
blakehechtman Jan 22, 2024
779cb8b
Mark const for member functions of PartitionedHlo. Pass the const& in…
ZixuanJiang Jan 22, 2024
3b31e2c
[TileAnalysis] Match Domain of thread_id->output indexing maps.
pifon2a Jan 22, 2024
b5dda58
Symlink the Bazel test output directory to be in the Kokoro artifacts…
nitins17 Jan 22, 2024
59d9399
Allow Reshape to have unbounded dynamic input if result shape is stat…
GleasonK Jan 22, 2024
54accfd
[XLA:Python] Fail with an AttributeError if __cuda_array_interface__ …
hawkinsp Jan 22, 2024
962df63
[xla:gpu] Support Emitting fft from HLO
anlunx Jan 22, 2024
ff1db84
Remove unneeded deps.
DrMarcII Jan 22, 2024
166f51f
[xla:gpu] NFC: Rename NcclExecuteParams to CollectiveExecuteParams an…
ezhulenev Jan 22, 2024
5cf68b6
[xla:gpu] Add a Prepare stage to Thunks to request shared resources b…
ezhulenev Jan 22, 2024
391ff8a
added multi-stream support in sequentialThunk
Tixxx Dec 16, 2023
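
For background on the PR title: a windowed einsum splits a sharded contraction into per-window steps, so each step's matmul can be issued on its own stream and overlap the collective that fetches the next window. Below is a minimal single-process sketch of the per-window decomposition only; the function name and the plain-Python `matmul` helper are illustrative, not XLA's actual runtime API, and the stream-level overlap this PR adds is not modeled.

```python
def windowed_einsum(a_shards, b):
    """Block-row matmul of A @ B, one window (row shard of A) per step.

    In XLA:GPU's multi-stream runtime each iteration's compute would be
    enqueued on a separate CUDA stream so it overlaps communication for
    the next window; this loop only models the decomposition itself.
    """
    def matmul(a, b):
        # Plain Python matrix multiply: (m x k) @ (k x n) -> (m x n).
        return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
                 for j in range(len(b[0]))] for i in range(len(a))]

    result_rows = []
    for shard in a_shards:  # one "window" per step
        result_rows.extend(matmul(shard, b))
    return result_rows

# Splitting A row-wise into two shards reproduces the unsharded product.
a = [[1, 2], [3, 4], [5, 6], [7, 8]]
identity = [[1, 0], [0, 1]]
shards = [a[:2], a[2:]]
assert windowed_einsum(shards, identity) == a
```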
The diff you're trying to view is too large. We only load the first 3000 changed files.
274 changes: 222 additions & 52 deletions .bazelrc

Large diffs are not rendered by default.

39 changes: 39 additions & 0 deletions .github/workflows/buildifier.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
# Copyright 2023 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
name: Buildifier
on:
pull_request_target:

env:
# Have `go install` place binaries in $PATH
GOBIN: "/usr/local/bin"

jobs:
buildifier-lint:
runs-on: ubuntu-22.04
defaults:
run:
shell: bash
timeout-minutes: 1
if: |
github.event.sender.type == 'User' ||
contains(github.event.pull_request.body, 'FORCE_TEST_ACTIONS')
steps:
- name: "Checking out repository"
uses: actions/checkout@e2f20e631ae6d7dd3b768f56a5d2af784dd54791 # v2.5.0
- name: "Install buildifier"
run: go install github.com/bazelbuild/buildtools/buildifier@433ea85 # 6.4.0
- name: "Run buildifier"
run: buildifier --lint=warn --warnings=-out-of-order-load -r xla/
2 changes: 1 addition & 1 deletion .github/workflows/trusted_partners.js
@@ -53,7 +53,7 @@ const get_email_domain = async ({github, username}) => {
const filter_action = async ({github, context, domain}) => {
const labels = ['kokoro:force-run'];

let assignees = ['radhakrishnaba', 'xla-rotation'];
let assignees = ['kamaljeeti', 'xla-rotation'];
const title =
context.payload.pull_request && context.payload.pull_request.title;
const lowercased_title = (title || '').toLowerCase();
3 changes: 3 additions & 0 deletions .gitignore
@@ -8,6 +8,9 @@ bazel-bin
bazel-out
bazel-testlogs

# Ignore files produced by `configure`
.tf_configure.bazelrc
tools/python_bin_path.sh

# Emacs autosaves
*~
2 changes: 0 additions & 2 deletions .kokoro/generate_index_html.sh
@@ -36,9 +36,7 @@ tee "$1" <<EOF
</ul>
<h2>Googlers-Only Links</h2>
<ul>
<li><a href="http://sponge/$KOKORO_BUILD_ID">Sponge</a></li>
<li><a href="http://sponge2/$KOKORO_BUILD_ID">Sponge2</a></li>
<li><a href="http://fusion/$KOKORO_BUILD_ID">Test Fusion</a></li>
<li><a href="http://sponge/target:$KOKORO_JOB_NAME">Sponge - recent jobs</a></li>
</ul>
<h2>Non-Googler Links</h2>
107 changes: 107 additions & 0 deletions .kokoro/jax/build.sh
@@ -0,0 +1,107 @@
#!/bin/bash
# Copyright 2022 Google LLC All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# -e: abort script if one command fails
# -u: error if undefined variable used
# -o pipefail: entire command fails if pipe fails. watch out for yes | ...
# -o history: record shell history
set -euox pipefail -o history

# Builds + tests jaxlib against CL/PR version of XLA + JAX main.

source "${KOKORO_GFILE_DIR}/utils.sh"

function is_linux_gpu_job() {
[[ "$KOKORO_JOB_NAME" =~ tensorflow/xla/jax/.*gpu.* ]]
}

clone_main_jax() {
git clone https://github.com/google/jax.git
}

prelude() {
export JAX_ENABLE_X64=0

if is_linux_gpu_job ; then
export JAX_CUDA_VERSION=12
export JAX_CUDNN_VERSION=8.9
nvidia-smi
setup_env_vars_py39
else
setup_env_vars_py312
fi

cd "${KOKORO_ARTIFACTS_DIR}"

use_local_or_install_python
install_packages "$NUMPY_VERSION" "$SCIPY_VERSION"
clone_main_jax
# Install bazel
update_bazel_linux

cd jax

}

build_and_test_on_rbe_cpu() {
# Run the tests.
bazel \
test \
--verbose_failures=true \
--override_repository=xla="${KOKORO_ARTIFACTS_DIR}"/github/xla \
--config=avx_posix \
--config=mkl_open_source_only \
--config="rbe_cpu_linux_py3.12" \
--config=tensorflow_testing_rbe_linux \
--test_env=JAX_NUM_GENERATED_CASES=25 \
--test_output=errors \
-- //tests:cpu_tests //tests:backend_independent_tests
}

build_and_test_on_rbe_gpu() {
# Runs non-multiaccelerator tests with one GPU apiece.
# It appears --run_under needs an absolute path.

bazel \
test \
--verbose_failures=true \
--override_repository=xla="${KOKORO_ARTIFACTS_DIR}"/github/xla \
--config=avx_posix \
--config=mkl_open_source_only \
--config="rbe_linux_cuda12.3_nvcc_py3.9" \
--config=tensorflow_testing_rbe_linux \
--test_env=XLA_PYTHON_CLIENT_ALLOCATOR=platform \
--test_output=errors \
--test_env=JAX_SKIP_SLOW_TESTS=1 \
--test_env=TF_CPP_MIN_LOG_LEVEL=0 \
--test_env=JAX_EXCLUDE_TEST_TARGETS="PmapTest.testSizeOverflow" \
--test_tag_filters=-multiaccelerator \
-- //tests:gpu_tests //tests:backend_independent_tests
}

# Generate a templated results file to make output accessible to everyone
"$KOKORO_ARTIFACTS_DIR"/github/xla/.kokoro/generate_index_html.sh "$KOKORO_ARTIFACTS_DIR"/index.html

prelude

if is_linux_gpu_job ; then
build_and_test_on_rbe_gpu
else
build_and_test_on_rbe_cpu
fi

echo "bazel-testlogs (test results) location:"
find "$KOKORO_ARTIFACTS_DIR" \
-type l,d -name bazel-testlogs || echo "bazel-testlogs not found"
9 changes: 5 additions & 4 deletions .kokoro/linux/build.sh
@@ -44,16 +44,19 @@ RC_FILE="/usertools/cpu.bazelrc"
TARGET_FILTER=""
TAGS_FILTER="-no_oss,-oss_excluded,-oss_serial"
ADDITIONAL_FLAGS=""
RBE_CONFIG=""

if is_linux_gpu_job ; then
TAGS_FILTER="$TAGS_FILTER,gpu,requires-gpu-nvidia,-no_gpu"
ADDITIONAL_FLAGS="$ADDITIONAL_FLAGS --run_under=//tools/ci_build/gpu_build:parallel_gpu_execute"
RC_FILE="/usertools/gpu.bazelrc"
RBE_CONFIG="rbe_linux_cuda_nvcc"
echo "***NOTE: nvidia-smi lists the highest CUDA version the driver supports, which may be different than the version of CUDA actually used!!***"
nvidia-smi
else
TAGS_FILTER="$TAGS_FILTER,-gpu,-requires-gpu-nvidia"
ADDITIONAL_FLAGS="$ADDITIONAL_FLAGS --config=nonccl"
RBE_CONFIG="rbe_linux_cpu"
fi

# Build & test XLA
@@ -65,7 +68,7 @@ docker exec xla bazel --bazelrc=$RC_FILE \
--features=layering_check \
--profile=/tf/pkg/profile.json.gz \
--flaky_test_attempts=3 \
--config=rbe \
--config=$RBE_CONFIG \
--jobs=150 \
--nobuild_tests_only \
$ADDITIONAL_FLAGS \
@@ -74,9 +77,7 @@ docker exec xla bazel --bazelrc=$RC_FILE \

# Print build time statistics, including critical path.
docker exec xla bazel analyze-profile "/tf/pkg/profile.json.gz"
# TODO(ddunleavy): enable once container has clang-tidy
# docker exec xla git config --global --add safe.directory /tf/xla
# docker exec xla bash -c "bazel aquery --output=jsonproto \"mnemonic(CppCompile, //xla/...)\" | PYTHONPATH=.. python3 build_tools/lint/clang_tidy.py --changed_lines_only"

# Stop container
docker stop xla

5 changes: 0 additions & 5 deletions .kokoro/linux/cpu/build_cpu.cfg

This file was deleted.

13 changes: 0 additions & 13 deletions .kokoro/linux/cpu/common.cfg

This file was deleted.

5 changes: 0 additions & 5 deletions .kokoro/linux/gpu/build_gpu.cfg

This file was deleted.

13 changes: 0 additions & 13 deletions .kokoro/linux/gpu/common.cfg

This file was deleted.

8 changes: 8 additions & 0 deletions .kokoro/macos/build.sh
@@ -81,7 +81,15 @@ bazel test
--output_filter="" \
--macos_minimum_os=10.15 \
--keep_going \
--test_output=errors \
--config=nonccl \
--build_tag_filters=$TAGS_FILTER --test_tag_filters=$TAGS_FILTER \
--test_size_filters=small,medium \
-- //xla/... $TARGET_FILTER


# We want to store the individual test logs in GCS after the build finishes.
# Since the Bazel test output folder is in a different partition to the Kokoro
# artifacts folder, we need to symlink the test output folder to be in the
# artifacts folder.
ln -s /Volumes/BuildData/bazel_output "$KOKORO_ARTIFACTS_DIR"
14 changes: 0 additions & 14 deletions .kokoro/macos/cpu/common.cfg

This file was deleted.

5 changes: 0 additions & 5 deletions .kokoro/macos/cpu/cpu_py39_full.cfg

This file was deleted.

1 change: 0 additions & 1 deletion .kokoro/windows/cpu/build_cpu_py39.cfg

This file was deleted.

12 changes: 0 additions & 12 deletions .kokoro/windows/cpu/common.cfg

This file was deleted.

2 changes: 1 addition & 1 deletion build_tools/docker/context/install_bazel.sh
@@ -1,4 +1,4 @@
# Copyright 2023 The TensorFlow Authors. All Rights Reserved.
# Copyright 2023 The OpenXLA Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion build_tools/docker/context/install_python_deps.sh
@@ -1,4 +1,4 @@
# Copyright 2023 The TensorFlow Authors. All Rights Reserved.
# Copyright 2023 The OpenXLA Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion build_tools/docker/dockerfiles/benchmarking.Dockerfile
@@ -1,4 +1,4 @@
# Copyright 2023 The TensorFlow Authors. All Rights Reserved.
# Copyright 2023 The OpenXLA Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion build_tools/github_actions/build_xla.sh
@@ -1,6 +1,6 @@
#!/bin/bash

# Copyright 2023 The TensorFlow Authors. All Rights Reserved.
# Copyright 2023 The OpenXLA Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
29 changes: 17 additions & 12 deletions build_tools/lint/BUILD
@@ -1,4 +1,4 @@
# Copyright 2023 The TensorFlow Authors. All Rights Reserved.
# Copyright 2023 The OpenXLA Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -13,24 +13,14 @@
# limitations under the License.
# ============================================================================

load(
"//xla:pytype.default.bzl",
"pytype_strict_binary",
"pytype_strict_library",
)
load("//xla:pytype.default.bzl", "pytype_strict_library")
# Placeholder: load py_test

package(
# copybara:uncomment default_applicable_licenses = ["//tensorflow:license"],
licenses = ["notice"],
)

pytype_strict_binary(
name = "clang_tidy",
srcs = ["clang_tidy.py"],
deps = [":diff_parser"],
)

pytype_strict_library(
name = "check_contents",
srcs = ["check_contents.py"],
@@ -43,6 +33,11 @@ pytype_strict_library(
visibility = ["//visibility:public"],
)

pytype_strict_library(
name = "generate_compile_commands",
srcs = ["generate_compile_commands.py"],
)

py_test(
name = "check_contents_test",
srcs = ["check_contents_test.py"],
@@ -71,3 +66,13 @@ py_test(
"@absl_py//absl/testing:absltest",
],
)

py_test(
name = "generate_compile_commands_test",
srcs = ["generate_compile_commands_test.py"],
tags = ["no_oss"],
deps = [
":generate_compile_commands",
"@absl_py//absl/testing:absltest",
],
)