Directly send tensor via jit serialization #3088
Added support for BF16 in TensorDecomposer
The logic seems to apply only to LLM BF16. I think what we want to achieve is support for all tensors, BF16 or not.
I think we mixed two processes: conversion for filtering and conversion for communication. Consider local-to-server communication (the reverse is similar) with quantization. Currently our Client API executor defaults to PTtoNumpy as its to_nvflare_converter, so everything afterwards is in numpy, including the serialization part, and the tensor decomposer/composer is never called. If we instead use a simple "pass-through" to_nvflare_converter, it will have two implications:
Hence we can have two places with tensor<->numpy conversion: the converter for filters and the decomposer for communication. The first means all subsequent computation (filters) is in numpy, while the second means only the communication/serialization goes through numpy, and the data is recovered to a tensor once received, so "virtually" the whole pipeline is still in tensors. For serialization efficiency, my guess is that numpy may be more efficient than jit (@nvidianz to confirm), in which case jit is only needed for formats numpy does not support (e.g. bf16). Otherwise, we can use jit for all cases (and maybe safetensors, as suggested).
The reason to avoid the to_numpy() conversion is to avoid losing the tensor compression ratio, so that the tensor model doesn't grow after transfer. It doesn't matter whether this conversion happens in a filter or elsewhere: if we convert the tensor with jit in one place and use to_numpy in another place before sending over the wire, we have already lost the compression, and the JIT conversion becomes pointless. We need native tensors throughout the communication pipeline.
No, this is not the case. Using numpy + jit for serialization will not lead to bigger messages. Only the conversion for filtering will, because there we want everything in numpy and so have to cast bf16 to float32 so that it can be converted.
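To make the size argument concrete, here is a minimal standard-library sketch (not NVFlare code) of what the bf16-to-float32 cast does to the payload. The helper names are hypothetical, and it assumes bf16 is encoded by keeping the top 16 bits of each float32 (plain truncation, no rounding):

```python
import struct


def float32_to_bf16_bytes(values):
    """Encode floats as bf16 by keeping the top 16 bits of each float32.

    Hypothetical helper for illustration; truncates instead of rounding.
    """
    out = bytearray()
    for v in values:
        bits = struct.unpack("<I", struct.pack("<f", v))[0]
        out += struct.pack("<H", bits >> 16)
    return bytes(out)


def bf16_bytes_to_float32_bytes(payload):
    """Upcast bf16 to float32: same values, low 16 mantissa bits set to zero."""
    out = bytearray()
    for (half,) in struct.iter_unpack("<H", payload):
        out += struct.pack("<I", half << 16)
    return bytes(out)


# Values chosen to be exactly representable in bf16.
weights = [0.5, -1.25, 3.0, 100.0]
bf16_payload = float32_to_bf16_bytes(weights)
fp32_payload = bf16_bytes_to_float32_bytes(bf16_payload)

# 2 bytes per element in bf16, 4 bytes per element after the cast.
print(len(bf16_payload), len(fp32_payload))
```

The upcast carries no extra information, yet the buffer is twice as large, which is why a cast to float32 ahead of the wire would inflate the message.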
/build
/build
Need to change the package path.
/build
LGTM
Fixes # .
Description
Directly send tensor without converting to numpy
Using jit serialization to avoid pickle
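As a rough illustration of the idea, and not the actual TensorDecomposer code from this PR, a tensor can be round-tripped through TorchScript by registering it as a buffer on a small module. The names `TensorHolder`, `jit_serialize`, and `jit_deserialize` are hypothetical, and the sketch assumes PyTorch is installed:

```python
import io

import torch


class TensorHolder(torch.nn.Module):
    """Hypothetical wrapper: TorchScript saves modules, so the tensor is stored as a buffer."""

    def __init__(self, tensor: torch.Tensor):
        super().__init__()
        self.register_buffer("saved_tensor", tensor)


def jit_serialize(tensor: torch.Tensor) -> bytes:
    """Script the wrapper module and save it into an in-memory byte stream."""
    stream = io.BytesIO()
    torch.jit.save(torch.jit.script(TensorHolder(tensor)), stream)
    return stream.getvalue()


def jit_deserialize(data: bytes) -> torch.Tensor:
    """Load the scripted module and read the tensor back from its buffer."""
    return torch.jit.load(io.BytesIO(data)).saved_tensor


# bf16 has no numpy equivalent, so this tensor cannot go through .numpy() directly.
original = torch.randn(4, 3).to(torch.bfloat16)
restored = jit_deserialize(jit_serialize(original))

print(restored.dtype, restored.shape)
```

The dtype survives the round trip without an intermediate cast to float32, so the payload stays in the tensor's native precision.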
Types of changes
./runtest.sh