forked from intel/nn-hal
AI benchmark improvements #26
Open
rairatne wants to merge 8 commits into projectceladon:master from rairatne:ai_benchmark_improvements
Conversation
- loadmodel RPC call added after sending IR files
- included data_type parameter for remote input data
- added check for remote output length

Tracked-On: OAM-110555
Signed-off-by: Ratnesh Kumar Rai <[email protected]>
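The output-length check above can be sketched as a simple guard on the RPC reply before it is copied into the output buffers (a hypothetical illustration; function and parameter names are not from the actual code):

```python
def check_remote_output(reply_bytes: bytes, expected_len: int) -> bytes:
    """Hypothetical guard: reject a remote inference reply whose payload
    length does not match the registered output buffer size."""
    if len(reply_bytes) != expected_len:
        raise RuntimeError(
            f"remote output length {len(reply_bytes)} != expected {expected_len}")
    return reply_bytes
```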
- increased gRPC message size limit to INT_MAX
- increased deadline for remote model load to 3 minutes
- changed mRemoteCheck from a global to a class member
- improved remote checks
- increased chunk size from 1 MB to 10 MB

Tracked-On: OAM-110557
Signed-off-by: Ratnesh Kumar Rai <[email protected]>
Signed-off-by: Anoob Anto K <[email protected]>
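The chunk-size bump matters because the IR files are streamed over gRPC in fixed-size pieces, so larger chunks mean fewer messages and less per-message overhead. A back-of-the-envelope sketch (blob size hypothetical):

```python
def num_chunks(total_bytes: int, chunk_bytes: int) -> int:
    """Number of gRPC stream messages needed to send a blob in fixed-size chunks."""
    return -(-total_bytes // chunk_bytes)  # ceiling division

MB = 1024 * 1024
# A hypothetical 100 MB OpenVINO IR blob:
print(num_chunks(100 * MB, 1 * MB))   # 1 MB chunks -> 100 messages
print(num_chunks(100 * MB, 10 * MB))  # 10 MB chunks -> 10 messages
```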
- added Hard Swish
- enabled Resize Bilinear for float16 and quant types
- enabled Resize Nearest Neighbor for float16 and quant types
- resolved quant type conversion for Quant Asymm and Signed for Split

Tracked-On: OAM-110564
Signed-off-by: Anoob Anto K <[email protected]>
Signed-off-by: Ratnesh Kumar Rai <[email protected]>
This fixes the following errors:
- Upcasting non-compliant model
- Upcasting non-compliant operand type TENSOR_QUANT8_ASYMM_SIGNED from V1_3::OperandType to V1_2::OperandType

Tracked-On: OAM-110572
Signed-off-by: Anoob Anto K <[email protected]>
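In NNAPI, TENSOR_QUANT8_ASYMM_SIGNED was introduced in HAL V1_3, so a model containing it cannot be represented as V1_2. The compliance check implied by the fix above can be sketched roughly as follows (the set contents are an illustrative subset, not the exhaustive list):

```python
# Operand types introduced in NNAPI HAL V1_3 (illustrative subset).
V1_3_ONLY_OPERAND_TYPES = {"TENSOR_QUANT8_ASYMM_SIGNED"}

def model_compliant_with_v1_2(operand_types) -> bool:
    """Hypothetical check: a model can only be converted down to V1_2
    if it uses no V1_3-only operand types."""
    return all(t not in V1_3_ONLY_OPERAND_TYPES for t in operand_types)
```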
- changed mDetectionClient from a global to a class object
- added tokens to identify a specific model request over gRPC
- added release RPC call to do cleanup on the remote side
- [To be fixed] removed remote infer for asyncExecute and fencedExecute, as the implementation was not proper

Tracked-On: OAM-110559
Signed-off-by: Anoob Anto K <[email protected]>
Signed-off-by: Ratnesh Kumar Rai <[email protected]>
Remove static variables:
- separate ModelInfo objects for each operation
- unmap the runtime memory pool at the end of each execute call
- optimised the network graph creator so that it can be released once the graph is created and loaded

Tracked-On: OAM-110558
Signed-off-by: Anoob Anto K <[email protected]>
Signed-off-by: Ratnesh Kumar Rai <[email protected]>
- if remote infer fails, disable parallel attempts for remote inference
- disable remote infer for quant type models

Tracked-On: OAM-110563
Signed-off-by: Ratnesh Kumar Rai <[email protected]>
Signed-off-by: Anoob Anto K <[email protected]>
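The gating described above can be sketched as a small state holder (names hypothetical; per the other commits, this state lives on the class rather than in a global):

```python
class RemoteInferGate:
    """Hypothetical sketch of the gating: remote inference is skipped for
    quantized models, and disabled entirely after a failure so parallel
    requests do not keep retrying a broken remote endpoint."""

    def __init__(self) -> None:
        self.remote_available = True

    def should_use_remote(self, is_quant_model: bool) -> bool:
        # Quant-type models always run natively; others only while
        # the remote endpoint has not failed.
        return self.remote_available and not is_quant_model

    def on_remote_failure(self) -> None:
        # One failure disables all further remote attempts.
        self.remote_available = False
```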
Split the previous loadNetwork into two parts:
- createNetwork: loads the generated graph network and dumps it as xml and bin
- loadNetwork: now reads the xml and bin and creates the infer request
- fall back to native inference if remote infer fails

Note: the fallback causes loadNetwork to trigger a load for native infer, which increases infer time in the fallback scenario; in the native-only case (no remote infer), compile_model is called twice, resulting in longer model load time.

Sub-Task JIRA: OAM-110562
Tracked-On: OAM-109729
Signed-off-by: Ratnesh Kumar Rai <[email protected]>
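The fallback flow above can be sketched with the two loaders injected as callables (a hypothetical simplification; in the PR, createNetwork serializes the graph to xml/bin and loadNetwork builds the infer request):

```python
def load_with_fallback(remote_load, native_load):
    """Try the remote loadmodel RPC first; fall back to native inference.

    Note the cost described in the commit message: the fallback triggers a
    second model compilation on the native path, so load time is longer in
    that scenario.
    """
    try:
        return "remote", remote_load()
    except RuntimeError:
        return "native", native_load()
```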
rairatne force-pushed the ai_benchmark_improvements branch from 6ead7e3 to ac80d67 on June 2, 2023 09:14
Modified nn-hal to improve memory utilization and scores in the AI Benchmark app.
Improvements include:
- better memory usage while doing parallel inference
- more operations enabled/added with float16 support
- offload to remote infer if available
- offload to remote only if the model is non-quant type
- for now, remote infer is only supported when NNAPI calls execute synchronously
- enable parallel remote inference
- supports dynamic input shapes and data-types for remote infer
Tracked-On: OAM-109729
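The dynamic-shape and data-type support for remote infer implies each request carries tensor metadata alongside the raw bytes. A hypothetical sketch of such a request payload (field names are illustrative, not the actual proto used here):

```python
import struct

def pack_input(data: bytes, shape, dtype: str) -> dict:
    """Hypothetical request payload: raw tensor bytes tagged with the
    dynamic shape and data_type the remote side needs to interpret them."""
    return {"data": data, "shape": list(shape), "data_type": dtype}

# A 1x4 float32 tensor packed for a remote inference call.
req = pack_input(struct.pack("4f", 1.0, 2.0, 3.0, 4.0), (1, 4), "float32")
```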