
LitePred: Inference latency differences between different versions of tflite benchmark #62

Open
JiacliUstc opened this issue Jan 6, 2025 · 0 comments


Hi, thank you for your nice work on LitePred.
I noticed that LitePred states that "different versions of tflite have different inference latency". However, I compiled the benchmark tool from different versions of the TensorFlow repository (built with Bazel), ran them on Android 10, and found that their GPU inference latencies are similar. How can I reproduce the version-dependent latency differences?
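For context, the binaries were produced roughly as follows. This is only a sketch of the build and deployment steps, not the exact commands used: the checkout tag, the renamed binary (tf2.7), and the on-device paths are assumptions inferred from the logs below.

```shell
# Check out a specific TensorFlow release before building (tag is an example).
git checkout v2.7.0

# Build the TFLite benchmark tool for 64-bit ARM Android.
bazel build -c opt --config=android_arm64 \
  //tensorflow/lite/tools/benchmark:benchmark_model

# Push the binary and model to the device, then run on the GPU delegate.
adb push bazel-bin/tensorflow/lite/tools/benchmark/benchmark_model \
  /data/local/tmp/test/tf2.7
adb shell /data/local/tmp/test/tf2.7 --use_gpu=true --warmup_runs=1 \
  --num_runs=50 --graph=alexnet/alexnet_tf_2_7.tflite --enable_op_profiling=true
```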
The test logs follow:
star2qltechn:/data/local/tmp/test $ ./1-5/tf2.1 --use_gpu=true --warmup_runs=1 --num_runs=50 --graph=alexnet/alexnet_tf_2_1.tflite --enable_op_profiling=true
STARTING!
Min num runs: [50]
Min runs duration (seconds): [1]
Max runs duration (seconds): [150]
Inter-run delay (seconds): [-1]
Num threads: [1]
Benchmark name: []
Output prefix: []
Min warmup runs: [1]
Min warmup runs duration (seconds): [0.5]
Graph: [alexnet/alexnet_tf_2_1.tflite]
Input layers: []
Input shapes: []
Input value ranges: []
Use nnapi : [0]
Use legacy nnapi : [0]
Use gpu : [1]
Allow lower precision in gpu : [1]
Allow fp16 : [0]
Require full delegation : [0]
Enable op profiling: [1]
Max profiling buffer entries: [1024]
Loaded model alexnet/alexnet_tf_2_1.tflite
resolved reporter
INFO: Initialized TensorFlow Lite runtime.
INFO: Created TensorFlow Lite delegate for GPU.
INFO: Initialized OpenCL-based API.
Applied GPU delegate.
Initialized session in 3977.45ms
[Init Phase] - Memory usage: max resident set size = 696.133 MB, total malloc-ed size = 1.50799 MB
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=24 first=31746 curr=22412 min=19189 max=31746 avg=21405.9 std=2399

Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=50 first=21568 curr=21257 min=20264 max=22469 avg=21337.2 std=576

[Overall] - Memory usage: max resident set size = 696.133 MB, total malloc-ed size = 1.67936 MB
Average inference timings in us: Warmup: 21405.9, Init: 3977452, no stats: 21337.2
============================== Run Order ==============================
[node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
Misc Runtime Ops 0.000 0.015 0.015 0.067% 0.067% 0.000 0 AllocateTensors/0
DELEGATE 0.000 21.558 21.328 99.933% 100.000% 0.000 0 [Identity]:17

============================== Top by Computation Time ==============================
[node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
DELEGATE 0.000 21.558 21.328 99.933% 99.933% 0.000 0 [Identity]:17
Misc Runtime Ops 0.000 0.015 0.015 0.067% 100.000% 0.000 0 AllocateTensors/0

Number of nodes executed: 2
============================== Summary by node type ==============================
[Node type] [count] [avg ms] [avg %] [cdf %] [mem KB] [times called]
DELEGATE 1 10.881 99.936% 99.936% 0.000 0
Misc Runtime Ops 1 0.007 0.064% 100.000% 0.000 0

Timings (microseconds): count=98 first=15 curr=21248 min=15 max=22461 avg=10889.1 std=10662
Memory (bytes): count=0
2 nodes observed

star2qltechn:/data/local/tmp/test $ ./1-5/tf2.7 --use_gpu=true --warmup_runs=1 --num_runs=50 --graph=alexnet/alexnet_tf_2_7.tflite --enable_op_profiling=true
STARTING!
Log parameter values verbosely: [0]
Min num runs: [50]
Min warmup runs: [1]
Graph: [alexnet/alexnet_tf_2_7.tflite]
Enable op profiling: [1]
Use gpu: [1]
Loaded model alexnet/alexnet_tf_2_7.tflite
INFO: Initialized TensorFlow Lite runtime.
INFO: Created TensorFlow Lite delegate for GPU.
GPU delegate created.
INFO: Replacing 18 node(s) with delegate (TfLiteGpuDelegateV2) node, yielding 1 partitions.
INFO: Initialized OpenCL-based API.
INFO: Created 1 GPU delegate kernels.
Explicitly applied GPU delegate, and the model graph will be completely executed by the delegate.
The input model file size (MB): 244.411
Initialized session in 1687.51ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=23 first=29248 curr=21244 min=20900 max=29248 avg=21976.8 std=1588

Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=50 first=22599 curr=21881 min=21313 max=22994 avg=22036.8 std=467

Inference timings in us: Init: 1687514, First inference: 29248, Warmup (avg): 21976.8, Inference (avg): 22036.8
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Memory footprint delta from the start of the tool (MB): init=725.035 overall=725.035
Profiling Info for Benchmark Initialization:
============================== Run Order ==============================
[node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
ModifyGraphWithDelegate 0.000 1685.944 1685.944 99.998% 99.998% 740452.000 1 ModifyGraphWithDelegate/0
AllocateTensors 1685.943 0.031 0.017 0.002% 100.000% 0.000 2 AllocateTensors/0

============================== Top by Computation Time ==============================
[node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
ModifyGraphWithDelegate 0.000 1685.944 1685.944 99.998% 99.998% 740452.000 1 ModifyGraphWithDelegate/0
AllocateTensors 1685.943 0.031 0.017 0.002% 100.000% 0.000 2 AllocateTensors/0

Number of nodes executed: 2
============================== Summary by node type ==============================
[Node type] [count] [avg ms] [avg %] [cdf %] [mem KB] [times called]
ModifyGraphWithDelegate 1 1685.944 99.998% 99.998% 740451.938 1
AllocateTensors 1 0.034 0.002% 100.000% 0.000 2

Timings (microseconds): count=1 curr=1685978
Memory (bytes): count=0
2 nodes observed

Operator-wise Profiling Info for Regular Benchmark Runs:
============================== Run Order ==============================
[node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
TfLiteGpuDelegateV2 0.039 22.467 21.949 100.000% 100.000% 0.000 1 [StatefulPartitionedCall:0]:18

============================== Top by Computation Time ==============================
[node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
TfLiteGpuDelegateV2 0.039 22.467 21.949 100.000% 100.000% 0.000 1 [StatefulPartitionedCall:0]:18

Number of nodes executed: 1
============================== Summary by node type ==============================
[Node type] [count] [avg ms] [avg %] [cdf %] [mem KB] [times called]
TfLiteGpuDelegateV2 1 21.948 100.000% 100.000% 0.000 1

Timings (microseconds): count=50 first=22467 curr=21798 min=21236 max=22905 avg=21948.9 std=463
Memory (bytes): count=0
1 nodes observed
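The gap between the two builds can be quantified directly from the steady-state timing lines above. A minimal sketch in Python; the parsing assumes the exact `avg=...` format shown in these logs:

```python
import re

def parse_avg_us(timing_line):
    """Extract the average latency (microseconds) from a 'count=...' benchmark line."""
    match = re.search(r"avg=([0-9.]+)", timing_line)
    return float(match.group(1))

# Steady-state (50-run) lines copied from the two logs above.
tf21 = parse_avg_us("count=50 first=21568 curr=21257 min=20264 max=22469 avg=21337.2 std=576")
tf27 = parse_avg_us("count=50 first=22599 curr=21881 min=21313 max=22994 avg=22036.8 std=467")

# Relative difference between the TF 2.1 and TF 2.7 builds.
diff_pct = 100.0 * (tf27 - tf21) / tf21
print(f"TF 2.1 avg: {tf21} us, TF 2.7 avg: {tf27} us, diff: {diff_pct:.1f}%")
```

The difference works out to roughly 3%, on the order of the run-to-run standard deviation (std ≈ 500 us on a ~21 ms mean), which is consistent with the observation that the two GPU latencies are similar.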
