MobileNet: Update QNN EP example (#1563)

## Describe your changes

The MobileNet QNN EP example was outdated. QNN EP no longer requires an x86 environment for development and an ARM environment for inference. There is now a single Windows x86 package that can do both.

- Removed the `mobilenet_qnn_ep.py` launcher script, since a workflow JSON is enough.
- Added the dynamic -> static shape conversion as a pass instead of running it during model download. This makes the static shape requirement clear to the user.
- Simplified the data config.
- `download_files.py` now passes the command to the subprocess as a list instead of a string. Otherwise, paths could be split incorrectly.

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to update [example documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md) in a follow-up PR.
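
A note on the `download_files.py` subprocess change described in this PR: the sketch below illustrates why a command built as a single string can break on paths containing spaces, while a list keeps each argument intact. It assumes the string form is tokenized with shell-like rules (here via the standard library's `shlex.split`); the `stage_dir` value is a hypothetical path chosen for illustration and does not come from the repository.

```python
import shlex
from pathlib import PurePosixPath

# Hypothetical staging directory that happens to contain a space.
stage_dir = PurePosixPath("/tmp/my data/stage")
github_source = "https://github.com/EliSchwartz/imagenet-sample-images.git"

# String form: the path is interpolated unquoted, so shell-like
# tokenization splits it at the space into two separate arguments.
as_string = f"git clone {github_source} {stage_dir}"
tokens_from_string = shlex.split(as_string)

# List form: every element is passed through as exactly one argument.
as_list = ["git", "clone", github_source, str(stage_dir)]

print(tokens_from_string)
print(as_list)
```

With the string form, `git` would receive `/tmp/my` and `data/stage` as two separate arguments. The list form passes the directory as one argument regardless of its contents.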
microsoft · Jan 22, 2025 · commit da3b967
1 parent e510074

Showing 10 changed files with 59 additions and 200 deletions.
diff --git a/examples/mobilenet/README.md b/examples/mobilenet/README.md
@@ -1,17 +1,17 @@
 # MobileNet optimization with QDQ Quantization on Qualcomm NPU
-This folder contains a sample use case of Olive to optimize a MobileNet model for Qualcomm NPU (QNN Execution Provider)
-using static QDQ quantization.
+This folder contains a sample use case of Olive to optimize a MobileNet model for Qualcomm NPU (QNN Execution Provider) using static QDQ quantization.
 
-## Prerequisites for Quantization
-### Clone the repository and install Olive (x86 python)
+This example requires an x86 python environment on a Windows ARM machine.
 
-Refer to the instructions in the [examples README](../README.md) to clone the repository and install Olive.
 
-### Install onnxruntime (x86 python)
-This example requires onnxruntime>=1.17.0. Please install the latest version of onnxruntime:
+## Prerequisites
+### Clone the repository and install Olive
+
+Refer to the instructions in the [examples README](../README.md) to clone the repository and install Olive.
 
+### Install onnxruntime-qnn
 ```bash
-python -m pip install "onnxruntime>=1.17.0"
+python -m pip install onnxruntime-qnn
 ```
 
 ### Pip requirements
@@ -20,50 +20,16 @@ Install the necessary python packages:
 python -m pip install -r requirements.txt
 ```
 
-## Prerequisites for Evaluation
-
-### Download and unzip QNN SDK
-Download the Qualcomm AI Engine Direct SDK (QNN SDK) from https://qpm.qualcomm.com/main/tools/details/qualcomm_ai_engine_direct.
-
-Complete the steps to configure the QNN SDK for QNN EP as described in the [QNN EP Documentation](https://onnxruntime.ai/docs/execution-providers/QNN-ExecutionProvider.html#running-a-quantized-model-on-windows-arm64).
-
-Set the environment variable `QNN_LIB_PATH` as `QNN_SDK\lib\aarch64-windows-msvc`.
-
-### Install onnxruntime-qnn (ARM64 python)
-If you want to evaluate the quantized model on the NPU, you will need to install the onnxruntime-qnn package. This package is only available for Windows ARM64 python so you will need a separate ARM64 python installation to install it.
-
-Using an ARM64 python installation, create a virtual environment and install the onnxruntime-qnn package:
-```bash
-python -m venv qnn-ep-env
-qnn-ep-env\Scripts\activate
-python -m pip install ort-nightly-qnn --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/
-deactivate
+### Download data and model
+To download the necessary data and model files:
 ```
-
-Set the environment variable `QNN_ENV_PATH` to the directory where the python executable is located:
-```bash
-set QNN_ENV_PATH=C:\path\to\qnn-ep-env\Scripts
+python download_files.py
 ```
 
-**Note:** Using a virtual environment is optional but recommended to better manage the dependencies.
-
 ## Run the sample
-
-### Quantize the model
-Run the following command to quantize the model:
-```bash
-python mobilenet_qnn_ep.py
-```
-
-### Quantize and evaluate the model
 Run the following command to quantize the model and evaluate it on the NPU:
 ```bash
-python mobilenet_qnn_ep.py --evaluate
+olive run --config mobilenet_qnn_ep.json
 ```
 
-**NOTE:** You can also only dump the workflow configuration file by adding the `--config_only` flag to the command.
-
-The configuration file will be saved as `mobilenet_qnn_ep_{eval|no_eval}.json` in the current directory and can be run using the Olive CLI.
-```bash
-olive run --config mobilenet_qnn_ep_{eval|no_eval}.json
-```
+**NOTE:** The model optimization part of the workflow can also be done on a Linux machine with a different onnxruntime package installed. Remove the `"evaluators"` and `"evaluator"` sections from the `mobilenet_qnn_ep.json` configuration file to skip the evaluation step.
diff --git a/examples/mobilenet/README_QNN_SDK.md b/examples/mobilenet/README_QNN_SDK.md
@@ -17,7 +17,7 @@ olive configure-qualcomm-sdk --py_version 3.8 --sdk qnn
 
 ### Prepare workflow config json
 ```
-python prepare_config.py --use_raw_qnn_sdk
+python prepare_config.py
 ```
 
 ### Pip requirements

diff --git a/examples/mobilenet/download_files.py b/examples/mobilenet/download_files.py
@@ -54,10 +54,7 @@ def download_model():
         tar_ref.extractall(stage_dir)  # lgtm
     original_model_path = stage_dir / mobilenet_name / f"{mobilenet_name}.onnx"
     model_path = models_dir / f"{mobilenet_name}.onnx"
-    run_subprocess(
-        f"python -m onnxruntime.tools.make_dynamic_shape_fixed {original_model_path} {model_path} --dim_param"
-        " batch_size --dim_value 1"
-    )
+    shutil.copy(original_model_path, model_path)
 
 
 def download_eval_data():
@@ -69,7 +66,7 @@ def download_eval_data():
 
     # download evaluation data
     github_source = "https://github.com/EliSchwartz/imagenet-sample-images.git"
-    run_subprocess(f"git clone {github_source} {stage_dir}")
+    run_subprocess(["git", "clone", github_source, stage_dir], check=True)
 
     # sort jpegs
     jpegs = list(stage_dir.glob("*.JPEG"))

diff --git a/.../mobilenet/mobilenet_qnn_ep_template.json → examples/mobilenet/mobilenet_qnn_ep.json b/.../mobilenet/mobilenet_qnn_ep_template.json → examples/mobilenet/mobilenet_qnn_ep.json
@@ -4,27 +4,15 @@
         "local_system": {
             "type": "LocalSystem",
             "accelerators": [ { "execution_providers": [ "QNNExecutionProvider" ] } ]
-        },
-        "qnn_ep_env": {
-            "type": "IsolatedORT",
-            "python_environment_path": "<qnn_env_path>",
-            "accelerators": [ { "execution_providers": [ "QNNExecutionProvider" ] } ],
-            "preprend_to_path": [ "<qnn_lib_path>" ]
         }
     },
     "data_configs": [
         {
-            "name": "metric_data_config",
+            "name": "mobilenet_data_config",
             "user_script": "user_script.py",
-            "load_dataset_config": { "type": "qnn_evaluation_dataset", "data_dir": "data/eval" },
-            "post_process_data_config": { "type": "qnn_post_process" },
+            "load_dataset_config": { "type": "mobilenet_dataset", "data_dir": "data/eval" },
+            "post_process_data_config": { "type": "mobilenet_post_process" },
             "dataloader_config": { "batch_size": 1 }
-        },
-        {
-            "name": "quant_data_config",
-            "user_script": "user_script.py",
-            "load_dataset_config": { "type": "simple_dataset" },
-            "dataloader_config": { "type": "mobilenet_calibration_reader", "data_dir": "data/eval" }
         }
     ],
     "evaluators": {
@@ -33,7 +21,7 @@
                 {
                     "name": "accuracy",
                     "type": "accuracy",
-                    "data_config": "metric_data_config",
+                    "data_config": "mobilenet_data_config",
                     "sub_types": [
                         {
                             "name": "accuracy_score",
@@ -45,25 +33,26 @@
                 {
                     "name": "latency",
                     "type": "latency",
-                    "data_config": "metric_data_config",
+                    "data_config": "mobilenet_data_config",
                     "sub_types": [ { "name": "avg", "priority": 2 } ]
                 }
             ]
         }
     },
     "passes": {
+        "dynamic_shape_to_fixed": { "type": "DynamicToFixedShape", "dim_param": [ "batch_size" ], "dim_value": [ 1 ] },
         "qnn_preprocess": { "type": "QNNPreprocess" },
         "quantization": {
             "type": "OnnxStaticQuantization",
-            "data_config": "quant_data_config",
+            "data_config": "mobilenet_data_config",
             "activation_type": "QUInt16",
             "weight_type": "QUInt8"
         }
     },
-    "target": "<target>",
-    "evaluator": "<evaluator>",
+    "host": "local_system",
+    "target": "local_system",
+    "evaluator": "common_evaluator",
     "evaluate_input_model": false,
     "cache_dir": "cache",
-    "clean_cache": true,
     "output_dir": "models/mobilenet_qnn_ep"
 }
diff --git a/examples/mobilenet/mobilenet_qnn_ep.py b/examples/mobilenet/mobilenet_qnn_ep.py
diff --git a/examples/mobilenet/prepare_config.py b/examples/mobilenet/prepare_config.py
@@ -2,7 +2,6 @@
 # Copyright (c) Microsoft Corporation. All rights reserved.
 # Licensed under the MIT License.
 # --------------------------------------------------------------------------
-import argparse
 import json
 import platform
 from pathlib import Path
@@ -41,12 +40,4 @@ def raw_qnn_config():
 
 
 if __name__ == "__main__":
-    parser = argparse.ArgumentParser()
-    parser.add_argument(
-        "--use_raw_qnn_sdk",
-        action="store_true",
-        help="If set, use the raw qnn sdk instead of the qnn EP",
-    )
-    args = parser.parse_args()
-    if args.use_raw_qnn_sdk:
-        raw_qnn_config()
+    raw_qnn_config()
diff --git a/examples/mobilenet/raw_qnn_sdk_template.json b/examples/mobilenet/raw_qnn_sdk_template.json
@@ -43,6 +43,7 @@
         }
     },
     "passes": {
+        "dynamic_shape_to_fixed": { "type": "DynamicToFixedShape", "dim_param": [ "batch_size" ], "dim_value": [ 1 ] },
         "converter": { "type": "QNNConversion" },
         "quantization": { "type": "QNNConversion", "extra_args": "--input_list <input_list.txt>" },
         "build_model_lib": { "type": "QNNModelLibGenerator", "lib_targets": "x86_64-linux-clang" }

diff --git a/examples/mobilenet/user_script.py b/examples/mobilenet/user_script.py
@@ -6,11 +6,13 @@
 
 import numpy as np
 import torch
-from torch.utils.data import DataLoader, Dataset
+from torch.utils.data import Dataset
 
 from olive.data.registry import Registry
 from olive.platform_sdk.qualcomm.utils.data_loader import FileListProcessedDataLoader
 
+# QNN EP dataset and post-process functions
+
 
 class MobileNetDataset(Dataset):
     def __init__(self, data_dir: str):
@@ -28,7 +30,21 @@ def __len__(self):
     def __getitem__(self, idx):
         data = torch.unsqueeze(self.data[idx], dim=0)
         label = self.labels[idx] if self.labels is not None else -1
-        return {"input": data}, label
+        # need to remove the batch dimension, will be added by the dataloader
+        return {"input": data.squeeze(0)}, label
+
+
+@Registry.register_dataset()
+def mobilenet_dataset(data_dir, **kwargs):
+    return MobileNetDataset(data_dir)
+
+
+@Registry.register_post_process()
+def mobilenet_post_process(output):
+    return output.argmax(axis=1)
+
+
+# QNN SDK dataloader and post-process functions
 
 
 @Registry.register_dataloader()
@@ -40,22 +56,6 @@ def qnn_dataloader(dataset, data_dir: str, batch_size: int, **kwargs):
     )
 
 
-@Registry.register_dataset()
-def qnn_evaluation_dataset(data_dir, **kwargs):
-    return MobileNetDataset(data_dir)
-
-
-@Registry.register_post_process()
-def qnn_post_process(output):
-    return output.argmax(axis=1)
-
-
 @Registry.register_post_process()
 def qnn_sdk_post_process(output):
     return np.array([output.argmax(axis=-1)])
-
-
-@Registry.register_dataloader()
-def mobilenet_calibration_reader(dataset, batch_size, data_dir, **kwargs):
-    dataset = MobileNetDataset(data_dir)
-    return DataLoader(dataset, batch_size=batch_size, shuffle=False)
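
A note on the `__getitem__` change in `user_script.py`: the new comment says the batch dimension is removed because the dataloader adds it. The sketch below works through the shape arithmetic using plain tuples. It assumes default stacking collation (items stacked along a new leading axis) and an illustrative per-image shape of `(3, 224, 224)`; the helper `collate_shape` is hypothetical and not part of Olive or PyTorch.

```python
def collate_shape(item_shape, batch_size):
    """Shape obtained by stacking `batch_size` items along a new leading axis."""
    return (batch_size, *item_shape)


# Assumed per-image shape (channels, height, width), for illustration only.
image_shape = (3, 224, 224)

# If each item keeps its own leading batch axis, stacking adds a second one.
item_with_batch_axis = (1, *image_shape)
stacked_with_extra_axis = collate_shape(item_with_batch_axis, batch_size=1)

# If each item is squeezed first, stacking yields the usual 4-D input.
stacked_after_squeeze = collate_shape(image_shape, batch_size=1)

print(stacked_with_extra_axis)
print(stacked_after_squeeze)
```

Under these assumptions, only the squeezed variant matches the static `batch_size = 1` input shape that the `dynamic_shape_to_fixed` pass fixes in the workflow config.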