diff --git a/README.md b/README.md
index 7facdc71..48850672 100644
--- a/README.md
+++ b/README.md
@@ -98,16 +98,16 @@ After installation, a command named `nn-meter` is enabled. To predict the latenc
 
 ```bash
 # for Tensorflow (*.pb) file
-nn-meter lat_pred --predictor <hardware> [--predictor-version <version>] --tensorflow <pb-file_or_folder> 
+nn-meter predict --predictor <hardware> [--predictor-version <version>] --tensorflow <pb-file_or_folder> 
 
 # for ONNX (*.onnx) file
-nn-meter lat_pred --predictor <hardware> [--predictor-version <version>] --onnx <onnx-file_or_folder>
+nn-meter predict --predictor <hardware> [--predictor-version <version>] --onnx <onnx-file_or_folder>
 
 # for torch model from torchvision model zoo (str)
-nn-meter lat_pred --predictor <hardware> [--predictor-version <version>] --torchvision <model-name> <model-name>... 
+nn-meter predict --predictor <hardware> [--predictor-version <version>] --torchvision <model-name> <model-name>... 
 
 # for nn-Meter IR (*.json) file
-nn-meter lat_pred --predictor <hardware> [--predictor-version <version>] --nn-meter-ir <json-file_or_folder> 
+nn-meter predict --predictor <hardware> [--predictor-version <version>] --nn-meter-ir <json-file_or_folder> 
 ```
 
 `--predictor-version <version>` arguments is optional. When the predictor version is not specified by users, nn-meter will use the latest version of the predictor.
diff --git a/docs/dataset.md b/docs/dataset.md
new file mode 100644
index 00000000..8a363410
--- /dev/null
+++ b/docs/dataset.md
@@ -0,0 +1,11 @@
+# Benchmark dataset
+
+To evaluate the effectiveness of a prediction model on an arbitrary DNN model, we need a representative dataset that covers a large prediction scope. nn-Meter collects and generates 26k CNN models (please refer to the paper for the dataset generation method).
+
+We release the dataset and provide the `nn_meter.dataset` interface for accessing it. When called, this interface automatically downloads the nn-Meter bench dataset and returns its path. Users can also download the data themselves from the [Download Link](https://github.com/microsoft/nn-Meter/releases/download/v1.0-data/datasets.zip). This [example](../examples/nn-meter_predictor_for_bench_dataset.ipynb) shows how to use the nn-Meter predictor to predict latency for the bench dataset.
+
+**Note:** to measure the inference latency of the models in this dataset, we generated TensorFlow pb and tflite models and measured their latency on the target devices. However, since storing the full dataset requires hundreds of GB of storage, we did not include these model files. Instead, we parsed the pb files and recorded the model structures and parameters in 
+`nn_meter.dataset`.
+
+Since the dataset is encoded in a graph format, we also provide the `nn_meter.dataset.gnn_dataloader` interface for GNN training. Through this interface, `GNNDataset` and `GNNDataloader` convert the model structures of the bench dataset, stored in `.jsonl` format, into the dataset and data loader required for GNN training. Users can refer to this [example](../examples/nn-meter_dataset_for_gnn.ipynb) for more information about `gnn_dataloader`. Note that to use the nn-Meter bench dataset for GNN training, the packages `torch` and `dgl` must be installed.
+
diff --git a/docs/input_models.md b/docs/input_models.md
index 5424616d..50f98340 100644
--- a/docs/input_models.md
+++ b/docs/input_models.md
@@ -10,17 +10,17 @@ You can save tensorflow models into frozen pb formats, and use the following nn-
 
 ```bash
 # for Tensorflow (*.pb) file
-nn-meter --predictor <hardware> --tensorflow <pb-file> 
+nn-meter predict --predictor <hardware> [--predictor-version <version>] --tensorflow <pb-file_or_folder> 
 ```
 
 For the other frameworks (e.g., PyTorch), you can convert the models into onnx models, and use the following nn-meter command to predict the latency:
 
 ```bash
 # for ONNX (*.onnx) file
-nn-meter --predictor <hardware> --onnx <onnx-file>
+nn-meter predict --predictor <hardware> [--predictor-version <version>] --onnx <onnx-file_or_folder>
 ```
 
-You can download the test [tensorflow models]("https://github.com/Lynazhang/nnmeter/releases/download/0.1/pb_models.zip") and [onnx models](https://github.com/Lynazhang/nnmeter/releases/download/0.1/onnx_models.zip). 
+You can download the test [tensorflow models](https://github.com/microsoft/nn-Meter/releases/download/v1.0-data/pb_models.zip) and [onnx models](https://github.com/microsoft/nn-Meter/releases/download/v1.0-data/onnx_models.zip). 
 
 ### Input model as a code object
 
@@ -29,7 +29,7 @@ You can also directly apply nn-Meter in your python code. In this case, please d
 ```python
 from nn_meter import load_latency_predictor
 
-predictor = load_lat_predictor(hardware_name) # case insensitive in backend
+predictor = load_latency_predictor(hardware_name) # case insensitive in backend
 
 # build your model here
 model = ... # model is instance of torch.nn.Module
@@ -57,14 +57,14 @@ For a *node*, we use the identical node name ("conv1.conv/Conv2D") as the node k
 * outbounds: a list of outgoing node names. The inbounds and outbounds describe the node connections.
 * attr: a set of attributes for the node. The attributes can be different for different types of NN node.
 
-You can download the example nn-Meter IR graphs through [here](https://github.com/Lynazhang/nnmeter/releases/download/0.1/ir_graphs.zip).
+You can download the example nn-Meter IR graphs through [here](https://github.com/microsoft/nn-Meter/releases/download/v1.0-data/ir_graphs.zip).
 
 When you have a large amount of models to predict, you can also convert them into nn-Meter IR graphs to save the pre-processing time:
 
 ```
 # for Tensorflow (*.pb) file
-nn-meter getir --tensorflow <pb-file> --output <output-name>
+nn-meter get_ir --tensorflow <pb-file> [--output <output-name>]
 
 # for ONNX (*.onnx) file
-nn-meter getir --onnx <onnx-file> --output <output-name>
+nn-meter get_ir --onnx <onnx-file> [--output <output-name>]
 ```
diff --git a/docs/overview.md b/docs/overview.md
index 983abbed..2ee6020b 100644
--- a/docs/overview.md
+++ b/docs/overview.md
@@ -19,6 +19,8 @@ If you have a new hardware to predict DNN latency,  a re-run of nn-Meter is requ
 ## Learn More
 - [Get started](quick_start.md)
 
-- [How to use nn-Meter](usage.md)
+- [How to use nn-Meter Predictor](predictor/usage.md)
 
-- [nn-meter in hardware-aware NAS](hardware-aware-model-design.md)
\ No newline at end of file
+- [nn-Meter in hardware-aware NAS](predictor/hardware-aware-model-design.md)
+
+- [nn-Meter bench dataset](dataset.md)
\ No newline at end of file
diff --git a/docs/hardware-aware-model-design.md b/docs/predictor/hardware-aware-model-design.md
similarity index 99%
rename from docs/hardware-aware-model-design.md
rename to docs/predictor/hardware-aware-model-design.md
index 2fda6b59..ea4b2e2e 100644
--- a/docs/hardware-aware-model-design.md
+++ b/docs/predictor/hardware-aware-model-design.md
@@ -1,39 +1,39 @@
-# Hardware-aware DNN Model Design
-
-In many DNN model deployment scenarios, there are strict inference efficiency constraints as well as the model accuracy. For example, the **inference latency** and **energy consumption** are the most frequently used criteria of efficiencies to determine whether a DNN model could be deployed on a mobile phone or not. Therefore, DNN model designers have to consider the model efficiency. A typical methodology is to train a big model to meet the accuracy requirements first, and then apply model compression algorithms to get a light-weight model with similar accuracy but much smaller size. Due to many reasons, they use the number of parameters and FLOPs in the compression process.
-
-However, as pointed out in our work [[1]](https://openaccess.thecvf.com/content_CVPRW_2020/papers/w40/Zhang_Fast_Hardware-Aware_Neural_Architecture_Search_CVPRW_2020_paper.pdf) and many others, ***neither number of parameters nor number of FLOPs is a good metric of the real inference efficiency (e.g., latency or energy consumption)***. Operators with similar FLOPs may have very different inference latency on different hardware platforms (e.g., CPU, GPU, and ASIC) (shown in work [[1]](https://openaccess.thecvf.com/content_CVPRW_2020/papers/w40/Zhang_Fast_Hardware-Aware_Neural_Architecture_Search_CVPRW_2020_paper.pdf) and [[3]](https://proceedings.mlsys.org/paper/2021/file/02522a2b2726fb0a03bb19f2d8d9524d-Paper.pdf)). This makes the effort of designing efficient DNN models for a target hardware bit of games of opening blind boxes. Recently, many hardware-aware NAS works are proposed to solve this challenge.
-
-Compared with the conventional NAS algorithms, some recent works (i.e. hardware-aware NAS, aka HW-NAS) integrated hardware-awareness into the search loop and achieves a balanced trade-off between accuracy and hardware efficiencies [[4]](http://arxiv.org/abs/2101.09336).
-
-Next, we introduce our hardware-aware NAS framework[[1]](https://openaccess.thecvf.com/content_CVPRW_2020/papers/w40/Zhang_Fast_Hardware-Aware_Neural_Architecture_Search_CVPRW_2020_paper.pdf), which combines the nn-Meter, to search high-accuracy DNN models within the latency constraints for target edge devices.
-
-## Hardware-aware Neural Architecture Search
-
-<img src="imgs/hw-nas.png" alt="drawing" width="800"/>
-
-**Hardware-aware Search Space Generation.** As formulated in many works, the search space is one of the three key aspects of a NAS process (the other two are the search strategy and the evaluation methodology) and matters a lot to the final results.
-
-Our HW-NAS framework firstly automatically selects the hardware-friendly operators (or blocks) by considering both representation capacity and hardware efficiency. The selected operators could establish a ***hardware-aware search space*** for most of existing NAS algorithms.
-
-**Latency Prediction in search process by nn-Meter.** Different with other simple predictors (e.g., look-up table for operators/blocks, linear regression models), [nn-Meter](overview.md) conducts kernel-level prediction, which captures the complex model graph optimizations on edge devices. nn-Meter is the first accurate latency prediction tool for DNNs on edge devices.
-
-Besides the search space specialization, our HW-NAS framework also allows combining nn-Meter with existing NAS algorithms in the optimization objectives and constraints. As described in [[4]](http://arxiv.org/abs/2101.09336), the HW-NAS algorithms often consider hardware efficiency metrics as the constraints of existing NAS formulation or part of the scalarized loss functions (e.g., the loss is weighted sum of both cross entropy loss and hardware-aware penalty). Since the NAS process may sample up to millions of candidate model architectures, the obtaining of hardware metrics must be accurate and efficient.
-
-nn-Meter is now integrated with [NNI](https://github.com/microsoft/nni), the AutoML framework also published by Microsoft, and could be combined with existing NAS algorithms seamlessly. [This doc](https://nni.readthedocs.io/en/stable/NAS/HardwareAwareNAS.html#endtoend-multi-trial-spos-demo) show how to construct a latency constraint filter in [random search algorithm](https://arxiv.org/abs/1902.07638) on [SPOS NAS](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123610528.pdf) search space. Users could use this filter in multiple phases of the NAS process, e.g., the architecture searching phase and the super-net training phase. 
-
-Another example is [ProxylessNAS](https://arxiv.org/pdf/1812.00332.pdf), a hardware-aware one-shot NAS algorithm. ProxylessNAS applies the expected latency of the model to build a differentiable metric and design efficient neural network architectures for hardware. The latency loss is added as a regularization term for architecture parameter optimization. The [official implementation](https://github.com/mit-han-lab/ProxylessNAS) of ProxylessNAS supports different targeted hardware, including 'mobile', 'cpu', 'gpu8', 'flops'. In the [current implementation](https://nni.readthedocs.io/en/stable/NAS/Proxylessnas.html) on NNI, users could use a latency estimator based on nn-Meter to predict expected latency for the mixed operation on other types of mobile and edge hardware. By using nn-Meter, please specifying the arguments of `--applied_hardware <hardware> --reference_latency <reference latency (ms)>` in the [example](https://github.com/microsoft/nni/blob/master/examples/nas/oneshot/proxylessnas/main.py).
-
-***Note that current nn-Meter project is limited to the latency prediction. For the other hardware metrics, e.g., energy consumption is another important metric in edge computing. Collaborations and contributions together with nn-Meter are highly welcomed!***
-
-## Other hardware-aware techniques
-
-Besides light weighted NAS, which search for an efficient architecture directly, there are also other techniques to achieve light weight DNN models, such as model compression and knowledge distillation (KD). Both methods tries to get a smaller but similar-performed models from a pre-trained big model. The difference is that model compression removes some of the components in the origin model, while knowledge distillation constructs a new student model and lets it learn the behavior of the origin model. Hardware awareness could also be combined with these methods.
-For example, nn-Meter could help users to construct suitable student architectures for the target hardware platform in the KD task.
-
-## References
-
-1. Li Lyna Zhang, Yuqing Yang, Yuhang Jiang, Wenwu Zhu, Yunxin Liu: [&#34;Fast hardware-aware neural architecture search.&#34;](https://openaccess.thecvf.com/content_CVPRW_2020/papers/w40/Zhang_Fast_Hardware-Aware_Neural_Architecture_Search_CVPRW_2020_paper.pdf) Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. 2020.
-2. Li Lyna Zhang, Shihao Han, Jianyu Wei, Ningxin Zheng, Ting Cao, Yuqing Yang, Yunxin Liu: [&#34;nn-Meter: Towards Accurate Latency Prediction of Deep-Learning Model Inference on Diverse Edge Devices.&#34;](https://dl.acm.org/doi/10.1145/3458864.3467882) Proceedings of the 19th ACM International Conference on Mobile Systems, Applications, and Services (MobiSys 2021)
-3. Xiaohu Tang, Shihao Han, Li Lyna Zhang, Ting Cao, Yunxin Liu: [&#34;To Bridge Neural Network Design and Real-World Performance: A Behaviour Study for Neural Networks&#34;](https://proceedings.mlsys.org/paper/2021/file/02522a2b2726fb0a03bb19f2d8d9524d-Paper.pdf) Proceedings of the 4th MLSys Conference (MLSys 2021)
-4. Benmeziane, H., Maghraoui, K. el, Ouarnoughi, H., Niar, S., Wistuba, M., & Wang, N. (2021).[&#34; A Comprehensive Survey on Hardware-Aware Neural Architecture Search.&#34;](http://arxiv.org/abs/2101.09336)
+# Hardware-aware DNN Model Design
+
+In many DNN model deployment scenarios, there are strict constraints on inference efficiency as well as on model accuracy. For example, **inference latency** and **energy consumption** are the most frequently used efficiency criteria for determining whether a DNN model can be deployed on a mobile phone. Therefore, DNN model designers have to consider model efficiency. A typical methodology is to first train a big model that meets the accuracy requirements, and then apply model compression algorithms to get a lightweight model with similar accuracy but a much smaller size. For various reasons, designers commonly use the number of parameters and FLOPs in the compression process.
+
+However, as pointed out in our work [[1]](https://openaccess.thecvf.com/content_CVPRW_2020/papers/w40/Zhang_Fast_Hardware-Aware_Neural_Architecture_Search_CVPRW_2020_paper.pdf) and many others, ***neither the number of parameters nor the number of FLOPs is a good metric of the real inference efficiency (e.g., latency or energy consumption)***. Operators with similar FLOPs may have very different inference latency on different hardware platforms (e.g., CPU, GPU, and ASIC), as shown in [[1]](https://openaccess.thecvf.com/content_CVPRW_2020/papers/w40/Zhang_Fast_Hardware-Aware_Neural_Architecture_Search_CVPRW_2020_paper.pdf) and [[3]](https://proceedings.mlsys.org/paper/2021/file/02522a2b2726fb0a03bb19f2d8d9524d-Paper.pdf). This makes designing efficient DNN models for a target hardware platform a bit like opening blind boxes. Recently, many hardware-aware NAS works have been proposed to address this challenge.
+
+Compared with conventional NAS algorithms, some recent works (i.e., hardware-aware NAS, a.k.a. HW-NAS) integrate hardware awareness into the search loop and achieve a balanced trade-off between accuracy and hardware efficiency [[4]](http://arxiv.org/abs/2101.09336).
+
+Next, we introduce our hardware-aware NAS framework [[1]](https://openaccess.thecvf.com/content_CVPRW_2020/papers/w40/Zhang_Fast_Hardware-Aware_Neural_Architecture_Search_CVPRW_2020_paper.pdf), which incorporates nn-Meter to search for high-accuracy DNN models within the latency constraints of target edge devices.
+
+## Hardware-aware Neural Architecture Search
+
+<img src="../imgs/hw-nas.png" alt="drawing" width="800"/>
+
+**Hardware-aware Search Space Generation.** As formulated in many works, the search space is one of the three key aspects of a NAS process (the other two are the search strategy and the evaluation methodology) and matters a lot to the final results.
+
+Our HW-NAS framework first automatically selects hardware-friendly operators (or blocks) by considering both representation capacity and hardware efficiency. The selected operators can establish a ***hardware-aware search space*** for most existing NAS algorithms.
+
+**Latency Prediction in search process by nn-Meter.** Unlike other simple predictors (e.g., look-up tables for operators/blocks, linear regression models), [nn-Meter](../overview.md) conducts kernel-level prediction, which captures the complex model graph optimizations on edge devices. nn-Meter is the first accurate latency prediction tool for DNNs on edge devices.
+
+Besides search space specialization, our HW-NAS framework also allows combining nn-Meter with existing NAS algorithms in the optimization objectives and constraints. As described in [[4]](http://arxiv.org/abs/2101.09336), HW-NAS algorithms often treat hardware efficiency metrics as constraints on the existing NAS formulation or as part of a scalarized loss function (e.g., the loss is a weighted sum of the cross-entropy loss and a hardware-aware penalty). Since the NAS process may sample up to millions of candidate model architectures, hardware metrics must be obtained accurately and efficiently.
+
+nn-Meter is now integrated with [NNI](https://github.com/microsoft/nni), the AutoML framework also published by Microsoft, and can be combined with existing NAS algorithms seamlessly. [This doc](https://nni.readthedocs.io/en/stable/NAS/HardwareAwareNAS.html#endtoend-multi-trial-spos-demo) shows how to construct a latency constraint filter in the [random search algorithm](https://arxiv.org/abs/1902.07638) on the [SPOS NAS](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123610528.pdf) search space. Users can use this filter in multiple phases of the NAS process, e.g., the architecture searching phase and the super-net training phase.
+
+Another example is [ProxylessNAS](https://arxiv.org/pdf/1812.00332.pdf), a hardware-aware one-shot NAS algorithm. ProxylessNAS uses the expected latency of the model to build a differentiable metric and design efficient neural network architectures for the target hardware. The latency loss is added as a regularization term for architecture parameter optimization. The [official implementation](https://github.com/mit-han-lab/ProxylessNAS) of ProxylessNAS supports different targeted hardware, including 'mobile', 'cpu', 'gpu8', and 'flops'. In the [current implementation](https://nni.readthedocs.io/en/stable/NAS/Proxylessnas.html) on NNI, users can use a latency estimator based on nn-Meter to predict the expected latency of the mixed operation on other types of mobile and edge hardware. To use nn-Meter, specify the arguments `--applied_hardware <hardware> --reference_latency <reference latency (ms)>` in the [example](https://github.com/microsoft/nni/blob/master/examples/nas/oneshot/proxylessnas/main.py).
+
+***Note that the current nn-Meter project is limited to latency prediction. Other hardware metrics, e.g., energy consumption, are also important in edge computing. Collaborations and contributions to nn-Meter are highly welcome!***
+
+## Other hardware-aware techniques
+
+Besides lightweight NAS, which searches for an efficient architecture directly, there are other techniques for obtaining lightweight DNN models, such as model compression and knowledge distillation (KD). Both methods try to derive a smaller model with similar performance from a pre-trained big model. The difference is that model compression removes some of the components of the original model, while knowledge distillation constructs a new student model and lets it learn the behavior of the original model. Hardware awareness can also be combined with these methods.
+For example, nn-Meter could help users to construct suitable student architectures for the target hardware platform in the KD task.
+
+## References
+
+1. Li Lyna Zhang, Yuqing Yang, Yuhang Jiang, Wenwu Zhu, Yunxin Liu: ["Fast hardware-aware neural architecture search."](https://openaccess.thecvf.com/content_CVPRW_2020/papers/w40/Zhang_Fast_Hardware-Aware_Neural_Architecture_Search_CVPRW_2020_paper.pdf) Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. 2020.
+2. Li Lyna Zhang, Shihao Han, Jianyu Wei, Ningxin Zheng, Ting Cao, Yuqing Yang, Yunxin Liu: ["nn-Meter: Towards Accurate Latency Prediction of Deep-Learning Model Inference on Diverse Edge Devices."](https://dl.acm.org/doi/10.1145/3458864.3467882) Proceedings of the 19th ACM International Conference on Mobile Systems, Applications, and Services (MobiSys 2021).
+3. Xiaohu Tang, Shihao Han, Li Lyna Zhang, Ting Cao, Yunxin Liu: ["To Bridge Neural Network Design and Real-World Performance: A Behaviour Study for Neural Networks."](https://proceedings.mlsys.org/paper/2021/file/02522a2b2726fb0a03bb19f2d8d9524d-Paper.pdf) Proceedings of the 4th MLSys Conference (MLSys 2021).
+4. Benmeziane, H., Maghraoui, K. el, Ouarnoughi, H., Niar, S., Wistuba, M., & Wang, N. (2021). ["A Comprehensive Survey on Hardware-Aware Neural Architecture Search."](http://arxiv.org/abs/2101.09336)
diff --git a/docs/usage.md b/docs/predictor/usage.md
similarity index 93%
rename from docs/usage.md
rename to docs/predictor/usage.md
index 4b6fad7e..def7e76d 100644
--- a/docs/usage.md
+++ b/docs/predictor/usage.md
@@ -1,8 +1,8 @@
-# Usage
+# Usage of nn-Meter Predictor
 
 To apply for hardware latency prediction, nn-Meter provides two types of interfaces：
 
-- command line `nn-meter` after `nn-meter` [installation](QuickStart.md#Installation).
+- command line `nn-meter` after `nn-meter` [installation](../quick_start.md#Installation).
 - Python binding provided by the module `nn_meter`
 
 Here is a summary of supported inputs of the two methods.
@@ -12,7 +12,7 @@ Here is a summary of supported inputs of the two methods.
 |    Tensorflow    |         Checkpoint file dumped by `tf.saved_model()` and end with `.pb`         |                          Checkpoint file dumped by `tf.saved_model` and end with `.pb`                          |
 |       Torch       |                          Models in `torchvision.models`                          |                                            Object of `torch.nn.Module`                                            |
 |       Onnx       |           Checkpoint file dumped by `onnx.save()` and end with `.onnx`           |                    Checkpoint file dumped by `onnx.save()` or model loaded by `onnx.load()`                    |
-| nn-Meter IR graph | Json file in the format of [nn-Meter IR Graph](input_models.md#nnmeter-ir-graph) |          `dict` object following the format of [nn-Meter IR Graph](input_models.md#nnmeter-ir-graph)          |
+| nn-Meter IR graph | Json file in the format of [nn-Meter IR Graph](../input_models.md#nnmeter-ir-graph) |          `dict` object following the format of [nn-Meter IR Graph](../input_models.md#nnmeter-ir-graph)          |
 |   NNI IR graph   |                                          -                                          | NNI IR graph object |
 
 In both methods, users could appoint predictor name and version to target a specific hardware platform (device). Currently, nn-Meter supports prediction on the following four configs:
@@ -36,16 +36,16 @@ After installation, a command named `nn-meter` is enabled. To predict the latenc
 
 ```bash
 # for Tensorflow (*.pb) file
-nn-meter lat_pred --predictor <hardware> [--predictor-version <version>] --tensorflow <pb-file_or_folder> 
+nn-meter predict --predictor <hardware> [--predictor-version <version>] --tensorflow <pb-file_or_folder> 
 
 # for ONNX (*.onnx) file
-nn-meter lat_pred --predictor <hardware> [--predictor-version <version>] --onnx <onnx-file_or_folder>
+nn-meter predict --predictor <hardware> [--predictor-version <version>] --onnx <onnx-file_or_folder>
 
 # for torch model from torchvision model zoo (str)
-nn-meter lat_pred --predictor <hardware> [--predictor-version <version>] --torchvision <model-name> <model-name>... 
+nn-meter predict --predictor <hardware> [--predictor-version <version>] --torchvision <model-name> <model-name>... 
 
 # for nn-Meter IR (*.json) file
-nn-meter lat_pred --predictor <hardware> [--predictor-version <version>] --nn-meter-ir <json-file_or_folder> 
+nn-meter predict --predictor <hardware> [--predictor-version <version>] --nn-meter-ir <json-file_or_folder> 
 ```
 
 `--predictor-version <version>` arguments is optional. When the predictor version is not specified by users, nn-meter will use the latest version of the predictor.
diff --git a/docs/quick_start.md b/docs/quick_start.md
index 74970640..61a84c7b 100644
--- a/docs/quick_start.md
+++ b/docs/quick_start.md
@@ -28,7 +28,7 @@ nn-Meter is a latency predictor of models with type of Tensorflow, PyTorch, Onnx
 |    nn-Meter IR graph  |   ---                                                  |
 |      NNI IR graph     |  `nni>=2.4`                                            |
 
-[1] Please refer to [nn-Meter Usage](usage.md#torch-model-converters) for more information.
+[1] Please refer to [nn-Meter Usage](predictor/usage.md#torch-model-converters) for more information.
 
 Please also check the versions of `numpy` and `scikit_learn`. The different versions may change the prediction accuracy of kernel predictors.
 
@@ -59,4 +59,4 @@ if __name__ == '__main__':
     main()
 ```
 
-For more detailed usage of nn-Meter, please refer to [this doc](usage.md).
+For more detailed usage of nn-Meter, please refer to [this doc](predictor/usage.md).
diff --git a/docs/requirements.txt b/docs/requirements/requirements.txt
similarity index 100%
rename from docs/requirements.txt
rename to docs/requirements/requirements.txt
diff --git a/examples/README.md b/examples/README.md
index 41599e76..af15f6d4 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -2,26 +2,24 @@
 
 In this folder, we provide several examples to show the usage of nn-Meter package.
 
-The first example [1. Use nn-Meter for models with different format](nn-meter_for_different_model_format.ipynb) shows the basic python binding usage of nn-meter with models with different format of Tensorflow, PyTorch and ONNX model.
-
-#### Benchmark dataset
+The first example [1. Use nn-Meter Predictor for models with different format](nn-meter_predictor_for_different_model_format.ipynb) shows the basic Python binding usage of nn-Meter with models in different formats: TensorFlow, PyTorch, and ONNX.
 
 To evaluate the effectiveness of a prediction model on an arbitrary DNN model, we need a representative dataset that covers a large prediction scope. nn-Meter collects and generates 26k CNN models. (Please refer the paper for the dataset generation method.)
 
 We release the dataset, and provide an interface of `nn_meter.dataset` for users to get access to the dataset. Users can also download the data from the [Download Link](https://github.com/microsoft/nn-Meter/releases/download/v1.0-data/datasets.zip) on their own. 
 
-Example [2. Use nn-Meter with the bench dataset](nn-meter_for_bench_dataset.ipynb) shows how to use nn-Meter to predict latency for the bench dataset.
+Example [2. Use nn-Meter with the bench dataset](nn-meter_predictor_for_bench_dataset.ipynb) shows how to use nn-Meter to predict latency for the bench dataset.
 
-Since the dataset is encoded in a graph format, we also provide an example [3. Use bench dataset for GNN training](gnn_for_bench_dataset.ipynb) of using GCN to predict the model latency with the bench dataset.
+Since the dataset is encoded in a graph format, we also provide an example [3. Use nn-Meter bench dataset for GNN training](nn-meter_dataset_for_gnn.ipynb) of using GNN to predict the model latency with the bench dataset.
 
 Finally, we provide more hardware-ware NAS examples in NNI.
 
 ## Examples list
 
-1. [Use nn-Meter for models with different format](nn-meter_for_different_model_format.ipynb)
-2. [Use nn-Meter with the bench dataset](nn-meter_for_bench_dataset.ipynb)
-3. [Use bench dataset for GNN training](gnn_for_bench_dataset.ipynb)
-4. Use nn-Meter to construct latency constraint in SPOS NAS (TBD)
+1. [Use nn-Meter for models with different format](nn-meter_predictor_for_different_model_format.ipynb)
+2. [Use nn-Meter with the bench dataset](nn-meter_predictor_for_bench_dataset.ipynb)
+3. [Use bench dataset for GNN training](nn-meter_dataset_for_gnn.ipynb)
+4. Use nn-Meter to construct latency constraint in SPOS NAS
    - [Use nn-Meter in search part](https://github.com/microsoft/nni/blob/master/examples/nas/oneshot/spos/multi_trial.py)
-   - [Use nn-Meter in sampling part](https://github.com/microsoft/nni/blob/master/examples/nas/oneshot/spos/supernet.py)
-5. [Use nn-Meter to construct latency penalty in Proxyless NAS](https://github.com/microsoft/nni/tree/master/examples/nas/oneshot/proxylessnas)
+   - [Use nn-Meter in sampling part (TBD)](https://github.com/microsoft/nni/blob/master/examples/nas/oneshot/spos/supernet.py)
+5. [Use nn-Meter to construct latency penalty in ProxylessNAS](https://github.com/microsoft/nni/tree/master/examples/nas/oneshot/proxylessnas)
diff --git a/examples/gnn_for_bench_dataset.ipynb b/examples/nn-meter_dataset_for_gnn.ipynb
similarity index 67%
rename from examples/gnn_for_bench_dataset.ipynb
rename to examples/nn-meter_dataset_for_gnn.ipynb
index db2f25cc..74ada8d4 100644
--- a/examples/gnn_for_bench_dataset.ipynb
+++ b/examples/nn-meter_dataset_for_gnn.ipynb
@@ -2,6 +2,7 @@
  "cells": [
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "# Latency Dataset - GNN Model\n",
     "\n",
@@ -12,21 +13,30 @@
     "To better deal with the problems above, we give a GNN example with graph representation improved. We first build our GNN model, which is constructed based on GraphSAGE, and maxpooling is selected as out pooling method. Next, we will start training after the data is loaded. `GNNDataset` and `GNNDataloader` in `nn_meter/dataset/gnn_dataloader.py` build the model structure of the Dataset in `.jsonl` format into our required Dataset and Dataloader. \n",
     "\n",
     "Let's start our journey!"
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "## Step 1: Build our GraphSAGE Model\n",
     "\n",
     "We built our model with the help of DGL library."
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "code",
    "execution_count": 1,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Using backend: pytorch\n"
+     ]
+    }
+   ],
    "source": [
     "import torch\n",
     "import torch.nn as nn\n",
@@ -72,52 +82,25 @@
     "            x = self.pooling(g, x)\n",
     "            x = self.fc1(x)\n",
     "            return self.fc(x)"
-   ],
-   "outputs": [
-    {
-     "output_type": "stream",
-     "name": "stderr",
-     "text": [
-      "Using backend: pytorch\n"
-     ]
-    }
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "## Step 2: Loading Data.\n",
     "\n",
     "Next, we will finish loading the data and learn about the size of the Training and Testing datasets."
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "code",
    "execution_count": 2,
-   "source": [
-    "import os\r\n",
-    "from nn_meter.dataset import gnn_dataloader\r\n",
-    "\r\n",
-    "target_device = \"cortexA76cpu_tflite21\"\r\n",
-    "\r\n",
-    "print(\"Processing Training Set.\")\r\n",
-    "train_set = gnn_dataloader.GNNDataset(train=True, device=target_device) \r\n",
-    "print(\"Processing Testing Set.\")\r\n",
-    "test_set = gnn_dataloader.GNNDataset(train=False, device=target_device)\r\n",
-    "\r\n",
-    "train_loader = gnn_dataloader.GNNDataloader(train_set, batchsize=1 , shuffle=True)\r\n",
-    "test_loader = gnn_dataloader.GNNDataloader(test_set, batchsize=1, shuffle=False)\r\n",
-    "print('Train Dataset Size:', len(train_set))\r\n",
-    "print('Testing Dataset Size:', len(test_set))\r\n",
-    "print('Attribute tensor shape:', next(train_loader)[1].ndata['h'].size(1))\r\n",
-    "ATTR_COUNT = next(train_loader)[1].ndata['h'].size(1)"
-   ],
+   "metadata": {},
    "outputs": [
     {
-     "output_type": "stream",
      "name": "stdout",
+     "output_type": "stream",
      "text": [
       "Processing Training Set.\n",
       "Processing Testing Set.\n",
@@ -127,101 +110,42 @@
      ]
     }
    ],
-   "metadata": {}
+   "source": [
+    "import os\n",
+    "from nn_meter.dataset import gnn_dataloader\n",
+    "\n",
+    "target_device = \"cortexA76cpu_tflite21\"\n",
+    "\n",
+    "print(\"Processing Training Set.\")\n",
+    "train_set = gnn_dataloader.GNNDataset(train=True, device=target_device) \n",
+    "print(\"Processing Testing Set.\")\n",
+    "test_set = gnn_dataloader.GNNDataset(train=False, device=target_device)\n",
+    "\n",
+    "train_loader = gnn_dataloader.GNNDataloader(train_set, batchsize=1 , shuffle=True)\n",
+    "test_loader = gnn_dataloader.GNNDataloader(test_set, batchsize=1, shuffle=False)\n",
+    "print('Train Dataset Size:', len(train_set))\n",
+    "print('Testing Dataset Size:', len(test_set))\n",
+    "print('Attribute tensor shape:', next(train_loader)[1].ndata['h'].size(1))\n",
+    "ATTR_COUNT = next(train_loader)[1].ndata['h'].size(1)"
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {},
    "source": [
     "## Step 3: Run and Test\n",
     "\n",
     "We can run the model and evaluate it now!"
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "code",
    "execution_count": 3,
-   "source": [
-    "if torch.cuda.is_available():\r\n",
-    "    print(\"Using CUDA.\")\r\n",
-    "# device = \"cpu\"\r\n",
-    "device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\r\n",
-    "\r\n",
-    "# Start Training\r\n",
-    "load_model = False\r\n",
-    "if load_model:\r\n",
-    "    model = GNN(ATTR_COUNT, 3, 400, 0.1).to(device)\r\n",
-    "    opt = torch.optim.AdamW(model.parameters(), lr=4e-4)\r\n",
-    "    checkpoint = torch.load('LatencyGNN.pt')\r\n",
-    "    model.load_state_dict(checkpoint['model_state_dict'])\r\n",
-    "    opt.load_state_dict(checkpoint['optimizer_state_dict'])\r\n",
-    "    # EPOCHS = checkpoint['epoch']\r\n",
-    "    EPOCHS = 0\r\n",
-    "    loss_func = checkpoint['loss']\r\n",
-    "else:\r\n",
-    "    model = GNN(ATTR_COUNT, 3, 400, 0.1).to(device)\r\n",
-    "    opt = torch.optim.AdamW(model.parameters(), lr=4e-4)\r\n",
-    "    EPOCHS=20\r\n",
-    "    loss_func = nn.L1Loss()\r\n",
-    "\r\n",
-    "lr_scheduler = CosineAnnealingLR(opt, T_max=EPOCHS)\r\n",
-    "loss_sum = 0\r\n",
-    "for epoch in range(EPOCHS):\r\n",
-    "    train_length = len(train_set)\r\n",
-    "    tran_acc_ten = 0\r\n",
-    "    loss_sum = 0 \r\n",
-    "    # latency, graph, types, flops\r\n",
-    "    for batched_l, batched_g in train_loader:\r\n",
-    "        opt.zero_grad()\r\n",
-    "        batched_l = batched_l.to(device).float()\r\n",
-    "        batched_g = batched_g.to(device)\r\n",
-    "        batched_f = batched_g.ndata['h'].float()\r\n",
-    "        logits = model(batched_g, batched_f)\r\n",
-    "        for i in range(len(batched_l)):\r\n",
-    "            pred_latency = logits[i].item()\r\n",
-    "            prec_latency = batched_l[i].item()\r\n",
-    "            if (pred_latency >= 0.9 * prec_latency) and (pred_latency <= 1.1 * prec_latency):\r\n",
-    "                tran_acc_ten += 1\r\n",
-    "        # print(\"true latency: \", batched_l)\r\n",
-    "        # print(\"Predict latency: \", logits)\r\n",
-    "        batched_l = torch.reshape(batched_l, (-1 ,1))\r\n",
-    "        loss = loss_func(logits, batched_l)\r\n",
-    "        loss_sum += loss\r\n",
-    "        loss.backward()\r\n",
-    "        opt.step()\r\n",
-    "    lr_scheduler.step()\r\n",
-    "    print(\"[Epoch \", epoch, \"]: \", \"Training accuracy within 10%: \", tran_acc_ten / train_length * 100, \" %.\")\r\n",
-    "    # print('Learning Rate:', lr_scheduler.get_last_lr())\r\n",
-    "    # print('Loss:', loss_sum / train_length)\r\n",
-    "\r\n",
-    "# Save The Best Model\r\n",
-    "torch.save({\r\n",
-    "    'epoch': EPOCHS,\r\n",
-    "    'model_state_dict': model.state_dict(),\r\n",
-    "    'optimizer_state_dict': opt.state_dict(),\r\n",
-    "    'loss': loss_func,\r\n",
-    "}, 'LatencyGNN.pt')\r\n",
-    "\r\n",
-    "# Start Testing\r\n",
-    "count = 0\r\n",
-    "with torch.no_grad():\r\n",
-    "    test_length = len(test_set)\r\n",
-    "    test_acc_ten = 0\r\n",
-    "    for batched_l, batched_g in test_loader:\r\n",
-    "        batched_l = batched_l.to(device).float()\r\n",
-    "        batched_g = batched_g.to(device)\r\n",
-    "        batched_f = batched_g.ndata['h'].float()\r\n",
-    "        result = model(batched_g, batched_f)\r\n",
-    "        if (result.item() >= 0.9 * batched_l.item()) and (result.item() <= 1.1 * batched_l.item()):\r\n",
-    "            test_acc_ten += 1\r\n",
-    "        acc = (abs(result.item() - batched_l.item()) / batched_l.item()) * 100\r\n",
-    "        count += 1\r\n",
-    "    print(\"Testing accuracy within 10%: \", test_acc_ten / test_length * 100, \" %.\")"
-   ],
+   "metadata": {},
    "outputs": [
     {
-     "output_type": "stream",
      "name": "stdout",
+     "output_type": "stream",
      "text": [
       "[Epoch  0 ]:  Training accuracy within 10%:  21.999807061547365  %.\n",
       "[Epoch  1 ]:  Training accuracy within 10%:  27.725255643449742  %.\n",
@@ -247,7 +171,83 @@
      ]
     }
    ],
-   "metadata": {}
+   "source": [
+    "if torch.cuda.is_available():\n",
+    "    print(\"Using CUDA.\")\n",
+    "# device = \"cpu\"\n",
+    "device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n",
+    "\n",
+    "# Start Training\n",
+    "load_model = False\n",
+    "if load_model:\n",
+    "    model = GNN(ATTR_COUNT, 3, 400, 0.1).to(device)\n",
+    "    opt = torch.optim.AdamW(model.parameters(), lr=4e-4)\n",
+    "    checkpoint = torch.load('LatencyGNN.pt')\n",
+    "    model.load_state_dict(checkpoint['model_state_dict'])\n",
+    "    opt.load_state_dict(checkpoint['optimizer_state_dict'])\n",
+    "    # EPOCHS = checkpoint['epoch']\n",
+    "    EPOCHS = 0\n",
+    "    loss_func = checkpoint['loss']\n",
+    "else:\n",
+    "    model = GNN(ATTR_COUNT, 3, 400, 0.1).to(device)\n",
+    "    opt = torch.optim.AdamW(model.parameters(), lr=4e-4)\n",
+    "    EPOCHS=20\n",
+    "    loss_func = nn.L1Loss()\n",
+    "\n",
+    "lr_scheduler = CosineAnnealingLR(opt, T_max=EPOCHS)\n",
+    "loss_sum = 0\n",
+    "for epoch in range(EPOCHS):\n",
+    "    train_length = len(train_set)\n",
+    "    tran_acc_ten = 0\n",
+    "    loss_sum = 0 \n",
+    "    # latency, graph, types, flops\n",
+    "    for batched_l, batched_g in train_loader:\n",
+    "        opt.zero_grad()\n",
+    "        batched_l = batched_l.to(device).float()\n",
+    "        batched_g = batched_g.to(device)\n",
+    "        batched_f = batched_g.ndata['h'].float()\n",
+    "        logits = model(batched_g, batched_f)\n",
+    "        for i in range(len(batched_l)):\n",
+    "            pred_latency = logits[i].item()\n",
+    "            prec_latency = batched_l[i].item()\n",
+    "            if (pred_latency >= 0.9 * prec_latency) and (pred_latency <= 1.1 * prec_latency):\n",
+    "                tran_acc_ten += 1\n",
+    "        # print(\"true latency: \", batched_l)\n",
+    "        # print(\"Predict latency: \", logits)\n",
+    "        batched_l = torch.reshape(batched_l, (-1 ,1))\n",
+    "        loss = loss_func(logits, batched_l)\n",
+    "        loss_sum += loss\n",
+    "        loss.backward()\n",
+    "        opt.step()\n",
+    "    lr_scheduler.step()\n",
+    "    print(\"[Epoch \", epoch, \"]: \", \"Training accuracy within 10%: \", tran_acc_ten / train_length * 100, \" %.\")\n",
+    "    # print('Learning Rate:', lr_scheduler.get_last_lr())\n",
+    "    # print('Loss:', loss_sum / train_length)\n",
+    "\n",
+    "# Save The Best Model\n",
+    "torch.save({\n",
+    "    'epoch': EPOCHS,\n",
+    "    'model_state_dict': model.state_dict(),\n",
+    "    'optimizer_state_dict': opt.state_dict(),\n",
+    "    'loss': loss_func,\n",
+    "}, 'LatencyGNN.pt')\n",
+    "\n",
+    "# Start Testing\n",
+    "count = 0\n",
+    "with torch.no_grad():\n",
+    "    test_length = len(test_set)\n",
+    "    test_acc_ten = 0\n",
+    "    for batched_l, batched_g in test_loader:\n",
+    "        batched_l = batched_l.to(device).float()\n",
+    "        batched_g = batched_g.to(device)\n",
+    "        batched_f = batched_g.ndata['h'].float()\n",
+    "        result = model(batched_g, batched_f)\n",
+    "        if (result.item() >= 0.9 * batched_l.item()) and (result.item() <= 1.1 * batched_l.item()):\n",
+    "            test_acc_ten += 1\n",
+    "        acc = (abs(result.item() - batched_l.item()) / batched_l.item()) * 100\n",
+    "        count += 1\n",
+    "    print(\"Testing accuracy within 10%: \", test_acc_ten / test_length * 100, \" %.\")"
+   ]
   }
  ],
  "metadata": {
@@ -269,9 +269,9 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.8.8"
+   "version": "3.6.10"
   }
  },
  "nbformat": 4,
  "nbformat_minor": 2
-}
\ No newline at end of file
+}
diff --git a/examples/nn-meter_for_bench_dataset.ipynb b/examples/nn-meter_predictor_for_bench_dataset.ipynb
similarity index 100%
rename from examples/nn-meter_for_bench_dataset.ipynb
rename to examples/nn-meter_predictor_for_bench_dataset.ipynb
diff --git a/examples/nn-meter_for_different_model_format.ipynb b/examples/nn-meter_predictor_for_different_model_format.ipynb
similarity index 100%
rename from examples/nn-meter_for_different_model_format.ipynb
rename to examples/nn-meter_predictor_for_different_model_format.ipynb
diff --git a/nn_meter/__init__.py b/nn_meter/__init__.py
index 56e3dfa2..48740fba 100644
--- a/nn_meter/__init__.py
+++ b/nn_meter/__init__.py
@@ -7,20 +7,26 @@
 except ModuleNotFoundError:
     __version__ = 'UNKNOWN'
 
-from .nn_meter import (
-    nnMeter,
+import logging
+from functools import partial, partialmethod
+
+from .predictor import (
+    nnMeterPredictor,
     load_latency_predictor,
     list_latency_predictors,
+    latency_metrics
+)
+from .ir_converter import (
     model_file_to_graph,
-    model_to_graph,
+    model_to_graph
+)
+from .utils import (
     create_user_configs,
     change_user_data_folder
 )
-from .utils.utils import download_from_url
-from .prediction import latency_metrics
 from .dataset import bench_dataset
-import logging
-from functools import partial, partialmethod
+from .utils import download_from_url
+
 
 logging.KEYINFO = 22
 logging.addLevelName(logging.KEYINFO, 'KEYINFO')
diff --git a/nn_meter/dataset/bench_dataset.py b/nn_meter/dataset/bench_dataset.py
index 1c034b4e..428fe780 100644
--- a/nn_meter/dataset/bench_dataset.py
+++ b/nn_meter/dataset/bench_dataset.py
@@ -1,13 +1,12 @@
 # Copyright (c) Microsoft Corporation.
 # Licensed under the MIT license.
 import os, sys
-from nn_meter.prediction import latency_metrics
+import logging
+import jsonlines
 from glob import glob
 
-from nn_meter.nn_meter import list_latency_predictors, load_latency_predictor, get_user_data_folder
-from nn_meter import download_from_url
-import jsonlines
-import logging
+from nn_meter.predictor import latency_metrics, list_latency_predictors, load_latency_predictor
+from nn_meter.utils import download_from_url, get_user_data_folder
 
 
 __user_dataset_folder__ = os.path.join(get_user_data_folder(), 'dataset')
diff --git a/nn_meter/dataset/gnn_dataloader.py b/nn_meter/dataset/gnn_dataloader.py
index 3a541149..81ab1f3f 100644
--- a/nn_meter/dataset/gnn_dataloader.py
+++ b/nn_meter/dataset/gnn_dataloader.py
@@ -1,16 +1,18 @@
 # Copyright (c) Microsoft Corporation.
 # Licensed under the MIT license.
-import torch
-import jsonlines
 import os
 import random
+import torch
+import jsonlines
 from .bench_dataset import bench_dataset
-from nn_meter.nn_meter import get_user_data_folder
-from nn_meter.utils.utils import try_import_dgl
+from nn_meter.utils import get_user_data_folder
+from nn_meter.utils.import_package import try_import_dgl
+
 
 RAW_DATA_URL = "https://github.com/microsoft/nn-Meter/releases/download/v1.0-data/datasets.zip"
 __user_dataset_folder__ = os.path.join(get_user_data_folder(), 'dataset')
 
+
 hws = [
     "cortexA76cpu_tflite21",
     "adreno640gpu_tflite21",
diff --git a/nn_meter/ir_converters/__init__.py b/nn_meter/ir_converter/__init__.py
similarity index 100%
rename from nn_meter/ir_converters/__init__.py
rename to nn_meter/ir_converter/__init__.py
diff --git a/nn_meter/ir_converters/frozenpb_converter/__init__.py b/nn_meter/ir_converter/frozenpb_converter/__init__.py
similarity index 100%
rename from nn_meter/ir_converters/frozenpb_converter/__init__.py
rename to nn_meter/ir_converter/frozenpb_converter/__init__.py
diff --git a/nn_meter/ir_converters/frozenpb_converter/frozenpb_converter.py b/nn_meter/ir_converter/frozenpb_converter/frozenpb_converter.py
similarity index 100%
rename from nn_meter/ir_converters/frozenpb_converter/frozenpb_converter.py
rename to nn_meter/ir_converter/frozenpb_converter/frozenpb_converter.py
index aac61a4e..34950490 100644
--- a/nn_meter/ir_converters/frozenpb_converter/frozenpb_converter.py
+++ b/nn_meter/ir_converter/frozenpb_converter/frozenpb_converter.py
@@ -2,10 +2,10 @@
 # Licensed under the MIT license.
 import numpy as np
 
-from nn_meter.utils.graph_tool import ModelGraph
 from .frozenpb_parser import FrozenPbParser
 from .shape_inference import ShapeInference
 from .shape_fetcher import ShapeFetcher
+from nn_meter.utils.graph_tool import ModelGraph
 
 class FrozenPbConverter:
     def __init__(self, file_name):
diff --git a/nn_meter/ir_converters/frozenpb_converter/frozenpb_parser.py b/nn_meter/ir_converter/frozenpb_converter/frozenpb_parser.py
similarity index 99%
rename from nn_meter/ir_converters/frozenpb_converter/frozenpb_parser.py
rename to nn_meter/ir_converter/frozenpb_converter/frozenpb_parser.py
index 913a0510..479522c3 100644
--- a/nn_meter/ir_converters/frozenpb_converter/frozenpb_parser.py
+++ b/nn_meter/ir_converter/frozenpb_converter/frozenpb_parser.py
@@ -1,12 +1,13 @@
 # Copyright (c) Microsoft Corporation.
 # Licensed under the MIT license.
-from nn_meter.utils.utils import try_import_tensorflow
-from .protobuf_helper import ProtobufHelper
-from .shape_fetcher import ShapeFetcher
-import copy
 import re
+import copy
 import logging
 
+from .protobuf_helper import ProtobufHelper
+from nn_meter.utils.import_package import try_import_tensorflow
+
+
 logging = logging.getLogger(__name__)
 
 
diff --git a/nn_meter/ir_converters/frozenpb_converter/protobuf_helper.py b/nn_meter/ir_converter/frozenpb_converter/protobuf_helper.py
similarity index 100%
rename from nn_meter/ir_converters/frozenpb_converter/protobuf_helper.py
rename to nn_meter/ir_converter/frozenpb_converter/protobuf_helper.py
diff --git a/nn_meter/ir_converters/frozenpb_converter/shape_fetcher.py b/nn_meter/ir_converter/frozenpb_converter/shape_fetcher.py
similarity index 97%
rename from nn_meter/ir_converters/frozenpb_converter/shape_fetcher.py
rename to nn_meter/ir_converter/frozenpb_converter/shape_fetcher.py
index 9c2f594b..a2f2025a 100644
--- a/nn_meter/ir_converters/frozenpb_converter/shape_fetcher.py
+++ b/nn_meter/ir_converter/frozenpb_converter/shape_fetcher.py
@@ -1,9 +1,8 @@
 # Copyright (c) Microsoft Corporation.
 # Licensed under the MIT license.
-from nn_meter.utils.utils import try_import_tensorflow
 import numpy as np
 from typing import List
-
+from nn_meter.utils.import_package import try_import_tensorflow
 
 class ShapeFetcher:
     def __init__(self, input_graph):
diff --git a/nn_meter/ir_converters/frozenpb_converter/shape_inference.py b/nn_meter/ir_converter/frozenpb_converter/shape_inference.py
similarity index 100%
rename from nn_meter/ir_converters/frozenpb_converter/shape_inference.py
rename to nn_meter/ir_converter/frozenpb_converter/shape_inference.py
index f3b3dae0..da05d208 100644
--- a/nn_meter/ir_converters/frozenpb_converter/shape_inference.py
+++ b/nn_meter/ir_converter/frozenpb_converter/shape_inference.py
@@ -1,10 +1,10 @@
 # Copyright (c) Microsoft Corporation.
 # Licensed under the MIT license.
-from .protobuf_helper import ProtobufHelper as ph
 from functools import reduce
 import copy
 import math
 import logging
+from .protobuf_helper import ProtobufHelper as ph
 
 
 logging = logging.getLogger(__name__)
diff --git a/nn_meter/ir_converters/onnx_converter/__init__.py b/nn_meter/ir_converter/onnx_converter/__init__.py
similarity index 100%
rename from nn_meter/ir_converters/onnx_converter/__init__.py
rename to nn_meter/ir_converter/onnx_converter/__init__.py
diff --git a/nn_meter/ir_converters/onnx_converter/constants.py b/nn_meter/ir_converter/onnx_converter/constants.py
similarity index 100%
rename from nn_meter/ir_converters/onnx_converter/constants.py
rename to nn_meter/ir_converter/onnx_converter/constants.py
diff --git a/nn_meter/ir_converters/onnx_converter/converter.py b/nn_meter/ir_converter/onnx_converter/converter.py
similarity index 98%
rename from nn_meter/ir_converters/onnx_converter/converter.py
rename to nn_meter/ir_converter/onnx_converter/converter.py
index d4d214f9..04786f89 100644
--- a/nn_meter/ir_converters/onnx_converter/converter.py
+++ b/nn_meter/ir_converter/onnx_converter/converter.py
@@ -1,11 +1,11 @@
 # Copyright (c) Microsoft Corporation.
 # Licensed under the MIT license.
-from nn_meter.utils.utils import try_import_onnx
+import logging
 import networkx as nx
+from itertools import chain
 from .utils import get_tensor_shape
 from .constants import SLICE_TYPE
-from itertools import chain
-import logging
+from nn_meter.utils.import_package import try_import_onnx
 
 
 class OnnxConverter:
diff --git a/nn_meter/ir_converters/onnx_converter/utils.py b/nn_meter/ir_converter/onnx_converter/utils.py
similarity index 99%
rename from nn_meter/ir_converters/onnx_converter/utils.py
rename to nn_meter/ir_converter/onnx_converter/utils.py
index 9abe0452..172cc82b 100644
--- a/nn_meter/ir_converters/onnx_converter/utils.py
+++ b/nn_meter/ir_converter/onnx_converter/utils.py
@@ -1,7 +1,5 @@
 # Copyright (c) Microsoft Corporation.
 # Licensed under the MIT license.
-
-
 def get_tensor_shape(tensor):
     shape = []
     for dim in tensor.type.tensor_type.shape.dim:
diff --git a/nn_meter/ir_converters/torch_converter/__init__.py b/nn_meter/ir_converter/torch_converter/__init__.py
similarity index 100%
rename from nn_meter/ir_converters/torch_converter/__init__.py
rename to nn_meter/ir_converter/torch_converter/__init__.py
diff --git a/nn_meter/ir_converters/torch_converter/converter.py b/nn_meter/ir_converter/torch_converter/converter.py
similarity index 96%
rename from nn_meter/ir_converters/torch_converter/converter.py
rename to nn_meter/ir_converter/torch_converter/converter.py
index 451e4a67..f595985b 100644
--- a/nn_meter/ir_converters/torch_converter/converter.py
+++ b/nn_meter/ir_converter/torch_converter/converter.py
@@ -1,11 +1,9 @@
 # Copyright (c) Microsoft Corporation.
 # Licensed under the MIT license.
-from nn_meter.utils.utils import try_import_onnx, try_import_torch, try_import_onnxsim, try_import_nni
 import tempfile
-from nn_meter.ir_converters.onnx_converter import OnnxConverter
-
-
+from ..onnx_converter import OnnxConverter
 from .opset_map import nni_attr_map, nni_type_map
+from nn_meter.utils.import_package import try_import_onnx, try_import_torch, try_import_onnxsim, try_import_nni
 
 
 def _nchw_to_nhwc(shapes):
diff --git a/nn_meter/ir_converters/torch_converter/opset_map.py b/nn_meter/ir_converter/torch_converter/opset_map.py
similarity index 100%
rename from nn_meter/ir_converters/torch_converter/opset_map.py
rename to nn_meter/ir_converter/torch_converter/opset_map.py
diff --git a/nn_meter/ir_converters/utils.py b/nn_meter/ir_converter/utils.py
similarity index 98%
rename from nn_meter/ir_converters/utils.py
rename to nn_meter/ir_converter/utils.py
index 67055bb7..ecfe49f4 100644
--- a/nn_meter/ir_converters/utils.py
+++ b/nn_meter/ir_converter/utils.py
@@ -2,11 +2,12 @@
 # Licensed under the MIT license.
 import json
 import logging
-from nn_meter.utils.utils import try_import_onnx, try_import_torch, try_import_torchvision_models
+from nn_meter.utils.import_package import try_import_onnx, try_import_torch, try_import_torchvision_models
 from .onnx_converter import OnnxConverter
 from .frozenpb_converter import FrozenPbConverter
 from .torch_converter import NNIBasedTorchConverter, OnnxBasedTorchConverter, NNIIRConverter
 
+
 def model_file_to_graph(filename: str, model_type: str, input_shape=(1, 3, 224, 224), apply_nni=False):
     """
     read the given file and convert the model in the file content to nn-Meter IR graph object 
@@ -106,10 +107,12 @@ def onnx_model_to_graph(model):
     converter = OnnxConverter(model)
     return converter.convert()
 
+
 def nni_model_to_graph(model):
     converter = NNIIRConverter(model)
     return converter.convert()
 
+
 def torch_model_to_graph(model, input_shape=(1, 3, 224, 224), apply_nni=False):
     torch = try_import_torch()
     args = torch.randn(*input_shape)
diff --git a/nn_meter/kerneldetection/utils/__init__.py b/nn_meter/kernel_detector/__init__.py
similarity index 62%
rename from nn_meter/kerneldetection/utils/__init__.py
rename to nn_meter/kernel_detector/__init__.py
index 9a045456..bd89c7e5 100644
--- a/nn_meter/kerneldetection/utils/__init__.py
+++ b/nn_meter/kernel_detector/__init__.py
@@ -1,2 +1,3 @@
 # Copyright (c) Microsoft Corporation.
 # Licensed under the MIT license.
+from .kernel_detector import KernelDetector
diff --git a/nn_meter/kerneldetection/fusionlib/__init__.py b/nn_meter/kernel_detector/fusionlib/__init__.py
similarity index 100%
rename from nn_meter/kerneldetection/fusionlib/__init__.py
rename to nn_meter/kernel_detector/fusionlib/__init__.py
diff --git a/nn_meter/kerneldetection/fusionlib/add-relu_fusionunit.json b/nn_meter/kernel_detector/fusionlib/add-relu_fusionunit.json
similarity index 100%
rename from nn_meter/kerneldetection/fusionlib/add-relu_fusionunit.json
rename to nn_meter/kernel_detector/fusionlib/add-relu_fusionunit.json
diff --git a/nn_meter/kerneldetection/fusionlib/bn-relu_fusionunit.json b/nn_meter/kernel_detector/fusionlib/bn-relu_fusionunit.json
similarity index 100%
rename from nn_meter/kerneldetection/fusionlib/bn-relu_fusionunit.json
rename to nn_meter/kernel_detector/fusionlib/bn-relu_fusionunit.json
diff --git a/nn_meter/kerneldetection/fusionlib/channelshuffle_fusionunit.json b/nn_meter/kernel_detector/fusionlib/channelshuffle_fusionunit.json
similarity index 100%
rename from nn_meter/kerneldetection/fusionlib/channelshuffle_fusionunit.json
rename to nn_meter/kernel_detector/fusionlib/channelshuffle_fusionunit.json
diff --git a/nn_meter/kerneldetection/fusionlib/conv-bn-relu_fusionunit.json b/nn_meter/kernel_detector/fusionlib/conv-bn-relu_fusionunit.json
similarity index 100%
rename from nn_meter/kerneldetection/fusionlib/conv-bn-relu_fusionunit.json
rename to nn_meter/kernel_detector/fusionlib/conv-bn-relu_fusionunit.json
diff --git a/nn_meter/kerneldetection/fusionlib/conv-bn_fusionunit.json b/nn_meter/kernel_detector/fusionlib/conv-bn_fusionunit.json
similarity index 100%
rename from nn_meter/kerneldetection/fusionlib/conv-bn_fusionunit.json
rename to nn_meter/kernel_detector/fusionlib/conv-bn_fusionunit.json
diff --git a/nn_meter/kerneldetection/fusionlib/dwconv-bn-relu_fusionunit.json b/nn_meter/kernel_detector/fusionlib/dwconv-bn-relu_fusionunit.json
similarity index 100%
rename from nn_meter/kerneldetection/fusionlib/dwconv-bn-relu_fusionunit.json
rename to nn_meter/kernel_detector/fusionlib/dwconv-bn-relu_fusionunit.json
diff --git a/nn_meter/kerneldetection/fusionlib/elewise_fusionunit.json b/nn_meter/kernel_detector/fusionlib/elewise_fusionunit.json
similarity index 100%
rename from nn_meter/kerneldetection/fusionlib/elewise_fusionunit.json
rename to nn_meter/kernel_detector/fusionlib/elewise_fusionunit.json
diff --git a/nn_meter/kerneldetection/fusionlib/gap_fusionunit.json b/nn_meter/kernel_detector/fusionlib/gap_fusionunit.json
similarity index 100%
rename from nn_meter/kerneldetection/fusionlib/gap_fusionunit.json
rename to nn_meter/kernel_detector/fusionlib/gap_fusionunit.json
diff --git a/nn_meter/kerneldetection/fusionlib/hswish_fusionunit.json b/nn_meter/kernel_detector/fusionlib/hswish_fusionunit.json
similarity index 100%
rename from nn_meter/kerneldetection/fusionlib/hswish_fusionunit.json
rename to nn_meter/kernel_detector/fusionlib/hswish_fusionunit.json
diff --git a/nn_meter/kerneldetection/fusionlib/se_fusionunit.json b/nn_meter/kernel_detector/fusionlib/se_fusionunit.json
similarity index 100%
rename from nn_meter/kerneldetection/fusionlib/se_fusionunit.json
rename to nn_meter/kernel_detector/fusionlib/se_fusionunit.json
diff --git a/nn_meter/kerneldetection/fusionlib/utils.py b/nn_meter/kernel_detector/fusionlib/utils.py
similarity index 88%
rename from nn_meter/kerneldetection/fusionlib/utils.py
rename to nn_meter/kernel_detector/fusionlib/utils.py
index 9e9f6f32..2ba05023 100644
--- a/nn_meter/kerneldetection/fusionlib/utils.py
+++ b/nn_meter/kernel_detector/fusionlib/utils.py
@@ -3,7 +3,7 @@
 import os
 import json
 from nn_meter.utils.graph_tool import ModelGraph
-from nn_meter.kerneldetection.utils.ir_tools import convert_nodes
+from ..utils.ir_tools import convert_nodes
 
 
 BASE_DIR = os.path.dirname(os.path.abspath(__file__))
diff --git a/nn_meter/kerneldetection/detection/detector.py b/nn_meter/kernel_detector/kernel_detector.py
similarity index 92%
rename from nn_meter/kerneldetection/detection/detector.py
rename to nn_meter/kernel_detector/kernel_detector.py
index ab650b13..2f14948d 100644
--- a/nn_meter/kerneldetection/detection/detector.py
+++ b/nn_meter/kernel_detector/kernel_detector.py
@@ -1,11 +1,10 @@
 # Copyright (c) Microsoft Corporation.
 # Licensed under the MIT license.
-from nn_meter.kerneldetection.rulelib.rule_reader import RuleReader
-from nn_meter.kerneldetection.rulelib.rule_splitter import RuleSplitter
 from nn_meter.utils.graph_tool import ModelGraph
-from nn_meter.kerneldetection.utils.constants import DUMMY_TYPES
-from nn_meter.kerneldetection.utils.ir_tools import convert_nodes
-# import logging
+from .utils.constants import DUMMY_TYPES
+from .utils.ir_tools import convert_nodes
+from .rulelib.rule_reader import RuleReader
+from .rulelib.rule_splitter import RuleSplitter
 
 
 class KernelDetector:
diff --git a/nn_meter/kerneldetection/detection/__init__.py b/nn_meter/kernel_detector/rulelib/__init__.py
similarity index 100%
rename from nn_meter/kerneldetection/detection/__init__.py
rename to nn_meter/kernel_detector/rulelib/__init__.py
diff --git a/nn_meter/kerneldetection/rulelib/rule_reader.py b/nn_meter/kernel_detector/rulelib/rule_reader.py
similarity index 96%
rename from nn_meter/kerneldetection/rulelib/rule_reader.py
rename to nn_meter/kernel_detector/rulelib/rule_reader.py
index 62da24c8..4e2fdfd0 100644
--- a/nn_meter/kerneldetection/rulelib/rule_reader.py
+++ b/nn_meter/kernel_detector/rulelib/rule_reader.py
@@ -1,8 +1,8 @@
 # Copyright (c) Microsoft Corporation.
 # Licensed under the MIT license.
 import json
+from ..fusionlib import get_fusion_unit
 from nn_meter.utils.graph_tool import ModelGraph
-from nn_meter.kerneldetection.fusionlib import get_fusion_unit
 
 
 class RuleReader:
diff --git a/nn_meter/kerneldetection/rulelib/rule_splitter.py b/nn_meter/kernel_detector/rulelib/rule_splitter.py
similarity index 94%
rename from nn_meter/kerneldetection/rulelib/rule_splitter.py
rename to nn_meter/kernel_detector/rulelib/rule_splitter.py
index 7d4971bd..084e5af4 100644
--- a/nn_meter/kerneldetection/rulelib/rule_splitter.py
+++ b/nn_meter/kernel_detector/rulelib/rule_splitter.py
@@ -1,9 +1,9 @@
 # Copyright (c) Microsoft Corporation.
 # Licensed under the MIT license.
 from .rule_reader import RuleReader
+from ..utils.match_helper import MatchHelper
+from ..utils.fusion_aware_graph import FusionAwareGraph
 from nn_meter.utils.graph_tool import ModelGraph
-from nn_meter.kerneldetection.utils.match_helper import MatchHelper
-from nn_meter.kerneldetection.utils.fusion_aware_graph import FusionAwareGraph
 
 
 class RuleSplitter:
diff --git a/nn_meter/kerneldetection/rulelib/__init__.py b/nn_meter/kernel_detector/utils/__init__.py
similarity index 100%
rename from nn_meter/kerneldetection/rulelib/__init__.py
rename to nn_meter/kernel_detector/utils/__init__.py
diff --git a/nn_meter/kerneldetection/utils/constants.py b/nn_meter/kernel_detector/utils/constants.py
similarity index 100%
rename from nn_meter/kerneldetection/utils/constants.py
rename to nn_meter/kernel_detector/utils/constants.py
diff --git a/nn_meter/kerneldetection/utils/fusion_aware_graph.py b/nn_meter/kernel_detector/utils/fusion_aware_graph.py
similarity index 100%
rename from nn_meter/kerneldetection/utils/fusion_aware_graph.py
rename to nn_meter/kernel_detector/utils/fusion_aware_graph.py
index fc6dd01e..ab75f6e0 100644
--- a/nn_meter/kerneldetection/utils/fusion_aware_graph.py
+++ b/nn_meter/kernel_detector/utils/fusion_aware_graph.py
@@ -1,8 +1,8 @@
 # Copyright (c) Microsoft Corporation.
 # Licensed under the MIT license.
-from nn_meter.utils.graph_tool import ModelGraph
-from .union_find import UF
 import networkx as nx
+from .union_find import UF
+from nn_meter.utils.graph_tool import ModelGraph
 
 
 class FusionAwareGraph:
diff --git a/nn_meter/kerneldetection/utils/ir_tools.py b/nn_meter/kernel_detector/utils/ir_tools.py
similarity index 100%
rename from nn_meter/kerneldetection/utils/ir_tools.py
rename to nn_meter/kernel_detector/utils/ir_tools.py
diff --git a/nn_meter/kerneldetection/utils/match_helper.py b/nn_meter/kernel_detector/utils/match_helper.py
similarity index 100%
rename from nn_meter/kerneldetection/utils/match_helper.py
rename to nn_meter/kernel_detector/utils/match_helper.py
diff --git a/nn_meter/kerneldetection/utils/union_find.py b/nn_meter/kernel_detector/utils/union_find.py
similarity index 100%
rename from nn_meter/kerneldetection/utils/union_find.py
rename to nn_meter/kernel_detector/utils/union_find.py
diff --git a/nn_meter/kerneldetection/__init__.py b/nn_meter/kerneldetection/__init__.py
deleted file mode 100644
index 2b31f4e7..00000000
--- a/nn_meter/kerneldetection/__init__.py
+++ /dev/null
@@ -1,3 +0,0 @@
-# Copyright (c) Microsoft Corporation.
-# Licensed under the MIT license.
-from .detection.detector import KernelDetector
diff --git a/nn_meter/nn_meter_cli.py b/nn_meter/nn_meter_cli.py
index 55dccd27..8438fcb5 100644
--- a/nn_meter/nn_meter_cli.py
+++ b/nn_meter/nn_meter_cli.py
@@ -5,12 +5,7 @@
 import sys
 import argparse
 import logging
-from nn_meter.nn_meter import *
-
-__user_config_folder__ = os.path.expanduser('~/.nn_meter/config')
-__user_data_folder__ = os.path.expanduser('~/.nn_meter/data')
-
-__predictors_cfg_filename__ = 'predictors.yaml'
+from nn_meter import *
 
 
 def list_latency_predictors_cli():
@@ -62,11 +57,12 @@ def apply_latency_predictor_cli(args):
     
     return result
 
+
 def get_nnmeter_ir_cli(args):
     """convert pb file or onnx file to nn-Meter IR graph according to the command line interface arguments
     """
     import json
-    from nn_meter.utils.graph_tool import NumpyEncoder
+    from nn_meter.utils.utils import NumpyEncoder
     if args.tensorflow:
         graph = model_file_to_graph(args.tensorflow, 'pb')
         filename = args.output if args.output else args.tensorflow.replace(".pb", "_pb_ir.json") 
@@ -117,7 +113,7 @@ def nn_meter_cli():
     subparsers = parser.add_subparsers()
 
     # Usage 1: latency predictors
-    lat_pred = subparsers.add_parser('lat_pred', help='apply latency predictor for testing model')
+    lat_pred = subparsers.add_parser('predict', aliases=['lat_pred'], help='apply latency predictor for testing model')
     lat_pred.add_argument(
         "--predictor",
         type=str,
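Reviewer note on the hunk above: the subcommand is renamed to `predict` while `lat_pred` is kept as an alias. A minimal sketch of how `argparse` aliases behave, assuming a toy parser; the program name `nn-meter-demo`, the `handler` default, and the `demo_hw` value are hypothetical and not part of nn-Meter:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser whose 'predict' subcommand also answers to 'lat_pred'."""
    parser = argparse.ArgumentParser(prog="nn-meter-demo")  # hypothetical program name
    subparsers = parser.add_subparsers()

    # `aliases` lets the old spelling keep working after the rename.
    predict = subparsers.add_parser(
        "predict", aliases=["lat_pred"], help="apply latency predictor"
    )
    predict.add_argument("--predictor", type=str, required=True)
    # Both spellings resolve to the same sub-parser, so they share this default.
    predict.set_defaults(handler="apply_latency_predictor")
    return parser


parser = build_parser()
new_style = parser.parse_args(["predict", "--predictor", "demo_hw"])
old_style = parser.parse_args(["lat_pred", "--predictor", "demo_hw"])
print(new_style.handler, old_style.handler)
```

Because the alias maps to the same sub-parser object, existing scripts that still call `lat_pred` should keep working without a second set of argument definitions.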
diff --git a/nn_meter/prediction/__init__.py b/nn_meter/prediction/__init__.py
deleted file mode 100644
index 6bb62174..00000000
--- a/nn_meter/prediction/__init__.py
+++ /dev/null
@@ -1,5 +0,0 @@
-# Copyright (c) Microsoft Corporation.
-# Licensed under the MIT license.
-from .predictors.utils import latency_metrics
-
-
diff --git a/nn_meter/predictor/__init__.py b/nn_meter/predictor/__init__.py
new file mode 100644
index 00000000..0f9eeef7
--- /dev/null
+++ b/nn_meter/predictor/__init__.py
@@ -0,0 +1,4 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT license.
+from .prediction.utils import latency_metrics
+from .nn_meter_predictor import nnMeterPredictor, list_latency_predictors, load_latency_predictor
diff --git a/nn_meter/nn_meter.py b/nn_meter/predictor/nn_meter_predictor.py
similarity index 68%
rename from nn_meter/nn_meter.py
rename to nn_meter/predictor/nn_meter_predictor.py
index fa384549..aed99e2b 100644
--- a/nn_meter/nn_meter.py
+++ b/nn_meter/predictor/nn_meter_predictor.py
@@ -1,75 +1,20 @@
 # Copyright (c) Microsoft Corporation.
 # Licensed under the MIT license.
-from glob import glob
-from nn_meter.prediction.predictors.predict_by_kernel import nn_predict
-from nn_meter.kerneldetection import KernelDetector
-from nn_meter.ir_converters import model_file_to_graph, model_to_graph
-from nn_meter.prediction.load_predictors import loading_to_local
-
-import yaml
 import os
+import yaml
 import pkg_resources
 from shutil import copyfile
 from packaging import version
 import logging
 
-__user_config_folder__ = os.path.expanduser('~/.nn_meter/config')
-__default_user_data_folder__ = os.path.expanduser('~/.nn_meter/data')
-
-__predictors_cfg_filename__ = 'predictors.yaml'
-
-
-def create_user_configs():
-    """create user configs from distributed configs
-    """
-    os.makedirs(__user_config_folder__, exist_ok=True)
-    # TODO/backlog: to handle config merging when upgrading
-    for f in pkg_resources.resource_listdir(__name__, 'configs'):
-        copyfile(pkg_resources.resource_filename(__name__, f'configs/{f}'), os.path.join(__user_config_folder__, f))
-    # make default setting yaml file
-    with open(os.path.join(__user_config_folder__, 'settings.yaml'), 'w') as fp:
-        yaml.dump({'data_folder': __default_user_data_folder__}, fp)
-
+from .utils import loading_to_local
+from .prediction.predict_by_kernel import nn_predict
+from nn_meter.kernel_detector import KernelDetector
+from nn_meter.utils import load_config_file, get_user_data_folder
+from nn_meter.ir_converter import model_file_to_graph, model_to_graph
 
-def get_user_data_folder():
-    """get user data folder in settings.yaml
-    """
-    filepath = os.path.join(__user_config_folder__, 'settings.yaml')
-    try:
-        with open(filepath) as fp:
-            return os.path.join(yaml.load(fp, yaml.FullLoader)['data_folder'])
-    except FileNotFoundError:
-        logging.info(f"setting file {filepath} not found, created")
-        create_user_configs()
-        return get_user_data_folder()
-
-
-def change_user_data_folder(new_folder):
-    """change user data folder in settings.yaml
-    """
-    os.makedirs(new_folder, exist_ok=True)
-    with open(os.path.join(__user_config_folder__, 'settings.yaml')) as fp:
-        setting = yaml.load(fp, yaml.FullLoader)
-    with open(os.path.join(__user_config_folder__, 'settings.yaml'), 'w') as fp:
-        setting['data_folder'] = new_folder
-        yaml.dump(setting, fp)
 
-
-def load_config_file(fname: str, loader=None):
-    """load config file from __user_config_folder__;
-    if the file not located in __user_config_folder__, copy it from distribution
-    """
-    filepath = os.path.join(__user_config_folder__, fname)
-    try:
-        with open(filepath) as fp:
-            if loader is None:
-                return yaml.load(fp, yaml.FullLoader)
-            else:
-                return loader(fp)
-    except FileNotFoundError:
-        logging.info(f"config file {filepath} not found, created")
-        create_user_configs()
-        return load_config_file(fname)
+__predictors_cfg_filename__ = 'predictors.yaml'
 
 
 def list_latency_predictors():
@@ -121,10 +66,10 @@ def load_latency_predictor(predictor_name: str, predictor_version: float = None)
     user_data_folder = get_user_data_folder()
     pred_info = load_predictor_config(predictor_name, predictor_version)
     kernel_predictors, fusionrule = loading_to_local(pred_info, os.path.join(user_data_folder, 'predictor'))
-    return nnMeter(kernel_predictors, fusionrule)
+    return nnMeterPredictor(kernel_predictors, fusionrule)
 
 
-class nnMeter:
+class nnMeterPredictor:
     def __init__(self, predictors, fusionrule):
         self.kernel_predictors = predictors
         self.fusionrule = fusionrule
diff --git a/nn_meter/prediction/predictors/__init__.py b/nn_meter/predictor/prediction/__init__.py
similarity index 100%
rename from nn_meter/prediction/predictors/__init__.py
rename to nn_meter/predictor/prediction/__init__.py
diff --git a/nn_meter/prediction/predictors/extract_feature.py b/nn_meter/predictor/prediction/extract_feature.py
similarity index 100%
rename from nn_meter/prediction/predictors/extract_feature.py
rename to nn_meter/predictor/prediction/extract_feature.py
index 1c2a91df..8a37c76c 100644
--- a/nn_meter/prediction/predictors/extract_feature.py
+++ b/nn_meter/predictor/prediction/extract_feature.py
@@ -1,8 +1,8 @@
 # Copyright (c) Microsoft Corporation.
 # Licensed under the MIT license.
+import logging
 import numpy as np
 from sklearn.metrics import mean_squared_error
-import logging
 
 
 def get_flop(input_channel, output_channel, k, H, W, stride):
diff --git a/nn_meter/prediction/predictors/kernel_predictor.py b/nn_meter/predictor/prediction/kernel_predictor.py
similarity index 100%
rename from nn_meter/prediction/predictors/kernel_predictor.py
rename to nn_meter/predictor/prediction/kernel_predictor.py
diff --git a/nn_meter/prediction/predictors/predict_by_kernel.py b/nn_meter/predictor/prediction/predict_by_kernel.py
similarity index 100%
rename from nn_meter/prediction/predictors/predict_by_kernel.py
rename to nn_meter/predictor/prediction/predict_by_kernel.py
diff --git a/nn_meter/prediction/predictors/utils.py b/nn_meter/predictor/prediction/utils.py
similarity index 99%
rename from nn_meter/prediction/predictors/utils.py
rename to nn_meter/predictor/prediction/utils.py
index 2cf40a7a..3ffe1814 100644
--- a/nn_meter/prediction/predictors/utils.py
+++ b/nn_meter/predictor/prediction/utils.py
@@ -1,9 +1,9 @@
 # Copyright (c) Microsoft Corporation.
 # Licensed under the MIT license.
-
 import numpy as np
 from sklearn.metrics import mean_squared_error
 
+
 def get_kernel_name(optype):
     """
     for many similar kernels, we use one kernel predictor since their latency difference is negligible,
@@ -34,6 +34,7 @@ def get_kernel_name(optype):
 
     return optype
 
+
 def get_accuracy(y_pred, y_true, threshold=0.01):
     a = (y_true - y_pred) / y_true
     b = np.where(abs(a) <= threshold)
diff --git a/nn_meter/prediction/load_predictors.py b/nn_meter/predictor/utils.py
similarity index 97%
rename from nn_meter/prediction/load_predictors.py
rename to nn_meter/predictor/utils.py
index add48c2b..cf5f5801 100644
--- a/nn_meter/prediction/load_predictors.py
+++ b/nn_meter/predictor/utils.py
@@ -7,7 +7,7 @@
 from tqdm import tqdm
 import requests
 import logging
-from nn_meter.utils.utils import download_from_url
+from nn_meter.utils import download_from_url
 
 
 def loading_to_local(pred_info, dir="data/predictorzoo"):
diff --git a/nn_meter/utils/__init__.py b/nn_meter/utils/__init__.py
index 9a045456..7c5eb826 100644
--- a/nn_meter/utils/__init__.py
+++ b/nn_meter/utils/__init__.py
@@ -1,2 +1,9 @@
 # Copyright (c) Microsoft Corporation.
 # Licensed under the MIT license.
+from .config_manager import (
+    create_user_configs,
+    get_user_data_folder,
+    change_user_data_folder,
+    load_config_file
+)
+from .utils import download_from_url
\ No newline at end of file
diff --git a/nn_meter/utils/config_manager.py b/nn_meter/utils/config_manager.py
new file mode 100644
index 00000000..ad7218c3
--- /dev/null
+++ b/nn_meter/utils/config_manager.py
@@ -0,0 +1,66 @@
+import yaml
+import os
+import logging
+import pkg_resources
+from shutil import copyfile
+
+
+__user_config_folder__ = os.path.expanduser('~/.nn_meter/config')
+__default_user_data_folder__ = os.path.expanduser('~/.nn_meter/data')
+
+__predictors_cfg_filename__ = 'predictors.yaml'
+
+
+def create_user_configs():
+    """create user configs from distributed configs
+    """
+    os.makedirs(__user_config_folder__, exist_ok=True)
+    # TODO/backlog: to handle config merging when upgrading    
+    for f in pkg_resources.resource_listdir(".".join(__name__.split('.')[:-2]), 'configs'):
+        copyfile(
+            pkg_resources.resource_filename(".".join(__name__.split('.')[:-2]), f'configs/{f}'), 
+            os.path.join(__user_config_folder__, f))
+    # make default setting yaml file
+    with open(os.path.join(__user_config_folder__, 'settings.yaml'), 'w') as fp:
+        yaml.dump({'data_folder': __default_user_data_folder__}, fp)
+
+
+def get_user_data_folder():
+    """get user data folder in settings.yaml
+    """
+    filepath = os.path.join(__user_config_folder__, 'settings.yaml')
+    try:
+        with open(filepath) as fp:
+            return os.path.join(yaml.load(fp, yaml.FullLoader)['data_folder'])
+    except FileNotFoundError:
+        logging.info(f"setting file {filepath} not found, created")
+        create_user_configs()
+        return get_user_data_folder()
+
+
+def change_user_data_folder(new_folder):
+    """change user data folder in settings.yaml
+    """
+    os.makedirs(new_folder, exist_ok=True)
+    with open(os.path.join(__user_config_folder__, 'settings.yaml')) as fp:
+        setting = yaml.load(fp, yaml.FullLoader)
+    with open(os.path.join(__user_config_folder__, 'settings.yaml'), 'w') as fp:
+        setting['data_folder'] = new_folder
+        yaml.dump(setting, fp)
+
+
+def load_config_file(fname: str, loader=None):
+    """load config file from __user_config_folder__;
+    if the file not located in __user_config_folder__, copy it from distribution
+    """
+    filepath = os.path.join(__user_config_folder__, fname)
+    try:
+        with open(filepath) as fp:
+            if loader is None:
+                return yaml.load(fp, yaml.FullLoader)
+            else:
+                return loader(fp)
+    except FileNotFoundError:
+        logging.info(f"config file {filepath} not found, created")
+        create_user_configs()
+        return load_config_file(fname, loader)
\ No newline at end of file
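Reviewer note on `config_manager.py`: the load-or-create pattern above can be sketched with the standard library only. This is an illustration under stated assumptions, not the module's implementation: it uses JSON instead of YAML, a temporary directory instead of `~/.nn_meter/config`, and the helper names `create_default_configs` and `load_config` are hypothetical. It also checks for the file once instead of retrying recursively, so a file name that the defaults never create raises `FileNotFoundError` rather than recursing.

```python
import json
import os
import tempfile


def create_default_configs(config_dir: str) -> None:
    """Write a default settings file (stands in for copying packaged configs)."""
    os.makedirs(config_dir, exist_ok=True)
    with open(os.path.join(config_dir, "settings.json"), "w") as fp:
        json.dump({"data_folder": os.path.join(config_dir, "data")}, fp)


def load_config(config_dir: str, fname: str, loader=None):
    """Load a config file, creating the defaults on first use.

    The custom `loader`, if given, is applied on the first call as well,
    because there is only one read path.
    """
    filepath = os.path.join(config_dir, fname)
    if not os.path.exists(filepath):
        create_default_configs(config_dir)
    # Raises FileNotFoundError if the defaults did not provide `fname`.
    with open(filepath) as fp:
        return json.load(fp) if loader is None else loader(fp)


with tempfile.TemporaryDirectory() as tmp:
    config_dir = os.path.join(tmp, "config")
    settings = load_config(config_dir, "settings.json")
    raw_text = load_config(config_dir, "settings.json", loader=lambda fp: fp.read())

print(sorted(settings))
```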
diff --git a/nn_meter/utils/graph_tool.py b/nn_meter/utils/graph_tool.py
index 69c4f2ad..f88f6849 100644
--- a/nn_meter/utils/graph_tool.py
+++ b/nn_meter/utils/graph_tool.py
@@ -2,18 +2,8 @@
 # Licensed under the MIT license.
 import copy
 import json
-import numpy as np
 import logging
-
-
-class NumpyEncoder(json.JSONEncoder):
-    def default(self, obj):
-        if isinstance(obj, np.ndarray):
-            return obj.tolist()
-        if isinstance(obj, (bytes, bytearray)):
-            return obj.decode("utf-8")
-        return json.JSONEncoder.default(self, obj)
-
+from .utils import NumpyEncoder
 
 class ModelGraph:
     def __init__(self, filename=None, graph=None):
diff --git a/nn_meter/utils/import_package.py b/nn_meter/utils/import_package.py
new file mode 100644
index 00000000..08a5cb79
--- /dev/null
+++ b/nn_meter/utils/import_package.py
@@ -0,0 +1,78 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT license.
+import logging
+from packaging import version
+
+
+def try_import_onnx(require_version = ["1.9.0"]):
+    if isinstance(require_version, str):
+        require_version = [require_version]
+    try:
+        import onnx
+        if version.parse(onnx.__version__).release not in [version.parse(v).release for v in require_version]:
+            logging.warning(f'onnx=={onnx.__version__} is not well tested now, well tested version: onnx=={", ".join(require_version)}' )
+        return onnx
+    except ImportError:
+        logging.error(f'You have not installed the onnx package, please install onnx=={require_version[0]} and try again.')
+        exit()
+
+def try_import_torch(require_version = ["1.9.0", "1.7.1"]):
+    if isinstance(require_version, str):
+        require_version = [require_version]
+    try:
+        import torch
+        if version.parse(torch.__version__).release not in [version.parse(v).release for v in require_version]:
+            logging.warning(f'torch=={torch.__version__} is not well tested now, well tested version: torch=={", ".join(require_version)}' )
+        return torch
+    except ImportError:
+        logging.error(f'You have not installed the torch package, please install torch=={require_version[0]} and try again.')
+        exit()
+
+def try_import_tensorflow(require_version = ["1.15.0"]):
+    if isinstance(require_version, str):
+        require_version = [require_version]
+    try:
+        import tensorflow
+        if version.parse(tensorflow.__version__).release not in [version.parse(v).release for v in require_version]:
+            logging.warning(f'tensorflow=={tensorflow.__version__} is not well tested now, well tested version: tensorflow=={", ".join(require_version)}' )
+        return tensorflow
+    except ImportError:
+        logging.error(f'You have not installed the tensorflow package, please install tensorflow=={require_version[0]} and try again.')
+        exit()
+
+def try_import_nni(require_version = ["2.4", "2.5"]):
+    if isinstance(require_version, str):
+        require_version = [require_version]
+    try:
+        import nni
+        if version.parse(nni.__version__).release not in [version.parse(v).release for v in require_version]:
+            logging.warning(f'nni=={nni.__version__} is not well tested now, well tested version: nni=={", ".join(require_version)}' )
+        return nni
+    except ImportError:
+        logging.error(f'You have not installed the nni package, please install nni=={require_version[0]} and try again.')
+        exit()
+
+def try_import_torchvision_models():
+    try:
+        import torchvision
+        return torchvision.models
+    except ImportError:
+        logging.error(f'You have not installed the torchvision package, please install torchvision and try again.')
+        exit()
+
+def try_import_onnxsim():
+    try:
+        from onnxsim import simplify
+        return simplify
+    except ImportError:
+        logging.error(f'You have not installed the onnx-simplifier package, please install onnx-simplifier and try again.')
+        exit()
+
+def try_import_dgl():
+    try:
+        import dgl
+        return dgl
+    except ImportError:
+        logging.error(f'You have not installed the dgl package, please install dgl and try again.')
+        exit()
+    
\ No newline at end of file
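Reviewer note on `import_package.py`: the moved helpers now compare the full `.release` tuple, where the removed versions in `utils.py` compared only `.release[:2]`. The sketch below illustrates what that change means for patch releases. It assumes plain `X.Y.Z` version strings; `release_tuple` and `is_well_tested` are hypothetical stand-ins written for this note, not `packaging` or nn-Meter functions.

```python
def release_tuple(version_string: str) -> tuple:
    """Hypothetical stand-in for the release tuple of a plain 'X.Y.Z' string."""
    return tuple(int(part) for part in version_string.split("."))


def is_well_tested(installed: str, tested: list, prefix_len=None) -> bool:
    """Return True when the installed release matches one of the tested ones.

    prefix_len=2 compares only major.minor; None compares the full tuple.
    """
    installed_release = release_tuple(installed)[:prefix_len]
    return installed_release in [release_tuple(v)[:prefix_len] for v in tested]


tested_torch = ["1.9.0", "1.7.1"]
matches_minor = is_well_tested("1.9.1", tested_torch, prefix_len=2)
matches_full = is_well_tested("1.9.1", tested_torch)
print(matches_minor, matches_full)
```

Under the full-tuple comparison, a patch release such as `1.9.1` no longer matches `1.9.0`, so it would trigger the "not well tested" warning; whether that stricter behavior is intended may be worth confirming in review.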
diff --git a/nn_meter/utils/utils.py b/nn_meter/utils/utils.py
index 980ba133..3ffd23e0 100644
--- a/nn_meter/utils/utils.py
+++ b/nn_meter/utils/utils.py
@@ -4,9 +4,9 @@
 from zipfile import ZipFile
 from tqdm import tqdm
 import requests
-from packaging import version
 import logging
-
+import json
+import numpy as np
 
 def download_from_url(urladdr, ppath):
     """
@@ -21,7 +21,7 @@ def download_from_url(urladdr, ppath):
     if not os.path.isdir(ppath):
         os.makedirs(ppath)
 
-    # logging.keyinfo(f'Download from {urladdr}')
+    logging.keyinfo(f'Download from {urladdr}')
     response = requests.get(urladdr, stream=True)
     total_size_in_bytes = int(response.headers.get("content-length", 0))
     block_size = 2048  # 2 Kibibyte
@@ -36,75 +36,11 @@ def download_from_url(urladdr, ppath):
     progress_bar.close()
     os.remove(file_name)
 
-def try_import_onnx(require_version = ["1.9.0"]):
-    if isinstance(require_version, str):
-        require_version = [require_version]
-    try:
-        import onnx
-        if version.parse(onnx.__version__).release[:2] not in [version.parse(v).release[:2] for v in require_version]:
-            logging.warning(f'onnx=={onnx.__version__} is not well tested now, well tested version: onnx=={", ".join(require_version)}' )
-        return onnx
-    except ImportError:
-        logging.error(f'You have not install the onnx package, please install onnx=={require_version[0]} and try again.')
-        exit()
-
-def try_import_torch(require_version = ["1.9.0", "1.7.1"]):
-    if isinstance(require_version, str):
-        require_version = [require_version]
-    try:
-        import torch
-        if version.parse(torch.__version__).release[:2] not in [version.parse(v).release[:2] for v in require_version]:
-            logging.warning(f'torch=={torch.__version__} is not well tested now, well tested version: torch=={", ".join(require_version)}' )
-        return torch
-    except ImportError:
-        logging.error(f'You have not install the torch package, please install torch=={require_version[0]} and try again.')
-        exit()
-
-def try_import_tensorflow(require_version = ["2.6.0", "1.15.0"]):
-    if isinstance(require_version, str):
-        require_version = [require_version]
-    try:
-        import tensorflow
-        if version.parse(tensorflow.__version__).release[:2] not in [version.parse(v).release[:2] for v in require_version]:
-            logging.warning(f'tensorflow=={tensorflow.__version__} is not well tested now, well tested version: tensorflow=={", ".join(require_version)}' )
-        return tensorflow
-    except ImportError:
-        logging.error(f'You have not install the tensorflow package, please install tensorflow=={require_version[0]} and try again.')
-        exit()
-
-def try_import_nni(require_version = ["2.4", "2.5"]):
-    if isinstance(require_version, str):
-        require_version = [require_version]
-    try:
-        import nni
-        if version.parse(nni.__version__).release[:2] not in [version.parse(v).release[:2] for v in require_version]:
-            logging.warning(f'nni=={nni.__version__} is not well tested now, well tested version: nni=={", ".join(require_version)}' )
-        return nni
-    except ImportError:
-        logging.error(f'You have not install the tensorflow package, please install tensorflow=={require_version[0]} and try again.')
-        exit()
-
-def try_import_torchvision_models():
-    try:
-        import torchvision
-        return torchvision.models
-    except ImportError:
-        logging.error(f'You have not install the torchvision package, please install torchvision and try again.')
-        exit()
-
-def try_import_onnxsim():
-    try:
-        from onnxsim import simplify
-        return simplify
-    except ImportError:
-        logging.error(f'You have not install the onnx-simplifier package, please install onnx-simplifier and try again.')
-        exit()
 
-def try_import_dgl():
-    try:
-        import dgl
-        return dgl
-    except ImportError:
-        logging.error(f'You have not install the dgl package, please install dgl and try again.')
-        exit()
-    
\ No newline at end of file
+class NumpyEncoder(json.JSONEncoder):
+    def default(self, obj):
+        if isinstance(obj, np.ndarray):
+            return obj.tolist()
+        if isinstance(obj, (bytes, bytearray)):
+            return obj.decode("utf-8")
+        return json.JSONEncoder.default(self, obj)
diff --git a/setup.py b/setup.py
index ce0f12fb..46d98862 100644
--- a/setup.py
+++ b/setup.py
@@ -26,7 +26,7 @@
         ],
     packages=find_packages(),
     package_data={
-        'nn_meter': ['configs/*.yaml', 'kerneldetection/fusionlib/*.json'],
+        'nn_meter': ['configs/*.yaml', 'kernel_detector/fusionlib/*.json'],
     },
     entry_points={
         'console_scripts': ['nn-meter=nn_meter.nn_meter_cli:nn_meter_cli'],
diff --git a/tests/README.md b/tests/README.md
index e97e8109..88c8df7b 100644
--- a/tests/README.md
+++ b/tests/README.md
@@ -2,9 +2,9 @@ In nn-Meter/tests, we implement the integration test for all usages of nn-Meter.
 
 ## Integration test
 
-According to [nn-Meter usage](nn-Meter/docs/usage.md), nn-Meter is a latency predictor of models with type of Tensorflow, PyTorch, Onnx, nn-meter IR graph and NNI IR graph. In integration test, we run the test for mentioned models, collect the latency results, and compare the results with the reference results. For time saving and readability, we separate the integration test into two scripts with PyTorch model and others, respectively. 
+According to [nn-Meter usage](../docs/predictor/usage.md), nn-Meter predicts latency for TensorFlow, PyTorch, and ONNX models, as well as nn-Meter IR graphs and NNI IR graphs. The integration test runs the predictor on each of these model types, collects the latency results, and compares them with the reference results. To save time and keep the tests readable, the integration test is split into two scripts: one for PyTorch models and one for all other model types.
 
-For PyTorch model, we accomplished two graph converters, namely NNI-based torch converter and ONNX-based torch converter (Refer to [this doc](docs/usage.md#torch-model-converters) for more information). We test both converters in `tests/integration_test_torch.py`. Note that the NNI-based torch converter needs API from `nni.retiarii.nn.pytorch` (view [NNI doc](https://nni.readthedocs.io/en/stable/NAS/QuickStart.html#define-base-model)) to build the torch module, thus we collected torchvision models and modified the import package to meet NNI requirements. The modified model are saved in tests/torchmodels.
+For PyTorch models, we implemented two graph converters: an NNI-based torch converter and an ONNX-based torch converter (refer to [this doc](../docs/predictor/usage.md#torch-model-converters) for more information). Both converters are tested in `tests/integration_test_torch.py`. Note that the NNI-based torch converter needs the API from `nni.retiarii.nn.pytorch` (see the [NNI doc](https://nni.readthedocs.io/en/stable/NAS/QuickStart.html#define-base-model)) to build the torch module, so we collected torchvision models and modified their imports to meet NNI requirements. The modified models are saved in `tests/torchmodels`.
 
 
 ## github actions workflow
diff --git a/tests/integration_test.py b/tests/integration_test.py
index eb2371a3..b20f2ad9 100644
--- a/tests/integration_test.py
+++ b/tests/integration_test.py
@@ -82,7 +82,7 @@ def integration_test(model_type, url, ppath, output_name = "tests/test_result.tx
         try:
             since = time.time()
             # print(f'nn-meter --{model_type} {ppath} --predictor {pred_name} --predictor-version {pred_version}')
-            result = subprocess.check_output(['nn-meter', 'lat_pred', f'--{model_type}', f'{ppath}', '--predictor', f'{pred_name}', '--predictor-version', f'{pred_version}'])
+            result = subprocess.check_output(['nn-meter', 'predict', f'--{model_type}', f'{ppath}', '--predictor', f'{pred_name}', '--predictor-version', f'{pred_version}'])
             runtime = time.time() - since
         except NotImplementedError:
             logging.error(f"Meets ERROR when checking --{model_type} {ppath} --predictor {pred_name} --predictor-version {pred_version}")
diff --git a/tests/integration_test_torch.py b/tests/integration_test_torch.py
index afbe237d..ad9e9099 100644
--- a/tests/integration_test_torch.py
+++ b/tests/integration_test_torch.py
@@ -42,6 +42,7 @@ def integration_test_onnx_based_torch(model_type, model_list, output_name = "tes
     for pred_name, pred_version in get_predictors():
         try:
             since = time.time()
+
             print(" ".join(['nn-meter', 'lat_pred', '--torchvision'] + model_list + ['--predictor', pred_name, '--predictor-version', pred_version]))
             result = subprocess.run(
                 ['nn-meter', 'lat_pred', '--torchvision'] + model_list + ['--predictor', pred_name, '--predictor-version', pred_version],