Merge branch 'master' into release/1.0
zhangyang2057 committed Mar 22, 2023
2 parents ed7c4f1 + f89ddb0 commit 3473131
Showing 13 changed files with 181 additions and 80 deletions.
48 changes: 44 additions & 4 deletions docs/USAGE_EN.md
@@ -4,7 +4,7 @@

nncase provides both a Python wheel package and the ncc client for compiling your neural network models.

- The nncase wheel package can be downloaded from [nncase release](https://github.com/kendryte/nncase/releases); target wheel packages other than cpu and K210 can be obtained from the nncase sdk for your target.
- The nncase wheel package can be downloaded from [nncase release](https://github.com/kendryte/nncase/releases)
- For the ncc client, you should git clone the nncase repository and then build it yourself.

# nncase python APIs
@@ -13,21 +13,61 @@ nncase provides Python APIs to compile neural network model and inference on you

## Installation

The nncase toolchain compiler consists of the nncase wheel package and plug-in wheel packages.

- Both the nncase and plug-in wheel packages are released at [nncase github](https://github.com/kendryte/nncase/releases).
- The nncase wheel package supports Python 3.6/3.7/3.8/3.9/3.10. Download the one that matches your operating system and Python version.
- The plug-in wheel package does not depend on the Python version, so you can install it directly.

If you do not have an Ubuntu development environment, you can use the [nncase docker image](https://github.com/kendryte/nncase/blob/master/docs/build.md) (Ubuntu 20.04 + Python 3.8).

```shell
$ cd /path/to/nncase_sdk
$ docker pull registry.cn-hangzhou.aliyuncs.com/kendryte/nncase:latest
$ docker run -it --rm -v `pwd`:/mnt -w /mnt registry.cn-hangzhou.aliyuncs.com/kendryte/nncase:latest /bin/bash -c "/bin/bash"
```

The following steps use Ubuntu 20.04 + Python 3.8 as an example.


### cpu/K210

- Download the nncase wheel package and then install it.

```
root@2b11cc15c7f8:/mnt# wget -P x86_64 https://github.com/kendryte/nncase/releases/download/v1.8.0/nncase-1.8.0.20220929-cp38-cp38-manylinux_2_24_x86_64.whl
root@2b11cc15c7f8:/mnt# pip3 install x86_64/*.whl
```



### K510

- Download both the nncase and nncase_k510 wheel packages and then install them.

```shell
root@f74598de4a02:/mnt# pip3 install nncase_github/nncase-1.0.0.20211029-cp38-cp38-manylinux_2_24_x86_64.whl
root@2b11cc15c7f8:/mnt# wget -P x86_64 https://github.com/kendryte/nncase/releases/download/v1.8.0/nncase-1.8.0.20220929-cp38-cp38-manylinux_2_24_x86_64.whl

root@2b11cc15c7f8:/mnt# wget -P x86_64 https://github.com/kendryte/nncase/releases/download/v1.8.0/nncase_k510-1.8.0.20220930-py2.py3-none-manylinux_2_24_x86_64.whl

root@2b11cc15c7f8:/mnt# pip3 install x86_64/*.whl
```



### Check nncase version

```python
root@469e6a4a9e71:/mnt# python3
Python 3.8.10 (default, Jun 2 2021, 10:49:15)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import _nncase
>>> print(_nncase.__version__)
1.8.0-55be52f
```

> If your target is not cpu/K210, you should get the target wheel package from your nncase sdk and install it.


## nncase compile model APIs

50 changes: 45 additions & 5 deletions docs/USAGE_ZH.md
@@ -2,7 +2,7 @@

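nncase currently provides two ways to compile models: the Python wheel package and the ncc client.

- The nncase wheel package must be obtained from [nncase release](https://github.com/kendryte/nncase/releases). No target wheel package needs to be installed for cpu/K210; for other targets it must be obtained offline from the nncase sdk
- The nncase wheel package must be obtained from [nncase release](https://github.com/kendryte/nncase/releases)
- For the ncc client, users need to download and build nncase themselves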

# nncase python APIs
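@@ -11,21 +11,61 @@ nncase provides Python APIs for compiling and running inference on deep learning models on a PC.

## Installation

If you do not have an Ubuntu environment, you can use the [nncase docker image](https://github.com/kendryte/nncase/blob/master/docs/build.md)(Ubuntu 20.04 + Python 3.8)
The compiler part of the nncase toolchain consists of nncase and plug-in packages

- Both nncase and the plug-in packages are released at [nncase github](https://github.com/kendryte/nncase/releases)
- The nncase wheel package supports Python 3.6/3.7/3.8/3.9/3.10; choose the version to download according to your operating system and Python version.
- The plug-in packages do not depend on the Python version and can be installed directly

If you do not have an Ubuntu environment, you can use [nncase docker](https://github.com/kendryte/nncase/blob/master/docs/build.md#docker)(Ubuntu 20.04 + Python 3.8)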

```shell
$ cd /path/to/nncase_sdk
$ docker pull registry.cn-hangzhou.aliyuncs.com/kendryte/nncase:latest
$ docker run -it --rm -v `pwd`:/mnt -w /mnt registry.cn-hangzhou.aliyuncs.com/kendryte/nncase:latest /bin/bash -c "/bin/bash"
```

The following takes installing nncase on Ubuntu 20.04 + Python 3.8 as an example


### cpu/K210

- Download the nncase wheel package and install it directly.

```
root@2b11cc15c7f8:/mnt# wget -P x86_64 https://github.com/kendryte/nncase/releases/download/v1.8.0/nncase-1.8.0.20220929-cp38-cp38-manylinux_2_24_x86_64.whl
root@2b11cc15c7f8:/mnt# pip3 install x86_64/*.whl
```



### K510

- Download the nncase package and the nncase_k510 plug-in package separately, then install them together

```shell
root@f74598de4a02:/mnt# pip3 install nncase_github/nncase-1.0.0.20211029-cp38-cp38-manylinux_2_24_x86_64.whl
root@2b11cc15c7f8:/mnt# wget -P x86_64 https://github.com/kendryte/nncase/releases/download/v1.8.0/nncase-1.8.0.20220929-cp38-cp38-manylinux_2_24_x86_64.whl

root@2b11cc15c7f8:/mnt# wget -P x86_64 https://github.com/kendryte/nncase/releases/download/v1.8.0/nncase_k510-1.8.0.20220930-py2.py3-none-manylinux_2_24_x86_64.whl

root@2b11cc15c7f8:/mnt# pip3 install x86_64/*.whl
```



### Check version information

```python
root@469e6a4a9e71:/mnt# python3
Python 3.8.10 (default, Jun 2 2021, 10:49:15)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import _nncase
>>> print(_nncase.__version__)
1.8.0-55be52f
```

> If you do not use cpu/K210 as the target, you need to obtain the wheel package from the nncase sdk of the corresponding target and install it


## nncase compile model APIs

5 changes: 5 additions & 0 deletions src/evaluator/ops/neutral/neutral_ops.cpp
@@ -354,6 +354,11 @@ void register_neutral_evaluators()
output.buffer().as_span<int32_t>().data(), input.shape(), to(rnode.axis()), input.strides(), output.strides(), rnode.keep_dims())
.unwrap_or_throw();
break;
case dt_int64:
kernels::reduce(rnode.reduce_op(), static_cast<int64_t>(rnode.init_value()), input.buffer().as_span<int64_t>().data(),
output.buffer().as_span<int64_t>().data(), input.shape(), to(rnode.axis()), input.strides(), output.strides(), rnode.keep_dims())
.unwrap_or_throw();
break;
default:
std::cerr << "unsupported dtype for reduce: " + std::string(datatype_names(input_type));
} });
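The added `dt_int64` case follows the same pattern as the neighbouring integer case: a runtime type tag selects a typed instantiation of a templated kernel, and the initial value is cast to the element type first. A minimal sketch of that pattern, assuming a simplified sum-only kernel. `dtype_sketch`, `reduce_sum`, and `reduce_sum_dispatch` are hypothetical names, not nncase APIs, and unlike the code above the sketch throws on an unsupported type instead of writing to `std::cerr`:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical runtime type tag standing in for nncase's datatype enum.
enum class dtype_sketch { int32, int64 };

// Simplified typed kernel: sums `count` elements starting from `init_value`.
template <typename T>
T reduce_sum(T init_value, const T *input, std::size_t count)
{
    T acc = init_value;
    for (std::size_t i = 0; i < count; ++i)
        acc += input[i];
    return acc;
}

// Runtime dispatch: cast the initial value and the buffer to the element
// type named by the tag, then call the matching instantiation.
std::int64_t reduce_sum_dispatch(dtype_sketch type, float init_value,
                                 const void *input, std::size_t count)
{
    switch (type)
    {
    case dtype_sketch::int32:
        return reduce_sum(static_cast<std::int32_t>(init_value),
                          static_cast<const std::int32_t *>(input), count);
    case dtype_sketch::int64:
        return reduce_sum(static_cast<std::int64_t>(init_value),
                          static_cast<const std::int64_t *>(input), count);
    }
    // Assumption of this sketch: unsupported tags are reported by throwing.
    throw std::invalid_argument("unsupported dtype for reduce");
}
```

Without a 64-bit branch, a sum such as 4000000000 + 4000000000 cannot be represented in an `int32_t` accumulator, which is one reason a dedicated case is useful.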
76 changes: 40 additions & 36 deletions src/ir/graph.partition.cpp
@@ -33,6 +33,7 @@ struct region
std::unordered_set<node *> nodes_set;
std::unordered_set<input_connector *> region_inputs;
std::unordered_set<output_connector *> outputs;
std::unordered_map<output_connector *, int> need_remove_outputs;

region(module_type_t module_type)
: module_type(module_type)
@@ -49,25 +50,30 @@ struct region
for (auto out : n.outputs())
outputs.emplace(out);

std::unordered_map<output_connector *, int> need_remove_outputs;
for (auto it = region_inputs.begin(); it != region_inputs.end();)
{
if (outputs.contains((*it)->connection()))
{
if (need_remove_outputs.find((*it)->connection()) != need_remove_outputs.end())
need_remove_outputs.at((*it)->connection()) += 1;
need_remove_outputs.at((*it)->connection()) -= 1;
else
need_remove_outputs.emplace((*it)->connection(), 1);
need_remove_outputs.emplace((*it)->connection(),
(*it)->connection()->connections().size() - 1);
it = region_inputs.erase(it);
}
else
++it;
}

for (auto it : need_remove_outputs)
for (auto it = need_remove_outputs.begin(); it != need_remove_outputs.end();)
{
if (it.first->connections().size() == it.second)
outputs.erase(it.first);
if (it->second == 0)
{
outputs.erase(it->first);
it = need_remove_outputs.erase(it);
}
else
it++;
}

if (is_all_noaction && n.attributes() & node_attr_action)
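As far as the flattened hunk shows, the bookkeeping for `need_remove_outputs` moves from a per-call local map that counts absorbed consumers upward to a member map that counts remaining consumers down to zero. A minimal sketch of the count-down idea, assuming integer ids in place of connector pointers. `region_sketch`, `pending`, and `absorb_consumer` are hypothetical names, and the sketch folds the later sweep loop into the same call:

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <unordered_set>

// Hypothetical stand-in for a region: outputs are identified by int ids.
struct region_sketch
{
    std::unordered_set<int> outputs;              // outputs still visible outside the region
    std::unordered_map<int, std::size_t> pending; // output id -> consumers not yet absorbed

    // Record that one consumer of `out` now lives inside the region.
    // `total_consumers` is the number of inputs that `out` feeds overall.
    void absorb_consumer(int out, std::size_t total_consumers)
    {
        assert(total_consumers > 0);
        auto it = pending.find(out);
        if (it == pending.end())
            it = pending.emplace(out, total_consumers).first;

        // Once every consumer is internal, the output is no longer a region output.
        if (--it->second == 0)
        {
            outputs.erase(out);
            pending.erase(it);
        }
    }
};
```

Because the counter persists between calls, an output whose consumers are added to the region one node at a time is still dropped once the last consumer arrives.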
@@ -127,12 +133,13 @@ typedef struct Region_node
class Region_tree
{
public:
Region_node *create_tree(std::list<region>::iterator new_node, std::list<region> &regions, int depth)
Region_tree(std::list<region> &rg)
: regions_(rg) { }
Region_node *create_tree(std::list<region>::iterator new_node, int depth)
{

Region_node *root = create_node();
root->node = new_node;
auto bro = root->bro;

// find a path from itb--> ita
if (new_node == target_region_)
@@ -150,20 +157,20 @@ class Region_tree

for (auto it : new_node->region_inputs)
{
for (auto itb = regions.begin(); itb != regions.end(); itb++)
for (auto itb = regions_.begin(); itb != regions_.end(); itb++)
{
if (itb->outputs.contains(it->connection()))
{
if (root->child == nullptr)
{
root->child = create_tree(itb, regions, depth + 1);
root->child = create_tree(itb, depth + 1);
root->child->parent = root;
}
else
{
bro = create_tree(itb, regions, depth);
bro->parent = root;
bro = bro->bro;
root->bro = create_tree(itb, depth);
root->bro->parent = root;
root->bro = root->bro->bro;
}
}
}
@@ -174,15 +181,16 @@ class Region_tree

bool not_have_circle()
{
// if tree depth > 20, ignore merge itb--> ita
// if tree depth > 10, ignore merge itb--> ita
if (skip_)
return false;

// each leaf has only one path to root.
// if all the paths of leaves to root don't have CPU op ,itb can merge to ita.
for (auto it : leaves_)
{
auto condition_ptr = it->parent;
if (condition_ptr->node == start_region_)
continue;
while (condition_ptr != nullptr)
{
if (condition_ptr->node->module_type == runtime::stackvm::stackvm_module_type && !condition_ptr->node->is_all_noaction)
@@ -203,20 +211,9 @@ class Region_tree
{
if (root != nullptr)
{
if (root->child != nullptr)
{
free_tree(root->child);
}
else if (root->bro != nullptr)
{
free_tree(root->bro);
}

free_tree(root->child);
free_tree(root->bro);
delete root;
root->child = nullptr;
root->bro = nullptr;
root->parent = nullptr;
root = nullptr;
}
}
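The hunk header (20 lines before, 9 after) suggests that the conditional `child` / `else if bro` branches and the writes to `root` after `delete` are the removed lines, leaving two unconditional recursive calls followed by `delete root`. A minimal sketch of that post-order release on a child/sibling tree, assuming a stripped-down node type. `tree_node` is a hypothetical stand-in for `Region_node`, and the return value is added only so the behaviour can be checked:

```cpp
#include <cassert>

// Hypothetical stand-in for Region_node: first child plus next sibling.
struct tree_node
{
    tree_node *child = nullptr;
    tree_node *bro = nullptr;
};

// Frees the whole tree in post-order and returns how many nodes were freed.
int free_tree(tree_node *root)
{
    if (root == nullptr)
        return 0;

    // Release both subtrees before the node itself; the node is never
    // touched again after delete.
    int freed = free_tree(root->child) + free_tree(root->bro);
    delete root;
    return freed + 1;
}
```

Visiting both `child` and `bro` unconditionally matters: with an `else if`, a node that has both a child and a sibling would leak the sibling chain.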

@@ -231,6 +228,7 @@ class Region_tree
std::list<region>::iterator target_region_;
std::vector<Region_node *> leaves_;
bool skip_;
std::list<region> &regions_;
};

class graph_merger
@@ -333,22 +331,28 @@ class graph_merger
bool check_circle(std::list<region>::iterator ita, std::list<region>::iterator itb)
{
// merge directly
if (ita->outputs.size() == 1)
bool merge_directly = true;
for (auto it : ita->outputs)
{
for (auto it : ita->outputs)
{
if (it->connections().size() == 1)
return true;
}
if (std::all_of(it->connections().begin(), it->connections().end(),
[&](input_connector *out) {
return itb->region_inputs.contains(out);
}))
continue;
else
merge_directly = false;
}
if (merge_directly)
return true;

if (itb->region_inputs.size() == 1)
{
return true;
}

auto check = std::make_shared<Region_tree>();
auto check = std::make_shared<Region_tree>(regions_);
check->set_label_region(ita, itb);
auto root = check->create_tree(itb, regions_, 0);
auto root = check->create_tree(itb, 0);
auto flag = check->not_have_circle();
check->free_tree(root);
return flag;
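The `merge_directly` loop in `check_circle` accepts a merge without building a tree when every consumer of every output of `ita` is an input of `itb`. A minimal sketch of that predicate, assuming integer ids in place of connector pointers. `can_merge_directly` and its parameters are hypothetical names, not part of nncase:

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_set>
#include <vector>

// Each element of `producer_outputs` lists the input ids fed by one output
// of the producer region; `consumer_inputs` holds the input ids of the
// candidate consumer region.
bool can_merge_directly(const std::vector<std::vector<int>> &producer_outputs,
                        const std::unordered_set<int> &consumer_inputs)
{
    return std::all_of(producer_outputs.begin(), producer_outputs.end(),
        [&](const std::vector<int> &consumers) {
            // Every consumer of this output must belong to the consumer region.
            return std::all_of(consumers.begin(), consumers.end(),
                [&](int in) { return consumer_inputs.count(in) != 0; });
        });
}
```

If any output also feeds a third region, the predicate is false and the slower path-based check is still needed.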
3 changes: 3 additions & 0 deletions src/kernels/cpu/optimized/riscv64/reduce.cpp
@@ -49,6 +49,9 @@ result<void> optimized::reduce<float>(reduce_op_t op, float init_value, const fl
template result<void> optimized::reduce<int32_t>(reduce_op_t op, int32_t init_value, const int32_t *input, int32_t *output, const runtime_shape_t &in_shape, const runtime_shape_t &axis,
const runtime_shape_t &in_strides, const runtime_shape_t &out_strides, bool keep_dims, kernel_context &context) noexcept;

template result<void> optimized::reduce<int64_t>(reduce_op_t op, int64_t init_value, const int64_t *input, int64_t *output, const runtime_shape_t &in_shape, const runtime_shape_t &axis,
const runtime_shape_t &in_strides, const runtime_shape_t &out_strides, bool keep_dims, kernel_context &context) noexcept;

template <typename T>
result<void> optimized::reduce(reduce_op_t op, T init_value, const T *input, T *output, const runtime_shape_t &in_shape, const runtime_shape_t &axis,
const runtime_shape_t &in_strides, const runtime_shape_t &out_strides, bool keep_dims, kernel_context &context) noexcept
4 changes: 2 additions & 2 deletions src/kernels/cpu/optimized/riscv64/unary.cpp
@@ -43,7 +43,7 @@ struct unary_op_abs_rvv

struct unary_op_ceil_rvv
{
vfloat32m8_t operator()(const vfloat32m8_t &x, const word_type &vl) const
vfloat32m8_t operator()(const vfloat32m8_t &x, const size_t &vl) const
{
vint32m8_t _xi = vfcvt_x_f_v_i32m8(x, vl);
vbool4_t _mask = vmflt_vv_f32m8_b4(vfcvt_f_x_v_f32m8(_xi, vl), x, vl);
@@ -61,7 +61,7 @@ struct unary_op_cos_rvv

struct unary_op_exp_rvv
{
vfloat32m8_t operator()(const vfloat32m8_t &x, const word_type &vl) const
vfloat32m8_t operator()(const vfloat32m8_t &x, const size_t &vl) const
{
return exp_ps(x, vl);
}