diff --git a/README.md b/README.md index 504fcc54a..07f420e77 100644 --- a/README.md +++ b/README.md @@ -26,7 +26,7 @@ offline. A tool that adapts models trained by above algorithms to be inferred by fixed point arithmetic. - **SeeDot**: Floating-point to fixed-point quantization tool. -Applications demonstrating usecases of these algorithms. +Applications demonstrating use cases of these algorithms, such as [GesturePod](/docs/publications). ### Organization - The `tf` directory contains the `edgeml_tf` package which specifies these architectures in TensorFlow, @@ -41,16 +41,18 @@ Please see install/run instructions in the README pages within these directories ### Details and project pages For details, please see our - [project page](https://microsoft.github.io/EdgeML/) and - [Microsoft Research page](https://www.microsoft.com/en-us/research/project/resource-efficient-ml-for-the-edge-and-endpoint-iot-devices/). -our ICML'17 publications on [Bonsai](docs/publications/Bonsai.pdf) and -[ProtoNN](docs/publications/ProtoNN.pdf) algorithms, -NeurIPS'18 publications on [EMI-RNN](docs/publications/emi-rnn-nips18.pdf) and -[FastGRNN](docs/publications/FastGRNN.pdf), -and PLDI'19 publication on [SeeDot](docs/publications/SeeDot.pdf).
- - -Checkout the [ELL](https://github.com/Microsoft/ELL) project which can + [project page](https://microsoft.github.io/EdgeML/), + [Microsoft Research page](https://www.microsoft.com/en-us/research/project/resource-efficient-ml-for-the-edge-and-endpoint-iot-devices/), +the ICML'17 publications on [Bonsai](/docs/publications/Bonsai.pdf) and +[ProtoNN](/docs/publications/ProtoNN.pdf) algorithms, +the NeurIPS'18 publications on [EMI-RNN](/docs/publications/emi-rnn-nips18.pdf) and +[FastGRNN](/docs/publications/FastGRNN.pdf), +the PLDI'19 publication on [SeeDot compiler](/docs/publications/SeeDot.pdf), +the UIST'19 publication on [GesturePod](/docs/publications/ICane-UIST19.pdf), +and the NeurIPS'19 publication on [S-RNN](/docs/publications/SRNN.pdf). + + +Also check out the [ELL](https://github.com/Microsoft/ELL) project which can provide optimized binaries for some of the ONNX models trained by this library. ### Contributors: @@ -75,7 +77,8 @@ If you use software from this library in your work, please use the BibTex entry ``` @software{edgeml01, author = {{Dennis, Don Kurian and Gaurkar, Yash and Gopinath, Sridhar and Gupta, Chirag and - Kumar, Ashish and Kusupati, Aditya and Lovett, Chris and Patil, Shishir G and Simhadri, Harsha Vardhan}}, + Jain, Moksh and Kumar, Ashish and Kusupati, Aditya and Lovett, Chris + and Patil, Shishir G and Simhadri, Harsha Vardhan}}, title = {{EdgeML: Machine Learning for resource-constrained edge devices}}, url = {https://github.com/Microsoft/EdgeML}, version = {0.2}, diff --git a/examples/pytorch/Bonsai/README.md b/examples/pytorch/Bonsai/README.md index 5a80a88bf..60b9c312a 100644 --- a/examples/pytorch/Bonsai/README.md +++ b/examples/pytorch/Bonsai/README.md @@ -7,7 +7,8 @@ use-case on the USPS10 public dataset. `edgeml_pytorch.graph.bonsai` implements the Bonsai prediction graph in pytorch.
The three-phase training routine for Bonsai is decoupled from the forward graph to facilitate a plug and play behaviour wherein Bonsai can be combined with or -used as a final layer classifier for other architectures (RNNs, CNNs). +used as a final layer classifier for other architectures (RNNs, CNNs). +See `edgeml_pytorch.trainer.bonsaiTrainer` for 3-phase training. Note that `bonsai_example.py` assumes that data is in a specific format. It is assumed that train and test data is contained in two files, `train.npy` and diff --git a/examples/pytorch/FastCells/README.md b/examples/pytorch/FastCells/README.md index f3d8c3474..abdfb20e2 100644 --- a/examples/pytorch/FastCells/README.md +++ b/examples/pytorch/FastCells/README.md @@ -1,34 +1,36 @@ # EdgeML FastCells on a sample public dataset -This directory includes example notebook and general execution script of -FastCells (FastRNN & FastGRNN) developed as part of EdgeML along with modified +This directory includes example notebooks and scripts of +FastCells (FastRNN & FastGRNN) along with modified UGRNN, GRU and LSTM to support the LSQ training routine. -Also, we include a sample cleanup and use-case on the USPS10 public dataset. 
- -`edgeml_pytorch.graph.rnn` implements the custom RNN cells of **FastRNN** ([`FastRNNCell`](../../pytorch_edgeml/graph/rnn.py#L226)) and **FastGRNN** ([`FastGRNNCell`](../../pytorch_edgeml/graph/rnn.py#L80)) with -multiple additional features like Low-Rank parameterisation, custom -non-linearities etc., Similar to Bonsai and ProtoNN, the three-phase training -routine for FastRNN and FastGRNN is decoupled from the custom cells to -facilitate a plug and play behaviour of the custom RNN cells in other -architectures (NMT, Encoder-Decoder etc.,) in place of the inbuilt `RNNCell`, `GRUCell`, `BasicLSTMCell` etc., -`edgeml_pytorch.graph.rnn` also contains modified RNN cells of **UGRNN** ([`UGRNNLRCell`](../../pytorch_edgeml/graph/rnn.py#L742)), -**GRU** ([`GRULRCell`](../../edgeml/graph/rnn.py#L565)) and **LSTM** ([`LSTMLRCell`](../../pytorch_edgeml/graph/rnn.py#L369)). These cells also can be substituted for FastCells where ever feasible. - -`edgeml_pytorch.graph.rnn` also contains fully wrapped RNNs which are equivalent to `nn.LSTM` and `nn.GRU`. Implemented cells: -**FastRNN** ([`FastRNN`](../../pytorch_edgeml/graph/rnn.py#L968)), **FastGRNN** ([`FastGRNN`](../../pytorch_edgeml/graph/rnn.py#L993)), **UGRNN** ([`UGRNN`](../../edgeml_pytorch/graph/rnn.py#L945)), **GRU** ([`GRU`](../../edgeml/graph/rnn.py#L922)) and **LSTM** ([`LSTM`](../../pytorch_edgeml/graph/rnn.py#L899)). - -Note that all the cells and wrappers (when used independently from `fastcell_example.py` or `edgeml_pytorch.trainer.fastTrainer`) take in data in a batch first format ie., [batchSize, timeSteps, inputDims] by default but it can also support [timeSteps, batchSize, inputDims] format by setting `batch_first` argument to False when used. `fast_example.py` automatically takes care it while assuming the standard format between tf, c++ and pytorch. +There is also a sample cleanup and train/test script for the USPS10 public dataset. 
+ +[`edgeml_pytorch.graph.rnn`](../../../pytorch/pytorch_edgeml/graph/rnn.py) +provides two RNN cells, **FastRNNCell** and **FastGRNNCell**, with additional +features like low-rank parameterisation and custom non-linearities. Akin to +Bonsai and ProtoNN, the three-phase training routine for FastRNN and FastGRNN +is decoupled from the custom cells to facilitate a plug and play behaviour of +the custom RNN cells in other architectures (NMT, Encoder-Decoder etc.). +Additionally, numerically equivalent CUDA-based implementations FastRNNCuda +and FastGRNNCuda are provided for faster training. +`edgeml_pytorch.graph.rnn` also contains modified RNN cells of **UGRNNCell**, +**GRUCell**, and **LSTMCell**, which can be substituted for Fast(G)RNN, +as well as unrolled RNNs which are equivalent to `nn.LSTM` and `nn.GRU`. + +Note that all the cells and wrappers, when used independently from `fastcell_example.py` +or `edgeml_pytorch.trainer.fastTrainer`, take in data in a batch-first format, i.e., +[batchSize, timeSteps, inputDims] by default, but can also support [timeSteps, +batchSize, inputDims] format if the `batch_first` argument is set to `False`. +`fast_example.py` automatically adjusts to the correct format across tf, c++ and pytorch. For training FastCells, `edgeml_pytorch.trainer.fastTrainer` implements the three-phase +FastCell training routine in PyTorch. A simple example `fastcell_example.py` is provided +to illustrate its usage. Note that `fastcell_example.py` assumes that data is in a specific format.
+It is assumed that train and test data is contained in two files, `train.npy` and +`test.npy`, each containing a 2D numpy array of dimension `[numberOfExamples, numberOfFeatures]`. numberOfFeatures is `timesteps x inputDims`, flattened -across timestep dimension. So the input of 1st timestep followed by second and -so on. For an N-Class problem, we assume the labels are integers from 0 +across the timestep dimension, with the input of the first time step followed by the second +and so on. For an N-Class problem, we assume the labels are integers from 0 through N-1. Lastly, the training data, `train.npy`, is assumed to well shuffled as the training routine doesn't shuffle internally. @@ -36,9 +38,8 @@ as the training routine doesn't shuffle internally. ## Download and clean up sample dataset -We will be testing out the validation of the code by using the USPS dataset. -The download and cleanup of the dataset to match the above-mentioned format is -done by the script [fetch_usps.py](fetch_usps.py) and +To validate the code with the USPS dataset, first download the dataset and convert it to +the required format using the scripts [fetch_usps.py](fetch_usps.py) and [process_usps.py](process_usps.py) ``` @@ -46,17 +47,17 @@ python fetch_usps.py python process_usps.py ``` +Note: Even though usps10 is not a time-series dataset, it can be regarded as a time-series +dataset where each time step sees a new row. So the number of timesteps = 16 and inputDims = 16. ## Sample command for FastCells on USPS10 -The following sample run on usps10 should validate your library: - -Note: Even though usps10 is not a time-series dataset, it can be assumed as, a time-series where each row is coming in at one single time.
-So the number of timesteps = 16 and inputDims = 16 +The following is a sample run on usps10 : ```bash python fastcell_example.py -dir usps10/ -id 16 -hd 32 ``` -This command should give you a final output screen which reads roughly similar to (might not be exact numbers due to various version mismatches): +This command should give you a final output that reads roughly similar to +(might not be exact numbers due to various version mismatches): ``` Maximum Test accuracy at compressed model size(including early stopping): 0.9407075 at Epoch: 262 @@ -64,23 +65,26 @@ Final Test Accuracy: 0.93721974 Non-Zeros: 1932 Model Size: 7.546875 KB hasSparse: False ``` -`usps10/` directory will now have a consolidated results file called `FastRNNResults.txt` or `FastGRNNResults.txt` depending on the choice of the RNN cell. -A directory `FastRNNResults` or `FastGRNNResults` with the corresponding models with each run of the code on the `usps10` dataset. +`usps10/` directory will now have a consolidated results file called `FastRNNResults.txt` or +`FastGRNNResults.txt` depending on the choice of the RNN cell. A directory `FastRNNResults` or +`FastGRNNResults` with the corresponding models with each run of the code on the `usps10` dataset. -Note that the scalars like `alpha`, `beta`, `zeta` and `nu` are all before the application of the sigmoid function over them. +Note that the scalars like `alpha`, `beta`, `zeta` and `nu` correspond to the values before +the application of the sigmoid function. ## Byte Quantization(Q) for model compression -If you wish to quantize the generated model to use byte quantized integers use `quantizeFastModels.py`. Usage Instructions: +If you wish to quantize the generated model, use `quantizeFastModels.py`. Usage Instructions: ``` python quantizeFastModels.py -h ``` -This will generate quantized models with a suffix of `q` before every param stored in a new directory `QuantizedFastModel` inside the model directory. 
-One can use this model further on edge devices. +This will generate quantized models with a prefix of `q` on every param, stored in a +new directory `QuantizedFastModel` inside the model directory. -Note that the scalars like `qalpha`, `qbeta`, `qzeta` and `qnu` are all after the application of the sigmoid function over them and quantization, they can be directly plugged into the inference pipleines. +Note that the scalars like `qalpha`, `qbeta`, `qzeta` and `qnu` correspond to the values +after the application of the sigmoid function and quantization; +they can be directly plugged into the inference pipelines. Copyright (c) Microsoft Corporation. All rights reserved. - Licensed under the MIT license. diff --git a/pytorch/README.md b/pytorch/README.md index 13f253f69..3cfac80b1 100644 --- a/pytorch/README.md +++ b/pytorch/README.md @@ -1,24 +1,39 @@ ## Edge Machine Learning: Pytorch Library -This directory includes PyTorch implementations of various techniques and -algorithms developed as part of EdgeML. Currently, the following algorithms are -available in Tensorflow: - -1. [Bonsai](/docs/publications/Bonsai.pdf) -2. S-RNN -3. [FastRNN & FastGRNN](/docs/publications/FastGRNN.pdf) -4. [ProtoNN](/docs/publications/ProtoNN.pdf) - -The PyTorch graphs for these algoriths are packaged as `edgeml_pytorch.graph`. -Trainers for these algorithms are in `edgeml_pytorch.trainer`. -Usage directions and examples for these algorithms are provided in -`$EDGEML_ROOT/examples/pytorch` directory. To get started with any -of the provided algorithms, please follow the notebooks in the the -`examples/pytorch` directory. +This package includes PyTorch implementations of the following algorithms and training +techniques developed as part of EdgeML. The PyTorch graphs for the forward/backward +pass of these algorithms are packaged as `edgeml_pytorch.graph` and the trainers +for these algorithms are in `edgeml_pytorch.trainer`. -## Installation +1.
[Bonsai](/docs/publications/Bonsai.pdf): `edgeml_pytorch.graph.bonsai` implements + the Bonsai prediction graph. The three-phase training routine for Bonsai is decoupled + from the forward graph to facilitate a plug and play behaviour wherein Bonsai can be + combined with or used as a final layer classifier for other architectures (RNNs, CNNs). + See `edgeml_pytorch.trainer.bonsaiTrainer` for 3-phase training. +2. [ProtoNN](/docs/publications/ProtoNN.pdf): `edgeml_pytorch.graph.protoNN` implements the + ProtoNN prediction functions. The training routine for ProtoNN is decoupled from the forward + graph to facilitate a plug and play behaviour wherein ProtoNN can be combined with or used + as a final layer classifier for other architectures (RNNs, CNNs). The training routine is + implemented in `edgeml_pytorch.trainer.protoNNTrainer`. +3. [FastRNN & FastGRNN](/docs/publications/FastGRNN.pdf): `edgeml_pytorch.graph.rnn` provides + various RNN cells --- including new cells `FastRNNCell` and `FastGRNNCell` as well as + `UGRNNCell`, `GRUCell`, and `LSTMCell` --- with features like low-rank parameterisation + of weight matrices and custom non-linearities. Akin to Bonsai and ProtoNN, the three-phase + training routine for FastRNN and FastGRNN is decoupled from the custom cells to enable plug and + play behaviour of the custom RNN cells in other architectures (NMT, Encoder-Decoder etc.). + Additionally, numerically equivalent CUDA-based implementations `FastRNNCUDACell` and + `FastGRNNCUDACell` are provided for faster training. + `edgeml_pytorch.graph.rnn.Fast(G)RNN(CUDA)` provides unrolled RNNs equivalent to `nn.LSTM` and `nn.GRU`. + `edgeml_pytorch.trainer.fastmodel` presents a sample multi-layer RNN + multi-class classifier model. +4. [S-RNN](/docs/publications/SRNN.pdf): `edgeml_pytorch.graph.rnn.SRNN2` implements a + 2-layer SRNN network which can be instantiated with a choice of RNN cell.
The training + routine for SRNN is in `edgeml_pytorch.trainer.srnnTrainer`. + +Usage directions and example notebooks for this package are provided [here](/examples/pytorch). +## Installation + It is highly recommended that EdgeML be installed in a virtual environment. Please create a new virtual environment using your environment manager ([virtualenv](https://virtualenv.pypa.io/en/stable/userguide/#usage) or diff --git a/pytorch/edgeml_pytorch/trainer/fastmodel.py b/pytorch/edgeml_pytorch/trainer/fastmodel.py index a55c0bd75..f8baa52af 100644 --- a/pytorch/edgeml_pytorch/trainer/fastmodel.py +++ b/pytorch/edgeml_pytorch/trainer/fastmodel.py @@ -38,6 +38,7 @@ def __init__(self, rnn_name, input_dim, num_layers, hidden_units_list, self.linear = linear self.batch_first = batch_first self.apply_softmax = apply_softmax + self.rnn_name = rnn_name if self.linear: if not self.num_classes: @@ -57,6 +58,18 @@ def __init__(self, rnn_name, input_dim, num_layers, hidden_units_list, batch_first = self.batch_first) for l in range(self.num_layers)]) + if rnn_name == "FastGRNNCUDA": + RNN_ = getattr(getattr(getattr(__import__('edgeml_pytorch'), 'graph'), 'rnn'), 'FastGRNN') + self.rnn_list_ = nn.ModuleList([ + RNN_(self.input_dim if l==0 else self.hidden_units_list[l-1], + self.hidden_units_list[l], + gate_nonlinearity=self.gate_nonlinearity, + update_nonlinearity=self.update_nonlinearity, + wRank=self.wRank_list[l], uRank=self.uRank_list[l], + wSparsity=self.wSparsity_list[l], + uSparsity=self.uSparsity_list[l], + batch_first = self.batch_first) + for l in range(self.num_layers)]) # The linear layer is a fully connected layer that maps from hidden state space # to number of expected keywords if self.linear: @@ -66,16 +79,30 @@ def __init__(self, rnn_name, input_dim, num_layers, hidden_units_list, def sparsify(self): for rnn in self.rnn_list: - rnn.cell.sparsify() + if self.rnn_name == "FastGRNNCUDA": + rnn.to(torch.device("cpu")) + rnn.sparsify() + rnn.to(torch.device("cuda")) +
else: + rnn.cell.sparsify() def sparsifyWithSupport(self): for rnn in self.rnn_list: - rnn.cell.sparsifyWithSupport() + if self.rnn_name == "FastGRNNCUDA": + rnn.to(torch.device("cpu")) + rnn.sparsifyWithSupport() + rnn.to(torch.device("cuda")) + else: + rnn.cell.sparsifyWithSupport() def get_model_size(self): total_size = 4 * self.hidden_units_list[self.num_layers-1] * self.num_classes + print(self.rnn_name) for rnn in self.rnn_list: - total_size += rnn.cell.get_model_size() + if self.rnn_name == "FastGRNNCUDA": + total_size += rnn.get_model_size() + else: + total_size += rnn.cell.get_model_size() return total_size def normalize(self, mean, std): @@ -130,15 +157,32 @@ def forward(self, input): input = (input - self.mean) / self.std rnn_in = input - for l in range(self.num_layers): - rnn = self.rnn_list[l] - model_output = rnn(rnn_in, hiddenState=self.hidden_states[l]) - self.hidden_states[l] = model_output.detach()[-1, :, :] + if self.rnn_name == "FastGRNNCUDA": if self.tracking: - weights = rnn.getVars() - model_output = onnx_exportable_rnn(rnn_in, weights, - rnn.cell, output=model_output) - rnn_in = model_output + for l in range(self.num_layers): + print("Layer: ", l) + rnn_ = self.rnn_list_[l] + model_output = rnn_(rnn_in, hiddenState=self.hidden_states[l]) + self.hidden_states[l] = model_output.detach()[-1, :, :] + weights = self.rnn_list[l].getVars() + weights = [weight.clone() for weight in weights] + model_output = onnx_exportable_rnn(rnn_in, weights, rnn_.cell, output=model_output) + rnn_in = model_output + else: + for l in range(self.num_layers): + rnn = self.rnn_list[l] + model_output = rnn(rnn_in, hiddenState=self.hidden_states[l]) + self.hidden_states[l] = model_output.detach()[-1, :, :] + rnn_in = model_output + else: + for l in range(self.num_layers): + rnn = self.rnn_list[l] + model_output = rnn(rnn_in, hiddenState=self.hidden_states[l]) + self.hidden_states[l] = model_output.detach()[-1, :, :] + if self.tracking: + weights = rnn.getVars() +
model_output = onnx_exportable_rnn(rnn_in, weights, rnn.cell, output=model_output) + rnn_in = model_output if self.linear: model_output = self.hidden2keyword(model_output[-1, :, :]) diff --git a/tf/README.md b/tf/README.md index 83494456f..68f8a39b3 100644 --- a/tf/README.md +++ b/tf/README.md @@ -9,12 +9,10 @@ available in Tensorflow: 3. [FastRNN & FastGRNN](/docs/publications/FastGRNN.pdf) 4. [ProtoNN](/docs/publications/ProtoNN.pdf) -The TensorFlow compute graphs for these algoriths are packaged as -`edgeml_tf.graph`. Trainers for these algorithms are in `edgeml_tf.trainer`. -Usage directions and examples for these algorithms are provided in - `$EDGEML_ROOT/examples/tf` directory. -To get started with any of the provided algorithms, please follow -the notebooks in the `examples/tf` directory. +The TensorFlow compute graphs for these algorithms are packaged as `edgeml_tf.graph` +and trainers are in `edgeml_tf.trainer`. Usage directions and example notebooks for +these algorithms are provided in the [examples/tf directory](/examples/tf). + ## Installation