[{"categories":null,"contents":"This tutorial will walk you through the required steps to configure and use the PyTorch C++ API (LibTorch) in Microsoft Visual Studio. Although the recommended build system for LibTorch is CMake, you might find yourself in situations where you need to integrate your code into an existing Visual Studio Project/Solution and don\u0026rsquo;t want to deal with CMake files in Windows. Following the steps in this tutorial should get you up and running with LibTorch in Visual Studio without needing to use CMake to build it. These steps have been tested on Visual Studio 2019 and the CPU version of LibTorch 1.5.1. Let\u0026rsquo;s get started!\nStep 1: Download LibTorch Download and extract the CPU version of LibTorch for Windows from here. Note that the Release and Debug versions have different download links, so get the one you need depending on your target configuration. We work with the Debug version here.\nStep 2: Set Windows Environment Variables We\u0026rsquo;re going to create some environment variables to make things easier. Use the following commands in a Windows Terminal to create an environment variable for the LibTorch directory:\nsetx LIBTORCH_DEBUG_HOME \u0026#34;C:\\libtorch-debug-v1.5.1\u0026#34; set LIBTORCH_DEBUG_HOME \u0026#34;C:\\libtorch-debug-v1.5.1\u0026#34; where \u0026quot;C:\\libtorch-debug-v1.5.1\u0026quot; is the path to the extracted LibTorch directory on your computer. Note that the setx command creates the variable globally for Windows, and the set command creates it just for the current session.\nIf you want to build PyTorch C++ extensions, you\u0026rsquo;ll need to add the Python header files to your Visual Studio project. Use the following commands to create environment variables for your Python path. Note that if Python was installed as part of the Visual Studio, the Python directory should be in \u0026quot;C:\\Program Files (x86)\\Microsoft Visual Studio\\Shared\u0026quot;. Otherwise locate your Python installation directory and change the path accordingly.\nsetx PYTHON_HOME \u0026#34;C:\\Program Files (x86)\\Microsoft Visual Studio\\Shared\\Python37_64\u0026#34; set PYTHON_HOME \u0026#34;C:\\Program Files (x86)\\Microsoft Visual Studio\\Shared\\Python37_64\u0026#34; Step 3: Configure Your Visual Studio Project Open Visual Studio, create a project and make sure the Platform is set to x64. Following this tutorial, let\u0026rsquo;s create a simple C++ file called dcgan.cpp with the following contents to test our setup later.\n#include \u0026lt;torch\\torch.h\u0026gt; #include \u0026lt;iostream\u0026gt; int main(){ torch::Tensor tensor = torch::eye(3); std::cout \u0026lt;\u0026lt; tensor \u0026lt;\u0026lt; std::endl; return 0; } In Project Properties under C/C++ -\u0026gt; General -\u0026gt; Additional Include Directories add the path to LibTorch and Python include folders:\n$(LIBTORCH_DEBUG_HOME)\\include $(PYTHON_HOME)\\include In Project Properties under Linker -\u0026gt; General -\u0026gt; Additional Library Directories add the path to LibTorch and Python lib folders:\n$(LIBTORCH_DEBUG_HOME)\\lib $(PYTHON_HOME)\\lib In Project Properties under Linker -\u0026gt; Input -\u0026gt; Additional Dependencies add the following libraries:\ntorch.lib torch_cpu.lib c10.lib python37.lib We also need to add a Post-Build Event to copy all the LibTorch dll files to the target directory after every build. Without these files you\u0026rsquo;ll be getting a runtime error when executing your program. 
In Project Properties under Build Events -\u0026gt; Post-Build Event -\u0026gt; Command Line add the following command:\ncopy /y $(LIBTORCH_DEBUG_HOME)\\lib\\*.dll $(TargetDir) Step 4: Build and Run the Code Now you can go ahead and build the project. Make sure you choose the same build configuration (Debug/Release) as the downloaded LibTorch package. To test our setup, run the generated executable file. The output should be a 3x3 diagonal matrix:\n1 0 0 0 1 0 0 0 1 [ CPUFloatType{3,3} ] ","date":"June 22, 2020","hero":"/posts/004_adv_pytorch_integrating_pytorch_cpp_frontend_in_visual_studio_on_windows/images/featured.png","permalink":"https://mrnabati.github.io/posts/004_adv_pytorch_integrating_pytorch_cpp_frontend_in_visual_studio_on_windows/","summary":"This tutorial will walk you through the required steps to configure and use the PyTorch C++ API (LibTorch) in Microsoft Visual Studio. Although the recommended build system for LibTorch is CMake, you might find yourself in situations where you need to integrate your code into an existing Visual Studio Project/Solution and don\u0026rsquo;t want to deal with CMake files in Windows. Following the steps in this tutorial should get you up and running with LibTorch in Visual Studio without needing to use CMake to build it.","tags":null,"title":"Adv. PyTorch: Configuring MS Visual Studio for Using PyToch C++ API in Windows"},{"categories":null,"contents":"All the pre-trained models provided in the torchvision package in PyTorch are trained on the ImageNet dataset and can be used out of the box on this dataset. But often times you want to use these models on other available image datasets or even your own custom dataset. This usually requires modifying and fine-tuning the model to work with the new dataset. Changing the output dimension of the last layer in the model is usually among the first changes you need to make, and that\u0026rsquo;s the focus of this post.\nLet\u0026rsquo;s start with loading a pre-trained model from the torchvision package. We use the VGG16 model, pretrained on the ImageNet dataset with 1000 object categories. Let\u0026rsquo;s take a look at the modules on this model:\nimport torch import torch.nn as nn import torchvision.models as models vgg16 = models.vgg16(pretrained=True) print(vgg16._modules.keys()) odict_keys([\u0026#39;features\u0026#39;, \u0026#39;avgpool\u0026#39;, \u0026#39;classifier\u0026#39;]) We are only interested in the last layer, so let\u0026rsquo;s print the layers in the \u0026lsquo;classifier\u0026rsquo; module:\nprint(vgg16._modules[\u0026#39;classifier\u0026#39;]) Sequential( (0): Linear(in_features=25088, out_features=4096, bias=True) (1): ReLU(inplace=True) (2): Dropout(p=0.5, inplace=False) (3): Linear(in_features=4096, out_features=4096, bias=True) (4): ReLU(inplace=True) (5): Dropout(p=0.5, inplace=False) (6): Linear(in_features=4096, out_features=1000, bias=True) ) As expected, the output dimension for the last layer is 1000. Let\u0026rsquo;s assume we are going to use this model on the COCO dataset with 80 object categories. To change the output dimension of the model to 80, we simply replace the last sub-layer with a new Linear layer. The Linear layer takes two required arguments: in_features and out_features. 
The in_features is going to be the same as before, and out_features is goint to be 80:\nin_features = vgg16._modules[\u0026#39;classifier\u0026#39;][-1].in_features out_features = 80 vgg16._modules[\u0026#39;classifier\u0026#39;][-1] = nn.Linear(in_features, out_features, bias=True) print(vgg16._modules[\u0026#39;classifier\u0026#39;]) Sequential( (0): Linear(in_features=25088, out_features=4096, bias=True) (1): ReLU(inplace=True) (2): Dropout(p=0.5, inplace=False) (3): Linear(in_features=4096, out_features=4096, bias=True) (4): ReLU(inplace=True) (5): Dropout(p=0.5, inplace=False) (6): Linear(in_features=4096, out_features=80, bias=True) ) That\u0026rsquo;s it! The output dimension is now 80. You need to keep in mind that by replacing the last layer we removed any learned parameter in this layer. You need to finetune the model on the new dataset at this point to learn the parameters again.\n","date":"June 21, 2020","hero":"/posts/003_adv_pytorch_modifying_the_last_layer/images/featured.png","permalink":"https://mrnabati.github.io/posts/003_adv_pytorch_modifying_the_last_layer/","summary":"All the pre-trained models provided in the torchvision package in PyTorch are trained on the ImageNet dataset and can be used out of the box on this dataset. But often times you want to use these models on other available image datasets or even your own custom dataset. This usually requires modifying and fine-tuning the model to work with the new dataset. Changing the output dimension of the last layer in the model is usually among the first changes you need to make, and that\u0026rsquo;s the focus of this post.","tags":null,"title":"Adv. PyTorch: Modifying the Last Layer"},{"categories":null,"contents":"If you\u0026rsquo;re planning to fine-tune a trained model on a different dataset, chances are you\u0026rsquo;re going to freeze some of the early layers and only update the later layers. I won\u0026rsquo;t go into the details of why you may want to freeze some layers and which ones should be frozen, but I\u0026rsquo;ll show you how to do it in PyTorch. Let\u0026rsquo;s get started!\nWe first need a pre-trained model to start with. The models subpackage in the torchvision package provides definitions for many of the poplular model architectures for image classification. You can construct these models by simply calling their constructor, which would initialize the model with random weights. To use the pre-trained models from the PyTorch Model Zoo, you can call the constructor with the pretrained=True argument. Let\u0026rsquo;s load the pretrained VGG16 model:\nimport torch import torch.nn as nn import torchvision.models as models vgg16 = models.vgg16(pretrained=True) This will start downloading the pretrained model into your computer\u0026rsquo;s PyTorch cache folder, which usually is the .cache/torch/checkpoints folder under your home directory.\nThere are multiple ways you can look into the model to see its modules and layers. One way is using the .modules() member function, which returns in iterator containing all the member objects of the model. 
The .modules() functions recursively goes thruogh all the modules and submodules of the model:\nprint(list(vgg16.modules())) [VGG( (features): Sequential( (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): ReLU(inplace=True) (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (3): ReLU(inplace=True) (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (6): ReLU(inplace=True) (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (8): ReLU(inplace=True) (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (11): ReLU(inplace=True) (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (13): ReLU(inplace=True) (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (15): ReLU(inplace=True) (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (18): ReLU(inplace=True) (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (20): ReLU(inplace=True) (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (22): ReLU(inplace=True) (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (25): ReLU(inplace=True) (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (27): ReLU(inplace=True) (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (29): ReLU(inplace=True) (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) ) (avgpool): AdaptiveAvgPool2d(output_size=(7, 7)) (classifier): Sequential( (0): Linear(in_features=25088, out_features=4096, bias=True) (1): ReLU(inplace=True) (2): Dropout(p=0.5, inplace=False) (3): Linear(in_features=4096, out_features=4096, bias=True) (4): ReLU(inplace=True) (5): Dropout(p=0.5, inplace=False) (6): Linear(in_features=4096, out_features=1000, bias=True) ) ), Sequential( (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): ReLU(inplace=True) (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (3): ReLU(inplace=True) (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (6): ReLU(inplace=True) (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (8): ReLU(inplace=True) (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (11): ReLU(inplace=True) (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (13): ReLU(inplace=True) (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (15): ReLU(inplace=True) (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (18): ReLU(inplace=True) (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (20): ReLU(inplace=True) (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (22): ReLU(inplace=True) (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) (24): Conv2d(512, 512, 
kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (25): ReLU(inplace=True) (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (27): ReLU(inplace=True) (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (29): ReLU(inplace=True) (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) ), Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)), ReLU(inplace=True), Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)), ReLU(inplace=True), MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False), Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)), ReLU(inplace=True), Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)), ReLU(inplace=True), MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False), Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)), ReLU(inplace=True), Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)), ReLU(inplace=True), Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)), ReLU(inplace=True), MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False), Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)), ReLU(inplace=True), Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)), ReLU(inplace=True), Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)), ReLU(inplace=True), MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False), Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)), ReLU(inplace=True), Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)), ReLU(inplace=True), Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)), ReLU(inplace=True), MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False), AdaptiveAvgPool2d(output_size=(7, 7)), Sequential( (0): Linear(in_features=25088, out_features=4096, bias=True) (1): ReLU(inplace=True) (2): Dropout(p=0.5, inplace=False) (3): Linear(in_features=4096, out_features=4096, bias=True) (4): ReLU(inplace=True) (5): Dropout(p=0.5, inplace=False) (6): Linear(in_features=4096, out_features=1000, bias=True) ), Linear(in_features=25088, out_features=4096, bias=True), ReLU(inplace=True), Dropout(p=0.5, inplace=False), Linear(in_features=4096, out_features=4096, bias=True), ReLU(inplace=True), Dropout(p=0.5, inplace=False), Linear(in_features=4096, out_features=1000, bias=True)] That\u0026rsquo;s a lot of information spewed out onto the screen! Let\u0026rsquo;s use the .named_module() function instead, which returns a (name, module) tuple and only print the names:\nfor (name, module) in vgg16.named_modules(): print(name) features features.0 features.1 features.2 features.3 features.4 features.5 features.6 features.7 features.8 features.9 features.10 features.11 features.12 features.13 features.14 features.15 features.16 features.17 features.18 features.19 features.20 features.21 features.22 features.23 features.24 features.25 features.26 features.27 features.28 features.29 features.30 avgpool classifier classifier.0 classifier.1 classifier.2 classifier.3 classifier.4 classifier.5 classifier.6 That\u0026rsquo;s much better! We can see the top level modules are features, avgpool and classifier. We can also see that the features and calssifier modules consist of 31 and 7 layers respectively. These layers are not named, and only have numbers associated with them. 
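As a quick aside (not part of the original post): because features and classifier are nn.Sequential containers, each numbered layer can also be accessed directly by its index, for example:

import torchvision.models as models

vgg16 = models.vgg16(pretrained=True)

# 'features' and 'classifier' are nn.Sequential containers, so the
# numbered layers can be indexed directly:
print(vgg16.features[0])    # Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
print(vgg16.classifier[6])  # Linear(in_features=4096, out_features=1000, bias=True)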
If you want to see an even more concise representation of the network, you can use the .named_children() function which does not go inside the top level modules recursively:\nfor (name, module) in vgg16.named_children(): print(name) features avgpool classifier Now let\u0026rsquo;s see what layers are there under the features module. Here we use the .children() function to get the layers under the features module, since these layers are not \u0026rsquo;named':\nfor (name, module) in vgg16.named_children(): if name == \u0026#39;features\u0026#39;: for layer in module.children(): print(layer) Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ReLU(inplace=True) Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ReLU(inplace=True) MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ReLU(inplace=True) Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ReLU(inplace=True) MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ReLU(inplace=True) Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ReLU(inplace=True) Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ReLU(inplace=True) MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ReLU(inplace=True) Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ReLU(inplace=True) Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ReLU(inplace=True) MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ReLU(inplace=True) Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ReLU(inplace=True) Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ReLU(inplace=True) MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) We can even go deeper and look at the parameters in each layer. 
Let\u0026rsquo;s get the parameters of the first layer under the features module:\nfor (name, module) in vgg16.named_children(): if name == \u0026#39;features\u0026#39;: for layer in module.children(): for param in layer.parameters(): print(param) break Parameter containing: tensor([[[[-5.5373e-01, 1.4270e-01, 5.2896e-01], [-5.8312e-01, 3.5655e-01, 7.6566e-01], [-6.9022e-01, -4.8019e-02, 4.8409e-01]], [[ 1.7548e-01, 9.8630e-03, -8.1413e-02], [ 4.4089e-02, -7.0323e-02, -2.6035e-01], [ 1.3239e-01, -1.7279e-01, -1.3226e-01]], [[ 3.1303e-01, -1.6591e-01, -4.2752e-01], [ 4.7519e-01, -8.2677e-02, -4.8700e-01], [ 6.3203e-01, 1.9308e-02, -2.7753e-01]]], [[[ 2.3254e-01, 1.2666e-01, 1.8605e-01], [-4.2805e-01, -2.4349e-01, 2.4628e-01], [-2.5066e-01, 1.4177e-01, -5.4864e-03]], [[-1.4076e-01, -2.1903e-01, 1.5041e-01], [-8.4127e-01, -3.5176e-01, 5.6398e-01], [-2.4194e-01, 5.1928e-01, 5.3915e-01]], [[-3.1432e-01, -3.7048e-01, -1.3094e-01], [-4.7144e-01, -1.5503e-01, 3.4589e-01], [ 5.4384e-02, 5.8683e-01, 4.9580e-01]]], [[[ 1.7715e-01, 5.2149e-01, 9.8740e-03], [-2.7185e-01, -7.1709e-01, 3.1292e-01], [-7.5753e-02, -2.2079e-01, 3.3455e-01]], [[ 3.0924e-01, 6.7071e-01, 2.0546e-02], [-4.6607e-01, -1.0697e+00, 3.3501e-01], [-8.0284e-02, -3.0522e-01, 5.4460e-01]], [[ 3.1572e-01, 4.2335e-01, -3.4976e-01], [ 8.6354e-02, -4.6457e-01, 1.1803e-02], [ 1.0483e-01, -1.4584e-01, -1.5765e-02]]], ..., [[[ 7.7599e-02, 1.2692e-01, 3.2305e-02], [ 2.2131e-01, 2.4681e-01, -4.6637e-02], [ 4.6407e-02, 2.8246e-02, 1.7528e-02]], [[-1.8327e-01, -6.7425e-02, -7.2120e-03], [-4.8855e-02, 7.0427e-03, -1.2883e-01], [-6.4601e-02, -6.4566e-02, 4.4235e-02]], [[-2.2547e-01, -1.1931e-01, -2.3425e-02], [-9.9171e-02, -1.5143e-02, 9.5385e-04], [-2.6137e-02, 1.3567e-03, 1.4282e-01]]], [[[ 1.6520e-02, -3.2225e-02, -3.8450e-03], [-6.8206e-02, -1.9445e-01, -1.4166e-01], [-6.9528e-02, -1.8340e-01, -1.7422e-01]], [[ 4.2781e-02, -6.7529e-02, -7.0309e-03], [ 1.1765e-02, -1.4958e-01, -1.2361e-01], [ 1.0205e-02, -1.0393e-01, -1.1742e-01]], [[ 1.2661e-01, 8.5046e-02, 1.3066e-01], [ 1.7585e-01, 1.1288e-01, 1.1937e-01], [ 1.4656e-01, 9.8892e-02, 1.0348e-01]]], [[[ 3.2176e-02, -1.0766e-01, -2.6388e-01], [ 2.7957e-01, -3.7416e-02, -2.5471e-01], [ 3.4872e-01, 3.0041e-02, -5.5898e-02]], [[ 2.5063e-01, 1.5543e-01, -1.7432e-01], [ 3.9255e-01, 3.2306e-02, -3.5191e-01], [ 1.9299e-01, -1.9898e-01, -2.9713e-01]], [[ 4.6032e-01, 4.3399e-01, 2.8352e-01], [ 1.6341e-01, -5.8165e-02, -1.9196e-01], [-1.9521e-01, -4.5630e-01, -4.2732e-01]]]], requires_grad=True) Parameter containing: tensor([ 0.4034, 0.3778, 0.4644, -0.3228, 0.3940, -0.3953, 0.3951, -0.5496, 0.2693, -0.7602, -0.3508, 0.2334, -1.3239, -0.1694, 0.3938, -0.1026, 0.0460, -0.6995, 0.1549, 0.5628, 0.3011, 0.3425, 0.1073, 0.4651, 0.1295, 0.0788, -0.0492, -0.5638, 0.1465, -0.3890, -0.0715, 0.0649, 0.2768, 0.3279, 0.5682, -1.2640, -0.8368, -0.9485, 0.1358, 0.2727, 0.1841, -0.5325, 0.3507, -0.0827, -1.0248, -0.6912, -0.7711, 0.2612, 0.4033, -0.4802, -0.3066, 0.5807, -1.3325, 0.4844, -0.8160, 0.2386, 0.2300, 0.4979, 0.5553, 0.5230, -0.2182, 0.0117, -0.5516, 0.2108], requires_grad=True) Now that we have access to all the modules, layers and their parameters, we can easily freeze them by setting the parameters\u0026rsquo; requires_grad flag to False. 
This would prevent calculating the gradients for these parameters in the backward step which in turn prevents the optimizer from updating them.\nNow let\u0026rsquo;s freeze all the parameters in the features module:\nlayer_counter = 0 for (name, module) in vgg16.named_children(): if name == \u0026#39;features\u0026#39;: for layer in module.children(): for param in layer.parameters(): param.requires_grad = False print(\u0026#39;Layer \u0026#34;{}\u0026#34; in module \u0026#34;{}\u0026#34; was frozen!\u0026#39;.format(layer_counter, name)) layer_counter+=1 Layer \u0026#34;0\u0026#34; in module \u0026#34;features\u0026#34; was frozen! Layer \u0026#34;1\u0026#34; in module \u0026#34;features\u0026#34; was frozen! Layer \u0026#34;2\u0026#34; in module \u0026#34;features\u0026#34; was frozen! Layer \u0026#34;3\u0026#34; in module \u0026#34;features\u0026#34; was frozen! Layer \u0026#34;4\u0026#34; in module \u0026#34;features\u0026#34; was frozen! Layer \u0026#34;5\u0026#34; in module \u0026#34;features\u0026#34; was frozen! Layer \u0026#34;6\u0026#34; in module \u0026#34;features\u0026#34; was frozen! Layer \u0026#34;7\u0026#34; in module \u0026#34;features\u0026#34; was frozen! Layer \u0026#34;8\u0026#34; in module \u0026#34;features\u0026#34; was frozen! Layer \u0026#34;9\u0026#34; in module \u0026#34;features\u0026#34; was frozen! Layer \u0026#34;10\u0026#34; in module \u0026#34;features\u0026#34; was frozen! Layer \u0026#34;11\u0026#34; in module \u0026#34;features\u0026#34; was frozen! Layer \u0026#34;12\u0026#34; in module \u0026#34;features\u0026#34; was frozen! Layer \u0026#34;13\u0026#34; in module \u0026#34;features\u0026#34; was frozen! Layer \u0026#34;14\u0026#34; in module \u0026#34;features\u0026#34; was frozen! Layer \u0026#34;15\u0026#34; in module \u0026#34;features\u0026#34; was frozen! Layer \u0026#34;16\u0026#34; in module \u0026#34;features\u0026#34; was frozen! Layer \u0026#34;17\u0026#34; in module \u0026#34;features\u0026#34; was frozen! Layer \u0026#34;18\u0026#34; in module \u0026#34;features\u0026#34; was frozen! Layer \u0026#34;19\u0026#34; in module \u0026#34;features\u0026#34; was frozen! Layer \u0026#34;20\u0026#34; in module \u0026#34;features\u0026#34; was frozen! Layer \u0026#34;21\u0026#34; in module \u0026#34;features\u0026#34; was frozen! Layer \u0026#34;22\u0026#34; in module \u0026#34;features\u0026#34; was frozen! Layer \u0026#34;23\u0026#34; in module \u0026#34;features\u0026#34; was frozen! Layer \u0026#34;24\u0026#34; in module \u0026#34;features\u0026#34; was frozen! Layer \u0026#34;25\u0026#34; in module \u0026#34;features\u0026#34; was frozen! Layer \u0026#34;26\u0026#34; in module \u0026#34;features\u0026#34; was frozen! Layer \u0026#34;27\u0026#34; in module \u0026#34;features\u0026#34; was frozen! Layer \u0026#34;28\u0026#34; in module \u0026#34;features\u0026#34; was frozen! Layer \u0026#34;29\u0026#34; in module \u0026#34;features\u0026#34; was frozen! Layer \u0026#34;30\u0026#34; in module \u0026#34;features\u0026#34; was frozen! Now that some of the parameters are frozen, the optimizer needs to be modified to only get the parameters with requires_grad=True. 
We can do this by writing a Lambda function when constructing the optimizer:\noptimizer = torch.optim.Adam(filter(lambda p: p.requires_grad, vgg16.parameters()), lr=0.001) You can now start training your partially frozen model!\n","date":"May 22, 2020","hero":"/posts/002_adv_pytorch_freezing_layers/images/featured.jpg","permalink":"https://mrnabati.github.io/posts/002_adv_pytorch_freezing_layers/","summary":"If you\u0026rsquo;re planning to fine-tune a trained model on a different dataset, chances are you\u0026rsquo;re going to freeze some of the early layers and only update the later layers. I won\u0026rsquo;t go into the details of why you may want to freeze some layers and which ones should be frozen, but I\u0026rsquo;ll show you how to do it in PyTorch. Let\u0026rsquo;s get started!\nWe first need a pre-trained model to start with.","tags":null,"title":"Adv. PyTorch: Freezing Layers"},{"categories":["Project"],"contents":"Introduction The perception system in autonomous vehicles is responsible for detecting and tracking the surrounding objects. This is usually done by taking advantage of several sensing modalities to increase robustness and accuracy, which makes sensor fusion a crucial part of the perception system. In this paper, we focus on the problem of radar and camera sensor fusion and propose a middle-fusion approach to exploit both radar and camera data for 3D object detection. Our approach, called CenterFusion, first uses a center point detection network to detect objects by identifying their center points on the image. It then solves the key data association problem using a novel frustum-based method to associate the radar detections to their corresponding object’s center point. The associated radar detections are used to generate radar-based feature maps to complement the image features, and regress to object properties such as depth, rotation and velocity.\nOur Approach We propose CenterFusion, a middle-fusion approach to exploit radar and camera data for 3D object detection. CenterFusion focuses on associating radar detections to preliminary detection results obtained from the image, then generates radar feature maps and uses it in addition to image features to accurately estimate 3D bounding boxes for objects. Particularly, we generate preliminary 3D detections using a key point detection network, and propose a novel frustum-based radar association method to accurately associate radar detections to their corresponding objects in the 3D space. These radar detections are then mapped to the image plane and used to create feature maps to complement the image-based features. Finally, the fused features are used to accurately estimate objects’ 3D properties such as depth, rotation and velocity. The network architecture for CenterFusion is shown in the figure below.\nCenterFusion network architecture Center Point Detection We adopt the CenterNet detection network for generating preliminary detections on the image. The image features are first extracted using a fully convolutional encoder-decoder backbone network. We follow CenterNet and use a modified version of the Deep Layer Aggregation (DLA) network as the backbone. The extracted image features are then used to predict object center points on the image, as well as the object 2D size (width and height), center offset, 3D dimensions, depth and rotation. These values are predicted by the primary regression heads as shown in the network architecture figure. 
Each primary regression head consists of a 3×3 convolution layer with 256 channels and a 1×1 convolutional layer to generate the desired output. This provides an accurate 2D bounding box as well as a preliminary 3D bounding box for every detected object in the scene.\nRadar Association The center point detection network only uses the image features at the center of each object to regress to all other object properties. To fully exploit radar data in this process, we first need to associate the radar detections to their corresponding object on the image plane. To accomplish this, a naive approach would be mapping each radar detection point to the image plane and associating it to an object if the point is mapped inside the 2D bounding box of that object. This is not a very robust solution, as there is not a one-to-one mapping between radar detections and objects in the image; Many objects in the scene generate multiple radar detections, and there are also radar detections that do not correspond to any object. Additionally, because the z dimension of the radar detection is not accurate (or does not exist at all), the mapped radar detection might end up outside the 2D bounding box of its corresponding object. Finally, radar detections obtained from occluded objects would map to the same general area in the image, which makes differentiating them in the 2D image plane difficult, if possible at all.\nWe develop a frustum association method that uses the object’s 2D bounding box as well as its estimated depth and size to create a 3D Region of Interest (RoI) frustum for the object. Having an accurate 2D bounding box for an object, we create a frustum for that object as shown in the figure below. This significantly narrows down the radar detections that need to be checked for association, as any point outside this frustum can be ignored. We then use the estimated object depth, dimension and rotation to create a RoI around the object, to further filter out radar detections that are not associated with this object. If there are multiple radar detections inside this RoI, we take the closest point as the radar detection corresponding to this object.\nFrustum association. An object detected using the image features (left), generating the ROI frustum based on object\u0026#39;s 3D bounding box (middle), and the BEV of the ROI frustum showing radar detections inside the frustum (right). $\\delta$ is used to increase the frustum size in the testing phase. $\\hat{d}$ is the ground truth depth in the training phase and the estimated object depth in the testing phase. The RoI frustum approach makes associating overlapping objects effortless, as objects are separated in the 3D space and would have separate RoI frustums. It also eliminates the multi-detection association problem, as only the closest radar detection inside the RoI frustum is associated to the object. It does not, however, help with the inaccurate z dimension problem, as radar detections might be outside the ROI frustum of their corresponding object due to their inaccurate height information.\nTo address the inaccurate height information problem, we introduce a radar point cloud preprocessing step called pillar expansion, where each radar point is expanded to a fixed-size pillar, as illustrated in the figure below. Pillars create a better representation for the physical objects detected by the radar, as these detections are now associated with a dimension in the 3D space. 
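To make the pillar expansion step concrete, a minimal NumPy sketch could look like the following. This is illustrative only, not the paper's implementation; the pillar dimensions and the array layout are assumptions.

import numpy as np

def expand_to_pillars(radar_points, pillar_size=(0.5, 0.5, 1.5)):
    # radar_points: (N, 3) array of detections (x, y, z) in vehicle coordinates.
    # pillar_size: assumed fixed (width, length, height) for every pillar.
    # Returns an (N, 6) array of [x, y, z, w, l, h] pillar boxes, each
    # centered on its radar point.
    radar_points = np.asarray(radar_points, dtype=np.float32)
    dims = np.tile(np.asarray(pillar_size, dtype=np.float32),
                   (radar_points.shape[0], 1))
    return np.concatenate([radar_points, dims], axis=1)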
Having this new representation, we simply consider a radar detection to be inside a frustum if all or part of its corresponding pillar is inside the frustum, as illustrated in the network architecture figure.\nExpanding radar points to 3D pillars (top image). Directly mapping the pillars to the image and replacing with radar depth information results in poor association with objects\u0026#39; center and many overlapping depth values (middle image). Frustum association accurately maps the radar detections to the center of objects and minimizes overlapping (bottom image). In the above figure, radar detections are only associated to objects with a valid ground truth or detection box, and only if all or part of the radar detection pillar is inside the box. Frustum association also prevents associating radar detections caused by background objects such as buildings to foreground objects, as seen in the case of pedestrians on the right hand side of the image.\nRadar Feature Extraction After associating radar detections to their corresponding objects, we use the depth and velocity of the radar detections to create complementary features for the image. Particularly, for every radar detection associated to an object, we generate three heat map channels centered at and inside the object’s 2D bounding box, as shown in the figure above. The width and height of the heatmaps are proportional to the object\u0026rsquo;s 2D bounding box, and are controlled by a parameter $\\alpha$. The heatmap values are the normalized object depth $d$ and also the $x$ and $y$ components of the radial velocity ($v_x$ and $v_y$) in the egocentric coordinate system:\n$$ \\begin{equation*} F_{x,y,i}^{j} = \\frac{1}{M_i} \\begin{cases} f_i \u0026amp; \\hskip{5pt} |x-c_{x}^{j}|\\leq \\alpha w^j \\hspace{5pt} \\text{and} \\hspace{5pt} |y-c_{y}^{i}| \\leq \\alpha h^j \\\\ 0 \u0026amp; \\hskip{5pt} \\text{otherwise} \\end{cases} \\end{equation*} $$\nwhere $i \\in 1, 2, 3$ is the feature map channel, $M_i$ is a normalizing factor, $f_i$ is the feature value ($d$, $v_x$ or $v_y$), $c^j_x$ and $c^j_y$ are the $x$ and $y$ coordinates of the $j$th object’s center point on the image and $w^j$ and $h^j$ are the width and height of the $j$th object’s 2D bounding box. If two objects have overlapping heatmap areas, the one with a smaller depth value dominates, as only the closest object is fully visible in the image.\nThe generated heat maps are then concatenated to the image features as extra channels. These features are used as inputs to the secondary regression heads to recalculate the object’s depth and rotation, as well as velocity and attributes. The velocity regression head estimates the x and y components of the object’s actual velocity in the vehicle coordinate system. The attribute regression head estimates different attributes for different object classes, such as moving or parked for the Car class and standing or sitting for the Pedestrian class. The secondary regression heads consist of three convolutional layers with 3$\\times$3 kernels followed by a 1$\\times$1 convolutional layer to generate the desired output. The extra convolutional layers compared to the primary regression heads help with learning higher level features from the radar feature maps. The last step is decoding the regression head results into 3D bounding boxes. 
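Before turning to the decoding step, a rough sketch of the heat-map generation defined by the equation above is given below. It is only meant to illustrate the idea; the coordinate units, the parameter alpha and the omitted normalization factors M_i are assumptions rather than the authors' code.

import numpy as np

def radar_heatmap_channels(feat_h, feat_w, objects, alpha=0.3):
    # objects: list of dicts with the radar values and 2D box of each
    # associated detection: {'cx', 'cy', 'w', 'h', 'd', 'vx', 'vy'},
    # all expressed in feature-map pixel units (an assumption here).
    fmap = np.zeros((3, feat_h, feat_w), dtype=np.float32)
    ys, xs = np.mgrid[0:feat_h, 0:feat_w]
    # Draw farther objects first so that closer objects overwrite them,
    # matching the "smaller depth dominates" rule described above.
    for obj in sorted(objects, key=lambda o: o['d'], reverse=True):
        inside = (np.abs(xs - obj['cx']) <= alpha * obj['w']) & \
                 (np.abs(ys - obj['cy']) <= alpha * obj['h'])
        for i, key in enumerate(('d', 'vx', 'vy')):
            fmap[i][inside] = obj[key]  # normalization by M_i omitted for brevity
    return fmap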
The box decoder block uses the estimated depth, velocity, rotation, and attributes from the secondary regression heads, and takes the other object properties from the primary heads.\nResults We compare our radar and camera fusion network with the state-of-the-art camera-based models on the nuScenes benchmark, and also a LIDAR based method. Table 1 shows the results on both test and validation splits of the nuScenes dataset.\nQuantitative results The detection results compared to the ground truth in camera-view and birds eye view are shown in the figure below:\nQualitative results from CenterFusion (row 1 \u0026amp; 2) and CenterNet (row 3 \u0026amp; 4) in camera view and BEV. In the BEV plots, detection boxes are shown in cyan and ground truth boxes in red. The radar point cloud is shown in green. Red and blue arrows on objects show the ground truth and predicted velocity vectors respectively. For a more detailed discussion on the results and also the ablation study, see our WACV 2021 conference paper.\n","date":"May 10, 2020","hero":"/posts/projects/06_centerfusion/images/featured.png","permalink":"https://mrnabati.github.io/posts/projects/06_centerfusion/","summary":"A center-based radar and camera fusion for 3D object detection in autonomous vehicles.","tags":["Autonomous Driving","Sensor Fusion","Object Detection"],"title":"CenterFusion"},{"categories":["Project"],"contents":"Introduction Radar Region Proposal Network (RRPN) is a Radar-based real-time region proposal algorithm for object detection in autonomous driving vehicles. RRPN generates object proposals by mapping Radar detections to the image coordinate system and generating pre-defined anchor boxes for each mapped Radar detection point. These anchor boxes are then transformed and scaled based on the object’s distance from the vehicle, to provide more accurate proposals for the detected objects. The generated proposals can be used in any two-stage object detection network such as Fast-RCNN. Relying only on Radar detections to generate object proposals makes an extremely fast RPN, making it suitable for autonomous driving applications. Aside from being a RPN for an object detection algorithm, the proposed network also inherently acts as a sensor fusion algorithm by fusing the Radar and camera data to obtain higher accuracy and reliability.\nRRPN also provides an attention mechanism to focus the underlying computational resources on the more important parts of the input data. While in other object detection applications the entire image may be of equal importance. In an autonomous driving application more attention needs to be given to objects on the road. For example in a highway driving scenario, the perception system needs to be able to detect all the vehicles on the road, but there is no need to dedicate resources to detect a picture of a vehicle on a billboard. A Radar based RPN focuses only on the physical objects surrounding the vehicle, hence inherently creating an attention mechanism focusing on parts of the input image that are more important.\nOur Approach The first step in generating ROIs is mapping the radar detections from the vehicle coordinates to the camera-view coordinates. Radar detections are reported in a bird’s eye view perspective as shown in image (a) in the figure below, with the object’s range and azimuth measured in the vehicle’s coordinate system. 
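As an illustration of this mapping (a simplified sketch, not the RRPN code; the axis conventions, the fixed point height and the matrix names are assumptions), a single radar detection given by its range and azimuth could be projected to the image plane as follows.

import numpy as np

def radar_to_image(rng, azimuth, T_cam_from_vehicle, K, z=0.5):
    # rng, azimuth: radar range (m) and azimuth (rad) in the vehicle frame.
    # T_cam_from_vehicle: assumed 4x4 extrinsic transform to the camera frame.
    # K: 3x3 camera intrinsic matrix. z is an assumed fixed point height.
    x_v = rng * np.cos(azimuth)                  # forward
    y_v = rng * np.sin(azimuth)                  # lateral
    p_cam = T_cam_from_vehicle @ np.array([x_v, y_v, z, 1.0])
    u, v, w = K @ p_cam[:3]                      # pinhole projection
    return u / w, v / w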
By mapping these detections to the camera-view coordinates, we are able to associate the objects detected by the Radars to those seen in the images obtained by the camera.\nGenerating anchors from radar detections. Anchor Generation Once the Radar detections are mapped to the image coordinates, we have the approximate location of every detected object in the image. These mapped Radar detections, hereafter called Points of Interest (POI), provide valuable information about the objects in each image, without any processing on the image itself. Having this information, a simple approach for proposing ROIs would be introducing a bounding box centered at every POI. One problem with this approach is that Radar detections are not always mapped to the center of the detected objects in every image. Another problem is the fact that Radars do not provide any information about the size of the detected objects and proposing a fixed-size bounding box for objects of different sizes would not be an effective approach.\nWe use the idea of anchor bounding boxes from Faster R-CNN to alleviate the problems mentioned above. For every POI, we generate several bounding boxes with different sizes and aspect ratios centered at the POI, as shown in the figure above (b). We use 4 different sizes and 3 different aspect ratios to generate these anchors. To account for the fact that the POI is not always mapped to the center of the object in the image coordinate, we also generate different translated versions of the anchors. These translated anchors provide more accurate bounding boxes when the POI is mapped towards the right, left or the bottom of the object as shown in figure above. The generated anchors for a radar detection is shown in the figure below:\nAnchors generated from a radar detection Distance Compensation The distance of each object from the vehicle plays an important role in determining its size in the image. Generally, objects’ sizes in an image have an inverse relationship with their distance from the camera. Radar detections have the range information for every detected object, which is used in this step to scale all generated anchors. We use the following formula to determine the scaling factor to use on the anchors:\n$$ S_i = \\alpha \\dfrac{1}{d_i} + \\beta $$\nwhere $d_i$ is the distance to the $i$th object, and $\\alpha$ and $\\beta$ are two parameters used to adjust the scale factor. These parameters are learned by maximizing the Intersection Over Union (IOU) between the generated bounding boxes and the ground truth bounding boxes in each image. The generated proposals for two radar detection points after distance compensation are shown in the figure below:\nProposals after distance compensation. Figures below show a sample image and the proposals generated from all radar detections:\nSample image with ground truth and radar detections Generated proposals from all radar detections. 
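A minimal sketch of this distance compensation step is shown below; the alpha and beta values here are placeholders, not the learned parameters from the post.

def scale_anchor(anchor, distance, alpha=40.0, beta=0.2):
    # anchor: (cx, cy, w, h) bounding box in image coordinates.
    # Applies S_i = alpha * (1 / d_i) + beta to the box size while
    # keeping the box centered on the radar point of interest.
    cx, cy, w, h = anchor
    s = alpha / distance + beta
    return (cx, cy, w * s, h * s)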
The evaluation results are provided in our ICIP 2019 conference paper.\n","date":"May 10, 2020","hero":"/posts/projects/04_rrpn/images/featured.png","permalink":"https://mrnabati.github.io/posts/projects/04_rrpn/","summary":"Radar Region Proposal Network (RRPN) uses Radar detections in an autonomous vehicle to generate real-time region proposals for two-stage object detection networks.","tags":["Autonomous Driving","Sensor Fusion","Object Detection"],"title":"Radar Region Proposal Network"},{"categories":["Project"],"contents":"Introduction In this project, we designed and implemented a radar-camera fusion algorithm for joint object detection and distance estimation in autonomous driving applications. The proposed method is designed as a two-stage object detection network that fuses radar point clouds and learned image features to generate accurate object proposals. For every object proposal, a depth value is also calculated to estimate the object’s distance from the vehicle. These proposals are then fed into the second stage of the detection network for object classification. We evaluate our network on the nuScenes dataset, which provides synchronized data from multiple radar and camera sensors on a vehicle. Our experiments show that the proposed method outperforms other radar-camera fusion methods in the object detection task and is capable of accurately estimating distance for all detected objects.\nApproach Network architecture Our proposed sensor fusion network is shown in the figure above. The network takes radar point clouds and RGB images as input and generates accurate object proposals for a two-stage object detection framework. We take a middle-fusion approach for fusing the radar and image data, where outputs of each sensor are processed independently first, and are merged at a later stage for more processing. More specifically, we first use the radar detections to generate 3D object proposals, then map the proposals to the image and use the image features extracted by a backbone network to improve their localization. These proposals are then merged with image-based proposals generated in a RPN, and are fed to the second stage for classification. All generated proposals are associated with an estimated depth, calculated either directly from the radar detections, or via a distance regressor layer in the RPN network.\nProposal Generation We treat every radar point as a stand-alone detection and generate 3D object proposals for them directly without any feature extraction. These proposals are generated using predefined 3D anchors for every object class in the dataset. Each 3D anchor is parameterized as $(x, y, z, w, l, h, r)$, where $(x, y, z)$ is the center, $(w, l, h)$ is the size, and $r$ is the orientation of the box in vehicle’s coordinate system. The anchor size, $(w, l, h)$, is fixed for each object category, and is set to the average size of the objects in each category in the training dataset. For every radar point, we generate $2n$ boxes from the 3D anchors, where $n$ is the number of object classes in the dataset, each having two different orientations at $0 $ and $90$ degrees. The 3D anchors for a radar detection is shown in the figure below:\n3D anchors for one radar detection point In the next step, all 3D anchors are mapped to the image plane and converted to equivalent 2D bounding boxes by finding the smallest enclosing box for each mapped anchor. Since every 3D proposal is generated from a radar detection, it has an accurate distance associated with it. 
This distance is used as the proposed distance for the generated 2D bounding box. This is illustrated in the figure below:\n3D anchors for one radar detection point All generated 2D proposals are fed into the Radar Proposal Refinement (RPR) subnetwork. This is where the information obtained from the radars (radar proposals) is fused with the information obtained from the camera (image features). RPR uses the features extracted from the image by the backbone network to adjust the size and location of the radar proposals on the image. As radar detections are not always centered on the corresponding objects on the image, the generated 3D anchors and corresponding 2D proposals might be offset as well. The box regressor layer in the RPR uses the image features inside each radar proposal to regress offset values for the proposal corner points. The RPR also contains a box classification layer, which estimates an objectness score for every radar proposal. The objectness score is used to eliminate proposals that are generated by radar detections coming from background objects, such as buildings and light poles. Figures below show the resulting 2D radar proposals before and after the refinement step.\nRadar proposals before refinement Radar proposals after refinement The Radar proposals are then merged with image-based proposals obtained from a Region Proposal Network (RPN). Before using these proposals in the next stage, redundant proposals are removed by applying NonMaximum Suppression (NMS). The NMS would normally remove overlapping proposals without discriminating based on the bounding box’s origin, but we note that radar-based proposals have more reliable distance information than the image-based proposals. This is because image-based distances are estimated only from 2D image feature maps with no depth information. To make sure the radar-based distances are not unnecessarily discarded in the NMS process, we first calculate the Intersection over Union (IoU) between radar and image proposals. Next we use an IoU threshold to find the matching proposals, and overwrite the imagebased distances by their radar-based counterparts for these matching proposals.\nDetection Network The inputs to the second stage detection network are the feature map from the image and object proposals. The structure of this network is similar to Fast R-CNN. The feature map is cropped for every object proposals and is fed into the RoI pooling layer to obtain feature vectors of the same size for all proposals. These feature vectors are further processed by a set of fully connected layers and are passed to the softmax and bounding box regression layers. The output is the category classification and bounding box regression for each proposal, in addition to the distance associated to every detected object. Similar to the RPN network, we use a cross entropy loss for object classification and a Smooth L1 loss for the box regression layer.\nResults The performance of our method is shown in the table below. This table shows the overall Average Precision (AP) and Average Recall (AR) for the detection task, and Mean Absolute Error for the distance estimation task. 
We use the Faster RCNN network as our image-based detection baseline, and compare our results with RRPN.\nMethod AP AP50 AP75 AR MAE Faster R-CNN 34.95 58.23 36.89 40.21 - RRPN 35.45 59.00 37.00 42.10 - Ours 35.60 60.53 37.38 42.10 2.65 The per-class performance is show in the table below:\nMethod Car Truck Person Bus Bicycle Motorcycle Faster R-CNN 51.46 33.26 27.06 47.73 24.27 25.93 RRPN 41.80 44.70 17.10 57.20 21.40 30.50 Ours 52.31 34.45 27.59 48.30 25.00 25.97 Figures below show the detection results for two different scenes:\nDetection results ","date":"May 10, 2020","hero":"/posts/projects/05_radar_camera_fusion/images/featured.jpg","permalink":"https://mrnabati.github.io/posts/projects/05_radar_camera_fusion/","summary":"A novel radar-camera sensor fusion framework for accurate object detection and distance estimation in autonomous vehicles.","tags":["Autonomous Driving","Sensor Fusion","Object Detection"],"title":"Radar-Camera Sensor Fusion and Depth Estimation"},{"categories":["Project"],"contents":"Introduction EcoCAR Mobility Challenge is the newest U.S. Department of Energy (DOE) Advanced Vehicle Technology Competition (AVTC) series challenging 12 North American university teams. It is sponsored by DOE, General Motors and Mathworks and managed by Argonne National Laboratory. Teams have four years to redesign a 2019 Chevrolet Blazer and apply advanced propulsion systems, electrification, SAE Level 2 automation, and vehicle connectivity. Teams will use different sensors and wireless communication devices to design a perception system for the vehicle and deploy VehicleToX (V2X) communication to improve overall operation efficiency in the connected urban environment of the future.\nThe Society of Automotive Engineers (SAE) provides a taxonomy for six levels of driving automation, ranging from no driving automation (level 0) to full driving automation (level 5). SAE Level 2 automation refers to a vehicle with combined automated functions, like acceleration and steering, but the driver must remain engaged with the driving task and monitor the environment at all times.\nThe SAE Levels of Automation (Source: [NHTSA](https://www.nhtsa.gov/technology-innovation/automated-vehicles-safety)) CAVs Team In the Connected and Automated Vehicle Technologies (CAVs) swimlane, teams will implement connected and automated vehicle technology to improve the stock vehicle’s energy efficiency and utility for a Mobility as a Service (MaaS) carsharing application.\nIn particular, the CAVs team is tasked with design and implementation of perception and control systems needed for an SAE Level 2 autonomous vehicle. To design the perception system, teams can integrate different sensors such as cameras, Radars and LiDARs into their vehicles. The CAVs team is also tasked with the design and implementation of a V2X system capable of communicating with infrustructures such as smart traffic lights, or other vehicles with a V2X system.\nPerception System Design Designing a perception system for the vehicle requires a lot of simulation to determine the sensor placement on the vehicle. Matlab\u0026rsquo;s Automated Driving System Toolbox provides algorithms and tools for designing and testing ADAS and autonomous driving systems, including tools for sensor placement and Field of View (FOV) simulation. 
The following figure shows a simple cuboid simulation in this toolbox using the Driving Scenario Designer app.\nSample Simulation from Matlab\u0026#39;s Driving Scenario Designer App After designing and simulating the sensor placements, the sensors are mounted on a mule vehicle to collect real-world data and verify the simulation results. The following video shows the data collected from a LIDAR, front camera and front Radar mounted on a Chevrolet Camaro while driving on highway. It also displays the outputs of the diagnostic tools designed to monitor the sensor data for possible sensor failure or any other issues while recording the data.\nReferences EcoCAR Mobility Challenge Website AVTC on Flicker National Highway Traffic Safety Administration, Automated Vehicles for Safety The Society of Automotive Engineers, Taxonomy and Definitions for Terms Related to Driving Automation Systems for On-Road Motor Vehicles ","date":"November 22, 2018","hero":"/posts/projects/03_ecocarmc/images/featured.jpg","permalink":"https://mrnabati.github.io/posts/projects/03_ecocarmc/","summary":"The EcoCAR Mobility Challenge tasks 12 North American universities to apply advanced propulsion systems, electrification, SAE Level 2 automation, and vehicle connectivity to improve the energy efficiency of a 2019 Chevrolet Blazer.","tags":["Autonomous Driving","ADAS"],"title":"EcoCAR Mobility Challenge"},{"categories":null,"contents":"","date":"November 9, 2018","hero":"/posts/000_enabling_can_on_nvidia_jetson_xavier_developer_kit/images/featured.jpeg","permalink":"https://mrnabati.github.io/posts/000_enabling_can_on_nvidia_jetson_xavier_developer_kit/","summary":"This tutorial covers the step by step process of integrating CAN transceivers and enabling the CAN controllers on an Nvidia Jetson Xavier developer kit.","tags":null,"title":"Enabling CAN on Nvidia Jetson Xavier"},{"categories":null,"contents":"","date":"November 9, 2018","hero":"/posts/001_installing_nvme_ssd_jetson_xavier/images/featured.png","permalink":"https://mrnabati.github.io/posts/001_installing_nvme_ssd_jetson_xavier/","summary":"This tutorial covers the process of installing an NVMe M.2 SSD on the Nvidia Jetson Xavier developer kit.","tags":null,"title":"Installing NVMe SSD on Nvidia Jetson Xavier"},{"categories":["Project"],"contents":"Introduction EcoCAR 3 is a U.S. Department of Energy (DOE) Advanced Vehicle Technology Competition (AVTC) series challenging 16 North American university teams. It is sponsored by DOE and General Motors and managed by Argonne National Laboratory. Teams have four years to redesign a 2016 Chevrolet Camaro and convert it into a hybrid vehicle, while still maintaining the performance expected from this iconic American muscle car.\nEcoCAR 3 is the first AVTC competition requiring teams to implement an Advanced Driver Assistance System (ADAS) on vehicles. In this competition, ADAS systems are not allowed to actively intervene in the vehicle’s propulsion or vehicle control systems and are only allowed to provide passive feedback for drivers.\nADAS Architecture UTK ADAS Architecture UT\u0026rsquo;s ADAS architecture is shown in Figure 1. One camera and one front Radar has been used as the sensor set for this architecture and haptic devices have been installed in the driver seat to relay the driver feedback signals. 
Two embedded devices are used as the main processors in this architecture: an NXP S32V234 Evaluation Board and an Nvidia Jetson TX2 Developer Kit.\nTeam Tennessee has used the Single Shot MultiBox Detector (SSD) as the object detection algorithm and has developed its own sensor fusion algorithm to fuse the vision based object detection outputs and Radar detections. A sample otuput of the sensor fusion algorithm is shown in Figure 2. The bounding box displays the detected object category, an object ID and also the relative distance of the object.\nFusion Algorithm Output Driver Feedback Driver feedback signals are generated based on the distance of the objects in the same lane as the vehicle, the relative speed of the objects and the stopping distance of the vehicle. According to these parameters, three different levels of vibration can be generated in the haptic devices installed in the driver seat to relay an immediate warning. The driver feedback system operates as a collision avoidance system, generating warning signals only when there is a chance of collision.\nCompetition Results In May 2018, Team Tennessee won the award for Best ADAS Vehicle Demonstration at the EcoCAR 3 Competition held in Yuma AZ, Pomona CA and Los Angeles CA, ranking first among 16 North American universities in this event. The University of Tennessee EcoCAR3 team placed sixth overall and won numerous top prizes in this competition.\nTeam Tennessee Camaro References EcoCAR3 webpage. ","date":"June 21, 2018","hero":"/posts/projects/02_ecocar3/images/featured.jpg","permalink":"https://mrnabati.github.io/posts/projects/02_ecocar3/","summary":"EcoCAR3 challenged 16 North American university teams to redesign a 2016 Chevrolet Camaro. The ADAS team focused on integrating the sensing system on the Camaro and deploy driver feedback to improve efficiency and safety.","tags":["Autonomous Driving","ADAS"],"title":"EcoCAR 3"},{"categories":["Project"],"contents":"Background In this challenge, we were tasked with finding automated methods for extracting map-ready road networks from high-resolution satellite imagery. Moving towards more accurate fully automated extraction of road networks will help bring innovation to computer vision methodologies applied to high-resolution satellite imagery, and ultimately help create better maps where they are needed most. The goal is to extract navigable road networks that represent roads from satellite images.\nWhile Pixel-level F1 score is a widely used segmentation evaluation metric, it is suboptimal for routing applications like road network detection. Because the F1 metric weights each pixel equally, a perfect score is only possible if all pixels are classified correctly as either road or background. With this metric, a brief break in an inferred road (caused for example by an overhanging tree) is lightly penalized while a slight error in road width is penalized heavily. This problem is illustrated in this figure:\nLeft: Ground truth road mask in green. Middle: proposal mask in orange, Right: proposal mask in cyan. Credit: [Adam Van Etten](https://medium.com/the-downlinq/spacenet-road-detection-and-routing-challenge-part-i-d4f59d55bfce) According to the figure, while the orange road proposal achieves a lower F1 score (0.82) compared to the cyan proposal (0.95), it is a better proposal because the cyan road mask misses an important intersection and severs a road. 
This clearly shows that the pixel-based F1 score is suboptimal in this application.\nThe Spacenet 3 challenge proposes a graph theoretic metric based upon Dijkstra’s shortest path algorithm, called the Average Path Length Similarity (APLS) metric. APLS sums the differences in optimal path lengths between nodes in the ground truth graph G and the proposal graph G’. More details on this metric is provided here.\nOur Solution We approached the road detection problem in SpacenNet challenge as a semantic segmentation task in computer vision. Our model is based on a variant of U-Net, one of the most successful and popular convolutional neural network architectures for image segmentation. Since we use a image segmentation network to attack this problem, our results are in the form of segmentation masks, which needs to be converted to a line-string graph format. To achieve this, we first extract a binary mask of the detected road network. The result would be something like this:\nBinary road map. Then we skeletonize the segmentation masks to make it as thin as possible. This step helps with converting the mask to nodes and edges in the next step.\nSkeletonized road map. Having lines representing detected roads in the image, the next step is converting the lines to line strings. We simply traverse the lines in each continuous segments and keep adding coordinates to a line string. To fill small gaps in the lines, we introduced memory to our traversing algorithm where the algorithm continues in the previous direction for a certain number of steps even after reaching the end of a line. If another line is found, the two line strings are joined and the traversal continues. Depending on the interval used to record the coordinates while traversing, the resulting line strings can have different number of nodes.\n","date":"January 12, 2018","hero":"/posts/projects/01_spacenet/images/featured.png","permalink":"https://mrnabati.github.io/posts/projects/01_spacenet/","summary":"The Spacenet 3 challenge is focused on determining road networks and routing information directly from satellite imagery. The SpaceNet 3 Dataset contains ~8,000 km of roads across the four SpaceNet Areas of Interest.","tags":["Segmentation"],"title":"Spacenet 3: Road Network Detection"},{"categories":["Project"],"contents":"Background \u0026ldquo;The Functional Map of the World (fMoW) Challenge seeks to foster breakthroughs in the automated analysis of overhead imagery by harnessing the collective power of the global data science and machine learning communities. The challenge will publish one of the largest publicly available satellite-image datasets to date, with more than one million points of interest from around the world. The dataset contains satellite-specific metadata that researchers can exploit to build a competitive algorithm that classifies facility, building, and land use.\u0026rdquo; [IARPA]\nDataset The fMoW dataset contains ~3.5TB of 4-band and 8-band multispectral images in TIFF format. There is also a RGB version of the dataset available in JPEG format, with all multispectral imagery converted to RGB. Figure 1 shows the categories in the dataset and the number of instances in each category.\nThe dataset is available for download from AWS or using BitTorrent. 
To download from AWS, the following AWS CLI can be used:\naws s3 ls s3://spacenet-dataset/Hosted-Datasets/fmow/fmow-full/ aws s3 ls s3://spacenet-dataset/Hosted-Datasets/fmow/fmow-rgb/ To download using BitTorrent, refer to the instructions here.\nOur Solution ","date":"December 30, 2017","hero":"/posts/projects/00_fmow/images/featured.jpeg","permalink":"https://mrnabati.github.io/posts/projects/00_fmow/","summary":"The IARPA Functional Map of the World (fMoW) challenge focuses on promoting research in object identification and classification to automatically identify facility, building, and land use from satellite imagery. The dataset consists of 4- and 8-band multispectral images in 63 categories.","tags":["Segmentation"],"title":"IARPA fMoW Challenge"}]