Using a custom deep learning model
This repository contains a neural network model that detects whether a given Android application is malware.
I built the same deep learning model in both TensorFlow and PyTorch. Check the notebooks or the website for more details.
The widespread proliferation of Android devices has led to a concerning increase in malware threats, which pose significant risks to users' personal data and digital security. Malicious apps often disguise themselves as legitimate software, making them difficult to identify without specialized tools.
The provided dataset describes each application by the features it may have and the services it may use. Given these inputs, I developed an AI model that looks for patterns among the features that reveal whether an application is malicious.
This project was built for educational purposes, to get to know the two biggest ML frameworks, TensorFlow and PyTorch, better.
While learning how neural networks are implemented in each framework, I built the model first in TensorFlow and later in PyTorch.
There is no significant difference between the two implementations, and no particular reason to prefer one over the other.
The dataset was fairly clean, but some pre-processing was still needed before feeding the data to the model: a few missing values were replaced with the mean of their respective column, and the labels were encoded. For more details, please check the two notebooks.
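The two pre-processing steps described above can be sketched with pandas as follows. This is a minimal sketch, not the notebooks' code: the label column name `Label`, the label values `malware`/`goodware`, and the use of `Series.map` for encoding are assumptions. scikit-learn's `LabelEncoder` would give the same 0/1 mapping here, because it sorts the class names alphabetically.

```python
import numpy as np
import pandas as pd


def preprocess(df: pd.DataFrame, label_col: str = "Label") -> tuple[pd.DataFrame, pd.Series]:
    """Hypothetical sketch: mean-impute missing feature values and encode labels."""
    features = df.drop(columns=[label_col])
    # Replace each missing value with the mean of its column.
    features = features.fillna(features.mean(numeric_only=True))
    # Encode the (assumed) string labels as integers: malware -> 1, goodware -> 0.
    labels = df[label_col].map({"malware": 1, "goodware": 0})
    return features, labels


# Toy input with two hypothetical feature columns and some missing values.
raw = pd.DataFrame({
    "f1": [1.0, np.nan, 3.0],
    "f2": [0.0, 1.0, np.nan],
    "Label": ["malware", "goodware", "malware"],
})
X, y = preprocess(raw)
```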
This is a binary classification problem: given an application's attributes, the model outputs whether it is Android malware or goodware (a benign application).
To tackle this, I used a neural network with an input layer of 241 features and 3 hidden layers between the input and the output layer. For further details, please check the two notebooks.
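To make the shape of this architecture concrete, here is a framework-agnostic NumPy sketch of the forward pass. Only the 241 input features and the 3 hidden layers come from the description above; the hidden widths (128, 64, 32), the ReLU activations, the single sigmoid output unit and the 0.5 decision threshold are assumptions for illustration, so check the notebooks for the actual configuration.

```python
import numpy as np

# 241 inputs and 3 hidden layers come from the README; the widths are hypothetical.
LAYER_SIZES = [241, 128, 64, 32, 1]


def init_params(sizes, seed=42):
    """Create one (weights, bias) pair per layer using He-style initialisation."""
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        w = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
        b = np.zeros(n_out)
        params.append((w, b))
    return params


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def forward(params, x):
    """Return the predicted probability that each row of x is malware."""
    for w, b in params[:-1]:
        x = relu(x @ w + b)  # hidden layers
    w, b = params[-1]
    return sigmoid(x @ w + b).ravel()  # single output unit


params = init_params(LAYER_SIZES)
batch = np.random.default_rng(0).integers(0, 2, size=(5, 241)).astype(float)
probs = forward(params, batch)
preds = (probs >= 0.5).astype(int)  # 1 = malware, 0 = goodware (assumed threshold)
```

In Keras or PyTorch the same stack would be a handful of `Dense` or `nn.Linear` layers; writing it out by hand only shows where the 241 inputs and the three hidden layers sit.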
The main difference between the two implementations is how much control each gives over the individual steps. TensorFlow abstracts most of them away: I did not have to implement the training loop or the dataset split myself, and only a few method calls were needed to cover those steps.
PyTorch required more effort to reach the same result, but in exchange it offered finer control over each step. For instance, the training loop had to be written manually. I also set a manual seed in the PyTorch version to "ensure" reproducibility.
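The kind of hand-written training loop that PyTorch asks for can be illustrated with NumPy. To keep the sketch short, it swaps in plain logistic regression (one linear layer plus a sigmoid) for the project's network, and the toy data, learning rate, batch size and epoch count are all hypothetical. It spells out the steps that Keras' `fit` performs internally (shuffling, mini-batching, forward pass, gradient update) and shows what a fixed seed buys: with the same seed, the initial weights and the shuffling order are identical, so two runs end with identical weights. In PyTorch the corresponding call is `torch.manual_seed(seed)`, although seeding alone does not guarantee bit-for-bit reproducibility for every operation on every device.

```python
import numpy as np


def train_logreg(X, y, epochs=50, lr=0.1, batch_size=32, seed=0):
    """Hand-written mini-batch training loop for a logistic-regression stand-in model."""
    rng = np.random.default_rng(seed)  # fixed seed: same init and shuffles every run
    w = rng.normal(0.0, 0.01, size=X.shape[1])
    b = 0.0
    n = X.shape[0]
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle the samples each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            xb, yb = X[idx], y[idx]
            p = 1.0 / (1.0 + np.exp(-(xb @ w + b)))  # forward pass
            grad = p - yb  # gradient of binary cross-entropy w.r.t. the logit
            w -= lr * xb.T @ grad / len(idx)  # gradient-descent update
            b -= lr * grad.mean()
    return w, b


# Hypothetical toy data: 200 samples, 10 binary features, label = feature 0.
data_rng = np.random.default_rng(123)
X = data_rng.integers(0, 2, size=(200, 10)).astype(float)
y = X[:, 0]

w1, b1 = train_logreg(X, y, seed=0)
w2, b2 = train_logreg(X, y, seed=0)  # same seed: identical result
w3, b3 = train_logreg(X, y, seed=1)  # different seed: different weights

preds = (X @ w1 + b1 >= 0).astype(float)
accuracy = (preds == y).mean()
```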
Results
- Accuracy: 99.89%
- Precision: 99.46%
- Recall: 100%
- F1-Score: 99.73%
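These metrics follow the standard definitions. The sketch below computes them from confusion-matrix counts, assuming malware is the positive class, and checks that the reported F1-score is consistent with the reported precision and recall. The counts passed to `classification_metrics` are hypothetical, not the project's results. Under that assumption, a recall of 100% means no malware sample was classified as goodware, while a precision below 100% means a few goodware samples were flagged as malware.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Binary-classification metrics with malware as the (assumed) positive class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def f1_from(precision: float, recall: float) -> float:
    """F1-score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# The reported precision (99.46%) and recall (100%) imply the reported F1-score.
reported_f1 = f1_from(0.9946, 1.0)
print(f"{reported_f1:.4f}")  # prints 0.9973

# Hypothetical confusion-matrix counts, for illustration only.
m = classification_metrics(tp=90, fp=2, fn=0, tn=108)
```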
Technologies used
- Pico CSS
- Vanilla JS
- Jupyter notebooks exported as HTML
- Python
- TensorFlow
- PyTorch
- NumPy
- Pandas
- scikit-learn
- Matplotlib
- Seaborn
Special thanks to