Android Malware Detection

Using a custom deep learning model

This repository contains a neural network model that detects whether a given application is malware.

TL;DR

I built the same deep learning model in TensorFlow and in PyTorch to classify a given application as malicious or benign. Check the notebooks or the website for more.

Problem Description

The widespread proliferation of Android devices has led to a concerning increase in malware threats, which pose significant risks to users' personal data and digital security. Malicious apps often disguise themselves as legitimate software, making them difficult to identify without specialized tools.

The provided dataset contains features that an application may have or services that it may use. Given this input, I developed a model that looks for patterns among these features that may reveal whether an application is malicious.

This was done for educational purposes, to get better acquainted with the two biggest ML frameworks: TensorFlow and PyTorch.

Approach

While learning more about how neural networks are implemented in each framework, I created this model first in TensorFlow and later in PyTorch.

In general, there is no significant difference between the two implementations, and no technical reason to prefer one. Both frameworks were used purely for educational purposes.

Pre-processing

The dataset was fairly clean, but some pre-processing was needed before feeding the data to the model. In brief, a few missing values were replaced with the mean of their column, and the labels were encoded. For more details, please check the two notebooks.
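The two steps described above can be sketched with pandas. This is a minimal illustration on a toy table, not the notebooks' actual code: the column names (`feature_a`, `feature_b`, `label`) and the label strings are hypothetical stand-ins for the real dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real dataset; column names and label strings are hypothetical.
df = pd.DataFrame({
    "feature_a": [1.0, 0.0, np.nan, 1.0],
    "feature_b": [0.0, np.nan, 1.0, 1.0],
    "label": ["malware", "goodware", "goodware", "malware"],
})

feature_cols = [c for c in df.columns if c != "label"]

# Replace each missing value with the mean of its own column.
df[feature_cols] = df[feature_cols].fillna(df[feature_cols].mean())

# Encode the string labels as integers (goodware -> 0, malware -> 1).
df["label"] = df["label"].map({"goodware": 0, "malware": 1})

# Arrays ready to be handed to either framework.
X = df[feature_cols].to_numpy(dtype=np.float32)
y = df["label"].to_numpy(dtype=np.float32)
```

Computing the means on the training split only would avoid leaking information from the test set; whether the notebooks do this is not stated here.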

Model

This is a binary classification problem. In other words, the model outputs whether the given attributes belong to Android malware or to goodware.

To tackle this, a neural network was used with an input layer of 241 features and 3 hidden layers. For further details, please check the two notebooks.
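As a rough picture of that architecture, here is a PyTorch sketch with 241 inputs, 3 hidden layers, and one output. The hidden layer widths (128, 64, 32), the ReLU activations, and the single-logit output are assumptions made for illustration; the notebooks hold the actual configuration.

```python
import torch
from torch import nn

N_FEATURES = 241  # input size stated in this README

# Hidden widths and activations below are assumptions, not the notebooks' values.
model = nn.Sequential(
    nn.Linear(N_FEATURES, 128), nn.ReLU(),  # hidden layer 1
    nn.Linear(128, 64), nn.ReLU(),          # hidden layer 2
    nn.Linear(64, 32), nn.ReLU(),           # hidden layer 3
    nn.Linear(32, 1),                       # single logit for the binary decision
)

# Forward pass on a random batch of 4 applications.
batch = torch.rand(4, N_FEATURES)
probs = torch.sigmoid(model(batch))  # probability of the positive (malware) class
```

Thresholding `probs` at 0.5 would turn each probability into a malware/goodware prediction.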

TensorFlow vs PyTorch model

The main difference between the two models was how much control I had over each step. With TensorFlow, most steps were handled through high-level abstractions. For instance, neither the training loop nor the dataset split had to be implemented by hand; a few method calls covered both.
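To illustrate how few calls this can take, here is a minimal Keras sketch on random data. It assumes the split was done through `fit`'s `validation_split` argument, which may differ from what the notebook does; the layer sizes, optimizer, and epoch count are also placeholders.

```python
import numpy as np
import tensorflow as tf

# Random stand-in data with the README's 241 input features.
rng = np.random.default_rng(0)
X = rng.random((64, 241), dtype=np.float32)
y = rng.integers(0, 2, size=(64,)).astype(np.float32)

# Layer sizes are illustrative only.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(241,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# compile() and fit() cover the loss, the optimizer, the training loop,
# and (via validation_split) holding out part of the data.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
history = model.fit(X, y, epochs=2, batch_size=16, validation_split=0.2, verbose=0)
```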

In PyTorch, more effort was necessary to achieve the same output, but this allowed finer control over the process. For instance, the training method had to be written manually. Additionally, in PyTorch, a manual seed was set to help ensure reproducibility.
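A hand-written training loop of the kind described might look like the sketch below. The seed value, the small model, the loss function, the optimizer, and the helper name `train` are all assumptions for illustration, and the data is random.

```python
import torch
from torch import nn

torch.manual_seed(42)  # arbitrary seed; fixes weight init and the random data below

# Random stand-in data with the README's 241 input features.
X = torch.rand(64, 241)
y = torch.randint(0, 2, (64, 1)).float()

# Deliberately small model; not the notebooks' architecture.
model = nn.Sequential(nn.Linear(241, 16), nn.ReLU(), nn.Linear(16, 1))
loss_fn = nn.BCEWithLogitsLoss()  # applies the sigmoid internally
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

def train(model, X, y, epochs=20):
    """Full-batch training loop; returns the loss recorded at each epoch."""
    losses = []
    model.train()
    for _ in range(epochs):
        optimizer.zero_grad()        # clear gradients from the previous step
        loss = loss_fn(model(X), y)  # forward pass and loss
        loss.backward()              # backpropagate
        optimizer.step()             # update the weights
        losses.append(loss.item())
    return losses

losses = train(model, X, y)
```

A seed alone does not guarantee bit-identical runs across hardware or library versions, which is why it is best read as an aid to reproducibility rather than a guarantee.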

Results

Accuracy: 99.89%
Precision: 99.46%
Recall: 100%
F1-Score: 99.73%
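As a quick consistency check, the F1-score is the harmonic mean of precision and recall, so it can be recomputed from the two figures reported above:

```python
# Precision and recall as reported in the results above.
precision = 0.9946
recall = 1.0

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"{f1:.2%}")  # prints 99.73%
```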

Confusion Matrix

Technologies used

Docs website

Pico CSS
Vanilla JS
Jupyter notebooks exported as HTML

AI Models

Python
TensorFlow
PyTorch
NumPy
Pandas
scikit-learn
Matplotlib
Seaborn

Credits

Special thanks to

Freepik and sentavio for the featured image. More here
Joakim Arvidsson for the dataset. More here

michaelkonstantinou/android-malware-detection
