title: GSoC-2021-YuShiangDang
date: 2021-08-17 08:03:55 -0700

New Rule Generation Technique & Make Quark Everywhere Among Security Open Source Projects

Summary

My name is YuShiang Dang. I am a third-year graduate student at NKUST, Taiwan. This summer, I participated in Google Summer of Code with the Honeynet Project to contribute to Quark-Engine. My project had two main goals: 1. to speed up rule generation for Quark, and 2. to make Quark available everywhere among open source security projects.

For this project, I worked on multiple repositories, including quark-rule-generate, Jadx, MobSF and APKLab.

The following section summarizes the 7 most important pieces of work I did and their impact:

1. First Goal - Implement a new rule generation technique for quark-rule-generate

-- Work: Implemented the new rule generation technique.
-- Impact: It proved helpful for finding important rules within a relatively short time.
-- Related PR: PR: #2
-- Details: Go to details on this page.

2. First Goal - Solve CPU idle problem for quark-rule-generate

-- Work: Proposed and implemented a solution to the CPU idle problem.
-- Impact: This work significantly improved performance.
-- Related PR: PR: #3
-- Details: Go to details on this page.

3. Second Goal - Improve the UX for using Quark in Jadx

-- Work #1: Implemented Error Dialog.
-- Work #2: Implemented Quark auto installation for Jadx.
-- Impact: Improved the user experience of using Quark in Jadx.
-- Related PRs: PR: #1203, PR: #1202, PR: #1199
-- Details: Go to details on this page.

4. Second Goal - Provide more details of Quark’s summary report in Jadx

-- Work: We planned to provide more details to help users quickly locate malicious behaviors in the binary. However, during GSoC, the founder of Jadx, Skylot, unexpectedly implemented this feature for us.
-- Impact: With this added information, users get an overview first, so they know where to start before diving into the source code.
-- Related Commit: Commit: #b5720b
-- Details: Go to details on this page.

5. Second Goal - Integrate Quark to MobSF

-- Work: The integration of Quark into MobSF was implemented and merged.
-- Impact: This can definitely help Quark gain a large number of users, since MobSF is a well-known project in mobile security.
-- Related PR: PR: #1761
-- Details: Go to details on this page.

6. Second Goal - Implement behavior map of Quark to APKLab

-- Work: This feature is implemented and has been submitted to APKLab. However, the PR is still pending because we ran into some CI issues.
-- Impact: This feature gives APKLab users deeper insight into the function/method calls in a suspicious binary.
-- Related PR: PR: #135
-- Details: Go to details on this page.

7. Second Goal - Improve UI/UX of Quark integration in APKLab

-- Work: Three issues were opened to improve the Quark UI/UX in APKLab, and discussion is ongoing with the founder of APKLab, Surendrajat.
-- Impact: These issues help improve the readability of Quark reports and the performance of Quark in APKLab.
-- Related Issues: Issue: #142, Issue: #141, Issue: #140
-- Details: Go to details on this page.

Details

1. Implement a new rule generation technique for quark-rule-generate

1.1. Drawbacks of the old technique

As described here, we need two native APIs to construct one detection rule. The old technique simply enumerates all possible combinations of native APIs in the target APK. For example, if the target APK has N native APIs, the old technique generates N x N rules and verifies all of them. This consumes a lot of time and disk space. Worst of all, we found that most of the rules are useless.
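To make the cost concrete, here is a minimal sketch of the exhaustive enumeration described above. The API names are made-up placeholders, and `all_api_pairs` is a hypothetical helper for illustration, not a function from quark-rule-generate.

```python
from itertools import product


def all_api_pairs(apis):
    """Return every ordered (first_api, second_api) pair, self-pairs included."""
    return list(product(apis, repeat=2))


# Placeholder API names, for illustration only.
apis = ["getDeviceId", "sendTextMessage", "toString", "openConnection"]
pairs = all_api_pairs(apis)

# The number of candidate rules grows as N x N.
print(len(pairs), "candidate rules for", len(apis), "APIs")
```

With only 4 APIs this already yields 16 candidate rules, so an APK with thousands of native APIs leads to millions of rules to generate and verify.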

1.2. The new technique

The main goal of the new technique is to find valuable detection rules within a relatively short time. To do so, we first count the number of calls to each API and sort the APIs by that count. We then define P (primary) as the set of the 20% least used APIs and S (secondary) as the set of the remaining 80% most used APIs. In other words, the total number of APIs (N) is the sum of the sizes of P and S.

We choose the 20% least used APIs as primary APIs because we found that an API such as toString() is called everywhere, and toString() is not a helpful API for malware researchers. Why 20%? The answer is simple: we believe in the 20-80 rule and wanted to give it a try.

So, instead of the full N x N combination, we now have four sets to choose from (P x P, P x S, S x P and S x S). See Table 1.1 for our inferences on the rule value and computational cost of each set. Set (P x P) is clearly our top priority for hunting high-value detection rules.

Table 1.1. The comparison of four different sets
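The split into P and S and the four resulting sets can be sketched as follows. This is only an illustration under stated assumptions: the call counts are invented, `split_primary_secondary` is a hypothetical helper, and ties in call count are broken alphabetically, which the original technique may handle differently.

```python
from itertools import product


def split_primary_secondary(call_counts, primary_ratio=0.2):
    """Sort APIs by call count (ascending) and split them into P and S.

    P holds the least used `primary_ratio` share of the APIs, S holds the rest.
    """
    ranked = sorted(call_counts, key=lambda api: (call_counts[api], api))
    cut = max(1, round(len(ranked) * primary_ratio))
    return ranked[:cut], ranked[cut:]


# Invented call counts, for illustration only.
call_counts = {
    "toString": 950, "append": 720, "valueOf": 610, "length": 540,
    "equals": 480, "get": 400, "put": 350, "size": 300,
    "getDeviceId": 3, "sendTextMessage": 2,
}

primary, secondary = split_primary_secondary(call_counts)

# The four candidate sets, each a list of ordered API pairs.
candidate_sets = {
    "PP": list(product(primary, primary)),
    "PS": list(product(primary, secondary)),
    "SP": list(product(secondary, primary)),
    "SS": list(product(secondary, secondary)),
}

for name, api_pairs in candidate_sets.items():
    print(name, len(api_pairs))
```

In this toy input, P contains 2 of the 10 APIs, so set PP holds 4 of the 100 possible pairs while set SS holds 64. That gap in size is why PP is the cheapest set to verify first.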

1.3. Experiment for the new technique

To test our inferences, we chose Ahmyth RAT as our target APK.

Figure 1.1. The line chart for the ratio of the number of rules to search times

In the graph, the Y axis represents (the number of 100% rules / the number of API searches). In other words, Y is the average number of 100% rules per API search. The X axis represents the percentage of P.
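As a small worked example of the Y value, the sketch below computes the ratio for two sets. It assumes that "100% rules" means rules that reach 100% confidence. The counts are invented purely to show the arithmetic and are NOT the results of the experiment. `rules_per_search` is a hypothetical helper.

```python
def rules_per_search(full_confidence_rules, api_searches):
    """Average number of 100% rules found per API search (the Y value)."""
    if api_searches == 0:
        return 0.0  # nothing searched yet, so there is no ratio to report
    return full_confidence_rules / api_searches


# Invented (rules, searches) counts, for illustration only.
example_counts = {"PP": (30, 100), "SS": (30, 1000)}

ratios = {
    name: rules_per_search(rules, searches)
    for name, (rules, searches) in example_counts.items()
}
print(ratios)
```

Two sets that yield the same number of 100% rules can therefore score very differently once the number of searches needed to find them is taken into account.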

The result shows that set PP has the best performance across all percentages of P. Set SS, not surprisingly, has the worst performance. Sets PS and SP perform almost the same. Last but not least, the result suggests that, at least for this APK, the 20-80 rule applies well to the rule generation technique.

One limitation of this experiment is that we only used one target APK. Our future goal is to show that our findings also apply to other APKs.

Related PR:

quark-rule-generate PR#2: New rule generate technique

Go back to summary


2. Solve CPU idle problem for quark-rule-generate

The multiprocessing implementation evenly distributes APIs to each process (one per CPU core) for analysis. However, some processes sit idle once they finish analyzing their share of the APIs, which wastes CPU resources. Therefore, to maximize CPU usage, I implemented a feature that continuously checks the CPU status and, when an idle CPU core is found, reallocates the APIs that are yet to be analyzed across all CPU cores.
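The effect of the idle-core problem can be illustrated with a small deterministic simulation. This is not the code from the PR: it swaps in a simpler model where an idle worker immediately takes the next pending API from a shared queue, and compares it with a fixed even split. The per-API costs are invented and both function names are hypothetical.

```python
import heapq


def static_makespan(task_costs, workers):
    """Even split up front: worker i gets tasks i, i + workers, and so on.

    The total time is the load of the busiest worker.
    """
    loads = [0] * workers
    for index, cost in enumerate(task_costs):
        loads[index % workers] += cost
    return max(loads)


def dynamic_makespan(task_costs, workers):
    """Shared queue: the worker that becomes idle first takes the next task."""
    finish_times = [0] * workers
    heapq.heapify(finish_times)
    for cost in task_costs:
        earliest_idle = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest_idle + cost)
    return max(finish_times)


# Invented analysis costs per API: a few APIs are far more expensive.
task_costs = [9, 1, 1, 1, 9, 1, 1, 1]

static_time = static_makespan(task_costs, workers=4)
dynamic_time = dynamic_makespan(task_costs, workers=4)
print("static split:", static_time, "dynamic reallocation:", dynamic_time)
```

In this toy case the even split leaves three workers idle while one handles both expensive APIs, whereas handing pending work to idle workers shortens the total time.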

Related PR:

quark-rule-generate PR#3: Fix the CPU idle problem

Go back to summary


3. Improve the UX for using Quark in Jadx

3.1. Implemented Quark auto installation for Jadx

In the previous integration, Jadx users needed to install Quark themselves before using the Quark analysis module. Therefore, for better UX, I implemented the auto installation of Quark in PR#1199.

3.2. Implemented Error Dialog

In the previous integration of Quark into Jadx, error/warning messages were shown only in the logger. In other words, users wouldn't have a clue that something was wrong until they checked the log messages. Therefore, I implemented a feature that pops up an Error or Warning dialog when Quark is not working properly. See Figure 3.1 and Figure 3.2 for the demo.

Figure 3.1. Error dialog

Figure 3.2. Warning dialog

3.3. Implement the progress bar

The time consumed by different analyses varies, and we think users have a better experience if they know the progress of the analysis. Therefore, I implemented a progress bar on the main window of Jadx to show users the analysis progress. See Figure 3.3 for the demo.

Figure 3.3. Progress bar for Quark in Jadx

Related PRs

Jadx PR#1203: Change Quark task to background task
Jadx PR#1202: Add Error/Warning dialogs
Jadx PR#1199: Improvements of Quark integration

Go back to summary


4. Provide more details of Quark’s summary report in Jadx

We planned to provide more details to help users quickly locate malicious behaviors in the binary. With this added information, users get an overview first, so they know where to start before diving into the source code.

However, during GSoC, the founder of Jadx, Skylot, unexpectedly implemented this feature for us. We appreciate his kind and unexpected help. :D

See Figure 4.1. and Figure 4.2. for the demo.

Figure 4.1. The binary overview

Figure 4.2. Jump to the source code

Related Commits

Jadx commit b5720b: fix(gui): improve Quark tasks scheduling and report viewer

Go back to summary


5. Integrate Quark to MobSF

5.1. Add Quark Analysis Report

The integration of Quark into MobSF is implemented and merged. See Figure 5.1 for the demo.

Figure 5.1. Quark analysis report in MobSF

5.2. Dive into the Source Code

We also implemented a killer feature in MobSF: users can jump to where the suspicious behavior happens by clicking on an activity shown in Figure 5.1. See Figure 5.2 for the demo.

Figure 5.2. The demo for source code overview

Related PR:

MobSF PR#1761: Add Quark Engine as one of the static analyzers

Go back to summary


6. Implement behavior map of Quark to APKLab

6.1. Add behavior map to APKLab

Before implementing this feature, we fixed a permission issue for APKLab. See PR#135 for more information. The behavior map is now implemented; see Figure 6.1 and Figure 6.2. With the behavior map, users can quickly understand the relationships between the suspicious behaviors Quark detected. However, this PR is still pending because we ran into some CI issues.

Figure 6.1. Behavior map in APKLab

Figure 6.2. Behavior map

Related PR:

APKLab PR#135: Quark integration improvement

Go back to summary


7. Improve UI/UX of Quark integration in APKLab

7.1. [UX] Long analysis time for large APKs

In the previous integration of Quark into APKLab, we found that analyzing large APKs may take a long time. We think there are several possible causes. One is the slow performance of Quark itself. Another is the slow performance of the tools that are executed together with Quark.

As for the performance of Quark, our team has conducted a performance assessment and improvement proposal in another GSoC project. Therefore, in this project, we chose to have the tools that work with Quark executed asynchronously, so that users can use other features in the meantime and no time is wasted. See issue#140 for my discussion with APKLab.

7.2. [UX] More options for the suspicious behavior traversal

This issue points out that the Quark report only shows activities with 100% confidence, while we found that most activities with 80% confidence are also valuable.

Therefore, we add a checkbox to filter by confidence percentage, providing a better UX for the suspicious behavior traversal. See issue#141 for my discussion with APKLab.
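The filtering idea can be sketched in a few lines. The report entries and their field names (`behavior`, `confidence`) are hypothetical and do not reflect the actual Quark report format or the APKLab implementation.

```python
def filter_by_confidence(behaviors, min_confidence):
    """Keep only the behaviors whose confidence is at least `min_confidence`."""
    return [b for b in behaviors if b["confidence"] >= min_confidence]


# Hypothetical report entries, for illustration only.
report = [
    {"behavior": "Send SMS in the background", "confidence": 100},
    {"behavior": "Read device ID and send it over the network", "confidence": 80},
    {"behavior": "Query the contact list", "confidence": 40},
]

only_full = filter_by_confidence(report, 100)
with_80 = filter_by_confidence(report, 80)
print(len(only_full), len(with_80))
```

Lowering the threshold from 100 to 80 brings the second entry into view without flooding the list with low-confidence matches.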

7.3. [UI] Indentation of the sub-item in Quark report

The lack of indentation for sub-items in the Quark report may confuse users. Therefore, we opened an issue to discuss it with APKLab. See issue#142 for more information.

Related Issues:

APKLab Issue#140 Quark analysis may take a long time
APKLab Issue#141 Quark report only shows 100% confidence activities
APKLab Issue#142 There are no indent spaces at the sub-item in the Quark report

Go back to summary

Acknowledgments

Thanks to Google for providing such a great project.
Thanks to the Honeynet Project for giving me this opportunity.
Thanks to my mentor, KunYu Chen, for all his sincere support and guidance.
Thanks to Jadx, MobSF, and APKLab for helping me reach my goals.
Thanks to TTC for providing me with the working environment.