[WIP][TA] Analyze out-of-BT func with tainted parameters #47

Open
wants to merge 8 commits into
base: main

Contributor

@niktesic niktesic commented Apr 6, 2023

Summary

This patch introduces support for analyzing functions whose call site is not in the backtrace, if a parameter is tainted. For now, only parameters forwarded through registers are supported.

This partially resolves #36

Description

To track the value of function parameters, this patch introduces forward TaintAnalysis, which propagates the Taint from the function entry point to the point where the parameter value is set. If the Taint path does not terminate inside the call (that is, the parameter is not set to a constant value, as it is in tainted-param-inside-blame.test), we run the backward TaintAnalysis to propagate the Taint from the point where the parameter is set to the point where that value is set to a constant (tainted-params-outside-blame.test).
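
The choice between the two passes can be sketched as a small decision function. This is only an illustration of the flow described above: `Plan`, `planOutOfBTCall`, and the string-based `TaintList` are hypothetical names, not the types used in the patch.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the analysis' TaintList: names of operands
// that are still tainted. The real TaintList holds richer TaintInfo.
using TaintList = std::vector<std::string>;

enum class Plan { Skip, ForwardOnly, ForwardThenBackward };

// Decide which passes a call outside the backtrace needs.
// RemainingAfterForward is whatever the forward pass left unresolved.
Plan planOutOfBTCall(bool ParamsTainted,
                     const TaintList &RemainingAfterForward) {
  if (!ParamsTainted)
    return Plan::Skip; // nothing to track inside the callee
  if (RemainingAfterForward.empty())
    return Plan::ForwardOnly; // taint ended at a constant inside the call
  return Plan::ForwardThenBackward; // value came from elsewhere, keep tracing
}
```

Under these assumptions, tainted-param-inside-blame.test would correspond to `ForwardOnly` and tainted-params-outside-blame.test to `ForwardThenBackward`.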

Content of commits

0001-Target-Helpers-for-param-forwarding-registers.patch

  • Define Target dependent helpers in Target/CATargetInfo:
    • Get register unsigned (MCRegister) from the RegMap.
    • Get the RegMap.
    • Check if the register (by name) can be used to forward parameters.
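
As a rough illustration of the last helper, the sketch below checks a register name against the six integer and pointer argument registers of the System V AMD64 calling convention. The function name `isParamFwdRegister` and the flat array are assumptions for this example; the patch works through the RegMap, which also knows about register aliases.

```cpp
#include <array>
#include <string>

// Integer/pointer argument registers of the System V AMD64 ABI, in order.
static const std::array<const char *, 6> ParamRegs = {
    "rdi", "rsi", "rdx", "rcx", "r8", "r9"};

// Hypothetical helper: true if RegName can carry a register parameter.
// Only the 64-bit names are checked here; sub-register aliases such as
// "edi" would need a lookup in an alias table.
bool isParamFwdRegister(const std::string &RegName) {
  for (const char *Reg : ParamRegs)
    if (RegName == Reg)
      return true;
  return false;
}
```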

0002-TaintDFG-Upgrade-TaintDFG-infrastructure.patch

  • Changes in Taint DataFlowGraph implementation
    • Use MachineOperand as a key for lastTaintedNode map
    • Include Depth field in the Node struct (set it on insertion in TA)
    • Do not erase potential blame node in DFG traversal, if it has zero DerefLevel and is a constant
    • Filter potential blame nodes - consider zero DerefLevel and max depth
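
One possible reading of the filtering step, shown only as a sketch: keep candidates whose DerefLevel is zero and, among those, the ones at the greatest Depth. `ToyNode` and `filterPotentialBlame` are hypothetical, and the exact rule in the patch may differ.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical, reduced node: only the fields this filter looks at.
struct ToyNode {
  int Id;
  unsigned DerefLevel;
  unsigned Depth;
};

// Keep nodes with DerefLevel == 0, then only those at the maximum Depth.
std::vector<ToyNode> filterPotentialBlame(const std::vector<ToyNode> &Nodes) {
  std::vector<ToyNode> Kept;
  unsigned MaxDepth = 0;
  for (const ToyNode &N : Nodes) {
    if (N.DerefLevel != 0)
      continue;
    Kept.push_back(N);
    MaxDepth = std::max(MaxDepth, N.Depth);
  }
  Kept.erase(std::remove_if(Kept.begin(), Kept.end(),
                            [MaxDepth](const ToyNode &N) {
                              return N.Depth != MaxDepth;
                            }),
             Kept.end());
  return Kept;
}
```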

0003-TA-Detect-tainted-parameters.patch

  • Define isLEAInstr hook in TargetInstrInfo (for X86)
  • Process LEA-like instructions in RegisterEquivalence (same as Load, but the Src memory address is not dereferenced)
  • Fix TaintAnalysis::isTainted to use (signed) int64_t for the offset operand
  • Define findParamLoadingInstr method to find the instruction that loaded the parameter value into the register
  • Define areParamsTainted method to:
    • determine if the parameters of the out of backtrace call are tainted
    • construct a TaintList to be used in the analysis of such function
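
The signed-offset fix matters because stack slots are usually addressed with negative displacements. The sketch below shows the general hazard with hypothetical helper names; it is not the code from TaintAnalysis::isTainted.

```cpp
#include <cstdint>

// Signed offset: -8 means "8 bytes below Base". Unsigned arithmetic with
// a sign-extended value wraps around to Base - 8, as intended.
uint64_t addrSigned(uint64_t Base, int64_t Offset) {
  return Base + static_cast<uint64_t>(Offset);
}

// Hazard: a displacement kept in a 32-bit unsigned value loses its sign
// when widened, so -8 turns into a large positive offset.
uint64_t addrUnsigned(uint64_t Base, uint32_t Offset) {
  return Base + Offset;
}
```

With `Base = 0x1000`, the signed version yields `0xFF8`, while the unsigned version lands roughly 4 GiB above the base, so two operands that refer to the same stack slot would not compare equal.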

0004-TA-Introduce-forward-Taint-Analysis.patch

  • Define forwardMFAnalysis method to perform TaintAnalysis of the functions out of the backtrace
    (like runOnBlameMF, but simplified, and in the forward direction).
    The idea is to propagate the Taint from the parameter (function entry point) to the
    program point where the parameter's value is set.
  • Define propagateTaintFwd method to propagate the Taint from Source to Destination operand
    (a simplified version of propagateTaint, but in the opposite (forward) direction)
  • If parameters are tainted, we first run the Forward Analysis and then, if the TaintList
    is not empty, we run the Backward Analysis
    (that covers cases where the parameter is set in the call, or where it gets its value from another parameter)
  • Merge the TL from the processed call into the TL of the parent MBB
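
A toy model of forward propagation, to make the direction concrete. It assumes just two rules: a copy spreads taint from Src to Dst, and a constant write to a tainted operand resolves it. `ToyInstr` and `propagateForward` are hypothetical; the real propagateTaintFwd works on MachineOperands and handles memory operands, which this sketch ignores.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy instruction: "Dst = Src" (copy), or "Dst = <constant>" if IsConst.
struct ToyInstr {
  std::string Dst;
  std::string Src;
  bool IsConst;
};

// Walk the body from the entry point forward and return the operands
// that are still tainted at the end.
std::set<std::string> propagateForward(const std::vector<ToyInstr> &Body,
                                       std::set<std::string> Tainted) {
  for (const ToyInstr &I : Body) {
    if (I.IsConst) {
      Tainted.erase(I.Dst); // taint terminates at a constant
      continue;
    }
    if (Tainted.count(I.Src))
      Tainted.insert(I.Dst); // taint flows Src -> Dst
  }
  return Tainted;
}
```

An empty result corresponds to the case where the forward pass alone is enough; a non-empty result is what would be handed to the backward pass.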

0005-TA-Analysis-of-already-decompiled-functions.patch

  • support cases where a function is in the backtrace, but called at another place as well
    (so crash-start is not at the expected place as for other out of the BT calls)
  • upgrade debug prints
  • update tests

0006-New-tests-for-tainted-params-analysis.patch

  • New test - tainted-param-inside-blame.test:
    • parameter is set inside the (out-of-BT) call to a constant (only ForwardAnalysis is enough)
  • New test - tainted-params-outside-blame.test:
    • parameter takes the value of the other parameter in the (out-of-BT) call (combined FW & BW analysis needed)
  • New test - taint-param-decompiled.test:
    • parameter takes the value of the other parameter in the (out-of-BT) call of a function that is in the backtrace as well

Example

// clang -g -O0 m.c -o m
void crash(int val, int* adr){
	*adr = val; // crash - line 3
}

void blame(int** ptr, int* adr){
	*ptr = adr; // wrong blame - line 7
}

void fun(int** ptr, int* adr)
{
   adr = 4; // correct blame - line 12
   blame(ptr,adr);
}

int main(){
  int *adr = 3; // wrong blame - line 17
  int *p = 0; // wrong blame - line 18
  fun(&p,adr);
  crash(1, p);
  return 0;
}

Run crash analyzer: llvm-crash-analyzer --print-potential-crash-cause-loc ./m --core-file core.m
Output:

core-file processed.

Decompiling...
Decompiling crash
Decompiling main
Decompiled.

Analyzing...
Decompiling fun
Decompiling blame

Blame Function is fun
From File m.c:12:8

TaintDFG:
mir-dfg

@niktesic niktesic requested a review from a team as a code owner April 6, 2023 11:43
virtual unsigned getRegSize(std::string RegName) const = 0;

// Get RegAliasTuple from the RegMap with selected Id.
RegAliasTuple &getRegMap(unsigned Id) const {
return const_cast<RegAliasTuple &>(RegMap.at(Id));
}

// Get RegAliasTuple from the RegMap with selected Id.
std::unordered_map<unsigned, RegAliasTuple> getWholeRegMap() const {
Contributor

better return a reference to map instead of copying the entire map; no?

Contributor Author

Definitely, I missed this, thanks!
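
The difference the reviewer points at can be sketched with a simplified holder class. `ToyTargetInfo` and the string payload are placeholders for this example; the real map stores RegAliasTuple values.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical, simplified stand-in for the target info class.
class ToyTargetInfo {
  std::unordered_map<unsigned, std::string> RegMap{{0, "rax"}, {1, "rdi"}};

public:
  // Returns by value: every call copies all entries of the map.
  std::unordered_map<unsigned, std::string> getWholeRegMapByValue() const {
    return RegMap;
  }

  // Returns a const reference: no copy, and callers cannot modify the map.
  const std::unordered_map<unsigned, std::string> &getWholeRegMap() const {
    return RegMap;
  }
};
```

Callers that only iterate or look up entries can bind the result to `const auto &` and avoid the copy entirely.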

@@ -134,13 +134,19 @@ void TaintDataFlowGraph::findBlameFunction(Node *v) {
if (a->MI->getParent() == adjNode->MI->getParent() &&
!a->CallMI && !adjNode->CallMI) {
if (MDT->dominates(adjNode->MI, a->MI)) {
// Do not erase potential blame nodes.
Contributor

Why didn't you terminate at the beginning of the loop (line 134)?
The terminating condition is repeated all over this loop, which makes it hard to understand.

Contributor Author

I agree, I will move this condition to the beginning of the iteration.

Contributor

@alirezamoshtaghi alirezamoshtaghi left a comment

Thank you for the extensive work.
I didn't see many functional issues in the code, but there are a couple of efficiency issues that I commented on, and the style is a bit hard to follow. For example, runOnBlameMF has grown too big, and the added complexity of the two flags runFwAnalysis and runBwAnalysis hasn't made it easier to understand.
Nevertheless, I'm OK with this code being checked in, so I'm approving it, but not merging it. Waiting for other team members to comment.

@niktesic
Contributor Author

niktesic commented Apr 7, 2023

Thanks for your comments, Ali! I will add a new commit with suggested updates. Feel free to add new suggestions, if you notice some.

@niktesic
Contributor Author

New commits address the reviewers' comments and introduce support for parameters on the stack.
This PR is functionally complete in terms of analysis of functions out of the backtrace, where parameters are tainted.
New commits discard previous approval, so a new one is needed.
Thanks!

Content of commits in new changes

0007 Addressing comments 1

  • return a reference to the RegMap instead of copying the entire map
  • move terminating condition to the beginning of the iteration in TaintDataFlowGraph::findBlameFunction

0008 [TA] Support parameters on the stack

  • add helpers to Crash Analyzer TargetInfo for getting BP/SP registers and SP adjustment to BP in the callee frame
  • define isStackSlotTainted method to check for tainted stack slots and populate the TaintList to be used in the TaintAnalysis of the callee
  • define transformSPtoBPTaints method to transform TaintInfo to use BP as the base instead of SP, to match the callee frame pointer, when the callee is analyzed
  • define transformBPtoSPTaints method to transform TaintInfo to use SP as the base instead of BP, to match the caller stack pointer, after the callee is analyzed
  • use the defined methods in runOnBlameMF to properly set up TL_Of_Call to be used during TaintAnalysis of calls out of the backtrace, when parameters are tainted
  • add a test where the tainted location is set in a function out of the backtrace and forwarded as a parameter, and the bad value is forwarded as another parameter (both parameters are forwarded via the stack)
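
The base change between caller and callee frames can be sketched as plain offset arithmetic. The sketch assumes x86-64 and a callee prologue of `push rbp; mov rbp, rsp`: the call pushes an 8-byte return address and the prologue pushes 8 more bytes, so the callee's BP sits 16 bytes below the caller's SP at the call site. `ToyTaint`, `toySPtoBP`, and `toyBPtoSP` are hypothetical names; the patch obtains the adjustment from the target helpers rather than hard-coding it.

```cpp
#include <cstdint>

// Assumed adjustment: 8-byte return address + 8-byte saved rbp.
constexpr int64_t SPAdjustment = 16;

// Hypothetical, reduced taint record: base register kind plus offset.
struct ToyTaint {
  bool BaseIsBP;
  int64_t Offset;
};

// Caller view [SP + Off] -> callee view [BP + Off + 16].
ToyTaint toySPtoBP(ToyTaint T) {
  return {true, T.Offset + SPAdjustment};
}

// Callee view [BP + Off] -> caller view [SP + Off - 16].
ToyTaint toyBPtoSP(ToyTaint T) {
  return {false, T.Offset - SPAdjustment};
}
```

The two functions are inverses, which mirrors the patch's use of one transformation before the callee is analyzed and the other after it.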

@niktesic niktesic changed the title [POC] Analyze out-of-BT func with tainted parameters [TA] Analyze out-of-BT func with tainted parameters Apr 25, 2023

This is invalid code, which results in a compiler warning about an incompatible integer-to-pointer conversion. Is that intentionally ignored to create a test case? Passing the address of k (pass &k to fun()) fixes the code, and there is no core dump.

Contributor Author

Thanks for the comment. Yes, this is intentionally ignored to create a test case. I am using an "int" to "int*" conversion to mimic the behavior of setting a bad pointer value. In real cases the value is usually "nullptr", but I am using different magic values (like 1, 2, 3...) to make it easier to track the flow of the bad value.

In this particular test, I am testing the case where the pointer ptr is passed as a stack parameter and set to the bad value, which originated from val, which is also a parameter forwarded through the stack.

@niktesic niktesic changed the title [TA] Analyze out-of-BT func with tainted parameters [WIP][TA] Analyze out-of-BT func with tainted parameters Apr 27, 2023
Successfully merging this pull request may close these issues.

Taint Analysis of functions out of the backtrace
3 participants