RISCV PLT call causes subsequent instructions to be lost. #1606

Open
matt-j-griffin opened this issue Jun 11, 2024 · 4 comments

@matt-j-griffin
Contributor

I've been using BAP to analyze a RISC-V build of cURL (libcurl.4.4.0).

Calling llvm-objdump on the binary results in this dump.

Generating BIL for the same binary using bap libcurl.4.4.0 -dbil.adt produces this file.

In the BIL output, once a jal instruction appears in a subroutine, all subsequent instructions in that subroutine are lost. In these cases, jal is used to call PLT stubs in the binary.

An example can be found in the curl_easy_getinfo subroutine given below:

000000000001be2c <curl_easy_getinfo>:
   1be2c: 5d 71        	addi	sp, sp, -0x50
   1be2e: 3e fc        	sd	a5, 0x38(sp)
   1be30: 1c 10        	addi	a5, sp, 0x20
   1be32: 06 ec        	sd	ra, 0x18(sp)
   1be34: 32 f0        	sd	a2, 0x20(sp)
   1be36: 36 f4        	sd	a3, 0x28(sp)
   1be38: 3a f8        	sd	a4, 0x30(sp)
   1be3a: c2 e0        	sd	a6, 0x40(sp)
   1be3c: c6 e4        	sd	a7, 0x48(sp)
   1be3e: 3e e4        	sd	a5, 0x8(sp)
   1be40: ef e0 ef ab  	jal	0x1a0fe <Curl_getinfo>
   1be44: e2 60        	ld	ra, 0x18(sp)
   1be46: 61 61        	addi	sp, sp, 0x50
   1be48: 82 80        	ret

The BIL for this subroutine is as follows:

1be2c: <curl_easy_getinfo>
1be2c:
1be2c: addi sp, sp, -0x50
(Move(Var("X2",Imm(64)),PLUS(Var("X2",Imm(64)),Int(18446744073709551536,64))))
1be2e: sd a5, 0x38(sp)
(Move(Var("mem",Mem(64,8)),Store(Var("mem",Mem(64,8)),PLUS(Var("X2",Imm(64)),Int(56,64)),Var("X15",Imm(64)),LittleEndian(),64)))
1be30: addi a5, sp, 0x20
(Move(Var("X15",Imm(64)),PLUS(Var("X2",Imm(64)),Int(32,64))))
1be32: sd ra, 0x18(sp)
(Move(Var("mem",Mem(64,8)),Store(Var("mem",Mem(64,8)),PLUS(Var("X2",Imm(64)),Int(24,64)),Var("X1",Imm(64)),LittleEndian(),64)))
1be34: sd a2, 0x20(sp)
(Move(Var("mem",Mem(64,8)),Store(Var("mem",Mem(64,8)),PLUS(Var("X2",Imm(64)),Int(32,64)),Var("X12",Imm(64)),LittleEndian(),64)))
1be36: sd a3, 0x28(sp)
(Move(Var("mem",Mem(64,8)),Store(Var("mem",Mem(64,8)),PLUS(Var("X2",Imm(64)),Int(40,64)),Var("X13",Imm(64)),LittleEndian(),64)))
1be38: sd a4, 0x30(sp)
(Move(Var("mem",Mem(64,8)),Store(Var("mem",Mem(64,8)),PLUS(Var("X2",Imm(64)),Int(48,64)),Var("X14",Imm(64)),LittleEndian(),64)))
1be3a: sd a6, 0x40(sp)
(Move(Var("mem",Mem(64,8)),Store(Var("mem",Mem(64,8)),PLUS(Var("X2",Imm(64)),Int(64,64)),Var("X16",Imm(64)),LittleEndian(),64)))
1be3c: sd a7, 0x48(sp)
(Move(Var("mem",Mem(64,8)),Store(Var("mem",Mem(64,8)),PLUS(Var("X2",Imm(64)),Int(72,64)),Var("X17",Imm(64)),LittleEndian(),64)))
1be3e: sd a5, 0x8(sp)
(Move(Var("mem",Mem(64,8)),Store(Var("mem",Mem(64,8)),PLUS(Var("X2",Imm(64)),Int(8,64)),Var("X15",Imm(64)),LittleEndian(),64)))
1be40: jal -0x1d42
(Move(Var("X1",Imm(64)),Int(114244,64)), Jmp(Int(106750,64)))

Instructions at 1be44, 1be46 and 1be48 do not appear in the BIL output.
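
For anyone cross-checking the BIL dump against the objdump listing: BIL prints every constant as an unsigned decimal, so negative immediates and addresses are hard to recognize at a glance. Below is a minimal sketch of a conversion helper. The name to_signed is made up for illustration and is not part of BAP.

```python
def to_signed(value: int, width: int = 64) -> int:
    """Interpret an unsigned BIL constant Int(value, width) as two's complement.

    Hypothetical helper for reading the dump; not a BAP API.
    """
    return value - (1 << width) if value >= 1 << (width - 1) else value


# addi sp, sp, -0x50 shows up as PLUS(X2, Int(18446744073709551536,64))
print(hex(to_signed(18446744073709551536)))  # -0x50

# jal at 0x1be40: X1 := Int(114244,64), Jmp(Int(106750,64))
print(hex(114244), hex(106750))  # 0x1be44 0x1a0fe
```

So the last statement in the dump sets ra to 0x1be44 (the instruction right after the jal) and jumps to 0x1a0fe (Curl_getinfo), which matches the objdump output. The lifted semantics look right; only the fall-through instructions are missing.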

Is there a workaround?

@bmourad01
Contributor

Playing with bap mc, I can see that we get the following output:

$ bap mc --show-knowledge --addr 0x1be40 --target riscv64 -- ef e0 ef ab 
(in-package user)
(in-class core:program)
(bap:start-pseudo-node
  ((core:label-aliases (start-pseudo-node))
   (core:label-name (start-pseudo-node))))
(bap:exit-pseudo-node
  ((core:label-aliases (exit-pseudo-node))
   (core:label-name (exit-pseudo-node))))
(<0xa>
  ((bap:lisp-args
    ((((lisp-symbol (X1)) (bap:exp X1))
      ((bap:static-value (0xffffffffffffe2be)) (bap:exp 0xFFFFFFFFFFFFE2BE)))))
   (bap:lisp-name (llvm-riscv64:JAL))
   (primus:attributes ((core:context (context (target riscv)))))
   (bap:insn ((JAL X1 -0x1d42)))
   (bap:mem ("1be40: ef e0 ef ab"))
   (core:semantics
    ((bap:ir-graph "0000000d:
                    0000000e: X1 := 0x1BE44
                    00000011: goto %0000000f")
     (bap:insn-dests ((15)))
     (bap:insn-ops ((X1 -7490)))
     (bap:insn-asm "jal -7490")
     (bap:insn-opcode JAL)
     (bap:insn-properties
      ((:invalid false)
       (:jump true)
       (:cond false)
       (:indirect false)
       (:call false)
       (:return false)
       (:barrier true)
       (:affect-control-flow true)
       (:load false)
       (:store false)))
     (bap:bir (%0000000d))
     (bap:bil "{
                 X1 := 0x1BE44
                 jmp 0x1A0FE
               }")
     (core:insn-code ("ef e0 ef ab"))))
   (core:label-addr (0x1be40))
   (core:label-unit (3))
   (core:encoding bap:llvm-riscv64)))
(0x1a0fe ((core:label-addr (0x1a0fe)) (core:label-unit (3))))
(in-class bap:toplevel)
(bap:main
  ((bap:insn13 <opaque>)
   (bap:target-and-encoding12 <opaque>)
   (bap:last2 <opaque>)))
(in-class core:theory)
(core-internal:'(bap\:bir bap\:jump-dests bap\:bil-fp-emu)
  ((core:instance
    ((bap:bir bap:bil core:empty bap:jump-dests bap:bil-fp-emu)))))
(core-internal:'(bap\:bil-fp-emu)
  ((core:instance
    ((bap:bil core:empty bap:bil-fp-emu)
     "semantics in BIL, including FP emulation"))))
(core-internal:'(bap\:jump-dests)
  ((core:instance
    ((core:empty bap:jump-dests) "an approximation of jump destinations"))))
(core-internal:'(bap\:bir)
  ((core:instance
    ((bap:bir core:empty) "Builds the graphical representation of a program."))))
(in-class core:unit)
(unit
  ((bap:primus-lisp-context
    (context (patterns enabled) (x86-floating-points intrinsic-semantics)))
   (core:unit-source
    ((bap:typed-program (<opaque>))
     (bap:primus-lisp-program <opaque>)
     (core:source-language bap:primus-lisp)))
   (core:unit-target bap:riscv64)))

Specifically, we have (:call false) in bap:insn-properties. My hypothesis is that the edge from 1be40 to 1be44 is not present in the intraprocedural CFG for curl_easy_getinfo, since BAP doesn't recognize that this is a function call. Can you verify that this address is part of the whole-program disassembly?
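
As a sanity check on the bap mc output above, the four bytes can be decoded by hand using the standard RISC-V J-type layout. This is a small standalone sketch; decode_jal is a name invented here and has nothing to do with BAP's or LLVM's decoders.

```python
def decode_jal(code: bytes, pc: int):
    """Decode a 32-bit RISC-V JAL and return (rd, offset, target, link).

    Illustrative helper only; the field layout follows the J-type format.
    """
    word = int.from_bytes(code, "little")
    if word & 0x7F != 0x6F:
        raise ValueError("not a JAL instruction")
    rd = (word >> 7) & 0x1F
    # The immediate is scattered: imm[20|10:1|11|19:12] sits in bits 31..12.
    imm = (
        (((word >> 31) & 0x1) << 20)
        | (((word >> 12) & 0xFF) << 12)
        | (((word >> 20) & 0x1) << 11)
        | (((word >> 21) & 0x3FF) << 1)
    )
    if imm & (1 << 20):  # sign-extend the 21-bit offset
        imm -= 1 << 21
    return rd, imm, pc + imm, pc + 4


rd, off, target, link = decode_jal(bytes.fromhex("efe0efab"), 0x1BE40)
print(rd, hex(off), hex(target), hex(link))  # 1 -0x1d42 0x1a0fe 0x1be44
```

The decoded values agree with the knowledge base dump: rd is X1, the offset is -0x1d42 (-7490), the destination is 0x1a0fe, and the link address 0x1be44 is exactly the fall-through instruction that goes missing.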

@matt-j-griffin
Contributor Author

Looking at the files I attached, Curl_getinfo is present; however, it is also cut short by a jal instruction:

1a0fe: <Curl_getinfo>
1a0fe:
1a0fe: addi sp, sp, -0x60
(Move(Var("X2",Imm(64)),PLUS(Var("X2",Imm(64)),Int(18446744073709551520,64))))
1a100: sd ra, 0x28(sp)
(Move(Var("mem",Mem(64,8)),Store(Var("mem",Mem(64,8)),PLUS(Var("X2",Imm(64)),Int(40,64)),Var("X1",Imm(64)),LittleEndian(),64)))
1a102: sd a2, 0x30(sp)
(Move(Var("mem",Mem(64,8)),Store(Var("mem",Mem(64,8)),PLUS(Var("X2",Imm(64)),Int(48,64)),Var("X12",Imm(64)),LittleEndian(),64)))
1a104: sd a3, 0x38(sp)
(Move(Var("mem",Mem(64,8)),Store(Var("mem",Mem(64,8)),PLUS(Var("X2",Imm(64)),Int(56,64)),Var("X13",Imm(64)),LittleEndian(),64)))
1a106: sd a4, 0x40(sp)
(Move(Var("mem",Mem(64,8)),Store(Var("mem",Mem(64,8)),PLUS(Var("X2",Imm(64)),Int(64,64)),Var("X14",Imm(64)),LittleEndian(),64)))
1a108: sd a5, 0x48(sp)
(Move(Var("mem",Mem(64,8)),Store(Var("mem",Mem(64,8)),PLUS(Var("X2",Imm(64)),Int(72,64)),Var("X15",Imm(64)),LittleEndian(),64)))
1a10a: sd a6, 0x50(sp)
(Move(Var("mem",Mem(64,8)),Store(Var("mem",Mem(64,8)),PLUS(Var("X2",Imm(64)),Int(80,64)),Var("X16",Imm(64)),LittleEndian(),64)))
1a10c: sd a7, 0x58(sp)
(Move(Var("mem",Mem(64,8)),Store(Var("mem",Mem(64,8)),PLUS(Var("X2",Imm(64)),Int(88,64)),Var("X17",Imm(64)),LittleEndian(),64)))
1a10e: beqz a0, 0x8e
(Move(Var("#4588",Imm(1)),EQ(Var("X10",Imm(64)),Int(0,64))), If(Var("#4588",Imm(1)), (Jmp(Int(106908,64))), ()))
1a110:
1a110: addi a5, sp, 0x30
(Move(Var("X15",Imm(64)),PLUS(Var("X2",Imm(64)),Int(48,64))))
1a112: sd a5, 0x18(sp)
(Move(Var("mem",Mem(64,8)),Store(Var("mem",Mem(64,8)),PLUS(Var("X2",Imm(64)),Int(24,64)),Var("X15",Imm(64)),LittleEndian(),64)))
1a114: lui a5, 0xf00
(Move(Var("X15",Imm(64)),Int(15728640,64)))
1a118: sd s0, 0x20(sp)
(Move(Var("mem",Mem(64,8)),Store(Var("mem",Mem(64,8)),PLUS(Var("X2",Imm(64)),Int(32,64)),Var("X8",Imm(64)),LittleEndian(),64)))
1a11a: and a5, a5, a1
(Move(Var("X15",Imm(64)),AND(Var("X15",Imm(64)),Var("X11",Imm(64)))))
1a11c: lui a4, 0x300
(Move(Var("X14",Imm(64)),Int(3145728,64)))
1a120: mv s0, a0
(Move(Var("X8",Imm(64)),Var("X10",Imm(64))))
1a122: beq a5, a4, 0x84
(Move(Var("#4651",Imm(1)),EQ(Var("X15",Imm(64)),Var("X14",Imm(64)))), If(Var("#4651",Imm(1)), (Jmp(Int(106918,64))), ()))
1a126:
1a126: bgeu a4, a5, 0x34
(Move(Var("#4653",Imm(1)),LE(Var("X15",Imm(64)),Var("X14",Imm(64)))), If(Var("#4653",Imm(1)), (Jmp(Int(106842,64))), ()))
1a12a:
1a12a: lui a4, 0x400
(Move(Var("X14",Imm(64)),Int(4194304,64)))
1a12e: beq a5, a4, 0xda
(Move(Var("#4655",Imm(1)),EQ(Var("X15",Imm(64)),Var("X14",Imm(64)))), If(Var("#4655",Imm(1)), (Jmp(Int(107016,64))), ()))
1a132:
1a132: lui a4, 0x500
(Move(Var("X14",Imm(64)),Int(5242880,64)))
1a136: bne a5, a4, 0x18
(Move(Var("#4657",Imm(1)),NEQ(Var("X15",Imm(64)),Var("X14",Imm(64)))), If(Var("#4657",Imm(1)), (Jmp(Int(106830,64))), ()))
1a13a:
1a13a: addi a5, sp, 0x38
(Move(Var("X15",Imm(64)),PLUS(Var("X2",Imm(64)),Int(56,64))))
1a13c: sd a5, 0x18(sp)
(Move(Var("mem",Mem(64,8)),Store(Var("mem",Mem(64,8)),PLUS(Var("X2",Imm(64)),Int(24,64)),Var("X15",Imm(64)),LittleEndian(),64)))
1a13e: mv s0, a2
(Move(Var("X8",Imm(64)),Var("X12",Imm(64))))
1a140: beqz a2, 0xe
(Move(Var("#4659",Imm(1)),EQ(Var("X12",Imm(64)),Int(0,64))), If(Var("#4659",Imm(1)), (Jmp(Int(106830,64))), ()))
1a142:
1a142: lui a5, 0x500
(Move(Var("X15",Imm(64)),Int(5242880,64)))
1a146: addi a5, a5, 0x2c
(Move(Var("X15",Imm(64)),PLUS(Var("X15",Imm(64)),Int(44,64))))
1a14a: beq a1, a5, 0xee
(Move(Var("#4661",Imm(1)),EQ(Var("X11",Imm(64)),Var("X15",Imm(64)))), If(Var("#4661",Imm(1)), (Jmp(Int(107064,64))), ()))
1a14e:
1a14e: ld s0, 0x20(sp)
(Move(Var("X8",Imm(64)),Load(Var("mem",Mem(64,8)),PLUS(Var("X2",Imm(64)),Int(32,64)),LittleEndian(),64)))
1a150: li a0, 0x30
(Move(Var("X10",Imm(64)),Int(48,64)))
1a154:
1a154: ld ra, 0x28(sp)
(Move(Var("X1",Imm(64)),Load(Var("mem",Mem(64,8)),PLUS(Var("X2",Imm(64)),Int(40,64)),LittleEndian(),64)))
1a156: addi sp, sp, 0x60
(Move(Var("X2",Imm(64)),PLUS(Var("X2",Imm(64)),Int(96,64))))
1a158: ret
(Jmp(Var("X1",Imm(64))))
1a15a:
1a15a: lui a4, 0x100
(Move(Var("X14",Imm(64)),Int(1048576,64)))
1a15e: beq a5, a4, 0x78
(Move(Var("#4592",Imm(1)),EQ(Var("X15",Imm(64)),Var("X14",Imm(64)))), If(Var("#4592",Imm(1)), (Jmp(Int(106966,64))), ()))
1a162:
1a162: lui a4, 0x200
(Move(Var("X14",Imm(64)),Int(2097152,64)))
1a166: bne a5, a4, -0x18
(Move(Var("#4643",Imm(1)),NEQ(Var("X15",Imm(64)),Var("X14",Imm(64)))), If(Var("#4643",Imm(1)), (Jmp(Int(106830,64))), ()))
1a16a:
1a16a: addi a5, sp, 0x38
(Move(Var("X15",Imm(64)),PLUS(Var("X2",Imm(64)),Int(56,64))))
1a16c: sd a5, 0x18(sp)
(Move(Var("mem",Mem(64,8)),Store(Var("mem",Mem(64,8)),PLUS(Var("X2",Imm(64)),Int(24,64)),Var("X15",Imm(64)),LittleEndian(),64)))
1a16e: beqz a2, -0x20
(Move(Var("#4645",Imm(1)),EQ(Var("X12",Imm(64)),Int(0,64))), If(Var("#4645",Imm(1)), (Jmp(Int(106830,64))), ()))
1a170:
1a170: lui a5, 0xffe00
(Move(Var("X15",Imm(64)),Int(4292870144,64)))
1a174: addiw a5, a5, -0x2
(Move(Var("X15",Imm(64)),SIGNED(64,LOW(32,PLUS(Var("X15",Imm(64)),Int(18446744073709551614,64))))))
1a176: addw a5, a5, a1
(Move(Var("X15",Imm(64)),SIGNED(64,LOW(32,PLUS(Var("X15",Imm(64)),Var("X11",Imm(64)))))))
1a178: sext.w a3, a5
(Move(Var("X13",Imm(64)),SIGNED(64,LOW(32,Var("X15",Imm(64))))))
1a17c: li a4, 0x2c
(Move(Var("X14",Imm(64)),Int(44,64)))
1a180: bltu a4, a3, -0x32
(Move(Var("#4647",Imm(1)),LT(Var("X14",Imm(64)),Var("X13",Imm(64)))), If(Var("#4647",Imm(1)), (Jmp(Int(106830,64))), ()))
1a184:
1a184: slli a3, a5, 0x20
(Move(Var("X13",Imm(64)),LSHIFT(Var("X15",Imm(64)),Int(32,64))))
1a188: auipc a4, 0x1f
(Move(Var("X14",Imm(64)),Int(106919,64)))
1a18c: addi a4, a4, 0x724
(Move(Var("X14",Imm(64)),PLUS(Var("X14",Imm(64)),Int(1828,64))))
1a190: srli a5, a3, 0x1e
(Move(Var("X15",Imm(64)),RSHIFT(Var("X13",Imm(64)),Int(30,64))))
1a194: add a5, a5, a4
(Move(Var("X15",Imm(64)),PLUS(Var("X15",Imm(64)),Var("X14",Imm(64)))))
1a196: lw a5, 0x0(a5)
(Move(Var("X15",Imm(64)),SIGNED(64,Load(Var("mem",Mem(64,8)),Var("X15",Imm(64)),LittleEndian(),32))))
1a198: add a5, a5, a4
(Move(Var("X15",Imm(64)),PLUS(Var("X15",Imm(64)),Var("X14",Imm(64)))))
1a19a: jr a5
(Jmp(Var("X15",Imm(64))))
1a19c:
1a19c: ld ra, 0x28(sp)
(Move(Var("X1",Imm(64)),Load(Var("mem",Mem(64,8)),PLUS(Var("X2",Imm(64)),Int(40,64)),LittleEndian(),64)))
1a19e: li a0, 0x30
(Move(Var("X10",Imm(64)),Int(48,64)))
1a1a2: addi sp, sp, 0x60
(Move(Var("X2",Imm(64)),PLUS(Var("X2",Imm(64)),Int(96,64))))
1a1a4: ret
(Jmp(Var("X1",Imm(64))))
1a1a6:
1a1a6: addi a5, sp, 0x38
(Move(Var("X15",Imm(64)),PLUS(Var("X2",Imm(64)),Int(56,64))))
1a1a8: sd a5, 0x18(sp)
(Move(Var("mem",Mem(64,8)),Store(Var("mem",Mem(64,8)),PLUS(Var("X2",Imm(64)),Int(24,64)),Var("X15",Imm(64)),LittleEndian(),64)))
1a1aa: beqz a2, -0x5c
(Move(Var("#4590",Imm(1)),EQ(Var("X12",Imm(64)),Int(0,64))), If(Var("#4590",Imm(1)), (Jmp(Int(106830,64))), ()))
1a1ac:
1a1ac: lui a5, 0xffd00
(Move(Var("X15",Imm(64)),Int(4291821568,64)))
1a1b0: addiw a5, a5, -0x3
(Move(Var("X15",Imm(64)),SIGNED(64,LOW(32,PLUS(Var("X15",Imm(64)),Int(18446744073709551613,64))))))
1a1b2: addw a5, a5, a1
(Move(Var("X15",Imm(64)),SIGNED(64,LOW(32,PLUS(Var("X15",Imm(64)),Var("X11",Imm(64)))))))
1a1b4: sext.w a3, a5
(Move(Var("X13",Imm(64)),SIGNED(64,LOW(32,Var("X15",Imm(64))))))
1a1b8: li a4, 0x1e
(Move(Var("X14",Imm(64)),Int(30,64)))
1a1ba: bltu a4, a3, -0x6c
(Move(Var("#4649",Imm(1)),LT(Var("X14",Imm(64)),Var("X13",Imm(64)))), If(Var("#4649",Imm(1)), (Jmp(Int(106830,64))), ()))
1a1be:
1a1be: slli a3, a5, 0x20
(Move(Var("X13",Imm(64)),LSHIFT(Var("X15",Imm(64)),Int(32,64))))
1a1c2: auipc a4, 0x1f
(Move(Var("X14",Imm(64)),Int(106977,64)))
1a1c6: addi a4, a4, 0x79e
(Move(Var("X14",Imm(64)),PLUS(Var("X14",Imm(64)),Int(1950,64))))
1a1ca: srli a5, a3, 0x1e
(Move(Var("X15",Imm(64)),RSHIFT(Var("X13",Imm(64)),Int(30,64))))
1a1ce: add a5, a5, a4
(Move(Var("X15",Imm(64)),PLUS(Var("X15",Imm(64)),Var("X14",Imm(64)))))
1a1d0: lw a5, 0x0(a5)
(Move(Var("X15",Imm(64)),SIGNED(64,Load(Var("mem",Mem(64,8)),Var("X15",Imm(64)),LittleEndian(),32))))
1a1d2: add a5, a5, a4
(Move(Var("X15",Imm(64)),PLUS(Var("X15",Imm(64)),Var("X14",Imm(64)))))
1a1d4: jr a5
(Jmp(Var("X15",Imm(64))))
1a1d6:
1a1d6: addi a5, sp, 0x38
(Move(Var("X15",Imm(64)),PLUS(Var("X2",Imm(64)),Int(56,64))))
1a1d8: sd a5, 0x18(sp)
(Move(Var("mem",Mem(64,8)),Store(Var("mem",Mem(64,8)),PLUS(Var("X2",Imm(64)),Int(24,64)),Var("X15",Imm(64)),LittleEndian(),64)))
1a1da: beqz a2, -0x8c
(Move(Var("#4594",Imm(1)),EQ(Var("X12",Imm(64)),Int(0,64))), If(Var("#4594",Imm(1)), (Jmp(Int(106830,64))), ()))
1a1dc:
1a1dc: lui a5, 0xfff00
(Move(Var("X15",Imm(64)),Int(4293918720,64)))
1a1e0: addiw a5, a5, -0x1
(Move(Var("X15",Imm(64)),SIGNED(64,LOW(32,PLUS(Var("X15",Imm(64)),Int(18446744073709551615,64))))))
1a1e2: addw a5, a5, a1
(Move(Var("X15",Imm(64)),SIGNED(64,LOW(32,PLUS(Var("X15",Imm(64)),Var("X11",Imm(64)))))))
1a1e4: sext.w a3, a5
(Move(Var("X13",Imm(64)),SIGNED(64,LOW(32,Var("X15",Imm(64))))))
1a1e8: li a4, 0x28
(Move(Var("X14",Imm(64)),Int(40,64)))
1a1ec: bltu a4, a3, -0x9e
(Move(Var("#4596",Imm(1)),LT(Var("X14",Imm(64)),Var("X13",Imm(64)))), If(Var("#4596",Imm(1)), (Jmp(Int(106830,64))), ()))
1a1f0:
1a1f0: slli a3, a5, 0x20
(Move(Var("X13",Imm(64)),LSHIFT(Var("X15",Imm(64)),Int(32,64))))
1a1f4: auipc a4, 0x1f
(Move(Var("X14",Imm(64)),Int(107027,64)))
1a1f8: addi a4, a4, 0x7e8
(Move(Var("X14",Imm(64)),PLUS(Var("X14",Imm(64)),Int(2024,64))))
1a1fc: srli a5, a3, 0x1e
(Move(Var("X15",Imm(64)),RSHIFT(Var("X13",Imm(64)),Int(30,64))))
1a200: add a5, a5, a4
(Move(Var("X15",Imm(64)),PLUS(Var("X15",Imm(64)),Var("X14",Imm(64)))))
1a202: lw a5, 0x0(a5)
(Move(Var("X15",Imm(64)),SIGNED(64,Load(Var("mem",Mem(64,8)),Var("X15",Imm(64)),LittleEndian(),32))))
1a204: add a5, a5, a4
(Move(Var("X15",Imm(64)),PLUS(Var("X15",Imm(64)),Var("X14",Imm(64)))))
1a206: jr a5
(Jmp(Var("X15",Imm(64))))
1a208:
1a208: addi a5, sp, 0x38
(Move(Var("X15",Imm(64)),PLUS(Var("X2",Imm(64)),Int(56,64))))
1a20a: sd a5, 0x18(sp)
(Move(Var("mem",Mem(64,8)),Store(Var("mem",Mem(64,8)),PLUS(Var("X2",Imm(64)),Int(24,64)),Var("X15",Imm(64)),LittleEndian(),64)))
1a20c: beqz a2, -0xbe
(Move(Var("#4598",Imm(1)),EQ(Var("X12",Imm(64)),Int(0,64))), If(Var("#4598",Imm(1)), (Jmp(Int(106830,64))), ()))
1a20e:
1a20e: lui a5, 0xffc00
(Move(Var("X15",Imm(64)),Int(4290772992,64)))
1a212: addiw a5, a5, -0x1b
(Move(Var("X15",Imm(64)),SIGNED(64,LOW(32,PLUS(Var("X15",Imm(64)),Int(18446744073709551589,64))))))
1a214: addw a5, a5, a1
(Move(Var("X15",Imm(64)),SIGNED(64,LOW(32,PLUS(Var("X15",Imm(64)),Var("X11",Imm(64)))))))
1a216: sext.w a3, a5
(Move(Var("X13",Imm(64)),SIGNED(64,LOW(32,Var("X15",Imm(64))))))
1a21a: li a4, 0x12
(Move(Var("X14",Imm(64)),Int(18,64)))
1a21c: bltu a4, a3, -0xce
(Move(Var("#4600",Imm(1)),LT(Var("X14",Imm(64)),Var("X13",Imm(64)))), If(Var("#4600",Imm(1)), (Jmp(Int(106830,64))), ()))
1a220:
1a220: slli a3, a5, 0x20
(Move(Var("X13",Imm(64)),LSHIFT(Var("X15",Imm(64)),Int(32,64))))
1a224: auipc a4, 0x20
(Move(Var("X14",Imm(64)),Int(107076,64)))
1a228: addi a4, a4, -0x7a4
(Move(Var("X14",Imm(64)),PLUS(Var("X14",Imm(64)),Int(18446744073709549660,64))))
1a22c: srli a5, a3, 0x1e
(Move(Var("X15",Imm(64)),RSHIFT(Var("X13",Imm(64)),Int(30,64))))
1a230: add a5, a5, a4
(Move(Var("X15",Imm(64)),PLUS(Var("X15",Imm(64)),Var("X14",Imm(64)))))
1a232: lw a5, 0x0(a5)
(Move(Var("X15",Imm(64)),SIGNED(64,Load(Var("mem",Mem(64,8)),Var("X15",Imm(64)),LittleEndian(),32))))
1a234: add a5, a5, a4
(Move(Var("X15",Imm(64)),PLUS(Var("X15",Imm(64)),Var("X14",Imm(64)))))
1a236: jr a5
(Jmp(Var("X15",Imm(64))))
1a238:
1a238: li a1, 0x0
(Move(Var("X11",Imm(64)),Int(0,64)))
1a23a: jal 0x4a30
(Move(Var("X1",Imm(64)),Int(107070,64)), Jmp(Int(126058,64)))

When I investigated in June/July I noticed that isCall wasn't set for jal in BAP's knowledge base. This led me to LLVM's table definitions for the RISC-V JAL instruction, where isCall seems to be false. However, C_JAL, the compressed version of JAL, seems to have isCall set.

I was able to resolve this issue by forking the LLVM repository and setting isCall to true for JAL. Is this an issue on the LLVM side then?

@bmourad01
Contributor

@matt-j-griffin I'm not aware of it, but it may be useful to check with them. I'd be curious if there was a way to tell BAP's disassembler via the Knowledge Base about this, without having to fork LLVM.

@ivg
Member

ivg commented Jan 17, 2025

There is a way, of course, but in theory BAP should handle this even without an extra hint from the disassembler. The semantics of this instruction are described in the lisp file:

(defun JAL (lr off)
  (let ((pc (get-program-counter)))
    (set$ lr (+ pc 4))
    (exec-addr (+ pc off))))

Currently there is no primitive or attribute that fits here for specifying that the instruction is a call. Ideally, we should be able to write it like this:

(defun JAL (lr off)
  (declare (instruction-properties 'is-call))
  (let ((pc (get-program-counter)))
    (set$ lr (+ pc 4))
    (exec-addr (+ pc off))))

Adding such functionality wouldn't be hard, and it would easily fix this issue.

But the underlying issue is a little deeper. The disassembler driver should be able to handle this on its own, since we treat a jump to a subroutine as a call. Here Curl_getinfo is definitely resolved and recognized as a subroutine, so it looks like a bug in the disassembling driver. I think the culprit is that we use two properties that are not really independent: barrier and call (see bap_disasm_driver.ml:30). When we later rediscover that an instruction is a call, we do not update the barrier property (we treat all calls as non-barriers). So the fix would be to change the is_barrier function (at line 582) to be

let is_barrier dsts = dsts.barrier && not dsts.call

and then change all accesses to dsts.barrier to the is_barrier function.
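
To make the interaction between the two flags concrete, here is a toy model in Python. It is not the driver's code: the Dests class, both predicates, and falls_through are invented for illustration, and the only assumption is the one described above, namely that the driver follows the fall-through edge of an instruction only when it is not considered a barrier.

```python
from dataclasses import dataclass


@dataclass
class Dests:
    """Toy stand-in for the driver's per-instruction destination record."""
    barrier: bool  # initial hint: control never falls through
    call: bool     # set later, once the target is known to be a subroutine


def is_barrier_old(d: Dests) -> bool:
    # Models the suspected current behavior: a stale barrier flag wins.
    return d.barrier


def is_barrier_new(d: Dests) -> bool:
    # Models the proposed fix: a call is never a barrier.
    return d.barrier and not d.call


def falls_through(d: Dests, is_barrier) -> bool:
    """Whether the instruction after this one is kept in the subroutine."""
    return not is_barrier(d)


# jal at 0x1be40: reported with :barrier true, later rediscovered as a call.
jal = Dests(barrier=True, call=True)
print(falls_through(jal, is_barrier_old))  # False: 0x1be44 onwards is dropped
print(falls_through(jal, is_barrier_new))  # True: 0x1be44 onwards is kept
```

An ordinary unconditional jump (barrier set, call unset) is still a barrier under both predicates, so the change only affects instructions that end up classified as calls.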
