Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instruction description mismatch #360

Open
AFOliveira opened this issue Dec 18, 2024 · 4 comments
Open

Instruction description mismatch #360

AFOliveira opened this issue Dec 18, 2024 · 4 comments
Assignees
Labels
enhancement New feature or request

Comments

@AFOliveira
Copy link
Collaborator

Is your feature request related to a problem? Please describe.
While working on #356, I encountered the following issue with the FENCE instruction.

riscv-opcodes describes FENCE as fence fm pred succ rs1 14..12=0 rd 6..2=0x03 1..0=3
LLVM describes FENCE as 0000pred[3]pred[2]pred[1]pred[0]succ[3]succ[2]succ[1]succ[0]00000000000000001111

The difference here is that LLVM hardcodes FM as 0000. I believe the reason this happens is because the ISA states the following:

image

The problem is that FENCE.TSO is a pseudo instruction of FENCE, therefore I'm not sure we can use LLVM definition here. Is this a difference on how compilers and hardware see the same instrucion?

Describe the solution you'd like
Would it be beneficialif we define a field that differentiates the instructions from software and hardware POV?

@AFOliveira AFOliveira added the enhancement New feature or request label Dec 18, 2024
@lenary
Copy link
Collaborator

lenary commented Dec 18, 2024

As far as I can see, LLVM model the three different fences in RISC-V separately:

  • fence $pred, $succ is the Normal fence (fm=0000) - the def called FENCE
  • fence.tso is the TSO fence (fm=1000, pred=0011, succ=0011) - the def called FENCE_TSO
  • fence.i is the instruction fence - the def called FENCE_I.

These latter two are not "pseudo instructions" - in LLVM, that term is used to mean something without an encoding. The three above all have well-defined encodings.

I think this split definition on the LLVM side was likely done for two reasons:

  • The fences have different operands. Note that fence.tso's assembly is not described in the unprivileged spec, but assemblers expect it to be fence.tso, whereas fence requires operands (and omitting them is an error). This is even more important for register operands (and modelling dataflow) than for immediate operands, but the same still applies.
  • In llvm's assembly parsing, it's significantly easier not to have to work hard to split the mnemonic up into "bits of the mnemonic that identify the instruction" vs "bits that are effectively an operand" (to carry on the example, in fence.tso, the fence identifies the instruction, and the .tso is effectively an operand).
  • In llvm's operand parsing, it almost always needs an operand to turn into the bits that are in the instruction. fence rw, rw has no "operand" to turn into the fm bits.

I'll also point out that the riscv-unified-db (main branch, maybe i'm looking at the wrong one) doesn't really know how to assemble fence anyway - it's marked as TODO.

More generally, looking through the other pseudoinstruction definitions in this repo, I'll note that LLVM takes a few different approaches to implementing them:

  • "zext.w" is an alias: def : InstAlias<"zext.w $rd, $rs", (ADD_UW GPR:$rd, GPR:$rs, X0)>; - this will be the most common approach, I expect.
  • fence is described above. While fence.tso is described in a separate instruction, pause is indeed an alias of fence.
  • prefetch.* is modelled as different instructions, because they have different dataflow than ori - they are marked mayLoad and mayStore, so they aren't moved relative to loads and stores by the code generator. This kind of movement is ok for

It's not very easy to balance all the reasons why LLVM may or may not split up an instruction compared to why this specification may or may not split up an instruction. I'm also 100% sure GCC/binutils will have made different choices to LLVM in some of these places. You may just need to cope with there not being a 1:1 mapping in some places.

I'll also note some issues with the pseudoinstructions: descriptions:

  • for fmv.s, fabs.s, fneg.s (and for their *.q counterparts), there are no operands specified, whereas for zext.w there are. Given the aim of these pseudoinstructions is usually to modify the operands given to the instructions, I don't quite understand why they don't specify the operands for the pseudos explicitly.
  • the prefetch.* pseudoinstructions are e.g. prefetch.i offset, but I think they're expected to be prefetch.i offset(rs1)

@dhower-qc
Copy link
Collaborator

The curse of the pseudo instructions again ;)

To amplify @lenary's comments, I think there are two kinds of pseudo instructions:

  1. Those that are simply an alias, and do not change behavior (e.g., fmv.s)
  2. Those that reuse an encoding, but result in different behavior (e.g., prefetch*, fence.tso)

I suspect we won't find issues with type 1.

It's worth somehow distinguishing type 2. For both HW and SW, there are valid reasons to either treat them as separate instructions or as an alias on a case-by-case basis. UDB's goal should be to allow tools either choice.

There are a few ways we can deal with type 2:

  1. Handle all behaviors in the operation() of the base instruction, and mark the special cases as pseudo instructions, perhaps with some metadata to indicate they are "different". (current approach)
  2. Leave behaviors of type 2 out of the operation() of the base instruction. Add new (pseudo) instructions with their own operation(). Tools will need to know that these type 2 pseudo instructions have priority in decoding to get the correct information.

I'm leaning towards option 1, but this could be a good topic for the meeting today.

@AFOliveira
Copy link
Collaborator Author

I found that binutils also doesn't specify fence.tso as an alias (see https://github.com/bminor/binutils-gdb/blob/4a95bf414d90b565723618b55c52fadfba86a9c6/opcodes/riscv-opc.c#L575).

Moreover, this commit (riscv/riscv-isa-manual@3943720)
clearly states that fence.tso is not a pseudoinstruction. I'll bring that up in the ISA Manual just in case, but I think we've just reduced the list of the following.

Those that reuse an encoding, but result in different behavior (e.g., prefetch*, fence.tso)

@AFOliveira
Copy link
Collaborator Author

Confirms, fence.tso is not a pseudo, so we should probably update it to not be one.
riscv/riscv-isa-manual#1782

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants