IPIP-270: Bitswap 1.3.0 - Tokens (and auth) support #270

Draft
wants to merge 4 commits into
base: main
89 changes: 89 additions & 0 deletions BITSWAP.md
@@ -40,6 +40,7 @@ There are multiple Bitswap versions and more may evolve over time. We give brief
- `/ipfs/bitswap/1.0.0` - Initial version
- `/ipfs/bitswap/1.1.0` - Support CIDv1
- `/ipfs/bitswap/1.2.0` - Support Wantlist Have's and Have/DontHave responses
- `/ipfs/bitswap/1.3.0` - Support adding tokens in Bitswap requests/responses and BlockTooBig error message

## Block Sizes

@@ -192,6 +193,94 @@ message Message {
```protobuf
}
```

## Bitswap 1.3.0

Bitswap 1.3.0 extends the Bitswap 1.2.0 protocol with the following changes:
1. Having a list of tokens that may be sent with the message and referenced within the message
2. Allowing each entry in the wantlist to contain a set of tokens
Contributor Author

From @ianopolous:

> Do you have a concrete use case in mind for allowing multiple auth tokens for a single cid in the same request? Our usage in Peergos is maximally fine grained and doesn't need this. For example, Amazon S3 doesn't need this and they clearly support almost all use cases in the world right now.

A couple things come to mind:

  • Requiring two tokens from different systems: e.g. a Peergos token for authentication and some token related to paying for retrieval of that data
  • "Suggested" ACLs: The server wants to tell the client which token caused me to give them the block (e.g. for multitenancy auth or payment purposes), but I also want to tell the client some general ACL I'd like them to respect

Maybe it's overkill, but we're making it easier for people to do auth (e.g. they could've just used peerID auth and some offline channel to coordinate authentication and payments) and we should probably try to make systems composable in-band. Thoughts?

Member

Separating the auth token from a payment token is an interesting idea. Has anyone done anything like this (is your angle composability with filecoin?)? I would normally imagine you would just pay for access to the content. I.e. pay to get an auth token/key. But I'm just hypothesising here. I guess what you're saying is different because you're not paying for access, you're paying for delivery, which should be independent.

Note that in our multi-tenant case, the client already knows which token was authed to get a block without needing this.

Member

Use cases I have talked to folks about:

  • ACLs without payments
  • ACLs with payments
  • request limiter, e.g. this token allows you to retrieve 100 times

Member

To be clear, one absolutely critical property of any usage of this is that auth tokens MUST be tied to the sender's nodeId and this must be verified before releasing a block. Otherwise you are trivially vulnerable to replay attacks because you broadcast the tokens to the DHT. I think it would be worthwhile to flesh out a few of these use cases to make sure it is possible to do securely. E.g. a naive approach of using this to do an ACL by having tokens that the recipient looks up in a db to check that a token is allowed for a cid is insecure. To make it secure the db would also need to store the nodeId's allowed to use a given token, but then it wouldn't need this extension at all or any auth tokens, just use the nodeid of the requestor in normal bitswap.

Member

The more I think about trying to use this to do ACLs (as opposed to capabilities) the more I think it is impossible to do securely because of the broadcast nature of bitswap, except in the trivial case where the ACL only depends on the nodeId. In that case you don't need this PR at all and you should just use a custom allow function as suggested here and as we've implemented here.

Fundamentally, it comes down to ACLs depending on identity and the only identity between ipfs nodes is the node id. So the ACL will only be able to depend on the node id, in which case why not just use the node id which is already authenticated and unfakeable without this PR.

There is one way around this though, and that is single-use (not N>1) tokens. If a token is enforced to be only used once, then clearly a replay attack can't occur. That sounds like a massive footgun to me and also a scalability and centralisation problem as well.

Member

Scratch my last statement about single use tokens, because whilst true, there is also a race condition between the originator and anyone else in the DHT they send the token to. So yes that is a footgun.

Contributor Author

> To be clear, one absolutely critical property of any usage of this is that auth tokens MUST be tied to the sender's nodeId and this must be verified before releasing a block. Otherwise you are trivially vulnerable to replay attacks because you broadcast the tokens to the DHT.

Bitswap is a data transfer (and signaling) protocol that's independent of not just the IPFS Public DHT but any other content routing mechanism. If an application chooses to leverage secret tokens there's nothing to be done other than to make sure the tokens are only sent to the correct peer(s) since as you pointed out broadcasting a secret is bad news no matter how you do it.

For example, go-ipfs having a command like ipfs get bafyfoobar --token=<some-secret-token> would be a footgun that IMO shouldn't be implemented. On the other hand, there appear to be a variety of folks who have asked in ipfs/kubo#7871 and related issues to be able to do the equivalent of sending HTTP request headers along with a request. To take advantage of that protocol feature they'd have to write their own clients + servers though.

The nice thing is that despite them having to write their own clients and servers for their custom token authenticated requests: 1) they can do non-token authenticated requests with everyone else 2) Bitswap client/server implementers can choose to support whichever schemes start to take off without requiring a spec change and convincing people the next version of Bitswap should support their scheme

Member

Bitswap doesn't have a mechanism to only send wants to "the correct" peers. From a security perspective Wants are broadcast to the world. (If I want to receive an auth token for a given cid then I just need to pretend to the DHT that I have that cid and then the requests will come in with valid tokens). This is super important, and will trivially break security for most things people will try and do with it.

It is possible to do this securely under the broadcast assumption, which is what we've figured out in Peergos. A simplified analogy: suppose a secret S allows access to cid C. If node A is requesting C, then the token sent with bitswap could be some form of signature, using S, of (A, C). The recipient can then verify that the signature is valid and that the request came from A. There is then no possibility of replay attacks, because the signature is tied to the sender's node id.

So I highly recommend speccing out some of these proposed usages and checking they are not totally broken security-wise.
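
A minimal sketch of the "signature using S of (A, C)" idea, for illustration only. It uses HMAC-SHA256 as a stand-in for "some form of signature" and is not Peergos's actual scheme. The function names, the length-prefixed message layout, and the assumption that the recipient learns the requester's peer ID from the authenticated transport are all hypothetical.

```python
import hashlib
import hmac


def make_token(secret: bytes, peer_id: bytes, cid: bytes) -> bytes:
    """Return an HMAC over (requester peer ID, CID), keyed with the secret S."""
    # Length-prefix each field so two different (A, C) pairs can never
    # produce the same concatenated message.
    msg = (
        len(peer_id).to_bytes(4, "big") + peer_id
        + len(cid).to_bytes(4, "big") + cid
    )
    return hmac.new(secret, msg, hashlib.sha256).digest()


def verify_token(secret: bytes, authenticated_peer_id: bytes, cid: bytes, token: bytes) -> bool:
    """Check the token against the peer ID the request actually came from."""
    expected = make_token(secret, authenticated_peer_id, cid)
    return hmac.compare_digest(expected, token)


# Hypothetical values: S grants access to C, and node A requests it.
S = b"\x01" * 32
token = make_token(S, b"peer-A", b"cid-C")

verify_token(S, b"peer-A", b"cid-C", token)  # True: the request came from A
verify_token(S, b"peer-B", b"cid-C", token)  # False: B replays A's token
```

Because the token only verifies for the peer ID it was computed over, a node that merely observes the token cannot reuse it from its own connection.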

Contributor Author

> Bitswap doesn't have a mechanism to only send wants to "the correct" peers

I don't understand this. Bitswap is a message sending protocol between two peers.

> From a security perspective Wants are broadcast to the world.

This is only the case in go-bitswap, the decision as to which peers to send messages to is up to the implementation. Is there somewhere in the spec that suggests that wants are globally broadcast?

Security aside implementations may not want to broadcast to everyone simply as a way of cutting down on spamming and bandwidth usage. For example, even in the non-auth case I could see an implementation making the argument to only ask peers it's connected to over a LAN or where they were recommended by the content routing system.

> So I highly recommend speccing out some of these proposed usages and checking they are not totally broken security-wise.

A couple that are more suitable for open networks have been described (e.g. bats and payments), but doing simple bearer token style auth seems very easy as well. It just means that writing some CLI command like fetch-cid <cid> <bearer-token> is meaningless and instead you'd need to specify fetch-cid <cid> <multiaddr> [bearer-token] or fetch-cid <cid> [tuple of multiaddr and bearer token].

I'll try and reach out to some of the folks that have requested this functionality in the past for comments since AFAIK none of the people who have commented on this PR are planning on implementing support for something closer to a bearer-token style which wouldn't be appropriate for broadcast.

However, I'm confused as to what this has to do with the spec proposal. Even bats + payments require a spec change that looks like this though, right?

Member

@ianopolous, May 17, 2022

> This is only the case in go-bitswap

To my knowledge, go-bitswap is the main implementation of bitswap, which is used by go-ipfs, the main implementation of ipfs. Are you suggesting that existing ipfs apis wouldn't use this at all? If only new apis like you suggest will use this then that's much easier to avoid footguns.

3. Allowing responses (both BlockPresences and Blocks) to contain a set of tokens
4. Adding a `TokenErr` BlockPresence
5. Adding a `BlockTooBig` BlockPresence

### Interaction Pattern

Given that a client C wants to fetch data from some server S:

1. C opens a stream `s_want` to S and sends a message for the blocks it wants
    1. C may either send a complete wantlist, or an update to an outstanding wantlist
    2. C may reuse this stream to send new wants
    3. For each of the items in the wantlist C may ask if S has the block (i.e. a Have request) or for S to send the block (i.e. a block request). C may also ask S to send back a DontHave message in the event it doesn't have the block
    4. For each of the items in the wantlist C may append any `tokens` they want. Recommended tokens include those that would convince S to give C the blocks they want.
2. S responds back on a stream `s_receive`. S may reuse this stream to send back subsequent responses
    1. If C sends S a Have request for data S has (and is willing to give to C) it should respond with a Have, although it may instead respond with the block itself (e.g. if the block is very small)
        1. For each of the items that S sends back Blocks or BlockPresences for they may append any `tokens` they want
    2. If C sends S a Have request for data S has but is not currently willing to give to C, S may respond with an `AuthRequired` BlockPresence
        1. For each of the items that S sends back Blocks or BlockPresences for they may append any `tokens` they want. Recommended tokens include those that would inform C how they could convince S to give them access to the blocks they want.
    3. If C sends S a Have request for data S does not have (or has but is not willing to tell C it has) and C has requested for DontHave responses then S should respond with DontHave
    4. S may choose to include the number of bytes that are pending to be sent to C in the response message
    5. If C asked for a block that S has but it is bigger than the maximum block size it should return `BlockTooBig`
3. When C no longer needs a block it previously asked for it should send a Cancel message for that request to any peers that have not already responded about that particular block. It should particularly send Cancel messages for Block requests (as opposed to Have requests) that have not yet been answered.
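
The server side of the steps above could be sketched roughly as follows. This is an illustrative decision function, not part of the proposal: the `Want` and `Response` types, the `is_authorized` callback, and the 2 MiB default limit are assumptions. Returning `AuthRequired` for block requests as well as Have requests is also a choice made here; the text above only describes it for Have requests.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Tuple


class Response(Enum):
    BLOCK = "Block"
    HAVE = "Have"
    DONT_HAVE = "DontHave"
    AUTH_REQUIRED = "AuthRequired"
    BLOCK_TOO_BIG = "BlockTooBig"
    NONE = "no response"


@dataclass
class Want:
    cid: bytes
    want_block: bool               # True: block request, False: Have request
    send_dont_have: bool = False
    tokens: Tuple[bytes, ...] = ()


MAX_BLOCK_SIZE = 2 * 1024 * 1024   # assumed limit, for illustration


def respond(
    want: Want,
    store: Dict[bytes, bytes],
    is_authorized: Callable[[Want], bool],
    max_block_size: int = MAX_BLOCK_SIZE,
) -> Response:
    """Pick the response S sends for one wantlist entry."""
    block = store.get(want.cid)
    if block is None:
        # Only answer DontHave if C asked for it.
        return Response.DONT_HAVE if want.send_dont_have else Response.NONE
    if not is_authorized(want):
        # S has the data but is not currently willing to give it to C.
        return Response.AUTH_REQUIRED
    if not want.want_block:
        return Response.HAVE
    if len(block) > max_block_size:
        return Response.BLOCK_TOO_BIG
    return Response.BLOCK
```

A server that prefers not to reveal that it holds a block could return `DONT_HAVE` (or nothing) in place of `AUTH_REQUIRED`, which matches the "has but is not willing to tell C it has" case.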

### Tokens
Contributor

IIUC, "tokens" are not auth-specific, and are really just arbitrary key-value metadata pairs which are useful for passing auth tokens, among other things. If that's the case, would it make sense to call this something other than "token"? Generally "token" is used in auth contexts, and so its use here implies that it's intended only for auth, which isn't really the case.

Contributor

I would say that while it can be more general, token isn't a particularly bad name for this.


The major change in this protocol version is the introduction of the ability to pass tokens along with requests and responses. A token is defined as `<multicode><data>` where the multicode is an identifier in the multicodec table, and the data is token-specific data associated with that code.
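
A minimal sketch of the `<multicode><data>` layout, assuming the multicode is encoded as an unsigned varint as elsewhere in multiformats. The helper names and `EXAMPLE_CODE` are hypothetical and chosen only for illustration; they are not registered values.

```python
from typing import Tuple


def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uvarint(buf: bytes) -> Tuple[int, int]:
    """Decode a varint from the start of buf; return (value, bytes consumed)."""
    value = 0
    shift = 0
    for i, byte in enumerate(buf):
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, i + 1
        shift += 7
    raise ValueError("truncated varint")


def encode_token(code: int, data: bytes) -> bytes:
    """Build a token: the code as a varint, followed by token-specific data."""
    return encode_uvarint(code) + data


def decode_token(token: bytes) -> Tuple[int, bytes]:
    """Split a token into (code, token-specific data)."""
    code, consumed = decode_uvarint(token)
    return code, token[consumed:]


EXAMPLE_CODE = 0x300000  # hypothetical code, for illustration only
token = encode_token(EXAMPLE_CODE, b"opaque-token-data")
code, data = decode_token(token)
```

Bitswap itself only needs to carry the resulting bytes; interpreting `data` is left to whichever application understands `code`.
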
Contributor Author

From @ianopolous

> I would argue that a byte[] is the most general you can get. In fact, for our usage in Peergos these byte[] are cbor. So we get the benefit of types, upgradability etc. but at the application layer and without the centralisation point of multicodecs. And this also doesn't force everyone to pay the overhead of the multicodec. This means applications can innovate faster and without requiring anyone's permission.

I'll break this into pieces:

> I would argue that a byte[] is the most general you can get.

Yes, although it comes with the risks associated with lack of explicit signaling. For example, if my application wants to support Peergos auth for some blocks and Foo for other blocks and both look like cbor then I have to waste time computing both. Even worse, Peergos and Foo might have similar enough formats as to not be distinguishable and therefore the application will have to try both auth mechanisms before responding.

> without the centralisation point of multicodecs .... This means applications can innovate faster and without requiring anyone's permission.

Getting a code in the code table doesn't have to be hard and people can test with the reserved range. However, if we want this to be permissionless we could also resolve this by just adding some standard code that people can use. A strawman might be <the-code><varint><my-custom-value>, then the reservation of a code from the table is just a way for an application to save itself a few bytes.

For the cost of an extra couple bytes per token if you really didn't want to reserve a code, or use a string I guess you could reserve a code <yolo-code><bytes> and do acceptance testing on the tokens that'd also work. Not my recommendation, but doable.

> And this also doesn't force everyone to pay the overhead of the multicodec.

I sort of feel for this, but also it's the cost of compatibility. IMO paying a couple of bytes per token is probably worth not having to waste the machine resources of guessing the token type. Does the few bytes really seem that expensive?

Member

cbor is exactly as capable as multicodecs at signalling types. Just that in cbor it is optional and you can keep the logic outside of your serialization layer. This is exactly what we've done for years with all our custom IPLD types (things like our take on public encryption keys, public signing keys, champ nodes etc). We haven't bothered to register any multicodecs, because we don't need to, but they are all still typed, and upgradable. The signalling is still there, it's just on the application layer, rather than the bitswap layer. This keeps bitswap (/ipld) as simple and isolated from other concerns or dependencies as possible.

What do you mean by "waste time computing both"? An application can support many auth schemes, even if they all use cbor, so long as the type signature of each is distinguishable. The only "computation" here is trying to parse the token as cbor, which doesn't seem problematic. Compare this with S3 which does more work to check tokens by canonicalising the request headers and parameters before computing an hmac. If we decide we really need this type signalling on the protobuf layer, then we could make it optional in an extra field so not everyone has to pay it.

I guess I'm just saying we should leave the tokens opaque and let the application handle them (as in the earlier proposal ipfs/kubo#7871) which aligns with how we use it in Peergos. This keeps bitswap closer to being a narrow-waist of a protocol.

Contributor Author

> cbor is exactly as capable as multicodecs at signalling types. Just that in cbor it is optional and you can keep the logic outside of your serialization layer.

Not quite. CBOR is exactly as capable as multicodecs at signaling types if everyone uses the same semantics for signaling types and doesn't accidentally collide on namespaces. For example we could make the spec declare that tokens look roughly like: { type: <type identifier>, token : <token bytes> } and that would work fine as long as people didn't collide on type identifiers (in libp2p people avoid this with namespacing like /ipfs/bitswap/1.2.0 or /fil/data-transfer/1.0.0).

If we do it the raw bytes way then implementers have to write code like:

for validator in tokenValidators:
    if validator.matches(token):
        return validator.IsValid(token, CID, peerID)
return "no validator found"

If we do it the explicitly signaled way (whether using a table code, a namespaced string, etc.) we get something like:

if token.signal in tokenValidators:
    return tokenValidators[token.signal].IsValid(token, CID, peerID)
return "no validator found"

The former requires O(Validators) checks and the latter is O(1). Additionally, the latter can prevent accidental collisions, e.g. the same bytes could belong to either TokenType1 or TokenType2, which could mean we have to check validator.IsValid twice, or we're in a weird situation where we've received a token that is accidentally valid, which would be a security problem (admittedly this seems unlikely to be a serious problem, but it's easy enough to avoid).
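
The explicitly signaled variant could look roughly like the sketch below, assuming the signal is a varint multicode prefix on the token. The codes, the validator functions, and the helper names are hypothetical; real validators would do actual cryptographic checks rather than compare against a constant.

```python
from typing import Callable, Dict, Tuple

# A validator receives (token data, CID, peer ID) and says whether to serve.
Validator = Callable[[bytes, bytes, str], bool]


def make_token(code: int, data: bytes) -> bytes:
    """Prefix data with the code encoded as an unsigned varint."""
    prefix = bytearray()
    while code >= 0x80:
        prefix.append((code & 0x7F) | 0x80)
        code >>= 7
    prefix.append(code)
    return bytes(prefix) + data


def split_token(token: bytes) -> Tuple[int, bytes]:
    """Split <multicode><data> into (code, data)."""
    code, shift = 0, 0
    for i, byte in enumerate(token):
        code |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return code, token[i + 1:]
        shift += 7
    raise ValueError("truncated multicode prefix")


def check_token(validators: Dict[int, Validator], token: bytes, cid: bytes, peer_id: str) -> bool:
    """Dispatch on the explicit code: one dictionary lookup, no duck-typing."""
    code, data = split_token(token)
    validator = validators.get(code)
    if validator is None:
        return False  # unknown token type: reject rather than guess
    return validator(data, cid, peer_id)


# Hypothetical codes and toy validators, for illustration only.
CODE_A, CODE_B = 0x300001, 0x300002
validators: Dict[int, Validator] = {
    CODE_A: lambda data, cid, peer: data == b"secret-a",
    CODE_B: lambda data, cid, peer: data == b"secret-b",
}

ok = check_token(validators, make_token(CODE_A, b"secret-a"), b"cid", "peer")
```

Because the code selects exactly one validator, bytes that happen to be valid under a different scheme are never tried against it.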

> If we decide we really need this type signalling on the protobuf layer, then we could make it optional in an extra field so not everyone has to pay it.

I think this misses the point. The reason to have the signaling is so that systems can have uncoordinated convergence (i.e. someone can choose to use Peergos' auth system if they want to without worrying about how it interacts with any other tokens they support and without requiring an IPFS spec or Peergos spec change).

Without some form of signaling every time I want to add support for a new token type I have to make sure everything still plays together nicely.

Member

Basically, if you don't use a multicodec prefix, you either need to:

  1. Get everyone to agree on the same encoding scheme (cbor) and the same layout (e.g., some field to signal the "type").
  2. Do a bunch of duck-typing. I.e., try to decode as cbor, try to fit into the expected type, then pass to the next token handler, etc.

The first case just won't happen. That's significantly harder than getting everyone to prefix their tokens with a unique (per-protocol), fixed, prefix-free code.

In terms of the second case, yes, you can do this. But that leads to ambiguity that can be very difficult to reason about. In that case, I can't just register multiple token handlers, I need to think about how these different handlers might interact.

This is exactly why multicodecs exist. They make it really easy to remove ambiguity without having to get everyone to agree on a complex format, structure, etc.

Contributor Author

@dignifiedquire #270 (comment)

> so I am strongly in favor against requiring cbor.

IIUC no one wants to require cbor here. The discussion is around whether we should prefix with a multicode or use a duck-typing approach.

I was countering the argument in #270 (comment)

> cbor is exactly as capable as multicodecs at signalling types

by saying that it's only as capable if everyone agrees to it (as Steven mentioned above) and in that case a varint prefix seems preferable to requiring cbor inside of the protobufs 😄.

Member

I see what you're both saying, but this ties a lot of high level application stuff, and a bunch of totally unrelated codecs that will never be used here, into the low level wire format. Thus making the bitswap protocol less of a thin waist protocol and increasingly a large waist with a bunch of constants from random protocols in it. I won't argue it anymore though, it's a stylistic choice basically. Both can function exactly the same performance and code wise (either agreeing on codec, or agreeing on a format and type signalling).

Member

I think there's still a pretty significant misunderstanding here.

> I see what you're both saying, but this ties a lot of high level application stuff, and a bunch of totally unrelated codecs that will never be used here, into the low level wire format.

Bitswap already uses CIDs and multihashes, which are built on multicodecs.

> Thus making the bitswap protocol less of a thin waist protocol and increasingly a large waist with a bunch of constants from random protocols in it.

Bitswap doesn't care about specific multicodecs, just that the token format is <multicodec> ++ data. It just cares that you prefix your token with a varint-encoded "thing".

> it's a stylistic choice basically. Both can function exactly the same performance and code wise (either agreeing on codec, or agreeing on a format and type signalling).

It's a pretty important design decision.

  1. Just bytes. This is definitely the simplest, but it doesn't compose. I want to be able to have a single IPFS node fetch blocks from multiple networks with multiple token types.
  2. Typed
    1. CBOR + type signaling
      1. Mandates CBOR (complex, not everyone will want to use it).
      2. Needs a new "table of token types".
    2. Multicodec
      1. Just enough format to unambiguously distinguish between different types. Applications can use whatever they want for the actual token format (JWT, etc.).
      2. Re-uses the table of types we already have.

Member

Agreed. Minor point - it would almost certainly be broken security to send a JWT as the token here for the reasons I've mentioned elsewhere here.


Users who require additional codes for their new token formats should do one of:
- Register their code in the table
- For non-deployed testing purposes only - use a code in the application reserved range of the code table
- Note: codes in the application reserved range cannot be reserved in the code table, which means that a code may conflict with another one used in the ecosystem and cause application problems on collision. It is highly recommended not to use these codes outside of testing or early development.

To save space within the message, the list of tokens used within the message is declared in the top-level message, and all other references to tokens are indices into that token list.
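
The indirection could be sketched as below. This is an illustration under assumptions: indices are taken to be zero-based positions in the top-level `tokens` list, and the helper names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def build_token_table(
    entries: Sequence[Tuple[bytes, Sequence[bytes]]],
) -> Tuple[List[bytes], List[Tuple[bytes, List[int]]]]:
    """Deduplicate per-CID tokens into one table plus per-CID index lists."""
    table: List[bytes] = []
    index_of: Dict[bytes, int] = {}
    indexed: List[Tuple[bytes, List[int]]] = []
    for cid, tokens in entries:
        indices = []
        for tok in tokens:
            if tok not in index_of:
                index_of[tok] = len(table)  # first use: append to the table
                table.append(tok)
            indices.append(index_of[tok])
        indexed.append((cid, indices))
    return table, indexed


def resolve_tokens(table: Sequence[bytes], indices: Sequence[int]) -> List[bytes]:
    """Map an entry's indices back to token bytes, rejecting bad indices."""
    resolved = []
    for i in indices:
        if not 0 <= i < len(table):
            raise ValueError(f"token index {i} out of range")
        resolved.append(table[i])
    return resolved


# Two wantlist entries that share one token: it is stored once in the table.
table, indexed = build_token_table([
    (b"cid-1", [b"token-x", b"token-y"]),
    (b"cid-2", [b"token-x"]),
])
```

A receiver would want to validate indices before use, since an out-of-range index in a message from an untrusted peer should not crash the decoder.
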
Contributor Author

From @ianopolous:

> The other change I would suggest is to allow per cid auth tokens inline, as we do. Otherwise you're forcing someone like us to pay the extra overhead of the indirection (No auth tokens are ever shared between blocks for us, and arguably that would be a vulnerability (e.g. S3 doesn't allow this)). Probably there is a use case for both, especially if you include non capability-based schemes. It should be easy to support both inline tokens and indirect references.

We could add another field at each entry repeated tokens bytes in the Entry, Block and BlockPresence messages. It adds more complexity to save some bytes, I wonder if it's really that helpful. Another byte saving approach which might save more bytes would be to store another list of the token-codes used in the token list. e.g. if your token code is > 3 bytes and you put it into the token list and then put your per-block tokens into the token list that would probably save you more bytes than this. Also, it seems like if we wanted to support strings/arbitrary bytes as custom "codes" then the above might save a whole bunch of data if we defined the codes as <code-number>[optional varint + data that goes with the code]<data> and stored the first two pieces in the second token list.

Both of these are about trying to shave a few bytes in exchange for some complexity. The initial proposal tried to cut down on the complexity unless the duplicated bytes were really excessive, finding which (if any) other complexity we want to add here though seems like a reasonable thing to discuss.

Member

Logically the simplest proposal is to have an auth string per cid, but you rightly point out that that has a large overhead when an auth string is repeated. If we're bumping the bitswap version number then maybe it's worth considering just gzipping the message?

Contributor Author

> If we're bumping the bitswap version number then maybe it's worth considering just gzipping the message?

I think gzipping would be nice, although it'd be nicer if it could be done at the libp2p layer (e.g. think of how HTTP is allowed to negotiate compression schemes https://en.wikipedia.org/wiki/HTTP_compression). I'm not sure it's worth the complexity to require it in the protocol especially since Bitswap versions tend to hang around for a while which means if we want to change things people will still be supporting this for a long time.

If people feel strongly that we need gzipping though I could get on board with it

Member

It can be done at the libp2p layer.


### Wire Format

```protobuf
message Message {

  message Wantlist {
    enum WantType {
      Block = 0;
      Have = 1;
    }

    message Entry {
      bytes block = 1; // CID of the block
      int32 priority = 2; // the priority (normalized). default to 1
      bool cancel = 3; // whether this revokes an entry
      WantType wantType = 4; // Note: defaults to enum 0, ie Block
      bool sendDontHave = 5; // Note: defaults to false
      repeated int32 tokens = 7; // the indices of the tokens in the token list
    }

    repeated Entry entries = 1; // a list of wantlist entries
    bool full = 2; // whether this is the full wantlist. default to false
  }

  message Block {
    bytes prefix = 1; // CID prefix (all of the CID components except for the digest of the multihash)
    bytes data = 2;
    repeated int32 tokens = 4; // the indices of the tokens in the token list
  }

  enum BlockPresenceType {
    Have = 0;
    DontHave = 1;
    AuthRequired = 2;
    BlockTooBig = 3;
  }

  message BlockPresence {
    bytes cid = 1;
    BlockPresenceType type = 2;
    repeated int32 tokens = 4; // the indices of the tokens in the token list
  }

  Wantlist wantlist = 1;
  repeated Block payload = 3;
  repeated BlockPresence blockPresences = 4;
  int32 pendingBytes = 5;
  repeated bytes tokens = 6; // Each token is of the form <multicode><token-data> where the multicode identifies what the data is for
}
```

## Implementations

- <https://github.com/ipfs/go-bitswap>