From 950bc8443f4c8609b37c9a64bf05e6b7d6d20ebd Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Miroslav=20Bajto=C5=A1?=
Date: Wed, 4 Dec 2024 14:56:16 +0100
Subject: [PATCH] FRC: Retrieval Checking Requirements
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Miroslav Bajtoš
---
 FRCs/frc-retrieval-checking-requirements.md | 259 ++++++++++++++++++++
 1 file changed, 259 insertions(+)
 create mode 100644 FRCs/frc-retrieval-checking-requirements.md

diff --git a/FRCs/frc-retrieval-checking-requirements.md b/FRCs/frc-retrieval-checking-requirements.md
new file mode 100644
index 00000000..9fb18c07
--- /dev/null
+++ b/FRCs/frc-retrieval-checking-requirements.md
@@ -0,0 +1,259 @@
+---
+fip: ""
+title: Retrieval Checking Requirements
+author: "Miroslav Bajtoš (@bajtos)"
+discussions-to: https://github.com/filecoin-project/FIPs/discussions/1086
+status: Draft
+type: FRC
+created: 2024-12-02
+# spec-sections:
+# -
+# -
+# requires (*optional):
+# replaces (*optional):
+---
+
+
+
+# FIP-Number: Retrieval Checking Requirements
+
+## Simple Summary
+
+
+
+In order to make Filecoin a usable data storage offering, we need the content to be retrievable. It's difficult to improve what you don't measure, and therefore we need to measure the quality of the retrieval service provided by each storage provider (SP). To allow 3rd-party networks like [Spark](https://filspark.com) to sample active deals from the on-chain activity and check whether the SP is serving retrievals for the stored content, we need SPs to meet the following requirements:
+
+1. Link on-chain MinerId and IPNI provider identity ([spec](#link-on-chain-minerid-and-ipni-provider-identity)).
+2. Provide retrieval service using the [IPFS Trustless HTTP Gateway protocol](https://specs.ipfs.tech/http-gateways/trustless-gateway/).
+3. Advertise retrievals to IPNI.
+4. In IPNI advertisements, construct the `ContextID` field from `(PieceCID, PieceSize)` ([spec](#construct-ipni-contextid-from-piececid-piecesize)).
+
+Meeting these requirements requires support in software implementations like Boost, Curio & Venus Droplet, and potentially also updates to settings configured by individual SPs.
+
+## Abstract
+
+
+
+When we set out to build [Spark](https://filspark.com), a protocol for testing whether the _payload_ of Filecoin deals can be retrieved, we designed it based on how [Boost](https://github.com/filecoin-project/boost) worked at that time (mid-2023). Soon after FIL+ allocator compliance started to use the Spark retrieval success score (Spark RSR) in mid-2024, we learned that [Venus](https://github.com/filecoin-project/venus) [Droplet](https://github.com/ipfs-force-community/droplet), an alternative miner software, is implemented slightly differently and requires tweaks to support Spark. Things have evolved quite a bit since then. We need to overhaul most of the Spark protocol to support Direct Data Onboarding deals. We will need all miner software projects (Boost, Curio, Venus) to accommodate the new requirements imposed by the upcoming Spark v2 release.
+
+This FRC has the following goals:
+1. Document the retrieval process based on IPFS/IPLD.
+2. Specify what Spark needs from miner software.
+3. Collaborate with the community to tweak the requirements to work well for all parties involved.
+4. Let this spec and the building blocks like [IPNI Reverse Index](https://github.com/filecoin-project/devgrants/issues/1781) empower other builders to design & implement their own retrieval-checking networks as alternatives to Spark.
+
+## Change Motivation
+
+
+At the moment, the retrieval process for downloading (public) data stored in Filecoin deals lacks a specification, and there is very little documentation for SPs on how to correctly configure their operation to provide a good retrieval service.
+
+The current architecture of Filecoin components does not expose enough data to enable independent 3rd-party networks to sample all data stored in Filecoin deals and check the quality of the retrieval service provided by storage providers for the data they are persisting.
+
+Our motivation is to close these gaps by documenting the current IPFS/IPLD-based retrieval process and the additional requirements needed by checker networks to measure retrieval-related service level indicators.
+
+> [!IMPORTANT]
+> We fully acknowledge that the current IPFS/IPLD-based retrieval process may not be sufficient to support all kinds of retrieval clients. For example, warm-storage/CDN offerings may prefer to retrieve a range of bytes in a given Piece CID instead.
+>
+> Documenting alternative retrieval processes and the requirements for checking service level indicators of such alternatives is out of scope for this FRC.
+
+## Specification
+
+
+### Retrieval Process
+
+Let's say we have a public dataset stored on Filecoin, packaged as a UnixFS archive with CID `bafybei(...)` and stored on Filecoin in a piece with `PieceCID=baga...` and some `PieceSize`.
+
+The scope of this document is to support the following retrieval process:
+
+1. A client wanting to download the dataset identified by CID `bafybei(...)` queries an IPNI instance like [cid.contact](https://cid.contact) to find the nodes providing retrieval service for this dataset.
+
+2. The client picks a retrieval provider that supports the [IPFS Trustless HTTP Gateway protocol](https://specs.ipfs.tech/http-gateways/trustless-gateway/).
+
+3. The client requests the content for CID `bafybei(...)` at the URL (multiaddr) specified by the [IPNI provider result](https://github.com/ipni/specs/blob/12482e4e1bd92a7c6c079bf23f2533a4ddb9e363/IPNI.md#json-find-response) of the selected provider.
+
+Example IPNI `ProviderResult` describing a retrieval provider offering IPFS Trustless HTTP Gateway retrievals:
+
+```json
+{
+  "MultihashResults": [{
+    "Multihash": "EiAT38UKZPlJfhyZQH8cAMNjUPeKBfQn6HMdiqGZ2xJicA==",
+    "ProviderResults": [{
+      "ContextID": "ZnJpc2JpaQ==",
+      "Metadata": "oBIA",
+      "Provider": {
+        "ID": "12D3KooWC8gXxg9LoJ9h3hy3jzBkEAxamyHEQJKtRmAuBuvoMzpr",
+        "Addrs": [
+          "/dns/frisbii.fly.dev/tcp/443/https"
+        ]
+      }
+    }]
+  }]
+}
+
+```
+
+### Retrieval Requirements
+
+1. Whenever a deal is activated, the SP MUST advertise all IPFS/IPLD payload block CIDs found in the Piece to IPNI. See the [IPNI Specification](https://github.com/ipni/specs/blob/main/IPNI.md) and [IPNI HTTP Provider](https://github.com/ipni/specs/blob/main/IPNI_HTTP_PROVIDER.md) for technical details.
+
+2. Whenever an SP stops storing a Piece (e.g. because the last deal for the Piece has expired or was slashed), the SP SHOULD advertise removal of all payload block CIDs included in this Piece.
+
+3. The SP MUST provide retrieval of the IPFS/IPLD payload blocks via the [IPFS Trustless HTTP Gateway protocol](https://specs.ipfs.tech/http-gateways/trustless-gateway/).
+
+### Retrieval Checking Requirements
+
+In addition to the above [retrieval requirements](#retrieval-requirements), SPs are asked to meet the following:
+
+#### Link on-chain MinerId and IPNI provider identity
+
+Storage providers are required to use the same libp2p peer ID for their blockchain identity, as returned by `Filecoin.StateMinerInfo`, and for the index provider identity used when communicating with IPNI instances like [cid.contact](https://cid.contact).
+
+In particular, the value in the IPNI CID query response field `MultihashResults[].ProviderResults[].Provider.ID` must match the value of the `StateMinerInfo` response field `PeerId`.
+
+> [!NOTE]
+> This is open to extensions in the future: we can support more than one form of linking
+> index providers to Filecoin miners. See e.g. [ipni/specs#33](https://github.com/ipni/specs/issues/33).
+
+**Example: miner `f01611097`**
+
+MinerInfo state:
+
+```json5
+{
+  // (...)
+  "PeerId": "12D3KooWPNbkEgjdBNeaCGpsgCrPRETe4uBZf1ShFXStobdN18ys",
+  // (...)
+}
+```
+
+IPNI provider status ([query](https://cid.contact/providers/12D3KooWPNbkEgjdBNeaCGpsgCrPRETe4uBZf1ShFXStobdN18ys)):
+
+```json5
+{
+  // (...)
+  "Publisher": {
+    "ID": "12D3KooWPNbkEgjdBNeaCGpsgCrPRETe4uBZf1ShFXStobdN18ys",
+    "Addrs": [
+      "/ip4/76.219.232.45/tcp/24887/http"
+    ]
+  },
+  // (...)
+}
+```
+
+Example CID query response for an IPFS/IPLD payload block stored by this miner ([query](https://cid.contact/cid/bafyreiat37cquzhzjf7bzgkap4oabq3dkd3yubpue7uhghmkugm5wetcoa)):
+
+```json5
+{
+  "MultihashResults": [
+    {
+      "Multihash": "EiAT38UKZPlJfhyZQH8cAMNjUPeKBfQn6HMdiqGZ2xJicA==",
+      "ProviderResults": [
+        // (...)
+        {
+          "ContextID": "AXESIFVcxmAvWdc3BbQUKlYcp2Z2DuO2w5Fo4jmIC8IbMX00",
+          "Metadata": "oBIA",
+          "Provider": {
+            "ID": "12D3KooWPNbkEgjdBNeaCGpsgCrPRETe4uBZf1ShFXStobdN18ys",
+            "Addrs": [
+              "/dns/cesginc.com/tcp/443/https"
+            ]
+          }
+        }
+      ]
+    }
+  ]
+}
+```
+
+#### Construct IPNI `ContextID` from `(PieceCID, PieceSize)`
+
+The advertisements for IPNI must deterministically construct the `ContextID` field from the public deal metadata - the tuple `(PieceCID, PieceSize)` - as follows:
+
+- Use DAG-CBOR encoding ([DAG-CBOR spec](https://ipld.io/specs/codecs/dag-cbor/spec/))
+- The piece information is serialised as an array with two items:
+  1. The first item is the piece size represented as `uint64`
+  2. The second item is the piece CID represented as a custom tag `42`
+- In places where the ContextID is represented as a string, convert the CBOR bytes to string using the hex encoding.
+  _Note: the Go module https://github.com/ipni/go-libipni handles this conversion automatically._
+
+A reference implementation of this serialization algorithm in Go is maintained in [https://github.com/filecoin-project/go-state-types/](https://github.com/filecoin-project/go-state-types/blob/32f613e4d4450b09da3c81982dd6d7dba9c6f6f2/abi/cbor_gen.go#L23-L48).
+
+**Example**
+
+Input:
+
+```json5
+{
+  "PieceCID": "baga6ea4seaqpyzrxp423g6akmu3i2dnd7ymgf37z7m3nwhkbntt3stbocbroqdq",
+  "PieceSize": 34359738368 // 32 GiB
+}
+```
+
+Output - ContextID (hex-encoded, split into two lines for readability):
+
+```
+821B0000000800000000D82A5828000181E203922020FC66377F35
+B3780A65368D0DA3FE1862EFF9FB36DB1D416CE7B94C2E1062E80E
+```
+
+Annotated version as produced by https://cbor.me:
+
+```
+82                        # array(2)
+   1B 0000000800000000    # unsigned(34359738368)
+   D8 2A                  # tag(42)
+      58 28               # bytes(40)
+         000181E203922020FC66377F35B3780A65368D0DA3FE1862EFF9FB36DB1D416CE7B94C2E1062E80E
+```
+
+## Design Rationale
+
+
+**_TBD_**
+
+## Backwards Compatibility
+
+
+[Retrieval Requirements](#retrieval-requirements) document the current status minus Graphsync and Bitswap protocols.
+
+[Retrieval Checking Requirements](#retrieval-checking-requirements) introduce the following breaking changes: miner software must construct IPNI `ContextID` values in a specific way. Because ContextIDs are scoped per piece (not per deal), miner software must de-duplicate advertisements for deals storing the same piece.
+
+## Test Cases
+
+
+
+Not applicable, but see the examples in [Specification](#specification).
+
+## Security Considerations
+
+
+_TODO: add more details._
+
+We trust SPs to honestly advertise Piece payload blocks to IPNI. Attack vector: a malicious SP can always advertise the same payload block for all pieces persisted.
+
+Free-rider problem when a piece is stored with more than one SP.
+Attack vector: When a piece is stored with SP1 and SP2, then SP1 can advertise retrievals with metadata pointing to SP2's multiaddr.
+
+## Incentive Considerations
+
+
+_TBD_
+
+## Product Considerations
+
+
+_TBD_
+
+## Implementation
+
+
+_TBD_
+
+## TODO
+
+
+_TBD_
+
+## Copyright
+Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
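
As an illustration of the `ContextID` construction specified in this patch (section "Construct IPNI `ContextID` from `(PieceCID, PieceSize)`"), here is a minimal Python sketch. It is not the reference implementation, which is the Go code in go-state-types linked from the spec. The helper names `cbor_head`, `cid_to_bytes` and `context_id` are hypothetical, the sketch uses only the Python standard library, and it assumes the PieceCID is given in its base32 string form (multibase prefix `b`).

```python
import base64


def cbor_head(major: int, value: int) -> bytes:
    """Encode a CBOR head (major type + argument) using the shortest form,
    which is what DAG-CBOR requires."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if value < 24:
        return bytes([(major << 5) | value])
    for extra, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 1 << (8 * size):
            return bytes([(major << 5) | extra]) + value.to_bytes(size, "big")
    raise ValueError("value does not fit in uint64")


def cid_to_bytes(cid: str) -> bytes:
    """Decode a base32 CID string into its binary form.
    Assumption of this sketch: only multibase prefix 'b' is handled."""
    if not cid.startswith("b"):
        raise ValueError("only base32 CIDs (prefix 'b') are handled here")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # restore the padding that multibase omits
    return base64.b32decode(body)


def context_id(piece_cid: str, piece_size: int) -> bytes:
    """Serialise (PieceSize, PieceCID) as a two-item DAG-CBOR array."""
    # DAG-CBOR encodes a CID as tag 42 wrapping a byte string that starts
    # with a 0x00 byte followed by the binary CID.
    link = b"\x00" + cid_to_bytes(piece_cid)
    return (
        cbor_head(4, 2)              # array(2)
        + cbor_head(0, piece_size)   # unsigned(PieceSize)
        + cbor_head(6, 42)           # tag(42)
        + cbor_head(2, len(link))    # bytes(len)
        + link
    )


if __name__ == "__main__":
    cid = "baga6ea4seaqpyzrxp423g6akmu3i2dnd7ymgf37z7m3nwhkbntt3stbocbroqdq"
    print(context_id(cid, 34359738368).hex().upper())
```

Run against the example input from the spec (the PieceCID above with a `PieceSize` of 32 GiB), this should print the same hex string as the "Output - ContextID" example, joined into a single line.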