Ability to add raw blocks / raw leaves #853

Open
ProximaNova opened this issue Jan 19, 2025 · 4 comments

@ProximaNova

ProximaNova commented Jan 19, 2025

ipwb currently cannot add data as raw blocks. Benefits of raw blocks: they are faster and use less storage space, since leaf data is stored without a UnixFS (dag-pb) wrapper.

Whenever ipwb adds something to IPFS, it uses this Python API: https://github.com/ipfs-shipyard/py-ipfs-http-client (ipfshttpclient). Documentation:
https://ipfs.ssi.eecc.de/ipfs/QmSfSEXrkYzsLz4adKJzVe9zj8dT8V6X9rkXdXCenV35pQ/docs/http_client_ref.html

Looking at this section:
https://ipfs.ssi.eecc.de/ipfs/QmSfSEXrkYzsLz4adKJzVe9zj8dT8V6X9rkXdXCenV35pQ/docs/http_client_ref.html#ipfshttpclient.Client.add_bytes

it says nothing about add_bytes() having an option to add raw blocks. So ipwb could either use something else that can add raw blocks, or the source code of that Python API could be modified to support it. ipfshttpclient/client/__init__.py defines the add_bytes() method. As I understand it, ipfshttpclient is a wrapper that communicates with Kubo IPFS's core API. For example, that __init__.py file says:

	def add_bytes(self, data: bytes, **kwargs):
		"""Adds a set of bytes as a file to IPFS.
[...]
		"""
		body, headers = multipart.stream_bytes(data, chunk_size=self.chunk_size)
		return self._client.request('/add', decoder='json',
		                            data=body, headers=headers, **kwargs)

	@utils.return_field('Hash')
	@base.returns_single_item(dict)

which looks like an API call. Raw blocks normally look like this: https://gateway.pinata.cloud/ipfs/bafkreibr3eo4ryibk6yueswcgepfybw33nwngs4z4vmr4ck2ky43j7cn64 (the CID starts with "bafk"). Kubo ipfs command to create it: "ipfs add --cid-version=1 setup.cfg" (with --cid-version=1, Kubo defaults to raw leaves).
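To check the "starts with bafk" observation without a daemon, here is a minimal sketch that decodes a base32 CIDv1 string and reads its codec. The function names are illustrative and not part of ipfshttpclient or ipwb; the codec values (0x55 for raw, 0x70 for dag-pb) come from the multicodec table. It only handles CIDs with the base32 "b" multibase prefix.

```python
import base64
from typing import Tuple

RAW_CODEC = 0x55     # multicodec code for "raw"
DAG_PB_CODEC = 0x70  # multicodec code for "dag-pb" (UnixFS-wrapped data)


def read_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode one unsigned varint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        value |= (byte & 0x7F) << shift
        pos += 1
        if byte & 0x80 == 0:
            return value, pos
        shift += 7


def cid_codec(cid: str) -> int:
    """Return the multicodec code of a base32 CIDv1 string (multibase prefix 'b')."""
    if not cid.startswith("b"):
        raise ValueError("only base32 CIDv1 strings are handled in this sketch")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # base64.b32decode requires padding
    raw = base64.b32decode(body)
    version, pos = read_varint(raw, 0)
    if version != 1:
        raise ValueError("not a CIDv1")
    codec, _ = read_varint(raw, pos)
    return codec


cid = "bafkreibr3eo4ryibk6yueswcgepfybw33nwngs4z4vmr4ck2ky43j7cn64"
print(cid_codec(cid) == RAW_CODEC)
```

For the CID linked above, the codec decodes to 0x55, i.e. a raw block.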

@ProximaNova
Author

ProximaNova commented Jan 19, 2025

Tracking it down:

  1. File py-ipfs-http-client/ipfshttpclient/client/__init__.py: add_bytes() calls self._client.request('/add', decoder='json',...
  2. Imported file base.py: self._client = http.build_client_sync(...
  3. Imported file http.py: def build_client_sync(...) -> ClientSyncBase[ty.Any]:
  4. Imported file http_common.py: class ClientSyncBase(ty.Generic[S], metaclass=abc.ABCMeta):, which is said to be "An HTTP client for interacting with the IPFS daemon".

Details:

ipfshttpclient/client/__init__.py defines a "base", something like "api/v0". IIRC, it also defines the address, which would be http://10.0.0.50:5001 for this example. The "add" endpoint in the HTTP/RPC API:

Conclusion:

It might be easy to modify the Python codebase of ipfshttpclient to add that functionality, with something like self._client.request('/add?raw-leaves=true&pin=false'...
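As a rough illustration of what that request would target, here is a sketch that assembles the full URL from the address, the base, and the two query parameters. The helper name build_add_url is hypothetical; raw-leaves and pin are arguments of Kubo's /api/v0/add endpoint, and the address is the example one used above. Nothing is sent over the network here.

```python
from urllib.parse import urlencode


def build_add_url(address: str, base: str = "api/v0",
                  raw_leaves: bool = True, pin: bool = False) -> str:
    """Build the URL of the "add" endpoint with raw-leaves and pin options."""
    # The RPC API expects lowercase "true"/"false" strings for booleans.
    params = {"raw-leaves": str(raw_leaves).lower(), "pin": str(pin).lower()}
    return f"{address.rstrip('/')}/{base.strip('/')}/add?{urlencode(params)}"


print(build_add_url("http://10.0.0.50:5001"))
# → http://10.0.0.50:5001/api/v0/add?raw-leaves=true&pin=false
```

Note that the Kubo RPC API only accepts POST requests, so this URL would be used with a POST carrying the multipart body.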

@ProximaNova
Author

> It might be easy to modify the Python codebase of ipfshttpclient to add that functionality, with something like self._client.request('/add?raw-leaves=true&pin=false'...

Luckily, it was that easy. As of this writing, I have made two edits to https://github.com/ProximaNova/py-ipfs-http-client, which can be seen at:

  1. ipfs-shipyard/py-ipfs-http-client@c191872
  2. ipfs-shipyard/py-ipfs-http-client@acac2b5...44b4345

Neither has been pulled into the official ipfs-shipyard/py-ipfs-http-client repo. It may not make sense for them to be accepted there, since add() can already add data as raw blocks. add_bytes() couldn't do that until I made those edits, which makes sense for this project (oduwsdl/ipwb). Explaining both commits:

  1. c191872706e1118d2cd76ea326a2a8d580899353: makes add_bytes() only add raw blocks - for testing purposes (the next commit is more production-focused)
  2. acac2b5d5251774b6b5cdff1fee0b45abbff5913...44b4345c715bc52d114dbf942a5bf041353f30fb: add_bytes() acts as it always has; a new function, add_raw_bytes(), was added, and only that function adds raw bytes/blocks.
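The design of the second change (leave add_bytes() alone, put the raw-block behavior in a separate method) can be sketched as follows. This is not the content of the commits: SketchClient and RecordingTransport are hypothetical stand-ins, and passing options through an opts keyword is an assumption made for illustration. The transport records requests instead of contacting a daemon, so the sketch runs on its own.

```python
class RecordingTransport:
    """Stand-in for the HTTP layer: records requests instead of sending them."""

    def __init__(self):
        self.calls = []

    def request(self, path, data=None, opts=None):
        self.calls.append((path, dict(opts or {})))
        return {"Hash": "<cid returned by the daemon>"}  # placeholder value


class SketchClient:
    """Hypothetical client with the two methods side by side."""

    def __init__(self, transport):
        self._client = transport

    def add_bytes(self, data: bytes, **kwargs):
        """Unchanged behavior: no extra options are sent."""
        return self._client.request("/add", data=data, **kwargs)

    def add_raw_bytes(self, data: bytes, **kwargs):
        """Same call, but asks for raw leaves and skips pinning by default."""
        opts = {"raw-leaves": "true", "pin": "false"}
        opts.update(kwargs.pop("opts", {}))  # caller-supplied options win
        return self._client.request("/add", data=data, opts=opts, **kwargs)


transport = RecordingTransport()
client = SketchClient(transport)
client.add_bytes(b"hello")
client.add_raw_bytes(b"hello")
print(transport.calls)
```

Keeping the two methods separate means existing callers of add_bytes() see no change, while ipwb could opt in by calling the new method.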

@machawk1
Member

machawk1 commented Jan 22, 2025

@ProximaNova Thanks for reporting and spearheading the issue w/ the library.

What is the use case for ipwb to add raw blocks/leaves? I'm unfamiliar and will do some reading on them but did not want a response to fall through the cracks.

Also pinging @ibnesayeed for his 2¢, as he might be a bit more familiar.

@ibnesayeed
Member

Thanks for looking into this. Being able to store raw blocks of desired sizes has other potential benefits, such as sub-resource deduplication. This ability has been available in the JavaScript library for a while, but when we worked on this piece of code earlier, the Python library did not have such features, so we opted for the easy route that works for the proof-of-concept. Our next-generation decentralized web archiving system called IPARO (not implemented yet) has some discussions around storing slices of objects strategically for better storage optimization.

That said, if the upstream library accepts your changes, we will be happy to see a PR in this repo to test the implications.
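To make the sub-resource deduplication point concrete, here is a toy sketch of the idea: two payloads are split into fixed-size blocks, and blocks with the same hash only need to be stored once. The block size, helper names, and sample data are made up for illustration; this is not how ipwb or IPARO slice objects.

```python
import hashlib


def split_blocks(data: bytes, block_size: int):
    """Split data into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def count_blocks(payloads, block_size: int):
    """Return (total blocks, unique blocks) across all payloads."""
    seen = set()
    total = 0
    for payload in payloads:
        for block in split_blocks(payload, block_size):
            total += 1
            seen.add(hashlib.sha256(block).digest())
    return total, len(seen)


# Two objects with different 256-byte headers but the same 1024-byte body.
header_a = b"A" * 256
header_b = b"B" * 256
body = b"".join(bytes([i]) * 256 for i in range(4))  # four distinct blocks

total, unique = count_blocks([header_a + body, header_b + body], 256)
print(total, unique)  # → 10 6
```

With content-addressed blocks, the four shared body blocks are stored once, so 6 blocks cover what would otherwise take 10.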
