Shared Storage
As the basis for las2peer, we use a peer-to-peer (P2P) network with a distributed hash table (DHT). A DHT is a key-value store used to hold arbitrary data. Each data item is assigned a unique ID generated from a given String identifier and inserted into the DHT. The stored entries are distributed over all nodes, and each node persists its data to its host's file system.
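The ID derivation just described can be sketched as a cryptographic hash of the String identifier. This is a minimal illustration only: SHA-256 and the class/method names (`DhtId`, `idFor`) are assumptions, not las2peer's actual implementation.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DhtId {
    // Derive a fixed-size numeric ID from a String identifier.
    // SHA-256 is an assumption for illustration; las2peer's underlying
    // P2P layer may use a different derivation.
    public static BigInteger idFor(String identifier) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] hash = md.digest(identifier.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, hash); // interpret hash as a non-negative ID
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The same identifier always maps to the same ID, which is what lets every node locate an entry without central coordination.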
On top of the basic system, las2peer adds the following features:
- freely definable reader permissions for each stored object, protected by strong encryption
- all stored data are signed by their author to prevent manipulation
- data can be changed under the same identifier, while las2peer ensures that a fetch operation always returns the latest version
By default, a DHT is a simple key-value store with no authority: anyone can write anything into it, so stored data may be overwritten by anyone. To restrict this, las2peer nodes only accept content changes from the same author as the previous version (if there is one).
For a given identifier there usually exists only one object, or one version of that object, in the DHT. However, when the object is changed and not all nodes receive the update, the data can become inconsistent. To overcome this problem, las2peer (since version 0.6) stores metadata along with each update operation. For this purpose, the actual data object is assigned a version number. This number is copied into an immutable metadata artifact that is also stored in the DHT. Metadata artifacts use a special identifier structure: their ID is generated from the data object's identifier concatenated with the current version number. This means each metadata artifact has a unique, yet predictable, ID.
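The predictable metadata ID can be sketched by hashing the data identifier concatenated with the version number. The exact concatenation scheme (here a `#` separator), the hash function, and the names are assumptions for illustration:

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class MetadataId {
    // Hypothetical metadata ID derivation: hash of "<identifier>#<version>".
    // Knowing the identifier and a version number is enough to compute
    // the ID, making it predictable without fetching any data.
    public static BigInteger idFor(String identifier, long version) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] hash = md.digest(
                    (identifier + "#" + version).getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, hash);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the ID is a pure function of identifier and version, any node can probe for the existence of version n without prior communication.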
First, the data object (called an Envelope) is split into smaller parts to optimize usage of the P2P system. The part size is configurable on each node and should be chosen based on the network infrastructure: parts that are too small produce excessive overhead, while parts that are too large unnecessarily delay the overall fetch operation.
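The splitting step amounts to slicing the serialized envelope content into fixed-size chunks. A minimal sketch (class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EnvelopeSplitter {
    // Split serialized envelope content into parts of at most maxPartSize
    // bytes; the last part may be shorter.
    public static List<byte[]> split(byte[] content, int maxPartSize) {
        List<byte[]> parts = new ArrayList<>();
        for (int offset = 0; offset < content.length; offset += maxPartSize) {
            int end = Math.min(offset + maxPartSize, content.length);
            parts.add(Arrays.copyOfRange(content, offset, end));
        }
        return parts;
    }
}
```

A fetch operation performs the inverse: the parts are retrieved and concatenated back in order before deserialization.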
After that, these data parts are signed by the author and receive a version number: the storing node either increments the previous version it knows of or sets the initial version number. If an object with that identifier and version already exists, it has the same ID, and the DHT rejects the insert operation. In such cases, the developer has to apply a merging strategy to both objects.
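The rejection behavior can be modeled with a toy in-memory stand-in for the DHT that holds at most one entry per (identifier, version) pair. This is purely illustrative of the first-write-wins semantics, not las2peer's storage code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class VersionedStore {
    // Toy stand-in for the DHT: at most one entry per (identifier, version).
    private final Map<String, byte[]> entries = new ConcurrentHashMap<>();

    // Returns true if the data was stored; false if an entry with that
    // identifier and version already exists, mirroring the DHT's rejection
    // of a duplicate insert (the caller must then merge and retry with a
    // higher version number).
    public boolean insert(String identifier, long version, byte[] data) {
        return entries.putIfAbsent(identifier + "#" + version, data) == null;
    }
}
```

A second insert under the same identifier and version fails, forcing the conflicting writer to merge and re-submit as the next version.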
In the fetch operation, the request ID is generated from the given identifier. In response, several handles, according to the replication factor, are returned. The fetching node then queries each handle until the verification and version check succeeds. In versions of las2peer before 0.6, it was unclear which version was the correct one.
As mentioned above, during a fetch operation it is unclear which version number is the latest. A node could have missed an update and therefore respond with an old version of the requested object. Such an object is still valid in a cryptographic sense: it is merely outdated, but has the correct author and signature.
To determine the latest version, las2peer stores the metadata information with each save operation and never deletes it. This is feasible because a metadata artifact is only a few bytes in size. Since each node in a network usually provides several megabytes of storage space, millions of update operations are possible before the available space is filled with metadata.
This way, a search can be performed with IDs generated from the given identifier and an increasing version number. First, the search process probes for missing ID entries by doubling the version number with each lookup request. Once a lookup fails, the process runs a binary search in the range between the last successful lookup and the failed one. This way, las2peer finds the latest version number in O(2*log2(latest version)) lookups.
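The two-phase search above (exponential doubling, then binary search) can be sketched as follows. The `exists` predicate stands in for a DHT lookup of the metadata ID for a given version; versions are assumed to start at 1 and be gap-free, and all names are hypothetical:

```java
import java.util.function.LongPredicate;

public class LatestVersionSearch {
    // exists.test(v) asks the DHT whether a metadata artifact for
    // version v is stored.
    public static long findLatest(LongPredicate exists) {
        if (!exists.test(1)) {
            return 0; // nothing stored yet under this identifier
        }
        // Phase 1: double the probed version until a lookup fails.
        long low = 1, high = 2;
        while (exists.test(high)) {
            low = high;
            high *= 2;
        }
        // Phase 2: binary search between the last hit and the first miss.
        while (low + 1 < high) {
            long mid = (low + high) / 2;
            if (exists.test(mid)) {
                low = mid;
            } else {
                high = mid;
            }
        }
        return low;
    }
}
```

Each phase issues at most log2(latest version) lookups, which yields the O(2*log2(latest version)) bound stated above.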
The complete search process described above can be executed very quickly on the DHT without retrieving the actual data. Once las2peer knows the correct version, the actual data object is retrieved. The data parts are concatenated and deserialized into an object instance; additionally, the version number, signature, and author key are checked. If one of these checks fails, las2peer queries another replica from the network. The replication factor and the distribution among different nodes make it very unlikely that no replica is correct.
Copyright (c) 2020 Advanced Community Information Systems (ACIS) Group, Chair of Computer Science 5 (Databases & Information Systems), RWTH Aachen University, Germany