Internet-Draft | The R5N Distributed Hash Table | September 2024 |
Schanzenbach, et al. | Expires 6 March 2025 | [Page] |
This document contains the R5N DHT technical specification. R5N is a secure distributed hash table (DHT) routing algorithm and data structure for decentralized applications. It features an open peer-to-peer overlay routing mechanism which supports ad-hoc permissionless participation and support for topologies in restricted-route environments. Optionally, the paths data takes through the overlay can be recorded, allowing decentralized applications to use the DHT to discover routes.¶
This document defines the normative wire format of protocol messages, routing algorithms, cryptographic routines and security considerations for use by implementers.¶
This specification was developed outside the IETF and does not have IETF consensus. It is published here to guide implementation of R5N and to ensure interoperability among implementations including the pre-existing GNUnet implementation.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 6 March 2025.¶
Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
This specification describes the protocol of R5N. R5N is a Distributed Hash Table (DHT). The name is an acronym for "randomized recursive routing for restricted-route networks" and its first academic description can be found in [R5N].¶
DHTs are a key data structure for the construction of decentralized applications and generally provide a robust and efficient means to distribute the storage and retrieval of key-value pairs.¶
The core idea behind R5N is to combine a randomized routing algorithm with an efficient, deterministic closest-peer algorithm. This allows us to construct an algorithm that is able to escape and circumvent restricted route environments while at the same time allow for a logarithmically bounded routing complexity.¶
R5N also includes advanced features like recording the path a key-value pair took through the network, response filters and on-path application-specific data validation.¶
This document defines the normative wire format of peer-to-peer messages, routing algorithms, cryptographic routines and security considerations for use by implementors.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
DHTs usually operate as overlay networks consisting of peers communicating over the existing Internet. Hence canonical DHT designs often assume that the IP protocol provides the peers of the overlay with unrestricted end-to-end pairwise connectivity. However, in practice firewalls and network address translation (NAT) [RFC2663] make it difficult for peers operating on consumer end-devices to directly communicate, especially in the absence of core network infrastructure enabling NAT traversal via protocols such as interactive connectivity establishment (ICE) [RFC5245].¶
Furthermore, not all peer-to-peer networks consistently operate over the Internet, such as mobile ad-hoc networks (MANETs). While routing protocols have been designed for such networks ([RFC3561]) these generally have issues with security in the presence of malicious participants, as they vulnerable to impersonation attacks. The usual solution to these issues is to assert that the entire MANET is a closed network and to require authentication on all control messages. In contrast, the system model for R5N is that of an open network without any kind of authorities that could restrict access only to trusted participants.¶
We assume that the network is open and thus a fraction of the participating peers is malicious. Malicious peers may create, alter, delay or drop messages. We also assume that an adversary can control (or fake) many peers [Sybil], thus any kind of voting or punishment of malicious peers would be rather pointless.¶
Honest peers are expected to establish and maintain many connections. We assume that as a result the adversary is generally unable to prevent honest peers from maintaining a sufficient number of direct connections with other honest peers to achieve acceptable performance. As the number of malicious peers and their connections increases, performance of the system should gracefully degrade, and only collapse for peers that an adversary has fully isolated from the benign network.¶
The main security objectives are to provide honest nodes correct results and to limit the propagation of invalid data. Invalid data includes both invalid key-value pairs as well as invalid routing path data if such routing meta-data is present. While malicious nodes may make up arbitrary key-value pairs and paths within the adversary's domain, invalid key-value pairs are ideally discarded at the first honest node, and path data honestly state entry- and exit-points from the honest network into the subset of malicious nodes.¶
Malicious nodes may attempt to exhaust the storage capacity of honest nodes by distributing well-formed (but possibly otherwise useless) application data. We assume that storage space is relatively cheap compared to bandwidth and that honest nodes also frequently re-publish the useful data that they publish. As a result, an adversary may reduce the effectiveness and longevity of data cached in the DHT, but is assumed to not be able to effectively prevent publication and retrieval of application data by honest nodes.¶
An address is a UTF-8 [RFC3629] string which can be used to address a peer through the Underlay (Section 5). The format of an address is not enforced by this specification, but it is expected that in most cases the address is a URI [RFC3986].¶
HELLO block
is a type of block that is used to store and retrieve addresses of a peer.
It are used by the peer discovery mechanism in Section 6.2.¶
HELLO
URLs are HELLO
blocks represented as URLs.
They are used for out-of-band exchanges of peer addresses
and for signalling address updates to neighbours.
Implementation details of HELLO URLs and examples are found in Appendix C.¶
blocks
can be
stored under the same key. A peer identity is also a key
.
In the context of "key-value stores" this
refers to "key" under which blocks are stored.¶
Restricted-route topologies emerge when a connected underlay topology prevents (or restricts) direct connections between some of the nodes. This commonly occurs through the use of NAT ([RFC2663]). Nodes operated behind a NAT cause common DHT routing algorithms such as Kademlia [Kademlia] to exhibit degraded performance or even to fail. While excluding such nodes is an option, this limits load distribution and is ineffective for some networks, such as MANETs.¶
In general, nodes may not be mutually reachable (for example due to a firewall or NAT) despite being "neighbours" according to the routing table construction algorithm of a particular DHT. For example, Kademlia uses the XOR metric and would generally connect nodes that have peer identities with a small XOR distance. However, the XOR distance between (basically randomly assigned) peer identities is completely divorced from the ability of the nodes to directly communicate. DHTs usually use greedy routing to store data at the peer(s) closest to the key. In cases where a DHT cannot connect peers according to the construction rules of its routing algorithm, the topology may ends up with multiple (local) minima for a given key. Using canonical greedy routing from a particular fixed location in the network, a node may then only be able to publish and retrieve data in the proximity of its local minima.¶
R5N addresses this problem by prepending a random
walk before a classical, deterministic XOR-based routing
algorithm is employed. The optimal number of random hops
taken is equal to the mixing time of the graph. The mixing
time for various graphs is well known; for small-world
networks [Smallworld], the mixing time has
been shown to be around O(log n)
where n
is the number of nodes in the network
[Smallworldmix].¶
Thus, if the network exhibits the properties of a small
world topology [Smallworld], a random walk of
length O(log n)
will cause the algorithm to land on
a random node in the network. Consequently, the
deterministic part of the algorithm will encounter a random
local minimum. It is then possible to repeat this process
in order to store or retrieve data in the context of all or
at least multiple local minima. The ideal length of the
random walk and the number of repetitions expected to cover
all local minima depends on the network topology. Our
design assumes that the benign subset of the network forms a
small-world topology [Smallworld] and then
obtains an estimate of the current number of nodes
n
in the network and then uses log n
for
the actual length of the random walk.¶
[RFC6940] specifies the RELOAD DHT. The R5N DHT described in this document differs from RELOAD in its objectives and thus in its design. The authors of RELOAD make the case that P2P networks are often established among a set of peers that do not trust each other. It addresses this issue by requiring that node identifiers are either assigned by a central authority, or self-issued in the case of closed networks. In other words, by enforcing the P2P network to be established among a set of trusted peers. This misses the point that this openness is a core requirement of efficient and useful DHTs as they serve a fundamental part in a decentralized network infrastructure. R5N, by contrast, is intended for open overlay networks, and thus does not include a central enrollment server to certify participants and does not limit participation in another way. As participants could be malicious, R5N includes on-path customizable key-value validation to delete malformed data and path randomiziation to help evade malicious peers. R5N also expects to perform over a network where not every peer can communicate with every other peer, and where thus its route discovery feature provides utility to higher-level applications. As a result, both the features and the security properties of RELOAD and R5N are different, except in that both allow storing and retrieving key-value pairs.¶
In R5N peers provide to their applications the two fundamental core operations of any DHT:¶
PUT
: This operation stores a block
under a key on one or more peers with
the goal of making the block availiable for queries using the GET
operation.
In the classical definition of a dictionary interface, this operation would be
called "insert".¶
GET
: This operation queries the network of peers for any number of blocks
previously stored under or near a key.
In the classical definition of a dictionary interface, this operation would be
called "find".¶
An example for possible semantics of the above operations provided as an API to applications by an implementation are outlined in Appendix B.¶
A peer does not necessarily need to expose the above operations to applications, but it commonly will. A peer that does not expose the above operations could be a host purely used for bootstrapping, routing or supporting the overlay network with resources.¶
Similarly, there could be hosts on the network that participate in the DHT but do not route traffic or store data. Examples for such hosts would be mobile devices with limited bandwidth, battery and storage capacity. Such hosts may be used to run applications that use the DHT. However, we will not refer to such hosts as peers.¶
In a trivial scenario where there is only one peer (on the local host), R5N operates similarly to a dictionary data structure. However, the default use case is one where nodes communicate directly and indirectly in order to realize a distributed storage mechanism. This communication requires a lower-level peer addressing and message transport mechanism such as TCP/IP. R5N is agnostic to the underlying transport protocol which is why this document defines a common addressing and messaging interface in Section 5. The interface provided by this underlay is used across the specification of the R5N protocol. It also serves as a set of requirements of possible transport mechanisms that can be used to implement R5N with. That being said, common transport protocols such as TCP/IP or UDP/IP and their interfaces are suitable R5N underlays and are used as such by existing implementations.¶
Specifics about the protocols of the underlays implementing the underlay interface or the applications using the DHT are out of the scope of this document.¶
To establish an initial connection to a network of R5N peers, at least one initial, addressable peer is required as part of the bootstrapping process. Further peers, including neighbors, are then learned via a peer discovery process as defined in Section 6.2.¶
Across this document, the functional components of an R5N implementation are divided into routing (Section 6), message processing (Section 7) and block storage (Section 8). Figure 1 illustrates the architectural overview of R5N.¶
A peer MUST support one or more underlay protocols. Peers supporting multiple underlays effectively create a bridge between different networks. How peers are addressed in a specific underlay is out of scope of this document. For example, a peer may have a TCP/IP address, or expose a QUIC endpoint, or both. While the specific addressing options and mechanisms are out of scope for this document, it is necessary to define a universal addressing format in order to facilitate the distribution of address information to other peers in the DHT overlay. This standardized format is the HELLO block (described in Section 8.2), which contains sets of addresses. If the address is a URI, it may indicate which underlay understands the respective address.¶
It is expected that the underlay provides basic mechanisms to manage peer connectivity and addressing. The essence of the underlay interface is captured by the following set of API calls:¶
TRY_CONNECT(P, A)
P
using the given address A
.
If the connection attempt is successful, information on the
new peer connection will be offered through the
PEER_CONNECTED
signal.¶
HOLD(P)
P
. Underlays are usually limited in the number
of active connections. With this function the DHT can indicate to the
underlay which connections should preferably be preserved.¶
DROP(P)
P
. This call is only there for symmetry and
used during the peer's shutdown to release all of the remaining
HOLDs
.
As R5N always prefers the longest-lived
connections, it would never drop an active connection that it
has called HOLD()
on before. Nevertheless, underlay implementations
should not rely on this always being true. A call to DROP()
also
does not imply that the underlay must close the connection: it merely
removes the preference to preserve the connection that was established
by HOLD()
.¶
SEND(P, M)
M
to a peer P
. Sending messages is expected
to be done on a best-effort basis, thus the underlay does not
have to guarantee delivery or message ordering. If the underlay
implements flow- or congestion-control, it may
discard messages to limit its queue size.¶
ESTIMATE_NETWORK_SIZE() -> L2NSE
L2NSE
value must be the base-2 logarithm
of the estimated number of peers in the network.
This estimate is used by the routing algorithm. If the underlay does
not support a protocol for network size estimation (such as
[NSE]) the value is assumed to be provided as
a configuration parameter to the underlay implementation.¶
The above calls are meant to be actively executed by the implementation as part of the peer-to-peer protocol. In addition, the underlay creates signals to drive updates of the routing table, local storage and message processing (Section 7). Specifically, the underlay is expected to emit the following signals (usually implemented as callbacks) based on network events observed by the underlay implementation:¶
PEER_CONNECTED -> P
P
. Such an event triggers, for example,
updates in the routing table and gossiping of HELLOs to that
peer. Underlays may include meta-data about the connection,
for example to indicate that the connection is from a
resource-constrained host that does not intend to function
as a full peer and thus should not be considered
for routing.¶
PEER_DISCONNECTED -> P
ADDRESS_ADDED -> A
A
was added for our local peer and that henceforth the peer
may be reachable under this address. This information is
used to advertise connectivity information about the local
peer to other peers. A
is an
address suitable for inclusion in a HELLO
payload
Section 8.2.¶
ADDRESS_DELETED -> A
A
was removed from the set of addresses the local peer is
possibly reachable under. The signal is used
to stop advertising this address to other peers.¶
RECEIVE -> (P, M)
M
was received from a peer P
.¶
To enable routing, any R5N implementation must keep
information about its current set of neighbors.
Upon receiving a connection notification from the
underlay interface through a
PEER_CONNECTED
signal, information on the new neighbor
MUST be added to the routing table, except if the
respective k
-bucket in the routing table is full or if meta-data
is present that indicates that the peer does not wish to participate
in routing.
Peers added to the routing table SHOULD be signalled to the
underlay as important connections using a HOLD()
call.
Similarly when a disconnect is indicated by the underlay through
a PEER_DISCONNECTED
signal, the peer
MUST be removed from the routing table.¶
To achieve logarithmically bounded routing performance, the data structure for managing neighbors and their metadata MUST be implemented using the k-buckets concept of [Kademlia] as defined in Section 6.1. Maintenance of the routing table (after bootstrapping) is described in Section 6.2.¶
Unlike [Kademlia], routing decisions in R5N are also influenced by a Bloom filter in the message that prevents routing loops. This data structure is discussed in Section 6.3.¶
In order to select peers which are suitable destinations for
routing messages, R5N uses a hybrid approach: Given
an estimated network size L2NSE
retrieved using
ESTIMATE_NETWORK_SIZE()
, the peer selection for the
first L2NSE
hops is random. After the initial
L2NSE
hops, peer selection follows an XOR-based peer
distance calculation.
Section 6.4
describes the corresponding routing functions.¶
Finally, each ResultMessage
is routed back along the
path that the corresponding GetMessage
took
previously. This is enabled by tracking state per
GetMessage
in the pending table described in
Section 6.5.¶
Whenever a PEER_CONNECTED
signal is received from
the underlay, the respective peer is considered for
insertion into the routing table. The routing table
consists of an array of k
-buckets. Each
k
-bucket contains a list of neighbors.
The i-th k
-bucket stores neighbors whose peer
public keys are between XOR-distance 2i and
2i+1 from the local peer; i
can be
directly computed from the two peer identities using the
GetDistance()
function. System constraints will
typically force an implementation to impose some upper limit
on the number of neighbors kept per
k
-bucket. Upon insertion, the implementation
MUST call HOLD()
on the respective
neighor.¶
Implementations SHOULD try to keep at least
5 entries per k
-bucket. Embedded systems that cannot manage
this number of connections MAY use connection-level
signalling to indicate that they are merely a client utilizing a
DHT and not able to participate in routing. DHT peers receiving
such connections MUST NOT include connections to
such restricted systems in their k
-buckets, thereby effectively
excluding them when making routing decisions.¶
If a system hits constraints with respect to
the number of active connections, an implementation
MUST evict neighbours from those k
-buckets with the
largest number of neighbors. The eviction strategy MUST be
to drop the shortest-lived connection per k
-bucket first.¶
Implementations MAY cache valid addresses of disconnected
peers outside of the routing table and sporadically or periodically try to (re-)establish connection
to the peer by making TRY_CONNECT()
calls to the underlay interface
if the respective k
-bucket has empty slots.¶
Initially, implementations require at least one initial connection to a
neighbor (signalled through
PEER_CONNECTED
).
The first connection SHOULD be established
by an out-of-band exchange of the information from a
HELLO
block.
This is commonly achieved through the
configuration of hardcoded bootstrap peers or bootstrap
servers either for the underlay or the R5N
implementation.¶
Implementations MAY have other means to achieve this
initial connection.
For example, implementations could allow the application or
even end-user to provide a working HELLO
which is then in turn used to call TRY_CONNECT()
on
the underlay in order to trigger a subsequent
PEER_CONNECTED
signal from the underlay
interface.
Appendix C specifies
a URL format for encoding HELLO blocks as text strings. The
URL format thus provides a portable, human-readable,
text-based serialization format that can, for example, be
encoded into a QR code for dissemination.
HELLO URLs SHOULD be supported by implementations for
both import and export of HELLOs
.¶
To discover additional peers for its routing table, a peer
MUST initiate GetMessage
requests
(see Section 7.4) asking for blocks of type
HELLO
using its own peer identity in the
QUERY_HASH
field of the message. The
PEER_BF
field of the GetMessage
MUST be
initialized to filter the peer's own peer identity as well as the peer
identities of all currently connected
neighbors. These requests MUST use
the FindApproximate
and
DemultiplexEverywhere
flags. FindApproximate
will ensure that other peers
will reply with results where the keys are merely considered
close-enough, while DemultiplexEverywhere
will
cause each peer on the path to respond if it has relevant
information. The combination of these flags is thus likely
to yield HELLOs
of peers that are useful somewhere
in the initiator's routing table. The RECOMMENDED
replication level to be set in the REPL_LVL
field
is 4. The size and format of the result filter is specified
in Section 8.2. The XQUERY
MUST be empty.¶
In order to facilitate peers answering requests for
HELLOs
, the underlay is expected to provide the
implementation with addresses signalled through
ADDRESS_ADDED
. It is possible that no addresses are
provided if a peer can only establish outgoing connections
and is otherwise unreachable. An implementation
MUST advertise its addresses periodically to
its neighbors through HelloMessages
. The
advertisement interval and expiration SHOULD
be configurable.
If the values are chosen at the discretion of the
implementation, it is RECOMMENDED to choose external
factors such as expiration of DHCP leases to determine the values.
The specific frequency of advertisements
SHOULD be smaller than the expiration
period.
It MAY additionally depend on available bandwidth,
the set of already connected neighbors, the workload of the system and
other factors which are at the discretion of the developer.
If HelloMessages
are not updated before they expire,
peers might be unable to discover and connect to the respective
peer, and thus miss out on quality routing table entries. This
would degrade the performance of the DHT and SHOULD
thus be avoided by advertising updated HELLOs
before the
previous one expires. When using unreliable underlays, an implementation
MAY use higher frequencies and transmit
more HelloMessages
within an expiration interval
to ensure that neighbours almost always have non-expired
HelloMessages
at their disposal even if some messages
are lost.¶
Whenever a peer receives such a HelloMessage
from another peer that is already in the routing
table, it must cache it as long as that peer remains in its
routing table (or until the HELLO
expires) and
serve it in response to GET
requests for
HELLO
blocks (see Section 7.4.3). This behaviour makes it
unnecessary for peers to initiate dedicated
PutMessages
containing HELLO
blocks.¶
As DHT GetMessages
and PutMessages
traverse a random path through the network for the first
L2NSE
hops, a key design objective of
R5N is to avoid routing loops. The peer Bloom
filter is part of the routing metadata in messages to
prevent circular routes. It is updated at each hop where the
hop's peer identity derived from the peer's public key is
added to it.
The peer Bloom filter follows the definition in Appendix A.
It MUST be L=1024
bits
(128 bytes) in size and MUST set k=16
bits per
element.
The set of elements E
consists of of all possible 256-bit peer
public keys and the mapping function M
is defined as follows:¶
M(e) -> SHA-512 (e) as uint32[]
¶
The element e
is the peer public key which is hashed using SHA-512.
The resulting 512-bit peer identity is interpreted as an array of k=16
32-bit integers in network byte order which are used to set and check the bits
in B
using BF-SET()
and BF-TEST()
.¶
At this size, the Bloom filter reaches a false-positive rate of
approximately fifty percent at about 200 entries. For peer
discovery where the Bloom filter is initially populated with
peer identities from the local routing table, the 200
entries would still be enough for 40 buckets assuming 5
peers per bucket, which corresponds to an overlay network
size of approximately 1 trillion peers. Thus,
L=1024
bits should suffice for all conceivable
use-cases.¶
For the next hop selection in both the random
and the deterministic case, any peer which is in the peer
Bloom filter for the respective message is excluded from the
peer selection process.
Any peer which is forwarding GetMessages
or PutMessages
(Section 7) thus adds its own peer public key to the
peer Bloom filter.
This allows other peers to (probabilistically) exclude already
traversed peers when searching for the next hops in the routing table.¶
We note that the peer Bloom filter may exclude peers due to false-postive matches. This is acceptable as routing should nevertheless terminate (with high probability) in close vicinity of the key. Furthermore, due to the randomization of the first L2NSE hops, it is possible that false-positives will be different when a request is repeated.¶
Using the data structures described so far, the R5N routing component provides the following functions for message processing (Section 7):¶
GetDistance(A, B) -> Distance
SelectClosestPeer(K, B) -> N
N
from our
routing table with the shortest XOR-distance to the key K
.
This means that for all other peers N'
in the routing table
GetDistance(N, K) < GetDistance(N',K)
.
Peers with a positive test against the peer Bloom
filter B
are not considered.¶
SelectRandomPeer(B) -> N
N
from
all neighbors.
Peers with a positive test in the peer Bloom
filter B
are not considered.¶
SelectPeer(K, H, B) -> N
N
depending on the
number of hops H
parameter.
If H < NETWORK_SIZE_ESTIMATE
returns SelectRandomPeer(B)
, and otherwise
returns SelectClosestPeer(K, B)
.¶
IsClosestPeer(N, K, B) -> true | false
N
is the closest peer for K
(cf. SelectClosestPeer(K, B)
).
Peers with a positive test in the Bloom filter B
are not considered.¶
ComputeOutDegree(REPL_LVL, HOPCOUNT, L2NSE) -> Number
This function computes the number of neighbors
that a message should be forwarded to. The arguments
are the desired replication level (REPL_LVL
),
the HOPCOUNT
of the message so far and
and the current network size estimate (L2NSE
)
as provided by the underlay.
The result is the non-negative number of next hops to
select. The following figure gives the
pseudocode for computing the number of neighbors
the peer should attempt to forward the message to.¶
The above calculation of FRAC
may yield values that are
not discrete.
The result is FRAC
rounded probabilistically
(PROUND
) to the nearest
discrete value, using the fraction
as the probability for rounding up.
For example, a value of 3.14
is rounded up to 4
with
a probability of 14% and rounded down to 3
with a probability
of 86%.
This probabilistic rounding is necessary to achieve
the statistically expected value of the replication
level and average number of peers a message is forwarded to.¶
R5N performs stateful routing where the messages
only carry the query hash and do not encode the ultimate
source or destination of the request. Routing a request
towards the key is done hop-by-hop using the routing table and the
query hash. The pending table is used to route responses
back to the originator. In the pending table each peer
primarily maps a query hash to the associated
originator of the request. The pending table MUST
store entries for the last MAX_RECENT
requests
the peer has encountered.
To ensure that the peer does
not run out of memory, information about older requests
MAY be discarded.
The value of MAX_RECENT
MAY be configurable
at the host level to use available memory resources without
conflicting with other system requirements and limitations.
MAX_RECENT
SHOULD
be at least 128 * 103.
If the pending table is smaller, the likelihood grows that a peer
receives a response to a query but is unable to forward it to the
initiator because it forgot the predecessor. Eventually, the
initiator would likely succeed by repeating the query.
However, this would be much more expensive than peers having an adequately
sized pending table.¶
For each entry in the pending table, the DHT MUST track the query key, the peer identity of the previous hop, the extended query, requested block type, flags, and the result filter. If the query did not provide a result filter, a fresh result filter MUST still be created to filter duplicate replies. Details of how a result filter works depend on the type, as described in Section 8.1.¶
When a second query from the same origin for the same query hash is received, the DHT MUST attempt to merge the new request with the state for the old request. If this is not possible (say because the MUTATOR differs), the existing result filter MUST be discarded and replaced with the result filter of the incoming message.¶
We note that for local applications, a fixed limit on the number of concurrent requests may be problematic. Hence, it is RECOMMENDED that implementations track requests from local applications separately and preserve the information about requests from local applications until the local application explicitly stops the request.¶
An implementation will process
messages either because it needs to transmit messages as part of routing
table maintenance, or due to requests from local applications, or
because it received a message from a neighbor.
If instructed through an application-facing API such as the one outlined
in Appendix B, a peer acts as an initiator
of GetMessages
or PutMessages
.
The status of initiator is relevant for peers when processing ResultMessages
due to the required handover of results to the application that requested
the respective result.¶
The implementation MUST listen for RECEIVE(P, M)
signals
from the underlay and react to the respective messages sent by
the peer P
.¶
Whether initiated locally or received from a neighbor, an implementation
processes messages according to the wire formats and the required
validations detailed in the following sections.
Where required, the local peer public key is referred to as SELF
.¶
This section describes some data structures and fields shared by various types of messages.¶
Flags is an 8-bit vector representing binary options. Each flag is represented by a bit in the field starting from 0 as the rightmost bit to 7 as the leftmost bit.¶
GetMessages
, or respectively cache the
block in their local storage for PutMessages
and ResultMessages
.¶
GetMessage
.¶
If the RecordRoute
flag is set, the route of a PutMessage
or a ResultMessage
through the overlay network is recorded in the
PATH
field of the message. PATH
is a list of path elements.
A new path element (Section 7.1.3) is appended to the
existing PATH
before a peer sends the message to the next peer.¶
A path element contains a signature and the public key of the peer that
created the element. The signature is computed over the public keys of the
previous peer (from which the message was received) and next peer (the
peer the message is send to). A new message has no previous peer and
uses all ZEROs
(32 NULL-bytes) in the
public key field when creating the signature.¶
Assuming peer A sends a new PUT
message to peer B, which forwards
the message to peer C, which forwards to peer D which finally stores the data.
The PATH
field of the message received at peer D contains three
path elements (build from top to bottom):¶
Note that the wire format of PATH
(Section 7.1.3)
will not include the last public key (Pub C in our example) as this will be
redundant; the receiver of a message can use the public key of the sender as
the public key to verify the last signature.¶
The PATH
is stored along with the payload data from the PUT
message at the final peer. Note that the same payload stored at different
peers will have a different PATH
associated with it.¶
When the storing peer delivers the data based on a GET
request, it
initializes the PATH
field with the stored path value and appends
a new path element. The first part of PATH
in a GET
response
message is called the PutPath
, followed
by the GetPath
. This way the combined PATH
will record the
whole route of the payload from the originating peer (initial
PutMessage
) to the requesting peer (initial GetMessage
).¶
When receiving a message with flag RecordRoute
and PATH
,
a peer is encouraged to verify the integrity of PATH
(if the
available resources of the peer allow this) by checking the signatures of
the path elements.¶
If an invalid signature is detected, the path is truncated keeping only
element fields following the faulty signature and setting the Truncated
flag. Assume that peer C detects a faulty signature from peer B,
the trunacted path has two entries:¶
The Truncated
flag indicates that the first path element does not contain
a signature but only the public key of the peer where the signature fails.¶
A path element represents a hop in the path a message has taken through the overlay network. The wire format of a path element is illustrated in Figure 5.¶
where:¶
An ordered list of path elements may be appended to any routed
PutMessages
or ResultMessages
.
The last signature (after which the peer public key is omitted)
is created by the current hop only after the peer made its routing
decision identifiying the successor peer. The peer public key is not
included after the last signature as it must be that of the sender of
the message and including it would thus be redundant.
Similarly, the predecessor of the first element of
an untruncated path is not stated explicitly, as it must be ZERO
(32 NULL-bytes).¶
Figure 6 shows the wire format of an example
path from peer A over peers B and C and D as it would be received by peer E in the
PUTPATH
of a PutMessage
, or as the combined
PUTPATH
and GETPATH
of a ResultMessage
.
The wire format of the path elements allows a natural
extension of the PUTPATH
along the route of the ResultMessage
to the destination forming the GETPATH
.
The PutMessage
would indicate in the PATH_LEN
field
a length of 3.
The ResultMessage
would indicate a path length of 3 as the
sum of the field values in PUTPATH_L
and GETPATH_L
.
Basically, the last signature does not count for the path length.¶
A path may be truncated in which case the signature of the truncated
path element is omitted leaving only the public key of the peer preceeding
the truncation which is required for the
verification of the subsequent path element signature.
Such a truncated path is indicated with the respective
truncated flag (Section 7.1.1).
For truncated paths, the peer public key of the signer of the last path element is
again omitted as it must be that of
the sender of the PutMesssage
or ResultMessage
. Similarly,
the public key of the receiving peer used in the last path element is omitted as
it must be SELF.
The wire format of a truncated example path from peers B over C and D to E
(possibly still originating at A, but the origin is unknowable to E due to truncation)
is illustrated in Figure 7.
Here, a ResultMessage
would indicate in the PATH_LEN
field
a length of 1 while
a PutMessage
would indicate a length of 1 as the sum of
PUTPATH_L
and GETPATH_L
fields.
Basically, the truncated peer and the last signature do not count
for the path length.¶
The SIGNATURE field in a path element covers a 64-bit contextualization header, the the block expiration, a hash of the block payload, as well as the predecessor peer public key and the peer public key of the successor that the peer making the signature is routing the message to. Thus, the signature made by SELF basically says that SELF received the block payload from PEER PREDECESSOR and has forwarded it to PEER SUCCESSOR. The wire format is illustrated in Figure 8.¶
When the underlay signals the implementation of added or
removed addresses through ADDRESS_ADDED
and
ADDRESS_DELETED
an implementation
MUST disseminate those changes to neighbors
using HelloMessages
(as already discussed in
section Section 6.2). HelloMessages
are used to inform neighbors of a peer about the sender's
available addresses. The recipients use these messages to
inform their respective underlays about ways to sustain the
connections and to generate HELLO
blocks (see Section 8.2) to answer peer discovery queries
from other peers.¶
where:¶
HelloMessage
. Must be zero.
In the future, this may be used to extend or update the HelloMessage
format.¶
ADDRESSES
field.¶
HELLO
block as described in Section 8.2.¶
HELLO
URL).
Stored in network byte order.¶
NUM_ADDRS
addresses which can be used to contact the peer.
Each address MUST be 0-terminated.
If NUM_ADDRS = 0
then this field is omitted (0 bytes).¶
If the initiator of a HelloMessage
is SELF
, the message
is simply sent to all neighbors P
currently in the routing table
using the SEND()
function of the underlay as defined in
Section 5.¶
Upon receiving a HelloMessage
from a peer P
an implementation MUST process it step by step as follows:¶
P
is not in its routing table, the message is discarded.¶
HelloMessage
is used to synthesize a block of type HELLO
(Section 8.2). The block is cached in
the routing table until it expires, or the peer is
removed from the routing table, or the information is
replaced by another message from the peer. The
implementation SHOULD instruct the
underlay to connect to all now available addresses using
TRY_CONNECT()
in order to make the underlay
aware of alternative addresses for this connection and
to maintain optimal connectivity.¶
HelloMessages
MUST NOT
be forwarded.¶
PutMessages
are used to store information at other
peers in the DHT. Any application-facing API which allows
applications to initiate PutMessages
to store data
in the DHT needs to receive sufficient, possibly
implementation-specific information to construct the initial
PutMessage
. In general, application-facing APIs
supporting multiple applications and block types need to be
given the block type (BTYPE
) and message
FLAGS
in addition to the actual BLOCK
payload. The BLOCK_KEY
could be provided
explicitly, or in some cases might be derived using the
DeriveBlockKey()
function from the block type
specific operations defined in Section 8.1.¶
where:¶
PUTPATH
. As PUTPATH
is optional, this value may be zero. If the PUTPATH
is
enabled, set initially to zero by the initiator. Updated
by processing peers to match the path length in the
message.¶
BF-SET()
.¶
PutMessage
wants to store content
under.
Set by the initiator. Read-only.¶
Truncated
flag is set
in FLAGS
. If present, this is the public key of
the peer just before the first entry on the
PUTPATH
and the first peer on the
PUTPATH
is not the actual origin of the
message. Thus, to verify the first signature on the
PUTPATH
, this public key must be used. Note
that due to the truncation, this last hop cannot be
verified to exist. Value is modified by processing
peers.¶
PUT
path.
The path consists of a list of PATH_LEN
path elements.
Set by the initiator to zero.
Incremented by processing peers.¶
RecordRoute
flag
is set in FLAGS
. If present, this is
an EdDSA signature [ed25519] by the sender of this message
(using the same format as the signatures in PUTPATH)
affirming that the sender forwarded the message from
the predecessor (all zeros if PATH_LEN
is zero,
otherwise the last peer in PUTPATH
) to
the target peer.
Modified by processing peers (if flag is set).¶
BTYPE
field. The length is
determined by MSIZE
minus the size of all of
the other fields. Set by the initiator. Read-only.¶
Upon receiving a PutMessage
from a peer P
,
or created through initiation by an overlay API,
an implementation MUST process it step by step as follows:¶
EXPIRATION
field is evaluated.
If the message is expired, it MUST be discarded.¶
BTYPE
is ANY
, then the message
MUST be discarded. If the BTYPE
is not supported by the implementation, no validation of
the block payload is performed and processing continues
at (5). Else, the block MUST be
validated as defined in (3) and (4).¶
BTYPE
. First, the client
attempts to derive the key using the respective
DeriveBlockKey
procedure as described in Section 8.1. If a key can be derived and
does not match, the message MUST be
discarded.¶
ValidateBlockStoreRequest
procedure
for the BTYPE
as described in Section 8.1 is used to validate the block
payload. If the block payload is invalid, the message
MUST be discarded.¶
P
SHOULD be in PEER_BF
. If not,
the implementation MAY log an error, but
MUST continue.¶
RecordRoute
flag is not set, the
PATH_LEN
MUST be set to zero.
If the flag is set and PATH_LEN
is non-zero,
the local peer SHOULD verify the
signatures from the PUTPATH
. Verification
MAY involve checking all signatures or
any random subset of the signatures. It is
RECOMMENDED that peers adapt their
behavior to available computational resources so as to
not make signature verification a bottleneck. If an
invalid signature is found, the PUTPATH
MUST be truncated to only include the
elements following the invalid signature.¶
IsClosestPeer(SELF, BLOCK_KEY,
PeerFilter)
) or the DemultiplexEverywhere
flag ist set, the message SHOULD be
stored locally in the block storage if possible. The
implementation MAY
choose not store the block
if external factors or configurations prevent this, such
as limited (allotted) disk space.¶
BTYPE
of the message indicates a
HELLO
block, the peer MUST be
considered for the local routing table by using the peer
identity in BLOCK_KEY
. If neither the peer is
already connected nor the respective k-bucket is already
full, then the peer MUST try to establish
a connection to the peer indicated in the HELLO
block using the address information from the
HELLO
block and the underlay's
TRY_CONNECT()
function. The implementation
MUST instruct the underlay to try to
connect to all provided addresses using
TRY_CONNECT()
in order to make the underlay
aware of multiple addresses for this connection. When a
connection can be established, the underlay's
PEER_CONNECTED
signal will cause the peer to be
added to the respective k-bucket of the routing table
(Section 6).¶
REPL_LVL
, HOPCOUNT
and FALSE = IsClosestPeer(SELF, BLOCK_KEY,
PeerFilter)
the number of peers to forward to
MUST be calculated using
ComputeOutDegree()
. The implementation
SHOULD select this number of peers
to forward the message to using the function
SelectPeer()
(Section 6.4) using the
BLOCK_KEY
, HOPCOUNT
, and utilizing
PEER_BF
as peer Bloom filter. For each
selected peer PEER_BF
is updated with that peer
in between calls to SelectPeer()
. The
implementation MAY forward to fewer or no
peers in order to handle resource constraints such as
limited bandwidth or simply because there are not enough
suitable connected peers. For each selected peer with
peer identity P
a dedicated
PutMessage_P
is created containing the original
(and where applicable already updated) fields of the
received PutMessage
. In each message
all selected peer identities and the local peer
identity MUST be added to the
PEER_BF
and the HOPCOUNT
MUST be incremented by one. If the
RecordRoute
flag is set, a new path element is
created using the predecessor peer public key and the
signature of the current peer. The path element is
added to the PUTPATH
fields and the
PATH_LEN
field is incremented by one. When
creating the path element signature, the successor must
be set to the recipient peer P
of the
PutMessage_P
. The successor in the new path
element is the recipient peer P
. If the path
becomes too long for the resulting message to be
transmitted by the underlay, it MUST be
truncated. Finally, the messages are sent using
SEND(P, PutMessage_P)
to each recipient.¶
GetMessages
are used to request information from
other peers in the DHT. An application-level API which
allows applications to initiate GetMessages
needs
to provide sufficient, implementation-specific information
needed to construct the initial GetMessage
. For
example, implementations supporting multiple applications
and blocks will need to be given the block type, message
FLAG
parameters and possibly an XQUERY
in
addition to just the QUERY_HASH
. In some cases, it
might also be useful to enable the application to assist in
the construction of the RESULT_FILTER
such that it
can filter already known results. Note that the
RESULT_FILTER
may need to be re-constructed every
time the query is retransmitted by the initiator (details
depending on the BTYPE
) and thus a
RESULT_FILTER
can often not be passed directly as
an argument by the application to an application API.
Instead, applications would typically provide the set of
results to be filtered, allowing the DHT to construct the
RESULT_FILTER
whenever it retransmits a
GetMessage
request as initiator.¶
where:¶
RESULT_FILTER
. Set by the
initiator. Read-only.¶
RF_SIZE
bytes as described in Section 7.4.2.
Set by the initiator. Modified by processing peers
based on results returned.¶
MSIZE
.¶
A result filter is used to indicate to other peers which
results are not of interest when processing a
GetMessage
(Section 7.4). Any peer
which is processing GetMessages
and has a result
which matches the query key MUST check the
result filter and only send a reply message if the result
does not test positive under the result filter. Before
forwarding the GetMessage
, the result filter
MUST be updated using the result of the
BTYPE
-specific FilterResult
(see Section 8.1) function to filter out all
results already returned by the local peer.¶
How a result filter is implemented depends on the block type as described in Section 8.1. Result filters may be probabilistic data structures. Thus, it is possible that a desireable result is filtered by a result filter because of a false-positive test.¶
How exactly a block result is added to a result filter is specified as part of the definition of a block type (cf. Section 8.2).¶
Upon receiving a GetMessage
from a peer P
, or
created through initiation by the overlay API, an
implementation MUST process it step by step as follows:¶
BTYPE
is supported, the
QUERY_HASH
and XQUERY
fields are
validated as defined by the respective
ValidateBlockQuery()
procedure for this type. If
the result yields REQUEST_INVALID
, the message
MUST be discarded and processing ends.
If the BTYPE
is not supported, the message
MUST be forwarded (Skip to step 4). If
the BTYPE
is ANY
, the message is
processed further without validation.¶
P
SHOULD be in the PEER_BF
peer
Bloom filter. If not, the implementation
MAY log an error, but MUST
continue.¶
The local peer SHOULD try to produce a
reply in any of the following cases: (1) If the local
peer is the closest peer (cf. IsClosestPeer(SELF,
QueryHash, PeerFilter)
, or (2) if the
DemultiplexEverywhere
flag is set, or (3) if
the local peer is not the closest and a previously
cached ResultMessage
also matches this
request (Section 7.5.2).¶
The reply is produced (if one is available) using the following steps:¶
BTYPE
is HELLO
, the
implementation MUST only consider
synthesizing its own addresses and the addresses it
has cached for the peers in its routing table as
HELLO
block replies. Otherwise, if the
BTYPE
does not indicate a request for a
HELLO
block or ANY
, the
implementation MUST only consider
blocks in the local block storage and previously
cached ResultMessages
.¶
FLAGS
field includes the flag
FindApproximate
, the peer
SHOULD respond with the closest block
(smallest value of GetDistance(QUERY_HASH,
BLOCK_KEY)
) it can find that is not filtered by
the RESULT_BF
. Otherwise, the peer
MUST respond with the block with a
BLOCK_KEY
that matches the
QUERY_HASH
exactly and that is not filtered
by the RESULT_BF
.¶
ResultMessage
. The
ResultMessage
SHOULD be
transmitted to the neighbor from which the request
was received.¶
Implementations MAY not reply if they
are resource-constrained. However,
ResultMessages
MUST be given
the highest priority among competing transmissions.¶
If the BTYPE
is supported and
ValidateBlockReply
for the given query has
yielded a status of FILTER_LAST
, processing
MUST end and not continue with
forwarding of the request to other peers.¶
MUST
create (or merge) an
entry in the pending table (see Section 6.5) for the query represented by
this GetMessage
. The pending table
MUST store the last MAX_RECENT
requests, and peers thus MUST discard the
oldest existing request if memory constraints on the
pending table are encountered. Note that peers
MUST clean up state for queries that had
response with a status of FILTER_LAST
even if
they are not the oldest query in the pending table.¶
REPL_LVL
, the number of
peers to forward to MUST be calculated
using ComputeOutDegree()
. If there is at least
one peer to forward to, the implementation
SHOULD select up to this number of peers
to forward the message to. Furthermore, the
implementation SHOULD select up to this
number of peers to using the function
SelectPeer()
(from Section 6.4) using the
QUERY_HASH
, HOPCOUNT
, and the
PEER_BF
. The implementation MAY
forward to fewer or no peers in order to handle resource
constraints such as bandwidth. Before forwarding, the
peer Bloom filter PEER_BF
MUST
be updated to filter all selected peers and the local
peer identity SELF
. For all peers with peer
identity P
chosen to forward the message to,
SEND(P, GetMessage_P)
is called. Here,
GetMessage_P
is the original message with the
updated fields for HOPCOUNT
(incremented by
one), updated PEER_BF
and updated
RESULT_FILTER
(based on results already
returned).¶
ResultMessages
are used to return information to
other peers in the DHT or to applications using the overlay
API that previously initiated a GetMessage
. The
initiator of a ResultMessage
is a peer triggered
through the processing of a GetMessage
.¶
where:¶
PutMessage
, possibly setting the
Truncated
bit if the initiator is forced to
truncate the path. For HELLO
blocks, the
FLAGS
should simply be cleared.¶
PUTPATH
. As
PUTPATH
is optional, this value may be zero
even if the message has traversed several peers. Set by
the initiator to the PATH_LEN
of the
PutMessage
from which the block originated.
Modified by processing peers in case of path truncation.¶
GETPATH
. As
GETPATH
is optional, this value may be zero
even if the message has traversed several peers.
MUST be set to zero by the initiator.
Modified by processing peers to match the path length in
the message.¶
PutMessage
from which the block originated.
Read-only.¶
GetMessage
which caused this reply message to be sent. Set by the
initiator using the value of the GetMessage
.
Read-only.¶
Truncated
flag is set
in FLAGS
. If present, this is the public key of
the peer just before the first entry on the
PUTPATH
and the first peer on the
PUTPATH
is not the actual origin of the
message. Thus, to verify the first signature on the
PUTPATH
, this public key must be used. Note
that due to the truncation, this last hop cannot be
verified to exist. Set by processing peers.¶
PUTPATH_L
path elements. Set by the
initiator to the the PUTPATH
of the
PutMessage
from which the block originated.
Modified by processing peers in case of path truncation.¶
GETPATH_L
path elements. Set by
processing peers.¶
RecordRoute
flag is set
in FLAGS
. If present, this is an EdDSA
signature [ed25519] by the sender of this
message (using the same format as the signatures in
PUTPATH
) affirming that the sender forwarded
the message from the predecessor (all zeros if
PATH_LEN
is zero, otherwise the last peer in
PUTPATH
) to the target peer.¶
Upon receiving a ResultMessage
from a connected
peer or triggered by the processing of a
GetMessage
, an implementation MUST
process it step by step as follows:¶
EXPIRATION
field is evaluated. If
the message is expired, it MUST be
discarded.¶
BTYPE
is supported, then the
BLOCK
MUST be validated against
the requested BTYPE
. To do this, the peer
checks that the block is valid using
ValidateBlockStoreRequest
. If the result is
BLOCK_INVALID
, the message MUST
be discarded.¶
PUTPATH_L
or the GETPATH_L
are
non-zero, the local peer SHOULD verify
the signatures from the PUTPATH
and the
GETPATH
. Verification MAY
involve checking all signatures or any random subset of
the signatures. It is RECOMMENDED that
peers adapt their behavior to available computational
resources so as to not make signature verification a
bottleneck. If an invalid signature is found, the path
MUST be truncated to only include the
elements following the invalid signature. In
particular, any invalid signature on the
GETPATH
will cause PUTPATH_L
to be set
to zero.¶
DeriveBlockKey
. This may result in
NONE
. The result is used later. Note that
even if a key was computed, it does not have to match
the QUERY_HASH
.¶
BTYPE
of the message indicates a
HELLO
block, the peer SHOULD be
considered for the local routing table by using the peer
identity computed from the block using
DeriveBlockKey
. An implementation
MAY choose to ignore the HELLO
,
for example because the routing table or the respective
k-bucket is already full. If the peer is a suitable
candidate for insertion, the local peer
MUST try to establish a connection to the
peer indicated in the HELLO
block using the
address information from the HELLO
block and
the underlay's TRY_CONNECT()
function. The
implementation MUST instruct the underlay
to connect to all provided addresses using
TRY_CONNECT()
in order to make the underlay
aware of multiple addresses for this connection. When a
connection is established, the signal
PEER_CONNECTED
will cause the peer to be added
to the respective k-bucket of the routing table (see
Section 6).¶
QUERY_HASH
of this
ResultMessage
does not match an entry in the
pending table (Section 6.5), then the
message is discarded and processing ends. Otherwise,
processing continues for each entry in the table as
follows.¶
FindApproximate
flag was not set in
the query and the BTYPE
enabled the
implementation to compute the key from the block,
the computed key must exactly match the
QUERY_HASH
, otherwise the result does not
match the pending query and processing continues
with the next pending table entry.¶
BTYPE
is supported, result block
MUST be validated against the
specific query using the respective
FilterBlockResult
function. This function
MUST update the result filter if a
result is returned to the originator of the query.¶
BTYPE
is not supported, filtering of
exact duplicate replies MUST still be
performed before forwarding the reply. Such
duplicate filtering MAY be
implemented probabilistically, for example using a
Bloom filter. The result of this duplicate
filtering is always either FILTER_MORE
or
FILTER_DUPLICATE
.¶
RecordRoute
flag is set in
FLAGS
, the local peer identity
MUST be appended to the
GETPATH
of the message and the respective
signature MUST be set using the query
origin as the PEER SUCCESSOR
and the
response origin as the PEER PREDECESSOR
.
If the flag is not set, the GETPATH_L
and
PUTPATH_L
MUST be set to
zero when forwarding the result.¶
FILTER_MORE
or FILTER_LAST
, the
message is forwarded to the origin of the query as
defined in the entry which may either be the local
peer or a remote peer. In case this is a query of
the local peer the result may have to be provided to
applications through the overlay API. Otherwise,
the result is forwarded using SEND(P,
ResultMessage')
where ResultMessage'
is the now modified message. If the result was
FILTER_LAST
, the query MUST
be removed from the pending table.¶
ResultMessages
in order to provide already seen
replies to future GetMessages
. The
implementation MAY choose not no cache
any or a limited number of ResultMessages
for
reasons such as resource limitations.¶
This section describes various considerations R5N implementations must consider with respect to blocks. Specifically, implementations SHOULD be validate and persist blocks. Implementations MAY not support validation for all types of blocks. For example, on some devices, storing blocks is impossible due to lack of storage capacity. Block storage improves lookup performance for local applications and also other peers. Not storing blocks results in degraded performance.¶
The block type determines the format and handling of the block
payload by peers in PutMessages
and
ResultMessages
. Applications can and should define
their own block types. Block types MUST be
registered with GANA (see Section 11.1).
Especially when new block types are introduced, some peers
MAY lack support for the respective block
operations.¶
Block validation operations are used as part of message processing (see Section 7) for all types of DHT messages. To enable these validation operations, any block type specification MUST define the following functions:¶
is used to evaluate the request for a block as part of
GetMessage
processing. Here, the block payload is unkown,
but if possible the XQuery
and Key
SHOULD be verified. Possible values for
the RequestEvaluationResult
are:¶
PutMessage
and
ResultMessage
processing. The special return
value of NONE
implies that this block type does
not permit deriving the key from the block. A
Key
may be returned for a block that is
ill-formed.¶
is used to evaluate a block payload as part of
PutMessage
and ResultMessage
processing. Possible values for the
BlockEvaluationResult
are:¶
Mutator
value which MAY be used to
deterministically re-randomize probabilistic data
structures. RF
MUST be a byte
sequence suitable for transmission over the network.¶
is used to filter results against specific queries.
This function does not check the validity of
Block
itself or that it matches the given key,
as this must have been checked earlier. Locally
stored blocks from previously observed
ResultMessages
and PutMessages
use
this function to perform filtering based on the request
parameters of a particular GET operation. Possible
values for the FilterEvaluationResult
are:¶
Block
is a valid result, and there may be more.¶
Block
is the last possible valid result.¶
Block
is a valid result, but considered to be
a duplicate (was filtered by the RF
) and
SHOULD NOT be returned to the previous
hop. Peers that do not understand the block type
MAY return such duplicate results
anyway and implementations must take this into account.¶
Block
does not satisfy the constraints
imposed by the XQuery
. The result
SHOULD NOT be returned to the previous
hop. Peers that do not understand the block type
MAY return such irrelevant results
anyway and implementations must take this into account.¶
If the main evaluation result is FILTER_MORE
,
the function also returns an updated result filter
where the Block
is added to the set of
filtered replies. An implementation is not expected
to actually differenciate between the
FILTER_DUPLICATE
and
FILTER_IRRELEVANT
return values: in both
cases the Block
is ignored for this query.¶
For bootstrapping and peer discovery, the DHT
implementation uses its own block type called "HELLO".
HELLO
blocks are the only type of block that
MUST be supported by every R5N
implementation. A block with this block type contains the
peer public key of the peer that published the
HELLO
together with a set of addresses of this
peer. The key of a HELLO
block is the SHA-512
hash [RFC4634] of the peer public key and
thus the peer's identity in the DHT.¶
The HELLO
block type wire format is illustrated
in Figure 13. A query for block of
type HELLO
MUST NOT include
extended query data (XQuery). Any implementation
encountering a request for a HELLO
with non-empty
XQuery data MUST consider the request
invalid and ignore it.¶
ADDRESSES
belong. This is also the public key
needed to verify the SIGNATURE
.¶
HELLO
URL).¶
is the EdDSA signature [ed25519] of the
HELLO
block. The signature covers various
information derived from the actual data in the
HELLO
block. The data signed over includes the
block expiration time, a constant that uniquely
identifies the purpose of the signature, and a hash of
the addresses with 0-terminators in the same order as
they are present in the HELLO
block. The
format is illustrated in Figure 14.¶
HELLO
in microseconds since midnight (0 hour),
January 1, 1970 in network byte order.¶
HELLO
.
H_ADDRS
is generated over the
ADDRESSES
field as provided in the
HELLO
block using SHA-512 [RFC4634].¶
The HELLO
block operations MUST be
implemented as follows:¶
HELLO
is to
simply check that the XQuery
is empty. If it is empty,
REQUEST_VALID
ist returned. Otherwise,
REQUEST_INVALID
is returned.¶
HELLO
is to simply
hash the PEER PUBLIC KEY
from the HELLO
. The
result of this function is thus always the SHA-512 hash over
the PEER PUBLIC KEY
.¶
SIGNATURE
over the hashed ADDRESSES
against the public key from the PEER PUBLIC KEY
field. If the signature is valid BLOCK_VALID
is
returned. Otherwise BLOCK_INVALID
is returned.¶
The RF
for HELLO
blocks is implemented
using a Bloom filter following the definition from Appendix A and consists of a variable number
of bits L
. L
depends on the
FilterSize
which will be the number of
connected peers |E|
known to the peer creating a
HELLO
block from its own addresses: L
is
set to the minimum of 218 bits (215
bytes) and the lowest power of 2 that is strictly larger
than 2*K*|E|
bits (K*|E|/4
bytes).¶
The k
-value for the Bloom filter
MUST be 16. The elements used in the Bloom
filter consist of an XOR between the H_ADDRS
field (as computed using SHA-512 over the
ADDRESSES
) and the SHA-512 hash of the
MUTATOR
field from a given HELLO
block.
The mapping function M(H_ADDRS XOR MUTATOR
) is
defined as follows:¶
M(e = H_ADDR XOR MUTATOR) -> e as uint32[]
¶
M
is an identity function and returns the 512-bit
XOR result unmodified. This resulting byte string is
interpreted as k=16 32-bit integers in network byte order
which are used to set and check the bits in B
using BF-SET()
and BF-TEST()
. The
32-bit MUTATOR
is prepended to the L-bit Bloom filter
field HELLO_BF
containing B
to create
the result filter for a HELLO
block:¶
where:¶
The MUTATOR
value is used to additionally
"randomize" the computation of the Bloom filter while
remaining deterministic across peers. It is only ever set
by the peer initiating the GET request, and changed every
time the GET request is repeated. Peers forwarding GET
requests MUST not change the mutator value
included in the RESULT_FILTER
as they might not
be able to recalculate the result filter with a different
MUTATOR
value.¶
Consequently, repeated requests have statistically independent probabilities of creating false-positives in a result filter. Thus, even if for one request a result filter may exclude a result as a false-positive match, subsequent requests are likely to not have the same false-positives.¶
HELLO
result filters can be merged if the Bloom
filters have the same size and MUTATOR
by setting
all bits to 1 that are set in either Bloom filter. This
is done whenever a peer receives a query with the same
MUTATOR
, predecessor and Bloom filter size.¶
H_ADDRS
field is XORed with the SHA-512 hash
of the MUTATOR
field from the HELLO
block and the resulting value is checked against the
Bloom filter in RF
. Consequently,
HELLOs
with completely identical sets of
addresses will be filtered and FILTER_DUPLICATE
is returned. Any small variation in the set of addresses
will cause the block to no longer be filtered (with high
probability) and FILTER_MORE
is returned.¶
An implementation SHOULD provide a local persistence mechanism for blocks. Embedded systems that lack storage capability MAY use connection-level signalling to indicate that they are merely a client utilizing a DHT and are not able to participate with storage. The local storage MUST provide the following functionality:¶
PUTPATH
(and if applicable
TRUNCATED ORIGIN
) of that version of the block.¶
Over time a peer may accumulate a significant number of blocks
which are stored locally in the persistence layer.
Due to the expected high number of blocks, the method to
retrieve blocks close to the specified lookup key in the
LookupApproximate
API must be implemented with care
with respect to efficiency.¶
It is RECOMMENDED to limit the number of
results from the LookupApproximate
procedure to a
result size which is easily manageable by the local system.
The RECOMMENDED default is to return blocks
with the four closest keys. Note that filtering by the
RF
will be done by the DHT afterwards and it is
NOT RECOMMENDED to fetch additional records
even if all four closest keys are filtered by the
RF
. The main reason for this is to ensure peers do
not spend extensive resources to process approximate
lookups. In particular, implementations MUST
limit the worst-case effort they spent on approximate
lookups.¶
In order to efficiently find a suitable result set, the implementation SHOULD follow the following procedure:¶
An implementation MAY decide to use a custom algorithm in order to find the closest blocks in the local storage. But, especially for more primitive approaches (such as only comparing XOR distances for all blocks in the storage), more simplistic procedures may become ineffective for large data sets and care MUST be taken to strictly bound the maximum effort expended per query.¶
An implementation MUST implement an eviction strategy for blocks stored in the block storage layer.¶
In order to ensure the freshness of blocks, an implementation MUST evict expired blocks in favor of new blocks.¶
An implementation MAY preserve blocks which are often requested. This approach can be expensive as it requires the implementation to keep track of how often a block is requested.¶
An implementation MAY preserve blocks which are close to the local peer public key.¶
An implementation MAY provide configurable storage quotas and adapt its eviction strategy based on the current storage size or other constrained resources.¶
If an upper bound to the maximum number of neighbors in a k-bucket is reached, the implementation MUST prefer to preserve the oldest working connections instead of new connections. This makes Sybil attacks [Sybil] less effective as an adversary would have to invest more resources over time to mount an effective attack.¶
The ComputeOutDegree
function limits the
REPL_LVL
to a maximum of 16. This imposes
an upper limit on bandwidth amplification an attacker
may achieve for a given network size and topology.¶
We note that peers implementing disjoint sets of underlay protocols may experience difficulties communicating (unless other peers bridge the respective underlays). Similarly, peers that do not support a particular application will not be able to validate application-specific payloads and may thus be tricked into storing or forwarding corrupt blocks.¶
When a FindApproximate
flag is encountered in a
query, a peer will try to respond with the closest block it
has that is not filtered by the result Bloom filter
(RF
). Implementations MUST ensure
that the cost of evaluating any such query is reasonably
small. For example, implementations SHOULD
consider ways to avoid an exhaustive search of their
database. Not doing so can lead to denial of service
attacks as there could be cases where too many local results
are filtered by the result filter.¶
By design R5N does not rely on strict admission control through the use of either centralized enrollment servers or pre-shared keys. This is a key distintion over protocols that do rely on this kind of access control such as [RFC6940] which, like R5N, provides a peer-to-peer (P2P) signaling protocol with extensible routing and topology mechanisms. Some decentralized applications, such as the GNU Name System ([RFC9498]), require an open system that enables ad-hoc participation.¶
Applications using the DHT APIs to store application-specific block types may have varying security and privacy requirements. R5N does NOT provide any kind confidentiality or privacy for blocks, for example through the use of cryptography. This must be provided by the application as part of the block type design. One example where confidentiality and privacy are required are GNS records and their record blocks as defined in [RFC9498]. Other possibilities to protect the block objects may be implemented using ideas from other efforts such as Oblivious HTTP and its encapsulation of HTTP requests and responses [RFC9458].¶
R5N makes heavy use of the Ed25519 cryptosystem.
It cannot be ruled out that the relevant primitives
are broken at any point in the future.
In this case, the R5N design can be reused by modifying
the messages and related artifacts defined in
Section 7.1.
In order to extend and modify the R5N protocol
in general and to replace cryptographic in particular,
new message types (MTYPE
fields) can be registered in
[GANA] and the message formats updated accordingly.
Peers processing messages MUST NOT
modify the MTYPE
field in order to prevent
possible security downgrades.¶
IANA maintains a registry called the "Uniform Resource Identifier (URI) Schemes" registry. The registry should be updated to include an entry for the 'gnunet' URI scheme. IANA is requested to update that entry to reference this document when published as an RFC.¶
GANA [GANA] is requested to create a "DHT Block Types" registry. The registry shall record for each entry:¶
The registration policy for this sub-registry is "First Come First Served", as described in [RFC8126]. GANA created the registry as follows:¶
GANA [GANA] is requested to create a "gnunet://" sub-registry. The registry shall record for each entry:¶
The registration policy for this sub-registry is "First Come First Served", as described in [RFC8126]. GANA created this registry as follows:¶
GANA amended the "GNUnet Signature Purpose" registry as follows:¶
GANA is requested to amend the "GNUnet Message Type" registry as follows:¶
R5N uses Bloom filters in several places. This section gives some general background on Bloom filters and defines functions on this data structure shared by the various use-cases in R5N.¶
A Bloom filter (BF) is a space-efficient probabilistic datastructure to test if an element is part of a set of elements. Elements are identified by an element ID. Since a BF is a probabilistic datastructure, it is possible to have false-positives: when asked if an element is in the set, the answer from a BF is either "no" or "maybe".¶
Bloom filters are defined as a string of L
bits.
The bits are initially always empty, meaning that the bits are set to
zero.
There are two functions which can be invoked on the Bloom filter "bf":
BF-SET(bf, e) and BF-TEST(bf, e) where "e" is an element that is to
be added to the Bloom filter or queried against the set.¶
A mapping function M is used to map each ID of each element from the set to a subset of k bits. In the original proposal by Bloom, M is non-injective and can thus map the same element multiple times to the same bit. The type of the mapping function can thus be described by the following mathematical notation:¶
When adding an element to the Bloom filter bf
using
BF-SET(bf, e)
, each integer n
of the mapping
M(e)
is interpreted as a bit offset n mod L
within
bf
and set to 1.¶
When testing if an element may be in the Bloom filter bf
using
BF-TEST(bf,e)
, each bit offset n mod L
within
bf
MUST have been set to 1.
Otherwise, the element is not considered to be in the Bloom filter.¶
An implementation of this specification commonly exposes the two overlay operations "GET" and "PUT". The following are non-normative examples of APIs for those operations. Their behaviour is described prosaically in order to give implementers a fuller picture of the protocol.¶
A basic GET operation interface may be exposed as:¶
GET(Query-Key, Block-Type) -> Results as List
¶
The procedure typically takes at least two arguments to initiate a lookup:¶
QueryKey
:The GET procedure may allow additional optional parameters in order to control or modify the query:¶
Block-Type
.
A Block-Type
must define if the XQuery
can or must
be used and what the specific format of its contents should be.
Extended queries are in general used to implement domain-specific filters.
These might be particularly useful in combination with FindApproximate
to add a well-defined filter by an application-specific distance.
Regardless, the DHT does not define any particular semantics for an XQuery.
See also Section 8.¶
Block-type
-specific filter
which allows applications to
indicate results which are
not relevant anymore to the
caller (see Section 7.4.2).¶
The GET procedure should be implemented as an asynchronous operation that returns individual results as they are found in the DHT. It should terminate only once the application explicitly cancels the operation. A single result commonly consists of:¶
Block-Type
.¶
FindApproximate
was set.¶
RecordRoute
flag
was set by the application calling the PUT procedure. The
reported path may have been truncated from the beginning.
The API SHOULD signal truncation by exposing
the Truncated
flag.¶
RecordRoute
flag was set
for the GET procedure. The reported path may have been
truncated from the beginning. The API
SHOULD signal truncation by exposing the
Truncated
flag. As the block was cached by the
node at the end of this path, this path is more likely to
be stale compared to the GET-Path
.¶
GET-Path
or PUT-Path
was truncated, otherwise false.¶
A PUT operation interface may be exposed as:¶
PUT(Key, Block-Type, Block-Expiration, Block-Data)
¶
The procedure typically takes at least four parameters:¶
The PUT procedure may accept additional optional parameters that control or modify the operation:¶
The PUT procedure does not necessarily yield any information.¶
The general format of a HELLO
URL uses "gnunet://"
as the scheme, followed by "hello/" for the name
of the GNUnet subsystem, followed by "/"-separated values
with the GNS Base32 encoding [RFC9498] of
the peer public key, a Base32-encoded EdDSA signature
[ed25519], and an expiration
time in seconds since the UNIX Epoch in decimal format.
After this a "?" begins a list of key-value pairs where the key
is the URI scheme of one of the peer's addresses and the value
is the URL-escaped payload of the address URI without the "://".¶
The general syntax of HELLO
URLs specified using
Augmented Backus-Naur Form (ABNF) of [RFC5234] is:¶
'scheme' is defined in [RFC3986] in Section 3.1. 'pchar' is defined in [RFC3986], Appendix A.¶
For example, consider the following URL:¶
It specifies that the peer with the pid
"1MVZ..."
is reachable via "foo" at "example.com" and "bar+baz" at
"1.2.3.4" on port 5678 until
1708333757 seconds after the Epoch. Note that "foo"
and "bar+baz" here are underspecified and just used as a simple example.
In practice, the addr-name
refers to a scheme supported by a
DHT underlay.¶