Internet-Draft mojette-encoding October 2024
Haynes & Evenou Expires 18 April 2025 [Page]
Workgroup:
Network File System Version 4
Internet-Draft:
draft-haynes-nfsv4-mojette-encoding-00
Published:
Intended Status:
Standards Track
Expires:
Authors:
T. Haynes
Hammerspace
P. Evenou
Hammerspace

The Mojette Transformation for the Erasure Encoding of Files in NFSv4.2

Abstract

Parallel NFS (pNFS) allows a separation between the metadata (onto a metadata server) and data (onto a storage device) for a file. The flex file layout type version 2 further allows for erasure encoding types to provide data integrity. In this document, a new erasure encoding type for the Mojette Transformation is introduced.

This note is to be removed before publishing as an RFC.

Discussion of this draft takes place on the NFSv4 working group mailing list (nfsv4@ietf.org), which is archived at https://mailarchive.ietf.org/arch/browse/nfsv4/. Working Group information can be found at https://datatracker.ietf.org/wg/nfsv4/about/.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 18 April 2025.

Table of Contents

1. Introduction

In Parallel NFS (pNFS) (see Section 12 of [RFC8881]), the metadata server returns layout type structures that describe where file data is located. There are different layout types for different storage systems and methods of arranging data on storage devices. [RFCTBD09] defined the Flexible File Version 2 Layout Type used with file-based data servers that are accessed using the NFS protocols: NFSv3 [RFC1813], NFSv4.0 [RFC7530], NFSv4.1 [RFC8881], and NFSv4.2 [RFC7862]. This document introduces a new Erasure Encoding Type called Mojette Transformation.

Using the process detailed in [RFC8178], the revisions in this document become an extension of NFSv4.2 [RFC7862]. They are built on top of the external data representation (XDR) [RFC4506] generated from [RFC7863].

1.1. Notations

x_i:
The ith value of x.
sum_{x=0}^{10} f(x_i):
The sum of the function f applied to the values of x ranging from 0 to 10. I.e, The first '{}' describes the lower range of x and the second '{}' describes the upper range of x.

1.2. Requirements Language

The key words 'MUST', 'MUST NOT', 'REQUIRED', 'SHALL', 'SHALL NOT', 'SHOULD', 'SHOULD NOT', 'RECOMMENDED', 'NOT RECOMMENDED', 'MAY', and 'OPTIONAL' in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

2. Mojette Transform

2.1. Introduction

The Mojette Transform is an erasure coding technique that provides fault tolerance for data storage systems by enabling the recovery of lost data blocks. This section describes the integration of the systematic Mojette Transform into the NFS protocol, focusing on encoding and decoding file system blocks, typically sized at 4KB or 8KB.

2.1.1. Encoding

The Mojette Transform involves the following steps to encode a data block:

Initialization:
Each data block is treated as a 2D grid of data elements (pixels). Typically, a block is structured as a matrix of size PxQ, where P and Q are the dimensions of the grid.
Projections Calculation:
Projections are computed along specific directions defined by pairs of coprime integers (p_i, q_i). Each projection sums the values of the data elements (pixels) along a line defined by these directions. The size of a projection is given as in Figure 1. For a given projection direction (p_i, q_i), the projection values are calculated as in Figure 2, where Delta is 1 if the argument is zero and 0 otherwise.
Size of projection = (P - 1) x |q_i| + (Q - 1) * |p_i| + 1
Figure 1
Projection(b, p_i, q_i) =
    sum_{k=0}^{Q-1} sum_{l=0}^{P-1}
            Data(k, l) x Delta(b - l x p_i + k x q_i)
Figure 2

2.1.2. Decoding

Decoding the Mojette Transform is the inverse of encoding, involving the reconstruction of data from projection data to fill an empty grid. This involves solving a system of linear equations defined by the projection differences and the projection directions (p_i, q_i). The algorithm iterates to refine the values of the missing data elements until the original data block is reconstructed.

Data reconstruction is possible if Katz's criterion holds, which was extended to any convex shape. The Katz's criterion specifies that reconstruction is valid if for a given set of n projections along n directions (p_i, q_i) either sum_{i=0}^{n} q_i ≥ Q or sum_{i=0}^{n} p_i ≥ P.

Adjusting the number of lines Q and the projections set allows setting a desired fault-tolerance threshold.

For example a 64x4 grid can be decoded by the projection set {(0, 1), (1, 1), (2, 1), (3, 1)} as sum_{i=1}^{4} q_i = 4.

2.1.3. Systematic and Non Systematic Implementations

A systematic code is an error-correcting code where the input data is embedded directly in the encoded output. In contrast, a non-systematic code produces an output that does not contain the original input symbols. The Mojette Transform can be implemented in both ways, allowing it to adapt to various use cases.

2.1.4. Data Block Representation

In the context of NFS, a data block corresponds to a file system block, which is a contiguous segment of data, typically 4KB or 8KB in size. The Mojette Transform encodes these blocks to ensure data integrity and availability in distributed storage environments.

2.2. Non-Systematic Mojette Transform

2.2.1. Block Encoding

In the non-systematic version of the Mojette Transform, the original data block is not directly included in the encoded output. Instead, the entire encoded output consists of projections computed from the original data. The number of computed projections n is larger than the number of projections m required to rebuild the initial data.

2.2.2. Block Decoding

To decode a file system block that has undergone the non-systematic Mojette Transform, the following steps are followed:

Identify Available Projections:
Determine which projections are available. A least m projections (what ever they are) out of n must be available.
Recompute Data Block:
Apply the inverse Mojette Transform to rebuild the original Data.

2.2.3. Example

Assume a file system block of 4KB is divided into a 128x4 matrix of 128-bit elements. Using the non-systematic Mojette Transform, we compute projections along selected directions, such as (-2,1), (-1, -1), (0,1), (1,1), (2,1) and (3,1). The original data is not stored directly; instead, the projections are stored.

If a data loss occurs, for instance, if two projections are lost, the missing elements can be recovered by using the remaining projections and solving the inverse problem. Any set of 4 projections among the 6 generated can rebuild exactly the original data block

2.3. Systematic Mojette Transform

2.3.1. Block Encoding

In the systematic version, the original data block (file system block) is part of the encoded output. Additional projections are calculated to provide redundancy. If k is the number of original data blocks and n is the total number of encoded blocks (including projections), the systematic code will have the first k blocks as the original data and the remaining n - k blocks as projections.

2.3.2. Block Decoding

To decode a file system block that has undergone the Systematic Mojette Transform, the following steps are followed:

Identify Missing Data:
Determine which data lines are missing in the block. Let e be the number of missing lines.
Recompute Projections:
Compute the projections of the available (partial) data blocks. Calculate the differences between the projections of the full data and the partial data.
Combine Existing Data and Recomputed projection:
Recreate the full block of data by combining the existing data with the reconstructed data according to their positions in the block.

2.3.3. Example

Assume a file system block of 4KB is divided into a $64\times4$ matrix of 128-bit elements. Using the systematic Mojette Transform, we first compute projections along selected directions, such as (0,1), and (1,1). The original 4 blocks of 64 128-bit elements remains part of the encoded data, and the 2 additional projections are stored for redundancy. If a data loss occurs, the missing elements can be recovered by using the projections and solving the inverse problem.

2.4. Conclusion

The Mojette Transform provides two implementations for an efficient and effective way to enhance data reliability by encoding file system blocks with additional projections. These methods ensure that data can be reconstructed even in the presence of failures, thereby enhancing the fault tolerance of the file system.

In summary, the Mojette Transform offers robust solutions for data reliability in file systems, balancing redundancy, efficiency, and performance to ensure data integrity and quick recovery from failures.

2.4.1. Benefits of Non-Systematic Mojette Transform:

Fast Failure Detection:

By storing only encoded data and having the ability to rebuild the original block from any sufficient subset of projections, the non-systematic encoding implementation offers great flexibility and allows fast failure detection and recovery.

Constant Performance:

The non-systematic decoding algorithm is the most performant of all implementations. Although decoding always occurs, the overhead is low, and unlike systematic encoding, performance remains constant regardless of the number of failures.

2.4.2. Benefits of Systematic Mojette Transform:

Redundancy Reduction:

Systematic Mojette coding reduces redundancy by integrating the original data blocks into the encoded data, unlike non-systematic codes that generate entirely new data from the original.

Efficiency:

Fewer projections need to be calculated and stored, reducing both computational and storage overhead.

Performance:

Decoding is faster and simpler, especially when some original data blocks are available, enabling quicker data access. However, performance is slightly degraded in the case of failures and depends on the number of failures.

3. Extension of Flexible File Layout Type Version 2 Encoding Type

3.1. ffv2_encoding_type

   /// enum ffv2_encoding_type {
   ///     FFV2_ENCODING_MIRRORED               = 0x1;
   ///     FFV2_ENCODING_MOJETTE_SYSTEMATIC     = 0x2;
   ///     FFV2_ENCODING_MOJETTE_NON_SYSTEMATIC = 0x3;
   /// };
Figure 3

ffv2_encoding_type is extended in Figure 3 to introduce two different erasure encoding types: FFV2_ENCODING_MOJETTE_SYSTEMATIC and FFV2_ENCODING_MOJETTE_NON_SYSTEMATIC. They are intoduced at this level instead of at the ffv2_encoding_data (see Figure 5) in order for the client to negotiate the support of one over the other with the NFS4ERR_ERASURE_ENCODING_NOT_SUPPORTED error (see Section ZZZ of [RFCTBD09]).

3.2. ffv2_mojette_faulty_devices

/// enum ffv2_mojette_faulty_devices {
///         FFV2_MOJETTE_FAULTY_DEVICES_2_1       = 0x1;
///         FFV2_MOJETTE_FAULTY_DEVICES_4_1       = 0x2;
///         FFV2_MOJETTE_FAULTY_DEVICES_4_2       = 0x3;
///         FFV2_MOJETTE_FAULTY_DEVICES_8_1       = 0x4;
///         FFV2_MOJETTE_FAULTY_DEVICES_8_2       = 0x5;
///         FFV2_MOJETTE_FAULTY_DEVICES_8_3       = 0x6;
///         FFV2_MOJETTE_FAULTY_DEVICES_8_4       = 0x7;
/// };
Figure 4

The ffv2_mojette_faulty_devices (see Figure 4) can be used in both the layout_hint (see both Section 5.12.4 of [RFC8881] and Section XXX of [RFCTBD09]) and the ffl_encoding_type_data (see Section YYY of [RFCTBD09]) to convey the distribution of FFV2_DS_FLAGS_ACTIVE and FFV2_DS_FLAGS_SPARE projection blocks (see Section XXX of [RFCTBD09]) in the layouts for the Flexible File Version 2 Layout Type. The name of each of the enum targets ends with 'X_Y', which states that there is a need for X+Y files to compose the projection. X is the number of FFV2_DS_FLAGS_ACTIVE blocks and Y is the number of FFV2_DS_FLAGS_SPARE blocks.

3.3. ffv2_encoding_data

/// union ffv2_encoding_type_data
///          switch (ffv2_encoding_type fetd_type) {
///     case FFV2_ENCODING_MIRRORED:
///         void;
///     case FFV2_ENCODING_MOJETTE_SYSTEMATIC:
///     case FFV2_ENCODING_MOJETTE_NON_SYSTEMATIC:
///         ffv2_mojette_faulty_devices
///                       fetd_mojette_potection_configuration;
/// };
Figure 5

This addition of FFV2_ENCODING_MOJETTE_SYSTEMATIC and FFV2_ENCODING_MOJETTE_NON_SYSTEMATIC to the ffv2_encoding_type_data (see Figure 5) allows for the metadata server to inform the client as to the expected distribution of FFV2_DS_FLAGS_ACTIVE and FFV2_DS_FLAGS_SPARE projection blocks inside the ffm_data_servers arm for the Mojette erasure encoding types.

4. Examples Tying It All Together

4.1. Block_Sizes

Consider a FFV2_ENCODING_MOJETTE_NON_SYSTEMATIC encoding type which needs 6 active blocks and 2 spare blocks in the payload (Not a valid ffv2_mojette_faulty_devices, but needed to show varied block sizes). As can be seen in Table 1 , 4kB data blocks need blocks about 1kB in size. But not all blocks are the same size.

Table 1: Example sizes of a Mojette Projection
Projection ID 0 1 2 3 4 5 6 7
p value -3 -2 -1 0 1 2 0 0
q value 1 1 1 1 1 1 0 0
size (bytes) 1048 1080 1080 1048 1064 1048 0 0

5. Extraction of XDR

This document contains the external data representation (XDR) [RFC4506] description of the uncacheable attribute. The XDR description is embedded in this document in a way that makes it simple for the reader to extract into a ready-to-compile form. The reader can feed this document into the following shell script to produce the machine readable XDR description of the new flags:

<CODE BEGINS>
#!/bin/sh
grep '^ *///' $* | sed 's?^ */// ??' | sed 's?^ *///$??'


<CODE ENDS>

That is, if the above script is stored in a file called 'extract.sh', and this document is in a file called 'spec.txt', then the reader can do:

<CODE BEGINS>
sh extract.sh < spec.txt > mojette_encoding_prot.x


<CODE ENDS>

The effect of the script is to remove leading white space from each line, plus a sentinel sequence of '///'. XDR descriptions with the sentinel sequence are embedded throughout the document.

Note that the XDR code contained in this document depends on types from the NFSv4.2 nfs4_prot.x file (generated from [RFC7863]), the Flex Files Layout Type flexfiles.x file (generated from [RFC8435]), and the Flex Files Layout Type version 2 flexfilesv2.x (generated from [RFCTBD09]). This includes both nfs types that end with a 4, such as offset4, length4, etc., as well as more generic types such as uint32_t and uint64_t.

While the XDR can be appended to that from [RFC7863], the various code snippets belong in their respective areas of that XDR.

6. Security Considerations

This document has the same security considerations as both Flex Files Layout Type version 1 (see Section 15 of [RFC8435]) and NFSv4.2 (see Section 17 of [RFC7862]).

7. IANA Considerations

This document introduces changes in the 'Flex Files V2 Erasure Encoding Type Registry'. This document defines both the FFV2_ENCODING_MOJETTE_SYSTEMATIC and FFV2_ENCODING_MOJETTE_NON_SYSTEMATIC types for Client-Side Mojette Transformations.

Table 2: Flex Files V2 Erasure Encoding Type Assignments
Erasure Encoding Type Name Value RFC How Minor Versions
FFV2_ENCODING_MOJETTE_SYSTEMATIC 2 RFCTBD10 L 2
FFV2_ENCODING_MOJETTE_NON_SYSTEMATIC 3 RFCTBD10 L 2

8. References

8.1. Normative References

[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, , <https://www.rfc-editor.org/info/rfc2119>.
[RFC4506]
Eisler, M., Ed., "XDR: External Data Representation Standard", STD 67, RFC 4506, DOI 10.17487/RFC4506, , <https://www.rfc-editor.org/info/rfc4506>.
[RFC7530]
Haynes, T., Ed. and D. Noveck, Ed., "Network File System (NFS) Version 4 Protocol", RFC 7530, DOI 10.17487/RFC7530, , <https://www.rfc-editor.org/info/rfc7530>.
[RFC7862]
Haynes, T., "Network File System (NFS) Version 4 Minor Version 2 Protocol", RFC 7862, DOI 10.17487/RFC7862, , <https://www.rfc-editor.org/info/rfc7862>.
[RFC7863]
Haynes, T., "Network File System (NFS) Version 4 Minor Version 2 External Data Representation Standard (XDR) Description", RFC 7863, DOI 10.17487/RFC7863, , <https://www.rfc-editor.org/info/rfc7863>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, , <https://www.rfc-editor.org/info/rfc8174>.
[RFC8178]
Noveck, D., "Rules for NFSv4 Extensions and Minor Versions", RFC 8178, DOI 10.17487/RFC8178, , <https://www.rfc-editor.org/info/rfc8178>.
[RFC8435]
Halevy, B. and T. Haynes, "Parallel NFS (pNFS) Flexible File Layout", RFC 8435, DOI 10.17487/RFC8435, , <https://www.rfc-editor.org/info/rfc8435>.
[RFC8881]
Noveck, D., Ed. and C. Lever, "Network File System (NFS) Version 4 Minor Version 1 Protocol", RFC 8881, DOI 10.17487/RFC8881, , <https://www.rfc-editor.org/info/rfc8881>.
[RFCTBD09]
Haynes, T., "Erasure Encoding of Files in NFSv4.2", , <https://www.ietf.org/archive/id/draft-haynes-nfsv4-erasure-encoding-00.xml>.

8.2. Informative References

[RFC1813]
Callaghan, B., Pawlowski, B., and P. Staubach, "NFS Version 3 Protocol Specification", RFC 1813, DOI 10.17487/RFC1813, , <https://www.rfc-editor.org/info/rfc1813>.

Appendix A. Acknowledgments

The following from Hammerspace were instrumental in driving the incorporation of the Mojette Transformation into an encoding type for Flexible Files Version 2 Layout Type: David Flynn, Trond Myklebust, Tom Haynes, Didier Feron, Jean-Pierre Monchanin, Pierre Evenou, and Brian Pawlowski.

Appendix B. RFC Editor Notes

This section is to be removed before publishing as an RFC.

[RFC Editor: prior to publishing this document as an RFC, please replace all occurrences of RFCTBD10 with RFCxxxx where xxxx is the RFC number of this document]

[RFC Editor: prior to publishing this document as an RFC, please replace all occurrences of RFCTBD09 with RFCyyyy where yyyy is the RFC number of the document: draft-haynes-nfsv4-erasure-encoding.xml]

Authors' Addresses

Thomas Haynes
Hammerspace
Pierre Evenou
Hammerspace