The Mojette Transformation for the Erasure Encoding of Files in NFSv4.2

Internet-Draft	mojette-encoding	October 2024
Haynes & Evenou	Expires 18 April 2025	[Page]

Abstract

Parallel NFS (pNFS) allows a separation between the metadata (onto a metadata server) and data (onto a storage device) for a file. The flex file layout type version 2 further allows for erasure encoding types to provide data integrity. In this document, a new erasure encoding type for the Mojette Transformation is introduced.¶

2. Mojette Transform

2.1. Introduction

The Mojette Transform is an erasure coding technique that provides fault tolerance for data storage systems by enabling the recovery of lost data blocks. This section describes the integration of the systematic Mojette Transform into the NFS protocol, focusing on encoding and decoding file system blocks, typically sized at 4KB or 8KB.¶

2.1.1. Encoding

The Mojette Transform involves the following steps to encode a data block:¶

Initialization:: Each data block is treated as a 2D grid of data elements (pixels). Typically, a block is structured as a matrix of size PxQ, where P and Q are the dimensions of the grid.¶
Projections Calculation:: Projections are computed along specific directions defined by pairs of coprime integers (p_i, q_i). Each projection sums the values of the data elements (pixels) along a line defined by these directions. The size of a projection is given as in Figure 1. For a given projection direction (p_i, q_i), the projection values are calculated as in Figure 2, where Delta is 1 if the argument is zero and 0 otherwise.¶

Size of projection = (P - 1) x |q_i| + (Q - 1) * |p_i| + 1

Figure 1

Projection(b, p_i, q_i) =
    sum_{k=0}^{Q-1} sum_{l=0}^{P-1}
            Data(k, l) x Delta(b - l x p_i + k x q_i)

Figure 2

2.1.2. Decoding

Decoding the Mojette Transform is the inverse of encoding, involving the reconstruction of data from projection data to fill an empty grid. This involves solving a system of linear equations defined by the projection differences and the projection directions (p_i, q_i). The algorithm iterates to refine the values of the missing data elements until the original data block is reconstructed.¶

Data reconstruction is possible if Katz's criterion holds, which was extended to any convex shape. The Katz's criterion specifies that reconstruction is valid if for a given set of n projections along n directions (p_i, q_i) either sum_{i=0}^{n} q_i ≥ Q or sum_{i=0}^{n} p_i ≥ P.¶

Adjusting the number of lines Q and the projections set allows setting a desired fault-tolerance threshold.¶

For example a 64x4 grid can be decoded by the projection set {(0, 1), (1, 1), (2, 1), (3, 1)} as sum_{i=1}^{4} q_i = 4.¶

2.1.3. Systematic and Non Systematic Implementations

A systematic code is an error-correcting code where the input data is embedded directly in the encoded output. In contrast, a non-systematic code produces an output that does not contain the original input symbols. The Mojette Transform can be implemented in both ways, allowing it to adapt to various use cases.¶

2.1.4. Data Block Representation

In the context of NFS, a data block corresponds to a file system block, which is a contiguous segment of data, typically 4KB or 8KB in size. The Mojette Transform encodes these blocks to ensure data integrity and availability in distributed storage environments.¶

2.2. Non-Systematic Mojette Transform

2.2.1. Block Encoding

In the non-systematic version of the Mojette Transform, the original data block is not directly included in the encoded output. Instead, the entire encoded output consists of projections computed from the original data. The number of computed projections n is larger than the number of projections m required to rebuild the initial data.¶

2.2.2. Block Decoding

To decode a file system block that has undergone the non-systematic Mojette Transform, the following steps are followed:¶

Identify Available Projections:: Determine which projections are available. A least m projections (what ever they are) out of n must be available.¶
Recompute Data Block:: Apply the inverse Mojette Transform to rebuild the original Data.¶

2.2.3. Example

Assume a file system block of 4KB is divided into a 128x4 matrix of 128-bit elements. Using the non-systematic Mojette Transform, we compute projections along selected directions, such as (-2,1), (-1, -1), (0,1), (1,1), (2,1) and (3,1). The original data is not stored directly; instead, the projections are stored.¶

If a data loss occurs, for instance, if two projections are lost, the missing elements can be recovered by using the remaining projections and solving the inverse problem. Any set of 4 projections among the 6 generated can rebuild exactly the original data block¶

2.3. Systematic Mojette Transform

2.3.1. Block Encoding

In the systematic version, the original data block (file system block) is part of the encoded output. Additional projections are calculated to provide redundancy. If k is the number of original data blocks and n is the total number of encoded blocks (including projections), the systematic code will have the first k blocks as the original data and the remaining n - k blocks as projections.¶

2.3.2. Block Decoding

To decode a file system block that has undergone the Systematic Mojette Transform, the following steps are followed:¶

Identify Missing Data:: Determine which data lines are missing in the block. Let e be the number of missing lines.¶
Recompute Projections:: Compute the projections of the available (partial) data blocks. Calculate the differences between the projections of the full data and the partial data.¶
Combine Existing Data and Recomputed projection:: Recreate the full block of data by combining the existing data with the reconstructed data according to their positions in the block.¶

2.3.3. Example

Assume a file system block of 4KB is divided into a $64\times4$ matrix of 128-bit elements. Using the systematic Mojette Transform, we first compute projections along selected directions, such as (0,1), and (1,1). The original 4 blocks of 64 128-bit elements remains part of the encoded data, and the 2 additional projections are stored for redundancy. If a data loss occurs, the missing elements can be recovered by using the projections and solving the inverse problem.¶

2.4. Conclusion

The Mojette Transform provides two implementations for an efficient and effective way to enhance data reliability by encoding file system blocks with additional projections. These methods ensure that data can be reconstructed even in the presence of failures, thereby enhancing the fault tolerance of the file system.¶

In summary, the Mojette Transform offers robust solutions for data reliability in file systems, balancing redundancy, efficiency, and performance to ensure data integrity and quick recovery from failures.¶

2.4.1. Benefits of Non-Systematic Mojette Transform:

Fast Failure Detection:: By storing only encoded data and having the ability to rebuild the original block from any sufficient subset of projections, the non-systematic encoding implementation offers great flexibility and allows fast failure detection and recovery.¶
Constant Performance:: The non-systematic decoding algorithm is the most performant of all implementations. Although decoding always occurs, the overhead is low, and unlike systematic encoding, performance remains constant regardless of the number of failures.¶

2.4.2. Benefits of Systematic Mojette Transform:

Redundancy Reduction:: Systematic Mojette coding reduces redundancy by integrating the original data blocks into the encoded data, unlike non-systematic codes that generate entirely new data from the original.¶
Efficiency:: Fewer projections need to be calculated and stored, reducing both computational and storage overhead.¶
Performance:: Decoding is faster and simpler, especially when some original data blocks are available, enabling quicker data access. However, performance is slightly degraded in the case of failures and depends on the number of failures.¶

3. Extension of Flexible File Layout Type Version 2 Encoding Type

3.1. ffv2_encoding_type

   /// enum ffv2_encoding_type {
   ///     FFV2_ENCODING_MIRRORED               = 0x1;
   ///     FFV2_ENCODING_MOJETTE_SYSTEMATIC     = 0x2;
   ///     FFV2_ENCODING_MOJETTE_NON_SYSTEMATIC = 0x3;
   /// };

Figure 3

ffv2_encoding_type is extended in Figure 3 to introduce two different erasure encoding types: FFV2_ENCODING_MOJETTE_SYSTEMATIC and FFV2_ENCODING_MOJETTE_NON_SYSTEMATIC. They are intoduced at this level instead of at the ffv2_encoding_data (see Figure 5) in order for the client to negotiate the support of one over the other with the NFS4ERR_ERASURE_ENCODING_NOT_SUPPORTED error (see Section ZZZ of [RFCTBD09]).¶

3.2. ffv2_mojette_faulty_devices

/// enum ffv2_mojette_faulty_devices {
///         FFV2_MOJETTE_FAULTY_DEVICES_2_1       = 0x1;
///         FFV2_MOJETTE_FAULTY_DEVICES_4_1       = 0x2;
///         FFV2_MOJETTE_FAULTY_DEVICES_4_2       = 0x3;
///         FFV2_MOJETTE_FAULTY_DEVICES_8_1       = 0x4;
///         FFV2_MOJETTE_FAULTY_DEVICES_8_2       = 0x5;
///         FFV2_MOJETTE_FAULTY_DEVICES_8_3       = 0x6;
///         FFV2_MOJETTE_FAULTY_DEVICES_8_4       = 0x7;
/// };

Figure 4

The ffv2_mojette_faulty_devices (see Figure 4) can be used in both the layout_hint (see both Section 5.12.4 of [RFC8881] and Section XXX of [RFCTBD09]) and the ffl_encoding_type_data (see Section YYY of [RFCTBD09]) to convey the distribution of FFV2_DS_FLAGS_ACTIVE and FFV2_DS_FLAGS_SPARE projection blocks (see Section XXX of [RFCTBD09]) in the layouts for the Flexible File Version 2 Layout Type. The name of each of the enum targets ends with 'X_Y', which states that there is a need for X+Y files to compose the projection. X is the number of FFV2_DS_FLAGS_ACTIVE blocks and Y is the number of FFV2_DS_FLAGS_SPARE blocks.¶

3.3. ffv2_encoding_data

/// union ffv2_encoding_type_data
///          switch (ffv2_encoding_type fetd_type) {
///     case FFV2_ENCODING_MIRRORED:
///         void;
///     case FFV2_ENCODING_MOJETTE_SYSTEMATIC:
///     case FFV2_ENCODING_MOJETTE_NON_SYSTEMATIC:
///         ffv2_mojette_faulty_devices
///                       fetd_mojette_potection_configuration;
/// };

Figure 5

This addition of FFV2_ENCODING_MOJETTE_SYSTEMATIC and FFV2_ENCODING_MOJETTE_NON_SYSTEMATIC to the ffv2_encoding_type_data (see Figure 5) allows for the metadata server to inform the client as to the expected distribution of FFV2_DS_FLAGS_ACTIVE and FFV2_DS_FLAGS_SPARE projection blocks inside the ffm_data_servers arm for the Mojette erasure encoding types.¶

Table 1: Example sizes of a Mojette Projection
Projection ID	0	1	2	3	4	5
p value	-3	-2	-1	0	1	2
q value	1	1	1	1	1	1
size (bytes)	1048	1080	1080	1048	1064	1048

5. Extraction of XDR

This document contains the external data representation (XDR) [RFC4506] description of the uncacheable attribute. The XDR description is embedded in this document in a way that makes it simple for the reader to extract into a ready-to-compile form. The reader can feed this document into the following shell script to produce the machine readable XDR description of the new flags:¶

<CODE BEGINS>
#!/bin/sh
grep '^ *///' $* | sed 's?^ */// ??' | sed 's?^ *///$??'


<CODE ENDS>

That is, if the above script is stored in a file called 'extract.sh', and this document is in a file called 'spec.txt', then the reader can do:¶

<CODE BEGINS>
sh extract.sh < spec.txt > mojette_encoding_prot.x


<CODE ENDS>

The effect of the script is to remove leading white space from each line, plus a sentinel sequence of '///'. XDR descriptions with the sentinel sequence are embedded throughout the document.¶

Note that the XDR code contained in this document depends on types from the NFSv4.2 nfs4_prot.x file (generated from [RFC7863]), the Flex Files Layout Type flexfiles.x file (generated from [RFC8435]), and the Flex Files Layout Type version 2 flexfilesv2.x (generated from [RFCTBD09]). This includes both nfs types that end with a 4, such as offset4, length4, etc., as well as more generic types such as uint32_t and uint64_t.¶

While the XDR can be appended to that from [RFC7863], the various code snippets belong in their respective areas of that XDR.¶

Table 2: Flex Files V2 Erasure Encoding Type Assignments
Erasure Encoding Type Name	Value	RFC	How	Minor Versions
FFV2_ENCODING_MOJETTE_SYSTEMATIC	2	RFCTBD10	L	2
FFV2_ENCODING_MOJETTE_NON_SYSTEMATIC	3	RFCTBD10	L	2

8. References

8.1. Normative References

[RFC2119]: Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.
[RFC4506]: Eisler, M., Ed., "XDR: External Data Representation Standard", STD 67, RFC 4506, DOI 10.17487/RFC4506, May 2006, <https://www.rfc-editor.org/info/rfc4506>.
[RFC7530]: Haynes, T., Ed. and D. Noveck, Ed., "Network File System (NFS) Version 4 Protocol", RFC 7530, DOI 10.17487/RFC7530, March 2015, <https://www.rfc-editor.org/info/rfc7530>.
[RFC7862]: Haynes, T., "Network File System (NFS) Version 4 Minor Version 2 Protocol", RFC 7862, DOI 10.17487/RFC7862, November 2016, <https://www.rfc-editor.org/info/rfc7862>.
[RFC7863]: Haynes, T., "Network File System (NFS) Version 4 Minor Version 2 External Data Representation Standard (XDR) Description", RFC 7863, DOI 10.17487/RFC7863, November 2016, <https://www.rfc-editor.org/info/rfc7863>.
[RFC8174]: Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, <https://www.rfc-editor.org/info/rfc8174>.
[RFC8178]: Noveck, D., "Rules for NFSv4 Extensions and Minor Versions", RFC 8178, DOI 10.17487/RFC8178, July 2017, <https://www.rfc-editor.org/info/rfc8178>.
[RFC8435]: Halevy, B. and T. Haynes, "Parallel NFS (pNFS) Flexible File Layout", RFC 8435, DOI 10.17487/RFC8435, August 2018, <https://www.rfc-editor.org/info/rfc8435>.
[RFC8881]: Noveck, D., Ed. and C. Lever, "Network File System (NFS) Version 4 Minor Version 1 Protocol", RFC 8881, DOI 10.17487/RFC8881, August 2020, <https://www.rfc-editor.org/info/rfc8881>.
[RFCTBD09]: Haynes, T., "Erasure Encoding of Files in NFSv4.2", October 2024, <https://www.ietf.org/archive/id/draft-haynes-nfsv4-erasure-encoding-00.xml>.

8.2. Informative References

[RFC1813]: Callaghan, B., Pawlowski, B., and P. Staubach, "NFS Version 3 Protocol Specification", RFC 1813, DOI 10.17487/RFC1813, June 1995, <https://www.rfc-editor.org/info/rfc1813>.

The Mojette Transformation for the Erasure Encoding of Files in NFSv4.2

Abstract

Status of This Memo

Copyright Notice

Table of Contents

1. Introduction

1.1. Notations

1.2. Requirements Language