Packed CBOR: Table permutations

Internet-Draft	Packed CBOR: Table permutations	July 2024
Amsüss	Expires 30 January 2025	[Page]

Abstract

Packed CBOR is a compression mechanism for Concise Binary Object Representation (CBOR) that can be used without a decompression step. This document introduces a means for altering configured tables to optimize the use of efficient encoding points by shuffling around entries in the packing tables.¶

2. Setting up the tables by reference

CBOR tag CPA115 describes a permutation of the CBOR packing tables around a rump. Unlike tags CPA113 and most other table setup tags, it does not add items; instead, it modifies the table such that items with very short encoded lengths (most efficient are the first 16 items of the shared table, and the first item of the argument items in straight use, and the first 8 items of the inverted items) can be used even when they were not originally assigned to those positions.¶

Without excluding others, there are two groups of use cases envisioned:¶

Rearranging preconfigured tables.¶

If a media type comes with a pre-configured table, and/or when one ore more external tables are referenced (e.g. by using [I-D.amsuess-cbor-packed-by-reference]), a single tag CPA115 can be used around the main rump of the document to pull the most frequently used entries into the prominent positions.¶
Local optimization.¶

If some table entries are frequent in a sub-tree of the document, using tag CPA115 around that subtree can be beneficiall to overall compactness even when the table was set up locally (using tag CPA113) with an ordering suitable for the whole document.¶

Packed-Shuffle = #6.<cpa115>([
    shuffle-shared,
    ?shuffle-argument,
    rump])
rump = any
shuffle-shared = shuffle
shuffle-argument = shuffle
shuffle = [+(offset, ?length)]
offset = uint
length = nint

cpa115 = 115   ; preliminary value, see IANA considerations

2.1. Permutation semantics

Inside the rump, each table has a permutation applied compared to the outer table, described in the suffle-shared value for the shared table, and in the shuffle-argument value for the argument table (which defaults to the identical permutation).¶

The applied permutations are described in inverse (so that a table index used in the rump gets mapped to a table index outside this tag): The item shuffle consecutively lists the lowest items of the inner table by indicating their postions in the outer table. A negative number indicates that a group of in total (1 - length) items are included consecutively starting with the item before the negative number. This creates a kind of run-length encoding. At the end of the shuffle array, all items that were not referenced are appended in their original sequence from the outer table.¶

An inner index can be calculated in a way suitable to constrained systems by applying the algorighm in Appendix A.¶

2.2. Examples

Outer table:
 A B C D E F G H

Permutation CBOR item:
 [1 / pick the "B" /, 4 / pick the "E" /, -2 / "3 items" /]

Inner table:
 B E F G A C D H

Figure 1: Single permutation illustrated

1113([
  ["tick.", "tock.", "tickety."],
  ["The clock goes ", "And it goes "],
    [
    [
      6(simple(0)), / "The clock goes tick." /
      6(simple(1)), / "The clock goes tock." /
      224(simple(2)), / "And it goes tickety." /
      6(simple(1)), / "The clock goes tock." /
    ]
    115([[], [1], [
      / The argument table is now:
        ["And it goes ", "The clock goes "]
      /
      6(simple(0)), / "And it goes tick." /
      6(simple(1)), / "And it goes tock." /
      6(simple(2)), / "And it goes tickety." /
      6(simple(1)), / "And it goes tock." /
    ]]),
    [
      6(simple(1)), / "The clock goes tock." /
    ]
  ]
])

Figure 2: Using permutations to make the 6() straight argument available.

Note that in the tick/tock example, using 6() over 224() saves just 1 byte, whereas the setup around tag CPA115 costs 6 bytes. This particular use would break even at 6 uses of tag 6.¶

Appendix A. Example implementation

The following algorithm illustrates how table lookup can be implemented without the need for variable size storage per compression table.¶

An implementation in fixed memory is expected to accommodate up to a fixed size of nested table setup tags. When parsing the shuffle item, if may calculate early_item_count and store it for later use, along with a reference to the position of the shuffle table, through which it iterates again for every item to be unpacked.¶

def early_item_count(shuffle):
    count = 0
    for (offset, length) in shuffle:
        length = 1 if length is None else 1 - length
        count += length
    return count

def lookup(index, shuffle, outertable):
    shuffle_early_items = early_item_count(shuffle)
    if index < shuffle_early_items:
        for (offset, length) in shuffle:
            length = 1 if length is None else 1 - length
            if index < length:
                return outertable[offset + index]
            index -= length
    else:
        outer_index = index - shuffle_early_items
        for (offset, length) in shuffle:
            length = 1 if length is None else 1 - length
            if offset <= outer_index:
                outer_index += length
        return outertable[outer_index]

Appendix B. Open issues

For this to progress it will need some real use cases, not just synthetic examples.¶
An unoptimized version of this can be achieved with tag CPA113 from [I-D.ietf-cbor-packed], without the numeric encoding of items and the special encoding for consecutive items. Using CPA113 this way is relatively easy for shared items; for argument items, it requires referencing them from the shared table or performing a no-op concatenation with an empty item.¶
Is skipping of used values worth it? If we accepted that this tag produced not a permutation but an extended copy of the outer table, we could do without the double iteration / pre-computation of the number of items. This would be a simplification, and more similar to using CPA113. The advantage of the permutation construction is that the non-optimized entries are not pushed back more than necessary, saving a bit (in some cases, literally) in the encoded size of those points.¶

Is there maybe a different expression of the permutation that has the same nice properties added complexity? (Stating the number of items in front of the shuffle item saves the double processing, but adds redundant information and does not help with the branching in the implementation).¶

Whether this document should progress on from the initial version should depend on whether good use cases are identified and whether it outperforms alternative solutions by a sufficient margin.¶

Packed CBOR: Table permutations

Abstract

About This Document

Status of This Memo

Copyright Notice

Table of Contents

1. Introduction

2. Setting up the tables by reference

2.1. Permutation semantics

2.2. Examples

3. Security Considerations

4. IANA Considerations

4.1. CBOR Tags Registry

5. References

5.1. Normative References

5.2. Informative References

Appendix A. Example implementation

Appendix B. Open issues

Author's Address