Computing Aware Traffic Steering (CATS) with Generic Metric

Internet-Draft	CATS with Generic Metric	October 2024
Yuan, et al.	Expires 24 April 2025

Abstract

Steering traffic for computing-related services according to computing resources and conditions is under discussion in the CATS WG. Publishing services and updating computing conditions is therefore a significant issue. Evaluating overall performance and achieving appropriate traffic steering and scheduling requires common metrics of the same types from both the network and the service instances. This draft proposes and discusses an implementation of distributed CATS in which generic metrics are delivered and distributed using BGP.

1. Introduction

For computing-related services such as AR/VR and the metaverse, the performance experienced by clients is determined not only by network metrics but also by computing conditions. Relevant use cases and problem statements are discussed in [I-D.ietf-cats-usecases-requirements]. For the CATS framework introduced in [I-D.ietf-cats-framework], publishing and updating computing metrics is an essential issue.

In general, the control plane for CATS could be organized and deployed in various forms, depending on the specific schemes for computing metrics collection and notification, instance selection, path calculation, and other workflows. For distributed metrics collection and distributed control plane implementations in particular, protocols such as BGP, BGP-LS, and IGPs could be extended to support metrics distribution and collection.

Computing metrics could be classified into multiple types and categories. A typical analysis of computing metrics is presented in [I-D.ysl-cats-metric-definition]. In general, metrics could be converted, abstract, and generic values, or explicit metadata. To achieve end-to-end service provisioning, metrics of the same dimension from the network infrastructure and the service instances SHOULD be considered together, while metric types unique to computing MAY be processed independently.

General considerations for metrics that MAY be distributed and used in CATS are discussed below.

Generic and common metrics: for instance, latency, bandwidth, and converted abstract metrics or costs (TE metrics, costs, etc.). Service instances and computing resources share these metric types with the network infrastructure. The sum of the latencies reflects the end-to-end delay. Similarly, the minimum bandwidth along the forwarding path indicates the overall capacity. Thus, generic metrics from the network and from service instances SHOULD be evaluated together.
Unique metrics originating from specific areas (computing-related services, clusters, etc.): for instance, computing capabilities, available memory, and existing connections. Network devices and links commonly have no comparable metrics. Thus, when these metrics are distributed into the network, they are unique types that are not natively recognized, and they would be evaluated independently.
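
The two composition rules for generic metrics described above (latencies add up, bandwidth is limited by the narrowest segment) can be illustrated with a minimal Python sketch. The Segment type, the end_to_end function, and all numeric values are assumptions made for illustration only; they are not defined by this draft.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One piece of the end-to-end path: a link, a policy, or a service instance."""
    name: str
    latency_ms: float      # additive metric
    bandwidth_mbps: float  # bottleneck (minimum) metric


def end_to_end(segments):
    """Combine generic metrics: latencies are summed, bandwidth is the minimum."""
    total_latency = sum(s.latency_ms for s in segments)
    bottleneck = min(s.bandwidth_mbps for s in segments)
    return total_latency, bottleneck


# Illustrative values only.
path = [
    Segment("policy between CATS-Forwarders", 12.0, 1000.0),
    Segment("link to load balancer", 3.0, 400.0),
    Segment("service instance", 10.0, 250.0),
]
latency, bandwidth = end_to_end(path)
print(f"end-to-end latency {latency} ms, capacity {bandwidth} Mbit/s")
```

Unique metrics such as available memory have no counterpart on network segments, so they would not take part in this kind of composition.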


                                        Computing Resources
                                        Inst latency
                                        Service bandwidth
                                        Abstract metrics
                                                            +---+
                                                         +-----+ )
                                                 +----- +|C-SMA|  +
                                                /      ( +-----+   )
                                               /      ( +--+    --  )
+--------------+              +--------------+/     (   |LB|---(  )  )
|CATS-Forwarder|--------------|CATS-Forwarder|------(   +--+    --   )
+--------------+              +--------------+       (              )
                 Network                              +------------+
                 link(policy) latency                 Service Instance
                 link(policy) bandwidth
                 link(policy) metric

Figure 1: Network and Computing Metrics

In distributed control plane scenarios, especially when service traffic needs to traverse multiple ASes, computing metrics SHOULD be distributed among CATS-Forwarders and considered when performing ordered updates of routes. This draft therefore proposes a distribution scheme based on the generic metric introduced in [I-D.ietf-idr-bgp-generic-metric]. The generic metric is designed to accumulate and propagate different types of metrics, which aids intent-based end-to-end path selection across BGP domains. CATS SHOULD likewise be recognized as an intent-based end-to-end routing scenario. Computing-related services would be identified with multiple intents, so these intents and the relevant metrics SHOULD be distributable. Furthermore, computing metrics, especially generic and common types, need to be accumulated and processed along the distribution path. The detailed implementation is discussed in the following sections.

3. Generic Metrics for CATS

The Accumulated Metric (AMetric) is defined in [I-D.ietf-idr-bgp-generic-metric].


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Accumulated Metric Code    |   Accumulated Metric Length   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Accumulated Metric Data...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 2: AMetric TLV

The values of the Metric Type field in the Accumulated Metric Data would be taken from the IGP registry for metric types. Thus, in a CATS scenario, parameters such as latency, upstream/downstream bandwidth, and the configured TE metric of service instances could be encoded accordingly, so that they are processed in a general accumulative manner along the path.


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Metric-type 1 | Metric-flags1 | Metric 1 value...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Metric-type 2 | Metric-flags2 | Metric 2 value...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Service Metric| Metric-flags  | Service Metric value...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 3: Accumulative Metrics
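
To make the layout in Figures 2 and 3 concrete, the following Python sketch packs and unpacks an AMetric TLV. Several details are assumptions made only for this sketch and are not taken from this draft or from [I-D.ietf-idr-bgp-generic-metric]: the code point values, a fixed 32-bit metric value, and a Length field that counts only the bytes following the 4-byte header.

```python
import struct

# Hypothetical code points, chosen only for this sketch.
AMETRIC_CODE = 1
METRIC_TYPE_DELAY = 1
METRIC_TYPE_SERVICE = 128


def encode_metric(metric_type, flags, value):
    """Pack one entry: 8-bit type, 8-bit flags, assumed 32-bit value."""
    return struct.pack("!BBI", metric_type, flags, value)


def encode_ametric(entries):
    """Pack the 16-bit code and 16-bit length, followed by the entries."""
    data = b"".join(encode_metric(*entry) for entry in entries)
    # Assumption: Length covers only the data after the 4-byte header.
    return struct.pack("!HH", AMETRIC_CODE, len(data)) + data


def decode_ametric(buf):
    """Return (code, [(type, flags, value), ...]) from an encoded TLV."""
    code, length = struct.unpack_from("!HH", buf, 0)
    entries = []
    offset = 4
    while offset < 4 + length:
        entries.append(struct.unpack_from("!BBI", buf, offset))
        offset += 6  # each entry is 1 + 1 + 4 bytes in this sketch
    return code, entries


tlv = encode_ametric([(METRIC_TYPE_DELAY, 0, 15), (METRIC_TYPE_SERVICE, 0, 30)])
print(tlv.hex())
print(decode_ametric(tlv))
```

A real implementation would take the field widths and code points from the referenced specification and the relevant registries rather than from the placeholders used here.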

Besides the metric types defined in the IGP registry, metric types unique to a CATS scenario could also be considered as extensions to the current AMetric scheme. Suppose a general Service Metric or Cost is defined that specifies the estimated or tested performance of a service instance as an abstract value. With a normalized Service Metric and multiple dimensions of existing generic metrics, CATS could be implemented in various patterns. Typical scenarios are described in the following sections.

4. Scenario 1: Minimum End-to-end Latency for Computing-related Service


                                                       +---+
                                                    +-----+ )
                                                   +|C-SMA|  +
                                                  ( +-----+   )
                                                 ( +--+    --  )
+---------+     policy     +---------+         (   |LB|---(  )  )
|CATS     |----------------|CATS     |---------(   +--+    --   )
|Forwarder|----------------|Forwarder|---+      (              )
+---------+                +---------+    \      +------------+
                                           \                 +---+
           \\                               \             +-----+ )
            \\                               \           +|C-SMA|  +
             \\                               \         ( +-----+   )
              \\                               \       ( +--+    --  )
               \\                               \    (   |LB|---(  )  )
                \\  policy                       +---(   +--+    --   )
                 \\                                   (              )
                  \\                                   +------------+
                   \\                                  +---+
                    \\                              +-----+ )
                     \\                            +|C-SMA|  +
                      \\                          ( +-----+   )
                         +---------+             ( +--+    --  )
                         |CATS     |           (   |LB|---(  )  )
                         |Forwarder|-----------(   +--+    --   )
                         +---------+            (              )
                                                 +------------+

Figure 4: Minimum End-to-end Latency for Computing-related Service

C-SMAs collect computing-related metrics and pre-process the relevant metadata. C-SMAs would be configured to establish BGP sessions with CATS-Forwarders and thus distribute and update computing metrics using the Generic Metric attribute. Suppose the services deployed here require minimum end-to-end latency; the delay would then be carried in the update messages as a Generic Metric. Here, service routes MAY be distributed with a load balancer as the next hop.
Services would be deployed in VRFs or in a public VRF. CATS-Forwarders might be enabled to measure the latency to their associated load balancers. Service routes for the same prefix are thus updated with accumulated latency values, each of which includes the processing delay of the service instance and the measured delay between the CATS-Forwarder and the load balancer. Routes for the same service prefix are compared and ordered by accumulated latency. Once a best route is selected, it is distributed to the remote device with the next hop set to the CATS-Forwarder itself.
Similarly, remote CATS-Forwarders would be able to measure the latency of policies or network links. CATS-Forwarders could therefore calculate the end-to-end latency for each candidate service instance over the resolved TE policies. Ordered updates are performed in the same way and best routes are determined accordingly. Because the delay is accumulated along the path over which service routes are distributed, the accumulation helps remote CATS-Forwarders perform latency-intent-based path selection.
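
The workflow above can be sketched in Python as follows. This is a simplified model under stated assumptions, not a BGP implementation: ServiceRoute, readvertise, best_route, the prefix, the node names, and all delay values are hypothetical and serve only to show how latency accumulates hop by hop and how the minimum is selected at each CATS-Forwarder.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ServiceRoute:
    prefix: str
    next_hop: str
    latency_ms: int  # accumulated latency carried with the route


def readvertise(route, local_delay_ms, self_address):
    """Add the locally measured delay and set the next hop to this forwarder."""
    return replace(route, next_hop=self_address,
                   latency_ms=route.latency_ms + local_delay_ms)


def best_route(routes):
    """Select the route with the lowest accumulated latency for one prefix."""
    return min(routes, key=lambda r: r.latency_ms)


prefix = "203.0.113.0/24"  # documentation prefix, illustrative only

# Egress forwarder 1 hears two instances (processing delays 10 and 20 ms)
# and measures 5 and 6 ms to their load balancers.
egress1 = best_route([
    readvertise(ServiceRoute(prefix, "lb-1", 10), 5, "egress-1"),
    readvertise(ServiceRoute(prefix, "lb-2", 20), 6, "egress-1"),
])
# Egress forwarder 2 hears one instance (10 ms) and measures 8 ms.
egress2 = readvertise(ServiceRoute(prefix, "lb-3", 10), 8, "egress-2")

# The remote forwarder adds the measured delay of the policy towards each egress.
policy_delay_ms = {"egress-1": 12, "egress-2": 4}
candidates = [
    replace(r, latency_ms=r.latency_ms + policy_delay_ms[r.next_hop])
    for r in (egress1, egress2)
]
selected = best_route(candidates)
print(selected)
```

In this example the instance behind egress-1 has the lowest latency at the egress (15 ms against 18 ms), but the remote forwarder selects egress-2 because the accumulated end-to-end value is lower (22 ms against 27 ms).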

The workflow also applies when service traffic needs to traverse multiple ASes. The end-to-end latency would be accumulated and calculated along the path over which service routes are distributed.


                                                             +---+
                                                          +-----+ )
       +------------+  +-----------+  +-----------+      +|C-SMA|  +
       |            |  |           |  |           |     ( +-----+   )
       |            |  |           |  |           |    ( +--+    --  )
       |        ASBR|--|ASBR   ASBR|--|ASBR   ASBR|--(   |LB|---(  )  )
       |            |  |           |  |           |  (   +--+    --   )
       |            |  |           |  |           |   (              )
       |            |  +-----------+  +-----------+    +------------+
       |            |
       |            |
       |            |
+---------+         |
|CATS     |         |
|Forwarder|         |
+---------+         |
       |            |                                        +---+
       |            |                                     +-----+ )
       |            |  +-----------+  +-----------+      +|C-SMA|  +
       |            |  |           |  |           |     ( +-----+   )
       |            |  |           |  |           |    ( +--+    --  )
       |        ASBR|--|ASBR   ASBR|--|ASBR   ASBR|--(   |LB|---(  )  )
       |            |  |           |  |           |  (   +--+    --   )
       |            |  |           |  |           |   (              )
       +------------+  +-----------+  +-----------+    +------------+

Figure 5: End-to-end Latency Accumulation among Multiple ASes

5. Scenario 2: Minimum Cost for Computing-related Service with constrained latency


                                     (For Inst 1 and 2)

             Delay 15,Cost 30         Delay 10,Cost 20
             Delay 25,Cost 25         Delay 20,Cost 15
            <-----------------       <-----------------

                                                       +---+
                                                    +-----+ )
                                                   +|C-SMA|  +
                                                  ( +-----+   )
                                       Delay 5   ( +--+    --  )
+---------+     policy     +---------+ Cost 10 (   |LB|---(  )  )
|CATS     |----------------|CATS     |---------(   +--+    --   )
|Forwarder|----------------|Forwarder|---+      (              )
+---------+                +---------+    \      +------------+
                                           \                 +---+
           \\                               \             +-----+ )
            \\                               \           +|C-SMA|  +
             \\                               \         ( +-----+   )
              \\                       Delay 6 \       ( +--+    --  )
               \\                      Cost 12  \    (   |LB|---(  )  )
                \\  policy                       +---(   +--+    --   )
                 \\                                   (              )
                  \\                                   +------------+
                   \\                                  +---+
                    \\                              +-----+ )
                     \\                            +|C-SMA|  +
                      \\                          ( +-----+   )
                         +---------+  Delay 8    ( +--+    --  )
                         |CATS     |  Cost 14  (   |LB|---(  )  )
                         |Forwarder|-----------(   +--+    --   )
                         +---------+            (              )
                                                 +------------+
                                       (For Inst 3)

                                      Delay 10,Cost 20
                                     <-----------------

Figure 6: Minimum Cost for Computing-related Service with constrained latency

As in Scenario 1, C-SMAs collect computing-related metrics and distribute them using the Generic Metric attribute. Suppose the services deployed here require minimum end-to-end cost, for instance in terms of TE metric. In addition, end-to-end latency is configured as a constraint for ordered updates of routes. Converted costs and measured latency values would be carried in the update messages.
Service routes for the same prefix are updated with accumulated latency values and costs. The latency value includes the processing delay of the service instance and the measured delay between the CATS-Forwarder and the load balancer. Similarly, the cost value includes the notified cost and the configured cost to the next hop. Additional paths MAY be enabled at CATS-Forwarders, so that service routes are distributed to the remote device with the next hop set to the CATS-Forwarder itself.
Finally, remote CATS-Forwarders calculate the end-to-end latency and overall cost for each candidate service instance over the resolved policies or forwarding paths. Ordered updates are performed subject to the configured constraints, and the best or appropriate routes are determined accordingly.
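
A minimal Python sketch of the selection at a remote CATS-Forwarder follows: candidates whose accumulated latency exceeds the configured bound are discarded, and the lowest accumulated cost wins among the rest. The Candidate type, the function names, the tie-break on latency, and every numeric value are assumptions for illustration; the draft does not prescribe them.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    instance: str
    latency_ms: int  # accumulated latency
    cost: int        # accumulated cost


def accumulate(candidate, hop_latency_ms, hop_cost):
    """Add the latency and cost of the next segment to a candidate."""
    return Candidate(candidate.instance,
                     candidate.latency_ms + hop_latency_ms,
                     candidate.cost + hop_cost)


def select_min_cost(candidates, max_latency_ms):
    """Lowest cost among candidates within the latency bound, else None."""
    feasible = [c for c in candidates if c.latency_ms <= max_latency_ms]
    if not feasible:
        return None
    # Assumed tie-break: prefer the lower latency when costs are equal.
    return min(feasible, key=lambda c: (c.cost, c.latency_ms))


# Values as received from two egress forwarders (illustrative only),
# then extended with the latency and cost of the policy towards each one.
candidates = [
    accumulate(Candidate("inst-1", 15, 30), 10, 5),
    accumulate(Candidate("inst-2", 25, 25), 10, 5),
    accumulate(Candidate("inst-3", 18, 34), 4, 5),
]
print(select_min_cost(candidates, max_latency_ms=30))
print(select_min_cost(candidates, max_latency_ms=100))
```

With a 30 ms bound, inst-2 is excluded even though it has the lowest cost, which is the effect the latency constraint is meant to have.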

Therefore, a generic metric scheme can also serve scenarios in which multiple factors are evaluated together.

6. Scenario 3: Normalized Metrics in Distribution Process

Generic metrics might not be supported by every AS and device along the distribution path. In such circumstances, these metrics would be either normalized or transmitted unchanged.


                                   (For Inst 1 and 2)

                                    Delay 10,Metric 10
                                    Delay 20,Metric 12
       Cost+Normalized Metric      <------------------

        +------------------+                         +---+
        |                  |                      +-----+ )
        |                  |                     +|C-SMA|  +
        |                  |         Delay 5    ( +-----+   )
        |                  |         Cost 10   ( +--+    --  )
        |                +---------+         (   |LB|---(  )  )
        |                |CATS     |---------(   +--+    --   )
        |                |Forwarder|---+      (              )
        |                +---------+    \      +------------+
        |                  |             \                 +---+
        |                  |              \             +-----+ )
        |                  |               \           +|C-SMA|  +
        |                  |         Delay 8\         ( +-----+   )
        |                  |         Cost 10 \       ( +--+    --  )
        |                  |                  \    (   |LB|---(  )  )
Service |                  |                   +---(   +--+    --   )
Metric  |                  |                        (              )
Unaware |                  |                         +------------+
        |                  |                         +---+
        |                  |                      +-----+ )
        |                  |                     +|C-SMA|  +
        |                  |        Delay 6     ( +-----+   )
        |              +---------+  Cost 15    ( +--+    --  )
        |              |CATS     |           (   |LB|---(  )  )
        |              |Forwarder|-----------(   +--+    --   )
        |              +---------+            (              )
        |                  |                   +------------+
        |                  |         (For Inst 3)
        |                  |
        |                  |         Delay 10,Metric 15
        +------------------+        <------------------

Figure 7: Minimum Cost for Computing-related Service with constrained latency

Normalization algorithms and strategies could be configured at CATS-Forwarders. When an AS or device is unaware of a specific type of generic metric, for instance the service metric shown in the figure, the metric value could be converted and normalized. For instance, service metric values could be multiplied by ten to become common IGP cost values. The normalized values could then be accumulated with the IGP costs to the next hop. In the other implementation, unrecognized values would be transmitted unchanged if the remote devices are capable of analyzing such metrics. Ordered updates of service routes could then be performed to minimize the service metric subject to constraints on end-to-end latency and cost.
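
The two handling options described above can be sketched in Python as follows. The advertise function, its parameters, and the dictionary keys are hypothetical; the factor of ten follows the example in the text, and a linear scaling is assumed because the draft does not fix a normalization algorithm.

```python
def advertise(service_metric, igp_cost_to_next_hop, normalize, factor=10):
    """Return the metrics a CATS-Forwarder would pass on in this sketch.

    normalize=True  : the service metric is scaled into an IGP-cost-like
                      value and accumulated with the IGP cost to the next hop.
    normalize=False : the service metric is carried unchanged next to the cost.
    """
    if normalize:
        return {"cost": service_metric * factor + igp_cost_to_next_hop}
    return {"service_metric": service_metric, "cost": igp_cost_to_next_hop}


# Illustrative values: service metric 10, IGP cost 10 to the next hop.
towards_unaware_domain = advertise(10, 10, normalize=True)
towards_aware_domain = advertise(10, 10, normalize=False)
print(towards_unaware_domain)
print(towards_aware_domain)
```

The scaling factor determines how strongly the service metric weighs against the network cost once the two are merged, so it would need to be configured consistently across the CATS-Forwarders that perform normalization.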


                                     (For Inst 1 and 2)

    Service Metric                    Delay 10,Metric 10
    Accumulated Cost,Delay            Delay 20,Metric 12
    <------------------              <------------------

+---------+    +-------------+                         +---+
|         |    |             |                      +-----+ )
|         |    |             |                     +|C-SMA|  +
|         |    |             |         Delay 5    ( +-----+   )
|         |    |             |         Cost 10   ( +--+    --  )
|         |    |           +---------+         (   |LB|---(  )  )
|         |    |           |CATS     |---------(   +--+    --   )
|         |    |           |Forwarder|---+      (              )
|         |    |           +---------+    \      +------------+
|         |    |             |             \                 +---+
|         |    |             |              \             +-----+ )
|         |    |             |               \           +|C-SMA|  +
|         |    |             |         Delay 8\         ( +-----+   )
|         |    |             |         Cost 10 \       ( +--+    --  )
| Service |--- | Service     |                  \    (   |LB|---(  )  )
| Metric  |    | Metric      |                   +---(   +--+    --   )
| Aware   |--- | Unaware     |                        (              )
|         |    |             |                         +------------+
|         |    |             |                         +---+
|         |    |             |                      +-----+ )
|         |    |             |                     +|C-SMA|  +
|         |    |             |        Delay 6     ( +-----+   )
|         |    |         +---------+  Cost 15    ( +--+    --  )
|         |    |         |CATS     |           (   |LB|---(  )  )
|         |    |         |Forwarder|-----------(   +--+    --   )
|         |    |         +---------+            (              )
|         |    |             |                   +------------+
|         |    |             |         (For Inst 3)
|         |    |             |
|         |    |             |         Delay 10,Metric 15
+---------+    +-------------+        <------------------

Figure 8: Minimum Service Metric for Computing-related Service with constrained latency and cost

7. Conclusion

For Computing Aware Traffic Steering (CATS) with Generic Metric, several considerations SHOULD be noted:

It mainly applies to a distributed control plane for CATS. A centralized control plane based on controllers or orchestrators might already have interfaces for collecting computing metrics.
Generic metrics common to network and computing resources SHOULD be considered significant factors in route selection, especially when provisioning end-to-end services.
Flexible and complex metadata or unique metrics are suggested to be normalized into simple, abstract factors, which would restrain route oscillation and simplify route selection.

11. Normative References

[I-D.ietf-cats-framework]: Li, C., Du, Z., Boucadair, M., Contreras, L. M., and J. Drake, "A Framework for Computing-Aware Traffic Steering (CATS)", Work in Progress, Internet-Draft, draft-ietf-cats-framework-04, 17 October 2024, <https://datatracker.ietf.org/doc/html/draft-ietf-cats-framework-04>.
[I-D.ietf-cats-usecases-requirements]: Yao, K., Trossen, D., Contreras, L. M., Shi, H., Li, Y., Zhang, S., and Q. An, "Computing-Aware Traffic Steering (CATS) Problem Statement, Use Cases, and Requirements", Work in Progress, Internet-Draft, draft-ietf-cats-usecases-requirements-03, 3 July 2024, <https://datatracker.ietf.org/doc/html/draft-ietf-cats-usecases-requirements-03>.
[I-D.ietf-idr-bgp-generic-metric]: Sangli, S. R., Hegde, S., Das, R., Decraene, B., Wen, B., Kozak, M., Dong, J., Jalil, L., and K. Talaulikar, "Accumulated Metric in NHC attribute", Work in Progress, Internet-Draft, draft-ietf-idr-bgp-generic-metric-00, 30 August 2024, <https://datatracker.ietf.org/doc/html/draft-ietf-idr-bgp-generic-metric-00>.
[I-D.ysl-cats-metric-definition]: Yao, K., Shi, H., and C. Li, "CATS metric Definition", Work in Progress, Internet-Draft, draft-ysl-cats-metric-definition-00, 8 July 2024, <https://datatracker.ietf.org/doc/html/draft-ysl-cats-metric-definition-00>.
[RFC2119]: Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.
[RFC8174]: Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, <https://www.rfc-editor.org/info/rfc8174>.
