BESS Working Group                                          P. Brissette
Internet-Draft                                           LA. Burdet, Ed.
Intended status: Standards Track                           Cisco Systems
Expires: 19 April 2025                                            B. Wen
                                                                 Comcast
                                                               E. Leyton
                                                        Verizon Wireless
                                                              J. Rabadan
                                                                   Nokia
                                                         16 October 2024

                    EVPN Port-Active Redundancy Mode
                      draft-ietf-bess-evpn-mh-pa-11

Abstract

   The Multi-Chassis Link Aggregation Group (MC-LAG) technology enables establishing a logical link-aggregation connection with a redundant group of independent nodes. The objective of MC-LAG is to enhance both network availability and bandwidth utilization through various modes of traffic load-balancing. RFC7432 defines EVPN-based MC-LAG with Single-active and All-active multi-homing redundancy modes. This document builds on the existing redundancy mechanisms supported by EVPN and introduces a new Port-Active redundancy mode.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 19 April 2025.

Copyright Notice

   Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.

Brissette, et al.         Expires 19 April 2025                 [Page 1]
Internet-Draft     EVPN Port-Active Redundancy Mode         October 2024

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Multi-Chassis Link Aggregation (MC-LAG)
   3.  Port-Active Redundancy Mode
     3.1.  Overall Advantages
     3.2.  Port-Active Redundancy Procedures
   4.  Designated Forwarder Algorithm to Elect per Port-Active PE
     4.1.  Capability Flag
     4.2.  Modulo-based Algorithm
     4.3.  Highest Random Weight Algorithm
     4.4.  Preference-based DF Election
     4.5.  AC-Influenced DF Election
   5.  Convergence considerations
     5.1.  Primary / Backup per Ethernet-Segment
     5.2.  Backward Compatibility
   6.  Applicability
   7.  IANA Considerations
   8.  Security Considerations
   9.  Acknowledgements
   10. Contributors
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Authors' Addresses

1.
Introduction

   EVPN [RFC7432] defines the All-Active and Single-Active redundancy modes. All-Active redundancy provides per-flow load-balancing for multi-homing, while Single-Active redundancy ensures service carving where only one of the PEs in a redundancy relationship is active per service.

   Although these two multi-homing scenarios are widely utilized in data center and service provider access networks, there are cases where active/standby multi-homing at the interface level is beneficial and necessary. The primary consideration for this new mode of load-balancing is the determinism of traffic forwarding through a specific interface, rather than statistical per-flow load-balancing across multiple PEs providing multi-homing. This determinism is essential for certain QoS features to function correctly. Additionally, this mode ensures fast convergence during failure and recovery, which is expected by customers.

   This document defines the Port-Active redundancy mode as a new type of multi-homing in EVPN and details how this mode operates and is supported via EVPN.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

2.  Multi-Chassis Link Aggregation (MC-LAG)

   When a CE device is multi-homed to a set of PE nodes using the [IEEE_802.1AX_2014] Link Aggregation Control Protocol (LACP), the PEs must function as a single LACP entity for the Ethernet links to form and operate as a Link Aggregation Group (LAG). To achieve this, the PEs connected to the same multi-homed CE must synchronize LACP configuration and operational data among them.
   Historically, the Interchassis Communication Protocol (ICCP) [RFC7275] has been used for this synchronization. EVPN, as described in [RFC7432], covers the scenario where a CE is multi-homed to multiple PE nodes, using a LAG to simplify the procedure significantly. This simplification, however, comes with certain assumptions:

   *  a CE device connected to EVPN multi-homing PEs MUST have a single LAG with all its links connected to the EVPN multi-homing PEs in a redundancy group.

   *  identical LACP parameters MUST be configured on peering PEs, including system ID, port priority, and port key.

   This document presumes proper LAG operation as specified in [RFC7432]. Issues resulting from deviations in the aforementioned assumptions, LAG misconfiguration, and miswiring detection across peering PEs are considered outside the scope of this document.

                 +-----+
                 | PE3 |
                 +-----+
              +-----------+
              |  MPLS/IP  |
              |  CORE     |
              +-----------+
            +-----+   +-----+
            | PE1 |   | PE2 |
            +-----+   +-----+
               |         |
               I1       I2
                \       /
                 \     /
                  +---+
                  |CE1|
                  +---+

                    Figure 1: MC-LAG Topology

   Figure 1 shows an MC-LAG multi-homing topology where PE1 and PE2 are part of the same redundancy group providing multi-homing to CE1 via interfaces I1 and I2. Interfaces I1 and I2 are members of a LAG running LACP. The core, shown as IP or MPLS enabled, provides a wide range of L2 and L3 services. MC-LAG multi-homing functionality is decoupled from those services in the core and focuses on providing multi-homing to the CE. In Port-Active redundancy mode, only one of the two interfaces, I1 or I2, is in the forwarding state, and the other interface is in standby. This also implies that all services on the active interface are in active mode and all services on the standby interface operate in standby mode.

3.  Port-Active Redundancy Mode

3.1.  Overall Advantages

   The use of Port-Active redundancy in EVPN networks provides the following benefits:

   a.
       Port-Active redundancy offers open standards-based active/standby redundancy at the interface level, eliminating the need for ICCP and LDP (e.g., VXLAN or SRv6 may be used in the network).

   b.  This mode is agnostic of the underlying technology (MPLS, VXLAN, SRv6) and associated services (L2, L3, Bridging, E-LINE, etc.).

   c.  It enables deterministic QoS over MC-LAG attachment circuits.

   d.  Port-Active redundancy is fully compliant with [RFC7432] and does not require any new protocol enhancements to existing EVPN RFCs.

   e.  It can leverage various Designated Forwarder (DF) election algorithms, such as modulo ([RFC7432]), Highest Random Weight (HRW, [RFC8584]), etc.

   f.  Port-Active redundancy replaces legacy MC-LAG ICCP-based solutions and offers the following additional benefits:

       *  Efficient support for 1+N redundancy mode (with EVPN using BGP RR), whereas ICCP requires a full mesh of LDP sessions among PEs in the redundancy group.

       *  Fast convergence with mass-withdraw is possible with EVPN, which has no equivalent in ICCP.

3.2.  Port-Active Redundancy Procedures

   The following steps outline the proposed procedure for supporting Port-Active redundancy mode with EVPN LAG:

   a.  The Ethernet-Segment Identifier (ESI) MUST be assigned per access interface as described in [RFC7432]. The ESI can be auto-derived or manually assigned and the access interface MAY be a Layer-2 or Layer-3 interface.

   b.  The Ethernet-Segment (ES) MUST be configured in Port-Active redundancy mode on peering PEs for the specified access interface.

   c.  When ESI is configured on a Layer-3 interface, the Ethernet-Segment (ES) route (Route Type-4) MAY be the only route exchanged by PEs in the redundancy group.

   d.  PEs in the redundancy group leverage the DF election defined in [RFC8584] to determine which PE keeps the port in active mode and which one(s) keep it in standby mode.
       Although the DF election defined in [RFC8584] is per [ES, Ethernet Tag] granularity, the DF election is performed per [ES] in Port-Active redundancy mode. The details of this algorithm are described in Section 4.

   e.  The DF router MUST keep the corresponding access interface in an up and forwarding active state for that Ethernet-Segment.

   f.  Non-DF routers SHOULD implement a bidirectional blocking scheme for all traffic comparable to the Single-Active blocking scheme described in [RFC7432], albeit across all VLANs.

       *  Non-DF routers MAY bring and keep the peering access interface attached to them in an operational down state.

       *  If the interface is running the LACP protocol, the non-DF PE MAY set the LACP state to OOS (Out of Sync) instead of setting the interface to a down state. This approach allows for better convergence during the transition from standby to active mode.

   g.  The primary/backup bits of the EVPN Layer 2 Attributes Extended Community [RFC8214] SHOULD be used to achieve better convergence, as described in Section 5.1.

4.  Designated Forwarder Algorithm to Elect per Port-Active PE

   The ES routes operating in Port-Active redundancy mode are advertised with the new Port Mode Load-Balancing capability bit in the DF Election Extended Community as defined in [RFC8584]. Additionally, the ES associated with the port utilizes the existing Single-Active procedure and signals the Single-Active Multihomed site redundancy mode along with the Ethernet-AD per-ES route (refer to Section 7.5 of [RFC7432]). Finally, the ESI label-based split-horizon procedures specified in Section 8.3 of [RFC7432] SHOULD be employed to prevent transient echo packets when Layer-2 circuits are involved.
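
   As a rough illustration of the capability signaling described above, the following Python sketch builds the 2-octet Bitmap field of the DF Election Extended Community with the Port Mode bit set. It assumes the bit positions given in Section 4.1 (D at bit 0, A at bit 1, P at bit 5, with bit 0 being the most significant bit). All function and constant names are hypothetical and do not come from any BGP implementation. The A bit is never set here because Section 4.5 requires it to be 0 when P=1.

```python
# Illustrative sketch only; every name below is hypothetical.
BITMAP_WIDTH = 16  # the Bitmap field is 2 octets


def bit_mask(bit: int) -> int:
    """Mask for RFC-style bit position `bit` (bit 0 is the most significant)."""
    if not 0 <= bit < BITMAP_WIDTH:
        raise ValueError(f"bit position {bit} is outside the 16-bit Bitmap")
    return 1 << (BITMAP_WIDTH - 1 - bit)


D_BIT = bit_mask(0)  # 'Don't Pre-empt'
A_BIT = bit_mask(1)  # AC-Influenced DF election
P_BIT = bit_mask(5)  # Port Mode DF election (bit requested by this document)


def build_bitmap(port_mode: bool, dont_preempt: bool = False) -> int:
    """Build the Bitmap for an ES route; the A bit is deliberately left clear."""
    bitmap = 0
    if port_mode:
        bitmap |= P_BIT
    if dont_preempt:
        bitmap |= D_BIT
    return bitmap


def is_port_mode(bitmap: int) -> bool:
    """True when the received Bitmap signals Port Mode DF election."""
    return bool(bitmap & P_BIT)


def encode_bitmap(bitmap: int) -> bytes:
    """Serialize the Bitmap in network byte order."""
    return bitmap.to_bytes(2, "big")
```

   With these assumptions, a PE advertising Port Mode together with 'Don't Pre-empt' would call build_bitmap(True, dont_preempt=True) and place the two octets from encode_bitmap() in the extended community.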

   Various algorithms for DF Election are detailed in Sections 4.2 to 4.5 for comprehensive understanding, although the choice of algorithm in this solution does not significantly impact complexity or performance compared to other redundancy modes.

4.1.  Capability Flag

   [RFC8584] defines a DF Election extended community, and a Bitmap (2 octets) field to encode "capabilities" to use with the DF election algorithm in the DF algorithm field:

   Bit 0:  D bit or 'Don't Pre-empt' bit, as explained in [I-D.ietf-bess-evpn-pref-df].

   Bit 1:  AC-DF Capability (AC-Influenced DF election), as explained in [RFC8584].

                           1 1 1 1 1 1
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |D|A|     |P|                   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 2: Amended Bitmap field in the DF Election Extended Community

   This document defines the following value and extends the Bitmap field:

   Bit 5:  Port Mode Designated Forwarder Election (referred to as the P bit hereafter). This bit determines that the DF Election algorithm SHOULD be modified to consider the port ES only and not the Ethernet Tags.

4.2.  Modulo-based Algorithm

   The default DF Election algorithm, or modulo-based algorithm, as described in [RFC7432] and updated by [RFC8584], is applied here at the granularity of ES only. Given that the ES-Import Route Target extended community may be auto-derived and directly inherits its auto-derived value from ESI bytes 1-6, many operators differentiate ESIs primarily within these bytes. Consequently, bytes 3-6 are utilized to determine the designated forwarder using the modulo-based DF assignment, achieving good entropy during modulo calculation across ESIs.

   Assuming a redundancy group of N PE nodes, the PE with ordinal i is designated as the DF for an ES when (Es mod N) = i, where Es represents bytes 3-6 of that ESI.

4.3.
Highest Random Weight Algorithm

   An application of Highest Random Weight (HRW) to EVPN DF Election is defined in [RFC8584] and MAY also be used and signaled. For Port-Active, this is modified to operate at the granularity of [ES] rather than per [ES, Ethernet Tag].

   Section 3.2 of [RFC8584] describes computing a 32-bit CRC over the concatenation of Ethernet Tag (V) and ESI (Es). For Port-Active redundancy mode, the Ethernet Tag is omitted from the CRC computation and all references to (V, Es) are replaced by (Es).

   The algorithm to determine the DF Elected and Backup-DF Elected (BDF) in Section 3.2 of [RFC8584] is repeated and summarized below using only (Es) in the computation:

   1.  DF(Es) = Si| Weight(Es, Si) >= Weight(Es, Sj), for all j. In the case of a tie, choose the PE whose IP address is numerically the least. Note that 0 <= i,j < number of PEs in the redundancy group.

   2.  BDF(Es) = Sk| Weight(Es, Si) >= Weight(Es, Sk), and Weight(Es, Sk) >= Weight(Es, Sj). In the case of a tie, choose the PE whose IP address is numerically the least.

   Where:

   *  DF(Es) is defined to be the address Si (index i) for which Weight(Es, Si) is the highest; 0 <= i < N-1.

   *  BDF(Es) is defined as that PE with address Sk for which the computed Weight is the next highest after the Weight of the DF. j is the running index from 0 to N-1; i and k are selected values.

4.4.  Preference-based DF Election

   When the new capability 'Port Mode' is signaled, the preference-based DF Election algorithm in [I-D.ietf-bess-evpn-pref-df] is modified to consider the port only and not any associated Ethernet Tags. The Port Mode capability is compatible with the 'Don't Pre-empt' bit and both may be signaled. When an interface recovers, a peering PE signaling the D bit enables non-revertive behavior at the port level.

4.5.
AC-Influenced DF Election

   The AC-DF bit defined in [RFC8584] MUST be set to 0 when advertising Port Mode Designated Forwarder Election capability (P=1). When an AC (sub-interface) goes down, any resulting Ethernet A-D per EVI withdrawal does not influence the DF Election.

   An AC-DF bit set (A=1) received from a remote PE MUST be ignored when performing Port Mode DF Election.

5.  Convergence considerations

   To enhance convergence during failure and recovery when Port-Active redundancy mode is employed, advanced synchronization between peering PEs may be beneficial. The Port-Active mode poses a challenge since the "standby" port may be in a down state. Transitioning a "standby" port to an up state and stabilizing the network requires time. For Integrated Routing and Bridging (IRB) and Layer 3 services, synchronizing ARP / ND caches is recommended. Additionally, associated VRF tables may need to be synchronized. For Layer 2 services, synchronization of MAC tables may be considered.

   Moreover, for members of a LAG running LACP, the ability to set the "standby" port to an "out-of-sync" state, also known as "warm-standby," can be utilized to improve convergence times.

5.1.  Primary / Backup per Ethernet-Segment

   The EVPN Layer 2 Attributes Extended Community ("L2-Attr") defined in [RFC8214] SHOULD be advertised in the Ethernet A-D per ES route to enable fast convergence.

   Only the P and B bits of the Control Flags field in the L2-Attr Extended Community are relevant to this document, specifically in the context of Ethernet A-D per ES routes:

   *  When advertised, the L2-Attr Extended Community SHALL have only the P or B bits set in the Control Flags field, and all other bits and fields MUST be zero.

   *  A remote PE receiving the optional L2-Attr Extended Community in Ethernet A-D per ES routes SHALL consider only the P and B bits and ignore other values.
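
   The two rules above can be sketched in Python as follows. This is a minimal illustration, not an implementation from this document: it assumes the Control Flags layout of [RFC8214], in which the B and P flags occupy the two least significant bits of the 2-octet field (B = 0x0001, P = 0x0002), and all names are hypothetical.

```python
from typing import Tuple

# Assumed bit values, based on the Control Flags layout in RFC 8214.
B_FLAG = 0x0001  # Backup
P_FLAG = 0x0002  # Primary
PB_MASK = P_FLAG | B_FLAG


def valid_per_es_flags(control_flags: int) -> bool:
    """Sender-side check: nothing other than P or B may be set per ES."""
    return 0 <= control_flags <= 0xFFFF and (control_flags & ~PB_MASK) == 0


def primary_backup(control_flags: int) -> Tuple[bool, bool]:
    """Receiver-side view: keep only P and B, ignore every other bit."""
    flags = control_flags & PB_MASK
    return bool(flags & P_FLAG), bool(flags & B_FLAG)
```

   For example, under these assumptions a receiver that is handed Control Flags 0x0006 (P plus another flag) would treat the advertising PE as primary, since primary_backup(0x0006) yields (True, False) and the extra bit is dropped.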

   For L2-Attr Extended Community sent and received in Ethernet A-D per EVI routes used in [RFC8214], [RFC7432] and [I-D.ietf-bess-evpn-vpws-fxc]:

   *  P and B bits received SHOULD be considered overridden by "parent" bits when advertised in the Ethernet A-D per ES.

   *  Other fields and bits of the extended community are used according to the procedures outlined in the referenced documents.

   By adhering to these procedures, the network ensures proper handling of the L2-Attr Extended Community to maintain robust and efficient convergence across Ethernet Segments.

5.2.  Backward Compatibility

   Implementations that comply with [RFC7432] or [RFC8214] only (i.e., implementations that predate this specification) will not advertise the EVPN Layer 2 Attributes Extended Community in Ethernet A-D per ES routes. This means that remote PEs in the ES will not receive per-ES P and B bits and will continue to receive and honour the P and B bits received in Ethernet A-D per EVI route(s).

   Similarly, an implementation that complies with [RFC7432] or [RFC8214] only and that receives an L2-Attr Extended Community in Ethernet A-D per ES routes will ignore it and continue to use the default path resolution algorithm:

   *  The remote ESI Label Extended Community ([RFC7432]) signals Single-Active (Section 4).

   *  The remote MAC and/or Ethernet A-D per EVI routes are unchanged, and since the L2-Attr Extended Community in Ethernet A-D per ES route is ignored, the P and B bits in the L2-Attr Extended Community in Ethernet A-D per EVI routes are used.

6.  Applicability

   A prevalent deployment scenario involves providing L2 or L3 services on PE devices that offer multi-homing capabilities. The services may include any L2 EVPN solutions such as EVPN VPWS or standard EVPN as defined in [RFC7432].
   Additionally, L3 services may be provided within a VPN context, as specified in [RFC4364], or within a global routing context. When a PE provides first-hop routing, EVPN IRB may also be deployed on the PEs. The mechanism outlined in this document applies to PEs providing L2 and/or L3 services where active/standby redundancy at the interface level is required.

   An alternative solution to the one described in this document is Multi-Chassis Link Aggregation Group (MC-LAG) with ICCP active-standby redundancy, as detailed in [RFC7275]. However, ICCP requires LDP to be enabled as a transport for ICCP messages. There are numerous scenarios where LDP is not necessary, such as deployments utilizing VXLAN or SRv6. The solution described in this document using EVPN does not mandate the use of LDP or ICCP and remains independent of the underlay encapsulation.

7.  IANA Considerations

   This document solicits the allocation of the following values from the "BGP Extended Communities" registry group:

   *  Bit 5 in the [RFC8584] DF Election Capabilities registry, "P bit - Port Mode Designated Forwarder Election".

8.  Security Considerations

   The Security Considerations described in [RFC7432] and [RFC8584] are applicable to this document.

   Introducing a new capability necessitates unanimity among PEs. Without consensus on the new DF Election procedures and Port Mode, the DF Election algorithm defaults to the procedures outlined in [RFC8584] and [RFC7432]. This fallback behavior could be exploited by an attacker who modifies the configuration of one PE within the Ethernet Segment (ES). Such manipulation could force all PEs in the ES to revert to the default DF Election algorithm and capabilities.
   In this scenario, the PEs may be subject to unfair load balancing, service disruption, and potential issues such as black-holing or duplicate traffic, as mentioned in the security sections of those documents.

9.  Acknowledgements

   The authors thank Anoop Ghanwani for his comments and suggestions and Stephane Litkowski for his careful review.

10.  Contributors

   In addition to the authors listed on the front page, the following coauthors have also contributed to this document:

   Ali Sajassi
   Cisco Systems
   United States of America
   Email: sajassi@cisco.com

   Samir Thoria
   Cisco Systems
   United States of America
   Email: sthoria@cisco.com

11.  References

11.1.  Normative References

   [I-D.ietf-bess-evpn-pref-df]
              Rabadan, J., Sathappan, S., Lin, W., Drake, J., and A. Sajassi, "Preference-based EVPN DF Election", Work in Progress, Internet-Draft, draft-ietf-bess-evpn-pref-df-13, 9 October 2023.

   [IEEE_802.1AX_2014]
              IEEE, "IEEE Standard for Local and metropolitan area networks -- Link Aggregation", IEEE 802-1ax-2014, DOI 10.1109/IEEESTD.2014.7055197, 5 March 2015.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

   [RFC7432]  Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A., Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February 2015.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017.

   [RFC8214]  Boutros, S., Sajassi, A., Salam, S., Drake, J., and J. Rabadan, "Virtual Private Wire Service Support in Ethernet VPN", RFC 8214, DOI 10.17487/RFC8214, August 2017.

   [RFC8584]  Rabadan, J., Ed., Mohanty, S., Ed., Sajassi, A., Drake, J., Nagaraj, K., and S.
              Sathappan, "Framework for Ethernet VPN Designated Forwarder Election Extensibility", RFC 8584, DOI 10.17487/RFC8584, April 2019.

11.2.  Informative References

   [I-D.ietf-bess-evpn-vpws-fxc]
              Sajassi, A., Brissette, P., Uttaro, J., Drake, J., Boutros, S., and J. Rabadan, "EVPN VPWS Flexible Cross-Connect Service", Work in Progress, Internet-Draft, draft-ietf-bess-evpn-vpws-fxc-08, 24 October 2022.

   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February 2006.

   [RFC7275]  Martini, L., Salam, S., Sajassi, A., Bocci, M., Matsushima, S., and T. Nadeau, "Inter-Chassis Communication Protocol for Layer 2 Virtual Private Network (L2VPN) Provider Edge (PE) Redundancy", RFC 7275, DOI 10.17487/RFC7275, June 2014.

Authors' Addresses

   Patrice Brissette
   Cisco Systems
   Ottawa ON
   Canada
   Email: pbrisset@cisco.com

   Luc Andre Burdet (editor)
   Cisco Systems
   Canada
   Email: lburdet@cisco.com

   Bin Wen
   Comcast
   United States of America
   Email: Bin_Wen@comcast.com

   Edward Leyton
   Verizon Wireless
   United States of America
   Email: edward.leyton@verizonwireless.com

   Jorge Rabadan
   Nokia
   United States of America
   Email: jorge.rabadan@nokia.com