Network Working Group S. Hegde Internet-Draft A. Przygienda Intended status: Informational Juniper Expires: 27 January 2025 A. Lindem LabN Consulting LLC 26 July 2024 Improved OSPF Database Exchange Procedure draft-hegde-lsr-ospf-better-idbx-04 Abstract When an OSPF router undergoes restart, previous instances of LSAs belonging to that router may remain in the databases of other routers in the OSPF domain until such LSAs are aged out. Hence, when the restarting router joins the network again, neighboring routers re- establish adjacencies while the restarting router is still bringing- up its interfaces and adjacencies and generates LSAs with sequence numbers that may be lower than the stale LSAs. Such stale LSAs may be interpreted as bi-directional connectivity before the initial database exchanges are finished and genuine bi-directional LSA connectivity exists. Such incorrect interpretation may lead to, among other things, transient traffic packet drops. This document suggests improvements in the OSPF database exchange process to prevent such problems due to stale LSA utilization. The solution does not preclude changes in the existing standard but presents an extension that will prevent this scenario. Requirements Language The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Hegde, et al. Expires 27 January 2025 [Page 1] Internet-Draft draft-hegde-lsr-ospf-better-idbx July 2024 Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on 27 January 2025. Copyright Notice Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/ license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 2. Solution . . . . . . . . . . . . . . . . . . . . . . . . . . 3 2.1. OSPF Specification Additions and Modifications . . . . . 4 2.2. Example . . . . . . . . . . . . . . . . . . . . . . . . . 5 3. Potential Optimizations . . . . . . . . . . . . . . . . . . . 6 3.1. Restarting Neighbor Tracking . . . . . . . . . . . . . . 6 3.2. Limited LSA Tracking . . . . . . . . . . . . . . . . . . 6 4. Restarting Router Data Plane Convergence . . . . . . . . . . 7 5. Management Considerations . . . . . . . . . . . . . . . . . . 7 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 7 7. Security Considerations . . . . . . . . . . . . . . . . . . . 8 8. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 8 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 8 9.1. Informative References . . . . . . . . . . . . . . . . . 8 9.2. Normative References . . . . . . . . . . . . . . . . . . 8 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 8 1. Introduction When an OSPF [RFC2328] router restarts, its stale LSAs are left in the database of other routers in the OSPF domain until the LSAs are aged out either intentionally or by the LSA age elapsing. The stale Router LSA can contain links to all the neighbors that had Full adjacencies before the router restarted. Hegde, et al. Expires 27 January 2025 [Page 2] Internet-Draft draft-hegde-lsr-ospf-better-idbx July 2024 ----------C-------- | | A-----B E-----F | | --------D--------- Figure 1: OSPF Network Figure 1 shows a very simple OSPF network. In case of C undergoing restart that does not generate purges, the other routers in the domain will hold the stale LSA of Router C in their database. The stale LSA may have links to B and E, which represents the topology of C before it went down. When C restarts again, it initiates the database exchange process with B and E. B and E may have C's stale LSA with a higher sequence number in their database than the new ones originated by C and hence erroneously assume those being the newest copy, successively bringing up the adjacency with C, and transitioning to Full state. Based on C's Stale LSA having LSA links to B and E, the Shortest Path First (SPF) back-link check is satisfied and B and E update their routing table to point to C. This may cause C to drop this traffic as C may not have all its previous adjacencies up and all LSAs in place to correctly compute the necessary routes. The situation corrects itself with C reissuing the LSAs with even higher sequence numbers over time so the condition is transient today already. 2. Solution To prevent the transient condition described the database exchange procedure from [RFC2328] section 7.2 is extended with additional constraints to prevent an OSPF router from transitioning to Full state when it has stale LSAs originated by the database exchange neighbor in its Link State Database (LSDB). During an IDBX the router determined it should become adjacent with another OSPF router and for that purpose initially it enters the ExStart state and creates a "Database summary list" for the neighbor. In addition to this list a "Stale database exchange list" is created for such neighbor and is initially populated with LSAs in LSDB which were originated by it. The neighbor will not transition to Full state until both the Link state request list and the Stale database exchange list are empty. During the Database Exchange process entries are removed from the Stale database exchange list when v the same or a more recent LSAs is referenced in a Database Description packet or the neighbor originates a more recent instance of the LSA, and it is received in a Link State Update packet. For stale LSAs in Hegde, et al. Expires 27 January 2025 [Page 3] Internet-Draft draft-hegde-lsr-ospf-better-idbx July 2024 the router's Link State Database, the neighbor will request stale LSAs and originate more recent instances via normal OSPF procedures as illustrated in Section 2.2. 2.1. OSPF Specification Additions and Modifications The following additions and modifications to OSPF [RFC2328] are needed for this feature. The application of the feature SHOULD be based on local configuration (refer to Section 5). 1. In section 10.3 "The Neighbor State Machine", State: ExStart, Event: NegotiationDone - The "Stale database exchange list" is created including LSAs with the "Advertising Router" set to the neighbor's Router ID. 2. In section 10.3 "The Neighbor State Machine", State: Exchange, Event: ExchangeDone - The new neighbor state will not be Full unless both the Link state request list and the Stale database exchange list are empty. 3. In section 10.3 "The Neighbor State Machine", State: Exchange or greater, Event: AdjOk? - If an adjacency will not be formed, the Stale database exchange list is also cleaned up. 4. In section 10.3 "The Neighbor State Machine", State: Any state, Event: KillNbr - The Stale database exchange list is also cleaned up. 5. In section 10.3 "The Neighbor State Machine", State: Any state, Event: LLDown - The Stale database exchange list is also cleaned up. 6. In section 10.3 "The Neighbor State Machine", State: Any state, Event: InactivityTimer - The Stale database exchange list is is also cleaned up. 7. In section 10.3 "The Neighbor State Machine", State: 2Way or greater, Event: 1-WayReceived - The Stale database exchange list is is also cleaned up. 8. In section 10.6 "Receiving Database Description Packets", If the Database Description packet is accepted, LSAs originated by the neighbor that are not less recent than the local database copy will be removed from the Stale database exchange list. LSAs that are less recent than the local database copy will remain on th State database exchange list and will be updated by the neighbor through the Loading phase of the Database exchange process. Hegde, et al. Expires 27 January 2025 [Page 4] Internet-Draft draft-hegde-lsr-ospf-better-idbx July 2024 9. In section 13, "The Flooding Procedure", (5), (g): If the orignator of the more recent LSA is a neighbor that is in Exchange or Loading state, remove the LSA from the Stale database exchange list. If the neighbor is in Loading state and both the List state request list and the Stale database exchange list are empty, the Loading Done event is generated. In most cases, the LSA will have been received from the neighbor in Exchange or Loading state. However, this cannot be guaranteed due to packet loss and different propagation delays. Additionally, the neighbor may be in Exchange and Loading state on more than one interface. 2.2. Example Figure 2 provides an example of C restarting having originated an LSA with sequence number Y before. After restarting C originates the same LSA with sequence number X where X < Y since it is not aware of existence of version X yet. C-----------------E Create Stale DB Exchange List DBD: LSA A origin:C -------> Sequence:X DBD: LSA A Origin:C <------ Sequence:Y LSReq LSA A, origin C -----> Modified Procedure <------- LSA A, Origin C, Seq:Y C re-originating self LSA LSA A, Origin C, -------> Remove Stale Seq:Y+1 LSA A, Origin C From Stale DB Exchange list. Bring adjacency to Full state if both LS Request list and Stale DB Exchange list are empty. Figure 2: Modified Database Exchange Procedure Hegde, et al. Expires 27 January 2025 [Page 5] Internet-Draft draft-hegde-lsr-ospf-better-idbx July 2024 As shown in figure Figure 2 above, E originates LSReq with Sequence number X but waits until the LSA with sequence number Y+1 (or strictly speaking, an LSA that compares as newer to the one it holds) arrives. As the LSA is still in the Stale DB Exchange List, the adjacency will remain in Loading state and will not move to Full state. All the neighbors of the restarting routers hold the neighbor FSM in Loading state and do not transition to Full state until the stale LSA is replaced with the new LSA that is more recent than the stale LSA. This ensures that other routers in the network do not compute a path through the restarting router since they cannot satisfy the bi-directionality condition in SPF computations. 3. Potential Optimizations The solution described in Section 2 assures stale LSAs for a restarting router are either updated or purged before neighbors of the restarting router advertise a link to the restarting router. This section describes optimizations that are being considered. 3.1. Restarting Neighbor Tracking 3.2. Limited LSA Tracking Rather than adding all the restarting router's LSAs in the Link-State Database (LSDB) to the Stale DB Exchange list, only the following LSAs would be added: * Router-LSA containing one or more links to the local router * Network-LSAs containing a link to the local router The procedures in Section 2 will guarantee that the LSAs on the Stale DB Exchange list are updated before the local router advertises a link to the restarting router. Given that the these stale LSAs are the only ones containing a link to the local router, their update will prevent usage of the link until the restarting router has established an adjacency with the local router. As part of the restarting router establishing an adjacency and advertising a link to the local router, the restarting router will synchronize its LSDB with the local router and will update or purge stale LSAs previously originated by the restarting router. These stale LSAs will be requested by the restarting router using Link State Request packets (section 10.9 in [RFC2328]) and, when received, updated or purgeed using the procedure specified in section 13.4 [RFC2328]. The restarting router's LSAs containing a link to the local router are added to the Stale DB Exchange list and will be updated or purged prior to the local router advertising a link to the restarting Hegde, et al. Expires 27 January 2025 [Page 6] Internet-Draft draft-hegde-lsr-ospf-better-idbx July 2024 router. Additionally, the restarting router will not advertise a link to the local router until it has synchronized its LSDB with the local router. Hence, the restarting router's other stale LSAs will be updated or purged prior both routers advertising a link and these other stale LSAs need not be added to the Stale DB Exchange list. 4. Restarting Router Data Plane Convergence The procedures in Section 2 don't attempt to solve the problem of the restarting router's adjacency being used prior to the data plane being updated with the routes associated with the adjacency. This problem can occur on platforms with a non-neglible delay being the control plane Routing Information Base (RIB) being updated and the platform's data plane Forwarding Information Bases (FIBs) being updated. However, this delay is local to the restarting router and can be avoided by delaying the advertisements of adjacecies (i.e., links in the restarting router's router-LSA or network-LSAs) can be handled locally by delaying these advertisements until the data plane has been updated. Note that the restarting router's SPF calculation will need to include these links in order to compute the routes using the adjacency (even though link advertisement is delayed). How this is accomplished is an implementation detail that is beyond the scope of this document. As long as the restarting router's stale LSAs have been updated or purged as described in Section 2, the scope of delaying usage of the adjacency before the restarting router's data plane has converged is completely under the local control. 5. Management Considerations Application the Database exchange procedure specified in this documennt SHOULD be based on local configuration with the default behavior not to perform the addition Stale database exchange list processed specified in this document. Not all OSPF networks require elimination of traffic drops during the Database exchange process (as evidenced by the fact that OSPF has been deployed for several decades without this added packet loss mitigation). Additionally, optional configuration will encourage implementation and deployment. 6. IANA Considerations No IANA Considerations Hegde, et al. Expires 27 January 2025 [Page 7] Internet-Draft draft-hegde-lsr-ospf-better-idbx July 2024 7. Security Considerations 8. Contributors 9. References 9.1. Informative References 9.2. Normative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, . [RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328, DOI 10.17487/RFC2328, April 1998, . [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, . Authors' Addresses Shraddha Hegde Juniper India Email: shraddha@juniper.net Tony Przygienda Juniper 1133 Innovation Way Sunnyvale, CA United States of America Email: prz@juniper.net Acee Lindem LabN Consulting LLC 301 Midenhall Way Cary, North Carolina United States of America Email: acee.ietf@gmail.com Hegde, et al. Expires 27 January 2025 [Page 8]