Pseudowire Edge-to-Edge Emulation                         S. Shelvapille
INTERNET-DRAFT                                                   V. Puri
Intended status: Standards Track                  Bay Microsystems, Inc.
Expires: September 7, 2009                                 March 6, 2009

   Encapsulation Methods for Transport of InfiniBand over MPLS Networks
                     draft-puri-pwe3-ib-encap-01.txt

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on September 7, 2009.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Shelvapille et al.     Expires September 7, 2009                [Page 1]

Internet-Draft    Transport of InfiniBand over MPLS        March 6, 2009

Abstract

   An InfiniBand (IB) pseudowire (PW) is used to carry InfiniBand frames
   over an MPLS network. This enables service providers to offer
   "emulated" InfiniBand services over existing MPLS networks. This
   document specifies the encapsulation of InfiniBand PDUs within a
   pseudowire. It also specifies how islands of IB fabrics can be
   connected via PWs to form a single IB subnet.

Table of Contents

   1.  Introduction
   2.  Specification of Requirements
   3.  Terminology
   4.  Acronyms and Abbreviations
   5.  Reference Model
   6.  Applicability Statement
   7.  IB over MPLS PSN
       7.1.  Packet Format over MPLS PSN
       7.2.  The Control Word
       7.3.  PW Packet Processing
             7.3.1.  Encapsulation of IB Frames
             7.3.2.  MTU Requirements
             7.3.3.  Decapsulation of PW Packets
                     7.3.3.1.  Processing the Sequence Number
                     7.3.3.2.  Processing of the Length Field by the
                               Receiver
       7.4.  QoS Considerations
   8.  Signaling of IB Pseudowires
       8.1.  Control Plane Details for IB
       8.2.  Fault Management
   9.  Congestion Control
   10. Rate Management
   11. MIB Support
   12. Security
   13. IANA Considerations
   14. Normative References
   15. Informative References
   16. Author Information
   Appendix A.  Interoperability Guidelines
       A.1.  IB Specific Parameters Configuration Guidelines
       A.2.  QoS Considerations
       A.3.  Pseudo-Port State Transitions
       A.4.  IB Subnet Initialization

1. Introduction

   In recent years, applications such as Storage Area Network (SAN)
   extension, virtualization, Disaster Recovery, and Distributed/Cloud
   Computing have become a prominent business opportunity for network
   service providers. Since InfiniBand is gaining market acceptance for
   supporting these applications, interconnecting islands of IB fabrics
   over the WAN is becoming increasingly important.

   This document provides a method for transporting IB frames over an
   MPLS-based transport network. It defines the encapsulation of IB
   Protocol Data Units (PDUs) into an MPLS pseudowire, as well as
   procedures for using PW encapsulation to enable IB services such as
   SAN extension and virtualization over a Packet Switched Network
   (PSN).
   In addressing the issues associated with carrying an IB PDU over a
   PSN, this document assumes that a PW has been set up using a control
   protocol such as the one described in [PWE3-CTRL]. The design of the
   IB pseudowire described in this document conforms to the pseudowire
   architecture described in [RFC3985].

   The following figure describes the reference model that is derived
   from [RFC3985] to support the IB PW emulated services.

                   PW                           PW
               End Service                  End Service
                    |                            |
                    |<------- Pseudowire ------->|
                    |                            |
                    |    |<-- PSN Tunnel -->|    |
                    V    V                  V    V
              +-----+----+                  +----+-----+
   +-----+    |     | PE1|==================| PE2|     |     +-----+
   |     |    |     |    |                  |    |     |     |     |
   | CE1 |----|     .............PW1.............|     |-----| CE2 |
   |     |  ^ |     |    |                  |    |     |  ^  |     |
   +-----+  | |     |    |==================|    |     |  |  +-----+
            | +-----+----+                  +----+-----+  |
            |       ^                            ^        |
            |       |                            |        |
            |       |<----- Emulated Service --->|        |
            |       |                            |        |
            | IB Pseudo-port               IB Pseudo-port |
            |                                             |
       CE1 native                                    CE2 native
       IB service                                    IB service

            Figure 1. PWE3 InfiniBand Reference Configuration

   Figure 1 shows the case in which there are two CEs on the "emulated
   IB fabric". Hence, we refer to this service as "emulated
   point-to-point IB service". Specification of the procedures for
   using pseudowires to emulate IB with more than two CEs is beyond the
   scope of the current document.

   This document describes a "port-mode" mapping between InfiniBand and
   pseudowires. In this mode each PW acts as an IB link. IB subnet
   partitioning via partition keys (similar to VLANs) is transparent to
   the PW.

   The following sections describe in detail the procedures for
   transporting IB frames over an MPLS-based PSN.

2. Specification of Requirements

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119.

3. Terminology

   Below are the definitions for the terms used throughout the
   document. PWE3 definitions can be found in [RFC3916, RFC3985]. This
   section defines terms specific to InfiniBand.

   Invariant CRC
      A CRC covering the fields in a packet that do not change from the
      source to the destination.

   Local Identifier
      An address assigned to a port by the Subnet Manager, unique within
      the subnet, used for directing packets within the subnet. The
      Source and Destination LIDs are present in the Local Route Header.

   Local Route Header
      Routing header present in all InfiniBand Architecture packets;
      used for routing through switches within a subnet.

   Router
      A device that transports packets between IB subnets.

   Service Level
      Value in the Local Route Header identifying the appropriate
      Virtual Lane for a packet, enabling the implementation of
      differentiated services.

   Subnet
      A set of InfiniBand Architecture Ports and associated links that
      have a common Subnet ID and are managed by a common Subnet
      Manager.

   Subnet Manager
      One of several entities involved in the configuration and control
      of the InfiniBand subnet.

   Switch
      A device that forwards packets from one link to another of the
      same Subnet, using the Destination Local Identifier field in the
      Local Route Header.

   Variant CRC
      A CRC covering all the fields of a packet, including those that
      may be changed by Switches.

4. Acronyms and Abbreviations

   AC     Attachment Circuit
   BECN   Backward Explicit Congestion Notification
   BTH    Base Transport Header
   CE     Customer Edge
   CRC    Cyclic Redundancy Check
   EF     Expedited Forwarding
   EXP    Experimental use bits field
   FCS    Frame Check Sequence
   FECN   Forward Explicit Congestion Notification
   GS     Guaranteed Service
   IB     InfiniBand
   LDP    Label Distribution Protocol
   LID    Local Identifier
   LRH    Local Route Header
   LSP    Label Switched Path
   LSR    Label Switching Router
   MPLS   Multiprotocol Label Switching
   MTU    Maximum Transfer Unit
   NSP    Native Service Processing
   PE     Provider Edge
   PSN    Packet Switched Network
   PW     Pseudowire
   PWE3   Pseudowire Emulation Edge to Edge
   POS    Packet over SONET/SDH
   PVC    Permanent Virtual Circuit
   QoS    Quality of Service
   SL     Service Level
   SM     Subnet Manager
   SMP    Subnet Management Packet
   VC     Virtual Circuit
   VL     Virtual Lane

5. Reference Model

   This document assumes that the PEs in Figure 1 function as IB
   switches with a minimum of two IB ports that are used to
   interconnect CEs to form a single IB subnet. Figure 2 provides a
   general reference model for a PE:

   Multiple +--------------------------------------------------+
   AC's     |                        PE                        |
            +-+   +-----+   +------+   +------+   +------+   +-+
            |P|   |     |   |      |   |PW ter|   | PSN  |   |P|
        <==>|h|<=>| NSP |<=>|      |<=>|minati|<=>|Tunnel|<=>|h|<==> PSN
            |y|   |     |   |      |   |on    |   |      |   |y|
            +-+   +-----+   |      |   +------+   +------+   +-+
            |               | FWD  |                           |
            +-+   +-----+   |      |   +------+   +------+   +-+
            |P|   |     |   |      |   |PW ter|   | PSN  |   |P|
        <==>|h|<=>| NSP |<=>|      |<=>|minati|<=>|Tunnel|<=>|h|<==> PSN
            |y|   |     |   |      |   |on    |   |      |   |y|
            +-+   +-----+   +------+   +------+   +------+   +-+
            |                                                  |
            +--------------------------------------------------+
                                     ^          ^          ^
                                     |          |          |
                                     A          B          C

                    Figure 2. IB PW Reference Diagram

   The PW terminates at a logical port within the PE, defined at point
   B in Figure 2. This port delivers each IB frame that is received at
   point A, unaltered, to point A in the corresponding PE at the other
   end of the PW.
   The Forwarder (FWD) permits the PE to classify IB packets received
   on one or more ACs based on the incoming AC, the contents of the
   payload, or some statically and/or dynamically configured
   information, and forwards them to the appropriate PW. The Forwarder
   performs the inverse function on PWE3 PDUs received by the PE from
   the PSN.

   The Native Service Processing (NSP) function includes native IB
   traffic processing that is required for the proper operation of the
   IB link, or for the IB frames that are forwarded to the PW
   termination points.

   The points to the left of A, including the physical layer between
   the CE and PE, and any adaptation (NSP) functions between it and the
   PW terminations, are outside the scope of PWE3 and are not defined
   here.

   "PW Termination", between A and B, represents the operations for
   setting up and maintaining the PW, and for encapsulating and
   decapsulating the IB frames as necessary to transmit them across the
   MPLS network. In addition, this module advertises the PW logical
   port (point B) as an IB port to the Subnet Manager. This logical
   port is referred to as the IB Pseudo-port within this document.
   Configuration of this port occurs via Subnet Management Packets, as
   would be the case for any physical IB port. Additional details about
   the initialization of the Pseudo-port are provided in Section A.4 of
   the Appendix.

6. Applicability Statement

   The IB over PW service is not intended to perfectly emulate a
   traditional IB link, but it can be used for applications that need
   InfiniBand transport service.

   This document describes a "port-mode" mapping between InfiniBand and
   pseudowires. In this mode each PW acts as an IB link. In addition,
   it is assumed that the PEs in Figure 1 function as IB switches that
   are used to interconnect CEs to form a single IB subnet. Support for
   multiple IB subnets is for further study.
   The IB native service characteristics and their mapping to PWs are
   explored further in the following sections.

7. IB over MPLS PSN

7.1. Packet Format over MPLS PSN

   The general IB pseudowire packet format for carrying InfiniBand
   information (user's payload and IB control information) between two
   PEs is shown in Figure 3.

   +-------------------------------+
   |     MPLS Tunnel Label(s)      |  n*4 octets (four octets per label)
   +-------------------------------+
   |           PW Label            |  4 octets
   +-------------------------------+
   |         Control Word          |
   |        (See Figure 4)         |  4 octets
   +-------------------------------+
   |            Payload            |
   |     (IB Service Payload)      |  n octets
   |                               |
   +-------------------------------+

          Figure 3. General Format of IB Encapsulation over PSN

   The meaning of the different fields is as follows:

   MPLS Tunnel Label(s)
      The MPLS Tunnel label(s) is/are used by MPLS LSRs to forward a PW
      packet from one PE to the other.

   PW Label
      The PW label identifies one PW (i.e., one LSP) assigned to an IB
      port in one direction. Together the MPLS Tunnel label(s) and PW
      label form an MPLS label stack [RFC3032].

   Control Word
      The Control Word contains protocol control information. Its
      structure is shown in Figure 4.

   Payload
      The payload field corresponds to the IB service payload. The
      maximum length of the payload field MUST be agreed upon by the two
      PEs. This can be achieved by using the MTU interface parameter
      when the PW is established [PWE3-CTRL].

7.2. The Control Word

   The control word in Figure 4 is REQUIRED for IB port-mode. Its
   structure conforms to the "Preferred PW Control Word" defined in
   [RFC4385].

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 0 0 0| Flags |FRG|  Length   | Sequence Number               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Figure 4. Preferred PW MPLS Control Word

   The meaning of the Control Word fields (Figure 4) is as follows:

   Bits 0 to 3:
      The first 4 bits MUST be set to 0 to indicate PW data.

   Flags (bits 4 to 7):
      These bits are currently undefined for IB PWs and should be
      initialized to 0.

   FRG (bits 8 and 9):
      These bits are defined by [RFC4623]. However, fragmentation SHALL
      NOT be used for IB PWs.

   Length (bits 10 to 15):
      If the PW traverses a network link that requires a minimum frame
      size (a notable example is Ethernet), padding is required to reach
      that minimum frame size. If the frame's length (defined as the
      length of the layer 2 payload plus the length of the control word)
      is less than 64 octets, the length field MUST be set to the PW
      payload length. Otherwise, the length field MUST be set to zero.
      The value of the length field, if non-zero, is used by the egress
      PE to remove the padding characters.

   Sequence number (bits 16 to 31):
      Sequence numbers provide one possible mechanism to ensure the
      ordered delivery of PW packets. Sequence number field processing
      is OPTIONAL. The sequence number space is a 16-bit unsigned
      circular space. The sequence number value 0 indicates that the
      sequence number check algorithm is not used.

7.3. PW Packet Processing

7.3.1. Encapsulation of IB Frames

   The encapsulation process of an IB frame is initiated when a PE
   receives an IB frame from an InfiniBand interface. All traffic types
   - management, data and traffic on all Virtual Lanes - are carried
   over a single PW. Each IB frame is mapped to a PW PDU as shown in
   Figure 3. The PE performs the following actions on the received IB
   frame:

   - Strips off the VCRC.

   - Prepends a Control Word to the resulting frame. If the PW packet
     length (defined as the length of the payload plus the length of
     the control word) is less than 64 octets, the length field MUST be
     set to the packet's length. Otherwise, the length field MUST be
     set to zero. The sequence number field is processed if the PW uses
     sequence numbers [RFC4385].

   - Prepends a PW label to the resulting packet.

   - Prepends the proper tunnel encapsulation to the packet.

   - Transmits the packet.

7.3.2. MTU Requirements

   IB supports the following discrete MTU sizes: 256, 512, 1024, 2048,
   and 4096 bytes. Hence, the PW MTU must be translated into one of
   those discrete MTU sizes (IB MTU < PW MTU).

   Fragmentation, described in [RFC4623], SHALL NOT be used for IB PWs.
   Therefore, the MTU on the Pseudo-port MUST be configured to the
   largest IB packet (including the overhead of the tunneling protocol)
   that can be transported on the PSN without fragmentation.

7.3.3. Decapsulation of PW Packets

   When a PE receives a PW packet, it decapsulates the IB frame for
   transmission to a CE on the associated IB interface. The PE performs
   the following actions:

   - Processes the length and sequence number fields (details are in
     the following sub-sections).

   - Copies the IB payload from the contents of the PW packet after
     removing any padding.

   - If significant congestion is detected while receiving the PW PDUs,
     the FECN bits SHOULD be set in the IB frames sent out the
     attachment circuit. Changing the state of this bit by a PE is
     OPTIONAL.

   - Generates the VCRC on the resulting IB packet.

   Once the above steps are completed, the IB frame is queued for
   transmission on the selected IB interface.

7.3.3.1. Processing the Sequence Number

   If a router PE2 supports received sequence number processing, then
   the procedures in [RFC4385], Section 4.2, MUST be used.

7.3.3.2. Processing of the Length Field by the Receiver

   Any padding octet, if present, in the payload field of a received PW
   packet MUST be removed before forwarding the data.

   - If the Length field is set to zero, then there are no padding
     octets following the payload field.

   - Otherwise, if the payload is longer than the length specified in
     the control word, padding characters are removed according to the
     length field.

7.4. QoS Considerations

   InfiniBand is designed to be a minimum loss/low delay network.
   Whenever possible, IB PWs should be run over traffic-engineered PSNs
   providing bandwidth allocation and admission control mechanisms.
   Such PSNs will minimize loss and delay.

   QoS is an OPTIONAL feature within IB. If implemented, the QoS
   required for an IB PW SHOULD be represented in the Experimental Use
   Bits (EXP) field of the PW MPLS label [RFC3032]. If more than one
   MPLS label is imposed by the ingress LSR, the EXP field of any
   labels higher in the stack SHOULD also indicate the same traffic
   class.

   The ingress PE MAY look up the Service Level (SL) field within the
   LRH of the IB packet, and consult the SL-VL mapping table that the
   SM has programmed on the PE. The ingress PE MAY then translate this
   into a value to be placed in the EXP field(s). QoS considerations
   are further explored in Section A.2 of the Appendix.

8. Signaling of IB Pseudowires

   [PWE3-CTRL] specifies the use of the MPLS Label Distribution
   Protocol (LDP) as a protocol for setting up and maintaining
   pseudowires. This section describes the use of specific fields and
   error codes used to control IB pseudowires.

8.1. Control Plane Details for IB

   The PW Type field in the PWid FEC element and PW generalized ID FEC
   elements MUST be set to "IB Port Mode".

   The Control Word is REQUIRED for IB pseudowires. Therefore, the
   C-Bit in the PWid FEC element and PW generalized ID FEC elements
   MUST be set.
   If the C-Bit is not set, the pseudowire MUST NOT be established and
   a Label Release MUST be sent with an Illegal C-Bit status code
   [PWE3-CTRL].

8.2. Fault Management

   If the PE detects an IB interface failure, or the interface is
   administratively disabled, the PE MUST notify the remote PE. This
   can be done either by withdrawing the pseudowire or by sending a PW
   status TLV notification.

   Failure of an IB PW is detected when a label withdraw event is
   received for a specific PW ID, or the targeted Label Distribution
   Protocol session fails, or a PW status TLV notification is received,
   or the loss of a connection is detected by using VCCV. Under these
   conditions, the PE MUST map the failure to an appropriate IB failure
   indication. This mapping function is performed by the NSP (as
   defined by [IB-SPEC]), and is beyond the scope of this document.

9. Congestion Control

   As explained in [RFC3985], the PSN carrying the PW may be subject to
   congestion, with congestion characteristics depending on PSN type,
   network architecture, configuration, and loading. During congestion,
   the PSN may exhibit packet loss that will impact the service carried
   by the IB Pseudowire.

   Whenever possible, IB Pseudowires SHOULD be run over
   traffic-engineered PSNs providing bandwidth allocation and admission
   control mechanisms. IntServ-enabled domains providing the Guaranteed
   Service (GS) or DiffServ-enabled domains using expedited forwarding
   (EF) are examples of traffic-engineered PSNs. Such PSNs will
   minimize loss and delay while providing some degree of isolation of
   the IB Pseudowire's effects from neighboring streams.

   Congestion control is an optional feature in IB, and requires an
   extensive system of congestion control managers, congestion-aware
   channel adapters, and switches to work end-to-end as outlined in
   [IB-SPEC].
   IB defines FECN and BECN bits for congestion notification within the
   Base Transport Header (BTH) of data, ACKs, and Congestion
   Notification packets.

   The PEs SHOULD monitor for congestion (by measuring packet loss) to
   ensure that the service using the IB Pseudowire may be maintained.
   When a PE detects significant congestion while receiving the PW
   PDUs, the FECN bits SHOULD be set in the IB frames sent out the
   Attachment Circuit.

10. Rate Management

   IB defines [IB-SPEC] a set of discrete rates that are multiples of
   2.5 Gbps. In IB, the port rate is represented by the combination of
   Link Speed ("LinkSpeedEnabled" attribute) and Link Width
   ("LinkWidthEnabled" attribute). The rate of the physical medium on
   which the PW is riding MUST be translated into appropriate values
   for the above port information fields. If the PW rate is less than
   2.5 Gbps, then "LinkWidthEnabled" MUST be set to 1x and
   "LinkSpeedEnabled" MUST be set to 2.5 Gbps. For all other values, a
   proper multiple of 2.5 Gbps MUST be chosen.

11. MIB Support

   The IB PW management model follows the general PW management model
   defined in [RFC3985] and [PWE3-MIB]. Many common PW management
   facilities are provided here with no additional IB specifics
   necessary. IB-specific parameters are defined in [IB-SPEC] as part
   of the Generic Services Interface. The PE shall implement the
   Mandatory counters for the Pseudo-port.

12. Security

   PWE3 provides no means of protecting the contents or delivery of the
   PW packets on behalf of the native service. PWE3 MAY, however,
   leverage security mechanisms provided by the MPLS Tunnel Layer. A
   more detailed discussion of PW security is given in [RFC3985,
   PWE3-CTRL, RFC3916].

13. IANA Considerations

   A new PW type named "IB Port Mode" is requested from IANA. The next
   available value is requested.

14. Normative References

   [IB-SPEC]   InfiniBand Architecture, Generic Specification vol1.

   [RFC2119]   Bradner, S., "Key words for use in RFCs to Indicate
               Requirement Levels", BCP 14, RFC 2119, March 1997.

   [PWE3-CTRL] Martini, L., El-Aawar, N., Heron, G., Rosen, E., Tappan,
               D., and T. Smith, "Pseudowire Setup and Maintenance using
               the Label Distribution Protocol (LDP)", RFC 4447, April
               2006.

   [MPLS-ARCH] Rosen, E., Viswanathan, A., and R. Callon,
               "Multiprotocol Label Switching Architecture", RFC 3031,
               January 2001.

   [RFC4385]   Bryant, S., Swallow, G., Martini, L., and D. McPherson,
               "Pseudowire Emulation Edge-to-Edge (PWE3) Control Word
               for Use over an MPLS PSN", RFC 4385, February 2006.

   [RFC3032]   Rosen, E., Tappan, D., Fedorkow, G., Rekhter, Y.,
               Farinacci, D., Li, T., and A. Conta, "MPLS Label Stack
               Encoding", RFC 3032, January 2001.

   [RFC4446]   Martini, L., "IANA Allocations for Pseudowire Edge to
               Edge Emulation (PWE3)", BCP 116, RFC 4446, April 2006.

   [RFC4623]   Malis, A. and M. Townsley, "Pseudowire Emulation
               Edge-to-Edge (PWE3) Fragmentation and Reassembly", RFC
               4623, August 2006.

15. Informative References

   [RFC3916]   Xiao, X., et al, "Requirements for Pseudo Wire Emulation
               Edge-to-Edge (PWE3)", RFC 3916, September 2004.

   [RFC3985]   Bryant, S. and P. Pate, "Pseudo Wire Emulation
               Edge-to-Edge (PWE3) Architecture", RFC 3985, March 2005.

   [FCS]       Malis, A., Allan, D., and N. Del Regno, "PWE3 Frame Check
               Sequence Retention", Work in Progress, September 2005.

   [VCCV]      Nadeau, T., Ed. and R. Aggarwal, Ed., "Pseudo Wire
               Virtual Circuit Connectivity Verification (VCCV)", Work
               in Progress, August 2005.

   [RFC4448]   Martini, L., Rosen, E., El-Aawar, N., and G. Heron,
               "Encapsulation Methods for Transport of Ethernet over
               MPLS Networks", RFC 4448, April 2006.

   [PWE3-MIB]  Zelig, D., Ed. and T. Nadeau, Ed., "Pseudo Wire (PW)
               Management Information Base", Work in Progress, February
               2006.
   [RFC3270]   Le Faucheur, F., Wu, L., Davie, B., Davari, S., Vaananen,
               P., Krishnan, R., Cheval, P., and J. Heinanen,
               "Multi-Protocol Label Switching (MPLS) Support of
               Differentiated Services", RFC 3270, May 2002.

16. Author Information

   Suresh Shelvapille
   Bay Microsystems, Inc.
   20251 Century Boulevard, Suite 250
   Germantown, MD 20874
   e-mail: suri@baymicrosystems.com

   Vikas Puri
   Bay Microsystems, Inc.
   20251 Century Boulevard, Suite 250
   Germantown, MD 20874
   e-mail: vikas@baymicrosystems.com

Appendix A. Interoperability Guidelines

A.1. IB Specific Parameters Configuration Guidelines

   IB NodeInfo, SwitchInfo, and most of the other configurable tables
   are not affected by this implementation. The only table that is
   affected is PortInfo. Within this table the following parameters
   need consideration for the PW (effectively the Pseudo-port):

      LinkWidthEnabled
      LinkWidthSupported
      LinkWidthActive
      LinkSpeedSupported
      LinkSpeedEnabled
      LinkSpeedActive
      PortState
      PortPhysicalState
      LinkDownDefaultState
      MTUCap

A.2. QoS Considerations

   QoS is an optional feature within IB and, like congestion control,
   requires an extensive system of QoS Managers and QoS-aware switches
   to work effectively. IB uses Diffserv codepoints as the basis for
   identifying the service level of flows. Hence, when implementing
   QoS, the mapping of Diffserv traffic can be used as a model to carry
   IB traffic.

   Note that IB defines 16 Service Levels (although only eight would be
   used for QoS), which are mapped to the Virtual Lanes on which the IB
   data arrives at the NSP on the attachment circuit. Knowing the
   mapping between the Diffserv CP and the VL, appropriate EXP bits on
   the MPLS label SHOULD be chosen at the NSP before transmitting over
   the PW.
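   As an informal illustration of the mapping described above (not part
   of this specification), the following Python sketch derives an EXP
   value from an IB Service Level and packs it into MPLS label stack
   entries using the [RFC3032] layout. The SL_TO_VL and VL_TO_EXP
   tables and all function names are hypothetical: in practice the
   SL-to-VL table is programmed by the SM, and the VL-to-EXP policy is
   assumed here to be a local configuration choice.

```python
import struct

# Hypothetical example tables (assumptions, not defined by this document).
SL_TO_VL = {sl: sl % 8 for sl in range(16)}  # 16 SLs folded onto 8 VLs
VL_TO_EXP = {vl: vl for vl in range(8)}      # identity policy: VL n -> EXP n


def exp_for_sl(sl, sl_to_vl=SL_TO_VL, vl_to_exp=VL_TO_EXP):
    """Translate an IB Service Level (0..15) into a 3-bit EXP value."""
    if not 0 <= sl <= 15:
        raise ValueError("SL is a 4-bit value")
    return vl_to_exp[sl_to_vl[sl]]


def label_stack_entry(label, exp, bottom_of_stack, ttl=255):
    """Pack one label stack entry: Label(20) | EXP(3) | S(1) | TTL(8)."""
    if not (0 <= label < 1 << 20 and 0 <= exp < 8 and 0 <= ttl < 256):
        raise ValueError("field out of range")
    word = (label << 12) | (exp << 9) | (int(bottom_of_stack) << 8) | ttl
    return struct.pack("!I", word)


def label_stack(tunnel_label, pw_label, sl):
    """Build a two-entry stack carrying the same EXP in both labels."""
    exp = exp_for_sl(sl)
    return (label_stack_entry(tunnel_label, exp, False)
            + label_stack_entry(pw_label, exp, True))
```

   Both labels carry the same EXP value, following the recommendation
   in Section 7.4 that labels higher in the stack indicate the same
   traffic class as the PW label.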
A.3. Pseudo-Port State Transitions

   InfiniBand defines a parameter called "PortState" for each port.
   Four port states are defined: "down", "init", "arm", and "active".
   In addition, two other attributes, "PortPhysicalState" and
   "LinkDownDefaultState", are associated with IB ports. With regard to
   the Pseudo-port, only the following states are relevant for the
   "PortPhysicalState" parameter: "sleep", "polling", "disabled", and
   "Linkup".

   By default the "PortPhysicalState" and "LinkDownDefaultState"
   attributes MUST be set to "polling" for the Pseudo-port. In this
   case, if the SM brings the port down by setting the "PortState" to
   "down", then the Pseudowire MUST be torn down, and then re-signaled.

   If, however, the "PortPhysicalState" is set to "disabled" or the
   "LinkDownDefaultState" is set to "sleep" by the SM, and the
   "PortState" transitions to "down", then re-signaling of the
   Pseudowire MUST NOT be attempted. The same procedure MUST be
   followed if the Pseudowire goes down for any other reason.

   When a Pseudo-port is in the "down" state (because of
   "PortPhysicalState" and/or "LinkDownDefaultState" values), if the SM
   transitions the "PortPhysicalState" to "polling" and
   "LinkDownDefaultState" to "polling", then a Pseudowire setup MUST be
   initiated.

A.4. IB Subnet Initialization

   The PE identifies itself to the IB Subnet Manager (SM) as a switch
   with at least two ports, one native IB and the other being the
   Pseudowire end-point.

   When the Pseudowire is signaled up, the IB Pseudo-port is considered
   to be in the "init" state. This is conveyed to the SM by the NSP
   over the native IB link. The SM sends Subnet Management Packets
   (SMPs) to the PE to transition the Pseudo-port state to "arm" and
   then "active". Once the Pseudo-port closer to the SM is in the
   arm/active state, the SM can transition the other end of the PW to
   the arm and active states.

   Assignment of LIDs and all other configuration happens via SMPs on
   both PEs, as it would on any IB switch.
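   The Pseudo-port rules of Sections A.3 and A.4 can be summarized by a
   small state model. The Python sketch below is illustrative only and
   not part of this specification: the class, its method names, and the
   "teardown"/"signal" action strings are hypothetical, and it assumes
   that a PW is still torn down (but not re-signaled) when the SM sets
   "PortState" to "down" while re-signaling is inhibited.

```python
class PseudoPort:
    """Toy model of the Pseudo-port behavior in A.3 and A.4."""

    def __init__(self):
        self.port_state = "down"
        self.phys_state = "polling"         # default per A.3
        self.link_down_default = "polling"  # default per A.3
        self.actions = []                   # PW actions the PE would take

    def _may_resignal(self):
        # Re-signaling is inhibited by "disabled" or "sleep" (A.3).
        return (self.phys_state != "disabled"
                and self.link_down_default != "sleep")

    def _go_down(self, teardown):
        self.port_state = "down"
        if teardown:
            self.actions.append("teardown")
        if self._may_resignal():
            self.actions.append("signal")

    def pw_signaled_up(self):
        # A.4: a PW that is signaled up puts the Pseudo-port in "init".
        self.port_state = "init"

    def pw_went_down(self):
        # A.3: same procedure when the PW goes down for any other reason.
        self._go_down(teardown=False)

    def sm_set_port_state(self, state):
        if state == "down":
            self._go_down(teardown=True)
        elif (self.port_state, state) in (("init", "arm"),
                                          ("arm", "active")):
            self.port_state = state
        else:
            raise ValueError(
                f"unsupported transition {self.port_state} -> {state}")

    def sm_set_link_attrs(self, phys_state, link_down_default):
        was_blocked = not self._may_resignal()
        self.phys_state = phys_state
        self.link_down_default = link_down_default
        # A.3: returning both attributes to "polling" on a down port
        # triggers a new PW setup.
        if (self.port_state == "down" and was_blocked
                and phys_state == "polling"
                and link_down_default == "polling"):
            self.actions.append("signal")
```

   In this model, the SM-driven "init" to "arm" to "active" sequence of
   Section A.4 is the only upward path, and every downward transition
   consults the two link attributes before deciding whether to
   re-signal the Pseudowire.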