EVPN/VXLAN Architecture Deep Dive
BGP EVPN route types, VTEP discovery, symmetric vs asymmetric IRB, ARP suppression, and multi-homing — with Cisco NX-OS, Arista EOS, and Junos CLI examples.
1. Why VXLAN?
IEEE 802.1Q VLAN IDs are 12 bits wide, which leaves only 4,094 usable VLANs in a Layer 2 domain. That is a hard constraint in multi-tenant data centers where thousands of customer segments must coexist on shared infrastructure. VXLAN (Virtual eXtensible LAN, RFC 7348) solves this by encapsulating Ethernet frames inside UDP/IP, using a 24-bit VNI (VXLAN Network Identifier) to support up to about 16.7 million (2^24) logical segments.
VXLAN decouples the virtual Layer 2 topology from the physical Layer 3 underlay, allowing standard IP routing (ECMP, OSPF, BGP) between VXLAN Tunnel Endpoints (VTEPs) without stretching VLANs. The outer UDP header uses destination port 4789 (IANA-assigned; early deployments used 8472). Total encapsulation overhead is ~50 bytes over IPv4, ~70 bytes over IPv6.
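To make the header layout and the overhead figures concrete, here is a minimal Python sketch. It assumes an untagged outer Ethernet header and no IP options. The function names `build_vxlan_header` and `encap_overhead` and the VNI 10010 are illustrative, not taken from any vendor implementation.

```python
import struct

VXLAN_UDP_PORT = 4789          # IANA-assigned destination port
VXLAN_FLAG_VNI_VALID = 0x08    # "I" flag: the VNI field is valid


def build_vxlan_header(vni: int) -> bytes:
    """Pack the 8-byte VXLAN header described in RFC 7348."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + 24 reserved bits.
    # Word 2: VNI (24 bits) + 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def encap_overhead(ipv6: bool = False) -> int:
    """Bytes added in front of the original Ethernet frame."""
    outer_ethernet = 14            # assumption: no 802.1Q tag on the outer frame
    outer_ip = 40 if ipv6 else 20  # assumption: no IPv4 options or IPv6 extension headers
    udp = 8
    vxlan = 8
    return outer_ethernet + outer_ip + udp + vxlan


header = build_vxlan_header(10010)
print(header.hex())                                  # 0800000000271a00
print(encap_overhead(), encap_overhead(ipv6=True))   # 50 70
```

The overhead is why underlay links are usually configured with an MTU at least 50 bytes (IPv4) or 70 bytes (IPv6) larger than the MTU offered to hosts.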
2. VTEP Discovery Methods
VTEPs must discover peer VTEPs to set up tunnels and distribute BUM (Broadcast, Unknown unicast, Multicast) traffic. Three mechanisms are deployed in practice:
| Method | How it works | Pros | Cons |
|---|---|---|---|
| Multicast | Each VNI maps to a PIM multicast group in the underlay; BUM traffic is flooded to that group | Simple; automatic peer discovery | Requires PIM multicast in underlay; many operators disable multicast |
| Ingress Replication | Each VTEP maintains an explicit unicast list of remote VTEPs per VNI; BUM traffic is replicated to each peer | No multicast required | Head-end does O(N) replication per BUM packet; static peer lists require manual maintenance |
| BGP EVPN | RT-3 IMET routes advertise VTEP membership; RT-2 routes distribute MAC+IP bindings; no flood-and-learn | Control-plane MAC learning; ARP suppression; scales to thousands of VTEPs; standard | BGP stack required on all VTEPs or route-reflectors |
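Most greenfield data centers now deploy BGP EVPN as the control plane. EVPN replaces flood-and-learn and manually maintained peer lists, but BUM traffic is still delivered in the data plane either by ingress replication (with the peer list built from RT-3 routes) or by underlay multicast. Multicast-only and statically configured ingress-replication designs without EVPN are legacy approaches still found in brownfield environments.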
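The way RT-3 routes feed ingress replication can be sketched in a few lines of Python. This is a simplified model, not a BGP implementation: the `imet_routes` list stands in for already-decoded IMET routes, and the addresses and VNIs are made up for illustration.

```python
from collections import defaultdict

# Hypothetical, already-decoded RT-3 (IMET) routes: (originating VTEP IP, VNI).
imet_routes = [
    ("10.0.0.1", 10010),
    ("10.0.0.2", 10010),
    ("10.0.0.3", 10010),
    ("10.0.0.2", 10020),
]


def build_flood_lists(routes, local_vtep):
    """Map each VNI to the remote VTEPs that must receive a copy of BUM traffic."""
    flood = defaultdict(set)
    for vtep_ip, vni in routes:
        if vtep_ip != local_vtep:      # never replicate to ourselves
            flood[vni].add(vtep_ip)
    return {vni: sorted(peers) for vni, peers in flood.items()}


flood_lists = build_flood_lists(imet_routes, local_vtep="10.0.0.1")
print(flood_lists)
# {10010: ['10.0.0.2', '10.0.0.3'], 10020: ['10.0.0.2']}
```

A real VTEP would additionally keep only the VNIs it has configured locally, and would remove a peer from the list when the corresponding RT-3 route is withdrawn.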
3. BGP EVPN Route Types
BGP EVPN (RFC 7432) uses AFI 25 (L2VPN) / SAFI 70 (EVPN). RFC 7432 defines route types 1 through 4; RT-5 was added separately in RFC 9136 (October 2021). The five route types most relevant to VXLAN fabrics are:
| RT | Name | Purpose | Key NLRI fields |
|---|---|---|---|
| 1 | Ethernet Auto-Discovery | Per-ES and per-EVI mass-withdraw on link failure; aliasing for all-active multi-homing load-balancing | RD, ESI, Ethernet Tag ID, MPLS label |
| 2 | MAC/IP Advertisement | Distribute MAC addresses (and optionally the bound IP) to enable ARP suppression and eliminate flood-and-learn | RD, ESI, Ethernet Tag ID, MAC address, IP address (optional), MPLS Label1 (L2VNI), MPLS Label2 (L3VNI, optional) |
| 3 | Inclusive Multicast Ethernet Tag (IMET) | Advertise VTEP reachability per VNI; used to build ingress-replication lists and trigger BUM forwarding | RD, Ethernet Tag ID, Originating Router's IP (VTEP address); PMSI Tunnel attribute carries VNI and tunnel type |
| 4 | Ethernet Segment Route | Designated Forwarder (DF) election among PEs sharing an Ethernet Segment; ensures only one PE forwards BUM into the CE segment | RD, ESI, Originating Router IP |
| 5 | IP Prefix Route (RFC 9136) | Advertise IP prefixes into the EVPN overlay for inter-subnet routing; in symmetric IRB designs the route carries the L3VNI (transit VNI) of the VRF | RD, ESI, Ethernet Tag ID, IP prefix length, IP prefix, GW IP address, MPLS label (L3VNI) |
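As an illustration of the RT-2 field layout in the table, the following Python sketch serializes a MAC/IP Advertisement NLRI. It assumes the RFC 7432 field order, a type-1 Route Distinguisher, and the RFC 8365 convention of carrying the VNI in the 24-bit label field. The helper names and all addresses are hypothetical, and BGP attributes such as route targets and the encapsulation extended community are out of scope.

```python
import ipaddress
import struct
from typing import Optional


def encode_rd_type1(router_id: str, number: int) -> bytes:
    """Type-1 Route Distinguisher: 2-byte type, 4-byte IPv4 address, 2-byte number."""
    return struct.pack("!H4sH", 1, ipaddress.IPv4Address(router_id).packed, number)


def encode_rt2(rd: bytes, esi: bytes, eth_tag: int, mac: str,
               ip: Optional[str], l2vni: int, l3vni: Optional[int] = None) -> bytes:
    """Serialize an RT-2 NLRI: route type, length, then the route-specific fields."""
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    if len(rd) != 8 or len(esi) != 10 or len(mac_bytes) != 6:
        raise ValueError("RD is 8 bytes, ESI is 10 bytes, MAC is 6 bytes")
    ip_bytes = ipaddress.ip_address(ip).packed if ip else b""
    body = rd + esi + struct.pack("!I", eth_tag)
    body += bytes([48]) + mac_bytes                  # MAC address length in bits
    body += bytes([len(ip_bytes) * 8]) + ip_bytes    # IP length in bits: 0, 32 or 128
    body += l2vni.to_bytes(3, "big")                 # Label1 field carries the L2VNI
    if l3vni is not None:
        body += l3vni.to_bytes(3, "big")             # Label2 field carries the L3VNI
    return bytes([2, len(body)]) + body              # route type 2, then body length


rd = encode_rd_type1("10.0.0.1", 10010)
single_homed_esi = bytes(10)                         # all-zero ESI: not multi-homed
nlri = encode_rt2(rd, single_homed_esi, 0, "00:50:56:aa:bb:01", "10.1.10.11",
                  l2vni=10010, l3vni=50001)
print(len(nlri), nlri[0], nlri[1])                   # 42 2 40
```

A MAC-only advertisement (no IP, one label) produces a 33-byte body, which is the minimum RT-2 length.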
4. Symmetric vs Asymmetric IRB
Integrated Routing and Bridging (IRB) describes how VTEPs route traffic between overlay subnets. Two models are defined in RFC 9135:
Asymmetric IRB: The ingress VTEP performs L3 routing (TTL decrement, next-hop rewrite) into the destination L2VNI before encapsulating and sending. The egress VTEP only bridges — it sees the inner frame already addressed to the final MAC. Every VTEP must have every VNI (subnet) programmed locally, even those with no local hosts, which limits scale.
Symmetric IRB: The ingress VTEP routes from the source L2VNI into a shared L3VNI (transit VNI, one per VRF). The egress VTEP routes out of the L3VNI into the local destination L2VNI. Both endpoints perform routing. Each VTEP only needs its own local L2VNIs; the single L3VNI is universal. This is the recommended model for large fabrics.
| | Asymmetric IRB | Symmetric IRB |
|---|---|---|
| L2VNIs needed per VTEP | All VNIs in the fabric | Only locally attached subnets |
| L3VNI (transit VNI) | Not required | Required — one per VRF |
| Routing hops | Ingress VTEP only | Ingress and egress VTEPs |
| Scale | Poor (all VNIs everywhere) | Good (local subnets only) |
| RT-5 prefixes | Not supported | Supported (uses L3VNI) |
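The scaling difference between the two models can be shown with a small Python sketch that computes which VNIs each VTEP has to program. The leaf names, VNI numbers, and the single-VRF assumption are invented for the example.

```python
# Hypothetical fabric: L2VNIs with locally attached hosts on each leaf VTEP.
local_l2vnis = {
    "leaf1": {10010, 10020},
    "leaf2": {10020, 10030},
    "leaf3": {10040},
}
L3VNI = 50001   # assumption: a single VRF, hence a single transit VNI


def vnis_to_program(model):
    """Return {leaf: set of VNIs that must be configured} for the given IRB model."""
    all_l2vnis = set().union(*local_l2vnis.values())
    if model == "asymmetric":
        # Every leaf routes directly into the destination L2VNI, so it needs all of them.
        return {leaf: set(all_l2vnis) for leaf in local_l2vnis}
    if model == "symmetric":
        # Each leaf needs only its local L2VNIs plus the shared transit VNI.
        return {leaf: vnis | {L3VNI} for leaf, vnis in local_l2vnis.items()}
    raise ValueError("model must be 'asymmetric' or 'symmetric'")


for model in ("asymmetric", "symmetric"):
    counts = {leaf: len(vnis) for leaf, vnis in vnis_to_program(model).items()}
    print(model, counts)
# asymmetric {'leaf1': 4, 'leaf2': 4, 'leaf3': 4}
# symmetric {'leaf1': 3, 'leaf2': 3, 'leaf3': 2}
```

With only four subnets the gap is small, but in the asymmetric model the per-leaf count grows with the whole fabric, while in the symmetric model it grows only with what is attached locally.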
5. ARP Suppression
Without EVPN, an ARP request from a host is broadcast into its VNI and flooded to every VTEP in the fabric. With BGP EVPN, RT-2 routes distribute MAC+IP bindings to all VTEPs as soon as hosts are learned. When a host ARPs for a remote IP, the local VTEP answers directly from its BGP-populated table — no ARP packet crosses the VXLAN fabric. This eliminates BUM flooding for known hosts and is especially impactful in fabrics with thousands of VMs per VTEP.
ND (Neighbor Discovery) suppression works identically for IPv6 — RT-2 routes carry IPv6 addresses in the IP field of the NLRI, and the VTEP answers NS messages locally.
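The suppression decision itself is a table lookup. The Python sketch below models it under simplifying assumptions: the cache is keyed by (VNI, IP), entries come only from received RT-2 routes, and aging, withdrawals, and gratuitous ARP are ignored. The function names are hypothetical.

```python
# Hypothetical suppression cache keyed by (VNI, IP), filled from received RT-2 routes.
suppression_cache = {}


def install_rt2(vni, ip, mac):
    """Record the MAC+IP binding carried in an RT-2 route."""
    suppression_cache[(vni, ip)] = mac


def handle_arp_request(vni, target_ip):
    """Decide locally whether an ARP request can be answered or must be flooded."""
    mac = suppression_cache.get((vni, target_ip))
    if mac is not None:
        return ("proxy-reply", mac)   # answer on behalf of the remote host
    return ("flood", None)            # unknown binding: fall back to BUM forwarding


install_rt2(10010, "10.1.10.20", "00:50:56:aa:bb:02")
print(handle_arp_request(10010, "10.1.10.20"))   # ('proxy-reply', '00:50:56:aa:bb:02')
print(handle_arp_request(10010, "10.1.10.99"))   # ('flood', None)
```

The second call shows why suppression only removes flooding for known hosts: a target that no VTEP has learned yet still triggers a flooded request.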
6. Multi-Homing and ESI
An Ethernet Segment Identifier (ESI) is a 10-byte identifier assigned to the logical bundle connecting a CE device to multiple PE VTEPs. Two forwarding modes exist:
- Single-Active: Only one PE forwards traffic to and from the segment for a given VLAN at a time. The DF election (using RT-4 routes) picks the Designated Forwarder for each Ethernet Tag; the other PEs stay on standby for that VLAN and take over if the DF fails.
- All-Active: All PEs forward simultaneously, enabling ECMP across the bundle (like a port-channel with remote legs). RT-1 "aliasing" routes allow remote VTEPs to load-balance traffic toward the ESI across all attached PEs. MAC mobility is handled via the MAC Mobility extended community in RT-2.
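The default DF election in RFC 7432 ("service carving") is a modulo operation over the ordered list of PEs attached to the segment. The Python sketch below shows that default procedure only; it does not cover later election algorithms, and the PE addresses are illustrative.

```python
import ipaddress


def elect_df(pe_addresses, vlan_id):
    """Default RFC 7432 DF election: order PEs by numeric IP, pick index (V mod N)."""
    ordered = sorted(pe_addresses, key=lambda addr: int(ipaddress.ip_address(addr)))
    return ordered[vlan_id % len(ordered)]


# Hypothetical PEs that advertised an RT-4 route for the same ESI.
pes = ["10.0.0.12", "10.0.0.11"]
print(elect_df(pes, 10))   # 10.0.0.11
print(elect_df(pes, 11))   # 10.0.0.12
```

Because every PE runs the same calculation on the same RT-4 routes, they agree on the DF without exchanging any further messages, and different VLANs are spread across the attached PEs.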
7. Vendor CLI Quick Reference
| Task | Cisco NX-OS | Arista EOS | Juniper Junos |
|---|---|---|---|
| Show EVPN routes | show bgp l2vpn evpn | show bgp evpn | show route table bgp.evpn.0 |
| Show VTEP peers | show nve peers | show vxlan vtep | show evpn instance |
| Show overlay MACs | show mac address-table | show vxlan address-table | show evpn mac-ip-table |
| Show ARP suppression cache | show ip arp suppression-cache detail | show vxlan address-table detail | show evpn mac-ip-table extensive |
| Show VNI-to-VRF mapping | show nve vni | show vxlan vni | show evpn instance extensive |
| Show ESI multi-homing | show nve ethernet-segment | show bgp evpn instance | show evpn instance extensive |
References
- RFC 7348 — VXLAN: A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks (2014)
- RFC 7432 — BGP MPLS-Based Ethernet VPN (BGP EVPN) (2015)
- RFC 8365 — A Network Virtualization Overlay Solution Using Ethernet VPN (EVPN) (2018)
- RFC 9135 — Integrated Routing and Bridging in Ethernet VPN (EVPN) (2021)
- RFC 9136 — IP Prefix Advertisement in Ethernet VPN (EVPN) (2021)
- IETF BESS Working Group — BGP Enabled ServiceS (active EVPN drafts)