Cisco WLC: Pitfalls when not using LAG

Since the very early days of AireOS WLCs, Cisco has strongly recommended using LAG (aka “static port channels” aka “static EtherChannels” aka “mode on channels”) to uplink the WLCs to the rest of the network.

The logical management/RMI interface, the optional AP manager interface, and the dynamic interfaces are assigned to the Link Aggregation Group. All physical network ports are assigned to the LAG. It’s only possible to create ONE LAG. To get high availability in case of uplink switch failures, it is recommended to distribute the LAG links across different switch chassis. Cisco therefore recommends switches that support multi-chassis channels (e.g. VSS, stacks).

However, sometimes it is not possible to use LAG, because the WLC must be connected to different physical switches. A good example is a WLC that must have an INSIDE port for CAPWAP, management access and maybe internal WLANs, and a separate EXTERNAL port for all the “evil” dynamic interfaces (e.g. Internet WLANs).

In this case, link redundancy is not possible with LAG, because all ports are assigned to ONE LAG and therefore the WLC cannot be connected to different physical switches / switch stacks / VSSs. However, link redundancy can be achieved by assigning a primary and a backup port to each interface.

For example, on a 5508 WLC, port 1 (active) and port 2 (backup) are used for a redundant INSIDE connection, and port 3 (active) and port 4 (backup) are used for the EXTERNAL connection.
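The primary / backup idea from the 5508 example can be sketched in a few lines of Python. This is only an illustration: the function name, the dictionary layout and the port numbering are assumptions made for the sketch, and it is stateless (it always prefers the primary port when it is up), which says nothing about how the real WLC behaves when a failed primary port recovers.

```python
from typing import Optional

# Hypothetical mapping for the 5508 example: interface -> (primary port, backup port)
PORT_PAIRS = {
    "INSIDE": (1, 2),
    "EXTERNAL": (3, 4),
}

def select_port(interface: str, link_up: dict) -> Optional[int]:
    """Return the port an interface would use, or None if both ports are down."""
    primary, backup = PORT_PAIRS[interface]
    if link_up.get(primary, False):
        return primary
    if link_up.get(backup, False):
        return backup
    return None

# Port 3 has failed, so EXTERNAL moves to its backup port 4.
links = {1: True, 2: True, 3: False, 4: True}
print(select_port("INSIDE", links))
print(select_port("EXTERNAL", links))
```

With only two physical ports there is no spare port left to act as a backup for either connection.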

How do you solve the problem with a 5520 WLC, which has only two physical ports? The simplest approach is to skip the active/backup port feature and achieve redundancy using SSO (two WLC boxes), as illustrated in the following figure:

5520 SSO without LAG

The idea behind it is simple: The physical ports of the WLCs are distributed over different switches. If any switch or connection towards the active WLC fails, the SSO cluster converges and wlc2 will become active.

However, this is only true for connections that are considered by the SSO algorithm. The only interfaces / ports that are relevant for SSO operation are:

RP port (dedicated interface)
RMI interface (same network port as management interface)

A dynamic interface failure is not relevant for SSO operation.

Long story short: If port 2 (dynamic interface on the active WLC) fails, which could happen because of a cable failure or a switch failure on the right side in the example above, the SSO cluster won’t converge and wlc2 won’t take over.

What happens is a blackhole scenario. All the APs are still associated to wlc1, and the CAPWAP control and data sessions terminate at wlc1. However, WLANs that are mapped to dynamic interfaces on port 2 won’t work properly any more. The traffic of their wireless users won’t be bridged to the wired network.
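The blackhole condition can be expressed as a small Python model. It is deliberately simplified: it treats a failure of either interface listed above as a switchover trigger and ignores the rest of the real SSO state machine. The labels "rp", "rmi" and "dynamic" and both function names are made up for this sketch.

```python
# Interfaces the SSO algorithm considers, according to the list above.
SSO_MONITORED = ("rp", "rmi")

def sso_switchover(links_up: dict) -> bool:
    """True if a monitored interface on the active WLC is down (simplified)."""
    return any(not links_up[name] for name in SSO_MONITORED)

def blackholed(links_up: dict) -> bool:
    """True if the dynamic interface is down but the cluster does not converge."""
    return not links_up["dynamic"] and not sso_switchover(links_up)

# Port 2 (dynamic interface) fails on wlc1, RP and RMI stay up.
wlc1 = {"rp": True, "rmi": True, "dynamic": False}
print(sso_switchover(wlc1))
print(blackholed(wlc1))
```

In this model the first call reports no switchover and the second reports a blackhole, which is exactly the situation described above.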

Other networking platforms (e.g. Cisco ASA) solve this problem by monitoring additional, configurable interfaces within the redundancy configuration. A failure of one of these monitored interfaces results in an HA switchover. Other examples are the HSRP and VRRP priority penalties that apply when certain interfaces go down.
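To show what such a priority penalty does, here is a minimal Python sketch of the general idea. It is not a WLC feature and not an implementation of HSRP or VRRP; the priority values, the penalty of 20 and the function names are assumptions, and the election simply picks the highest effective priority (which in HSRP terms assumes preemption is enabled).

```python
def effective_priority(base: int, tracked_up: dict, penalties: dict) -> int:
    """Subtract the penalty of every tracked interface that is down."""
    return base - sum(penalties[name] for name, up in tracked_up.items() if not up)

def elect_active(priorities: dict) -> str:
    """Pick the unit with the highest effective priority."""
    return max(priorities, key=priorities.get)

penalties = {"dynamic": 20}  # hypothetical penalty per tracked interface

# wlc1 normally wins (110 vs. 100), but loses 20 when its dynamic interface fails.
prio = {
    "wlc1": effective_priority(110, {"dynamic": False}, penalties),
    "wlc2": effective_priority(100, {"dynamic": True}, penalties),
}
print(prio)
print(elect_active(prio))
```

With tracking like this, the failed dynamic interface would push the active role to wlc2 instead of leaving traffic blackholed on wlc1.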

So take care when using the combination of:

No LAG
Connections to different switching domains
High availability (e.g. SSO and plain old N+1)
No backup interfaces
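The combination above can also be written down as a tiny design check in Python. The class and field names are invented for this sketch; it simply flags a design that matches all four points.

```python
from dataclasses import dataclass

@dataclass
class WlcDesign:
    uses_lag: bool
    separate_switching_domains: bool
    high_availability: bool  # SSO or N+1
    backup_ports: bool

def blackhole_risk(design: WlcDesign) -> bool:
    """True if the design matches every point of the risky combination."""
    return (
        not design.uses_lag
        and design.separate_switching_domains
        and design.high_availability
        and not design.backup_ports
    )

# The two-port 5520 SSO design discussed in this post.
wlc_5520 = WlcDesign(
    uses_lag=False,
    separate_switching_domains=True,
    high_availability=True,
    backup_ports=False,
)
print(blackhole_risk(wlc_5520))
```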

I know it’s kinda exotic and special … but we love corner cases, because they always break our necks 🙂

Published by joe on 08/09/2017
