Building a redundant and distributed L3 network in Juno


November 25, 2014 by kimizhang

Before Juno, whenever we deployed OpenStack in production there was always a pain point around the L3 agent: high availability and the performance bottleneck. Juno now ships new Neutron features to address this: a highly available L3 agent and the Distributed Virtual Router (DVR).

Specifications:

https://github.com/openstack/neutron-specs/blob/master/specs/juno/neutron-ovs-dvr.rst

https://github.com/openstack/neutron-specs/blob/master/specs/juno/l3-high-availability.rst

DVR distributes East-West traffic across virtual routers running on the compute nodes. Those routers also handle North-South floating IP traffic locally for the VMs running on the same node. However, if a VM has no floating IP, its outbound external (SNAT) traffic is still handled centrally by the virtual router on the controller/network node.

HA L3 agent provides virtual router HA via VRRP: the virtual gateway IP is always served by one of the controller/network nodes.

Let’s take a look at how they work in detail.

DVR

Steps to enable DVR:

  1. Precondition
    DVR currently only supports tunnel overlays (VXLAN or GRE) with l2population enabled; VLAN segmentation is not supported yet.
    So to continue, we need a running Juno OpenStack setup with VXLAN or GRE configured as the overlay network.
    In my setup, RDO on RHEL 7 is used to deploy a multi-node Juno setup: 1 controller and 2 compute nodes.
  2. neutron.conf on the controller node:
    router_distributed = True	#This parameter controls the default when creating
     # a new router; admin can override with the "--distributed False" option
    dvr_base_mac = fa:16:3f:00:00:00
  3. l3_agent.ini on all nodes:
    #On controller node
    agent_mode = dvr_snat
     
    #On compute nodes
    agent_mode = dvr
  4. ml2_conf.ini on all nodes:
    #Append l2population to mechanism_drivers
    [ml2]
    mechanism_drivers = openvswitch,l2population
  5. ovs_neutron_plugin.ini on all nodes:
    [agent]
    l2_population = True
    enable_distributed_routing = True
  6. Restart OpenStack services on all nodes:
    openstack-service restart

Now we are ready to use DVR.
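To double-check that the agents picked up the new mode, we can list the Neutron agents and inspect one of the L3 agents; the agent_mode should appear in its configurations field (the agent ID below is a placeholder and the exact output format may vary):

#List all Neutron agents and the hosts they run on
neutron agent-list
#Inspect an L3 agent; agent_mode should be dvr_snat on the controller and dvr on compute nodes
neutron agent-show <l3-agent-id>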

  • SNAT traffic flow

If a VM does not need a floating IP and only needs outgoing external network access, its traffic is still handled by the centralized SNAT function of the L3 agent running on the controller node.

The SNAT traffic flow from VM to external network:

[Figure: Juno-Neutron-DVR-Centralized-SNAT]

Let’s create a DVR:

neutron router-create router
#By default the router will be distributed; if you want to create an old-style centralized
#router, use neutron router-create --distributed False <router name> (admin only)

Then create a tenant network and subnet, attach the network to the router, and set the router gateway:

neutron net-create internal
neutron subnet-create --name internal internal 13.0.0.0/24
neutron router-interface-add router internal
neutron router-gateway-set <router-id> <external network id>

Then launch a VM on this tenant network from the dashboard.
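If you prefer the CLI over the dashboard, something along these lines should work (the cirros image and m1.tiny flavor are only assumptions, substitute whatever exists in your environment):

#Boot a test VM attached to the "internal" network
nova boot --image cirros --flavor m1.tiny --nic net-id=<internal network id> test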

Check the controller node:

[root@RDO-Juno-controller ~(keystone_admin)]# ip netns
snat-db5090df-8385-4e42-a176-bec0ea9d6691
qrouter-db5090df-8385-4e42-a176-bec0ea9d6691
qdhcp-3efe26de-741f-4d62-8fc0-0f8b1cce07db
 
[root@RDO-Juno-controller ~(keystone_admin)]# ip netns exec \
snat-db5090df-8385-4e42-a176-bec0ea9d6691 ip a |grep "inet "
    inet 127.0.0.1/8 scope host lo
    inet 192.168.122.218/24 brd 192.168.122.255 scope global qg-f5182aa4-e9
    inet 13.0.0.5/24 brd 13.0.0.255 scope global sg-71b3591a-b6
 
[root@RDO-Juno-controller ~(keystone_admin)]# ip netns exec \
snat-db5090df-8385-4e42-a176-bec0ea9d6691 iptables-save | grep SNAT
-A neutron-l3-agent-snat -s 13.0.0.0/24 -j SNAT --to-source 192.168.122.218

We can see 3 network namespaces have been created. The snat namespace is newly introduced by DVR, to keep the SNAT iptables rules separate.

In the SNAT namespace there is one internal IP, 13.0.0.5, for receiving SNAT traffic from the VMs, and one external IP, 192.168.122.218, used as the SNAT source.
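If you want to verify this path, one option is to capture on the external interface inside the snat namespace while the VM pings something outside; the packets should leave with source 192.168.122.218 (namespace and interface names are taken from the output above and will differ in your setup):

#On the controller node, while the VM pings an external address
ip netns exec snat-db5090df-8385-4e42-a176-bec0ea9d6691 \
tcpdump -n -i qg-f5182aa4-e9 icmp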

Then we check the compute node where the VM is running:

[root@RDO-Juno-compute ~]# ip netns
qrouter-db5090df-8385-4e42-a176-bec0ea9d6691
 
[root@RDO-Juno-compute ~]# ip netns exec\
 qrouter-db5090df-8385-4e42-a176-bec0ea9d6691 ip a | grep "inet "
    inet 127.0.0.1/8 scope host lo
    inet 13.0.0.1/24 brd 13.0.0.255 scope global qr-078b1a5a-d0

We can see the compute node now has a router namespace of its own, hosting the gateway IP 13.0.0.1. But as mentioned, outgoing external traffic is still handled centrally by the router on the controller node, so how does it get there? Let’s check the routing policies and tables:

[root@RDO-Juno-compute neutron]# ip netns exec \
qrouter-db5090df-8385-4e42-a176-bec0ea9d6691 ip rule ls
0:	from all lookup local 
32766:	from all lookup main 
32767:	from all lookup default 
218103809:	from 13.0.0.1/24 lookup 218103809 
 
[root@RDO-Juno-compute ~]# ip netns exec \
qrouter-db5090df-8385-4e42-a176-bec0ea9d6691 ip route show table all
default via 13.0.0.5 dev qr-078b1a5a-d0  table 218103809 
13.0.0.0/24 dev qr-078b1a5a-d0  proto kernel  scope link  src 13.0.0.1
...

We can see a new routing policy has been created that directs traffic from the tenant subnet (the 13.0.0.1/24 rule) to routing table “218103809”. In that table the default gateway points to 13.0.0.5, the interface of the centralized router handling SNAT traffic.
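As a side note, the table ID does not look random: 218103809 is simply the gateway address 13.0.0.1 written as a 32-bit integer (an observation based on this output, not a documented guarantee):

#13.0.0.1 as a 32-bit integer: 13*2^24 + 0*2^16 + 0*2^8 + 1
python -c 'print(13*2**24 + 0*2**16 + 0*2**8 + 1)'
#218103809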

  • Floating IP traffic flow

With DVR, floating IP traffic is handled locally on the compute node; of course this requires that every compute node has external connectivity.

The Floating IP traffic flow from VM to external network:

[Figure: Juno-Neutron-DVR-FloatingIP-traffic-flow]

Let’s create a floating IP and associate it with the VM:

neutron floatingip-create <external network id>
Created a new floatingip:
+---------------------+--------------------------------------+
| Field               | Value                                |
+---------------------+--------------------------------------+
| fixed_ip_address    |                                      |
| floating_ip_address | 192.168.122.219                      |
| floating_network_id | e720244f-fd4f-4811-9435-4a48e13519f7 |
| id                  | a7d2cf45-e178-4caa-b616-6309cf7c2467 |
| port_id             |                                      |
| router_id           |                                      |
| status              | DOWN                                 |
| tenant_id           | 243dcb3c3af948fda81960307af2b36e     |
+---------------------+--------------------------------------+
neutron floatingip-associate <floating-ip id> <the VM's neutron port id >
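If you are not sure which port ID to pass to floatingip-associate, the VM’s Neutron port can be looked up via Nova (a small helper step; "test" is the VM name used here):

#The "Port ID" column is what floatingip-associate expects
nova interface-list test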

Then let’s check the network namespaces on the compute node:

[root@RDO-Juno-compute ~]# ip netns
fip-e720244f-fd4f-4811-9435-4a48e13519f7
qrouter-db5090df-8385-4e42-a176-bec0ea9d6691
 
[root@RDO-Juno-compute ~]# ip netns exec \
qrouter-db5090df-8385-4e42-a176-bec0ea9d6691 ip a | grep "inet "
    inet 127.0.0.1/8 scope host lo
    inet 169.254.31.28/31 scope global rfp-db5090df-8
    inet 192.168.122.219/32 brd 192.168.122.219 scope global rfp-db5090df-8
    inet 13.0.0.1/24 brd 13.0.0.255 scope global qr-078b1a5a-d0
 
[root@RDO-Juno-compute ~]# ip netns exec \
qrouter-db5090df-8385-4e42-a176-bec0ea9d6691 iptables-save -t nat | grep "^-A"|grep l3-agent
-A PREROUTING -j neutron-l3-agent-PREROUTING
-A OUTPUT -j neutron-l3-agent-OUTPUT
-A POSTROUTING -j neutron-l3-agent-POSTROUTING
-A neutron-l3-agent-OUTPUT -d 192.168.122.219/32 -j DNAT --to-destination 13.0.0.6
-A neutron-l3-agent-POSTROUTING ! -i rfp-db5090df-8 ! -o rfp-db5090df-8 -m conntrack ! --ctstate DNAT -j ACCEPT
-A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 9697
-A neutron-l3-agent-PREROUTING -d 192.168.122.219/32 -j DNAT --to-destination 13.0.0.6
-A neutron-l3-agent-float-snat -s 13.0.0.6/32 -j SNAT --to-source 192.168.122.219
-A neutron-l3-agent-snat -j neutron-l3-agent-float-snat
-A neutron-postrouting-bottom -j neutron-l3-agent-snat
 
[root@RDO-Juno-compute ~]# ip netns exec \
fip-e720244f-fd4f-4811-9435-4a48e13519f7  ip a | grep "inet "
    inet 127.0.0.1/8 scope host lo
    inet 169.254.31.29/31 scope global fpr-db5090df-8
    inet 192.168.122.220/24 brd 192.168.122.255 scope global fg-c56eb4c0-b0
 
[root@RDO-Juno-compute ~]# ip netns exec \
qrouter-db5090df-8385-4e42-a176-bec0ea9d6691 ip rule ls
0:	from all lookup local 
32766:	from all lookup main 
32767:	from all lookup default 
32768:	from 13.0.0.6 lookup 16 
218103809:	from 13.0.0.1/24 lookup 218103809 
 
[root@RDO-Juno-compute ~]# ip netns exec \
qrouter-db5090df-8385-4e42-a176-bec0ea9d6691 ip route show table 16
default via 169.254.31.29 dev rfp-db5090df-8
 
[root@RDO-Juno-compute ~]# ip netns exec\
 fip-e720244f-fd4f-4811-9435-4a48e13519f7 ip route
default via 192.168.122.1 dev fg-c56eb4c0-b0 
169.254.31.28/31 dev fpr-db5090df-8  proto kernel  scope link  src 169.254.31.29 
192.168.122.0/24 dev fg-c56eb4c0-b0  proto kernel  scope link  src 192.168.122.220 
192.168.122.219 via 169.254.31.28 dev fpr-db5090df-8

We can see the floating IP 192.168.122.219 is configured in the DVR namespace, together with NAT iptables entries in the same namespace. There is also one more network namespace, “fip-e720244f-fd4f-4811-9435-4a48e13519f7”, which actually routes the floating IP traffic to the external network. It acts as a “FloatingIP router”; the DVR and the FloatingIP router are connected by a point-to-point network, 169.254.31.28/31.

In the DVR namespace, a new routing policy directs traffic from this VM (13.0.0.6) to routing table 16, and in table 16 the default gateway points to the FloatingIP router. In the FloatingIP router namespace, the traffic is then routed out towards 192.168.122.1, the physical gateway IP of the external network. Notice that the FloatingIP router consumes one external IP, 192.168.122.220, purely for routing purposes.
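To convince ourselves the floating IP traffic really leaves through the local fip namespace instead of the controller, we can capture on the fg- interface while pinging the floating IP from an external host (namespace and interface names are from the output above):

#On the compute node hosting the VM
ip netns exec fip-e720244f-fd4f-4811-9435-4a48e13519f7 \
tcpdump -n -i fg-c56eb4c0-b0 icmp
#Then, from an external host: ping 192.168.122.219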

  • Intra-tenant east-west traffic

With DVR, east-west traffic between VMs is handled directly by the local DVRs; it no longer needs to travel through the controller node.

We already have one tenant network connected to the router; let’s create another network in the same tenant and connect it to the same router:

neutron net-create internal-2
neutron subnet-create --name internal-2 internal-2 11.0.0.0/24
neutron router-interface-add router internal-2

Launch a VM on this new tenant network from the dashboard, then check the running VMs:

[root@RDO-Juno-controller ~(keystone_admin)]# nova list \
--fields name,status,Networks,OS-EXT-SRV-ATTR:host
+--------------------------------------+--------+--------+------------------------------------+-----------------------+
| ID                                   | Name   | Status | Networks                           | OS-EXT-SRV-ATTR: Host |
+--------------------------------------+--------+--------+------------------------------------+-----------------------+
| 635a11aa-94e3-4b7e-8d6d-3d6ce22d7f82 | test   | ACTIVE | internal=13.0.0.6, 192.168.122.219 | RDO-Juno-compute      |
| acc6c793-77fc-4a74-91f6-2deaa57276a1 | test-2 | ACTIVE | internal-2=11.0.0.3                | RDO-Juno-compute-2    |
+--------------------------------------+--------+--------+------------------------------------+-----------------------+

We can see 2 VMs in 2 tenant networks are running on 2 different compute nodes.
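Before digging into the namespaces, a quick way to see the east-west path in action is to ping VM1 from VM2 while capturing the tunnel traffic between the two compute nodes (this sketch assumes VXLAN with the default UDP port 4789 and a physical interface named eth0; adjust to your setup):

#Inside VM2 (test-2): ping VM1 across the two subnets
ping 13.0.0.6
#On either compute node: the ICMP packets travel node-to-node inside VXLAN,
#without passing through the controller
tcpdump -n -i eth0 udp port 4789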

Let’s check the network namespaces on both compute nodes

[root@RDO-Juno-compute neutron]# ip netns exec \
qrouter-db5090df-8385-4e42-a176-bec0ea9d6691 ip a | grep "inet "
    inet 127.0.0.1/8 scope host lo
    inet 169.254.31.28/31 scope global rfp-db5090df-8
    inet 192.168.122.219/32 brd 192.168.122.219 scope global rfp-db5090df-8
    inet 13.0.0.1/24 brd 13.0.0.255 scope global qr-078b1a5a-d0
    inet 11.0.0.1/24 brd 11.0.0.255 scope global qr-a4a68c83-2d
 
 
[root@RDO-Juno-compute-2 ~]# ip netns exec\
 qrouter-db5090df-8385-4e42-a176-bec0ea9d6691 ip a | grep "inet "
    inet 127.0.0.1/8 scope host lo
    inet 13.0.0.1/24 brd 13.0.0.255 scope global qr-078b1a5a-d0
    inet 11.0.0.1/24 brd 11.0.0.255 scope global qr-a4a68c83-2d

We can see that both compute nodes have the same namespace for the router, and in each namespace the gateway IPs of the internal and internal-2 tenant networks are present locally.

The traffic flow from VM2 in blue tenant network to VM1 in green tenant network:

[Figure: Juno-Neutron-DVR-intra-tenant-traffic]

Packet 1: Since routing is needed, VM2 uses the MAC of its blue network gateway as the destination MAC and its own MAC as the source MAC.
Packet 2: When the packet reaches the local router, the router forwards it into the tunnel, changing the source MAC to its green network gateway MAC and the destination to VM1’s MAC (VM1’s MAC is in the ARP table of the DVR router).
Packet 3: The packet travelling in the tunnel still uses VM1’s MAC as the destination, but the source MAC is changed to the DVR MAC of compute node 2.
Packet 4: Finally the packet reaches VM1, with VM1’s MAC as the destination and, as expected, the green network gateway MAC as the source.

A per-node DVR MAC is used to tell apart DVR instances on different nodes, because every DVR instance has the same tenant network gateway IP and gateway MAC.
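Since each host's DVR MAC is allocated from the dvr_base_mac prefix we configured earlier (fa:16:3f), it is easy to spot the related flows on the tunnel bridge (a rough check; the flow output depends on your topology):

#On a compute node: flows that match or rewrite the per-host DVR MACs
ovs-ofctl dump-flows br-tun | grep fa:16:3f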

L3 Agent HA

Creating a distributed HA router is currently not supported, so we have to disable DVR to make the HA router work.

See Neutron bug: https://bugs.launchpad.net/neutron/+bug/1365473

L3 agent HA needs at least 2 L3 agent nodes, so here we make the 1st compute node act as an L3 agent node as well.

The HA router traffic flow:

[Figure: Juno-Neutron-L3-Agent-HA-TrafficFlow]

Install keepalived on the 2 L3 agent nodes (RDO packstack does not install it; bug filed: https://bugzilla.redhat.com/show_bug.cgi?id=1166653):

yum install keepalived

Configure neutron.conf on the controller node:

[DEFAULT]
router_distributed = False
l3_ha = True    #This parameter controls the default when creating a new router;
                # admin can override with the "--ha False" option of neutron router-create
max_l3_agents_per_router = 2	#This means the HA router will run on 2 L3 agents

Disable DVR support in ovs_neutron_plugin.ini on all nodes:

[agent]
enable_distributed_routing = False

l3_agent.ini on the 2 L3 agent nodes (controller and 1st compute node):

agent_mode = legacy

Stop and disable the L3 agent on the 2nd compute node:

systemctl stop neutron-l3-agent.service 
systemctl disable neutron-l3-agent.service

Restart OpenStack services on every node:

openstack-service restart

Now we can create an HA router:

neutron router-create --ha True harouter
Created a new router:
+-----------------------+--------------------------------------+
| Field                 | Value                                |
+-----------------------+--------------------------------------+
| admin_state_up        | True                                 |
| distributed           | False                                |
| external_gateway_info |                                      |
| ha                    | True                                 |
| id                    | 7422a91c-207e-42e7-b100-9164693b4c99 |
| name                  | harouter                             |
| routes                |                                      |
| status                | ACTIVE                               |
| tenant_id             | 243dcb3c3af948fda81960307af2b36e     |
+-----------------------+--------------------------------------+
#By default HA is enabled; if you want to create a non-HA router,
# use neutron router-create --ha False <router name> (admin only)

We can see a router is created with “distributed” set to False and “ha” set to True: an HA router without DVR.
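We can also check which L3 agents are hosting the router; with max_l3_agents_per_router = 2 it should be scheduled to both L3 agent nodes:

neutron l3-agent-list-hosting-router harouter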

Let’s check the Neutron networks and ports of “harouter”, and the router namespace on the 2 L3 agent nodes.

[root@RDO-Juno-controller ~(keystone_admin)]# neutron net-list
+--------------------------------------+----------------------------------------------------+-------------------------------------------------------+
| id                                   | name                                               | subnets                                               |
+--------------------------------------+----------------------------------------------------+-------------------------------------------------------+
| da51edbd-da2e-4763-b343-cbc8af227901 | internal-2                                         | 878e9fdc-1171-4a90-a12c-253bff7a51bb 11.0.0.0/24      |
| e720244f-fd4f-4811-9435-4a48e13519f7 | public                                             | 3927a1d4-5717-4bb5-808c-b848e2eb0e7e 192.168.122.0/24 |
| 3efe26de-741f-4d62-8fc0-0f8b1cce07db | internal                                           | d657942d-1f92-4067-bdb4-c8ea4a5ea86e 13.0.0.0/24      |
| 44e987b7-7251-4414-9c12-97cb237f3234 | HA network tenant 243dcb3c3af948fda81960307af2b36e | 1bdbc82e-e1f8-4c2d-bb05-2e779d67f2aa 169.254.192.0/18 |
+--------------------------------------+----------------------------------------------------+-------------------------------------------------------+
 
[root@RDO-Juno-controller ~(keystone_admin)]# neutron router-port-list harouter
+--------------------------------------+-------------------------------------------------+-------------------+--------------------------------------------------------------------------------------+
| id                                   | name                                            | mac_address       | fixed_ips                                                                            |
+--------------------------------------+-------------------------------------------------+-------------------+--------------------------------------------------------------------------------------+
| b1c5fa46-b8d8-42a4-834b-147bf96082c2 | HA port tenant 243dcb3c3af948fda81960307af2b36e | fa:16:3e:4e:8d:be | {"subnet_id": "1bdbc82e-e1f8-4c2d-bb05-2e779d67f2aa", "ip_address": "169.254.192.3"} |
| e8ad61d3-b16e-49c7-a4c1-6bf027f156e7 | HA port tenant 243dcb3c3af948fda81960307af2b36e | fa:16:3e:b8:f8:6a | {"subnet_id": "1bdbc82e-e1f8-4c2d-bb05-2e779d67f2aa", "ip_address": "169.254.192.4"} |
+--------------------------------------+-------------------------------------------------+-------------------+--------------------------------------------------------------------------------------+
 
[root@RDO-Juno-controller ~(keystone_admin)]# ip netns exec qrouter-7422a91c-207e-42e7-b100-9164693b4c99\
 ip a |grep "inet "
    inet 127.0.0.1/8 scope host lo
    inet 169.254.192.3/18 brd 169.254.255.255 scope global ha-b1c5fa46-b8
 
[root@RDO-Juno-compute ~]# ip netns exec qrouter-7422a91c-207e-42e7-b100-9164693b4c99 ip a |grep "inet "
    inet 127.0.0.1/8 scope host lo
    inet 169.254.192.4/18 brd 169.254.255.255 scope global ha-e8ad61d3-b1

We can see one HA network, 169.254.192.0/18, has been created, and each L3 agent (on the controller and the 1st compute node) has one port in this HA network. This network is dedicated to VRRP/keepalived communication.
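If you want to watch the keepalived heartbeats, capture on the ha- interface inside the router namespace; VRRP advertisements (IP protocol 112, sent to 224.0.0.18) should show up every couple of seconds (interface name taken from the output above):

#On the controller node
ip netns exec qrouter-7422a91c-207e-42e7-b100-9164693b4c99 \
tcpdump -n -i ha-b1c5fa46-b8 'ip proto 112'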

Now we connect the tenant network “internal” to this “harouter”:

neutron router-interface-add harouter internal

Let’s check the router namespaces again on the 2 L3 agent nodes:

[root@RDO-Juno-controller ~(keystone_admin)]# ip netns exec \
qrouter-7422a91c-207e-42e7-b100-9164693b4c99 ip a | grep "inet "
    inet 127.0.0.1/8 scope host lo
    inet 169.254.192.3/18 brd 169.254.255.255 scope global ha-b1c5fa46-b8
 
[root@RDO-Juno-compute ~]# ip netns exec \
qrouter-7422a91c-207e-42e7-b100-9164693b4c99 ip a | grep "inet "
    inet 127.0.0.1/8 scope host lo
    inet 169.254.192.4/18 brd 169.254.255.255 scope global ha-e8ad61d3-b1
    inet 13.0.0.1/24 scope global qr-b2d3590b-61

We can see the router gateway VIP (13.0.0.1) is up on the 2nd L3 agent node, so that node is currently the active one.

We can also check that the keepalived processes are running on both nodes, and take a look at the configuration file:

[root@RDO-Juno-compute ~]# ps aux |grep keepalived
root     17454  0.0  0.0 112084  1328 ?        Ss   16:36   0:00 keepalived -P -f /var/lib/neutron/ha_confs/7422a91c-207e-42e7-b100-9164693b4c99/keepalived.conf -p /var/lib/neutron/ha_confs/7422a91c-207e-42e7-b100-9164693b4c99.pid -r /var/lib/neutron/ha_confs/7422a91c-207e-42e7-b100-9164693b4c99.pid-vrrp
root     17455  0.0  0.0 118316  2344 ?        S    16:36   0:00 keepalived -P -f /var/lib/neutron/ha_confs/7422a91c-207e-42e7-b100-9164693b4c99/keepalived.conf -p /var/lib/neutron/ha_confs/7422a91c-207e-42e7-b100-9164693b4c99.pid -r /var/lib/neutron/ha_confs/7422a91c-207e-42e7-b100-9164693b4c99.pid-vrrp
root     18601  0.0  0.0 112640   940 pts/0    R+   16:44   0:00 grep --color=auto keepalived

[root@RDO-Juno-compute ~]# cat \
/var/lib/neutron/ha_confs/7422a91c-207e-42e7-b100-9164693b4c99/keepalived.conf
vrrp_sync_group VG_1 {
    group {
        VR_1
    }
    notify_master "/var/lib/neutron/ha_confs/7422a91c-207e-42e7-b100-9164693b4c99/notify_master.sh"
    notify_backup "/var/lib/neutron/ha_confs/7422a91c-207e-42e7-b100-9164693b4c99/notify_backup.sh"
    notify_fault "/var/lib/neutron/ha_confs/7422a91c-207e-42e7-b100-9164693b4c99/notify_fault.sh"
}
vrrp_instance VR_1 {
    state BACKUP
    interface ha-e8ad61d3-b1
    virtual_router_id 1
    priority 50
    nopreempt
    advert_int 2
    track_interface {
        ha-e8ad61d3-b1
    }
    virtual_ipaddress {
        13.0.0.1/24 dev qr-b2d3590b-61
    }
}

Then launch a VM on the “internal” network from the dashboard and try to ping the gateway VIP from the VM:

[root@host-13-0-0-8 ~]# ping 13.0.0.1
PING 13.0.0.1 (13.0.0.1) 56(84) bytes of data.
64 bytes from 13.0.0.1: icmp_seq=1 ttl=64 time=1.53 ms
64 bytes from 13.0.0.1: icmp_seq=2 ttl=64 time=0.459 ms...
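As a rough failover test, you can break the current master and watch the VIP move to the other L3 agent node while the ping keeps running, with only a short interruption. One crude way (PIDs taken from the ps output above, so environment-specific) is to kill keepalived on the master:

#On the current master (the compute node in this setup)
kill 17454 17455
#On the controller node, the gateway VIP should appear shortly after
ip netns exec qrouter-7422a91c-207e-42e7-b100-9164693b4c99 ip a | grep 13.0.0.1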
