Tuesday, November 17, 2015

Attempt to set up HAProxy/Keepalived 3 Node Controller on RDO Liberty per Javier Pena

URGENT UPDATE 11/18/2015
 Please view https://github.com/beekhof/osp-ha-deploy/commit/b2e01e86ca93cfad9ad01d533b386b4c9607c60d
 It looks like work in progress.
 See also https://www.redhat.com/archives/rdo-list/2015-November/msg00168.html
END UPDATE

  Actually, the setup below closely follows https://github.com/beekhof/osp-ha-deploy/blob/master/HA-keepalived.md
To the best of my knowledge, Cisco's schema has been implemented:
Keepalived, HAProxy, and a manual Galera install for MySQL, on at least 3 controller nodes. I have just highlighted several steps which, I believe, allowed me to bring this work to success. Javier uses a flat external network provider for the controller cluster, disabling NetworkManager and enabling the network service from the very start; there is one step, however, which I was unable to skip. It is disabling the IPs on the eth0 interfaces and restarting the network service right before running `ovs-vsctl add-port br-eth0 eth0`, per the Neutron build instructions of the mentioned "Howto", which seems to be one of the best I have ever seen.
  I can only guess that, due to this sequence of steps, the external network is still pingable even on a three-node controller cluster that has already been built and appears to run OK :-

 
  However, had I disabled eth0's IPs from the start, I would have lost connectivity right away when switching from NetworkManager to the network service. In general, the external network is supposed to be pingable from the qrouter namespace, thanks to the Neutron router's DNAT/SNAT iptables forwarding, but not from the controller itself. I am also aware that when an Ethernet interface becomes an OVS port of an OVS bridge, its IP is supposed to be suppressed. When an external network provider is not used, br-ex gets an available IP on the external network; using an external network provider changes the situation. Details may be seen here :-
https://www.linux.com/community/blogs/133-general-linux/858156-multiple-external-networks-with-a-single-l3-agent-testing-on-rdo-liberty-per-lars-kellogg-stedman
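
A quick way to see the difference from a controller is to compare pinging the external gateway from the root namespace (as below) with pinging it from inside a qrouter namespace once a router exists. A rough sketch; the router UUID is a placeholder for whatever `ip netns list` reports:

# ip netns list | grep qrouter
# ip netns exec qrouter-<router-uuid> ping -c 3 10.10.10.1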

[root@hacontroller1 ~(keystone_admin)]# systemctl status NetworkManager
NetworkManager.service - Network Manager
   Loaded: loaded (/usr/lib/systemd/system/NetworkManager.service; disabled)
   Active: inactive (dead)

[root@hacontroller1 ~(keystone_admin)]# systemctl status network
network.service - LSB: Bring up/down networking
   Loaded: loaded (/etc/rc.d/init.d/network)
   Active: active (exited) since Wed 2015-11-18 08:36:53 MSK; 2h 10min ago
  Process: 708 ExecStart=/etc/rc.d/init.d/network start (code=exited, status=0/SUCCESS)


Nov 18 08:36:47 hacontroller1.example.com network[708]: Bringing up loopback interface:  [  OK  ]
Nov 18 08:36:51 hacontroller1.example.com network[708]: Bringing up interface eth0:  [  OK  ]
Nov 18 08:36:53 hacontroller1.example.com network[708]: Bringing up interface eth1:  [  OK  ]
Nov 18 08:36:53 hacontroller1.example.com systemd[1]: Started LSB: Bring up/down networking.

[root@hacontroller1 ~(keystone_admin)]# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::5054:ff:fe6d:926a  prefixlen 64  scopeid 0x20<link>
        ether 52:54:00:6d:92:6a  txqueuelen 1000  (Ethernet)
        RX packets 5036  bytes 730778 (713.6 KiB)
        RX errors 0  dropped 12  overruns 0  frame 0
        TX packets 15715  bytes 930045 (908.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.169.142.221  netmask 255.255.255.0  broadcast 192.169.142.255
        inet6 fe80::5054:ff:fe5e:9644  prefixlen 64  scopeid 0x20<link>
        ether 52:54:00:5e:96:44  txqueuelen 1000  (Ethernet)
        RX packets 1828396  bytes 283908183 (270.7 MiB)
        RX errors 0  dropped 13  overruns 0  frame 0
        TX packets 1839312  bytes 282429736 (269.3 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 0  (Local Loopback)
        RX packets 869067  bytes 69567890 (66.3 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 869067  bytes 69567890 (66.3 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@hacontroller1 ~(keystone_admin)]# ping -c 3  10.10.10.1
PING 10.10.10.1 (10.10.10.1) 56(84) bytes of data.
64 bytes from 10.10.10.1: icmp_seq=1 ttl=64 time=2.04 ms
64 bytes from 10.10.10.1: icmp_seq=2 ttl=64 time=0.103 ms
64 bytes from 10.10.10.1: icmp_seq=3 ttl=64 time=0.118 ms

--- 10.10.10.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 0.103/0.754/2.043/0.911 ms



 
  Both the management and external networks are emulated by corresponding Libvirt networks
on an F23 virtualization server. In total, four VMs were set up: three for the controller
nodes and one for compute (4 VCPUS, 4 GB RAM).

[root@fedora23wks ~]# cat openstackvms.xml   (for the eth1 interfaces)
<network>
   <name>openstackvms</name>
   <uuid>d0e9964a-f91a-40c0-b769-a609aee41bf2</uuid>
   <forward mode='nat'>
     <nat>
       <port start='1024' end='65535'/>
     </nat>
   </forward>
   <bridge name='virbr1' stp='on' delay='0' />
   <mac address='52:54:00:60:f8:6d'/>
   <ip address='192.169.142.1' netmask='255.255.255.0'>
     <dhcp>
       <range start='192.169.142.2' end='192.169.142.254' />
     </dhcp>
   </ip>
 </network>
[root@fedora23wks ~]# cat public.xml   (for the external network provider)
<network>
   <name>public</name>
   <uuid>d0e9965b-f92c-40c1-b749-b609aed42cf2</uuid>
   <forward mode='nat'>
     <nat>
       <port start='1024' end='65535'/>
     </nat>
   </forward>
   <bridge name='virbr2' stp='on' delay='0' />
   <mac address='52:54:00:60:f8:6d'/>
   <ip address='10.10.10.1' netmask='255.255.255.0'>
     <dhcp>
       <range start='10.10.10.2' end='10.10.10.254' />
     </dhcp>
   </ip>
 </network>
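
For reference, the two networks above are typically loaded on the virtualization host with virsh (a sketch, assuming the XML files sit in the current directory):

# virsh net-define openstackvms.xml
# virsh net-autostart openstackvms && virsh net-start openstackvms
# virsh net-define public.xml
# virsh net-autostart public && virsh net-start public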

Only one file is a bit different on the controller nodes: l3_agent.ini

[root@hacontroller1 neutron(keystone_demo)]# cat l3_agent.ini | grep -v ^# | grep -v ^$
[DEFAULT]
interface_driver = neutron.agent.linux.interface.OVSInterfaceDriver
handle_internal_only_routers = True
send_arp_for_ha = 3
metadata_ip = controller-vip.example.com
external_network_bridge =
gateway_external_network_id =
[AGENT]

When "external_network_bridge = " , Neutron places the external interface of the router into the OVS bridge specified by the "provider_network" provider attribute in the Neutron network. Traffic is processed by Open vSwitch flow rules. In this configuration it is possible to utilize flat and VLAN provider networks.

*************************************************************************************
Due to the "UPDATE" posted at the top of this blog entry, a perfect solution has in the meantime
been provided by https://github.com/beekhof/osp-ha-deploy/commit/b2e01e86ca93cfad9ad01d533b386b4c9607c60d
Per the mentioned patch, assuming eth0 is your interface attached to the external network, create two files in /etc/sysconfig/network-scripts/ as follows (change the MTU if you need to):

    cat <<EOF > /etc/sysconfig/network-scripts/ifcfg-eth0
    DEVICE=eth0
    ONBOOT=yes
    DEVICETYPE=ovs
    TYPE=OVSPort
    OVS_BRIDGE=br-eth0
    ONBOOT=yes
    BOOTPROTO=none
    VLAN=yes
    MTU="9000"
    NM_CONTROLLED=no
    EOF

    cat <<EOF > /etc/sysconfig/network-scripts/ifcfg-br-eth0
    DEVICE=br-eth0
    DEVICETYPE=ovs
    OVSBOOTPROTO=none
    TYPE=OVSBridge
    ONBOOT=yes
    BOOTPROTO=static
    MTU="9000"
    NM_CONTROLLED=no
    EOF

Restart the network for the changes to take effect.

systemctl restart network
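
A quick sanity check after the restart (eth0 should now be enslaved to br-eth0 and carry no IPv4 address of its own):

# ovs-vsctl list-ports br-eth0
# ip -4 addr show eth0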

The commit was made on 11/14/2015, right after a discussion on the RDO mailing list.

Details may be seen here: Nova and Neutron work-flow && CLI for HAProxy/Keepalived 3 Node Controller RDO Liberty

 *************************************************************************************
One more step which I did (I am not sure it really has to be done at this point in time):
the IPs on the eth0 interfaces were disabled just before running
`ovs-vsctl add-port br-eth0 eth0`, as sketched right after this list:

1. Updated the ifcfg-eth0 files on all controllers
2. `service network restart` on all controllers
3. `ovs-vsctl add-port br-eth0 eth0` on all controllers
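
In shell terms that amounted to roughly the following on each controller (a sketch; the ifcfg edit itself was done by hand):

# vi /etc/sysconfig/network-scripts/ifcfg-eth0    # drop IPADDR/PREFIX, set BOOTPROTO=none
# service network restart
# ovs-vsctl add-port br-eth0 eth0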

*****************************************************************************************
Targeting just a POC (to get floating IPs accessible from the Fedora 23 virtualization
host), the resulting controller cluster setup was :-
*****************************************************************************************
I installed only

Keystone

**************************
UPDATE to official docs
**************************
[root@hacontroller1 ~(keystone_admin)]# cat   keystonerc_admin
export OS_USERNAME=admin
export OS_TENANT_NAME=admin
export OS_PROJECT_NAME=admin
export OS_REGION_NAME=regionOne
export OS_PASSWORD=keystonetest
export OS_AUTH_URL=http://controller-vip.example.com:35357/v2.0/
export OS_SERVICE_ENDPOINT=http://controller-vip.example.com:35357/v2.0
export OS_SERVICE_TOKEN=$(cat /root/keystone_service_token)

export PS1='[\u@\h \W(keystone_admin)]\$ '

Glance
Neutron
Nova
Horizon

Due to the Galera synchronous multi-master replication running between the controllers, commands like :-

# su keystone -s /bin/sh -c "keystone-manage db_sync"
# su glance -s /bin/sh -c "glance-manage db_sync"
# neutron-db-manage --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugin.ini upgrade head
# su nova -s /bin/sh -c "nova-manage db sync"

are supposed to be run just once, from controller node 1 (for instance).
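
To confirm that the schema changes actually replicated, the created databases and the cluster size can be checked from any other controller (a sketch, assuming root access to MariaDB):

# mysql -u root -e "SHOW DATABASES;"
# mysql -u root -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';"    # expect Value = 3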


************************
Compute Node setup:-
*************************

Compute setup

**********************
On all nodes
**********************
[root@hacontroller1 neutron(keystone_demo)]# cat /etc/hosts
192.169.142.220 controller-vip.example.com controller-vip
192.169.142.221 hacontroller1.example.com hacontroller1
192.169.142.222 hacontroller2.example.com hacontroller2
192.169.142.223 hacontroller3.example.com hacontroller3
192.169.142.224 compute.example.com compute
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

[root@hacontroller1 ~(keystone_admin)]# cat /etc/neutron/neutron.conf | grep -v ^$| grep -v ^#
[DEFAULT]
bind_host = 192.169.142.22(X)
auth_strategy = keystone
notification_driver = neutron.openstack.common.notifier.rpc_notifier
notify_nova_on_port_status_changes = True
notify_nova_on_port_data_changes = True
core_plugin = neutron.plugins.ml2.plugin.Ml2Plugin
service_plugins = router,lbaas
router_scheduler_driver = neutron.scheduler.l3_agent_scheduler.ChanceScheduler
dhcp_agents_per_network = 2
api_workers = 2
rpc_workers = 2
l3_ha = True
min_l3_agents_per_router = 2
max_l3_agents_per_router = 2

[matchmaker_redis]
[matchmaker_ring]
[quotas]
[agent]
[keystone_authtoken]
auth_uri = http://controller-vip.example.com:5000/
identity_uri = http://127.0.0.1:5000
admin_tenant_name = %SERVICE_TENANT_NAME%
admin_user = %SERVICE_USER%
admin_password = %SERVICE_PASSWORD%
auth_plugin = password
auth_url = http://controller-vip.example.com:35357/
username = neutron
password = neutrontest
project_name = services
[database]
connection = mysql://neutron:neutrontest@controller-vip.example.com:3306/neutron
max_retries = -1
[nova]
nova_region_name = regionOne
project_domain_id = default
project_name = services
user_domain_id = default
password = novatest
username = compute
auth_url = http://controller-vip.example.com:35357/
auth_plugin = password
[oslo_concurrency]
[oslo_policy]
[oslo_messaging_amqp]
[oslo_messaging_qpid]
[oslo_messaging_rabbit]
rabbit_hosts = hacontroller1,hacontroller2,hacontroller3
rabbit_ha_queues = true
[qos]


[root@hacontroller1 haproxy(keystone_demo)]# cat haproxy.cfg
global
    daemon
    stats socket /var/lib/haproxy/stats
defaults
    mode tcp
    maxconn 10000
    timeout connect 5s
    timeout client 30s
    timeout server 30s

listen monitor
    bind 192.169.142.220:9300
    mode http
    monitor-uri /status
    stats enable
    stats uri /admin
    stats realm Haproxy\ Statistics
    stats auth root:redhat
    stats refresh 5s

frontend vip-db
    bind 192.169.142.220:3306
    timeout client 90m
    default_backend db-vms-galera
backend db-vms-galera
    option httpchk
    stick-table type ip size 1000
    stick on dst
    timeout server 90m
    server rhos8-node1 192.169.142.221:3306 check inter 1s port 9200 backup on-marked-down shutdown-sessions
    server rhos8-node2 192.169.142.222:3306 check inter 1s port 9200 backup on-marked-down shutdown-sessions
    server rhos8-node3 192.169.142.223:3306 check inter 1s port 9200 backup on-marked-down shutdown-sessions

# Note the RabbitMQ entry is only needed for CloudForms compatibility
# and should be removed in the future
frontend vip-rabbitmq
    option clitcpka
    bind 192.169.142.220:5672
    timeout client 900m
    default_backend rabbitmq-vms
backend rabbitmq-vms
    option srvtcpka
    balance roundrobin
    timeout server 900m
    server rhos8-node1 192.169.142.221:5672 check inter 1s
    server rhos8-node2 192.169.142.222:5672 check inter 1s
    server rhos8-node3 192.169.142.223:5672 check inter 1s

frontend vip-keystone-admin
    bind 192.169.142.220:35357
    default_backend keystone-admin-vms
    timeout client 600s
backend keystone-admin-vms
    balance roundrobin
    timeout server 600s
    server rhos8-node1 192.169.142.221:35357 check inter 1s on-marked-down shutdown-sessions
    server rhos8-node2 192.169.142.222:35357 check inter 1s on-marked-down shutdown-sessions
    server rhos8-node3 192.169.142.223:35357 check inter 1s on-marked-down shutdown-sessions

frontend vip-keystone-public
    bind 192.169.142.220:5000
    default_backend keystone-public-vms
    timeout client 600s
backend keystone-public-vms
    balance roundrobin
    timeout server 600s
    server rhos8-node1 192.169.142.221:5000 check inter 1s on-marked-down shutdown-sessions
    server rhos8-node2 192.169.142.222:5000 check inter 1s on-marked-down shutdown-sessions
    server rhos8-node3 192.169.142.223:5000 check inter 1s on-marked-down shutdown-sessions

frontend vip-glance-api
    bind 192.169.142.220:9191
    default_backend glance-api-vms
backend glance-api-vms
    balance roundrobin
    server rhos8-node1 192.169.142.221:9191 check inter 1s
    server rhos8-node2 192.169.142.222:9191 check inter 1s
    server rhos8-node3 192.169.142.223:9191 check inter 1s

frontend vip-glance-registry
    bind 192.169.142.220:9292
    default_backend glance-registry-vms
backend glance-registry-vms
    balance roundrobin
    server rhos8-node1 192.169.142.221:9292 check inter 1s
    server rhos8-node2 192.169.142.222:9292 check inter 1s
    server rhos8-node3 192.169.142.223:9292 check inter 1s

frontend vip-cinder
    bind 192.169.142.220:8776
    default_backend cinder-vms
backend cinder-vms
    balance roundrobin
    server rhos8-node1 192.169.142.221:8776 check inter 1s
    server rhos8-node2 192.169.142.222:8776 check inter 1s
    server rhos8-node3 192.169.142.223:8776 check inter 1s

frontend vip-swift
    bind 192.169.142.220:8080
    default_backend swift-vms
backend swift-vms
    balance roundrobin
    server rhos8-node1 192.169.142.221:8080 check inter 1s
    server rhos8-node2 192.169.142.222:8080 check inter 1s
    server rhos8-node3 192.169.142.223:8080 check inter 1s

frontend vip-neutron
    bind 192.169.142.220:9696
    default_backend neutron-vms
backend neutron-vms
    balance roundrobin
    server rhos8-node1 192.169.142.221:9696 check inter 1s
    server rhos8-node2 192.169.142.222:9696 check inter 1s
    server rhos8-node3 192.169.142.223:9696 check inter 1s

frontend vip-nova-vnc-novncproxy
    bind 192.169.142.220:6080
    default_backend nova-vnc-novncproxy-vms
backend nova-vnc-novncproxy-vms
    balance roundrobin
    timeout tunnel 1h
    server rhos8-node1 192.169.142.221:6080 check inter 1s
    server rhos8-node2 192.169.142.222:6080 check inter 1s
    server rhos8-node3 192.169.142.223:6080 check inter 1s

frontend nova-metadata-vms
    bind 192.169.142.220:8775
    default_backend nova-metadata-vms
backend nova-metadata-vms
    balance roundrobin
    server rhos8-node1 192.169.142.221:8775 check inter 1s
    server rhos8-node2 192.169.142.222:8775 check inter 1s
    server rhos8-node3 192.169.142.223:8775 check inter 1s

frontend vip-nova-api
    bind 192.169.142.220:8774
    default_backend nova-api-vms
backend nova-api-vms
    balance roundrobin
    server rhos8-node1 192.169.142.221:8774 check inter 1s
    server rhos8-node2 192.169.142.222:8774 check inter 1s
    server rhos8-node3 192.169.142.223:8774 check inter 1s

frontend vip-horizon
    bind 192.169.142.220:80
    timeout client 180s
    default_backend horizon-vms
backend horizon-vms
    balance roundrobin
    timeout server 180s
    mode http
    cookie SERVERID insert indirect nocache
    server rhos8-node1 192.169.142.221:80 check inter 1s cookie rhos8-horizon1 on-marked-down shutdown-sessions
    server rhos8-node2 192.169.142.222:80 check inter 1s cookie rhos8-horizon2 on-marked-down shutdown-sessions
    server rhos8-node3 192.169.142.223:80 check inter 1s cookie rhos8-horizon3 on-marked-down shutdown-sessions

frontend vip-heat-cfn
    bind 192.169.142.220:8000
    default_backend heat-cfn-vms
backend heat-cfn-vms
    balance roundrobin
    server rhos8-node1 192.169.142.221:8000 check inter 1s
    server rhos8-node2 192.169.142.222:8000 check inter 1s
    server rhos8-node3 192.169.142.223:8000 check inter 1s

frontend vip-heat-cloudw
    bind 192.169.142.220:8003
    default_backend heat-cloudw-vms
backend heat-cloudw-vms
    balance roundrobin
    server rhos8-node1 192.169.142.221:8003 check inter 1s
    server rhos8-node2 192.169.142.222:8003 check inter 1s
    server rhos8-node3 192.169.142.223:8003 check inter 1s

frontend vip-heat-srv
    bind 192.169.142.220:8004
    default_backend heat-srv-vms
backend heat-srv-vms
    balance roundrobin
    server rhos8-node1 192.169.142.221:8004 check inter 1s
    server rhos8-node2 192.169.142.222:8004 check inter 1s
    server rhos8-node3 192.169.142.223:8004 check inter 1s

frontend vip-ceilometer
    bind 192.169.142.220:8777
    timeout client 90s
    default_backend ceilometer-vms
backend ceilometer-vms
    balance roundrobin
    timeout server 90s
    server rhos8-node1 192.169.142.221:8777 check inter 1s
    server rhos8-node2 192.169.142.222:8777 check inter 1s
    server rhos8-node3 192.169.142.223:8777 check inter 1s

frontend vip-sahara
    bind 192.169.142.220:8386
    default_backend sahara-vms
backend sahara-vms
    balance roundrobin
    server rhos8-node1 192.169.142.221:8386 check inter 1s
    server rhos8-node2 192.169.142.222:8386 check inter 1s
    server rhos8-node3 192.169.142.223:8386 check inter 1s

frontend vip-trove
    bind 192.169.142.220:8779
    default_backend trove-vms
backend trove-vms
    balance roundrobin
    server rhos8-node1 192.169.142.221:8779 check inter 1s
    server rhos8-node2 192.169.142.222:8779 check inter 1s
    server rhos8-node3 192.169.142.223:8779 check inter 1s
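
With keepalived holding the VIP, the monitor frontend and stats page defined above give a quick health view of all backends (credentials exactly as in the config):

# curl http://192.169.142.220:9300/status
# curl -u root:redhat http://192.169.142.220:9300/admin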

[root@hacontroller1 ~(keystone_demo)]# cat /etc/my.cnf.d/galera.cnf
[mysqld]
skip-name-resolve=1
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
innodb_locks_unsafe_for_binlog=1
max_connections=8192
query_cache_size=0
query_cache_type=0
bind_address=192.169.142.22(X)
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_name="galera_cluster"
wsrep_cluster_address="gcomm://192.169.142.221,192.169.142.222,192.169.142.223"
wsrep_slave_threads=1
wsrep_certify_nonPK=1
wsrep_max_ws_rows=131072
wsrep_max_ws_size=1073741824
wsrep_debug=0
wsrep_convert_LOCK_to_trx=0
wsrep_retry_autocommit=1
wsrep_auto_increment_control=1
wsrep_drupal_282555_workaround=0
wsrep_causal_reads=0
wsrep_notify_cmd=
wsrep_sst_method=rsync

[root@hacontroller1 ~(keystone_demo)]# cat /etc/keepalived/keepalived.conf
vrrp_script chk_haproxy {
    script "/usr/bin/killall -0 haproxy"
    interval 2
}

vrrp_instance VI_PUBLIC {
    interface eth1
    state BACKUP
    virtual_router_id 52
    priority 101
    virtual_ipaddress {
        192.169.142.220 dev eth1
    }
    track_script {
        chk_haproxy
    }
    # Avoid failback
    nopreempt
}

vrrp_sync_group VG1 {
    group {
        VI_PUBLIC
    }
}
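
Which node currently owns the VIP is easy to check, since 192.169.142.220 only appears on the active keepalived member:

# ip addr show eth1 | grep 192.169.142.220
# systemctl status keepalived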

*************************************************************************
The most difficult procedure is re-syncing the Galera MariaDB cluster
*************************************************************************
https://github.com/beekhof/osp-ha-deploy/blob/master/keepalived/galera-bootstrap.md

Because the Nova services start without waiting for the Galera databases to get in sync, once the sync is done, and regardless of systemctl reporting the services as up and running, `openstack-service restart nova` is required on every controller. Also, the most likely reason for VMs failing to reach the Nova metadata server at boot is a failure of the neutron-l3-agent service on a controller: by classical design, VMs access metadata via neutron-ns-metadata-proxy running in the qrouter namespace. The neutron-l3-agents usually start with no problems and sometimes just need to be restarted.
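
In practice that boils down to something like the following on every controller, once Galera reports itself synced (a sketch; `openstack-service` comes from the openstack-utils package):

# mysql -u root -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';"    # expect Synced
# openstack-service restart nova
# systemctl status neutron-l3-agent
# systemctl restart neutron-l3-agent    # only if it failed to start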



*****************************************
Creating Neutron Router via CLI.
*****************************************

[root@hacontroller1 ~(keystone_admin)]# cat  keystonerc_admin
export OS_USERNAME=admin
export OS_TENANT_NAME=admin
export OS_PROJECT_NAME=admin
export OS_REGION_NAME=regionOne
export OS_PASSWORD=keystonetest
export OS_AUTH_URL=http://controller-vip.example.com:35357/v2.0/
export OS_SERVICE_ENDPOINT=http://controller-vip.example.com:35357/v2.0
export OS_SERVICE_TOKEN=$(cat /root/keystone_service_token)

export PS1='[\u@\h \W(keystone_admin)]\$ '


[root@hacontroller1 ~(keystone_admin)]# keystone tenant-list

+----------------------------------+----------+---------+
|                id                |   name   | enabled |
+----------------------------------+----------+---------+
| acdc927b53bd43ae9a7ed657d1309884 |  admin   |   True  |
| 7db0aa013d60434996585c4ee359f512 |   demo   |   True  |
| 9d8bf126d54e4d11a109bd009f54a87f | services |   True  |
+----------------------------------+----------+---------+

[root@hacontroller1 ~(keystone_admin)]# neutron router-create --ha True --tenant-id 7db0aa013d60434996585c4ee359f512  RouterDS
Created a new router:
+-----------------------+--------------------------------------+
| Field                 | Value                                |
+-----------------------+--------------------------------------+
| admin_state_up        | True                                 |
| distributed           | False                                |
| external_gateway_info |                                      |
| ha                    | True                                 |
| id                    | fdf540d2-c128-4677-b403-d71c796d7e18 |
| name                  | RouterDS                             |
| routes                |                                      |
| status                | ACTIVE                               |
| tenant_id             | 7db0aa013d60434996585c4ee359f512     |
+-----------------------+--------------------------------------+
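
To make the router usable it still needs a gateway on the external network and an interface on the tenant subnet; roughly as follows (the network and subnet names here are illustrative, not taken from the output above):

# neutron router-gateway-set RouterDS public
# neutron router-interface-add RouterDS demo_subnet
# neutron router-port-list RouterDS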


    

    Run-time snapshots: Keepalived status on the controller nodes

   

   HA Neutron router belonging to tenant demo, created via the Neutron CLI
 

***********************************************************************
 At this point hacontroller1 goes down. On hacontroller2 run :-
***********************************************************************
[root@hacontroller2 ~(keystone_admin)]# neutron l3-agent-list-hosting-router RouterHA
+--------------------------------------+---------------------------+----------------+-------+----------+
| id                                   | host                      | admin_state_up | alive | ha_state |
+--------------------------------------+---------------------------+----------------+-------+----------+
| a03409d2-fbe9-492c-a954-e1bdf7627491 | hacontroller2.example.com | True           | :-)   | active   |
| 0d6e658a-e796-4cff-962f-06e455fce02f | hacontroller1.example.com | True           | xxx   | active   |
+--------------------------------------+---------------------------+----------------+-------+----------+

  
***********************************************************************
 At this point hacontroller2 goes down. hacontroller1 goes up :-
***********************************************************************


          Nova Services status on all Controllers
  



     Neutron Services status on all Controllers  


   Compute Node status
  

  

 ******************************************************************************
 Cloud VM (L3) at runtime: accessibility from the F23 virtualization host,
 which runs the HA 3-node controller and the compute node VMs (L2)
 ******************************************************************************
[root@fedora23wks ~]# ping  10.10.10.103
PING 10.10.10.103 (10.10.10.103) 56(84) bytes of data.
64 bytes from 10.10.10.103: icmp_seq=1 ttl=63 time=1.14 ms
64 bytes from 10.10.10.103: icmp_seq=2 ttl=63 time=0.813 ms
64 bytes from 10.10.10.103: icmp_seq=3 ttl=63 time=0.636 ms
64 bytes from 10.10.10.103: icmp_seq=4 ttl=63 time=0.778 ms
64 bytes from 10.10.10.103: icmp_seq=5 ttl=63 time=0.493 ms
^C
--- 10.10.10.103 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4001ms
rtt min/avg/max/mdev = 0.493/0.773/1.146/0.218 ms

[root@fedora23wks ~]# ssh -i oskey1.priv fedora@10.10.10.103
Last login: Tue Nov 17 09:02:30 2015
[fedora@vf23dev ~]$ uname -a
Linux vf23dev.novalocal 4.2.5-300.fc23.x86_64 #1 SMP Tue Oct 27 04:29:56 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

  
   
  

 ********************************************************************************
 Verifying the Neutron workflow on the 3-node controller cluster built via the patch :-
 ********************************************************************************
[root@hacontroller1 ~(keystone_admin)]# ovs-ofctl show br-eth0
OFPT_FEATURES_REPLY (xid=0x2): dpid:0000baf0db1a854f
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
 1(eth0): addr:52:54:00:aa:0e:fc
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 2(phy-br-eth0): addr:46:c0:e0:30:72:92
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br-eth0): addr:ba:f0:db:1a:85:4f
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
[root@hacontroller1 ~(keystone_admin)]# ovs-ofctl dump-flows  br-eth0
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=15577.057s, table=0, n_packets=50441, n_bytes=3262529, idle_age=2, priority=4,in_port=2,dl_vlan=3 actions=strip_vlan,NORMAL
 cookie=0x0, duration=15765.938s, table=0, n_packets=31225, n_bytes=1751795, idle_age=0, priority=2,in_port=2 actions=drop
 cookie=0x0, duration=15765.974s, table=0, n_packets=39982, n_bytes=42838752, idle_age=1, priority=0 actions=NORMAL

Check `ovs-vsctl show`

 Bridge br-int
        fail_mode: secure
        Port "tapc8488877-45"
            tag: 4
            Interface "tapc8488877-45"
                type: internal
        Port br-int
            Interface br-int
                type: internal
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port "tap14aa6eeb-70"
            tag: 2
            Interface "tap14aa6eeb-70"
                type: internal
        Port "qr-8f5b3f4a-45"
            tag: 2
            Interface "qr-8f5b3f4a-45"
                type: internal
        Port "int-br-eth0"
            Interface "int-br-eth0"
                type: patch
                options: {peer="phy-br-eth0"}
        Port "qg-34893aa0-17"
            tag: 3



[root@hacontroller2 ~(keystone_demo)]# ovs-ofctl show  br-eth0
OFPT_FEATURES_REPLY (xid=0x2): dpid:0000b6bfa2bafd45
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
 1(eth0): addr:52:54:00:73:df:29
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 2(phy-br-eth0): addr:be:89:61:87:56:20
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br-eth0): addr:b6:bf:a2:ba:fd:45
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0

[root@hacontroller2 ~(keystone_demo)]# ovs-ofctl dump-flows  br-eth0
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=15810.746s, table=0, n_packets=0, n_bytes=0, idle_age=15810, priority=4,in_port=2,dl_vlan=2 actions=strip_vlan,NORMAL
 cookie=0x0, duration=16105.662s, table=0, n_packets=31849, n_bytes=1786827, idle_age=0, priority=2,in_port=2 actions=drop
 cookie=0x0, duration=16105.696s, table=0, n_packets=39762, n_bytes=2100763, idle_age=0, priority=0 actions=NORMAL

Check `ovs-vsctl show`

   Bridge br-int
        fail_mode: secure
        Port "qg-34893aa0-17"
            tag: 2
            Interface "qg-34893aa0-17"
                type: internal


The qrouter namespace's outgoing interface qg-xxxxxx sends VLAN-tagged packets to eth0 (which has VLAN=yes, see the link below), but the OVS bridge br-eth0 is not aware of the VLAN tagging: it strips the tags before sending the packets out to the external flat network. In the case of external network providers, the qg-xxxxxx interfaces are on br-int, and that is normal. I believe this is the core reason why the patch https://github.com/beekhof/osp-ha-deploy/commit/b2e01e86ca93cfad9ad01d533b386b4c9607c60d
works pretty stably. This issue doesn't show up on a single controller and appears
to be critical for an HAProxy/Keepalived 3-node controller cluster, at least in my
experience.
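
The tag/flow correspondence is easy to verify on a controller hosting the router; for the qg- port shown above:

# ovs-vsctl get Port qg-34893aa0-17 tag
# ovs-ofctl dump-flows br-eth0 | grep strip_vlan

The tag returned (3 on hacontroller1, 2 on hacontroller2 in the dumps above) matches the dl_vlan value in the strip_vlan flow rule on br-eth0.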

Per Lars Kellogg-Stedman [1]:
  1. The packet exits the qg-... interface of the router (where it is assigned the VLAN tag associated with the external network). (N)
  2. The packet is delivered to the external bridge, where a flow rule strips the VLAN tag. (P)
  3. The packet is sent out the physical interface associated with the bridge.

References
1.  http://blog.oddbit.com/2015/08/13/provider-external-networks-details/
2.  https://ask.openstack.org/en/question/85055/how-does-external-network-provider-work-flatvlangre/
 
