rConfig on CentOS 6.6

I’ve used RANCID in the past, but I wanted to use a more modern configuration management tool at my current organization. I’ve been following the rConfig project for a while now and recently set up an instance of it on our network.

Here is a short guide on how to get rConfig up and running on a CentOS instance.

# CentOS not getting a DHCP address on eth0 under VMware
vim /etc/sysconfig/network-scripts/ifcfg-eth0
# edit and set ONBOOT=yes
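# The relevant lines in ifcfg-eth0 end up looking roughly like this
# (assuming the default DHCP boot protocol):
DEVICE="eth0"
BOOTPROTO="dhcp"
ONBOOT="yes"
# Then restart networking to pick up the change
service network restart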

# Install some tools
# I prefer vim over vi
sudo yum install net-tools
sudo yum install wget
sudo yum install zip unzip
sudo yum install vim-common vim-minimal vim-enhanced vim-X11

# Install Apache
sudo yum install httpd

# Install MySQL
yum install mysql mysql-server
service mysqld start

# Install PHP
yum install php php-common
yum install php-common php-cli php-mysql php-devel

# Service Restarts
service httpd restart
chkconfig httpd on
service mysqld restart
chkconfig mysqld on

# Adjust firewall to allow for inbound http
sudo iptables -I INPUT -p tcp --dport 80 -j ACCEPT
sudo service iptables save

# Use wget to get the rConfig zip from http://www.rconfig.com/index.php/download-menu
# Download the archive and unzip it to /home/rconfig
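# For example (the filename is a placeholder for whichever release you grab from
# the download page; adjust so the files end up under /home/rconfig):
cd /home
wget <direct-link-to-the-rconfig-zip>
unzip rconfig-*.zip -d /home/rconfig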

# Change ownership and swap in the rConfig Apache config
chown -R apache /home/rconfig
mv /etc/httpd/conf/httpd.conf /etc/httpd/conf/httpd.conf.original
cp /home/rconfig/www/install/httpd.conf.new /etc/httpd/conf/httpd.conf

# Disable SELinux
vim /etc/selinux/config
# change "SELINUX=enforcing" to "SELINUX=disabled"

# Test that Apache and MySQL autostart
reboot

# Setup rConfig via the web interface
http://ipaddress/install/preinstall.php
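# Optional sanity check from the server itself that Apache is serving the installer
# (assumes the rConfig httpd.conf from the step above is in place):
curl -I http://localhost/install/preinstall.php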

# Create your rConfig user
mysql
CREATE USER 'rconfig_user'@'localhost' IDENTIFIED BY 'some-password';
GRANT ALL PRIVILEGES ON *.* TO 'rconfig_user'@'localhost';
FLUSH PRIVILEGES;

# Change rconfig_user password
SET PASSWORD FOR 'rconfig_user'@'localhost' = PASSWORD('some-password');

# Setup NTP
yum install ntp ntpdate ntp-doc
chkconfig ntpd on
ntpdate pool.ntp.org
/etc/init.d/ntpd start

 

GNS3 and VRRP Timers

While testing out a VRRP solution, I noticed that it was not performing as expected. The VRRP address was unresponsive so I started to investigate. Turning on console logging, I saw a large amount of flapping between Backup and Master states.

...
*Mar  1 02:37:23.739: VRRP: Grp 1 Event - Master down timer expired
*Mar  1 02:37:23.739: %VRRP-6-STATECHANGE: Vl20 Grp 1 state Backup -> Master
*Mar  1 02:37:25.095: %VRRP-6-STATECHANGE: Vl20 Grp 1 state Master -> Backup
...

It turns out that running 8 routers in GNS3 on my laptop made for a slightly under-powered platform, resulting in response times from a VRRP peer approaching 2 seconds at their worst.

Sending 8000, 100-byte ICMP Echos to 10.10.20.1, timeout is 2 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!..!!!..........................
......................................................!!!!!!!!!!!!!!!!
!!....................................................................
.....................!!!......................!!!!!!!!!!!.!!!!!!!!!!!!
..!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!.!!!!.!!!!!!!!!!.!!!!!!!!!!.!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!.
Success rate is 74 percent (611/818), round-trip min/avg/max = 4/705/1996 ms
Server-A#

With VRRP’s default 1-second advertisement interval, a backup declares the master down after roughly three missed advertisements, so only a few seconds of delayed hellos on an overloaded lab host is enough to trigger a state change. Raising the advertisement interval to 10 seconds stretches the master-down timer to roughly 30 seconds, and after adjusting the timers everything started to perform as expected.

R1#
interface Vlan20
 ip address 10.10.20.2 255.255.255.0
 vrrp 1 ip 10.10.20.1
 vrrp 1 timers advertise 10
 vrrp 1 priority 110
 
R2#
interface Vlan20
 ip address 10.10.20.3 255.255.255.0
 vrrp 1 ip 10.10.20.1
 vrrp 1 timers advertise 10
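
To check the result, the usual IOS commands show the group state and the new timers on each router:

R1#show vrrp brief
R1#show vrrp interface Vlan20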

 

CCNP Achieved

I passed CCNP ROUTE (642-902) in January, before the exam changed, thus completing all three exams. ROUTE was the most challenging of the three for me. I am now taking the lead on projects that involve routing, which is part of why I wanted to pursue the certification. Exciting times, and I’ve started to take a peek at the CCIE 5.0 exam.


Advanced Light Source User Meeting

I was at the Advanced Light Source User Meeting today as a representative of LBLnet, talking about the Science DMZ architecture that enables big data transfers across the WAN. We had an elegant poster showing how the DMZ architecture fits into the enterprise design. There are still groups saving large data sets to hard drives and shipping them to their destination rather than using the network, and we want to help change that paradigm.

Credit for the majority of the design goes to my co-worker Michael at smitasin.com.


TP-LINK Powerline Adapter Performance

For the longest time, I would never advocate Powerline Ethernet as a viable solution for getting connectivity into a troublesome area. I felt the technology was prone to interference and therefore too unreliable to deliver a consistent connection.

After a few failed attempts to trace Cat5 cable into the garage of my San Francisco apartment in order to connect two pairs and get 100 Mbps connectivity between the front and back of the apartment, I decided to try out a Powerline Ethernet solution. I picked up a pair of TP-LINK 200 Mbps units and was surprised by the setup procedure: it was fairly easy, and I had the units connected in under five minutes.

Of course I wanted to test performance so I started to graph latency to the wireless router on the other side of the apartment.

[Graph: ping latency to the wireless router on the other side of the apartment, 2014-07-07]

An iperf test also showed a solid 28.9 Mbits/second transfer rate.

C:\tools>iperf -c 10.10.1.27 -p 200 -t 120
------------------------------------------------------------
Client connecting to 10.10.1.27, TCP port 200
TCP window size: 8.00 KByte (default)
------------------------------------------------------------
[156] local 10.10.1.60 port 58742 connected with 10.10.1.27 port 200
[ ID] Interval       Transfer     Bandwidth
[156]  0.0-120.0 sec   413 MBytes  28.9 Mbits/sec

C:\tools>
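
The far end just needs iperf listening on the matching port; with the classic iperf2 syntax used above, the other side would be started with something along the lines of:

iperf -s -p 200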

Overall I’ve been impressed by the performance and now have Ethernet extended to the other half of the apartment. Not bad for a $30 connectivity solution.

Reserved IP Addresses in prefix-list Format

Use these with the load merge terminal command for easy cut-and-pasting in Junos; a quick sketch of that workflow follows the configuration below.

policy-options {
    prefix-list localhost {
        127.0.0.1/32;
    }
    prefix-list martians-IPv4 {
        0.0.0.0/8;
        10.0.0.0/8;
        127.0.0.0/8;
        169.254.0.0/16;
        172.16.0.0/12;
        192.168.0.0/16;
    }
    prefix-list multicast {
        224.0.0.0/4;
    }
    prefix-list multicast-all-systems {
        224.0.0.1/32;
    }
    prefix-list rfc1918 {
        10.0.0.0/8;
        172.16.0.0/12;
        192.168.0.0/16;
    }
    prefix-list martians-IPv6 {
        ::/96;
        ::1/128;
        fe80::/10;
        fec0::/10;
        ff00::/8;
        ff02::/16;
    }
    prefix-list other-bad-src-addrs-IPv6 {
        ::/128;
        ::ffff:0.0.0.0/96;
        ::ffff:10.0.0.0/104;
        ::ffff:127.0.0.0/104;
        ::ffff:172.16.0.0/108;
        ::ffff:192.168.0.0/112;
        ::ffff:224.0.0.0/100;
        ::ffff:240.0.0.0/100;
        ::ffff:255.0.0.0/104;
        2001:db8::/32;
        2002:0000::/24;
        2002:0a00::/24;
        2002:7f00::/24;
        2002:ac10::/28;
        2002:c0a8::/32;
        2002:e000::/20;
        2002:ff00::/24;
        3ffe::/16;
        fc00::/7;
    }
}
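
Loading the snippet from configuration mode is the standard merge workflow, roughly as follows (paste the policy-options block above when prompted, then press Ctrl-D):

user@router# load merge terminal
user@router# show | compare
user@router# commit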

ARP Cache Poisoning

Overview

We received reports from end-users that a few client workstations on a specific subnet were experiencing around 70% packet loss when attempting to communicate with a few hosts. Since the initial report seemed rather isolated, we started with some basic ping tests, but as time went on and more hosts became affected, we escalated the issue and increased our troubleshooting efforts.

Ping Tests

Take a look at the following ping samples, taken from two different sources to two different targets, all passing traffic through the switch in question:

-bash-4.1$ ping host1
PING host1 (1.2.3.4) 56(84) bytes of data.
64 bytes from host1 (1.2.3.4): icmp_seq=1 ttl=128 time=1.51 ms
64 bytes from host1 (1.2.3.4): icmp_seq=35 ttl=128 time=0.291 ms
64 bytes from host1 (1.2.3.4): icmp_seq=36 ttl=128 time=0.361 ms
64 bytes from host1 (1.2.3.4): icmp_seq=37 ttl=128 time=0.400 ms
64 bytes from host1 (1.2.3.4): icmp_seq=38 ttl=128 time=0.264 ms
64 bytes from host1 (1.2.3.4): icmp_seq=39 ttl=128 time=0.356 ms
64 bytes from host1 (1.2.3.4): icmp_seq=40 ttl=128 time=0.419 ms
64 bytes from host1 (1.2.3.4): icmp_seq=41 ttl=128 time=0.260 ms
64 bytes from host1 (1.2.3.4): icmp_seq=42 ttl=128 time=0.349 ms
64 bytes from host1 (1.2.3.4): icmp_seq=43 ttl=128 time=0.416 ms
64 bytes from host1 (1.2.3.4): icmp_seq=44 ttl=128 time=0.429 ms
64 bytes from host1 (1.2.3.4): icmp_seq=45 ttl=128 time=0.314 ms
64 bytes from host1 (1.2.3.4): icmp_seq=46 ttl=128 time=0.359 ms
64 bytes from host1 (1.2.3.4): icmp_seq=47 ttl=128 time=0.447 ms
64 bytes from host1 (1.2.3.4): icmp_seq=48 ttl=128 time=0.287 ms
64 bytes from host1 (1.2.3.4): icmp_seq=49 ttl=128 time=0.405 ms
64 bytes from host1 (1.2.3.4): icmp_seq=50 ttl=128 time=0.416 ms
^C
-bash-4.1$ ping host2
PING host2 (2.3.4.5) 56(84) bytes of data.
64 bytes from host2 (2.3.4.5): icmp_seq=1 ttl=128 time=1.21 ms
64 bytes from host2 (2.3.4.5): icmp_seq=30 ttl=128 time=0.484 ms
64 bytes from host2 (2.3.4.5): icmp_seq=59 ttl=128 time=0.467 ms
64 bytes from host2 (2.3.4.5): icmp_seq=83 ttl=128 time=0.197 ms
64 bytes from host2 (2.3.4.5): icmp_seq=84 ttl=128 time=0.241 ms
64 bytes from host2 (2.3.4.5): icmp_seq=85 ttl=128 time=0.210 ms
64 bytes from host2 (2.3.4.5): icmp_seq=86 ttl=128 time=0.240 ms
64 bytes from host2 (2.3.4.5): icmp_seq=87 ttl=128 time=0.171 ms
64 bytes from host2 (2.3.4.5): icmp_seq=88 ttl=128 time=0.216 ms
64 bytes from host2 (2.3.4.5): icmp_seq=89 ttl=128 time=0.194 ms
64 bytes from host2 (2.3.4.5): icmp_seq=90 ttl=128 time=0.392 ms
64 bytes from host2 (2.3.4.5): icmp_seq=91 ttl=128 time=0.240 ms
64 bytes from host2 (2.3.4.5): icmp_seq=92 ttl=128 time=0.235 ms
64 bytes from host2 (2.3.4.5): icmp_seq=93 ttl=128 time=0.222 ms
^C

The first response in both samples takes a little longer than we expected for two hosts connected to a local Gigabit switch, but it falls within the profiled delay for ARP resolution. After that longer initial response, all subsequent pings fall within the expected range for the local network.

The TTL values all look normal, so we know the traffic isn’t leaving the local network and hitting additional hops. But take a look at the sequence numbers: they are not sequential, and that is a sign of a larger problem.

In the first sample, sequence numbers 2-34 were lost; in the second, 2-29, 31-58, and 60-82 were lost. With the default one-second ping interval, that means roughly 30 seconds of traffic at a time was being completely lost in both test cases.

Find the Layer

At this point we could rule out Layer 1 issues, so we knew there was no problem with dirty optics or with the input/output queues on any of the switching equipment.

There had to be a Layer 2 issue, which could include a bridging problem, MAC address learning happening on different pieces of equipment at different times, an IP address conflict, or ARP poisoning.

As time passed, more workstations started to see traffic loss on the network. All signs pointed to something wrong with the ARP table: a show mac address-table command on one of the switches showed a number of hosts associated with a MAC address of 00:00:00:00:00:00, and that number was increasing over time.

Wireshark

After getting a SPAN port on the switch and looking at traffic with Wireshark, we found a large number of responses to ARP requests with a value of 00:00:00:00:00:00 for hosts in the local subnet that were all sourced from one machine. This one computer was poisoning the ARP cache on the network with all zeros, causing the location for every host to eventually become a blackhole for traffic.
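
If you need to pick these frames out of a capture, a Wireshark display filter along these lines isolates ARP replies that carry an all-zeros sender hardware address (a sketch; adjust as needed):

arp.opcode == 2 and arp.src.hw_mac == 00:00:00:00:00:00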

As ARP caches on end-user machines and switching equipment timed out, those hosts sent out new ARP requests and received poisoned replies. Once the offending machine’s port was shut down, traffic returned to normal.