Reserved IP Addresses in prefix-list Format

Use these with the load merge terminal command for easy copying and pasting into Junos.

policy-options {
    prefix-list localhost {
        127.0.0.1/32;
    }
    prefix-list martians-IPv4 {
        0.0.0.0/8;
        10.0.0.0/8;
        127.0.0.0/8;
        169.254.0.0/16;
        172.16.0.0/12;
        192.168.0.0/16;
    }
    prefix-list multicast {
        224.0.0.0/4;
    }
    prefix-list multicast-all-systems {
        224.0.0.1/32;
    }
    prefix-list rfc1918 {
        10.0.0.0/8;
        172.16.0.0/12;
        192.168.0.0/16;
    }
    prefix-list martians-IPv6 {
        ::/96;
        ::1/128;
        fe80::/10;
        fec0::/10;
        ff00::/8;
        ff02::/16;
    }
    prefix-list other-bad-src-addrs-IPv6 {
        ::/128;
        ::ffff:0.0.0.0/96;
        ::ffff:10.0.0.0/104;
        ::ffff:127.0.0.0/104;
        ::ffff:172.16.0.0/108;
        ::ffff:192.168.0.0/112;
        ::ffff:224.0.0.0/100;
        ::ffff:240.0.0.0/100;
        ::ffff:255.0.0.0/104;
        2001:db8::/32;
        2002:0000::/24;
        2002:0a00::/24;
        2002:7f00::/24;
        2002:ac10::/28;
        2002:c0a8::/32;
        2002:e000::/20;
        2002:ff00::/24;
        3ffe::/16;
        fc00::/7;
    }
}
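
As a sketch of how these lists are typically consumed, the filter below references the martians-IPv4 prefix-list to drop matching source addresses; the filter and term names are made up for illustration and are not part of the snippet above:

firewall {
    family inet {
        filter drop-bogon-sources {
            term bogons {
                /* drop anything sourced from a martian address */
                from {
                    source-prefix-list {
                        martians-IPv4;
                    }
                }
                then {
                    discard;
                }
            }
            term accept-the-rest {
                then accept;
            }
        }
    }
}

The filter would then be applied as an input filter under the appropriate interface's family inet stanza.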

ARP Cache Poisoning

Overview

We received reports from end-users that a few client workstations on a specific subnet were experiencing around 70% packet loss when attempting to communicate with a handful of other hosts. Since the initial report seemed rather isolated, we started with some basic ping tests, but as time went on and more hosts became affected, we escalated the issue and increased our troubleshooting efforts.

Ping Tests

Take a look at the following ping samples, taken from two different sources to two different targets, all passing traffic through the switch in question:

-bash-4.1$ ping host1
PING host1 (1.2.3.4) 56(84) bytes of data.
64 bytes from host1 (1.2.3.4): icmp_seq=1 ttl=128 time=1.51 ms
64 bytes from host1 (1.2.3.4): icmp_seq=35 ttl=128 time=0.291 ms
64 bytes from host1 (1.2.3.4): icmp_seq=36 ttl=128 time=0.361 ms
64 bytes from host1 (1.2.3.4): icmp_seq=37 ttl=128 time=0.400 ms
64 bytes from host1 (1.2.3.4): icmp_seq=38 ttl=128 time=0.264 ms
64 bytes from host1 (1.2.3.4): icmp_seq=39 ttl=128 time=0.356 ms
64 bytes from host1 (1.2.3.4): icmp_seq=40 ttl=128 time=0.419 ms
64 bytes from host1 (1.2.3.4): icmp_seq=41 ttl=128 time=0.260 ms
64 bytes from host1 (1.2.3.4): icmp_seq=42 ttl=128 time=0.349 ms
64 bytes from host1 (1.2.3.4): icmp_seq=43 ttl=128 time=0.416 ms
64 bytes from host1 (1.2.3.4): icmp_seq=44 ttl=128 time=0.429 ms
64 bytes from host1 (1.2.3.4): icmp_seq=45 ttl=128 time=0.314 ms
64 bytes from host1 (1.2.3.4): icmp_seq=46 ttl=128 time=0.359 ms
64 bytes from host1 (1.2.3.4): icmp_seq=47 ttl=128 time=0.447 ms
64 bytes from host1 (1.2.3.4): icmp_seq=48 ttl=128 time=0.287 ms
64 bytes from host1 (1.2.3.4): icmp_seq=49 ttl=128 time=0.405 ms
64 bytes from host1 (1.2.3.4): icmp_seq=50 ttl=128 time=0.416 ms
^C
-bash-4.1$ ping host2
PING host2 (2.3.4.5) 56(84) bytes of data.
64 bytes from host2 (2.3.4.5): icmp_seq=1 ttl=128 time=1.21 ms
64 bytes from host2 (2.3.4.5): icmp_seq=30 ttl=128 time=0.484 ms
64 bytes from host2 (2.3.4.5): icmp_seq=59 ttl=128 time=0.467 ms
64 bytes from host2 (2.3.4.5): icmp_seq=83 ttl=128 time=0.197 ms
64 bytes from host2 (2.3.4.5): icmp_seq=84 ttl=128 time=0.241 ms
64 bytes from host2 (2.3.4.5): icmp_seq=85 ttl=128 time=0.210 ms
64 bytes from host2 (2.3.4.5): icmp_seq=86 ttl=128 time=0.240 ms
64 bytes from host2 (2.3.4.5): icmp_seq=87 ttl=128 time=0.171 ms
64 bytes from host2 (2.3.4.5): icmp_seq=88 ttl=128 time=0.216 ms
64 bytes from host2 (2.3.4.5): icmp_seq=89 ttl=128 time=0.194 ms
64 bytes from host2 (2.3.4.5): icmp_seq=90 ttl=128 time=0.392 ms
64 bytes from host2 (2.3.4.5): icmp_seq=91 ttl=128 time=0.240 ms
64 bytes from host2 (2.3.4.5): icmp_seq=92 ttl=128 time=0.235 ms
64 bytes from host2 (2.3.4.5): icmp_seq=93 ttl=128 time=0.222 ms
^C

The first response in both samples takes a little longer than we would expect for two hosts connected to a local Gigabit switch, but it falls within the profiled delay for ARP resolution. After that longer initial response, all subsequent pings fall within the expected range for the local network.

The TTL values all look normal, so we know the traffic isn’t leaving the local network and hitting additional hops. Take a look at the sequence numbers, though: they are not sequential, which points to a larger problem.

In the first sample, sequence numbers 2-34 were lost; in the second, 2-29 were lost, followed by further gaps at 31-58 and 60-82. With ping’s default one-second interval, that works out to roughly 30 seconds of traffic being completely lost at a time in both test cases.

Find the Layer

At this point we could rule out Layer 1 issues, so we knew there was no problem with dirty optics or with the input/output queues on any of the switching equipment.

There had to be a Layer 2 issue, which could involve bridging, MAC addresses being learned on different pieces of equipment at different times, an IP address conflict, or ARP poisoning.

As more time passed, additional workstations started to see traffic loss on the network. All signs pointed to something wrong with the ARP table. A show mac address-table command on one of the switches showed that a number of hosts were associated with a MAC address of 00:00:00:00:00:00, and that number was increasing over time.
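
If you want to hunt for these entries directly from the switch CLI, filtering the same command narrows the output; the example below assumes Cisco IOS formatting, where the all-zeros address is rendered in dotted triplets:

show mac address-table | include 0000.0000.0000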

Wireshark

After setting up a SPAN port on the switch and looking at the traffic with Wireshark, we found a large number of ARP replies, all sourced from one machine, advertising an address of 00:00:00:00:00:00 for hosts in the local subnet. This one computer was poisoning the ARP caches on the network with all zeros, eventually turning every host’s entry into a blackhole for traffic.
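
For anyone reproducing this kind of capture, a Wireshark display filter along the following lines isolates ARP replies that advertise an all-zeros sender hardware address; this is a generic filter, not the exact one used during the incident:

arp.opcode == 2 && arp.src.hw_mac == 00:00:00:00:00:00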

As the ARP caches on end-user machines and switching equipment timed out, those devices sent out new ARP requests and received the poisoned information. Once the offending machine’s port was shut down, traffic started to return to normal.

Intel NIC Broadcast Storm

As part of a standardization project, we have been enabling new port-security options on our access switches that provide connectivity for end-users. When we made this change on a switch that serves around 240 users, we started to receive port-security violation alerts from three hosts at very inconsistent hours. Below is a small sample of one of the broadcast storms.

[Image: 2014-04-08_syslog, syslog sample from the broadcast storm]

Given the large number of MAC addresses being broadcast in a short amount of time, the switchport port-security maximum 50 limit was being triggered as soon as the switch saw the 51st MAC address.

interface GigabitEthernet1/1
 description Access Port
 switchport access vlan 200
 switchport mode access
 switchport port-security maximum 50
 switchport port-security
 switchport port-security aging time 1
 switchport port-security violation restrict
 no logging event link-status
 storm-control broadcast level 3.40
 storm-control action trap
 spanning-tree portfast
 ip dhcp snooping limit rate 50
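
To confirm which ports have tripped the limit and which secure addresses have been learned, the standard IOS verification commands can be used, for example:

show port-security
show port-security interface GigabitEthernet1/1
show port-security address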

I consolidated all the MAC addresses seen into a table and was not able to find any duplicates. A search of an OUI database also showed that they were unregistered, so they appeared to be randomly generated.

[Image: 2014-04-08_mac-list, table of MAC addresses seen during the storm]

Looking at the MAC address table for each port after the storm incident, I discovered that each port contained only a single Dell computer with an Intel 82579LM Gigabit NIC. Some research led me to a case of OptiPlex 790, 7010, and 9010 and Latitude E6520/E6530 systems generating a network broadcast storm after coming out of sleep mode (2), requiring a driver update on the Intel NIC to fix the issue.

References

  1. http://forums.juniper.net/t5/Ethernet-Switching/Power-saving-NICs-Dell-causing-EX3300-VC-port-problems/td-p/182897
  2. https://supportforums.cisco.com/discussion/11141666/port-secuity-issue-win-7
  3. http://www.dell.com/support/troubleshooting/bz/en/bzdhs1/KCS/KcsArticles/ArticleView?c=bz&l=en&s=dhs&docid=615706
  4. http://www.networksteve.com/windows/topic.php/Vista_Sleep_Mode_and_MAC_addresses/?TopicId=25326&Posts=1

Catalyst Spring Cleaning

Don’t forget to remove dust from the inlet ports on your Catalyst 4500 chassis on a routine basis if the chassis lives somewhere exposed to large amounts of particulates. You don’t want to be woken up at 6:19 AM by a downed device.

Mar 18 06:19:40 1.2.3.4 : %C4K_IOSMODPORTMAN-2-MODULESHUTDOWNTEMP: Module 1 Sensor air outlet temperature is at or over shutdown threshold - current temp: 86C, shutdown threshold: 86C
Mar 18 08:25:23 1.2.3.4 : %C4K_IOSMODPORTMAN-2-MODULESHUTDOWNTEMP: Module 1 Sensor air outlet temperature is at or over shutdown threshold - current temp: 86C, shutdown threshold: 86C.
Mar 18 09:34:49 1.2.3.4 : %C4K_IOSMODPORTMAN-2-MODULESHUTDOWNTEMP: Module 1 Sensor air outlet temperature is at or over shutdown threshold - current temp: 86C, shutdown threshold: 86C

Sadly, the syslog events never made it to our central repository. Our solution for the first automated shutdown was simply to power the device back on to restore connectivity for the building. Only after two more instances of the switch shutting itself off did we review the logs on the local device and discover that they had never reached our syslog server.

A simple wipe of the accumulated dust brought the temperature down drastically.

[Image: 2014-04-03_dust, temperature graph after the dust removal]

Dropbox LAN sync Protocol Noise

An end-user recently submitted a ticket reporting that the connection-sharing switch in his office was blinking more than usual and wanted confirmation that nothing malicious was occurring on the network. I took a small sample of the traffic on the end-user’s switch with Wireshark and found a surprising amount of Dropbox noise being broadcast on the network.

In our /22 subnet, I found 88 hosts advertising Dropbox LAN sync Protocol (db-lsp) traffic on UDP/17500. The Dropbox advertisements accounted for 67% of our total UDP traffic, arriving on average every 0.14 seconds and using 13.3 kbps of bandwidth. That is hardly an impact on a Gigabit connection, but I can see how, from an end-user’s perspective, plugging a previously quiet switch into the network produces a noticeable change in LED behavior.

[Image: dropbox-db-lsp, Wireshark capture of Dropbox LAN sync traffic]
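
For anyone who wants to reproduce the measurement, filtering on the LAN sync discovery port is enough: the display filter udp.port == 17500 in Wireshark, or an equivalent capture from the command line (the interface name below is only an example):

tshark -i eth0 -Y "udp.port == 17500"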


802.3az Energy-Efficient Ethernet on Juniper EX3300 Switches

Unlike on Cisco 2960-X switches, 802.3az does not come enabled by default on Juniper EX3300 models. Use ether-options ieee-802-3az-eee under the interface-range tree to enable this energy-saving feature.

interfaces {
    interface-range WIRED_PORTS {
        member-range ge-0/0/0 to ge-0/0/47;
        ether-options {
            ieee-802-3az-eee;
        }
    }
}

When committing this option, we noticed around 8 seconds of connectivity loss for 96 connected wired hosts. Be careful when enabling this in a production setting.

Visualizing Subnet Boundaries

I was recently helping a friend grasp where hosts can exist given various CIDR block combinations. After going over the math, I started drawing on the whiteboard and came up with this chart. I am a visual learner, and I’ve always kept this style of grouping in mind when thinking about subnets. Here’s a quick digital version of what I drew.

[Image: Fx7b2Bm, subnet boundary chart]
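
As a small worked example of the same grouping in text form (the 192.168.1.0/24 block is chosen arbitrarily for illustration), a /24 nests two /25s, which in turn nest four /26s:

192.168.1.0/24        192.168.1.0   - 192.168.1.255   (256 addresses)
  192.168.1.0/25      192.168.1.0   - 192.168.1.127   (128 addresses)
    192.168.1.0/26    192.168.1.0   - 192.168.1.63
    192.168.1.64/26   192.168.1.64  - 192.168.1.127
  192.168.1.128/25    192.168.1.128 - 192.168.1.255   (128 addresses)
    192.168.1.128/26  192.168.1.128 - 192.168.1.191
    192.168.1.192/26  192.168.1.192 - 192.168.1.255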

10 Gigabit Speeds in Science DMZs

I come from a background of providing office users with Internet access for general business purposes, with ingress traffic filtering and NAT handled at the border by a firewall appliance. Over the past few months, I’ve had to shift to a paradigm that doesn’t include a stateful ingress-filtering firewall, which has been a culture shock for my Cisco “Defense in Depth” ideals.

Part of my focus at Berkeley Lab is enabling scientists to transfer large research datasets over the network. Their transfers may remain local within our laboratory campus, traverse the San Francisco Bay Area, or cross the globe to institutions many miles away. All three types of transfers present the same challenges: keeping total RTT low, making sure interface buffers are undersubscribed, and maintaining the ideal of zero packet loss for the transmission.

Science DMZ Architecture

The Science DMZ architecture, a term coined in 2010 by collaborators at the US Department of Energy’s Energy Sciences Network (ESnet), enables us to approach 10G transfer speeds for our scientists. The design calls for this network to be attached to the border router, separate from the internal network protected by the stateful firewall, with end-to-end 10G connectivity between the WAN and the storage network and a Data Transfer Node (DTN) facilitating transfers.

[Image: Science DMZ architecture diagram]

With this architecture, data acquisition and data transfer steps are separated into two discrete processes:

  • First, the acquisition hardware, which could be a camera, sensor, or other recording device, writes information to a local storage array. This array is usually solid state in order to accommodate the volume of the incoming data stream(s).
  • Second, the data is transferred to high-performance processing nodes that are not contained in the Science DMZ. The transfer method in this step could be single-stream FTP, parallel-stream GridFTP, or a SaaS transfer service such as Globus.

In the workflow we have seen, scientists often discard or re-sample datasets and send only a fraction of the captured data to offsite nodes for processing. With this two-step process, the amount of data that goes offsite is reduced because it has already been pre-filtered by the scientists.

Case Study

Below is a sample of a recent transfer conducted on one of our Science DMZ networks, which is connected at 10G. As you can see, the performance is far below the expected theoretical maximum. There are many pieces of equipment that need to be optimized in order to achieve near-10G speeds:

  1. Storage Array I/O
  2. File System on the Array
  3. NIC on the DTN
  4. Buffers on the DMZ Switch
  5. Buffers on the Border Router
  6. End-to-end Jumbo Frames
  7. Transfer Protocol

[Image: 2014-03-13_10g_cacti, throughput graph of the transfer]

In this case we found that the local storage array was not able to saturate the network, even with 16x1TB SSDs running in RAID 5. The resulting transfers peaked around 2.3 Gbps, while our tests with perfSONAR showed that the network equipment is capable of pushing up to 9 Gbps.

Achieving 10 Gigabit speeds on a host-to-host transfer is not as simple as I thought; it requires optimization across many layers of hardware and software. For further information, a detailed list of client optimization steps can be found on the Network Tuning page at fasterdata.es.net.
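
To give a flavor of what that host tuning looks like, here is a minimal sysctl sketch in the spirit of the fasterdata guidance; the values are illustrative assumptions only, and the htcp line assumes the matching kernel module is available:

# /etc/sysctl.conf fragment for a 10G data transfer node; illustrative values only
# Raise the maximum socket buffer sizes (bytes)
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
# Raise the TCP autotuning limits: min, default, max (bytes)
net.ipv4.tcp_rmem = 4096 87380 33554432
net.ipv4.tcp_wmem = 4096 65536 33554432
# Allow more packets to queue when the NIC receives faster than the kernel drains
net.core.netdev_max_backlog = 30000
# Use a congestion control algorithm better suited to high bandwidth-delay paths
net.ipv4.tcp_congestion_control = htcp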

Introduction to DNS and BIND

This week I attended a two-day session on DNS and BIND hosted by the Internet Systems Consortium, with Alan Clegg as the instructor. Coming from a background of Active Directory-integrated DNS, I found it a wonderful opportunity to get hands-on exposure to BIND.

The class started with topics covering RFC 799 (1981) and RFC 882/883 (1983), the DNS namespace, name resolution, caching, recursion, iteration, and stub resolvers. A major focus of the class was BIND itself, including its history, configuration, and hands-on lab time to set up and troubleshoot common BIND issues.

A major benefit of being in the ISC office was the ability to talk to the people who operate the F-Root servers. We had the opportunity to ask engineers about their software life cycle, patching procedures, common support issues, and threat mitigation techniques.

Interface Message Processor

While in Redwood City, CA, I had the chance to get a picture of a former production IMP (Interface Message Processor) from the early ’70s. Reading about the early architecture of ARPANET sparked my interest in telecommunications at an early age; it was a unique opportunity to see one of the first routers that moved traffic on the precursor to the modern-day Internet.

[Images: IMG_0095 and IMG_0091, photos of the IMP]

Based on Report 2059 from Bolt Beranek and Newman Inc. in October of 1970, it looks like IMP #11 was installed at Stanford University for testing on a new 230.4 kilobit/sec circuit in the third quarter of 1970.