Site Maintenance with Arista eAPI

I started a public GitHub Repository in order to share some of my Python automation tools.

I work remotely on a large number of Arista switches and I have developed a script that captures the state of the network in order to run a comparison before and after network maintenance events. We have a monitoring system in place that alerts on various SNMP traps and events, but often times I need to interpret a number of changes via my network engineer lens to confirm that a maintenance period has been completed successfully.

The first Python script I’m publishing interfaces with an Arista switch over HTTPS and captures various outputs via the eAPI: inventory with serial numbers, vlan states, mac address table, lldp neighbors, routing protocol states, and route entries to a text file. A diff on the files can be run post maintenance event to highlight any changes for review.

This type of granular record can show that an item such as a metric on a route has changed. Our monitoring system does not alert on this type of change as it is monitoring the BGP peer status.

 

LinkĀ github.com/brentnowak/arista-tools

Advertisement

Using Arista Telemetry to Monitor Network State

Arista’s Telemetry product allows you to stream network state in real-time from each piece of switching hardware to a central management point. Recently I’ve been waiting to prove out the visibility of the Telemetry product during a maintenance event. I recently had the opportunity when we were switching carriers on a transatlantic link between two sites.

The screenshot below shows a 30 minute slice of time where fiber optic links were brought down on Carrier A, traffic flow changed, connectivity was restored with Carrier B, and normal traffic flow resumed.

1 – Technicians disconnect fiber for the Primary Circuit at sites A and B.
2 – As the routing neighbor between the A and B sites drop, traffic is automatically moved to the Secondary Circuits.
3 – When optical connectivity has been restored between the two sites, the Primary Circuit re-establishes. Routing reconverges and traffic is shifted back onto the Primary Circuit.

Normal workflow for a change like this involves camping Syslog and the various switches involved, issuing commands to show network activity as the maintenance progressed. With Arista’s Telemetry product, I was able to see the state of various network components (light levels, interface state, bit rate, etc) all in a real-time display.

So far an impressive way to gain visibility into network state across a bunch of different metrics at a glance. I’m looking forward to test this product with my other future projects.