Easily build an Ethernet switch from a Linux machine

Dmitrii Overchenko
9 min read · Mar 26, 2021

History — investigation, predicaments and failings

I have always been interested in how to make a Layer 2 switch or a router out of a bare Linux machine. Since starting my career as a network engineer, I have become familiar with various network products and vendors and learned their advantages and disadvantages.

Once I got my hands on a Linux machine, I immediately understood the power of open source software. That was the moment I began developing the idea of a classic open source Ethernet switch.

A simple architecture assumes the following components:

  • user-friendly interface (in 2010 it was stack of Apache, JavaScript and PHP)
  • database (requires planning of the database itself: tables, relations etc.)
  • core engine (a set of scripts to bring my idea to life)

…well, then you need to think about how to maintain that solution (backups, updates, scaling)

OK, I was primarily a network engineer and didn’t know much about the listed functions. To be honest, I didn’t even know about that simple architecture. So, as you can see, I had started down this path of learning and investigation long before I created this list. :-)

Long story short, I created this solution and spent a lot of time and effort on it (mainly discovering best practices), yet it was still far from a production-ready system. I could clearly see what to do next and how to improve it, but there was a question I had to answer: is it worth it? Don’t worry, I basically built this Ethernet switch to gain experience.

Today I found something awesome, something that allows me to concentrate on my idea rather than on database architecture or the user interface: something that will help me create an open source, maintainable Ethernet switch with a user-friendly interface.

Architecture

Let’s start from the VLAN definition.

A virtual LAN (VLAN) is any broadcast domain that is partitioned and isolated in a computer network at the data link layer (OSI Layer 2). A Linux bridge fits this requirement perfectly: every bridge is its own Layer 2 broadcast domain.
Let’s do it the following way:

one broadcast domain, one bridge, as in the diagram below. Don’t worry about ‘MSA’ … I’ll explain what that is in due course!
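
As a rough sketch of the "one broadcast domain, one bridge" idea (not the actual switch.sh), the snippet below creates one bridge per VLAN with brctl from bridge-utils and attaches ports to it. The `run` helper, the `DRY_RUN` switch, the `make_vlan` function and the port-to-bridge mapping are my assumptions for illustration; by default the script only prints the commands it would execute.

```shell
#!/bin/sh
# Sketch: one bridge per VLAN.
# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 and run as
# root on the switch to actually apply them.
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# make_vlan BRIDGE PORT...: create a bridge, attach the ports, bring it up.
make_vlan() {
    bridge="$1"
    shift
    run brctl addbr "$bridge"
    for port in "$@"; do
        run brctl addif "$bridge" "$port"
    done
    run ip link set "$bridge" up
}

# Hypothetical mapping: two ports share the default VLAN, one sits in VLAN 100.
make_vlan vlan_default eth0 eth1
make_vlan vlan_100 eth2
```

Ports attached to the same bridge share a broadcast domain, while ports on different bridges are isolated from each other at Layer 2.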

Sandbox environment

To make a lightweight playground topology I decided to use Docker containers, which are simple to use and cross-platform capable.

  1. SWITCH: we need the following packages on top of the base OS (Alpine):
  • openssh (to take control over the SWITCH)
  • bash (to make some stdout parsing)
  • tcpdump (to capture/verify tagged traffic)

Docker image is 14.4MB (alpine:3.12).

2. HOSTs: we need several hosts connected to the SWITCH to test different VLAN scenarios:

  • pc_01, pc_02: placed in default_vlan, traffic from the hosts is untagged.
  • pc_03: placed in VLAN 100, traffic from the host is untagged.
  • pc_04: placed in VLAN 200, traffic from the host is tagged.

packages required:

✓ openssh (to take control over the host)

Docker image is 11.7MB (alpine:3.12).

Scenarios

  1. Default state:
  • pc_01, pc_02 — can ping each other, placed in default_vlan
  • pc_03 is not reachable, placed in VLAN 100
  • pc_04 is not reachable, placed in VLAN 200, encapsulates frames with an 802.1Q tag of 200

2. Put pc_01 in VLAN 100 and ping pc_03

3. Put pc_01 in VLAN 200 (untagged) and ping pc_04

✓ use tcpdump to make sure frames are encapsulated/de-encapsulated properly

4. Put pc_01 in VLAN 200 (tagged) and ping pc_04

★ note: pc_01 should have tagging enabled
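
For the tcpdump check in scenario 3, something like the following could be used on the SWITCH. It is only a sketch: the helper names are hypothetical, and the interface names (eth3 towards pc_04, eth0 towards pc_01) are an assumption that has to be adjusted to the actual interface order. It relies on tcpdump's `-e` option (print the link-level header, which is where the 802.1Q tag appears) and on the `vlan` filter primitive.

```shell
#!/bin/sh
# Sketch: build tcpdump command lines for checking 802.1Q tags.
# -e prints the link-level header, -n disables name resolution.

# tagged_capture IFACE VID: match only frames tagged with VLAN ID VID.
tagged_capture() {
    echo "tcpdump -e -n -i $1 vlan $2"
}

# untagged_capture IFACE: match only frames without an 802.1Q tag.
untagged_capture() {
    echo "tcpdump -e -n -i $1 not vlan"
}

# Print the two commands for the assumed pc_04-facing and pc_01-facing ports.
tagged_capture eth3 200
untagged_capture eth0
```

Run the printed commands as root on the SWITCH while pinging: the same ICMP packets should appear with a tag on the pc_04 side and without one on the pc_01 side.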

Here is the docker-compose file:

version: "3.8"
services:
  pc_01:
    privileged: true
    build:
      context: .
      dockerfile: pc.dockerfile
    ports:
      - "60622:22"
    tty: true
    networks:
      default:
        ipv4_address: 172.20.0.141
      intranet_01:
        ipv4_address: 10.222.222.11
    hostname: pc_01
  pc_02:
    privileged: true
    build:
      context: .
      dockerfile: pc.dockerfile
    ports:
      - "60722:22"
    tty: true
    networks:
      default:
        ipv4_address: 172.20.0.142
      # IP ADDRESS will be replaced with ENTRYPOINT script
      intranet_02:
        ipv4_address: 10.222.223.12
    hostname: pc_02
  pc_03:
    privileged: true
    build:
      context: .
      dockerfile: pc.dockerfile
    ports:
      - "60822:22"
    tty: true
    networks:
      default:
        ipv4_address: 172.20.0.143
      # IP ADDRESS will be replaced with ENTRYPOINT script
      intranet_03:
        ipv4_address: 10.222.224.12
    hostname: pc_03
  pc_04:
    privileged: true
    build:
      context: .
      dockerfile: pc.dockerfile
    ports:
      - "60922:22"
    tty: true
    networks:
      default:
        ipv4_address: 172.20.0.144
      # IP ADDRESS will be replaced with ENTRYPOINT script
      intranet_04:
        ipv4_address: 10.222.225.12
    hostname: pc_04
  switch:
    privileged: true
    build:
      context: .
      dockerfile: switch.dockerfile
    ports:
      - "61022:22"
    tty: true
    networks:
      default:
        ipv4_address: 172.20.0.145
      # DUMMY ADDRESSES - interfaces will be switched into promiscuous mode
      intranet_01:
        ipv4_address: 10.222.222.10
      intranet_02:
        ipv4_address: 10.222.223.10
      intranet_03:
        ipv4_address: 10.222.224.10
      intranet_04:
        ipv4_address: 10.222.225.10
    hostname: switch
networks:
  default:
    external:
      name: quickstart_default
  intranet_01:
    ipam:
      config:
        - subnet: 10.222.222.0/24
  intranet_02:
    ipam:
      config:
        - subnet: 10.222.223.0/24
  intranet_03:
    ipam:
      config:
        - subnet: 10.222.224.0/24
  intranet_04:
    ipam:
      config:
        - subnet: 10.222.225.0/24

✓ The Docker Compose file assumes that the “quickstart_default” network has been created in advance for the management plane.

✓ Docker needs every network to have an address range, so the intranet networks are created with IP prefixes allocated, but the entrypoint scripts replace those addresses later.

✓ The pc_ services have several network interfaces attached and their order matters, but Docker Compose 3 makes this order random; a simple workaround script is shown below.

✓ Intranet networks — for the data plane and demo use-cases.

✓ “quickstart_default” network for control.
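
If the “quickstart_default” network does not exist yet, it could be created by hand before starting the lab. The sketch below is an assumption on my side, not part of the original setup: the 172.20.0.0/24 subnet is inferred from the management addresses used in the compose file, and the `run` helper and `lab_up` function are hypothetical. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: prepare the management network, then build and start the lab.
# DRY_RUN=1 (default) prints the commands; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

MGMT_NET="quickstart_default"   # name expected by the compose file
MGMT_SUBNET="172.20.0.0/24"     # assumed from the 172.20.0.14x addresses

lab_up() {
    run docker network create --subnet "$MGMT_SUBNET" "$MGMT_NET"
    run docker-compose up -d --build
}

lab_up
```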

PC.dockerfile looks like this:

FROM alpine:3.12
RUN mkdir /start
WORKDIR /start
COPY ./pc.sh /start
RUN apk add --no-cache openssh
RUN /usr/bin/ssh-keygen -A
RUN ssh-keygen -t rsa -b 4096 -f  /etc/ssh/ssh_host_key
RUN ["chmod", "+x", "/start/pc.sh"]
ENTRYPOINT ["/start/pc.sh"]

SWITCH.dockerfile looks like this:

FROM alpine:3.12
RUN mkdir /start
WORKDIR /start
COPY ./switch.sh /start
COPY ./port /root
RUN apk add --no-cache openssh bash tcpdump
RUN /usr/bin/ssh-keygen -A
RUN ssh-keygen -t rsa -b 4096 -f  /etc/ssh/ssh_host_key
RUN ["chmod", "+x", "/root/port"]
RUN ["chmod", "+x", "/start/switch.sh"]
ENTRYPOINT ["/start/switch.sh"]

PC.sh implements a workaround as follows:

A workaround for assigning specific network addresses to interfaces whose order is randomized by Docker:

consider “eth0” to be the interface that has the 172.20.0.x address assigned by Docker.

consider “eth1” to be the interface that has the 10.222.x.y address assigned by Docker.

# WORKAROUND FOR UNCERTAIN DOCKER INTERFACE ORDER
eth0=$(ifconfig | grep -B1 "inet addr:172.20.0." | awk '$1!="inet" && $1!="--" {print $1}')
eth1=$(ifconfig | grep -B1 "inet addr:10.222." | awk '$1!="inet" && $1!="--" {print $1}')
# CHANGE IP ADDRESS TO THE PROPER ONE AND MAKE 4th MACHINE TAGGED
NUM=`echo $HOSTNAME | grep -E -o '[1-9]'`
IPADDR=`ifconfig $eth1 | grep 'inet addr' | cut -d: -f2 | awk '{print $1}'`
NEW_IPADDR='10.222.222.1'$NUM'/24'

complete pc.sh - available here

On PC_04 the script enables 802.1Q tagging; PC_01, PC_02 and PC_03 stay untagged:

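if [ $NUM = '4' ]; then
    # pc_04: move the address to a VLAN 200 sub-interface so frames leave tagged
    ip a d $IPADDR dev $eth1
    ip link add link $eth1 name $eth1.200 type vlan id 200
    ip a a $NEW_IPADDR dev $eth1.200
    ip link set $eth1.200 up
else
    # other hosts: replace the Docker-assigned address, frames stay untagged
    ip a d $IPADDR dev $eth1
    ip a a $NEW_IPADDR dev $eth1
fi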

SWITCH.sh

  • Maps the interfaces, whose order is randomized by Docker, to the expected networks.
  • Creates a tagged interface facing PC_04.
  • Uses bridge-utils to create network broadcast domains (VLANs).

complete switch.sh — available here

So, at this step we are good to go and should be ready to start managing the SWITCH.

Framework

MSActivator (noted as MSA above in the first diagram) is an Integrated Automation Platform (IAP) — a powerful framework to create user-friendly, easy, maintainable and scalable solutions. It works with infrastructure automation and I found it the perfect tool for DevOps when I use it in the context of developing infrastructure processes.

  1. First thing — Register the SWITCH:

2. Then we need to think about how to control bridge-utils; microservices will help us a lot:

Here you can see a representation of the following output:

switch:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth3.200@eth3: <BROADCAST,MULTICAST,UP,LOWER_UP100> mtu 1500 qdisc noqueue master vlan_200 state UP qlen 1000
    link/ether 02:42:0a:de:e1:0a brd ff:ff:ff:ff:ff:ff
3: vlan_default: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 02:42:0a:de:de:0a brd ff:ff:ff:ff:ff:ff
4: vlan_100: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 02:42:0a:de:e0:0a brd ff:ff:ff:ff:ff:ff
5: vlan_200: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 02:42:0a:de:e1:0a brd ff:ff:ff:ff:ff:ff
36: eth0@if37: <BROADCAST,MULTICAST,UP,LOWER_UP100,M-DOWN> mtu 1500 qdisc noqueue master vlan_default state UP
    link/ether 02:42:0a:de:de:0a brd ff:ff:ff:ff:ff:ff
    inet 10.222.222.10/24 brd 10.222.222.255 scope global eth0
       valid_lft forever preferred_lft forever
48: eth1@if49: <BROADCAST,MULTICAST,UP,LOWER_UP100,M-DOWN> mtu 1500 qdisc noqueue master vlan_default state UP
    link/ether 02:42:0a:de:df:0a brd ff:ff:ff:ff:ff:ff
    inet 10.222.223.10/24 brd 10.222.223.255 scope global eth1
       valid_lft forever preferred_lft forever
50: eth2@if51: <BROADCAST,MULTICAST,UP,LOWER_UP100,M-DOWN> mtu 1500 qdisc noqueue master vlan_100 state UP
    link/ether 02:42:0a:de:e0:0a brd ff:ff:ff:ff:ff:ff
    inet 10.222.224.10/24 brd 10.222.224.255 scope global eth2
       valid_lft forever preferred_lft forever
54: eth3@if55: <BROADCAST,MULTICAST,UP,LOWER_UP100,M-DOWN> mtu 1500 qdisc noqueue master vlan_default state UP
    link/ether 02:42:0a:de:e1:0a brd ff:ff:ff:ff:ff:ff
    inet 10.222.225.10/24 brd 10.222.225.255 scope global eth3
       valid_lft forever preferred_lft forever
56: eth4@if57: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
    link/ether 02:42:ac:14:00:91 brd ff:ff:ff:ff:ff:ff
    inet 172.20.0.145/24 brd 172.20.0.255 scope global eth4
       valid_lft forever preferred_lft forever

There are three interfaces whose names start with “vlan”. That is the naming convention I have chosen and follow. To retrieve and parse that data, we just need to specify the appropriate command and a regexp. That is all!
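
Outside of MSA, the same "command plus regexp" idea can be sketched in plain shell. This is not the Microservice definition itself, just an illustration; the `list_vlans` function name is hypothetical, and the sample lines are taken from the `ip a` listing above.

```shell
#!/bin/sh
# list_vlans: read `ip a` output on stdin and print "<bridge> <state>"
# for every interface whose name starts with "vlan".
list_vlans() {
    awk '
        /^[0-9]+: vlan[^:@]*:/ {
            name = $2
            sub(/:$/, "", name)              # "vlan_100:" -> "vlan_100"
            state = "UNKNOWN"
            for (i = 3; i < NF; i++)         # the word after "state" is the value
                if ($i == "state") state = $(i + 1)
            print name, state
        }
    '
}

# Sample input; on the switch this would be: ip a | list_vlans
list_vlans <<'EOF'
3: vlan_default: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 02:42:0a:de:de:0a brd ff:ff:ff:ff:ff:ff
4: vlan_100: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 02:42:0a:de:e0:0a brd ff:ff:ff:ff:ff:ff
36: eth0@if37: <BROADCAST,MULTICAST,UP,LOWER_UP100,M-DOWN> mtu 1500 qdisc noqueue master vlan_default state UP
EOF
```

The port line (eth0) is skipped because the pattern is anchored to the interface name, even though "vlan_default" also appears later in that line as the master bridge.
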
According to the CRUD/I model, we can CREATE, DELETE or UPDATE an interface (bridge). Let’s see how it works:

3. CREATE

4. DELETE

5. UPDATE

6. Finally, we can see that it works from the UI

7. For example, change VLAN 100 to the DOWN state

8. Now we can control the processes:

  • create bridge
  • delete bridge
  • enable bridge
  • disable bridge
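
On the CLI level, these four processes could map to commands roughly as in the sketch below. The `bridge_ctl` and `run` helpers and the `DRY_RUN` switch are hypothetical names of mine, and the exact commands used by the Microservice may differ; by default the script only prints what it would run.

```shell
#!/bin/sh
# Sketch: the four bridge processes expressed as brctl / ip commands.
# DRY_RUN=1 (default) prints the commands; DRY_RUN=0 executes them (root needed).
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# bridge_ctl ACTION BRIDGE
bridge_ctl() {
    action="$1"
    bridge="$2"
    case "$action" in
        create)  run brctl addbr "$bridge" && run ip link set "$bridge" up ;;
        # a bridge has to be down before brctl can delete it
        delete)  run ip link set "$bridge" down && run brctl delbr "$bridge" ;;
        enable)  run ip link set "$bridge" up ;;
        disable) run ip link set "$bridge" down ;;
        *)
            echo "usage: bridge_ctl create|delete|enable|disable BRIDGE" >&2
            return 2
            ;;
    esac
}

bridge_ctl create vlan_300
bridge_ctl disable vlan_100
```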

9. Let’s think about how to control host-facing (end-user) network interfaces. I suggest creating one more Microservice; these features should be decoupled so that they can be reused and kept simple.

10. That is how I want to see it

11. And that is how it actually looks

switch:~# brctl show
bridge name     bridge id               STP enabled     interfaces
vlan_200        8000.02420adee10a       no              eth3.200
vlan_100        8000.02420adee00a       no              eth2
vlan_default    8000.02420adede0a       no              eth0
                                                        eth1
                                                        eth3
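
To turn that listing into "bridge, port" pairs, a small awk filter is enough. The sketch below is only an illustration of the parsing step (the `bridge_members` name is hypothetical); it assumes the usual `brctl show` layout where additional ports of a bridge appear on continuation lines with a single field.

```shell
#!/bin/sh
# bridge_members: read `brctl show` output on stdin and print one
# "<bridge> <port>" pair per line.
bridge_members() {
    awk '
        NR == 1 { next }                     # skip the header line
        NF >= 3 {                            # a line that starts a new bridge
            bridge = $1
            if (NF >= 4) print bridge, $4    # first port, if the bridge has any
            next
        }
        NF == 1 { print bridge, $1 }         # continuation line: one more port
    '
}

# Sample input; on the switch this would be: brctl show | bridge_members
bridge_members <<'EOF'
bridge name     bridge id               STP enabled     interfaces
vlan_200        8000.02420adee10a       no              eth3.200
vlan_100        8000.02420adee00a       no              eth2
vlan_default    8000.02420adede0a       no              eth0
                                                        eth1
                                                        eth3
EOF
```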

12. CREATE method: more complicated than in the first Microservice, but still simple and much more flexible because it handles user input exceptions

All you need to do is list the commands as you would type them in the CLI and replace certain values with variables
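
In shell terms, "CLI lines with variables plus input checks" could look like the sketch below. It is not the MSA template syntax; the function names are hypothetical, and it assumes the bridge naming convention vlan_<id> seen above (vlan_100, vlan_200). The function only prints the commands so that they can be reviewed first.

```shell
#!/bin/sh
# valid_vlan_id ID: succeed only for integers in the 802.1Q range 1..4094.
valid_vlan_id() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;             # empty or not purely numeric
    esac
    [ "$1" -ge 1 ] && [ "$1" -le 4094 ]
}

# create_port_cmds PORT VID [untagged|tagged]
# Print the CLI lines that attach PORT to bridge vlan_<VID>.
create_port_cmds() {
    port="$1"
    vid="$2"
    mode="${3:-untagged}"
    if ! valid_vlan_id "$vid"; then
        echo "invalid VLAN id: $vid" >&2
        return 1
    fi
    case "$mode" in
        untagged)
            echo "brctl addif vlan_${vid} ${port}"
            ;;
        tagged)
            # tagged port: bridge a VLAN sub-interface instead of the port itself
            echo "ip link add link ${port} name ${port}.${vid} type vlan id ${vid}"
            echo "ip link set ${port}.${vid} up"
            echo "brctl addif vlan_${vid} ${port}.${vid}"
            ;;
        *)
            echo "invalid mode: $mode" >&2
            return 1
            ;;
    esac
}

create_port_cmds eth3 200 tagged
create_port_cmds eth0 abc || echo "rejected"
```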

13. DELETE

14. UPDATE assumes several options:
switch an interface from one VLAN to another: (untagged > untagged), (untagged > tagged), (tagged > untagged)
The (tagged > tagged) option is handled by the DELETE and/or CREATE methods: you create one more bridge and assign the port to it.
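
The UPDATE transitions can be sketched as a command generator. This is my own illustration, not the Microservice code: `move_port_cmds` and `member` are hypothetical helpers, and a VLAN ID of "-" stands for "untagged". The function prints the commands instead of executing them.

```shell
#!/bin/sh
# member PORT VID: name of the bridge member, PORT itself when untagged ("-"),
# otherwise the VLAN sub-interface PORT.VID.
member() {
    if [ "$2" = "-" ]; then
        echo "$1"
    else
        echo "$1.$2"
    fi
}

# move_port_cmds PORT OLD_BRIDGE OLD_VID NEW_BRIDGE NEW_VID
# Print the commands that move PORT from one bridge to another.
move_port_cmds() {
    port="$1"; old_br="$2"; old_vid="$3"; new_br="$4"; new_vid="$5"
    old_if=$(member "$port" "$old_vid")
    new_if=$(member "$port" "$new_vid")

    echo "brctl delif $old_br $old_if"
    if [ "$old_vid" != "-" ]; then
        echo "ip link del $old_if"           # drop the old tagged sub-interface
    fi
    if [ "$new_vid" != "-" ]; then
        echo "ip link add link $port name $new_if type vlan id $new_vid"
        echo "ip link set $new_if up"
    fi
    echo "brctl addif $new_br $new_if"
}

# untagged > untagged: the port facing pc_01 moves from the default VLAN to VLAN 100
move_port_cmds eth0 vlan_default - vlan_100 -
# tagged > untagged: a port leaves VLAN 200 and joins the default VLAN
move_port_cmds eth3 vlan_200 200 vlan_default -
```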

15. Finally, here is an example from the UI

What else?

There are more useful cases that you may develop such as:

  • Control KVM bridging — with topology view and network configuration
  • Control OVS (Open vSwitch)
  • Control iptables, NAT rules
  • Control Routing
  • Control Queuing (it may significantly improve forwarding performance)
  • And much more: not only network functions, but any single function or a complete service!

Why not try it for yourself? UBiqube offers a FREE TRIAL of MSA. I would appreciate it if you gave me feedback on your experience with the Integrated Automation Platform.
