Scalable Highly-available Resilient UDP Service using IPVS and smart client

The problem

We wanted a service that collects datapoints and passes them to a backend to be queried later. We don’t want to block the senders or slow them down while they wait for their data to be submitted.

We built a version of this service that is exposed over UDP. Because UDP, unlike TCP, has no built-in acknowledgment, the service replies with “1” to tell the client that the datapoint was received successfully.
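
To make the acknowledgment idea concrete, here is a minimal Python sketch of such a service. It is only an illustration under assumptions: the names `serve` and `start_server`, the in-memory `store` list standing in for the backend, and the use of Python itself are not taken from our actual service.

```python
import socket
import threading

ACK = b"1"  # the one-byte acknowledgment described above


def serve(sock: socket.socket, store: list, stop: threading.Event) -> None:
    """Receive datapoints on `sock`, keep them in `store`, acknowledge each one."""
    sock.settimeout(0.1)  # wake up regularly so that `stop` is noticed
    while not stop.is_set():
        try:
            data, addr = sock.recvfrom(65535)
        except socket.timeout:
            continue
        store.append(data)      # hypothetical stand-in for the real backend
        sock.sendto(ACK, addr)  # tell the sender that the datapoint arrived


def start_server(host: str = "127.0.0.1", port: int = 0):
    """Bind a UDP socket (port 0 means any free port) and serve in a thread."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    store, stop = [], threading.Event()
    thread = threading.Thread(target=serve, args=(sock, store, stop), daemon=True)
    thread.start()
    return sock.getsockname(), store, stop
```

A sender only needs to send one datagram and wait briefly for the “1” reply; if nothing comes back, it cannot tell whether the request or the reply was lost, which is why the client-side retry logic matters.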

Although UDP is much more lightweight than TCP-based protocols (including HTTP), it is more difficult to manage.

We have a policy of making every service redundant and resilient, and here lies the problem: how do we make a UDP service resilient?

Server-side resilience

Linux Virtual Server (IPVS)

We run every service on a redundant number of hosts, and on each host we run a redundant number of processes on different ports.

The Linux kernel provides very powerful load balancing for both UDP and TCP through IPVS, which is managed with the “ipvsadm” user-land tool, just as the firewall is managed with “iptables”. If you have used “iptables”, ipvsadm will feel familiar and easy to use.

IPVS provides different scheduling methods such as round-robin, least-connection, source hashing, etc.

yum install ipvsadm
touch /etc/sysconfig/ipvsadm
systemctl start ipvsadm
systemctl enable ipvsadm

Now, assuming our base UDP port is 7000, the IP of the host is $IP, and we want to use round-robin (rr):

ipvsadm -A -u $IP:7000 -s rr
ipvsadm-save -n > /etc/sysconfig/ipvsadm

And if we have two processes, one listening on port 7001 and another on port 7002:

ipvsadm -a -u $IP:7000 -r $IP:7001 -m
ipvsadm -a -u $IP:7000 -r $IP:7002 -m

In the above example I used “masquerading” (-m), which is the only forwarding method that works when the port numbers differ.

Auto-Pilot with Systemd Magic

Using systemd’s @ magic (template units), we can have a service called “myservice@.service” that takes the port number after the @. For example, myservice@7001 would start the service on port 7001.

ExecStart=/path/to/service --port=%i

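We can then make “ExecStartPre=” and “ExecStopPost=” remove that port from the IPVS service, while “ExecStartPost=” adds it. “ExecStopPost=” also runs when the service fails or exits unexpectedly, so no separate failure hook is needed (“FailureAction=” only accepts an action such as reboot, not a command). The complete unit file would look like this:

[Unit]
Description=MyService on UDP port %I

[Service]
# ${IP} must be supplied through Environment= or EnvironmentFile=
ExecStartPre=-/sbin/ipvsadm -d -u ${IP}:7000 -r ${IP}:%i
ExecStart=/path/to/service --port=%i
ExecStartPost=/sbin/ipvsadm -a -u ${IP}:7000 -r ${IP}:%i -m
ExecStopPost=-/sbin/ipvsadm -d -u ${IP}:7000 -r ${IP}:%i

[Install]
WantedBy=multi-user.target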


and now you can use

systemctl start myservice@7001 myservice@7002 myservice@7003
systemctl enable myservice@7001 myservice@7002 myservice@7003

Client-side resilience

We made our client take an array of hosts to use. At first we thought about using them in order: try the first host, and if it does not succeed within 20ms, pick the next one.
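
The in-order strategy can be sketched in Python roughly as follows. This is an assumption-laden illustration rather than our actual client: the function name `send_in_order`, the `(host, port)` tuple format, and the choice of Python are all hypothetical. Only the 20ms timeout and the “1” acknowledgment come from the description above.

```python
import socket


def send_in_order(hosts, payload, timeout=0.020):
    """Try each (host, port) in list order; return the first one that
    acknowledged the payload, or None if none did."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)  # 20ms per attempt, as described above
        for addr in hosts:
            try:
                sock.sendto(payload, addr)
                reply, _ = sock.recvfrom(16)
            except OSError:
                # Covers timeouts and errors such as an unreachable port.
                continue
            if reply == b"1":
                return addr
    return None
```

Note that this sketch does not distinguish a late “1” from a previously tried host from a reply by the current one, and a datapoint may be delivered twice if only the acknowledgment was lost.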

Our second version starts from a random offset. If we have 10 hosts, we pick a random number from 0 to 9 and start trying from that offset, wrapping around modulo 10.
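
A minimal Python sketch of the random-offset variant might look like this. The helper names `rotation` and `send_random_start` and the `(host, port)` tuple format are assumptions made for illustration, not our real client code.

```python
import random
import socket


def rotation(hosts, offset):
    """Return hosts starting at `offset` and wrapping around (index modulo n)."""
    n = len(hosts)
    return [hosts[(offset + i) % n] for i in range(n)]


def send_random_start(hosts, payload, timeout=0.020):
    """Pick a random starting offset, then try every host once in rotated
    order; return the host that acknowledged, or None."""
    if not hosts:
        return None
    offset = random.randrange(len(hosts))  # for 10 hosts: an integer in 0..9
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for addr in rotation(hosts, offset):
            try:
                sock.sendto(payload, addr)
                reply, _ = sock.recvfrom(16)
            except OSError:
                # Timeout or socket error: move on to the next host.
                continue
            if reply == b"1":
                return addr
    return None
```

For example, with five hosts and an offset of 3, `rotation` yields the hosts at indexes 3, 4, 0, 1, 2, so every host is still tried exactly once.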

