Tuesday, July 18, 2023

I wrote my own network bandwidth monitoring tool for Linux

Most existing tools focus on visualizing data. In text mode we usually see tools built on the ncurses library. I previously used Speedometer, for example, to monitor network usage on servers, but it is not ideal for that job.

On a server we need tools that generate log-like output.

$ ifrtstat enp0s3
ifrtstat enp0s3 start at Tue Jul 18 12:21:15 2023
enp0s3[1] Sum rx 5200 bit tx 0 bit Cur rx 5200 bit/s tx 0 bit/s
enp0s3[2] Sum rx 8968 bit tx 0 bit Cur rx 3768 bit/s tx 0 bit/s
enp0s3[3] Sum rx 9448 bit tx 0 bit Cur rx 480 bit/s tx 0 bit/s
enp0s3[4] Sum rx 19456 bit tx 0 bit Cur rx 10008 bit/s tx 0 bit/s

We can redirect this output to a file and review it a few days later.

To get precise information about when a bandwidth problem occurred, we need to save the date with every sample. The options -d (print date) and -m (record max value) help here.
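Once the output is in a file, it is easy to process with a short script. Below is a minimal sketch in Python that parses the default output format and finds the sample with the highest rx rate. The regular expression is inferred only from the sample output above (no -d, -c or -B options), and the function names parse_line and peak_rx are my own, not part of ifrtstat.

```python
import re

# Pattern inferred from the sample output above (default units, no options).
LINE_RE = re.compile(
    r"^(?P<iface>\S+)\[(?P<n>\d+)\] "
    r"Sum rx (?P<sum_rx>\d+) bit tx (?P<sum_tx>\d+) bit "
    r"Cur rx (?P<cur_rx>\d+) bit/s tx (?P<cur_tx>\d+) bit/s"
)


def parse_line(line):
    """Return a dict of parsed fields, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None  # e.g. the "start at ..." header line
    fields = m.groupdict()
    iface = fields.pop("iface")
    parsed = {key: int(value) for key, value in fields.items()}
    parsed["iface"] = iface
    return parsed


def peak_rx(lines):
    """Return the parsed sample with the highest current rx rate, or None."""
    samples = [s for s in map(parse_line, lines) if s is not None]
    return max(samples, key=lambda s: s["cur_rx"], default=None)
```

For example, peak_rx(open("ifrtstat.log")) would return the busiest second recorded in a hypothetical log file named ifrtstat.log.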

$ ifrtstat enp0s3 -dm
Tue Jul 18 12:28:14 2023 enp0s3[1] Sum rx 3288 bit tx 0 bit Cur rx 3288 bit/s tx 0 bit/s Max rx 3288 bit/s
Tue Jul 18 12:28:15 2023 enp0s3[2] Sum rx 5616 bit tx 0 bit Cur rx 2328 bit/s tx 0 bit/s
Tue Jul 18 12:28:16 2023 enp0s3[3] Sum rx 8904 bit tx 0 bit Cur rx 3288 bit/s tx 0 bit/s
Tue Jul 18 12:28:17 2023 enp0s3[4] Sum rx 10856 bit tx 0 bit Cur rx 1952 bit/s tx 0 bit/s
Tue Jul 18 12:28:18 2023 enp0s3[5] Sum rx 18984 bit tx 0 bit Cur rx 8128 bit/s tx 0 bit/s Max rx 8128 bit/s
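In the output above, the Max field appears only on lines where a new maximum is reached. A minimal sketch of that behaviour in Python, assuming the maximum starts at 0 and is updated only by a strictly greater value (this is inferred from the sample output, not taken from the ifrtstat source; the function name is hypothetical):

```python
def annotate_new_max(rates):
    """Pair each rate with True when it sets a new running maximum."""
    current_max = 0  # assumption: a rate of 0 never counts as a new maximum
    for rate in rates:
        is_new_max = rate > current_max
        if is_new_max:
            current_max = rate
        yield rate, is_new_max


# The rx rates from the five sample lines above.
for rate, is_new_max in annotate_new_max([3288, 2328, 3288, 1952, 8128]):
    suffix = f" Max rx {rate} bit/s" if is_new_max else ""
    print(f"Cur rx {rate} bit/s{suffix}")
```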

Instead of option -d we can use option -t (print a counter since start in days/hours/minutes).

By default ifrtstat prints data in 'bps'. Option -B changes it to 'Bps'. With option -c, values are automatically converted to larger units (k, M, G...).

If we want to record only larger values, to avoid generating too much data, we can use option -g.
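The automatic conversion can be sketched in a few lines of Python. This is only an illustration, not the ifrtstat code: I assume decimal prefixes (steps of 1000) and truncation to three decimal places, because that matches the sample output (for example 11363832 bit/s is shown as 11.363 Mb/s). The function name auto_unit is hypothetical.

```python
def auto_unit(bits):
    """Format a non-negative bit count with a decimal prefix (k, M, G, ...)."""
    if bits < 1000:
        return f"{bits} bit"
    divisor = 1
    prefix = ""
    for candidate in ["k", "M", "G", "T", "P", "E"]:
        if bits < divisor * 1000:
            break
        divisor *= 1000
        prefix = candidate
    # Integer arithmetic truncates instead of rounding (assumed behaviour).
    whole, rest = divmod(bits, divisor)
    fraction = rest * 1000 // divisor
    return f"{whole}.{fraction:03d} {prefix}b"


print(auto_unit(84808))     # 84.808 kb
print(auto_unit(11363832))  # 11.363 Mb
```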

$ ifrtstat enp0s3 -dmc -g 1000
Tue Jul 18 13:03:52 2023 enp0s3[15] Sum rx 96.488 kb tx 44.656 kb Cur rx 84.808 kb/s tx 39.408 kb/s Max rx 84808 bit/s Max tx 39408 bit/s
Tue Jul 18 13:03:54 2023 enp0s3[16] Sum rx 274.080 kb tx 91.240 kb Cur rx 177.592 kb/s tx 46.584 kb/s Max rx 177592 bit/s Max tx 46584 bit/s
Tue Jul 18 13:03:55 2023 enp0s3[17] Sum rx 11.637 Mb tx 268.096 kb Cur rx 11.363 Mb/s tx 176.856 kb/s Max rx 11363832 bit/s Max tx 176856 bit/s
Tue Jul 18 13:03:56 2023 enp0s3[18] Sum rx 87.956 Mb tx 553.456 kb Cur rx 76.318 Mb/s tx 285.360 kb/s Max rx 76318720 bit/s Max tx 285360 bit/s

This example prints only values greater than 1000 bps (the default unit).

ifrtstat is available at https://github.com/grzesieklog/ifrtstat
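The idea behind the threshold is simple and can be sketched in Python as below. Note the assumption: I keep a sample when either the rx or the tx rate is above the threshold. How ifrtstat combines the two directions is not described here, so treat that rule and the function name as hypothetical.

```python
def filter_samples(samples, threshold):
    """Yield (index, rx, tx) samples whose rx or tx rate exceeds threshold."""
    for index, rx, tx in samples:
        # Strictly greater than, as in "values greater than 1000 bps".
        if rx > threshold or tx > threshold:
            yield index, rx, tx


samples = [(14, 480, 0), (15, 84808, 39408), (16, 177592, 46584)]
for index, rx, tx in filter_samples(samples, 1000):
    print(f"[{index}] Cur rx {rx} bit/s tx {tx} bit/s")
```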

Download and installation

Install dependencies (GMP and netlink):

root# apt install libgmp3-dev
root# apt install libnl-3-dev libnl-route-3-dev

Download the sources and compile:

root# git clone https://github.com/grzesieklog/ifrtstat.git
root# cd ifrtstat/
root# make
root# cp -v ifrtstat /usr/local/bin/

System network stack performance test

root1# nc -vlp 4567 > /dev/null

root2# dd if=/dev/zero bs=4096 count=500000000 | nc -nv 127.0.0.1 4567

$ ifrtstat lo -ct
...
lo[4m59] Sum rx 0 bit tx 0 bit Cur rx 0 bit/s tx 0 bit/s
lo[5m00] Sum rx 795.401 Mb tx 795.401 Mb Cur rx 795.401 Mb/s tx 795.401 Mb/s
lo[5m01] Sum rx 4.135 Gb tx 4.135 Gb Cur rx 3.339 Gb/s tx 3.339 Gb/s
lo[5m02] Sum rx 7.332 Gb tx 7.332 Gb Cur rx 3.196 Gb/s tx 3.196 Gb/s
lo[5m03] Sum rx 10.506 Gb tx 10.506 Gb Cur rx 3.174 Gb/s tx 3.174 Gb/s
lo[5m04] Sum rx 13.633 Gb tx 13.633 Gb Cur rx 3.126 Gb/s tx 3.126 Gb/s
lo[5m05] Sum rx 16.766 Gb tx 16.766 Gb Cur rx 3.133 Gb/s tx 3.133 Gb/s
lo[5m06] Sum rx 19.634 Gb tx 19.634 Gb Cur rx 2.867 Gb/s tx 2.867 Gb/s
lo[5m07] Sum rx 22.790 Gb tx 22.790 Gb Cur rx 3.156 Gb/s tx 3.156 Gb/s
lo[5m08] Sum rx 26.369 Gb tx 26.369 Gb Cur rx 3.579 Gb/s tx 3.579 Gb/s
lo[5m09] Sum rx 29.786 Gb tx 29.786 Gb Cur rx 3.416 Gb/s tx 3.416 Gb/s

...

What's the deal with counter overflow?

In Linux, as in other operating systems, a network interface counter is an ordinary unsigned 64-bit integer. Its maximum value is 18446744073709551615, which is around 18 exabytes (EB).

When our system sends more data than that through the interface, the counter wraps around and starts again from 0. At that moment we lose the total rx/tx sum counted by the network interface since system startup.

For that reason we use the GMP big number library.
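To show the idea without GMP, here is a minimal sketch in Python, where integers have arbitrary precision and play the role that GMP plays in C. It is not the ifrtstat implementation: the class name is hypothetical, and it assumes the counter wraps at most once between two readings.

```python
COUNTER_MODULUS = 2**64  # a 64-bit counter wraps after 18446744073709551615


class TotalCounter:
    """Accumulate a running total from a 64-bit counter that may wrap."""

    def __init__(self, first_reading):
        self.last = first_reading
        self.total = 0  # Python ints never overflow

    def update(self, reading):
        """Add the delta since the previous reading and return the total."""
        # Modular subtraction gives the right delta across a single wrap.
        delta = (reading - self.last) % COUNTER_MODULUS
        self.last = reading
        self.total += delta
        return self.total


counter = TotalCounter(2**64 - 10)  # counter is 10 below its maximum
print(counter.update(5))            # wrapped past 0: 15 counted in total
```

The running total is kept outside the 64-bit counter, so it keeps growing after the counter itself has started again from 0.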