Why you should run a 64 bit OS on your Raspberry Pi4

One of the cool things about working for a software company is that you very often get new hardware prototypes to test.
But this is not one of those times: I bought the Rpi4 because it’s extremely cheap!

The Rpi4 comes with a quad core ARM Cortex-A72, up to 4 GB of RAM and a gigabit Ethernet port, at a very low price starting from $35.
Raspberry provides Raspbian (a Debian derivative), a ready-made distro for its boards, so I flashed it on an SD card to boot the board quickly.
While looking at the syslog I noticed that, uh, both the kernel and the whole userland are compiled for armv7, which means 32 bit ARM.
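If you want to check this on your own board, it's worth looking at both the kernel architecture and the bitness of the running userland. Below is a minimal Python sketch (describe_arch is just an illustrative name): platform.machine() returns the kernel's machine string (e.g. armv7l or aarch64), while the pointer size of the Python interpreter tells whether that binary, and most likely the rest of the userland, is 32 or 64 bit.

```python
import platform
import struct


def describe_arch() -> dict:
    """Return the kernel machine string and the bitness of this userland process."""
    # platform.machine() comes from uname(): e.g. "armv7l" on a 32 bit ARM
    # kernel, "aarch64" on a 64 bit one. A 32 bit process on a 64 bit kernel
    # may still see "aarch64" (or "armv8l"), so this alone is not conclusive.
    kernel = platform.machine()
    # The size of a C pointer in this interpreter shows whether this binary,
    # and most likely the userland it belongs to, is 32 or 64 bit.
    userland_bits = struct.calcsize("P") * 8
    return {"kernel": kernel, "userland_bits": userland_bits}


if __name__ == "__main__":
    info = describe_arch()
    print(f"kernel: {info['kernel']}, userland: {info['userland_bits']} bit")
```

When a 32 bit userland runs on top of a 64 bit kernel the two values can disagree, which is why checking both is useful.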

I know for sure that the RPi4 is 64 bit capable, so I refused to run a 32 bit OS on it. I grabbed another SD card and installed Debian on it: a lean and mean Debian compiled for aarch64, which means 64 bit ARM.
As soon as the 64 bit OS booted, I was curious to know how much better it performs than the 32 bit one, so I ran some tests.

EDIT: by popular demand, I’m publishing the Debian image.

The two partitions (boot and root) are compressed in a .tar.xz file, and there is a convenient script, mksd, which partitions an SD card and extracts them onto it.

I’ve kept it simple: it’s a very minimal distribution, so you have to install your preferred tools by hand.
The kernel is not the vanilla one I used in the tests but the stable 4.19 from Raspberry, because it supports a whole range of devices that my build doesn’t.

The system is configured to get an IP via DHCP on the Ethernet interface. Log in via SSH with credentials user/user, then become root with sudo -i.

I’ve put the whole thing in a zip archive here:

http://teknoraver.net/software/rpi4_64bit.zip

Feedback is welcome.

EDIT2:

Raspberry just started selling the Raspberry Pi4 with 8 GB of RAM.
As you can imagine, this is another good reason to use a 64 bit kernel, otherwise the usable memory will be limited to a mere 3 GB.
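For the record, the 3 GB figure matches the per-process limit of a 32 bit system: a 32 bit process can address at most 2^32 bytes (4 GiB), and with the common 3G/1G user/kernel split a quarter of that is reserved to the kernel. A tiny Python sketch of the arithmetic, assuming that split (the function name and parameters are illustrative):

```python
def user_address_space_gib(address_bits: int = 32, user_fraction: float = 0.75) -> float:
    """Virtual address space left to a user process, in GiB.

    Assumes a fixed user/kernel split of the virtual address space, like the
    common 3G/1G split (user_fraction=0.75) of 32 bit Linux kernels.
    """
    total_bytes = 2 ** address_bits           # 2^32 bytes = 4 GiB for 32 bit addresses
    user_bytes = total_bytes * user_fraction  # the part not reserved to the kernel
    return user_bytes / 2 ** 30


if __name__ == "__main__":
    print(user_address_space_gib())  # prints 3.0
```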

Synthetic benchmarks

Dhrystone is a synthetic benchmark written in 1988, which performs integer arithmetic and string operations.
It’s unlikely to represent any modern workload; the only reason we still use it is to get some consistency when comparing past architectures and software.
A modern number-crunching workload could be a hash calculation, so I wanted to do a SHA1 test. Unfortunately the Debian sha1sum utility was compiled without libssl or kernel crypto support, so I had to compile it from source.
To avoid an I/O bottleneck, I calculated the hash of a 2 GB sparse file created with truncate -s 2GB, so there was no I/O at all from the SD card.
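The sparse file trick can be sketched in a few lines of Python; this is an illustration of the idea, not the tool used for the benchmark. os.truncate() plays the role of truncate -s, extending the file without writing any data, and the content is fed to hashlib.sha1 in chunks. The helper name and the 1 MiB chunk size are arbitrary choices.

```python
import hashlib
import os
import tempfile


def sha1_of_sparse_file(size: int, chunk_size: int = 1 << 20) -> str:
    """Create a sparse file of `size` bytes and return its SHA1 hex digest."""
    fd, path = tempfile.mkstemp()
    try:
        os.close(fd)
        # Like `truncate -s`: the file grows without writing data, so on
        # filesystems supporting holes no data blocks are allocated and
        # reading them back just returns zeros.
        os.truncate(path, size)
        h = hashlib.sha1()
        with open(path, "rb") as f:
            # Hash in fixed-size chunks so memory use stays constant.
            while chunk := f.read(chunk_size):
                h.update(chunk)
        return h.hexdigest()
    finally:
        os.remove(path)
```

For example, sha1_of_sparse_file(2 * 1000**3) mirrors the 2 GB file above, since coreutils reads the GB suffix as 1000^3 bytes.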

SHA1 hashing is a more realistic benchmark than Dhrystone, as this algorithm is used by a lot of applications, e.g. BitTorrent, git, etc.
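To give an idea of how git uses it: git names every file version by the SHA1 of a short header (the word "blob", the size in bytes and a NUL byte) followed by the content. A minimal Python sketch, assuming a repository in the classic SHA1 object format (newer git versions can also use SHA-256); git_blob_sha1 is a hypothetical helper name.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Object ID that git (SHA1 object format) assigns to a file's content."""
    # Header "blob <size>" plus a NUL byte, followed by the raw bytes;
    # this should match the output of `git hash-object <file>`.
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    print(git_blob_sha1(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```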

RAM

Audio encoding

Networking benchmarks

Firewalling

Although neither system was able to reach the line rate (about 1.5 Mpps with minimum-size packets), the 64 bit kernel scored a bit higher than the 32 bit one. If you want to use the Rpi4 as a firewall, a 64 bit kernel is definitely a must-have.
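The 1.5 Mpps figure is the gigabit line rate with minimum-size frames: besides its 64 bytes, each frame occupies 8 bytes of preamble and start-of-frame delimiter plus a 12 byte inter-frame gap, so it takes 84 bytes (672 bits) of wire time. A small Python sketch of the arithmetic (the function name is illustrative):

```python
def line_rate_pps(link_bps: float, frame_bytes: int) -> float:
    """Maximum Ethernet frames per second for a given frame size.

    frame_bytes includes the Ethernet header and FCS (64 is the minimum);
    each frame also costs 8 bytes of preamble/SFD and a 12 byte inter-frame gap.
    """
    wire_bytes = frame_bytes + 8 + 12
    return link_bps / (wire_bytes * 8)


if __name__ == "__main__":
    print(round(line_rate_pps(1e9, 64)))    # 1488095, the ~1.5 Mpps line rate
    print(round(line_rate_pps(1e9, 1518)))  # 81274 with full-size frames
```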

VPN

As expected, OpenVPN is 10x slower than WireGuard. A less expected result is that OpenVPN performs the same in both 32 and 64 bit mode.
WireGuard, instead, almost saturates the gigabit port in both versions; since we get the same results with both kernels, we have probably hit the NIC limit.
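As a rough sanity check of the NIC limit hypothesis, here is a back-of-the-envelope Python sketch of the maximum TCP payload throughput on gigabit Ethernet, with and without WireGuard encapsulation. It assumes IPv4 everywhere, no TCP options (Linux normally adds 12 bytes of timestamps, lowering the figures a bit), a tunnel MTU of 1420 (what wg-quick typically picks on a 1500 byte link), and it ignores WireGuard's plaintext padding; the constant and function names are made up for this example.

```python
# Assumptions: IPv4 inner and outer headers, no IP/TCP options, no padding.
ETH_OVERHEAD = 14 + 4 + 8 + 12  # header + FCS + preamble/SFD + inter-frame gap
IPV4, UDP, TCP = 20, 8, 20      # header sizes in bytes
WG_OVERHEAD = 16 + 16           # WireGuard data message header + Poly1305 tag


def tcp_goodput_mbps(link_mbps: float, mtu: int, encap: int = 0) -> float:
    """Upper bound on TCP payload throughput over Ethernet, in Mbit/s.

    mtu is the MTU seen by the inner TCP/IP stack; encap is any extra
    per-packet encapsulation added on top of it (e.g. IPv4 + UDP + WireGuard).
    """
    payload = mtu - IPV4 - TCP
    wire = mtu + encap + ETH_OVERHEAD
    return link_mbps * payload / wire


if __name__ == "__main__":
    plain = tcp_goodput_mbps(1000, 1500)
    wg = tcp_goodput_mbps(1000, 1420, encap=IPV4 + UDP + WG_OVERHEAD)
    # prints: plain TCP: 949 Mbit/s, over WireGuard: 909 Mbit/s
    print(f"plain TCP: {plain:.0f} Mbit/s, over WireGuard: {wg:.0f} Mbit/s")
```

Both numbers are theoretical ceilings, so a WireGuard tunnel getting close to ~900 Mbit/s on this link would be limited by the wire rather than by the CPU.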
To check whether WireGuard could go even faster, I ran another VPN test between two containers, bypassing the physical Ethernet port.
The only drawback of this container test is that both the iperf3 client and server were running on the Rpi4, keeping two cores busy.

As expected, OpenVPN and 32 bit WireGuard, which were CPU limited, performed worse, while 64 bit WireGuard performed better.

Conclusions

proud free software supporter, working @ Microsoft