Observation: The benefits of disk caching

In several recent discussions, we have debated the benefits of faster storage versus more RAM, the idea being that plenty of RAM can cache disk access and partially mitigate the slow performance of hard drives.

I recently observed something that drives home this point.

As part of my work, I run a Linux VM in which I compile Linux firmware images for an embedded device. The generated file system image is a 1.4GB file (compressed from about 6GB). The VM is configured with 8GB of RAM (half of my Mac’s 16GB), and it uses a fairly low-performance USB-attached 1TB portable HDD (WD Elements) for storage.

If I transfer one of these system images over my LAN to another computer (for archiving and installation) soon after I finish compiling it, the file transfers at about 60-80 MB/s (roughly 480-640 Mbps before protocol overhead, a large share of what my Gigabit Ethernet LAN can carry).

If, however, I shut down and later restart the VM, and then try to transfer the file, it transfers at only about 6-10 MB/s (roughly 48-80 Mbps).
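These rates mix megabytes per second (MB/s) and megabits per second (Mbps). A minimal sketch of the arithmetic, assuming decimal units (1 MB = 10^6 bytes), 8 bits per byte, and no protocol overhead; the constant and function names are illustrative, and the 1.4GB size is the image from this post.

```python
# Convert transfer rates from MB/s to Mbps and estimate how long the
# ~1.4GB image takes at each rate.
# Assumes decimal units (1 MB = 10**6 bytes) and ignores protocol overhead.

IMAGE_SIZE_MB = 1400  # ~1.4GB compressed firmware image
GIGABIT_RAW_MB_S = 1000 / 8  # 125 MB/s: raw Gigabit Ethernet rate before overhead


def mb_per_s_to_mbps(rate_mb_s: float) -> float:
    """Megabytes per second -> megabits per second (8 bits per byte)."""
    return rate_mb_s * 8


def transfer_seconds(size_mb: float, rate_mb_s: float) -> float:
    """Time to move size_mb megabytes at a constant rate_mb_s."""
    return size_mb / rate_mb_s


if __name__ == "__main__":
    observed = [("cached", 60), ("cached", 80), ("uncached", 6), ("uncached", 10)]
    for label, rate in observed:
        share = rate / GIGABIT_RAW_MB_S
        print(
            f"{label:8s} {rate:3d} MB/s = {mb_per_s_to_mbps(rate):4.0f} Mbps "
            f"({share:4.0%} of raw Gigabit), "
            f"1.4GB in ~{transfer_seconds(IMAGE_SIZE_MB, rate):4.0f} s"
        )
```

The common shortcut of multiplying MB/s by 10 to get Mbps is a rough allowance for Ethernet/IP/TCP overhead rather than an exact conversion.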

The difference? When I transfer it soon after it was generated, the file’s content is almost entirely in the Linux VM’s disk cache, so it is being read from RAM. When I transfer it after a reboot, none of it is cached, and the speed is limited to what the HDD can deliver.
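A minimal sketch of how one might reproduce the cached vs. uncached difference without rebooting, assuming a Linux guest and Python 3.8+. It times a sequential read of a file twice (the second read is likely served from the page cache), then asks the kernel to drop that file's cached pages with `os.posix_fadvise(..., os.POSIX_FADV_DONTNEED)` and reads it again. The function names are illustrative, and the advice is only a hint to the kernel. Inside a VM, the host may also cache the virtual-disk file (depending on VirtualBox's "Use Host I/O Cache" setting), so evicting pages in the guest alone may not fully reproduce a cold read.

```python
import os
import sys
import time

CHUNK = 1 << 20  # read in 1 MiB chunks


def read_throughput(path: str) -> float:
    """Read the whole file sequentially and return throughput in MB/s (10**6 bytes)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while chunk := f.read(CHUNK):
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total / 1e6 / max(elapsed, 1e-9)


def evict_from_page_cache(path: str) -> bool:
    """Hint the kernel to drop this file's clean cached pages.

    Returns False where os.posix_fadvise is unavailable.
    Pages not yet written back to disk may stay cached.
    """
    if not hasattr(os, "posix_fadvise"):
        return False
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset 0, length 0 means "from offset 0 to the end of the file"
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return True


if __name__ == "__main__" and len(sys.argv) > 1:
    target = sys.argv[1]  # e.g. the firmware image
    print(f"first read:  {read_throughput(target):8.1f} MB/s")
    print(f"second read: {read_throughput(target):8.1f} MB/s (likely from page cache)")
    if evict_from_page_cache(target):
        print(f"after evict: {read_throughput(target):8.1f} MB/s (likely from disk)")
```

As a system-wide alternative, `echo 3 > /proc/sys/vm/drop_caches` run as root drops the guest's entire page cache, which is closer to the state right after a reboot.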

Nothing surprising here, but an interesting observation nonetheless.


Any modern HDD attached over USB3 to a Mac should see ~50 MB/s. If you see ~6 MB/s performance outside of the VM, there’s a problem with your disk, its connection, or your Mac. More likely, though, this is restricted to the VM and simply reflects poor I/O handling.

Probably a bit of both.

I’m running Ubuntu 18.04 LTS in VirtualBox on my Mac. The VM’s file system is a 750GB virtual disk containing 8GB of swap, with the rest formatted as ext4. The virtual disk resides on an exFAT-formatted USB-attached portable hard drive.

The WD Elements drives are known to be low-performance devices (See: UserBenchmark: WD Elements 1TB). Although it reports 50MB/s for sequential reads, it drops to 1MB/s for random reads of 4K blocks. The random-read case is closer to my reality, given that I’m working with multi-GB files that are frequently created and deleted. So there’s going to be lots of fragmentation, both within the VM’s ext4 file system and in the sparse-disk file that contains the virtual disk (it started pretty small, but has grown to almost 800GB to hold that 750GB virtual disk).
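The gap between those two benchmark figures is large: 1 MB/s of 4K reads is only about 244 reads per second (10^6 / 4096), and reading the 1.4GB image at that rate would take roughly 1,400 s (about 23 minutes), versus about 28 s at 50 MB/s. An observed 6-10 MB/s sits between the two, which is consistent with a mix of sequential runs and seeks. A minimal sketch for measuring both access patterns on a file inside the guest, assuming Python 3 on a Unix-like system (for `os.pread`); the function names, the default block count, and the fixed random seed are illustrative choices, not part of the UserBenchmark test.

```python
from __future__ import annotations

import os
import random
import sys
import time

BLOCK = 4096  # 4K blocks, matching the random-read benchmark figure


def read_blocks_mb_s(path: str, offsets: list[int]) -> float:
    """Read one BLOCK at each offset with pread and return throughput in MB/s."""
    fd = os.open(path, os.O_RDONLY)
    try:
        total = 0
        start = time.perf_counter()
        for offset in offsets:
            total += len(os.pread(fd, BLOCK, offset))
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return total / 1e6 / max(elapsed, 1e-9)


def compare_access_patterns(path: str, count: int = 2000, seed: int = 0) -> tuple[float, float]:
    """Return (sequential MB/s, random MB/s) for `count` 4K reads."""
    blocks = os.path.getsize(path) // BLOCK
    if blocks == 0:
        raise ValueError("file is smaller than one block")
    n = min(count, blocks)
    rng = random.Random(seed)
    # Random offsets are drawn from the whole file. This pass runs first so the
    # sequential pass cannot pre-warm its targets (it may still warm a few
    # of the sequential blocks).
    scattered = [rng.randrange(blocks) * BLOCK for _ in range(n)]
    random_rate = read_blocks_mb_s(path, scattered)
    sequential = [i * BLOCK for i in range(n)]
    sequential_rate = read_blocks_mb_s(path, sequential)
    return sequential_rate, random_rate


if __name__ == "__main__" and len(sys.argv) > 1:
    seq, rnd = compare_access_patterns(sys.argv[1])
    print(f"sequential 4K reads: {seq:8.2f} MB/s")
    print(f"random 4K reads:     {rnd:8.2f} MB/s")
```

Run it on a large file that is not already cached, or both numbers will mostly reflect RAM. On an HDD, sequential reads benefit from the kernel's readahead and from the heads not having to seek, which is why they can be so much faster than scattered ones.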

(Yes, I know I could migrate to an SSD and dramatically speed up everything. But I don’t want to spend the money to buy one at this time. I’m using the HDD because I already had it on hand from an older project.)


Is it possible to configure VirtualBox to put the swap partition on a separate virtual disk hosted on your computer’s internal drive (which could be an SSD)? That could improve performance if the VM actually makes use of its swap space.

It can be configured. But the VM rarely swaps. And the observation I made (much slower file transfer when the file isn’t in cache) happens after a clean reboot, when nothing has been running yet.

That having been said, I really don’t like the idea of swapping to my Mac’s internal SSD. Swapping, if it is used, involves a lot of writing, and I don’t want to shorten the life of that SSD, especially since I can’t replace it myself should that ever become necessary.