Do you warm up your volumes?

Nowadays, most cloud vendors offer several ways to store your data and use it from other services. In the virtual machine realm, it is commonly accepted that block storage offers flexible consumption while local devices ensure low latency. At Cloud Mercato we continuously test and report storage metrics and, beyond the performance announced by providers, we often run into biases that all relate to a phenomenon called "volume warming-up".

We're not talking about sport?

No, sorry. It isn't even linked to temperature: your volumes are supposed to sit in cool rooms somewhere alongside many of their peers. The subject here is the performance of your HDD/SSD when you have just received it. Brand-new volumes may suffer from several kinds of phenomena, mainly bound to block allocation. Our team sees this regularly and, in fact, the expression "warm-up" names the solution, not the issue. Here's what we observe:

When you read from your volume, performance is very high. This is not really a problem for a real user, who won't read an empty disk. It is a problem for testers like us, who risk collecting results that are too good to be true.
When you write, on the contrary, performance is low and a penalty of 50 to 95% can be observed. Here the end user is directly affected. Imagine a fresh database node working at 30% of its capacity: just populating your database will take a while.

Why does it occur?

As you might guess, providers don't set out to sell underperforming drives. Some vendors clearly state in their documentation whether their volumes suffer from this issue; others let you find out by yourself. In the latter case, we advise you to check that your usage won't be degraded. As we are in virtualized environments, it's difficult to give a general description of what's going on, but the root of this handicap lies in block allocation. Let's explain these problems from the point of view of a volume controller, such as a block storage system or a device controller:

Read scenario: The OS asks for X blocks in an area of my volume where I have never written. I haven't even created a registry entry for it and I know this part is empty, so I can quickly answer "zero" whatever you ask me. Hence the high read performance.
Write scenario: The OS wants to store X blocks. First I need to allocate space in storage and update my registry. These operations happen automatically the first time you use your volume; they are the overhead, and the reason why you should warm up your devices before using them.
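
The two scenarios above can be sketched as a toy model. This is only an illustration of the idea, not how any particular provider implements its storage: the `LazyVolume` class, its `registry` dictionary and its `allocations` counter are names we made up for the example.

```python
class LazyVolume:
    """Toy model of a lazily allocated volume (hypothetical, illustration only)."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.registry = {}     # block number -> stored bytes
        self.allocations = 0   # how many times the slow allocation path ran

    def read(self, block_no):
        # Read scenario: an unallocated block has nothing to fetch,
        # so the controller answers with zeros right away.
        if block_no not in self.registry:
            return bytes(self.block_size)
        return self.registry[block_no]

    def write(self, block_no, data):
        # Write scenario: the first write to a block pays for the
        # allocation and the registry update; later writes do not.
        if block_no not in self.registry:
            self.allocations += 1
        self.registry[block_no] = bytes(data)


vol = LazyVolume()
print(vol.read(7) == bytes(4096))  # fresh block: zeros, no allocation needed
vol.write(7, b"x" * 4096)          # first write: allocation happens
vol.write(7, b"y" * 4096)          # rewrite: block is already allocated
print(vol.allocations)             # the allocation path ran only once
```

Warming up a volume amounts to forcing that first, expensive write on every block before production traffic arrives.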

How to resolve this issue

The fix consists of performing the block allocation before real usage: basically, you must write to the entire device and then read it back. The intention is to allocate every block of the device with the write and to make sure they are available with the read. Despite the variety of hypervisors and distributed storage systems, this method works for most affected storage. Keep in mind that the write pass overwrites everything on the device, so run it only on a new, empty volume. On Unix platforms, only 2 lines are required:

# Replace /dev/vdX with your device path
dd if=/dev/zero of=/dev/vdX bs=1M  # Write
dd if=/dev/vdX of=/dev/null bs=1M  # Read

One problem remains: the time these operations take. First there is the latency caused by our underlying problem, and then the elapsed time is proportional to the volume's size. Do you see it coming? Imagine filling a 3TB disk at 1.5MB/sec: a single write pass would take roughly 23 days, so the setup could be highly time-consuming. Another solution is to parallelize the jobs, but dd is not made for that. That's where we use FIO:

# Replace /dev/vdX with your device path
fio --filename=/dev/vdX --rw=write --bs=1m --iodepth=32 --ioengine=libaio --numjobs=32 --direct=1 --name=fio  # Write
fio --filename=/dev/vdX --rw=read --bs=1m --iodepth=32 --ioengine=libaio --numjobs=32 --direct=1 --name=fio  # Read
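
One point worth checking on your side: as far as we understand fio's `numjobs` option, each cloned job works on the same range of the file unless told otherwise, so 32 jobs may each sweep the whole device. A possible refinement, sketched below under that assumption, is to give every job its own disjoint slice through fio's `--offset` and `--size` options. The helper names `plan_slices` and `fio_commands` are ours, the device size has to be supplied by you (for example from `blockdev --getsize64`), and the script only prints command lines without running anything.

```python
MIB = 1024 * 1024


def plan_slices(device_bytes, jobs, block=MIB):
    """Split a device into at most `jobs` disjoint (offset, size) slices.

    Every offset is a multiple of `block`; the last slice takes the remainder.
    """
    blocks_per_job = -(-device_bytes // (jobs * block))  # ceiling division
    per_job = blocks_per_job * block
    slices = []
    for i in range(jobs):
        offset = i * per_job
        if offset >= device_bytes:
            break
        slices.append((offset, min(per_job, device_bytes - offset)))
    return slices


def fio_commands(device, device_bytes, jobs, rw="write"):
    """Build one fio argument list per slice; nothing is executed here."""
    return [
        ["fio", f"--filename={device}", f"--rw={rw}", "--bs=1m",
         "--iodepth=32", "--ioengine=libaio", "--direct=1",
         f"--offset={offset}", f"--size={size}", f"--name=warm{i}"]
        for i, (offset, size) in enumerate(plan_slices(device_bytes, jobs))
    ]


# Example with a tiny 10 MiB "device" split across 4 jobs.
for cmd in fio_commands("/dev/vdX", 10 * MIB, 4):
    print(" ".join(cmd))
```

The printed commands are meant to be launched in parallel, first with `rw="write"` and then with `rw="read"`. We have not benchmarked this variant against the single-command form above, so treat it as a starting point.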

Even with simultaneous operations, warming up is still a potentially long task. But we can put things in perspective: the penalty only affects fresh, unallocated blocks, so the operation only needs to run once, at server startup. There is no need to run it several times or periodically. On the other hand, it is something to take into account in infrastructure setup time. Take, for example, a modern application with an ordinary xSQL cluster supporting replication, configured with auto-scaling to spin up VMs and set up replicas. If my storage suffers from lazy allocation, I have two options:

I take the time to warm up: it could take 1 hour, and auto-scaling becomes useless
My RDBMS warms up the volume by writing replicated data to its storage: the process will be very slow and you'll get bad performance for every newly allocated block

So there isn't any quick solution. As written above, we advise you to know exactly where you store your data. Volumes with this disadvantage are simply not suited to auto-scaling or other scenarios with time constraints.
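
To judge whether a warm-up fits in your setup time, a back-of-the-envelope estimate is usually enough. The sketch below reuses the 3TB at 1.5MB/sec example from earlier, with decimal units and a single write pass (the read pass comes on top). The function name `warmup_seconds` is ours.

```python
TB = 10**12  # decimal terabyte, in bytes
MB = 10**6   # decimal megabyte, in bytes


def warmup_seconds(volume_bytes, bytes_per_second):
    """Time for one sequential pass over the whole volume."""
    return volume_bytes / bytes_per_second


# The example from the text: 3TB written at 1.5MB/sec.
seconds = warmup_seconds(3 * TB, 1.5 * MB)
print(f"{seconds / 86400:.1f} days")  # 23.1 days

# Conversely: throughput needed to finish the same pass within one hour.
needed_mb_per_s = 3 * TB / 3600 / MB
print(f"{needed_mb_per_s:.0f} MB/s")  # 833 MB/s
```

Comparing this figure with how quickly your auto-scaling is expected to deliver a usable node tells you which of the two options above is realistic.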

Let's visualize it

Here's a graph representing writes on a fresh SSD through block storage.

[Chart: "FIO write IOPS", 4KB blocks, random. X axis: time; Y axis: IOPS. The "Write" series starts at 14 IOPS, stays under 100 IOPS for the first 7 to 8 minutes, reaches 425 IOPS after about 9.5 minutes, then remains between roughly 440 and 480 IOPS until the end of the 17-minute run.]

I attached my device at 1:20pm and wrote continuously until reaching maximum performance. My test scenario writes randomly to the SSD, so I can't be sure to warm all the blocks, and that's the point: a user writing to a filesystem doesn't really choose which blocks will be filled. So what can we see?

Performance starts really low: between 10 and 20 IOPS
The more I write to the disk, the more its throughput increases
After about 10 minutes, the maximum is reached and stays stable between roughly 440 and 480 IOPS
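
These observations can be checked against the plotted samples. The sketch below takes the (timestamp, IOPS) points of the chart and looks for the first sample that reaches 90% of the steady-state level, where steady state is taken as the median of the last 10 samples. Both the 90% threshold and the 10-sample tail are arbitrary choices of ours.

```python
from statistics import median

# (timestamp in milliseconds, write IOPS), as plotted in the chart
samples = [
    (1576156850000, 14), (1576156878000, 18), (1576156984000, 35),
    (1576157100000, 73), (1576157183000, 66), (1576157303000, 81),
    (1576157426000, 425), (1576157451000, 446), (1576157475000, 465),
    (1576157498000, 456), (1576157523000, 481), (1576157545000, 472),
    (1576157568000, 456), (1576157593000, 465), (1576157616000, 453),
    (1576157639000, 442), (1576157663000, 460), (1576157686000, 456),
    (1576157709000, 476), (1576157732000, 443), (1576157756000, 466),
    (1576157778000, 459), (1576157801000, 452), (1576157825000, 466),
    (1576157848000, 464), (1576157870000, 437),
]


def seconds_to_plateau(samples, tail=10, threshold=0.9):
    """Return (seconds until first sample >= threshold * steady, steady level)."""
    steady = median(iops for _, iops in samples[-tail:])
    start_ms = samples[0][0]
    for ts_ms, iops in samples:
        if iops >= threshold * steady:
            return (ts_ms - start_ms) / 1000, steady
    return None, steady


elapsed, steady = seconds_to_plateau(samples)
print(f"steady state: {steady} IOPS, reached after {elapsed / 60:.1f} minutes")
```

With these samples the threshold is crossed 576 seconds after the first measurement, which matches the "about 10 minutes" read from the graph.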

The final plateau, just under 500 IOPS, is a low number that reveals throttling set by the cloud provider. If this limit were 5M IOPS, I think we would have a clearer view of the phenomenon. Similarly, the bigger the volume, the longer it will take to be hot and ready.

Conclusion

If we place this data in a real infrastructure, the impact could be huge or nil; it all depends on the kind of system you run. A classical 3/3 will just require a one-time operation at start-up, but a cloud-native architecture that claims flexibility will suffer either from a slow start or from the setup time due to warming up.