Do you warm up your volumes?

Nowadays, most cloud vendors offer several ways to store your data and use it from other services. In the virtual machine realm, it is commonly accepted that block storage brings flexible consumption while local devices ensure low latency. At Cloud Mercato we continuously test and report storage metrics and, beyond the performance announced by providers, we often face biases that all relate to a phenomenon called “volume warming-up”.

We’re not talking about sport?

No, sorry, it isn’t even linked to temperature: your volumes are supposed to sit in cool rooms somewhere with many of their peers. The subject here is the performance of your HDD/SSD when you have just received it. Brand new volumes may suffer from several kinds of phenomena, mainly bound to block allocation. Our team sees this regularly and, in fact, the expression “warm-up” names the solution, not the issue. Here’s what we observe:

  • When you read your volume, you get very high performance. It’s not really disturbing, as a real user won’t read an empty disk. The problem is for testers like us, who risk collecting results that are sometimes too good to be true.
  • When you write, on the contrary, performance is low and a penalty of 50 to 95% can be observed. Here an end user is directly affected. Imagine a fresh database node working at 30% of its capacity: just populating your database will take a while.

Why does it occur?

As you can guess, providers won’t sell ineffective drives. Some vendors state clearly in their documentation whether their volumes suffer from this issue; others let you find out by yourself. In the latter case, we advise you to check that your usage won’t be degraded. As we are in virtualized environments, it’s difficult to give a general description of what’s going on, but the root of this handicap lies in block allocation.

Let’s explain these problems from the point of view of a volume controller, whether a block storage system or a device controller:

  • Read scenario: The OS asks for X blocks in an area of my volume where I have never written. I haven’t even created a registry entry for it and I know this part is empty, so I can quickly answer “zero” whatever you ask me. This explains the high read performance.
  • Write scenario: The OS wants to store X blocks. First I need to allocate space in storage and update my registry. These operations are done automatically the first time you use your volume; they are the overhead, and the reason why you should warm up your devices before using them.

How to resolve this issue

The fix consists of performing the block allocation before real usage: basically, you must write to the entire device and then read it. The intention is to allocate every block of the device with the write pass and make sure they are available with the read pass.

Despite the variety of hypervisors and distributed storage systems, this method works for most affected storage. On Unix platforms, only two lines are required:

# Replace /dev/vdX with your device path. The write pass overwrites the whole device.
dd if=/dev/zero of=/dev/vdX bs=1M  # Write
dd if=/dev/vdX of=/dev/null bs=1M  # Read

One problem remains: the time these operations take. First there is the latency caused by our base problem, and then the elapsed time is proportional to the volume’s size. Do you see it coming? Imagine filling a 3TB disk at 1.5MB/sec: the setup could be highly time-consuming. Another solution is to parallelize jobs, but dd is not made for that. That’s where we use FIO:

# Replace /dev/vdX with your device path. The write pass overwrites the whole device.
fio --filename=/dev/vdX --rw=write --bs=1m --iodepth=32 --ioengine=libaio --numjobs=32 --direct=1 --name=fio  # Write
fio --filename=/dev/vdX --rw=read --bs=1m --iodepth=32 --ioengine=libaio --numjobs=32 --direct=1 --name=fio  # Read
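
One caveat about the commands above: as far as we understand fio’s defaults, every job that receives the same --filename starts at offset 0, so the 32 jobs each walk the whole device rather than sharing it. Here is a minimal sketch, not our production setup, that gives each job its own slice through fio’s --size and --offset_increment options. The helper name warmup_cmd, the 8-job split and the 100 GiB size are illustrative assumptions, and the function only prints the command instead of running it.

```shell
# Print (do not run) a fio command that splits a device into equal slices,
# one per job. warmup_cmd is a hypothetical helper, not part of fio.
# usage: warmup_cmd DEVICE SIZE_BYTES JOBS MODE   (MODE is "write" or "read")
warmup_cmd() {
    dev=$1; bytes=$2; jobs=$3; mode=$4
    mib=$(( bytes / 1048576 ))      # device size in whole MiB
    slice=$(( mib / jobs ))         # MiB per job (a remainder, if any, is skipped)
    echo "fio --name=warmup --filename=$dev --rw=$mode --bs=1m" \
         "--ioengine=libaio --iodepth=32 --direct=1" \
         "--numjobs=$jobs --size=${slice}m --offset_increment=${slice}m" \
         "--group_reporting"
}

# Example for a 100 GiB volume. On a real machine the size could come from
# `blockdev --getsize64 /dev/vdX`.
warmup_cmd /dev/vdX 107374182400 8 write
```

Printing first lets you review the offsets before anything is written; once checked, the same line can be pasted into a shell, first with write and then with read.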

Even with simultaneous operations, warm-up is still a potentially long task. But we can put things in perspective: the penalty only hits fresh blocks without allocation, so this operation only has to be launched once, at server startup. No need to launch it several times or periodically. On the other hand, it is something to take into consideration in infrastructure setup time. For example, take a modern application with a typical xSQL cluster supporting replication, configured with auto-scaling to spin up VMs and set up replicas. If my storage suffers from lazy allocation, I have two options:

  • I take the time to warm up: it could take 1 hour and auto-scaling becomes useless
  • My RDBMS warms up the volume by writing replication data to its storage: the process will be very slow and you’ll have bad performance for any newly allocated block

So there isn’t any quick solution. As written above, we advise you to know precisely where you store your data. Volumes with this drawback are simply not suited to auto-scaling or other scenarios with time constraints.
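
To put a number on the durations discussed in this section, here is a back-of-the-envelope sketch. It assumes a single sequential pass at a constant rate and decimal units (1 GB = 1000 MB); warmup_eta is a hypothetical helper name. For the 3TB at 1.5MB/sec case mentioned earlier, the write pass alone comes out at about 23 days.

```shell
# Estimate how long one full sequential pass over a volume takes.
# usage: warmup_eta SIZE_GB RATE_MB_PER_S   (decimal units: 1 GB = 1000 MB)
warmup_eta() {
    awk -v gb="$1" -v rate="$2" 'BEGIN {
        secs = gb * 1000 / rate
        printf "%.0f s (%.1f h, %.1f days)\n", secs, secs / 3600, secs / 86400
    }'
}

warmup_eta 3000 1.5    # 3 TB at 1.5 MB/s -> 2000000 s (555.6 h, 23.1 days)
```

Real warm-ups speed up as blocks get allocated, so this is an upper bound rather than a prediction.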

Let’s visualize it

Here’s a graph representing writing on a fresh SSD through block storage.

I attached my device at 1:20pm and wrote continuously until reaching maximum performance. My test scenario writes randomly on the SSD, so I’m not sure to warm all the blocks, and that’s the point: a user writing to a file system doesn’t really choose which blocks will be filled. So what can we see?

  • Performance starts really low: 10 IOPS
  • The more I write to the disk, the more its throughput increases
  • After 10 minutes, the maximum is reached and stays stable between 450 and 500 IOPS

The final plateau at 500 IOPS is a low number that reveals throttling set by the cloud provider. If this limit were 5M IOPS, I think we would have a clearer view of this phenomenon. Similarly, the bigger the volume is, the longer it will take to be hot and ready.
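
For readers who want to draw a similar curve on their own volume, here is a hedged sketch of a fio invocation that writes randomly for a fixed duration and logs IOPS once per second. The 4k block size, the queue depth, the 15-minute runtime and the helper name iops_trace_cmd are assumptions for illustration, not the exact parameters behind the graph above. The function only prints the command.

```shell
# Print a fio command that writes randomly for a fixed time and logs IOPS
# averaged over one-second windows. iops_trace_cmd is a hypothetical helper.
# usage: iops_trace_cmd DEVICE RUNTIME_SECONDS
iops_trace_cmd() {
    echo "fio --name=warmup-trace --filename=$1 --rw=randwrite --bs=4k" \
         "--ioengine=libaio --iodepth=32 --direct=1" \
         "--time_based --runtime=$2" \
         "--write_iops_log=warmup-trace --log_avg_msec=1000"
}

iops_trace_cmd /dev/vdX 900   # 15 minutes, enough to cover the ramp-up seen here
```

When run, fio writes a log file prefixed with warmup-trace that can be plotted with the tool of your choice. As with any write test on a raw device, the content of /dev/vdX is overwritten.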


If we place this data in a real infrastructure, the impact could be huge or null; it all depends on the kind of system you run. A classical 3/3 will just require a one-off operation at start-up, but a cloud-native architecture that claims flexibility will suffer either from a slow start or from a setup time due to warming up.


dd is not a benchmarking tool

There is a widely held idea on the Internet that a snippet, once written, is universally valid to test any machine and produce comparable results. Put like that, the assertion is plainly false, but a piece of code that is valid in one context can travel a long way on the web and easily fool a good number of people. Benchmarks with dd are a good example. Which Unix nerd hasn’t tested a brand new device with dd? The command outputs a precise value in MB/sec, what more could you want?

The problem starts with benchmark design

If I quickly open dd’s user manual or, more simply, its help text, I can read:

Copy a file, converting and formatting according to the operands.

If my goal is to benchmark a device, it already appears that this tool is not the most appropriate. First, I don’t aim to copy anything, just to read or write. Next, I don’t want to work with files but with block devices. Finally, I don’t need the advertised data-handling features. These three points are really important, because they show how inappropriate the tool is.

Don’t get me wrong, I’m not denigrating dd. It has personally saved me tons of hours with ISOs or disk migrations. But using it as a standard benchmark tool is more a hack than a reliable idea.

The first issue: The files

A major benchmarking pitfall lies in what I want to test and how I’ll do it. Here, our goal is HDD/SSD performance, and going through a file system can create a big bias in your analysis. Here is the kind of command that can be found on the Internet:

dd bs=1M count=1024 if=/dev/zero of=/root/test

For those not familiar with dd, the above command creates a 1GB file containing only zeros in the root user’s home: /root/test. The authors generally claim the goal is to measure the performance of the device where the file is stored, but that goal is poorly reached. Storage performance is mainly affected by a set of caches and buffers between the user level and the blocks located on the SSD. The file system is the main entry point for users but, as it is software, it can hide the reality of your hardware from you, for better or for worse.

By default, dd writing to a file system is asynchronous, meaning that if the written file is small enough to fit in RAM, the OS won’t write it to the drive immediately and will wait for the most appropriate time to do so. In this configuration, the command’s output will absolutely not represent the storage’s performance and, as only volatile memory is involved, dd displays very good performance.
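
The cache effect described above can be observed with a small experiment. This is a sketch that assumes GNU dd, whose conv=fdatasync option makes dd flush the data to the device before printing its statistics. The scratch file, its 16 MiB size and the helper name dd_rate are illustrative choices. It still goes through the file system, so it demonstrates the bias rather than offering a benchmark.

```shell
# Write 16 MiB to a scratch file twice: once with dd defaults (page cache),
# once with conv=fdatasync so dd flushes to the device before reporting.
# conv=fdatasync is a GNU dd option; the file is kept small on purpose.
scratch=$(mktemp)

dd_rate() {
    # dd prints its statistics on stderr; keep only the summary line
    dd if=/dev/zero of="$scratch" bs=1M count=16 "$@" 2>&1 | tail -n 1
}

echo "buffered: $(dd_rate)"
echo "flushed:  $(dd_rate conv=fdatasync)"
rm -f "$scratch"
```

On most machines the buffered line reports a noticeably higher rate than the flushed one, which is the gap between RAM and actual storage.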

At Cloud Mercato, as we want to reflect infrastructure performance, we bypass the file system as much as possible and test the device directly by its absolute path. So from our benchmarks you know what your hardware can do and can boost it with the file system of your choice. There are only a few cases where files are involved, such as testing a root volume in write mode: you mustn’t write to your root device directly or you’ll erase its OS.

Second issue: A tool without data generation

dd is designed around the concept of copy, which is also reflected in the long name often given to it, “Data Duplicator”. Fortunately, in Unix everything is a file and kernels provide pseudo-files that generate data. They are:

  • /dev/zero
  • /dev/random
  • /dev/urandom

Under the hood, these pseudo-files are real software and suffer for it. /dev/zero is CPU bound and, because it only produces zeros, it cannot represent a real workload. /dev/random is quite slow due to its high randomness and /dev/urandom is too intensive in terms of CPU cycles.

Basically, you may not reach the storage’s maximum performance if you are limited by CPU. Moreover, dd isn’t multi-threaded, so only one thread at a time can stress the device, decreasing your chances of getting the best out of it.
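
A quick way to see this ceiling is to measure the generators alone, with no storage involved, by sending their output to /dev/null. This is a rough sketch: gen_rate is a hypothetical helper name and the 64 MiB sample size is arbitrary. /dev/random is left out because it may block on older kernels.

```shell
# Measure how fast a pseudo-file can produce data, with no storage involved.
# usage: gen_rate SOURCE   (reads 64 MiB from SOURCE and discards it)
gen_rate() {
    # dd prints its statistics on stderr; keep only the summary line
    dd if="$1" of=/dev/null bs=1M count=64 2>&1 | tail -n 1
}

echo "/dev/zero:    $(gen_rate /dev/zero)"
echo "/dev/urandom: $(gen_rate /dev/urandom)"
```

If the rate reported for a generator is lower than what your device is supposed to deliver, a dd test fed by that generator measures the CPU, not the storage.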

Third: A lack of features

As already said, dd is not a benchmarking tool. If you look at the open-source catalog of storage testing tools and their common features, dd, not being intended for this purpose, is out of the competition:

  • Single thread only
  • No optimized data generation
  • No access mode: Sequential or random
  • No deep control such as I/O depth
  • Only average bandwidth, no IOPS, latency or percentiles
  • No mixed patterns: read/write
  • No time control

This shortened list speaks for itself: Data Duplicator doesn’t provide the features needed to qualify as a performance testing tool.

Then the solution

Here are real benchmark tools that you can use:

FIO is really our daily tool (if not hourly). It gives us possibilities unimaginable with dd, like I/O depth or random access. vdbench is also very handy: built on a concept similar to FIO’s, it lets you create complex scenarios such as involving multiple files in read/write access.
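
As an illustration of what dd cannot express, here is a hedged sketch of a fio run combining several items from the feature list above: random access, a mixed read/write pattern, I/O depth, several jobs and a fixed duration. The 70/30 mix, the 4k block size, the other values and the helper name mixed_cmd are arbitrary examples, not a recommended profile. The function only prints the command.

```shell
# Print a fio command for a 70/30 random read/write mix over 60 seconds.
# mixed_cmd is a hypothetical helper; running the printed command on a raw
# device overwrites its content.
# usage: mixed_cmd DEVICE
mixed_cmd() {
    echo "fio --name=mixed --filename=$1 --rw=randrw --rwmixread=70 --bs=4k" \
         "--ioengine=libaio --iodepth=32 --numjobs=4 --direct=1" \
         "--time_based --runtime=60 --group_reporting"
}

mixed_cmd /dev/vdX
```

The report fio prints at the end of such a run includes IOPS, bandwidth and latency percentiles for reads and writes separately, where dd only gives an average bandwidth.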

In conclusion, a benchmark is not just a series of commands run in a shell. The tests executed and the expected output really depend on context: What do you want to test? Which components should be involved? Why would this value represent something? Any snippet taken from the Internet may have value in a certain environment and be misleading in another. It’s up to the tester to understand these factors and choose the appropriate tool for her/his purpose.