Why I love sysbench

Sysbench is an open-source benchmarking tool designed to assess the performance and stability of various system components. Initially developed for MySQL as an OLTP benchmark, it has evolved into a versatile tool capable of evaluating CPU, memory, and file I/O performance. Sysbench operates by executing a set of standardized tests, measuring the system's response under different workloads. It provides metrics such as transactions per second, queries per second, and latency, aiding in the evaluation and comparison of system performance across different hardware configurations or software setups.

While the online transaction processing (OLTP) part of the software is much disputed for modern, large database environments, at Cloud Mercato we consider the RAM tests and, above all, the CPU tests to be relevant tools among many others. As you may know, evaluating CPU performance helps determine how efficiently a system handles complex tasks and processes data, while efficient RAM is vital to prevent bottlenecks and keep everything running smoothly.

It's also well known that the Sysbench tests don't represent real workloads. This is true, and accepting it is a prerequisite for understanding the results we get from them. We aim to capture two main values:

  • How fast can we perform a single arithmetic operation?
  • How fast is memory under high stress?

What are the unrealistic workloads?

Before diving into the features, it's worth noting that since Sysbench is light and open source, it's always a good idea to compile it yourself. This lets you benefit from the latest CPU enhancements, and wherever you are (except PowerZ/PPC, where I failed), you should be able to compile an optimized Sysbench.

With that in mind, it's not hard to see why the CPU test doesn't resemble a modern workload. Here is its source code:

int cpu_execute_event(sb_event_t *r, int thread_id)
{
  unsigned long long c;
  unsigned long long l;
  double t;
  unsigned long long n=0;

  (void)thread_id; /* unused */
  (void)r; /* unused */

  /* So far we're using very simple test prime number tests in 64bit */

  for(c=3; c < max_prime; c++)
  {
    t = sqrt((double)c);
    for(l = 2; l <= t; l++)
      if (c % l == 0)
        break;
    if (l > t )
      n++; 
  }

  return 0;
}

No AI inference, no physics simulation, no compression. Just two nested for loops computing a square root and testing whether a modulo is 0; in other words, a prime number search. There's nothing fancy here that a CPU cannot do at a high rate. For us, this simplicity, coupled with the inertia of the source code, is a major asset. Since most of this code hasn't changed in 14 years, we can trust that performance variation across CPU generations isn't caused by the software itself.
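
To make the loop easier to play with outside of Sysbench, here is a minimal standalone sketch of the same trial-division search. It is not Sysbench's code: the function name `count_primes` is ours, the count is returned instead of being discarded, and we assume `l * l <= c` can stand in for the `sqrt()` comparison so the sketch builds without the math library.

```c
#include <stdint.h>

/* Count the primes in [3, max_prime) by trial division.
 * Assumption: l * l <= c replaces the sqrt() comparison of the original,
 * and max_prime is small enough that l * l cannot overflow 64 bits. */
uint64_t count_primes(uint64_t max_prime)
{
  uint64_t n = 0;

  for (uint64_t c = 3; c < max_prime; c++)
  {
    uint64_t l;
    for (l = 2; l * l <= c; l++)
      if (c % l == 0)
        break;           /* found a divisor: c is composite */
    if (l * l > c)
      n++;               /* no divisor up to sqrt(c): c is prime */
  }

  return n;
}
```

For example, `count_primes(100)` returns 24: the 25 primes below 100 minus 2, which the loop skips by starting at 3.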

Sysbench memory is quite similar to FIO: it's about accessing RAM in different ways, random or sequential, read or write, with a given chunk size.

Here's a piece of code for the memory evaluation:

int event_rnd_read(sb_event_t *req, int tid)
{
  (void) req; /* unused */

  for (ssize_t i = 0; i <= max_offset; i++)
  {
    size_t offset = (size_t) sb_rand_default(0, max_offset);
    size_t val = SIZE_T_LOAD(buffers[tid] + offset);
    (void) val; /* unused */
  }

  return 0;
}

This snippet performs random read operations. Again, nothing exceptional: a for loop and an array access.
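
The same idea can be tried in isolation with a sketch like the one below. Everything in it is an assumption for illustration: `random_read_sum` is a hypothetical name, a small xorshift generator stands in for `sb_rand_default()`, and the loaded values are summed so the result can be checked, whereas Sysbench simply discards them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for sb_rand_default(): a xorshift64 generator. */
static uint64_t xorshift64(uint64_t *state)
{
  uint64_t x = *state;
  x ^= x << 13;
  x ^= x >> 7;
  x ^= x << 17;
  *state = x;
  return x;
}

/* Read `count` words at random indexes of buf and return their sum.
 * `words` is the number of size_t elements in buf and must be > 0. */
uint64_t random_read_sum(const size_t *buf, size_t words, size_t count,
                         uint64_t seed)
{
  uint64_t state = seed ? seed : 1;   /* xorshift must not start at 0 */
  uint64_t sum = 0;

  for (size_t i = 0; i < count; i++)
  {
    size_t idx = (size_t)(xorshift64(&state) % words);
    /* volatile access keeps the compiler from dropping the load */
    sum += *(const volatile size_t *)(buf + idx);
  }

  return sum;
}
```

Filling a buffer with ones and calling `random_read_sum(buf, words, 5000, 42)` should give back 5000, which is a quick way to confirm that every iteration really touched memory.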

That's it? That's a benchmark tool?

In short, yes! Of course, there are a few parameters you need to configure and, more importantly, understand. The CPU test is anything but complicated: cpu-max-prime is the only parameter that matters. It defines the upper limit of the prime number search; the greater this limit, the longer the search takes. At Cloud Mercato we use a constant of 64,000, an arbitrary number deemed neither too small nor too big.

Below is an example of our daily usage of Sysbench CPU:

$ sysbench --threads=16 --time=30 cpu --cpu-max-prime=64000 run

sysbench 1.1.0 (using bundled LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 16
Initializing random number generator from current time


Prime numbers limit: 64000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:  1663.55

Throughput:
    events/s (eps):                      1663.5517
    time elapsed:                        30.0135s
    total number of events:              49929

Latency (ms):
         min:                                    3.50
         avg:                                    9.61
         max:                                   42.58
         95th percentile:                       14.73
         sum:                               479610.00

Threads fairness:
    events (avg/stddev):           3120.5625/90.17
    execution time (avg/stddev):   29.9756/0.02

In this output we mainly focus on "events/s (eps)", which represents our rate.
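
If you collect many of these reports, a small helper can pull the rate out and cross-check it against the other fields. The sketch below is ours, with hypothetical function names, and it assumes the line layout shown in the output above: the rate is the total number of events divided by the elapsed time, and the average latency is the latency sum divided by the number of events.

```c
#include <stdio.h>

/* Extract the rate from an "events/s (eps):" line of the report.
 * Returns 1 on success, 0 if the line does not match. */
int parse_eps(const char *line, double *eps)
{
  return sscanf(line, " events/s (eps): %lf", eps) == 1;
}

/* Rate recomputed from "total number of events" and "time elapsed". */
double events_per_second(unsigned long long events, double elapsed_s)
{
  return (double)events / elapsed_s;
}

/* Average latency recomputed from the latency "sum" field. */
double avg_latency_ms(double latency_sum_ms, unsigned long long events)
{
  return latency_sum_ms / (double)events;
}
```

With the figures above, 49929 events in 30.0135 s gives about 1663.55 events per second, and 479610.00 ms over 49929 events gives about 9.61 ms, matching the report.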

And memory tests

Our goal in testing RAM is not to find the maximum bandwidth the system offers but to approximate how a real program uses memory, by randomly accessing the locations where variables, libraries and buffers reside.

Unlike the CPU test, memory has plenty of options:

  • Memory Block Size (--memory-block-size): Size of each chunk transferred from/to RAM
  • Total Memory Size (--memory-total-size): Total size of the transfer in bytes
  • Memory Max Operations (--memory-max-requests): Total number of read/write operations
  • Memory Scope (--memory-scope): Global to test the whole memory or local for just a portion
  • Memory Threads (--threads, formerly --num-threads): Number of simultaneous operations
  • Memory Access Mode (--memory-access-mode): Determines whether chunks are read/written randomly or sequentially
  • Memory Operations (--memory-oper): Read or write operations
  • Memory Lock (--memory-lock): Locks memory to prevent swapping, which can affect performance and OS behavior

In our standard test suite we run this test with both operations, read then write, with a number of threads equal to the number of CPUs, then twice that number. The default chunk size (1K, as the output below shows) is fine; we disable swap ourselves and let the OS manage memory as usual.
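
As a rough illustration of that matrix, the sketch below generates the four command lines: read and write, at one and two threads per CPU. It is not our actual tooling. The function names are hypothetical, the 30-second duration is taken from the example that follows, and the CPU count is left to the caller (on POSIX systems, `sysconf(_SC_NPROCESSORS_ONLN)` is one way to obtain it).

```c
#include <stdio.h>
#include <string.h>

/* Write one memory-test command line into buf.
 * Returns the snprintf() result, or -1 if oper is not "read" or "write". */
int format_memory_cmd(char *buf, size_t len, const char *oper, long threads)
{
  if (strcmp(oper, "read") != 0 && strcmp(oper, "write") != 0)
    return -1;
  return snprintf(buf, len,
                  "sysbench --threads=%ld --time=30 memory --memory-oper=%s run",
                  threads, oper);
}

/* Print the four runs: read then write, at 1x then 2x the CPU count. */
void print_memory_matrix(long cpus)
{
  const char *opers[] = { "read", "write" };
  char cmd[128];

  for (long m = 1; m <= 2; m++)
    for (size_t i = 0; i < 2; i++)
      if (format_memory_cmd(cmd, sizeof cmd, opers[i], cpus * m) > 0)
        puts(cmd);
}
```

Calling `print_memory_matrix(16)` prints two commands with `--threads=16` followed by two with `--threads=32`, each pair covering read then write.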

$ sysbench --threads=16 --time=30 memory --memory-oper=write run

sysbench 1.1.0 (using bundled LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 16
Initializing random number generator from current time


Running memory speed test with the following options:
  block size: 1KiB
  total size: 102400MiB
  operation: write
  scope: global

Initializing worker threads...

Threads started!

Total operations: 104857600 (14183521.78 per second)

102400.00 MiB transferred (13851.10 MiB/sec)


Throughput:
    events/s (eps):                      14183521.7843
    time elapsed:                        7.3929s
    total number of events:              104857600

Latency (ms):
         min:                                    0.00
         avg:                                    0.00
         max:                                   16.01
         95th percentile:                        0.00
         sum:                                94904.04

Threads fairness:
    events (avg/stddev):           6553600.0000/0.00
    execution time (avg/stddev):   5.9315/0.04

In the output above, you can see a transfer rate of 13,851.10 MiB/sec, but this value alone doesn't mean much. As with any benchmark, multiple runs will confirm it or reveal variation.
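
Two small calculations help when reading these numbers; the sketch below shows them with hypothetical helper names. The first converts the operation rate into MiB/s from the block size, which reproduces the reported figure. The second is one simple way, among others, to quantify run-to-run variation: the gap between the best and worst run, relative to the mean.

```c
#include <stddef.h>

/* Convert an operation rate into MiB/s for a block size given in bytes. */
double mib_per_sec(double ops_per_sec, double block_bytes)
{
  return ops_per_sec * block_bytes / (1024.0 * 1024.0);
}

/* Relative spread of several runs: (max - min) / mean.
 * Returns 0 for an empty array or a zero mean. */
double relative_spread(const double *runs, size_t n)
{
  if (n == 0)
    return 0.0;

  double min = runs[0], max = runs[0], sum = 0.0;
  for (size_t i = 0; i < n; i++)
  {
    if (runs[i] < min) min = runs[i];
    if (runs[i] > max) max = runs[i];
    sum += runs[i];
  }

  double mean = sum / (double)n;
  return mean != 0.0 ? (max - min) / mean : 0.0;
}
```

With 14,183,521.78 operations per second and 1 KiB blocks, `mib_per_sec` gives about 13,851.10 MiB/s, the value in the report. Feeding the rates of several runs to `relative_spread` then tells you how much to trust any single one of them.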

Yep, we use it intensively

I don't think there's a study where we don't use Sysbench. For instance, in our latest evaluation of the AWS r7iz, Sysbench CPU gives results that correlate well with those of Geekbench. Check out these other studies using Sysbench: