Why you should use sysbench
Among all the benchmark tools available online, sysbench is one of my favorites. If I had to describe it in two words, they would be simplicity and reliability. At Cloud Mercato we mainly focus on three of its parts:
- CPU: Simple CPU benchmark
- Memory: Memory access benchmark
- OLTP: Collection of OLTP-like database benchmarks
On top of that, sysbench can be considered an agnostic benchmark tool: like wrk, it integrates a LuaJIT interpreter, which lets you plug in the task of your choice and obtain rates, maximums, percentiles and more. But today, let's just focus on the CPU benchmark, aka the prime number search.
Run on the command line
sysbench --threads=$cpu_number --time=$time cpu --cpu-max-prime=$max_prime run
The output of this command is not terrifying:
sysbench 1.1.0 (using bundled LuaJIT 2.1.0-beta3)
Running the test with following options:
Number of threads: 8
Initializing random number generator from current time
Prime numbers limit: 64000
Initializing worker threads...
Threads started!
CPU speed:
events per second: 707.94
Throughput:
events/s (eps): 707.9361
time elapsed: 30.0112s
total number of events: 21246
Latency (ms):
min: 11.25
avg: 11.30
max: 19.08
95th percentile: 11.24
sum: 240041.14
Threads fairness:
events (avg/stddev): 2655.7500/1.79
execution time (avg/stddev): 30.0051/0.00
For the laziest among you, the most interesting value here is "events per second", that is, the number of complete prime searches up to the limit finished per second. Beyond that, the output is quite human-readable and very easy to parse. It contains the basic stats a benchmark requires, both in terms of context and of mathematical aggregations.
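To illustrate how easy the output is to parse, here is a minimal Python sketch. It assumes the labels printed by sysbench 1.1.0 in the run above; the function name `parse_sysbench_cpu` and the dictionary keys are hypothetical names chosen for this example, not part of sysbench.

```python
import re

# Excerpt of the run shown above. The indentation is an assumption;
# the parser does not depend on it.
SAMPLE = """\
CPU speed:
    events per second: 707.94
Throughput:
    events/s (eps): 707.9361
    time elapsed: 30.0112s
    total number of events: 21246
Latency (ms):
    min: 11.25
    avg: 11.30
    max: 19.08
    95th percentile: 11.24
    sum: 240041.14
"""

# Hypothetical key names mapped to the labels sysbench prints.
PATTERNS = {
    "events_per_second": r"events per second:\s*([\d.]+)",
    "time_elapsed_s": r"time elapsed:\s*([\d.]+)s",
    "total_events": r"total number of events:\s*(\d+)",
    "latency_min_ms": r"^\s*min:\s*([\d.]+)",
    "latency_avg_ms": r"^\s*avg:\s*([\d.]+)",
    "latency_max_ms": r"^\s*max:\s*([\d.]+)",
    "latency_p95_ms": r"95th percentile:\s*([\d.]+)",
}


def parse_sysbench_cpu(text):
    """Return the numeric fields found in a sysbench cpu report as floats."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text, flags=re.MULTILINE)
        if match:  # silently skip labels that are absent from the text
            result[name] = float(match.group(1))
    return result


stats = parse_sysbench_cpu(SAMPLE)
print(stats["events_per_second"], stats["total_events"])
```

A quick sanity check on the parsed values: dividing the total number of events by the elapsed time should land very close to the reported events per second.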
But what did I do?
[Image: Never run a command if you don't know what it's supposed to do]
That being said, since sysbench is open source and available on GitHub, a beginner's level in C is enough to understand it. The code sample below is the core of what is timed during the test:
int cpu_execute_event(sb_event_t *r, int thread_id)
{
  unsigned long long c;
  unsigned long long l;
  double t;
  unsigned long long n=0;

  (void)thread_id; /* unused */
  (void)r; /* unused */

  /* So far we're using very simple test prime number tests in 64bit */
  for(c=3; c < max_prime; c++)
  {
    t = sqrt((double)c);
    for(l = 2; l <= t; l++)
      if (c % l == 0)
        break;
    if (l > t )
      n++;
  }

  return 0;
}
Source: GitHub akopytov/sysbench
Basically, translated into human words, the function says: loop over numbers and check whether each one is divisible only by 1 and itself. And yes, the depth of this benchmark is just these 15 lines: simple loops and some arithmetic. For me, this is clearly the strength of this tool: again, simplicity and reliability. sysbench cpu has stood the test of time, and since it hasn't changed in more than 15 years, it allows you to compare old chips like Intel Sandy Bridge with the latest ARM.
It is a well-coded prime-number test, and what I notice is the absence of the complex things developers often set up to achieve their goals, such as thread cooperation, encryption or advanced mathematics. It performs a single task that answers a simple question: "How fast are my CPUs?" RAM involvement is almost nil and processor features generally don't improve performance for this kind of task. For a cloud benchmarker like me, this is essential to remove as much bias as possible. Where complex workloads give me performance data and an idea of the system capacity, sysbench captures a raw strength that I later correlate with those more complex results.
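To make the loop concrete, here is a line-by-line Python transliteration of the C function, written for illustration only. It is far slower than the C original and is not a substitute for the benchmark; the names `cpu_event` and `events_per_second` are hypothetical, and unlike the C code the sketch returns the counter so it can be inspected.

```python
import math
import time


def cpu_event(max_prime):
    """One 'event': count the primes c with 3 <= c < max_prime by trial division."""
    n = 0
    for c in range(3, max_prime):
        t = math.sqrt(c)
        l = 2
        while l <= t:
            if c % l == 0:
                break  # a divisor was found, so c is not prime
            l += 1
        if l > t:  # the loop ran to the end without finding a divisor
            n += 1
    return n


def events_per_second(max_prime, duration=1.0):
    """Repeat events for roughly `duration` seconds on one thread and return the rate."""
    events = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration:
        cpu_event(max_prime)
        events += 1
    return events / (time.perf_counter() - start)


if __name__ == "__main__":
    print(cpu_event(100))  # primes from 3 up to 99
    print(events_per_second(2000, duration=0.5))
```

Note that the loop starts at 3, so the prime 2 is never counted, and that the counter is only there to give the CPU work to do: sysbench reports how many times the whole function completes, not how many primes it finds.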
How I use it
[Chart: T-Systems Open Telekom Cloud s3.8xlarge.1. X axis: Threads. Y axis: Number per second. A vertical marker labeled "vCPU number" sits at 32 threads.]
Threads    Number per second
1          88.5
2          177.0
3          265.5
4          354.0
6          530.6
8          707.8
12         1060.9
16         1414.1
24         2120.7
32         2825.0
48         2826.0
64         2826.0
The only parameter entirely specific to sysbench cpu is --cpu-max-prime, which defines the upper limit of the numbers tested for primality. The higher this value, the longer it takes to find all the prime numbers. Our methodology uses an upper limit of 64000 and scales the thread count from 1 to 2x the number of CPUs present on the machine. It produces data like the chart above, where we can see that OTC's s3.8xlarge.1 scales at 100% up to 32 threads, which is its physical limit, i.e. its number of vCPUs.
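The scaling claim can be checked with a few lines of Python. The sketch below builds the command line for a given thread count from the flags shown earlier and computes a scaling efficiency, defined here as measured throughput divided by the 1-thread throughput times the thread count. The function names, the 30-second default and the efficiency definition are assumptions made for this example; the throughput values are those of the chart above.

```python
# Events per second by thread count, taken from the chart data above
# (T-Systems Open Telekom Cloud s3.8xlarge.1), rounded to 4 decimals.
EVENTS_PER_SECOND = {
    1: 88.5275, 2: 176.9944, 3: 265.4690, 4: 353.9600,
    6: 530.6060, 8: 707.8150, 12: 1060.8560, 16: 1414.0780,
    24: 2120.6511, 32: 2824.9940, 48: 2825.9980, 64: 2826.0350,
}


def sysbench_cpu_command(threads, seconds=30, max_prime=64000):
    """Build the argument list for one run (the 30 s default is an assumption)."""
    return [
        "sysbench", f"--threads={threads}", f"--time={seconds}",
        "cpu", f"--cpu-max-prime={max_prime}", "run",
    ]


def scaling_efficiency(results):
    """Measured throughput relative to perfect linear scaling of the 1-thread run."""
    base = results[1]
    return {n: eps / (n * base) for n, eps in sorted(results.items())}


efficiency = scaling_efficiency(EVENTS_PER_SECOND)
for threads, eff in efficiency.items():
    print(f"{threads:>3} threads: {eff:6.1%}")
```

With these figures the efficiency stays above 99% up to 32 threads and then drops to about half at 64 threads, since the extra threads have no additional vCPU to run on. The list returned by `sysbench_cpu_command` can be passed to `subprocess.run` to launch the actual benchmark.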