Sysbench

From Gentoo Wiki
Jump to: navigation, search
Resources

sysbench provides benchmarking capabilities for Linux. It supports testing CPU, memory, File I/O, mutex performance and even MySQL benchmarking.

Installation

USE flags

USE flags for app-benchmarks/sysbench System performance benchmark

aio Enable libaio support local
lua Enable Lua scripting support global
mysql Add mySQL Database support global
postgres Add support for the postgresql database global
test Workaround to pull in packages needed to run with FEATURES=test. Portage-2.1.2 handles this internally, so don't set it in make.conf/package.use anymore global

Emerge

Install app-benchmarks/sysbench:

root #emerge --ask sysbench

Usage

As mentioned, sysbench supports several benchmark workloads: fileio, cpu, memory, threads, mutex, oltp.

Using the fileio workload

When using fileio, you will need to create a set of test files to work on. It is recommended that the size is larger than the available memory to ensure that file caching does not influence the workload too much.

user $sysbench --test=fileio --file-total-size=128G prepare
user $sysbench --test=fileio --file-total-size=128G --file-test-mode=rndrw --max-time=300 --max-requests=0 run
user $sysbench --test=fileio --file-total-size=128G cleanup

As this is I/O benchmarking, you can tell sysbench which kind of workload you want to run: sequential reads, writes or random reads, writes, or a combination. In the above example, random read/write is used (rndrw). The duration of the test is given through the --max-time option (in seconds).

The output of a run is shown below:

user $sysbench --test=fileio --file-total-size=32G --file-test-mode=rndrw --max-time=300 --max-requests=0 run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Extra file open flags: 0
128 files, 256Mb each
32Gb total file size
Block size 16Kb
Number of random requests for random IO: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Threads started!
Time limit exceeded, exiting...
Done.

Operations performed:  14788 Read, 9858 Write, 31488 Other = 56134 Total
Read 231.06Mb  Written 154.03Mb  Total transferred 385.09Mb  (1.2836Mb/sec)
   82.15 Requests/sec executed

Test execution summary:
    total time:                          300.0149s
    total number of events:              24646
    total time taken by event execution: 173.9797
    per-request statistics:
         min:                                  0.01ms
         avg:                                  7.06ms
         max:                                 92.72ms
         approx.  95 percentile:              16.57ms

Threads fairness:
    events (avg/stddev):           24646.0000/0.00
    execution time (avg/stddev):   173.9797/0.00

The important part to look at is the information regarding the operations:

 Operations performed:  14788 Read, 9858 Write, 31488 Other = 56134 Total
 Read 231.06Mb  Written 154.03Mb  Total transferred 385.09Mb  (1.2836Mb/sec)

These numbers can be compared with runs on different file systems, other systems, etc.

Using the CPU workload

When running with the CPU workload, sysbench will verify prime numbers by doing standard division of the number by all numbers between 2 and the square root of the number. If any number gives a remainder of 0, the next number is calculated. As you can imagine, this will put some stress on the CPU, but only on a very limited set of the CPUs features.

The benchmark can be configured with the number of simultaneous threads and the maximum number to verify if it is a prime.

user $sysbench --test=cpu --cpu-max-prime=20000 --num-threads=2 run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 2

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 20000


Test execution summary:
    total time:                          18.0683s
    total number of events:              10000
    total time taken by event execution: 36.1322
    per-request statistics:
         min:                                  3.44ms
         avg:                                  3.61ms
         max:                                  6.77ms
         approx.  95 percentile:               5.05ms

Threads fairness:
    events (avg/stddev):           5000.0000/7.00
    execution time (avg/stddev):   18.0661/0.00

The number to verify with other systems is given by the execution summary:

 total time:                          18.0683s
 total time taken by event execution: 36.1322

The event execution time is the pure calculation part. If you run the test with multiple threads, it is the sum of the time of all threads. The total time is the end-to-end time, and as such includes the overhead of shared memory access for the threads (although this is usually negligible). Unlike the event execution time, the total time is the duration from start to finish (so no culmination of individual times of the threads).

Using the threads workload

With the threads workload, each worker thread will be allocated a mutex (a sort of lock) and will, for each execution, loop a number of times (documented as the number of yields) in which it takes the lock, yields (meaning it asks the scheduler to stop itself from running and put it back and the end of the runqueue) and then, when it is scheduled again for execution, unlock.

By tuning the various parameters, one can simulate situations with high concurrent threading with the same lock, or high concurrent threading with several different locks, etc.

user $sysbench --test=threads --thread-locks=1 --max-time=20s run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing thread subsystem performance test
Thread yields per test: 1000 Locks used: 1
Threads started!
Time limit exceeded, exiting...
Done.


Test execution summary:
    total time:                          20.0052s
    total number of events:              1894
    total time taken by event execution: 19.9653
    per-request statistics:
         min:                                 10.22ms
         avg:                                 10.54ms
         max:                                 13.42ms
         approx.  95 percentile:              10.71ms

Threads fairness:
    events (avg/stddev):           1894.0000/0.00
    execution time (avg/stddev):   19.9653/0.00

When using the --max-time option, the number you want to use for comparing systems is the per-request statistic. In the above case, a single request (of on average 10.54ms) has run the lock-yield-unlock process 1000 times. Or put differently, on average, the lock-yield-unlock process took about 10.54 microseconds on average.

Using the mutex workload

When using the mutex workload, the sysbench application will run a single request per thread. This request will first put some stress on the CPU (using a simple incremental loop, through the --mutex-loops parameter) after which it takes a random mutex (lock), increments a global variable and releases the lock again. This process is repeated several times identified by the number of locks (--mutex-locks). The random mutex is taken from a pool sized by the --mutex-num parameter.

user $sysbench --test=mutex --num-threads=64 run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 64

Doing mutex performance test
Threads started!
Done.


Test execution summary:
    total time:                          0.5510s
    total number of events:              64
    total time taken by event execution: 27.9994
    per-request statistics:
         min:                                 15.40ms
         avg:                                437.49ms
         max:                                520.51ms
         approx.  95 percentile:             516.51ms

Threads fairness:
    events (avg/stddev):           1.0000/0.00
    execution time (avg/stddev):   0.4375/0.07

The duration of such a run here is important, although one has to take into account that the threads will take a random mutex from the available pool. This random factor might influence the results a bit.

Using the memory workload

When using the memory test in sysbench, the benchmark application will allocate a memory buffer and then read or write from it, each time for the size of a pointer (so 32bit or 64bit), and each execution until the total buffer size has been read from or written to. This is then repeated until the provided volume (--memory-total-size) is reached. Users can provide multiple threads (--num-threads), different sizes in buffer (--memory-block-size) and the type of requests (read or write, sequential or random).

user $sysbench --test=memory --num-threads=4 run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 4

Doing memory operations speed test
Memory block size: 1K

Memory transfer size: 102400M

Memory operations type: write
Memory scope type: global
Threads started!
Done.

Operations performed: 104857600 (2320057.25 ops/sec)

102400.00 MB transferred (2265.68 MB/sec)


Test execution summary:
    total time:                          45.1961s
    total number of events:              104857600
    total time taken by event execution: 117.7050
    per-request statistics:
         min:                                  0.00ms
         avg:                                  0.00ms
         max:                                 18.72ms
         approx.  95 percentile:               0.00ms

Threads fairness:
    events (avg/stddev):           26214400.0000/124550.07
    execution time (avg/stddev):   29.4263/0.18

The important number to compare (given the same or similar parameters) is the throughput and operations per second:

 Operations performed: 104857600 (2320057.25 ops/sec)
 102400.00 MB transferred (2265.68 MB/sec)

External resources