Sysbench

sysbench provides benchmarking capabilities for Linux. It supports testing CPU, memory, file I/O and mutex performance, as well as MySQL benchmarking.

Emerge
Install :

Usage
As mentioned, sysbench supports several benchmark workloads: fileio, cpu, memory, threads, mutex, oltp.

Using the fileio workload
When using fileio, you will need to create a set of test files to work on. It is recommended that the size is larger than the available memory to ensure that file caching does not influence the workload too much.

As this is I/O benchmarking, you can tell sysbench which kind of workload you want to run: sequential reads or writes, random reads or writes, or a combination. In the above example, random read/write is used (rndrw). The duration of the test is given through the  option (in seconds).

The output of a run is shown below:

The important part to look at is the information regarding the operations:

    Operations performed: 14788 Read, 9858 Write, 31488 Other = 56134 Total
    Read 231.06Mb  Written 154.03Mb  Total transferred 385.09Mb  (1.2836Mb/sec)

These numbers can be compared with runs on different file systems, other systems, etc.
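
To compare runs, it can help to pull the figures out of the summary text. The following is a minimal Python sketch, assuming the summary has the shape shown above; the function name parse_fileio_summary and the regular expressions are illustrative and not part of sysbench.

```python
import re

# Summary text copied from the run shown in this article.
SUMMARY = (
    "Operations performed: 14788 Read, 9858 Write, 31488 Other = 56134 Total "
    "Read 231.06Mb Written 154.03Mb  Total transferred 385.09Mb  (1.2836Mb/sec)"
)

def parse_fileio_summary(text):
    """Extract operation counts and throughput from a fileio summary."""
    ops = re.search(r"(\d+) Read, (\d+) Write, (\d+) Other = (\d+) Total", text)
    xfer = re.search(r"Total transferred ([\d.]+)Mb\s+\(([\d.]+)Mb/sec\)", text)
    if ops is None or xfer is None:
        raise ValueError("unrecognised summary format")
    reads, writes, other, total = (int(g) for g in ops.groups())
    return {
        "reads": reads,
        "writes": writes,
        "other": other,
        "total": total,
        "transferred_mb": float(xfer.group(1)),
        "mb_per_sec": float(xfer.group(2)),
    }

stats = parse_fileio_summary(SUMMARY)
read_write_ratio = stats["reads"] / stats["writes"]
implied_seconds = stats["transferred_mb"] / stats["mb_per_sec"]
print(stats)
print(round(read_write_ratio, 2), round(implied_seconds))
```

For the figures above, the derived values are a read/write ratio of about 1.5 and an implied run length of about 300 seconds. Both are useful sanity checks before comparing two runs.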

Using the CPU workload
When running with the CPU workload, sysbench verifies prime numbers by dividing each candidate by all numbers between 2 and the square root of the candidate. If any division leaves a remainder of 0, the candidate is not prime and sysbench moves on to the next number. As you can imagine, this puts some stress on the CPU, but only on a very limited set of the CPU's features.

The benchmark can be configured with the number of simultaneous threads and the largest number to check for primality.
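
The prime verification described above can be sketched in a few lines of Python. This illustrates the trial-division idea only and is not sysbench's actual code; the names is_prime and count_primes are made up for this example.

```python
import math

def is_prime(n):
    """Trial division by every integer from 2 up to the square root of n."""
    if n < 2:
        return False
    for divisor in range(2, math.isqrt(n) + 1):
        if n % divisor == 0:
            return False  # remainder 0: not prime, the caller moves on
    return True

def count_primes(max_number):
    """Check every candidate up to max_number and count the primes found."""
    return sum(1 for candidate in range(2, max_number + 1) if is_prime(candidate))

print(count_primes(100))  # 25 primes up to 100
```

Raising max_number increases the work per run, which is the role the maximum number plays in the benchmark configuration.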

The number to compare with other systems is given by the execution summary:

    total time:                          18.0683s
    total time taken by event execution: 36.1322

The event execution time is the pure calculation part. If you run the test with multiple threads, it is the sum of the time spent by all threads. The total time is the end-to-end duration from start to finish, not an accumulation of the individual thread times, and as such includes the overhead of shared memory access for the threads (although this is usually negligible).
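
As a small worked example, dividing the summed event execution time by the total time gives a rough idea of how many threads were kept busy. The thread count is not stated in the summary, so the result below is an inference from the two figures, not a reported value.

```python
total_time = 18.0683       # seconds, end to end
event_time_sum = 36.1322   # seconds, summed over all threads

# With near-perfect parallelism, summed event time divided by wall-clock
# time approximates the number of threads that were busy during the run.
effective_parallelism = event_time_sum / total_time
print(f"{effective_parallelism:.2f}")  # 2.00
```

A ratio close to 2 is consistent with two threads calculating for almost the whole run.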

Using the threads workload
With the threads workload, each worker thread is allocated a mutex (a sort of lock). For each execution, it loops a number of times (documented as the number of yields). In each iteration it takes the lock, yields (meaning it asks the scheduler to stop running it and put it back at the end of the run queue) and then, when it is scheduled again for execution, unlocks.

By tuning the various parameters, one can simulate situations with high concurrent threading with the same lock, or high concurrent threading with several different locks, etc.
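
The lock-yield-unlock loop can be sketched in Python as follows. This is a rough illustration under stated assumptions, not sysbench's implementation: the parameter names are hypothetical, the way locks are shared between threads (a small pool indexed by the loop counter) is an assumption chosen to show contention, and time.sleep(0) stands in for a scheduler yield.

```python
import threading
import time

def run_threads(num_threads=4, num_locks=2, yields=100, requests=10):
    """Each request loops `yields` times over lock -> yield -> unlock."""
    locks = [threading.Lock() for _ in range(num_locks)]
    completed = [0] * num_threads  # one slot per thread, so no shared writes

    def worker(index):
        for _ in range(requests):
            for i in range(yields):
                lock = locks[i % num_locks]  # assumed lock selection scheme
                lock.acquire()
                time.sleep(0)                # stand-in for a scheduler yield
                lock.release()
            completed[index] += 1

    threads = [threading.Thread(target=worker, args=(n,)) for n in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(completed), time.perf_counter() - start

done, elapsed = run_threads()
print(done, f"{elapsed:.4f}s")
```

Lowering num_locks while raising num_threads models many threads contending for the same lock; raising num_locks spreads the contention over several locks.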

When using the  option, the number to use for comparing systems is the per-request statistic. In the above case, a single request (averaging 10.54ms) ran the lock-yield-unlock process 1000 times. Put differently, one lock-yield-unlock cycle took about 10.54 microseconds on average.
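
The conversion from the per-request average to the per-cycle figure is simple arithmetic, shown here in Python with the two numbers quoted in the text:

```python
avg_request_ms = 10.54      # average time per request, in milliseconds
yields_per_request = 1000   # lock-yield-unlock cycles per request

# Convert milliseconds to microseconds, then divide by the cycle count.
per_cycle_us = avg_request_ms * 1000 / yields_per_request
print(f"{per_cycle_us:.2f} microseconds per cycle")
```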

Using the mutex workload
When using the mutex workload, the sysbench application runs a single request per thread. This request first puts some stress on the CPU (using a simple incremental loop, through the  parameter), after which it takes a random mutex (lock), increments a global variable and releases the lock again. This process is repeated as many times as the configured number of locks. The random mutex is taken from a pool sized by the  parameter.

The number to compare here is the duration of the run, although one has to take into account that the threads take a random mutex from the available pool. This random factor might influence the results a bit.
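
A minimal Python sketch of this request loop is given below. The parameter names are hypothetical and do not correspond to sysbench options. One deliberate deviation: the sketch keeps one counter per lock instead of a single global variable, so that every increment is protected by the lock that was taken and the final total is exact.

```python
import random
import threading
import time

def run_mutex(num_threads=4, pool_size=8, locks_per_request=200, loops=500, seed=0):
    """One request per thread: spin, take a random lock, increment, release."""
    pool = [threading.Lock() for _ in range(pool_size)]
    counters = [0] * pool_size  # one counter per lock (deviation, see lead-in)

    def request(rng):
        for _ in range(locks_per_request):
            spin = 0
            for _ in range(loops):           # CPU stress before taking a lock
                spin += 1
            slot = rng.randrange(pool_size)  # pick a random mutex from the pool
            with pool[slot]:
                counters[slot] += 1

    threads = [
        threading.Thread(target=request, args=(random.Random(seed + n),))
        for n in range(num_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(counters), time.perf_counter() - start

increments, elapsed = run_mutex()
print(increments, f"{elapsed:.4f}s")
```

The elapsed time is the figure of interest. A smaller pool_size makes it more likely that two threads pick the same lock, which is where the random factor mentioned above comes from.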

Using the memory workload
When using the memory test in sysbench, the benchmark application allocates a memory buffer and then reads from or writes to it, one pointer-sized unit (32 or 64 bits) at a time, until the entire buffer has been read or written. This is then repeated until the provided volume is reached. Users can configure the number of threads, the buffer size and the type of requests (read or write, sequential or random).
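
The sequential write case can be sketched in Python as shown below. This is an illustration of the access pattern only: the function and parameter names are hypothetical, and counting one full pass over the buffer as one operation is an assumption.

```python
import struct

POINTER_SIZE = struct.calcsize("P")  # typically 4 on 32-bit, 8 on 64-bit builds
WORD_FORMAT = "Q" if POINTER_SIZE == 8 else "I"

def memory_write_test(block_size=1024, total_size=64 * 1024):
    """Write pointer-sized words across the buffer until total_size bytes are written."""
    buffer = bytearray(block_size)
    words = memoryview(buffer).cast(WORD_FORMAT)  # view the buffer as native words
    operations = 0
    written = 0
    while written < total_size:
        for i in range(len(words)):  # one sequential pass over the whole buffer
            words[i] = i
        written += block_size
        operations += 1              # assumption: one pass counts as one operation
    return operations, written

ops, written_bytes = memory_write_test()
print(ops, written_bytes)
```

A read test would follow the same loop with a load instead of a store, and a random variant would pick the word index at random instead of walking the buffer in order.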

The important numbers to compare (given the same or similar parameters) are the throughput and the operations per second:

    Operations performed: 104857600 (2320057.25 ops/sec)
    102400.00 MB transferred (2265.68 MB/sec)
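
The two figures are linked by the amount of data moved per operation, which can be checked with a little arithmetic. The Python lines below use only the numbers from the summary above; the per-operation size and the elapsed time are derived values, not something the summary prints.

```python
operations = 104857600
ops_per_sec = 2320057.25
transferred_mb = 102400.00

# Data moved per operation, in KiB (1 MB taken as 1024 KiB).
kib_per_operation = transferred_mb * 1024 / operations

# Throughput recomputed from the operation rate, and the implied run length.
derived_mb_per_sec = ops_per_sec * kib_per_operation / 1024
elapsed_seconds = operations / ops_per_sec

print(kib_per_operation, round(derived_mb_per_sec, 2), round(elapsed_seconds, 1))
```

Each operation moved 1 KiB, and the recomputed throughput matches the reported 2265.68 MB/sec. When comparing systems, make sure the per-operation size is the same in both runs.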

External resources

 * How to benchmark your system with sysbench