A FREE TOOL FOR VISUALIZING THE PERFORMANCE OF STREAMING COMPRESSION USING YOUR DATA
The Compression Analysis Tool is a free benchmarking tool for the .NET Framework that lets you analyze the performance characteristics of LZF4, DEFLATE, ZLIB, GZIP, BZIP2 and LZMA and helps you discover which is the best compression method for your requirements.
You provide the data to be benchmarked and CAT produces measurements and charts with which you can compare how different compression methods and compression levels affect the compression and decompression speed and the compression ratio of your data.
Your data, your benchmarks, your choice!
The Compression Analysis Tool is specially designed to accurately measure the throughput capabilities of lossless data compression implementations that conform to the streaming API of the .NET Framework.
CAT is based on the typical compression or decompression operation which involves reading the input data from a source stream, compressing or decompressing the input data to produce the output data, and writing the output data to a target stream.
To make this operation suitable for benchmarking we must take the following into account:
When we measure the total time that the compression or decompression operation takes from beginning to end, this total time includes not only the time taken to compress or decompress the data but also the time taken to read and write the data.
To calculate the compressor's or decompressor's throughput without conflating it with the throughput of the read/write operations, the time taken to read and write the data must be excluded from the total time. To exclude the reading time, we measure the time taken to read the input data and deduct it from the total time. To exclude the writing time, we write the output data to the null stream, which consumes almost no resources.
The resulting time represents the time required by the compressor or decompressor to process the data, without the overhead introduced by the read/write operations.
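The read-time measurement relies on a ByteCounterStream wrapper that is used in the listings below but is not itself shown. A minimal sketch of such a wrapper follows; it assumes only that the wrapper must accumulate the time spent inside Read calls, expose it as a ReadTime property, and allow it to be zeroed with Reset. The second constructor parameter is always passed as null in the listings and its purpose is not documented, so it is ignored here.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// A sketch of a read-timing stream wrapper; the real ByteCounterStream may differ.
// It forwards all reads to the inner stream while timing them with a Stopwatch.
public class ByteCounterStream : Stream
{
    private readonly Stream inner;
    private readonly Stopwatch readStopwatch = new Stopwatch();

    // The second parameter is always null in the article's calls; its purpose is
    // unknown, so this sketch ignores it.
    public ByteCounterStream(Stream inner, object unused)
    {
        this.inner = inner;
    }

    // Total time spent reading from the inner stream since the last Reset.
    public TimeSpan ReadTime { get { return readStopwatch.Elapsed; } }

    // Zero the accumulated ReadTime.
    public void Reset() { readStopwatch.Reset(); }

    public override int Read(byte[] buffer, int offset, int count)
    {
        readStopwatch.Start();
        try { return inner.Read(buffer, offset, count); }
        finally { readStopwatch.Stop(); }
    }

    public override bool CanRead { get { return inner.CanRead; } }
    public override bool CanSeek { get { return inner.CanSeek; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { return inner.Length; } }
    public override long Position
    {
        get { return inner.Position; }
        set { inner.Position = value; }
    }
    public override void Flush() { inner.Flush(); }
    public override long Seek(long offset, SeekOrigin origin) { return inner.Seek(offset, origin); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}
```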
The performance of the process that is running the compression or decompression operation is affected by the execution of foreground or background processes, context switching, memory fragmentation, garbage collection, underclocking and other factors.
As a result, the measured performance is not stable but can fluctuate considerably when performing multiple passes over the same data using the same compression method. The throughput of some passes might be close to the full throughput of the compressor or decompressor for that particular data while the throughput of other passes might be significantly less than its real potential.
To minimize the effects of these factors, in each compression or decompression stage we perform multiple passes and calculate the throughput of each pass separately. At the end we select the highest throughput as being the one that is the closest to the full throughput that the implementation is capable of delivering.
The number of passes performed in each stage is determined dynamically and depends on the time taken to compress or decompress the data being processed during that stage.
A stage that takes less than 10 ms is abandoned because its duration is insufficient for obtaining reliable measurements; the shorter the duration, the more sensitive the measured performance becomes to the effects of external factors. We also abandon any stage whose throughput is less than 10 KB/s as being unrealistically slow.
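The pass-selection logic described above can be sketched as follows. The pass-running delegate, the fixed maximum pass count, and the KB/s convention (1 KB = 1000 bytes) are illustrative assumptions; CAT determines the number of passes dynamically.

```csharp
using System;

// A sketch of the multi-pass selection described above. runPass performs one
// compression or decompression pass and returns its processing time in
// milliseconds; lengthInBytes is the size of the data being processed.
// Returns null when the stage is abandoned.
public static class PassSelection
{
    // An illustrative fixed upper bound; CAT chooses the pass count dynamically.
    private const int MaxPasses = 10;

    public static decimal? BestThroughput(Func<decimal> runPass, long lengthInBytes)
    {
        decimal bestThroughput = 0m;
        for (int pass = 0; pass < MaxPasses; pass++)
        {
            decimal processingTime = runPass();

            // Abandon a stage that takes less than 10 ms: too short to measure reliably.
            if (processingTime < 10m)
                return null;

            // Bytes per millisecond is numerically equal to KB/s (1 KB = 1000 bytes).
            decimal throughput = lengthInBytes / processingTime;

            // Abandon a stage whose throughput is below 10 KB/s: unrealistically slow.
            if (throughput < 10m)
                return null;

            // Keep the highest throughput as the closest to the full potential.
            if (throughput > bestThroughput)
                bestThroughput = throughput;
        }
        return bestThroughput;
    }
}
```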
CAT's operation consists of four stages:
Stage 1 is always performed. Stages 2, 3 and 4 are optional and independent of each other.
Stage 1 is a single-pass stage during which uncompressed data is read from the input stream, compressed, and written to the output stream. It returns the length of the uncompressed input stream and the length of the compressed output stream.
If only the compression ratio is required, this is the only stage that needs to be performed. If Stage 2 or 3 is performed, this stage also serves as a warm-up run.
// *********************************************************
// Compress a stream and write the output to another stream.
// *********************************************************
private void CompressToFile(
    CompressionFactory compression,
    int compressionLevel,
    Stream uncompressedInputStream,
    Stream compressedOutputStream,
    ref byte[] inputBuffer,
    int inputBufferSize,
    out long uncompressedLength,
    out long compressedLength)
{
    // The source and target streams are being reused. Position their cursors at the
    // beginning.
    uncompressedInputStream.Position = 0;
    compressedOutputStream.Position = 0;
    // Create the stream that will compress the data. Keep the reference to the
    // underlying target stream so that its length can still be read after the
    // compression stream has been closed.
    Stream compressionStream = compression.CreateOutputStream(compressedOutputStream, compressionLevel, true);
    // Read from the source stream as many bytes as the input buffer can hold.
    // Process them and write the output to the target stream.
    int bytesRead;
    while ((bytesRead = uncompressedInputStream.Read(inputBuffer, 0, inputBufferSize)) > 0)
    {
        compressionStream.Write(inputBuffer, 0, bytesRead);
    }
    // Close the compression stream so that any data remaining in its internal
    // buffers are flushed to the target stream.
    compressionStream.Close();
    // Assign the size of the data before and after compression.
    uncompressedLength = uncompressedInputStream.Length;
    compressedLength = compressedOutputStream.Length;
}
Stage 2 is a multi-pass stage during which uncompressed data is read from the input stream, compressed, and written to the null stream. It returns the time taken to compress the data after subtracting from it the time taken to read the data.
This stage is performed only if the compression throughput is required. If this stage is performed, the number of passes to be made will be determined dynamically depending on the time taken to compress the particular data being processed during this stage.
// ***********************************************
// Compress a stream and write the output to null.
// ***********************************************
private void CompressToNull(
    CompressionFactory compression,
    int compressionLevel,
    Stream uncompressedInputStream,
    ref byte[] inputBuffer,
    int inputBufferSize,
    out decimal processingTime)
{
    // The source stream is being reused. Position its cursor at the beginning.
    uncompressedInputStream.Position = 0;
    // To ensure that our measurements do not include the time spent reading data from
    // the source, we read from the source through our ByteCounterStream, which enables
    // us to measure the reading time so that at the end we can deduct it from the
    // total time.
    var byteCounter = new ByteCounterStream(uncompressedInputStream, null);
    uncompressedInputStream = byteCounter;
    // Reset the byteCounter in order to zero the ReadTime accumulated up to now.
    // We must start accumulating ReadTime only after the stopwatch is started.
    // Otherwise our calculations will be wrong and result in negative processing times
    // for very small files.
    byteCounter.Reset();
    // To ensure that our measurements do not include the time spent writing data to the
    // target, we assign the target to a null stream which does not consume any resources.
    Stream compressedOutputStream = Stream.Null;
    // Create a stopwatch and start it.
    Stopwatch processingStopwatch = new Stopwatch();
    processingStopwatch.Start();
    // Create the stream that will compress the data.
    compressedOutputStream = compression.CreateOutputStream(compressedOutputStream, compressionLevel, false);
    // Read from the source stream as many bytes as the input buffer can hold.
    // Process them and write the output to the target stream.
    int bytesRead;
    while ((bytesRead = uncompressedInputStream.Read(inputBuffer, 0, inputBufferSize)) > 0)
    {
        compressedOutputStream.Write(inputBuffer, 0, bytesRead);
    }
    // Close the compression stream so that any data remaining in its internal
    // buffers are flushed out.
    compressedOutputStream.Close();
    // Stop the stopwatch and calculate the processing time by subtracting the read
    // time from the elapsed time.
    processingStopwatch.Stop();
    processingTime = (decimal)(processingStopwatch.Elapsed.TotalMilliseconds - byteCounter.ReadTime.TotalMilliseconds);
}
Stage 3 is a multi-pass stage during which compressed data is read from the input stream, decompressed, and written to the null stream. It returns the time taken to decompress the data after subtracting from it the time taken to read the data.
This stage is performed only if the decompression throughput is required. If this stage is performed, the number of passes to be made will be determined dynamically depending on the time taken to decompress the particular data being processed during this stage.
// *************************************************
// Decompress a stream and write the output to null.
// *************************************************
private void DecompressToNull(
    CompressionFactory compression,
    Stream compressedInputStream,
    ref byte[] inputBuffer,
    int inputBufferSize,
    out decimal processingTime)
{
    // The source stream is being reused. Position its cursor at the beginning.
    compressedInputStream.Position = 0;
    // To ensure that our measurements do not include the time spent reading data from
    // the source, we read from the source through our ByteCounterStream, which enables
    // us to measure the reading time so that at the end we can deduct it from the
    // total time.
    var byteCounter = new ByteCounterStream(compressedInputStream, null);
    compressedInputStream = byteCounter;
    // Reset the byteCounter in order to zero the ReadTime accumulated up to now.
    // We must start accumulating ReadTime only after the stopwatch is started.
    // Otherwise our calculations will be wrong and result in negative processing times
    // for very small files.
    byteCounter.Reset();
    // To ensure that our measurements do not include the time spent writing data to the
    // target, we assign the target to a null stream which does not consume any resources.
    Stream decompressedOutputStream = Stream.Null;
    // Create a stopwatch and start it.
    Stopwatch processingStopwatch = new Stopwatch();
    processingStopwatch.Start();
    // Create the stream that will decompress the data.
    Stream decompressionStream = compression.CreateInputStream(compressedInputStream, true);
    // Read from the decompression stream as many bytes as the input buffer can hold
    // and write the output to the target stream.
    int bytesRead;
    while ((bytesRead = decompressionStream.Read(inputBuffer, 0, inputBufferSize)) > 0)
    {
        decompressedOutputStream.Write(inputBuffer, 0, bytesRead);
    }
    // Close the decompression stream to release the decompressor's resources.
    decompressionStream.Close();
    // Stop the stopwatch and calculate the processing time by subtracting the read
    // time from the elapsed time.
    processingStopwatch.Stop();
    processingTime = (decimal)(processingStopwatch.Elapsed.TotalMilliseconds - byteCounter.ReadTime.TotalMilliseconds);
}
Stage 4 is a single-pass stage during which uncompressed data is read from one input stream, compressed data is read from another input stream and decompressed, and the two are compared to confirm that they are identical.
This stage is performed only if integrity checking is required.
// **********************************************************************************************
// Decompress a compressed stream and compare its contents with those of the uncompressed stream.
// **********************************************************************************************
private void DecompressAndCheckIntegrity(
    CompressionFactory compression,
    Stream uncompressedInputStream,
    Stream compressedInputStream,
    ref byte[] compressedInputBuffer,
    int compressedInputBufferSize,
    out bool success)
{
    // The two source streams are being reused. Position their cursors at the beginning.
    uncompressedInputStream.Position = 0;
    compressedInputStream.Position = 0;
    // Create the stream that will decompress the data.
    compressedInputStream = compression.CreateInputStream(compressedInputStream, true);
    // Decompress the compressed data and confirm it is the same as the uncompressed data.
    int decompressedBytesRead;
    int uncompressedBytesRead;
    byte[] uncompressedInputBuffer = new byte[compressedInputBuffer.Length];
    success = true;
    while (success && (decompressedBytesRead = compressedInputStream.Read(compressedInputBuffer, 0, compressedInputBufferSize)) > 0)
    {
        // One decompressed chunk may correspond to several reads from the uncompressed
        // stream, so keep track of the offset within the decompressed chunk.
        int offset = 0;
        while (success && (decompressedBytesRead > 0))
        {
            uncompressedBytesRead = uncompressedInputStream.Read(uncompressedInputBuffer, 0, decompressedBytesRead);
            if (uncompressedBytesRead == 0)
            {
                // The uncompressed stream ended before the decompressed data did.
                success = false;
                break;
            }
            for (int i = 0; i < uncompressedBytesRead; i++)
            {
                success = success && (uncompressedInputBuffer[i] == compressedInputBuffer[offset + i]);
            }
            offset += uncompressedBytesRead;
            decompressedBytesRead -= uncompressedBytesRead;
        }
    }
    // There are no more decompressed bytes left to read. Check that there are also no
    // more uncompressed bytes left to read.
    uncompressedBytesRead = uncompressedInputStream.Read(uncompressedInputBuffer, 0, compressedInputBufferSize);
    success = success && (uncompressedBytesRead == 0);
}
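Given the outputs of these stages, the reported figures can be derived with simple arithmetic. The sketch below shows one way to do so; the conventions chosen here, space savings expressed as a percentage and throughput in KB/s with 1 KB = 1000 bytes, are assumptions for illustration and not necessarily the conventions CAT itself uses.

```csharp
using System;

// A sketch of deriving the reported figures from the stage outputs: the lengths
// returned by Stage 1 and the processing times returned by Stages 2 and 3.
public static class Results
{
    // Space savings as a percentage: 0 means no reduction, higher is better.
    public static decimal CompressionRatio(long uncompressedLength, long compressedLength)
    {
        return 100m * (1m - (decimal)compressedLength / uncompressedLength);
    }

    // Throughput in KB/s from a data length in bytes and a processing time in
    // milliseconds (bytes per millisecond equals KB/s when 1 KB = 1000 bytes).
    public static decimal Throughput(long lengthInBytes, decimal processingTimeMs)
    {
        return lengthInBytes / processingTimeMs;
    }
}
```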