Parallel Gzip - Pigz

Introduction

Sometimes we want to compress one or several files into a single compressed file, or to decompress a compressed file. Tools such as gzip, zip, and 7zip are commonly used to create or decompress .gz, .zip, and .7z files, respectively. However, the standard gzip and zip tools on Linux run on a single thread and do not take advantage of multiple cores. When the number of files is large or the files themselves are large, single-threaded compression and decompression can take a long time.

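Pigz is a parallel implementation of gzip, and it can also write single-entry zip files. Using pigz can greatly reduce the time spent on compression. Decompression benefits less, because pigz decompresses in a single thread and only offloads reading, writing, and checksum calculation to separate threads. In this blog post, I would like to briefly discuss how to use pigz.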

Pigz

The pigz usage in this blog post mainly targets Ubuntu. However, usage on other Linux distributions should be almost the same.

Installation

sudo apt update
sudo apt install pigz

Pigz Usages

$ pigz --help
Usage: pigz [options] [files ...]
will compress files in place, adding the suffix '.gz'. If no files are
specified, stdin will be compressed to stdout. pigz does what gzip does,
but spreads the work over multiple processors and cores when compressing.

Options:
-0 to -9, -11 Compression level (level 11, zopfli, is much slower)
--fast, --best Compression levels 1 and 9 respectively
-b, --blocksize mmm Set compression block size to mmmK (default 128K)
-c, --stdout Write all processed output to stdout (won't delete)
-d, --decompress Decompress the compressed input
-f, --force Force overwrite, compress .gz, links, and to terminal
-F --first Do iterations first, before block split for -11
-h, --help Display a help screen and quit
-i, --independent Compress blocks independently for damage recovery
-I, --iterations n Number of iterations for -11 optimization
-J, --maxsplits n Maximum number of split blocks for -11
-k, --keep Do not delete original file after processing
-K, --zip Compress to PKWare zip (.zip) single entry format
-l, --list List the contents of the compressed input
-L, --license Display the pigz license and quit
-m, --no-time Do not store or restore mod time
-M, --time Store or restore mod time
-n, --no-name Do not store or restore file name or mod time
-N, --name Store or restore file name and mod time
-O --oneblock Do not split into smaller blocks for -11
-p, --processes n Allow up to n compression threads (default is the
number of online processors, or 8 if unknown)
-q, --quiet Print no messages, even on error
-r, --recursive Process the contents of all subdirectories
-R, --rsyncable Input-determined block locations for rsync
-S, --suffix .sss Use suffix .sss instead of .gz (for compression)
-t, --test Test the integrity of the compressed input
-v, --verbose Provide more verbose output
-V --version Show the version of pigz
-Y --synchronous Force output file write to permanent storage
-z, --zlib Compress to zlib (.zz) instead of gzip format
-- All arguments after "--" are treated as files

Typical commands for compressing and decompressing a file look like the following:

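# Compress
# Always use -k to keep the original file (pigz deletes it by default)
# The output is image.png.gz
$ pigz -k -p8 image.png
# Decompress (decompression itself is single-threaded, so -p matters less here)
$ pigz -dk -p8 image.png.gz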

However, pigz alone is not convenient for compressing multiple files into one single file, or for writing the output to a custom file path. For these cases, we rely on tar, the archiving tool.

Tar-Pigz Usages

Using a pipe |, we can first archive multiple files or directories into a .tar stream and then compress it with pigz to produce a .tar.gz file.

# Compress
$ tar -cf - data/ index.json | pigz -k -p8 > dataset.tar.gz
# Decompress (unfortunately two steps); -d is required and produces dataset.tar
$ pigz -dk -p8 dataset.tar.gz
# Extract file to another directory
$ mkdir -p new_dataset
$ tar -xf dataset.tar -C new_dataset

Alternatively, tar can run a custom compression program through its --use-compress-program option, which makes the commands cleaner.

# Compress
$ tar --use-compress-program="pigz -k -p8" -cf dataset.tar.gz data/ index.json
# Decompress and extract into another directory
$ mkdir -p new_dataset
$ tar --use-compress-program="pigz -dk -p8" -xf dataset.tar.gz -C new_dataset

Author

Lei Mao

Posted on

07-08-2020

Updated on

07-08-2020
