Kitchen Soap

Pigz – parallel gzip OMG

Pigz is basically a parallel gzip, built to take advantage of multiple cores. When you’ve got massive files, this can be a pretty big advantage, especially when you’ve got lots of cores sitting around.
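To make the "parallel gzip" idea concrete, here is a rough sketch using only standard tools. It is not how pigz is implemented (as far as I know, pigz uses threads inside one process and writes a single gzip stream). The sketch leans on the fact that the gzip format allows several compressed members to be concatenated into one valid file. The function name `parallel_gzip`, the 1 MiB chunk size, and the default of 4 jobs are all made up for illustration, and `xargs -P` is a GNU/BSD extension rather than POSIX.

```shell
#!/bin/sh
# parallel_gzip FILE [JOBS]
# Hypothetical helper: writes FILE.gz using JOBS gzip processes at once.
# Unlike plain gzip, it leaves FILE in place. Assumes FILE is not empty.
parallel_gzip() {
    src=$1
    njobs=${2:-4}
    tmp=$(mktemp -d) || return 1

    # Cut the input into 1 MiB pieces: chunk.aaaa, chunk.aaab, ...
    split -a 4 -b 1048576 "$src" "$tmp/chunk." &&
    # Compress the pieces, up to $njobs gzip processes at a time.
    find "$tmp" -type f -name 'chunk.*' ! -name '*.gz' -print0 |
        xargs -0 -P "$njobs" -n 1 gzip &&
    # Concatenated gzip members form one valid gzip file.
    # The glob expands in sorted order, which matches the split order.
    cat "$tmp"/chunk.*.gz > "$src.gz"
    status=$?

    rm -rf "$tmp"
    return $status
}
```

Something like `parallel_gzip ./daemon.log.2 16` would then produce `./daemon.log.2.gz`, which plain `gunzip` can read. Because each chunk is compressed on its own, the result will likely come out a little larger than what gzip or pigz would give you.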

Taking a 418MB squid access log file, on a dual quad-core Nehalem L5520 box with HyperThreading turned on (16 logical cores):

[jallspaw@server01 ~]$ ls -lh daemon.log.2; time gzip ./daemon.log.2 ; ls -lh ./daemon.log.2.gz
-rw-r----- 1 jallspaw jallspaw 418M Apr  2 19:18 daemon.log.2

real    0m12.398s
user    0m12.107s
sys     0m0.288s
-rw-r----- 1 jallspaw jallspaw 45M Apr  2 19:18 ./daemon.log.2.gz

…now gunzipping it:

[jallspaw@server01 ~]$ ls -lh daemon.log.2.gz; time gunzip ./daemon.log.2 ; ls -lh ./daemon.log.2
-rw-r----- 1 jallspaw jallspaw 45M Apr  2 19:18 daemon.log.2.gz

real    0m3.245s
user    0m2.693s
sys     0m0.552s
-rw-r----- 1 jallspaw jallspaw 418M Apr  2 19:18 ./daemon.log.2

htop looks like this when this is happening:

[htop screenshot: 1 CPU core, 418MB file gzipped in 12.3 sec]

(Note the 15 freeloading/lazy cores sitting around watching their friend, core #10, sweating.)

…now pigz’ing it:

[jallspaw@server01 ~]$ ls -lh daemon.log.2; time ./pigz-2.1.6/pigz ./daemon.log.2 ; ls -lh ./daemon.log.2.gz
-rw-r----- 1 jallspaw jallspaw 418M Apr  2 19:18 daemon.log.2

real    0m1.569s
user    0m23.092s
sys     0m0.422s
-rw-r----- 1 jallspaw jallspaw 45M Apr  2 19:18 ./daemon.log.2.gz

…now unpigz’ing it:

[jallspaw@server01 ~]$ ls -lh daemon.log.2.gz; time ./pigz-2.1.6/unpigz ./daemon.log.2.gz ; ls -lh ./daemon.log.2
-rw-r----- 1 jallspaw jallspaw 45M Apr  2 19:18 daemon.log.2.gz

real    0m1.456s
user    0m1.861s
sys     0m0.867s
-rw-r----- 1 jallspaw jallspaw 418M Apr  2 19:18 ./daemon.log.2

and htop looks like this when it’s happening:

[htop screenshot: 16 CPU cores, 418MB pigz’d in 1.5sec]

Which do you like better?
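To put numbers on that question, here is a small sketch that works out the ratios from the `time` output above. The `ratio` helper is just a made-up convenience wrapper around awk; the inputs are the "real" and "user" seconds copied from the runs in this post.

```shell
#!/bin/sh
# ratio A B: print A divided by B, to two decimal places.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Wall-clock ("real") seconds: gzip vs pigz, then gunzip vs unpigz.
echo "compress speedup:   $(ratio 12.398 1.569)x"
echo "decompress speedup: $(ratio 3.245 1.456)x"

# user / real for the pigz run: roughly how many cores were busy on average.
echo "pigz cores busy:    $(ratio 23.092 1.569)"
```

That comes out to roughly a 7.9x speedup for compression, about 2.2x for decompression, and an average of about 14.7 cores kept busy during the pigz run.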
