Pigz – parallel gzip OMG

Pigz is basically parallel gzip, to take advantage of multiple cores.  When you’ve got massive files, this can be a pretty big advantage, especially when you’ve got lots of cores sitting around.

Taking a 418m squid access log file, on a dual-quad Nehalem L5520  with HyperThreading turned on:

[jallspaw@server01 ~]$ ls -lh daemon.log.2; time gzip ./daemon.log.2 ; ls -lh ./daemon.log.2.gz
-rw-r—– 1 jallspaw jallspaw 418M Apr  2 19:18 daemon.log.2

real    0m12.398s
user    0m12.107s
sys     0m0.288s
-rw-r—– 1 jallspaw jallspaw 45M Apr  2 19:18 ./daemon.log.2.gz

…now gunziping it:

[jallspaw@server01 ~]$ ls -lh daemon.log.2.gz; time gunzip ./daemon.log.2 ; ls -lh ./daemon.log.2
-rw-r—– 1 jallspaw jallspaw 45M Apr  2 19:18 daemon.log.2.gz

real    0m3.245s
user    0m2.693s
sys     0m0.552s
-rw-r—– 1 jallspaw jallspaw 418M Apr  2 19:18 ./daemon.log.2

htop looks like this when this is happening:

1 CPU core, 418mb file gzipped in 12.3 sec

1 CPU core, 418mb file gzipped in 12.3 sec

(Note the freeloading/lazy 15 cores sitting around watching its friend core #10 sweating)

…now pigz’ing it:

[jallspaw@server01 ~]$ ls -lh daemon.log.2; time ./pigz-2.1.6/pigz ./daemon.log.2 ; ls -lh ./daemon.log.2.gz
-rw-r—– 1 jallspaw jallspaw 418M Apr  2 19:18 daemon.log.2

real    0m1.569s
user    0m23.092s
sys     0m0.422s
-rw-r—– 1 jallspaw jallspaw 45M Apr  2 19:18 ./daemon.log.2.gz

…now unpigz’ing it:

[jallspaw@server01 ~]$ ls -lh daemon.log.2.gz; time ./pigz-2.1.6/unpigz ./daemon.log.2.gz ; ls -lh ./daemon.log.2
-rw-r—– 1 jallspaw jallspaw 45M Apr  2 19:18 daemon.log.2.gz

real    0m1.456s
user    0m1.861s
sys     0m0.867s
-rw-r—– 1 jallspaw jallspaw 418M Apr  2 19:18 ./daemon.log.2

and htop looks like this when it’s happening:

16 CPU cores, 418mb pigzd in 1.5sec

16 CPU cores, 418mb pigz'd in 1.5sec

which do you like better?

I like making things go! At the moment, I'm SVP of Infrastructure and Operations at Etsy, and I'm currently pursuing a Master's degree in Human Factors and Systems Safety at Lund University.

11 comments

  1. Denish Patel   •  

    Seems nice improvement in total run time.

    isn’t load average pretty high with pigz?

    Denish

  2. allspaw   •     Author

    For the 1.55 seconds, yeah the load average I think went above 1. :) The load average will definitely go up because of the number of simultaneous jobs running.

  3. Aaron Bush   •  

    Excellent boost. Makes me wonder if several of the GNU utils need a refresh for parallel opportunities?

  4. Alex   •  

    It’d be interesting to see how long each method takes to gzip 16 files. I suspect for most of us the problem is not compressing one log file – but compressing the 1000′s generated every hour.

    Alex

  5. David Gerard   •  

    So how’s the file format compatibility with gzip, each way?

  6. allspaw   •     Author

    Alex: in our case, we have some database backups that are massive, but yeah assuming there’s some recursive compression going on many many files, that would be interesting to see.

    David: Looks good:
    [jallspaw@server01 ~]$ ./pigz-2.1.6/pigz daemon.log.2
    [jallspaw@server01 ~]$ gunzip ./daemon.log.2.gz
    [jallspaw@server01 ~]$ gzip ./daemon.log.2
    [jallspaw@server01 ~]$ ./pigz-2.1.6/unpigz ./daemon.log.2.gz
    [jallspaw@server01 ~]$

  7. Robert Stewart   •  

    Database backups are a sweet spot for me, too. When I need to set up a new MySQL replication slave of an existing master, the faster I can get the snapshot loaded on the slave, the faster the slave can catch up to the ongoing changes on the master and become available for reads. Of course, the additional load on the master during the parallel gzip has to be taken into account.

    It might be interesting to combine pigz with mk-parallel-dump for quick test exports on quiescent systems, depending on how much mk-parallel-dump is already saturating the cores. mk-parallel-dump currently defaults to gzip.

  8. Raul Martinez   •  

    I like pigz but could not find a statically compiled binary of it.
    Problem is one our server still has outdated version of zlib so I can not build the pigz source.

    Appreciate if you know where I can find one..

  9. cowdog   •  

    if your system’s zlib is too old … but you still need to build zlib from source.

    Get a recent copy of zlib, build it, modify pigz Makefile CFLAGS line to something like this …

    CFLAGS=-O3 -Wall -Wextra -I../zlib-1.2.5/ -L../zlib-1.2.5/

    I used zlib-1.2.5, change the “..” ,of course, to full path of your zlib-1.2.X if necessary.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>