NNCP: Lossless Data Compression with Neural Networks

NNCP is an experiment to build a practical lossless data compressor with neural networks. The best performer uses an LSTM model. A model based on self-attention (Transformer) is also evaluated.
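Model-based compressors of this kind split the work in two. A model predicts a probability p for each next symbol, and an entropy coder (typically an arithmetic coder) spends about -log2(p) bits on the symbol that actually occurs. Better predictions therefore mean fewer bits. The sketch below computes that ideal code length. The model is a hypothetical adaptive order-0 byte model (`ideal_code_length_bits` is an illustrative name) standing in for the LSTM. It illustrates the principle only and is not NNCP's model or coder.

```python
import math


def ideal_code_length_bits(data: bytes) -> float:
    """Sum of -log2(p) over the bytes of `data` under an adaptive order-0 model.

    Hypothetical stand-in for NNCP's LSTM: the "model" only counts how often
    each byte value has appeared so far.
    """
    counts = [1] * 256  # add-one prior: every byte value starts with nonzero probability
    total = 256
    bits = 0.0
    for b in data:
        p = counts[b] / total   # prediction made before the byte is seen
        bits += -math.log2(p)   # ideal cost of this byte with an arithmetic coder
        counts[b] += 1          # update afterwards, so a decoder can mirror the same step
        total += 1
    return bits


if __name__ == "__main__":
    text = b"abracadabra " * 100
    bits = ideal_code_length_bits(text)
    print(f"{len(text)} bytes -> {bits / 8:.0f} bytes, {bits / len(text):.2f} bpb")
```

Dividing the result by the input length gives bits per byte (bpb), the unit used in the tables below. A stronger model, such as an LSTM conditioned on a long context, lowers that figure.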

The algorithms and results are described in this paper. Warning: the paper only describes the first version of NNCP; later versions include further improvements.

NNCP is based on the LibNC library, which allows fast and deterministic evaluation and training of neural networks on x86 CPUs. It is optimized for small batch sizes and low latency. LibNC has no dependency on other libraries and has a C API.
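Determinism matters for a model-based compressor because the decoder has to recompute exactly the same probabilities as the encoder. If the two disagree even in the last bit, arithmetic decoding can go off track. Floating-point addition is not associative, so merely summing the same terms in a different order (for example, because of a different reduction schedule) can change the result. The short Python demonstration below only illustrates that effect. The helper names are hypothetical, and it says nothing about how LibNC itself achieves determinism.

```python
def sum_in_order(values):
    """Left-to-right accumulation, like a simple scalar loop."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_reversed(values):
    """Same terms, opposite order (stands in for a different reduction schedule)."""
    total = 0.0
    for v in reversed(values):
        total += v
    return total


# Pretend each sum is a probability produced by the model on two machines.
terms = [0.1, 0.2, 0.3]
p_encoder = sum_in_order(terms)   # 0.6000000000000001
p_decoder = sum_reversed(terms)   # 0.6

# Mathematically equal, but not bit-identical: a decoder fed p_decoder
# instead of p_encoder could select a different coding interval.
print(p_encoder == p_decoder)  # False
```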

Compression ratio

Result for enwik8:

Program               Compr. size (bytes)   Ratio (bpb)
gzip                           36 445 248          2.92
xz                             24 865 244          1.99
NNCP (2019-11-16)              16 292 774          1.30
CMIX (v18)                     14 838 332          1.19
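The Ratio column is in bits per byte (bpb): the compressed size in bits divided by the original size in bytes. enwik8 is the first 10^8 bytes of an English Wikipedia dump, so the column can be recomputed directly from the sizes in the table. A minimal check in Python, where `bits_per_byte` is a hypothetical helper name:

```python
ENWIK8_BYTES = 100_000_000  # enwik8: the first 10^8 bytes of an English Wikipedia dump

# Compressed sizes in bytes, copied from the enwik8 table
ENWIK8_RESULTS = {
    "gzip": 36_445_248,
    "xz": 24_865_244,
    "NNCP (2019-11-16)": 16_292_774,
    "CMIX (v18)": 14_838_332,
}


def bits_per_byte(compressed_bytes: int, original_bytes: int) -> float:
    """Compressed size in bits divided by original size in bytes."""
    return 8 * compressed_bytes / original_bytes


for name, size in ENWIK8_RESULTS.items():
    print(f"{name:<18} {bits_per_byte(size, ENWIK8_BYTES):.2f} bpb")
```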

Result for enwik9:

Program               Compr. size (bytes)   Ratio (bpb)   Program size (zip, bytes)   Total (bytes)
gzip                          322 591 995          2.58                      38 801     322 630 796
xz                            197 331 816          1.58                      36 752     197 368 568
NNCP (2019-11-16)             119 167 224          0.95                     238 452     119 405 676
CMIX (v18)                    115 714 367          0.93                     208 961     115 923 328
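For enwik9, the Total column adds the size of the zipped program to the compressed size. The Large Text Compression Benchmark ranks entries this way, so a large decompressor is not free. The snippet below recomputes the totals from the table and sorts by them. `ENWIK9_ROWS` simply copies the table values, and `total_size` is a hypothetical helper name.

```python
# (program, compressed enwik9 size, zipped program size), all in bytes,
# copied from the enwik9 table
ENWIK9_ROWS = [
    ("gzip", 322_591_995, 38_801),
    ("xz", 197_331_816, 36_752),
    ("NNCP (2019-11-16)", 119_167_224, 238_452),
    ("CMIX (v18)", 115_714_367, 208_961),
]


def total_size(compressed: int, program: int) -> int:
    """Benchmark total: compressed data plus the zipped program."""
    return compressed + program


# Smallest total first, as in the benchmark ranking
ranking = sorted(ENWIK9_ROWS, key=lambda row: total_size(row[1], row[2]))
for name, compressed, program in ranking:
    print(f"{name:<18} {total_size(compressed, program):>12,} bytes")
```

NNCP's program is several times larger than gzip's or xz's, but on a 10^9-byte input that overhead is small compared with the savings in compressed size.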

* The results for the other programs are from the Large Text Compression Benchmark.

Download

Linux version: nncp-2019-11-16.tar.gz. LibNC is currently only provided as object code.

Precompiled Windows version: nncp-2019-11-16-win64.zip.



Fabrice Bellard - https://bellard.org/