Table 1 Compression and runtime comparisons of gzip and PIC

From: Image-centric compression of protein structures improves space savings

| Protein ID | Atom count | Original file size (KB) | Rounded coordinates text size (KB) | Rounded coordinates binary size (KB) | gzip size (KB) | gzip CR | PIC size (KB) | PIC CR | PIC RMSD | Compression time (min:sec) | Images: number used | Images: space used (%) | Decompression time (min:sec) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2ja9 | 1458 | 163.3 | 24.1 | 6.6 | 6.3 | 3.834 | 10.0 | 2.412 | 0.031 | 0:0.1 | 1 | [0.9] | 0:0.4 |
| 2jan | 12591 | 1101.2 | 206.1 | 56.7 | 54.1 | 3.813 | 61.2 | 3.368 | 0.047 | 0:1.3 | 1 | [2.7] | 0:2.1 |
| 2jbp | 27367 | 2397.4 | 447.8 | 133.4 | 130.2 | 3.439 | 108.8 | 4.117 | 0.043 | 0:4.1 | 1 | [11.1] | 0:5.0 |
| 2ja8 | 32000 | 2831.2 | 507.6 | 144.0 | 139.6 | 3.637 | 138.0 | 3.678 | 0.043 | 0:5.6 | 1 | [6.5] | 0:7.0 |
| 2ign | 41758 | 3579.2 | 666.7 | 187.9 | 180.8 | 3.688 | 147.3 | 4.526 | 0.069 | 0:9.0 | 1 | [9.5] | 0:11.4 |
| 2jd8 | 50351 | 4457.6 | 828.1 | 226.6 | 219.7 | 3.769 | 196.8 | 4.207 | 0.056 | 0:12.8 | 1 | [7.7] | 0:15.9 |
| 2ja7 | 63924 | 5605.5 | 1077.0 | 287.7 | 278.6 | 3.866 | 258.8 | 4.161 | 0.055 | 0:19.6 | 1 | [10.2] | 0:24.7 |
| 2fug | 73916 | 6386.9 | 1180.7 | 360.3 | 347.5 | 3.398 | 283.3 | 4.168 | 0.060 | 0:26.2 | 1 | [10.7] | 0:33.3 |
| 2b9v | 80710 | 6818.4 | 1279.8 | 393.5 | 379.4 | 3.373 | 289.0 | 4.428 | 0.073 | 0:32.2 | 1 | [10.3] | 0:39.5 |
| 2j28 | 95358 | 8152.3 | 1526.2 | 429.1 | 412.2 | 3.702 | 346.6 | 4.403 | 0.055 | 0:47.0 | 1 | [13.7] | 1:0.2 |
| 6hif | 118753 | 12726.2 | 2105.2 | 534.4 | 516.2 | 4.078 | 372.2 | 5.656 | 0.062 | 1:30.5 | 2 | [34.0, 0.1] | 1:48.6 |
| 3j7q | 140540 | 16027.2 | 2529.7 | 737.8 | 707.6 | 3.575 | 475.6 | 5.318 | 0.058 | 2:28.2 | 1 | [20.3] | 2:44.4 |
| 3j9m | 158384 | 17995.2 | 2845.4 | 772.1 | 765.8 | 3.716 | 525.7 | 5.413 | 0.069 | 3:28.8 | 1 | [21.7] | 3:55.9 |
| 6gaw | 178372 | 20825.4 | 3179.9 | 869.6 | 862.1 | 3.688 | 587.6 | 5.411 | 0.071 | 4:58.1 | 1 | [23.5] | 5:39.1 |
| 5t2a | 200172 | 22787.6 | 3253.9 | 900.8 | 872.4 | 3.73 | 651.7 | 4.993 | 0.068 | 7:8.2 | 2 | [31.1, 1.7] | 8:59.1 |
| 4ug0 | 218776 | 24906.9 | 3841.4 | 1066.5 | 1056.7 | 3.635 | 707.3 | 5.431 | 0.069 | 8:34.2 | 2 | [33.8, 1.7] | 9:25.5 |
| 4v60 | 241956 | 24377.8 | 4207.8 | 1179.5 | 1167.2 | 3.605 | 730.2 | 5.762 | 0.120 | 9:50.8 | 2 | [45.6, 2.1] | 13:48.9 |
| 4wro | 260090 | 35661.1 | 4363.1 | 1267.9 | 1246.2 | 3.501 | 848.8 | 5.14 | 0.086 | 13:54.0 | 1 | [29.6] | 16:6.9 |
| 6fxc | 281510 | 31329.0 | 5067.1 | 1477.9 | 1424.2 | 3.558 | 917.7 | 5.522 | 0.100 | 15:52.9 | 2 | [34.6, 1.0] | 17:11.6 |
| 4wq1 | 299951 | 40130.9 | 5042.1 | 1462.3 | 1438.0 | 3.506 | 968.8 | 5.204 | 0.087 | 19:59.6 | 2 | [34.7, 0.2] | 22:39.0 |

  1. Results for the PIC compression algorithm with \(\varepsilon = 2.5\). Rounded Coordinates Text Size and Binary Size are the sizes of the text and binary files, respectively, that contain only the Cartesian coordinates found in the original file, rounded to one decimal place. Sizes are in kilobytes (1000 bytes), not kibibytes. The binary file (which uses a variable-length encoding) is then gzipped. The gzip and PIC compression ratios (CR) are the ratios of the Rounded Coordinates Text Size to the size of the gzip file and of the PNG image output(s) from the PIC compressor, respectively. Bolded values are the best of gzip and PIC. Compression and decompression times are for the PIC algorithm; our code is unoptimized, as the focus is on compression ratios, and the times are included for completeness. (De)compression with gzip takes negligible time for files of this size. RMSD values are included to measure the lossiness of PIC compression. Image Space Used gives the proportion of the image space that was used to encode the protein coordinate data, or part thereof, in each image constructed by the PIC compressor (for large proteins, more than one image is needed to represent all the atoms).
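
The compression ratio defined in the footnote can be recomputed from the table columns. The following is a minimal Python sketch, not the authors' code: the function names `compression_ratio` and `rmsd` are hypothetical, and `rmsd` uses the standard per-atom root-mean-square deviation, which is assumed (not confirmed by the table) to match the RMSD reported here. The example uses the sizes from the 2ja9 row.

```python
import math


def compression_ratio(text_size_kb, compressed_size_kb):
    """CR as defined in the footnote: rounded-coordinates text size
    divided by the size of the compressed output (same units)."""
    if compressed_size_kb <= 0:
        raise ValueError("compressed size must be positive")
    return text_size_kb / compressed_size_kb


def rmsd(original, restored):
    """Standard RMSD between two equal-length lists of (x, y, z) tuples.
    Assumption: this is the usual per-atom definition, not necessarily
    the exact formula used to produce the table."""
    if len(original) != len(restored) or not original:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(original, restored)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(original))


# Sizes (KB) from the 2ja9 row of the table.
text_kb, gzip_kb, pic_kb = 24.1, 6.3, 10.0
gzip_cr = compression_ratio(text_kb, gzip_kb)
pic_cr = compression_ratio(text_kb, pic_kb)
print(f"gzip CR ~ {gzip_cr:.3f}, PIC CR ~ {pic_cr:.3f}")
```

Ratios recomputed this way can differ from the tabulated CR in the last digits (for 2ja9, about 3.825 versus the reported 3.834 for gzip), presumably because the sizes shown in the table are themselves rounded to one decimal place.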