The Weissman score is a performance metric for lossless compression applications. It was developed by Tsachy Weissman, a professor at Stanford University, and Vinith Misra, a graduate student, at the request of producers for HBO's television series Silicon Valley, a television show about a fictional tech start-up working on a data compression algorithm.[1][2][3][4] It compares both required time and compression ratio of measured applications, with those of a de facto standard according to the data type.

The formula is the following; where r is the compression ratio, T is the time required to compress, the overlined ones are the same metrics for a standard compressor, and alpha is a scaling constant.[1]

The Weissman score has been used by Daniel Reiter Horn and Mehant Baid of Dropbox to explain real-world work on lossless compression. According to the authors it "favors compression speed over ratio in most cases."[5]

Example

edit

This example shows the score for the data of the Hutter Prize,[6] using the paq8f as a standard and 1 as the scaling constant.

Application Compression ratio Compression time [min] Weissman score
paq8f 5.467600 300 1.000000
raq8g 5.514990 420 0.720477
paq8hkcc 5.682593 300 1.039321
paq8hp1 5.692566 300 1.041145
paq8hp2 5.750279 300 1.051701
paq8hp3 5.800033 300 1.060801
paq8hp4 5.868829 300 1.073826
paq8hp5 5.917719 300 1.082325
paq8hp6 5.976643 300 1.093102
paq8hp12 6.104276 540 0.620247
decomp8 6.261574 540 0.63623
decomp8 6.276295 540 0.637726

Limitations

edit

Although the value is relative to the standards against which it is compared, the unit used to measure the times changes the score (see examples 1 and 2). This is a consequence of the requirement that the argument of the logarithmic function must be dimensionless. The multiplier also can't have a numeric value of 1 or less, because the logarithm of 1 is 0 (examples 3 and 4), and the logarithm of any value less than 1 is negative (examples 5 and 6); that would result in scores of value 0 (even with changes), undefined, or negative (even if better than positive).


# Standard compressor Scored compressor Weissman score Observations
Compression ratio Compression time Log (compression time) Compression ratio Compression time Log (compression time)
1 2.1 2 min 0.30103 3.4 3 min 0.477121 1×(3.4/2.1)×(0.30103/0.477121)=1.021506 Change in unit or scale, changes the result.
2 2.1 120 sec 2.079181 3.4 180 sec 2.255273 1×(3.4/2.1)×(2.079181/2.255273)=1.492632
3 2.2 1 min 0 3.3 1.5 min 0.176091 1×(3.3/2.2)×(0/0.176091)=0 If time is 1, its log is 0; then the score can be 0 or infinity.
4 2.2 0.667 min −0.176091 3.3 1 min 0 1×(3.3/2.2)×(−0.176091/0)=infinity
5 1.6 0.5 h −0.30103 2.9 1.1 h 0.041393 1×(2.9/1.6)×(−0.30103/0.041393)=−13.18138 If time is less than 1, its log is negative; then the score can be negative.
6 1.6 1.1 h 0.041393 1.6 0.9 h −0.045757 1×(1.6/1.6)×(0.041393/−0.045757)=−0.904627

See also

edit

References

edit
  1. ^ a b Perry, Tekla (July 28, 2014). "A Fictional Compression Metric Moves Into the Real World". Retrieved January 25, 2016.
  2. ^ Perry, Tekla (July 25, 2014). "A Made-For-TV Compression Algorithm". Retrieved January 25, 2016.
  3. ^ Sandberg, Elise (April 12, 2014). "HBO's 'Silicon Valley' Tech Advisor on Realism, Possible Elon Musk Cameo". The Hollywood Reporter. Retrieved June 10, 2014.
  4. ^ Jurgensen, John; Rusli, Evelyn M. (April 3, 2014). "There's a New Geek in Town: HBO's 'Silicon Valley'". The Wall Street Journal. Retrieved June 10, 2014.
  5. ^ "Lossless compression with Brotli in Rust for a bit of Pied Piper on the backend". Dropbox Tech Blog. Retrieved 2017-06-24.
  6. ^ Hutter, Marcus (July 2016). "Contestants". Retrieved January 25, 2016.