Talk:Move-to-front transform


I find this statement a bit odd:

"As an example, imagine we wish to compress Hamlet's soliloquy (To be, or not to be...). We can calculate the entropy of this message to be 7033 bits."

Where did 7033 bits come from? I could transmit this using ASCII, in which case it would take 8*len("To be, or not to be...") = 8*22 = 176 bits. Is this referring to the whole soliloquy? --dantheox 06:10, 12 March 2007 (UTC)

It's referring to the whole soliloquy, but even so, the statement is nonsense. The entropy measure of information depends on the model for the information, and no description is provided for the model that results in 7033 bits. It is likely that the author used the ASCII characters as the symbol alphabet, but it's unclear whether they assigned probabilities derived from typical English text or from this specific example of text. —Preceding unsigned comment added by 24.19.209.25 (talk) 00:35, 1 September 2008 (UTC)

I strongly suggest just deleting that paragraph. It's meaningless without detailed explanation of the model used (as the previous comment says) but even if it were fixed to be correct, it would be OR and inappropriate. 207.161.99.42 (talk) 03:53, 22 December 2011 (UTC)


Explanation: It's actually pretty standard - it's the zero-order probability of characters in the paragraph. Once you have all the probabilities (after a full pass), just sum -log2(P(b)) where b iterates over all the characters in the paragraph. — Preceding unsigned comment added by Loyada (talkcontribs) 20:11, 3 August 2012 (UTC)
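
In other words (my paraphrase, not wording from the article): writing n_b for the number of occurrences of character b in a text of length N, the figure is the zero-order empirical entropy of the whole text,

    H = -\sum_b n_b \log_2(n_b / N)

measured in bits, with the sum running over the distinct characters b.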

I tried replicating this, so I used the code

from collections import Counter
import numpy as np

HAMLET = """
<actual text goes here>
""".strip().encode('utf-8')

def calculate_entropy(txt):
    # Zero-order entropy in bits: sum over symbols of count * -log2(count / length)
    character_counts = np.array(list(Counter(txt).values()))
    character_entropies = -np.log2(character_counts / len(txt))
    return character_counts @ character_entropies

def MTF(txt):
    # Move-to-front transform over the alphabet of all 256 byte values
    stack = list(range(256))
    transformed_txt = []
    for ch in txt:
        transformed_txt.append(stack.index(ch))
        stack.insert(0, stack.pop(stack.index(ch)))
    return bytes(transformed_txt)

def BWT(txt):
    # Burrows-Wheeler transform: last column of the sorted rotation matrix
    txt += b'\0' # Add EOS marker to ensure invertibility of BWT
    rotations = [(2 * txt)[len(txt)-i:2*len(txt)-i] for i in range(len(txt))]
    return b''.join(rotation[-1:] for rotation in sorted(rotations))

print(f'Original size: {calculate_entropy(HAMLET)} bits')
print(f'MTF size: {calculate_entropy(MTF(HAMLET))} bits')
print(f'BWT+MTF size: {calculate_entropy(MTF(BWT(HAMLET)))} bits')

with the first example text from To be, or not to be. The result was

Original size: 7152.4574027105955 bits
MTF size: 8025.503134619397 bits
BWT+MTF size: 6547.965044692664 bits

This does not quite match the article, but is similar. The MTF size is 112% and the BWT+MTF size is 92%. The corresponding percentages based on the article's numbers are 111% and 88%. The following ambiguities make exact replication more difficult:

  • The original text has multiple versions
  • Both MTF and BWT can be applied to any character set. I assumed that the character set is the set of all bytes, but in this context it might well be something different, considering that the "bananaaa" example in the article used the character set "abcdefghijklmnopqrstuvwxyz"
  • This is further complicated by Unicode. The text I used contained the character "’" (Right single quotation mark). I just encoded the text as UTF-8, but one might well do something different, e.g. replacing this with the plain ASCII character "'", or just use some subset of Unicode code points as the character set
  • There are many variants of BWT, differing in how invertibility is ensured and how sorting is done. I arbitrarily picked the first variant from our BWT article, and used the zero byte as the EOS marker
  • Lots of other technical ambiguities also exist, but are rather far-fetched considering that this is meant as a simple example, and practical complications are meant to be ignored. But in actual use, BWT+MTF is typically used with Huffman encoding, which does not quite achieve theoretical entropy, and text is split into blocks of some fixed size which are then compressed separately. One also needs to store the frequency information somehow so that the entropy code can be decoded, either explicitly or implicitly using an adaptive entropy coding scheme, and that further increases compressed size. And if you do that, then the size of the original message might mean 8 times its length in bytes, instead of its entropy coded size, since that's the usual reference point for compression. The name of the section "Use in practical data compression algorithms" could conceivably make one think of these kinds of things, but I thought it was clear enough that these kinds of complications are beside the point. Then again, the first comment here did mention the usual reference point interpretation ("I could transmit this using ASCII"); a one-line comparison along those lines is sketched just after this list.
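
As an aside on that last point (my own addition, reusing HAMLET, calculate_entropy, MTF and BWT from the code block above): the usual reference point would simply be 8 bits per byte of the original text, e.g.

print(f'Raw size (8 bits per byte): {8 * len(HAMLET)} bits')
print(f'BWT+MTF entropy:            {calculate_entropy(MTF(BWT(HAMLET)))} bits')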

I think this could be argued to not fall afoul of OR as a routine calculation, since it's clear what is meant, even if it's a bit ambiguous, and BWT, MTF and Shannon entropy are all published, established, and well-known. It's just an example, and which way the ambiguities are resolved doesn't really matter. It'd be nice for it to be exactly replicable, though. One can of course also argue that the routine calculation exception does not apply. I don't have a strong opinion on this, but I thought that the concrete example and numbers made for a nice illustration. -- StackMoreLayers (talk) 06:27, 28 October 2023 (UTC)

Include variations and alternatives


There are currently several variations of MTF, and numerous alternative algorithms designed to achieve better results. MTF can create inefficiencies because it always moves a character all the way to the front of the list when it occurs, even if that character occurs only once in the entire file.

 One common variation is the "sticky" MTF, which still reserves 0 for duplicates but moves a matched character only a limited distance toward the front of the list, or sets a cutoff so that characters above it cannot be pushed to the bottom of the list (a rough sketch of this idea follows after this list).
 Another variation is the frequency count algorithm family, which uses various methods to estimate the relative local frequencies of characters and sorts them accordingly, again reserving 0 for runs.
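
For concreteness, here is a minimal sketch of one such sticky rule in Python (my own illustration, assuming byte-valued input and a move-to-second-position rule roughly along the lines of the MTF-1 modification; the comment above does not pin down the exact variant):

def sticky_mtf_encode(data):
    # A symbol found at position >= 2 is moved only to position 1; only a
    # symbol already at position 1 is promoted to position 0. A one-off
    # symbol therefore never displaces the symbol currently at the front.
    alphabet = list(range(256))
    output = []
    for byte in data:
        index = alphabet.index(byte)
        output.append(index)
        new_position = 0 if index <= 1 else 1
        alphabet.insert(new_position, alphabet.pop(index))
    return bytes(output)

With this rule a run of the same byte still emits 0s from its third occurrence at the latest, so index 0 remains effectively reserved for duplicates, while a byte seen only once never reaches position 0.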

Since this is a popular transform used in some common compression algorithms, it seems appropriate that some of the variations be noted or briefly defined. — Preceding unsigned comment added by 12.45.169.2 (talk) 14:10, 1 May 2013 (UTC)

Suggestions for the Python example


I have the following two suggestions for the Python example:

- Change names to proper, more descriptive English names (e.g. texte --> plaintext, liste --> dictionary, mot_code --> encodedtext). Btw, was the Python code taken from some French version? :)

- Simplify the generation of the alphabet at the beginning a bit. Instead of this:

        # Initialise the list of characters (i.e. the dictionary)
        liste = list()
        # read in each character
        for c in texte:
            # if the character is not already in the dictionary, append it
            if c not in liste:
                liste.append(str(c))
        # sort dictionary
        liste.sort()

I suggest this:

        # Initialise the list of characters (i.e. the dictionary)
        dictionary = sorted(list(set(plaintext)))
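
For context, the simplified initialization slots into a complete encoder along these lines (a sketch of my own using the renamed identifiers suggested above, not the article's exact code; note that seeding the dictionary only with the characters that actually occur gives different output than the article's "bananaaa" example, which starts from the full a-z alphabet):

def encode_mtf(plaintext):
    # Initialise the list of characters (i.e. the dictionary)
    dictionary = sorted(set(plaintext))
    encodedtext = []
    for c in plaintext:
        rank = dictionary.index(c)                   # position of c in the current dictionary
        encodedtext.append(rank)
        dictionary.insert(0, dictionary.pop(rank))   # move that character to the front
    return encodedtext

print(encode_mtf('bananaaa'))  # -> [1, 1, 2, 1, 1, 1, 0, 0] with dictionary ['a', 'b', 'n']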

Attil-io (talk) 15:13, 5 November 2016 (UTC)

Well, I did the suggested change. Feel free to improve, or revert, if you don't like it :) Attil-io (talk) 18:49, 5 November 2016 (UTC)

Editorial mistakes


The final section includes what appears to be an unfinished sentence: "It can be proved that the time it takes to access a sequence of elements in [4]" but I don't know how the sentence should end -- Axman6 (talk) 12:17, 2 May 2017 (UTC)

Minor error


The article claims that "The MTF transform takes advantage of local correlation of frequencies to reduce the entropy of a message." This is incorrect. The entropy of the message is not changed at all (you need lossy coding for that); rather, correlations/signal dependencies are exploited to reduce the mean codeword length of the coded data. — Preceding unsigned comment added by 130.75.31.228 (talk) 13:25, 15 October 2019 (UTC)
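
To illustrate the distinction with a quick check (my own sketch, using the all-bytes alphabet as in the replication code further up this page): the transform is a bijection, so the information content of the message is untouched; only the symbol statistics seen by the subsequent entropy coder change.

def mtf_encode(txt):
    # Move-to-front encode over the alphabet of all 256 byte values
    stack = list(range(256))
    out = []
    for ch in txt:
        i = stack.index(ch)
        out.append(i)
        stack.insert(0, stack.pop(i))
    return bytes(out)

def mtf_decode(ranks):
    # Exact inverse: replay the same list updates to recover the original bytes
    stack = list(range(256))
    out = []
    for i in ranks:
        ch = stack.pop(i)
        out.append(ch)
        stack.insert(0, ch)
    return bytes(out)

sample = b'to be, or not to be, that is the question'
assert mtf_decode(mtf_encode(sample)) == sample  # lossless: the message itself is unchanged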