Talk:Entropy (information theory)/Archive 1

Cross-entropy

The term "cross-entropy" directs to this page, yet there is no discussion of cross-entropy.

Fixed. There is now a separate article on cross entropy. --MarkSweep 20:13, 13 Apr 2005 (UTC)

Move

It might be better to move this page to Shannon entropy instead of redirecting from there to this page. That way, this page can talk about other formulations of entropy, such as the Rényi entropy, and name/link to those pages from this one.

I agree with Vegalabs on this. --V79 19:33, 3 October 2005 (UTC)

zeroth order, first order, ...

The discussion of "order" is somewhat confusing; after reading Shannon's paper I first thought the explanation here was incorrect, but now I see that the confusion comes from the difference between a "first-order Markov source" and what Shannon calls "the first-order approximation to the entropy."

Shannon says "The zeroth-order approximation is obtained by choosing all letters with the same probability and independently. The first-order approximation is obtained by choosing letters independently, but each letter having the same probability that it has in the natural language."

Thus using only the letter frequencies (that is, only single characters), the first-order entropy is

<math>H_1 = -\sum_i p_i \log_2 p_i</math>

which is the exact entropy for a zeroth-order Markov source, namely one in which the symbol probabilities don't depend on the previous symbol.
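
A minimal Python sketch of that first-order calculation, using invented letter probabilities rather than real English frequencies (the numbers below are placeholders, not from Shannon's paper):

<syntaxhighlight lang="python">
from math import log2

# Hypothetical single-letter probabilities; they only need to sum to 1.
letter_probs = {"e": 0.13, "t": 0.09, "a": 0.08, "o": 0.08, "other": 0.62}

# The first-order entropy uses only these single-letter frequencies, so it is
# exact for a zeroth-order Markov source (no dependence on the previous symbol).
H1 = -sum(p * log2(p) for p in letter_probs.values())
print(f"H1 = {H1:.3f} bits per letter")
</syntaxhighlight>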

I think the text should make this distinction clear, and will think on a way to edit it appropriately.

Jim Mahoney 20:00, Apr 11, 2005 (UTC)

second axiom of entropy

Regarding the formula:

2) For all positive integers n, H satisfies

<math>H\left(\tfrac{1}{n},\ldots,\tfrac{1}{n}\right) < H\left(\tfrac{1}{n+1},\ldots,\tfrac{1}{n+1}\right)</math>


I would like to see some kind of hint that the left H has n arguments, whereas the right H has n+1 arguments. Perhaps an index at the H could do this, i.e.

<math>H_n\left(\tfrac{1}{n},\ldots,\tfrac{1}{n}\right) < H_{n+1}\left(\tfrac{1}{n+1},\ldots,\tfrac{1}{n+1}\right)</math>

Alternatively I could imagine some kind of \underbrace construction, but I could not make it look right.
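
If the axiom in question is indeed the monotonicity of the uniform-distribution entropy, a short numeric check in Python makes the n versus n+1 argument counts concrete (the helper below is only an illustration, not text from the article):

<syntaxhighlight lang="python">
from math import log2

def H(probs):
    """Shannon entropy in bits of a finite probability distribution."""
    return -sum(p * log2(p) for p in probs if p > 0) + 0.0  # +0.0 avoids printing -0.0

# Left side: H with n equal arguments; right side: H with n+1 equal arguments.
# Subscripting them as H_n and H_{n+1} would make the argument counts explicit.
for n in range(1, 6):
    left = H([1 / n] * n)                # H_n(1/n, ..., 1/n) = log2(n)
    right = H([1 / (n + 1)] * (n + 1))   # H_{n+1}(1/(n+1), ..., 1/(n+1))
    print(n, round(left, 4), round(right, 4), left < right)
</syntaxhighlight>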

Why the term was invented

This page claims that the term "information entropy" was invented because most people don't understand what entropy is, so anyone who used it would always have the advantage in debates. If this is true, it should be included in the article. --Eleassar777 10:41, 14 May 2005 (UTC)

Well, the act of naming the quantity "entropy" wasn't meant to be amusing or to confuse people. The page you link to cites only part of what was said; the full quotation of what Shannon said is the following (Sci. Am. 1971, 225, p. 180):

"My greatest concern was what to call it. I thought of calling it 'information,' but the word was overly used, so I decided to call it 'uncertainty.' When I discussed it with John von Neumann, he had a better idea. Von Neumann told me, 'You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, no one really knows what entropy really is, so in a debate you will always have the advantage.'"

I also disagree with the author of that page when he says that thermodynamic and information entropy are not the same. Indeed, the information theoretic interpretation has shed light on what thermodynamic entropy is. That being said (and what I think is the main point of the page), the information theoretic viewpoint is not always the easiest way to understand all aspects of thermodynamics. --V79 19:19, 3 October 2005 (UTC)

Layman with a question

I remember hearing that the amount of entropy in a message is related to the amount of information it carries. In other words, the higher the entropy of a message, the more information it has relative to the number of bits. (For example, would this mean that in evolution, the entropy of a string of base pairs actually increases over time?) Is there any truth to this? Keep in mind I flunked Calc II three times running, so please keep it simple and use short words. crazyeddie 06:58, 19 Jun 2005 (UTC)

There is quite a lot of truth to that -- when viewed the right way. When I receive a message, each letter gives me some more information. How uncertain I was of the message in the first place (quantified by the entropy) tells me how much information I will gain (on average) when actually receiving the message.

Not knowing the details of DNA and evolution, I assume that each of the four base pairs is equally likely, i.e. has probability 0.25. The entropy is then -4 × 0.25 log2(0.25) = 2 bits per base (which is intuitive since there are 4 ways of combining two bits, 00, 01, 10, 11, which can represent A, G, U, C). But this cannot be increased, so evolution cannot increase the entropy of the base pair string. This is because information in the information theoretic sense doesn't say anything about the usefulness of the information. The junk DNA that is thought to be merely random base pairs outside the genes contains as much information per base as the genes themselves. You can also say that while evolution "adds" some information by changing some base pairs it also "removes" the information about what was there before, giving no net change. --V79 20:04, 3 October 2005 (UTC)
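
A quick Python sketch of that calculation, under the same simplifying assumption of four equally likely bases:

<syntaxhighlight lang="python">
from math import log2

# Assumed uniform distribution over the four bases (0.25 each).
base_probs = {"A": 0.25, "G": 0.25, "U": 0.25, "C": 0.25}

entropy_per_base = -sum(p * log2(p) for p in base_probs.values())
print(entropy_per_base)  # 2.0 bits per base, matching the encodings 00, 01, 10, 11
</syntaxhighlight>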

Using this (what I consider sloppy) terminology, a larger genome will imply an increase in uncertainty. Some plants have twice as many base pairs in their genome as humans, for instance. Eric

Suggestion for Introduction

The following sentence - "The entropy rate of a data source means the average number of bits per symbol needed to encode it" - which is currently found in the body of the article, really ought to be included in some form in the introductory paragraph. This is, essentially, the layman's definition of the concept, and provides an excellent introduction to what the term actually means; it is also an excellent jumping-off point into more abstract discussion of Shannon's theory.
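
As a rough illustration of that reading (the source probabilities, sample size, and choice of zlib are all invented here, not from the article), the entropy of a toy memoryless source lower-bounds the average bits per symbol that any encoding of it achieves:

<syntaxhighlight lang="python">
import random
import zlib
from math import log2

# Toy source: three symbols with made-up probabilities.
probs = {"a": 0.7, "b": 0.2, "c": 0.1}
entropy = -sum(p * log2(p) for p in probs.values())  # about 1.157 bits/symbol

# Draw a long sample and see how many bits per symbol zlib actually uses.
random.seed(0)
sample = "".join(random.choices(list(probs), weights=list(probs.values()), k=100_000))
zlib_bits_per_symbol = 8 * len(zlib.compress(sample.encode())) / len(sample)

print(f"entropy of the source: {entropy:.3f} bits/symbol")
print(f"zlib encoding uses   : {zlib_bits_per_symbol:.3f} bits/symbol")
</syntaxhighlight>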

Graph is wrong

The graph that opens this article is wrong: the max entropy should be ln 2, not 1.0. Let's work it out: for p = 1/2, we have

H = -(p log p + (1-p) * log (1-p)) = -log(1/2) = log 2

linas 04:12, 9 October 2005 (UTC)

But the logarithm used in the definition of the entropy is base 2, not e. Therefore log 2 = 1. Brona 01:42, 10 October 2005 (UTC)
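
A small Python check of the point: the same binary entropy expression gives 1 at p = 1/2 with base-2 logarithms, and ln 2 ≈ 0.693 with natural logarithms; the graph simply uses bits.

<syntaxhighlight lang="python">
from math import log, log2

def binary_entropy(p, logfn):
    """H(p) = -(p log p + (1-p) log(1-p)) with the given logarithm."""
    return -(p * logfn(p) + (1 - p) * logfn(1 - p))

print(binary_entropy(0.5, log2))  # 1.0 bit (log base 2)
print(binary_entropy(0.5, log))   # 0.6931... nats (= ln 2)
</syntaxhighlight>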

Problem with definition

I am confused by the definition given in the article, and believe that parts of it are wrong:

Claude E. Shannon defines entropy in terms of a discrete random event x, with possible states 1..n as:
<math>H(x) = \sum_{i=1}^{n} p(i) \log_2\left(\frac{1}{p(i)}\right) = -\sum_{i=1}^{n} p(i) \log_2 p(i)</math>
That is, the entropy of the event x is the sum, over all possible outcomes i of x, of the product of the probability of outcome i times the log of the probability of i (which is also called s's surprisal - the entropy of x is the expected value of its outcome's surprisal). We can also apply this to a general probability distribution, rather than a discrete-valued event.

First, there's a reference to "s's surprisal", but the variable "s" has not been defined. I suspect that it is supposed to be "i", but I'm not familiar enough with the material to make the change.

Second, the way I read the definition, it doesn't matter what the actual outcome is; all that matters is the collection of possible outcomes. I'm pretty sure that this is wrong. I'm probably just confused by the terminology used, but in that case, someone who understands this topic should try to rewrite it in a way that is more understandable to a layman. AdamRetchless 18:10, 21 October 2005 (UTC)

Indeed, all that matters are the outcomes and their probabilities. The formula above is intended to define the information generated by an experiment (for instance, taking a coloured ball out of a vase that contains balls of several colours) before the experiment is actually performed. So the specific outcome is unknown. But what we do know is: if the outcome is i (which happens with probability <math>p(i)</math>), then the information that we get is <math>\log_2(1/p(i))</math>. Bob.v.R 00:16, 13 September 2006 (UTC)
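
A short Python sketch of the vase example (the colour probabilities here are invented): the entropy is the probability-weighted average of each outcome's surprisal log2(1/p(i)).

<syntaxhighlight lang="python">
from math import log2

# Hypothetical vase: half the balls are red, a quarter blue, a quarter green.
colour_probs = {"red": 0.5, "blue": 0.25, "green": 0.25}

# Surprisal of each possible outcome, in bits.
surprisal = {c: log2(1 / p) for c, p in colour_probs.items()}

# Expected surprisal = entropy of the draw.
entropy = sum(p * surprisal[c] for c, p in colour_probs.items())

print(surprisal)  # {'red': 1.0, 'blue': 2.0, 'green': 2.0}
print(entropy)    # 1.5 bits on average
</syntaxhighlight>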


The definition given at MathWorld makes a lot more sense to me.

AdamRetchless 18:17, 21 October 2005 (UTC)

I am confused about the statement in this article that "since the entropy was given as a definition, it does not need to be derived." Surely this implies that it could have been defined differently and still have the same properties - I don't think this is true. The justification given in the article derives it based on H = log(Omega), but the origin of this log(Omega) is not explained. The preceding unsigned comment was added by 139.184.30.17 (talk • contribs).