Information Theory is one of the most popular courses in Marjar University. In this course, there is an important chapter about information entropy.
Entropy is the average amount of information contained in each message received. Here, a message stands for an event, or a sample or a character drawn from a distribution or a data stream. Entropy thus characterizes our uncertainty about our source of information. The source is also characterized by the probability distribution of the samples drawn from it. The idea here is that the less likely an event is, the more information it provides when it occurs.
Generally, "entropy" stands for "disorder" or uncertainty. The entropy we talk about here was introduced by Claude E. Shannon in his 1948 paper "A Mathematical Theory of Communication". We also call it Shannon entropy or information entropy to distinguish from other occurrences of the term, which appears in various parts of physics in different forms.
Named after Boltzmann‘s H-theorem, Shannon defined the entropy Η (Greek letter Η, η) of a discrete random variable X with possible values {x1, x2, ..., xn} and probability mass function P(X) as:
Here E is the expected value operator. When taken from a finite sample, the entropy can explicitly be written as
Where b is the base of the logarithm used. Common values of b are 2, Euler‘s number e, and 10. The unit of entropy is bit for b = 2, natfor b = e, and dit (or digit) for b = 10 respectively.
In the case of P(xi) = 0 for some i, the value of the corresponding summand 0 logb(0) is taken to be a well-known limit:
Your task is to calculate the entropy of a finite sample with N values.