Principle of maximum entropy
The principle of maximum entropy states that, among all probability distributions consistent with a given set of constraints (such as normalization or specified expectation values), the distribution that maximizes Shannon entropy should be selected. This yields the least committal distribution compatible with the known constraints, introducing no structure beyond what is logically implied by the available information.
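As a concrete illustration, consider a die whose faces 1–6 are constrained to have a prescribed mean. The following minimal sketch (not drawn from a specific reference; the target mean of 4.5 and the use of NumPy/SciPy are illustrative assumptions) finds the maximum-entropy distribution, which takes the exponential-family form p_i ∝ exp(λ·x_i) with the multiplier λ chosen to satisfy the mean constraint.

```python
# Minimal sketch (assumed setup): maximum-entropy distribution on the faces
# {1, ..., 6} of a die constrained to have mean 4.5 (an assumed example).
# The maximizer has the exponential-family form p_i ∝ exp(lam * x_i), and the
# multiplier lam is chosen so the mean constraint holds.
import numpy as np
from scipy.optimize import brentq

x = np.arange(1, 7)          # support: faces of the die
target_mean = 4.5            # assumed expectation-value constraint

def p_given_lambda(lam):
    w = np.exp(lam * x)
    return w / w.sum()

# Solve for the Lagrange multiplier that meets the mean constraint.
lam = brentq(lambda l: p_given_lambda(l) @ x - target_mean, -10.0, 10.0)
p = p_given_lambda(lam)

entropy = -(p * np.log(p)).sum()
print("max-entropy p:", np.round(p, 4))
print("mean:", round(p @ x, 4), "entropy (nats):", round(entropy, 4))
```

With no constraint beyond normalization (λ = 0), the same construction returns the uniform distribution, matching the intuition that maximum entropy adds no structure beyond what the constraints impose.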
The justification is that entropy measures the expected information content (expected surprisal, −log p) of an outcome relative to a specified reference measure. Maximizing entropy ensures that no additional structure is imposed beyond the stated constraints. Any lower-entropy alternative would encode extra regularity not required by those constraints and would therefore amount to introducing unsupported information.
It is important that entropy be defined relative to a specified measure or prior. In discrete cases, Shannon entropy is defined relative to the counting measure (or an explicitly specified prior weighting). In continuous cases, differential entropy depends on the choice of coordinates and is not invariant under reparameterization. For this reason, the principled continuous formulation maximizes relative entropy (equivalently, minimizes the Kullback–Leibler divergence) with respect to a specified reference measure or prior density m(x), typically by maximizing

H(p ∥ m) = −∫ p(x) log( p(x) / m(x) ) dx

subject to the given constraints. This formulation is invariant under changes of variables and makes explicit the role of the underlying prior measure.
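To show how the reference measure enters, the sketch below (with an assumed discretized support, an assumed Gaussian-shaped reference weighting m, and an assumed mean constraint) maximizes the relative entropy −Σ p_i log(p_i / m_i) numerically. The maximizer takes the form p_i ∝ m_i·exp(λ·x_i), which reduces to the previous example when m is uniform.

```python
# Minimal sketch (assumed setup): maximize relative entropy
#   H(p ∥ m) = -sum_i p_i log(p_i / m_i)
# on a discretized support with reference weights m_i, subject to a mean
# constraint. The maximizer has the form p_i ∝ m_i * exp(lam * x_i).
import numpy as np
from scipy.optimize import brentq

x = np.linspace(0.0, 10.0, 201)          # assumed discretized support
m = np.exp(-0.5 * (x - 5.0) ** 2)        # assumed reference weighting (unnormalized)
m /= m.sum()
target_mean = 6.0                        # assumed expectation-value constraint

def p_given_lambda(lam):
    logw = np.log(m) + lam * x
    logw -= logw.max()                   # guard against overflow
    w = np.exp(logw)
    return w / w.sum()

# Solve for the Lagrange multiplier that meets the mean constraint.
lam = brentq(lambda l: p_given_lambda(l) @ x - target_mean, -20.0, 20.0)
p = p_given_lambda(lam)

rel_entropy = -(p * np.log(p / m)).sum()  # equals -KL(p ∥ m); at most 0, and 0 iff p = m
print("mean:", round(p @ x, 3))
print("relative entropy (-KL):", round(rel_entropy, 4))
```

Because the objective is a divergence between p and m rather than a coordinate-dependent integral, the same optimum is obtained under any smooth reparameterization of x, which is the invariance property noted above.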