Information Geometry and Asymptotic Theory for SMML Estimators
Enes Makalic, Daniel F. Schmidt

Strict minimum message length (SMML) is an information-theoretic coding principle that represents a continuous statistical model by a finite set of assertions and a partition of the sample space. We show that the SMML objective decomposes into assertion entropy and conditional cross-entropy, balancing the cost of identifying an assertion against the cost of encoding data under the assigned model. For any fixed partition, the optimal codepoint for each cell is the model distribution that minimises Kullback–Leibler (KL) divergence from the data distribution restricted to that cell. Using the local Fisher–Rao geometry of regular parametric models, we show that, under a high-resolution LAN-scale regime, SMML partitions are asymptotically the pullback, through the maximum-likelihood estimator, of weighted Fisher–Rao Voronoi tessellations in parameter space, with assertion probabilities appearing as additive weights. For regular canonical exponential families, SMML codepoints satisfy a moment-matching condition and admit an interpretation as KL/Bregman centroids, while exact SMML cells are pullbacks of convex polyhedra in sufficient-statistic space. Together, these results show that SMML induces a natural information-geometric quantisation linking entropy-based coding, KL projection, and divergence-based Voronoi geometry.
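
The decomposition and the weighted-Voronoi structure described above can be sketched in a few lines. The notation here is an assumption, since the abstract fixes none: r(x) denotes the marginal distribution of the data, {X_j} the partition cells, θ_j the codepoint of cell j, q_j the assertion probability, θ̂(x) the maximum-likelihood estimate, and J the information matrix at θ̂(x). Sums are written for a discrete sample space and would become integrals for continuous data. The quadratic expansion in the last line is a heuristic second-order approximation, not the precise regularity conditions of the paper.

```latex
% Assumed notation: r = marginal data distribution, X_j = cells, theta_j = codepoints.
\begin{align}
q_j &= \sum_{x \in X_j} r(x), \qquad
r_j(x) = \frac{r(x)}{q_j} \quad (x \in X_j), \\
% Expected message length: assertion cost plus detail cost.
I_1 &= -\sum_j q_j \log q_j \;-\; \sum_j \sum_{x \in X_j} r(x) \log f(x \mid \theta_j) \\
    &= H(q) \;+\; \sum_j q_j \, H_{\times}\!\big(r_j,\, f(\cdot \mid \theta_j)\big), \\
% Cross-entropy = entropy + KL, so the best codepoint is a KL projection.
H_{\times}(r_j, f_\theta) &= H(r_j) + D_{\mathrm{KL}}\big(r_j \,\|\, f_\theta\big)
  \;\Longrightarrow\;
  \theta_j = \arg\min_{\theta} D_{\mathrm{KL}}\big(r_j \,\|\, f(\cdot \mid \theta)\big), \\
% For fixed codepoints, each x joins the cell giving the shortest two-part message.
x \in X_j &\iff j \in \arg\min_k \big\{ -\log q_k - \log f(x \mid \theta_k) \big\}, \\
% Heuristic local expansion around the MLE: -log q_k acts as an additive weight.
-\log f(x \mid \theta_k) &\approx -\log f\big(x \mid \hat\theta(x)\big)
  + \tfrac{1}{2}\,\big(\theta_k - \hat\theta(x)\big)^{\!\top} J \,\big(\theta_k - \hat\theta(x)\big).
\end{align}
```

For a canonical exponential family with log f(x | θ) = θᵀT(x) − A(θ) + log h(x), the quantity compared in the assignment rule is affine in T(x), which is consistent with the cells being pullbacks of convex polyhedra in sufficient-statistic space. Setting the gradient of the KL objective to zero gives ∇A(θ_j) = E_{r_j}[T(X)], the moment-matching condition.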