DOI: 10.3390/e28070732 ISSN: 1099-4300

An Information-Geometric Justification for Composite Coherence in Event-Based Narrative Extraction

Brian Keith-Norambuena

Graph-based narrative extraction relies on a coherence function to score transitions between events, but the coherence metrics in current use are defined operationally and lack an information-theoretic foundation. We study the composite metric $C=\sqrt{A\cdot T}$, where $A$ is the angular similarity of document embeddings and $T=1-d_{JS}$ is the topic proximity obtained from the Jensen–Shannon distance between soft cluster memberships. We provide an information-geometric reading of this metric together with an axiomatic characterization of the geometric-mean combinator. On the product manifold $S^{d-1}\times\Delta_{+}^{K-1}$, the negative log-coherence decomposes additively into an angular cost and a topic cost. Because the Riemannian metric tensor induced by the Jensen–Shannon distance on the simplex is proportional to the Fisher information matrix, the topic component is locally consistent with the Fisher–Rao metric singled out by Chentsov’s theorem. Within a parametric family of combinators (the compensability spectrum), the geometric mean is the unique combinator consistent with four natural axioms (a boundary/veto condition, symmetry, log-additivity, and normalization), and the construction also motivates a proper product metric $d_{\times}$ that we use as a reference distance. Experiments cover four corpora spanning news and academic domains (40 to 6000 documents), three general-purpose embedding families (GPT-4/ada-002, MPNet, MiniLM-L6) plus citation-aware SPECTER2, and three alternative topic models (LDA, soft k-means, GMM). The results are consistent with the framework: the Fisher identity holds with $R\geq 0.99$, the geometric mean tracks $d_{\times}$ closely ($\rho=0.999$), and a downstream LLM-as-judge consistency check shows that the geometric mean is not empirically dominated by any alternative combinator or single-channel baseline.
When the compensability spectrum is swept, the bottleneck-coherence gap between extracted storylines and random sequences splits into a symmetric component and a displacement term. The symmetric component is maximized at the geometric mean on the four corpora above and on a fifth, human-navigation corpus, and a cross-modal case study on a human-curated image narrative reproduces the same effect in a second modality. Together, these results provide an information-geometric justification for the composite coherence metric and articulate the conditions under which the geometric mean is the natural choice.
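
As a reading aid, the composite coherence described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the paper's implementation: it assumes the geometric-mean form C = sqrt(A·T), takes angular similarity to be 1 − θ/π for the angle θ between two embeddings, and uses base-2 logarithms so that the Jensen–Shannon distance lies in [0, 1]. The function names are hypothetical.

```python
import math

def angular_similarity(u, v):
    """Angular similarity in [0, 1], assumed here to be 1 - angle(u, v) / pi."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return 1.0 - math.acos(cos) / math.pi

def js_distance(p, q):
    """Jensen-Shannon distance (square root of the divergence), base-2 logs."""
    def kl(a, b):
        # Terms with zero mass in `a` contribute nothing to the divergence.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return math.sqrt(max(0.0, 0.5 * kl(p, m) + 0.5 * kl(q, m)))

def coherence(u, v, p, q):
    """Composite coherence as the geometric mean of the two channels (assumed form)."""
    a = angular_similarity(u, v)
    t = 1.0 - js_distance(p, q)
    return math.sqrt(a * t)

# Hypothetical inputs: 2-d embeddings and 2-topic soft memberships.
u, v = [1.0, 0.0], [0.0, 1.0]   # orthogonal embeddings, so A = 0.5
p, q = [0.7, 0.3], [0.4, 0.6]   # overlapping topic memberships
a = angular_similarity(u, v)
t = 1.0 - js_distance(p, q)
c = coherence(u, v, p, q)
```

Under these assumptions the two properties highlighted in the abstract are easy to check: the negative log-coherence is the average of the two channel costs, −log C = ½(−log A) + ½(−log T), and either channel can veto a transition, since disjoint topic memberships give T = 0 and therefore C = 0 regardless of A.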
