Knowing What We Don’t Know: Model-Based Uncertainty Decomposition for Categorical Sequences
Marc A. Scott, Fulvia Pennoni, Ignacio Bórquez

State sequence analysis of longitudinal categorical data seeks to synthesize pathways through different dimensions of the life course for descriptive, associative and predictive purposes. Given the number and variety of patterns in such data, measures of the dynamic features of sequences are used to characterize them. One, based on the information-theoretic notion of entropy, measures the uncertainty in the state that will be active at a given time. We customize its use to establish the extent to which we are ignorant, or unsure, of what happens next in a dynamic process, conditional on its past. Relying on different Markov chain models for nominal state sequences, we establish multiple measures of uncertainty that allow us to adjust expectations to reflect individual-specific differences and historical information. We establish complementary measures to assess the predictive power of the models in the context of this uncertainty. In so doing, we can summarize and contrast the change in uncertainty associated with different models. As is common in this field, we consider ways in which data can be stratified through demographics and clustering, and how this additional level of partitioning builds a more complete narrative of the social process.
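To make the contrast between uncertainty in the active state and uncertainty conditional on the past concrete, the following is a minimal sketch and not the authors' estimator. It assumes a pooled, time-homogeneous first-order Markov chain estimated by simple transition counts, and the function names and the toy sequences (E for employed, U for unemployed) are hypothetical illustrations.

```python
import math
from collections import Counter


def entropy(probs):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def marginal_entropy(sequences):
    """Entropy of the next state when history is ignored.

    Only states that have a predecessor (positions 1 onward) are counted,
    so the result is directly comparable with conditional_entropy.
    """
    counts = Counter(state for seq in sequences for state in seq[1:])
    total = sum(counts.values())
    return entropy(c / total for c in counts.values())


def conditional_entropy(sequences):
    """Entropy of the next state given the current state.

    Assumption: one first-order Markov chain, pooled over all individuals
    and time points, with transition probabilities estimated from counts.
    """
    pair_counts = Counter()
    for seq in sequences:
        pair_counts.update(zip(seq[:-1], seq[1:]))

    prev_counts = Counter()
    for (prev, _next), c in pair_counts.items():
        prev_counts[prev] += c

    total = sum(pair_counts.values())
    h = 0.0
    for (prev, _next), c in pair_counts.items():
        # weight by the joint frequency, take the log of the transition probability
        h -= (c / total) * math.log2(c / prev_counts[prev])
    return h


# Hypothetical toy data: three short employment histories.
toy_sequences = ["EEEUUEEE", "UUUEEEEE", "EEEEEEUU"]

h_marginal = marginal_entropy(toy_sequences)
h_conditional = conditional_entropy(toy_sequences)
reduction = h_marginal - h_conditional  # uncertainty removed by knowing the current state

print(f"marginal: {h_marginal:.3f} bits")
print(f"conditional: {h_conditional:.3f} bits")
print(f"reduction: {reduction:.3f} bits")
```

Under these assumptions the conditional entropy can never exceed the marginal one, so the difference gives one rough indication of how much a model that uses history reduces uncertainty. Richer models, such as higher-order chains or chains with individual-specific covariates, would change the conditioning set rather than the basic calculation.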