DOI: 10.1145/3816773 ISSN: 2573-0142

Controlling Semantic Consensus and Lexical Density in Image-Based Evaluations for Mobile Input Methods

EICS021

Andreas Komninos, Ioulia Simou, John Garofalakis

Mobile input methods (IMEs) for text entry in interactive applications are typically evaluated in lab settings, with phrase transcription tasks serving as the de facto procedural element in the validation stage of their development. Transcription tasks offer strong internal validity but weak ecological and external validity, owing to the composition constraints they impose on participants. Free composition tasks using image stimuli have been proposed as an alternative, but they lack standardized controls for regulating the volume and quality of elicited text, which limits the internal validity of such evaluations. In this paper, we propose a methodology for rigorous, computational sampling of image stimuli from large image datasets, aiming to control the lexical and semantic similarity of user-generated text during IME evaluations. We further deploy generative AI diffusion models to methodically derive stylistic variations of the sampled images, allowing us to examine the effects of original images, abstractive styles, and distractor elements on user input. Our findings from an online image description study with crowdsourced participants (N = 100) and 3,000 captured input samples demonstrate that stimulus sampling and restyling can be used as a methodological control apparatus for evaluating input methods in lab settings. This apparatus allows researchers to systematically control the density and diversity of participants’ input, mitigating the internal validity challenges of image-based task evaluations. We present evidence-based guidelines for selecting image stimuli and release our datasets, experiment application, analysis code, and participant descriptions.