Controlling Semantic Consensus and Lexical Density in Image-Based Evaluations for Mobile Input Methods EICS021
Andreas Komninos, Ioulia Simou, John Garofalakis
Mobile input methods (IMEs) for text entry in interactive applications are typically evaluated in lab settings, using phrase transcription tasks as the de facto procedural element in the validation stage of their development. Transcription tasks offer strong internal validity but weak ecological and external validity, due to the composition constraints imposed on participants. Free composition tasks using image stimuli have been proposed as an alternative, but they lack standardized controls for regulating the volume and quality of elicited text, which limits the internal validity of evaluations. In this paper, we propose a methodology for rigorous, computational sampling of image stimuli from large image datasets, with the aim of controlling lexical and semantic similarity in user-generated text during IME evaluations. We further deploy diffusion generative AI models to methodically derive stylistic variations from the sampled images, allowing us to examine the effects of original images, abstractive styles and distractor elements on user input. Our findings from an online image description study with crowdsourced participants (