DOI: 10.1017/s0140525x26104609 ISSN: 0140-525X
Rich data drive generalization: Lessons from machine learning for linguistics and cognitive science
Andrew Kyle Lampinen
Abstract
The diversity of variation captured in data can strongly affect the generalization of a learning system – even when that variation occurs along axes orthogonal to the generalization in question. Thus, I argue that data richness both distinguishes current language models from prior linguistic models and may still underlie their remaining linguistic data inefficiency.