DOI: 10.3390/bdcc10070202 ISSN: 2504-2289

Part-of-Speech Context Vectors: Approximating Distributional Meaning of Syntactic Category Symbols

Xiaona Ma, Carl Vogel

Words occurring in similar contexts have been observed to have similar meanings. A natural and established method within computational linguistics implements this observation by representing words as vectors whose dimensions are determined by the words witnessed in fixed positions relative to the target word. We generalize this context vector approach to the part-of-speech (POS) sequences that correspond to word sequences. As with words, the context of a POS tag (the POS tags occurring before and after a target tag) reflects its syntactic constraints and may approximate the "meaning" of the target tag from a distributional perspective. We use the 111-million-word British National Corpus (BNC) and the sequence of POS labels lifted from those texts to calculate POS context vectors. We observe significant agreement between the clusters of POS context vectors and the supercategories of the corresponding POS tags, and we examine the potential categorization of POS categories that emerges from the vector clusters. We also find that, although vector measures partially align with the predictions of generativist linguistic theories, the approach suggests a more complex relation between syntactic categories. We conclude that a mutual-information-based approach approximates the distributional "meaning" of syntactic categories better than the conditional probability distribution of POS symbols does.
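
As an illustration of the construction described above, the following is a minimal sketch rather than the authors' implementation. It counts, for each target tag, the tags seen at fixed offsets before and after it, and then derives two weightings: a conditional probability P(context | target) and pointwise mutual information (PMI). The function names, the window size of 1, the toy tag sequence, and the choice of PMI as the "mutual-information-based" weighting are all assumptions made for illustration; the abstract does not specify them.

```python
from collections import Counter
import math


def pos_context_vectors(tags, window=1):
    """Count context tags at each relative offset around every target tag.

    Returns {target_tag: Counter({(offset, context_tag): count})}.
    The window size is an illustrative assumption.
    """
    counts = {}
    for i, target in enumerate(tags):
        vec = counts.setdefault(target, Counter())
        for off in range(-window, window + 1):
            j = i + off
            if off != 0 and 0 <= j < len(tags):
                vec[(off, tags[j])] += 1
    return counts


def conditional_probabilities(counts):
    """Weight each dimension by P(context | target)."""
    out = {}
    for target, vec in counts.items():
        total = sum(vec.values())
        out[target] = {ctx: n / total for ctx, n in vec.items()}
    return out


def pmi_vectors(counts):
    """Weight each dimension by PMI = log2(P(t, c) / (P(t) * P(c))).

    PMI is an assumed stand-in for a mutual-information-based weighting.
    """
    total = sum(sum(vec.values()) for vec in counts.values())
    ctx_totals = Counter()
    for vec in counts.values():
        ctx_totals.update(vec)
    out = {}
    for target, vec in counts.items():
        t_total = sum(vec.values())
        out[target] = {
            ctx: math.log2((n / total) / ((t_total / total) * (ctx_totals[ctx] / total)))
            for ctx, n in vec.items()
        }
    return out


if __name__ == "__main__":
    # Hypothetical toy tag sequence, not BNC data.
    toy = ["DET", "NOUN", "VERB", "DET", "NOUN"]
    raw = pos_context_vectors(toy)
    print(conditional_probabilities(raw)["DET"])
    print(pmi_vectors(raw)["DET"])
```

The conditional-probability weighting depends only on how often a context follows or precedes the target, whereas PMI also discounts contexts that are frequent regardless of the target, which is one plausible reason the two weightings could cluster tags differently.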
