Rage against the AI machine: Can generative AI understand Donald Trump-related posts as well as humans?
Dhiraj Murthy, Jeremy Shermak, Shaurya Pathania, Kami Vinton, Kelsey Whipple, Lesley Willard, Rachel Madison
Global socio-political discourse on social media is often volatile and divisive. Nonetheless, social media remains central to political communication, becoming its own public square with the unique distinction of also intersecting with the multiverse of public squares across nations and platforms. While the sociological value of studying this discourse is undeniable, deciphering the meaning of large corpora of polarized political content remains elusive. Machine learning methods can tackle large datasets but are limited in their ability to decipher the highly convoluted and nuanced meanings associated with language. Using a historical dataset of Donald Trump-related tweets from 2016 to 2017 (immediately before and after his inauguration in January 2017), this study evaluated how well ChatGPT, compared with humans, discerns sentiment in controversial political discourse. Using human coding of sentiment and 10 sociolinguistic attributes, we found that ChatGPT and humans achieved similarly low levels of agreement, not because AI has reached human-level proficiency but because both struggled considerably with this type of polarized political content. Human coding itself fell well short of acceptable reliability standards, meaning AI’s comparable performance reflects a shared deficiency rather than a shared strength. The agreement between human and machine coders (κ = 0.308) fell well short of expectations and should not be interpreted as evidence of AI adequacy. We highlight the most problematic areas for machine learning, such as its inability to analyze tweets because of special characters, tweet length, and links. For human coders, misspellings and abbreviations caused confusion.
Our findings urge caution in deploying large language model (LLM)-based sentiment analysis of politically divisive discourse in the near future.