In recent years, the social sciences have witnessed a methodological shift towards using computational tools to automatically extract and analyse complex datasets. Consequently, researchers have been able to examine social phenomena across populations far larger than previously thought possible. These developments have not bypassed sociolinguistics. Increasingly, Natural Language Processing (NLP) tools have been used to examine linguistic phenomena in “big” datasets, including multimillion-word corpora scraped from social media sites such as Twitter and Reddit (e.g. Wieling et al., 2016).
Along with the increasing convergence of sociolinguistics and computer science has come the emergence of a distinct theoretical enterprise, “Computational Sociolinguistics” (henceforth CS) (Nguyen, Seza Doğruöz, Rosé, & de Jong, 2016), which seeks to integrate the methodological approaches of Computational Linguistics (henceforth CL) with the theoretical frameworks of the variationist paradigm. Whereas CL characterizes variation as “noise,” CS conceptualizes variability in text as “social and cultural data” (Nguyen et al., 2016: 537), which can be analysed in relation to the author's attributes, such as their age or gender.