Quantifying social organization and political polarization in online platforms

Mass selection into groups of like-minded individuals may be fragmenting and polarizing online society, particularly with respect to partisan differences1,2,3,4. However, our ability to measure the social makeup of online communities and, in turn, to understand the social organization of online platforms is limited by the pseudonymous, unstructured and large-scale nature of digital discussion. Here we develop a neural-embedding methodology to quantify the positioning of online communities along social dimensions by leveraging large-scale patterns of aggregate behaviour. Applying our methodology to 5.1 billion comments made in 10,000 communities over 14 years on Reddit, we measure how the macroscale community structure is organized with respect to age, gender and US political partisanship. Examining political content, we find that Reddit underwent a significant polarization event around the 2016 US presidential election. Contrary to conventional wisdom, however, individual-level polarization is rare; the system-level shift in 2016 was disproportionately driven by the arrival of new users. Political polarization on Reddit is unrelated to previous activity on the platform and is instead temporally aligned with external events. We also observe a stark ideological asymmetry, with the sharp increase in polarization in 2016 being entirely attributable to changes in right-wing activity. This methodology is broadly applicable to the study of online interaction, and our findings have implications for the design of online platforms, understanding the social contexts of online behaviour, and quantifying the dynamics and mechanisms of online polarization.
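To make the idea of positioning communities along a social dimension concrete, the following is a minimal sketch under stated assumptions. It assumes that a dimension is built by averaging normalised difference vectors between hand-picked seed pairs of communities, and that a community's score is its cosine similarity with that axis. The function names, the toy two-dimensional embedding and the seed pair are hypothetical and chosen only for illustration; the procedure in the paper may differ.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dimension_from_seed_pairs(embedding, seed_pairs):
    """Average the normalised difference vectors of the seed pairs (assumed construction)."""
    size = len(next(iter(embedding.values())))
    total = [0.0] * size
    for low, high in seed_pairs:
        diff = normalize([h - l for l, h in zip(embedding[low], embedding[high])])
        total = [t + d for t, d in zip(total, diff)]
    return normalize(total)

def score(embedding, axis, community):
    """Cosine similarity between a community vector and the dimension axis."""
    vec = normalize(embedding[community])
    return sum(a * b for a, b in zip(vec, axis))

# Hypothetical toy embedding: real community vectors would have many more dimensions.
toy = {
    "left_seed": [1.0, 0.0],
    "right_seed": [0.0, 1.0],
    "leaning_left": [0.9, 0.2],
    "leaning_right": [0.2, 0.9],
}
axis = dimension_from_seed_pairs(toy, [("left_seed", "right_seed")])
scores = {name: score(toy, axis, name) for name in toy}
```

In this toy setup, communities closer to the second seed receive positive scores and those closer to the first receive negative scores.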

All data are available from the pushshift.io Reddit archive28 at http://files.pushshift.io/reddit/. Source data are provided with this paper. Reddit community embedding, social dimension vectors and community scores are available at https://github.com/CSSLab/social-dimensions.

All code is available at https://github.com/CSSLab/social-dimensions. Analyses were performed with Python v3.7, pandas v1.3.3 and Spark v3.0.

Sunstein, C. #Republic: Divided Democracy in the Age of Social Media (Princeton Univ. Press, 2018).

Iyengar, S. & Hahn, K. S. Red media, blue media: evidence of ideological selectivity in media use. J. Commun. 59, 19–39 (2009).

van Alstyne, M. & Brynjolfsson, E. Electronic communities: global villages or cyberbalkanization? In Proc. International Conference on Information Systems 5 https://aisel.aisnet.org/icis1996/5 (1996).

van Dijck, J. The Culture of Connectivity: A Critical History of Social Media (Oxford Univ. Press, 2013).

McLuhan, M. The Gutenberg Galaxy: The Making of Typographic Man (Univ. of Toronto Press, 1962).

Farrell, H. The consequences of the internet for politics. Ann. Rev. Pol. Sci. 15, 35–52 (2012).

Conover, M. D. et al. Political polarization on Twitter. Proc. Intl AAAI Conf. Web Soc. Media 133, 89–96 (2011).

Bail, C. A. et al. Exposure to opposing views on social media can increase political polarization. Proc. Natl Acad. Sci. USA 115, 9216–9221 (2018).

Martin, T. community2vec: vector representations of online communities encode semantic relationships. In Proc. 2nd Workshop on NLP and Computational Social Science 27–31 (2017).

Garg, N., Schiebinger, L., Jurafsky, D. & Zou, J. Word embeddings quantify 100 years of gender and ethnic stereotypes. Proc. Natl Acad. Sci. USA 115, E3635–E3644 (2018).

Bolukbasi, T., Chang, K.-W., Zou, J. Y., Saligrama, V. & Kalai, A. T. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. Adv. Neural Inf. Process. Syst. 29, 4349–4357 (2016).

Caliskan, A., Bryson, J. J. & Narayanan, A. Semantics derived automatically from language corpora contain human-like biases. Science 356, 183–186 (2017).

Kozlowski, A. C., Taddy, M. & Evans, J. A. The geometry of culture: analyzing the meanings of class through word embeddings. Am. Soc. Rev. 84, 905–949 (2019).

Shi, F., Shi, Y., Dokshin, F. A., Evans, J. A. & Macy, M. W. Millions of online book co-purchases reveal partisan differences in the consumption of science. Nat. Hum. Behav. 1, 0079 (2017).

Del Vicario, M. et al. Echo chambers: emotional contagion and group polarization on Facebook. Sci. Rep. 6, 37825 (2016).

Pariser, E. The Filter Bubble: What the Internet is Hiding from You (Penguin, 2011).

Flaxman, S., Goel, S. & Rao, J. M. Filter bubbles, echo chambers, and online news consumption. Public Opin. Q. 80, 298–320 (2016).

Bakshy, E., Messing, S. & Adamic, L. A. Exposure to ideologically diverse news and opinion on Facebook. Science 348, 1130–1132 (2015).

DiMaggio, P., Evans, J. & Bryson, B. Have American’s social attitudes become more polarized? Am. J. Sociol. 102, 690–755 (1996).

Barberá, P., Jost, J. T., Nagler, J., Tucker, J. A. & Bonneau, R. Tweeting from left to right: is online political communication more than an echo chamber? Psychol. Sci. 26, 1531–1542 (2015).

Adamic, L. A. & Glance, N. The political blogosphere and the 2004 US election: divided they blog. In Proc. 3rd International Workshop on Link Discovery 36–43 (2005).

An Examination of the 2016 Electorate, Based on Validated Voters https://www.pewresearch.org/politics/2018/08/09/an-examination-of-the-2016-electorate-based-on-validated-voters/ (Pew Research Center, 2018).

Hawley, G. Making Sense of the Alt-Right (Columbia Univ. Press, 2017).

Simmel, G. Conflict and the Web of Group Affiliations (Free Press, 1955).

Breiger, R. L. The duality of persons and groups. Social Forces 53, 181–190 (1974).

Bourdieu, P. Distinction: A Social Critique of the Judgement of Taste (Routledge, 1984).

Crenshaw, K. W. On Intersectionality: Essential Writings (The New Press, 2017).

Baumgartner, J., Zannettou, S., Keegan, B., Squire, M. & Blackburn, J. The Pushshift Reddit dataset. In Proc. International AAAI Conference on Web and Social Media 14, 830–839 (2020).

Reddit privacy policy Reddit https://www.redditinc.com/policies/privacy-policy (2021).

Kumar, S., Hamilton, W. L., Leskovec, J. & Jurafsky, D. Community interaction and conflict on the web. In Proc. 2018 World Wide Web Conference 933–943 (2018).

Waller, I. & Anderson, A. Generalists and specialists: using community embeddings to quantify activity diversity in online platforms. In Proc. 2019 World Wide Web Conference 1954–1964 (2019).

Levy, O. & Goldberg, Y. Dependency-based word embeddings. In Proc. 52nd Annual Meeting of the Association for Computational Linguistics 2, 302–308 (2014).

Levy, O. & Goldberg, Y. Neural word embedding as implicit matrix factorization. Adv. Neural Inf. Process. Syst. 27, 2177–2185 (2014).

Schlechtweg, D., Oguz, C. & im Walde, S. S. Second-order co-occurrence sensitivity of skip-gram with negative sampling. Preprint at https://arxiv.org/abs/1906.02479 (2019).

This research was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), the Canada Foundation for Innovation (CFI) and the Ontario Research Fund (ORF).

I.W. performed the computational analysis. A.A. and I.W. designed the research, analysed the results and wrote the paper.

Correspondence to Ashton Anderson.

The authors declare no competing interests.

Peer review information Nature thanks Kenneth Benoit, Kate Starbird and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Left: distributions of communities on the age, gender, partisan, and affluence dimensions. Right: the most extreme communities and words on these dimensions. Word scores are calculated by averaging community scores weighted by the number of occurrences of the word in the community in 2017. Community descriptions can be found in the glossary (Supplementary Table 1).
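The word-score calculation described in this caption is a weighted average, which can be sketched as follows. The function name and the toy numbers are hypothetical; the sketch only assumes that each community has a score and that a count of the word's occurrences per community is available.

```python
def word_score(community_scores, word_counts):
    """Average of community scores, weighted by how often the word occurs in each community."""
    total = sum(word_counts.values())
    if total == 0:
        raise ValueError("word does not occur in any community")
    weighted = sum(community_scores[c] * n for c, n in word_counts.items())
    return weighted / total

# Hypothetical example: the word appears once in community "a" and three times in "b".
example = word_score({"a": -1.0, "b": 2.0}, {"a": 1, "b": 3})
```

Here the result is (1 × −1.0 + 3 × 2.0) / 4 = 1.25, so the word sits closer to the community in which it is used more often.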

Scatter plots of the external validations of the gender, partisan, and affluence axes. The gender scores for occupational communities are plotted against the percentage of women in that occupation from the 2018 American Community Survey. The partisan scores for city communities are plotted against the Republican vote differential for that metropolitan area in the 2016 presidential election. The affluence scores of city communities are plotted against the median household income for that metropolitan area from the 2016 US Census. The blue line is the best-fit linear regression for the data; the shaded area represents a 95% confidence interval for the regression estimated using a bootstrap. p-values for correlation coefficients were computed using a two-sided test of Pearson correlation assuming joint normality.
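As an illustration of the kind of validation described in this caption, the sketch below fits a least-squares line, computes a Pearson correlation, and estimates a percentile bootstrap interval. It is a simplification: it bootstraps the slope only rather than the full regression band, the data are synthetic, and all function names are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5

def bootstrap_slope_ci(xs, ys, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the slope, resampling (x, y) pairs with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # a degenerate resample has no defined slope
        slopes.append(fit_line(bx, [ys[i] for i in idx])[0])
    slopes.sort()
    low = slopes[int((1 - level) / 2 * len(slopes))]
    high = slopes[int((1 + level) / 2 * len(slopes)) - 1]
    return low, high

# Synthetic data standing in for (external statistic, community score) pairs.
noise = random.Random(1)
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + noise.gauss(0.0, 1.0) for x in xs]
slope, intercept = fit_line(xs, ys)
r = pearson_r(xs, ys)
low, high = bootstrap_slope_ci(xs, ys)
```

Resampling whole (x, y) pairs keeps each community's score attached to its external statistic, which is why the pairs are drawn together rather than independently.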

Clockwise from left: The gap between university and city communities on the age dimension. The distribution of university and city communities on the age dimension; age is strongly related to label ($r = 0.91$, two-sided $p < 10^{-58}$, $n = 150$, Cohen’s $d = 4.37$). The distribution of left- and right-wing labelled communities on the partisan dimension; partisan is strongly related to label ($r = 0.92$, two-sided $p < 10^{-21}$, $n = 50$, Cohen’s $d = 4.89$). The distribution of explicitly labelled left- and right-wing communities on the partisan-ness axis as compared to the general distribution; there is a large difference in their means (Cohen’s $d = 3.27$). For violin plots, white dot represents median; box represents 25th to 75th percentile; whiskers represent 1.5 times the inter-quartile range; and density estimate (‘violin’) extends to the minima and maxima of the data. p-values for correlation coefficients were computed using a two-sided test of Pearson correlation assuming joint normality.
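For readers unfamiliar with the effect size reported in this caption, the following sketch computes Cohen's d as the difference in group means divided by a pooled standard deviation. The pooled-variance form is an assumption, since the caption does not state which variant was used, and the input values are hypothetical.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Difference in means (b minus a) divided by the pooled sample standard deviation."""
    na, nb = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    pooled = math.sqrt(((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2))
    return (statistics.mean(group_b) - statistics.mean(group_a)) / pooled

# Hypothetical scores for two labelled groups of communities.
d = cohens_d([1.0, 2.0, 3.0], [3.0, 4.0, 5.0])
```

In this toy case both groups have unit variance and their means differ by 2, so d equals 2.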

Distributions of raw age, gender and partisan scores, separated by cluster. Outlier communities that lie more than two standard deviations from the mean are annotated. Dashed lines represent the global mean on each dimension. Community descriptions can be found in the glossary (Supplementary Table 1).

Outlier communities that lie more than two standard deviations from the mean are annotated. Dashed lines represent the global mean on each dimension. Community descriptions can be found in the glossary (Supplementary Table 1).
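The outlier rule used in the two captions above (more than two standard deviations from the mean) can be sketched as below. The function name and community names are hypothetical, and the use of the population standard deviation within a single cluster is an assumption.

```python
import statistics

def find_outliers(scores, k=2.0):
    """Return the z-scores of communities lying more than k standard deviations from the mean."""
    values = list(scores.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return {name: (s - mean) / sd for name, s in scores.items() if abs(s - mean) > k * sd}

# Hypothetical cluster: nine communities at 0 and one far from the rest.
cluster = {"community_%d" % i: 0.0 for i in range(9)}
cluster["extreme_community"] = 10.0
outliers = find_outliers(cluster)
```

With these numbers the mean is 1 and the standard deviation is 3, so only the community at 10 (z = 3) is flagged.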

The relationships between the partisan dimension and (a) gender, (b) age, (c) partisan-ness. Each bar represents a bin of communities with partisan scores a given number of standard deviations from the mean, and the distribution illustrates the scores on the secondary dimension (e.g. gender in (a)). From left to right, the bars represent highly left-wing, leaning left-wing, center, leaning right-wing, and highly right-wing communities. The leftmost and rightmost bars are annotated with the number of communities, and examples of the largest communities, in each group. The hex-plot in (c) illustrates the joint distribution of partisan and partisan-ness scores. Labels correspond to the categorizations used in the polarization analysis.

(a) The partisan distribution of deleted and non-deleted comments in political communities. (b) The proportion of activity that took place in very left-wing ($z < -3$) and very right-wing ($z > 3$) communities over time. (c) Alternate version of Fig. 3a generated using a dataset in which the authorship of all comments was randomly shuffled. Each individual bin distribution is extremely similar to the overall activity distribution, showing that the overall activity distribution is a useful reference point for what bin distributions would look like if there were no tendency for users to comment in ideologically homogeneous communities. (d) Average distributions of political activity for authors of comments in the 25 largest political communities on Reddit (by number of comments). (e) Correlation of users’ average partisan scores over time. Each $(x, y)$ cell represents the correlation between scores of a user in month $t_x$ and that same user in month $t_y$, for all users active in both time periods. A user is only considered active if they make at least 10 comments in a month. (f) The relationship between the proportion of users who polarize and the polarization threshold. The polarization threshold is the number of standard deviations a user must increase in polarization to be considered polarized. Three lines are plotted corresponding to three pairs of months: the pairs of months with the minimum (blue), maximum (orange), and median (green) proportion of users polarized when using a threshold of 1. A threshold of 1 is used in all other calculations. (g) The relationship between the proportion of users who polarize and the comment threshold. The comment threshold is the value used to filter inactive users from the calculation. Users must have at least $x$ comments in each of the two months to be included in the calculation of the proportion of users who polarize. The same three month pairs are plotted as in part (e). There are minimal differences between different thresholds. A threshold of 10 is used in all other calculations.
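The two thresholds in panels (f) and (g) can be illustrated with a small sketch. It assumes that a user's polarization in a month is the absolute value of their mean partisan z-score, and that a user "polarizes" when this value rises by at least the polarization threshold between the two months. The data layout and function name are hypothetical.

```python
def proportion_polarized(month_a, month_b, polarization_threshold=1.0, comment_threshold=10):
    """Share of sufficiently active users whose absolute partisan z-score rose by the threshold.

    month_a and month_b map user -> (number of comments, mean partisan z-score).
    """
    eligible = [
        user for user in month_a
        if user in month_b
        and month_a[user][0] >= comment_threshold
        and month_b[user][0] >= comment_threshold
    ]
    if not eligible:
        return 0.0
    polarized = sum(
        1 for user in eligible
        if abs(month_b[user][1]) - abs(month_a[user][1]) >= polarization_threshold
    )
    return polarized / len(eligible)

# Hypothetical activity for four users in two months.
earlier = {"u1": (12, 0.2), "u2": (15, -0.5), "u3": (3, 0.0), "u4": (20, 1.0)}
later = {"u1": (11, 1.5), "u2": (10, -0.7), "u3": (30, 2.5), "u4": (25, -1.1)}
share = proportion_polarized(earlier, later)
```

In this example u3 is excluded for having fewer than 10 comments in the earlier month, and only u1 moves by at least one standard deviation, giving a share of one third.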

The distribution of political activity on Reddit over time by partisan score. Each bar represents one month of comment activity in political communities on Reddit, and is coloured according to the distribution of partisan scores of comments posted during the month (the partisan score of a comment is simply the partisan score of the community in which it was posted). The top plot includes all activity as in Fig. 3b, while the four following plots decompose this into the subsets of activity authored by particular groups of users. Users are classified based on the average partisan score of their activity in the month 12 months prior into left-wing (having a score at least one standard deviation to the left), right-wing (one standard deviation to the right), or center. Users with no political activity in the month 12 months prior use the label of the most recent month more than 12 months prior in which they had political activity; if they have never had political activity before, they fall into the new / newly political category (bottom).
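The user classification rule in this caption can be sketched as a small lookup. The sketch assumes that months are integer indices, that scores are z-scores with negative values meaning left-wing, and that a user's history only records months with political activity. All names are hypothetical.

```python
def label_from_score(z):
    """Map a mean partisan z-score to a category (sign convention is an assumption)."""
    if z <= -1.0:
        return "left-wing"
    if z >= 1.0:
        return "right-wing"
    return "center"

def classify_user(history, month, lag=12):
    """Label a user for `month` from their activity `lag` months earlier, or the latest month before that.

    history maps month index -> mean partisan z-score, for months with political activity only.
    """
    cutoff = month - lag
    prior_months = [m for m in history if m <= cutoff]
    if not prior_months:
        return "new / newly political"
    return label_from_score(history[max(prior_months)])

# Hypothetical user active in months 0 and 5.
history = {0: -1.4, 5: 0.3}
label_at_12 = classify_user(history, 12)
label_at_20 = classify_user(history, 20)
label_new = classify_user({15: 2.0}, 20)
```

Taking the latest month at or before the cutoff covers both cases in the caption: activity exactly 12 months prior is used when present, and otherwise the most recent earlier month is used.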

(a) Average polarization (absolute $z$-score) of activity in different ideological categories over time. (b) Volume of activity (number of comments) in different ideological categories over time. (c, d) Annual change in polarization in the two partisan activity categories, decomposed into the change attributable to new ($\Delta n$) and existing ($\Delta e$) users as done in Fig. 4.

The relationship between explicitly partisan and implicitly partisan activity (left: left-wing activity; right: right-wing activity). Of users who were first active in an explicitly partisan community at time $m_E$, the proportion of them who were first active in an implicitly partisan community at time $m_I$ is denoted by the colour in cell $(m_E, m_I)$. The line graphs at the top show the total proportion of users who were active in implicitly partisan communities before they were active in an explicitly partisan community (i.e. the sum of each column below the diagonal back to 2005, or the total proportion of users for whom $m_I < m_E$).
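The quantity shown in the line graphs of this caption can be sketched as follows: for each month of first explicit activity, count the share of users whose first implicit activity came strictly earlier. The input dictionaries, integer month indices and function name are hypothetical.

```python
from collections import defaultdict

def implicit_before_explicit(first_explicit, first_implicit):
    """For each first-explicit month m_E, the share of users with m_I < m_E.

    Both arguments map user -> month index of first activity; users missing from
    first_implicit were never active in an implicitly partisan community.
    """
    totals = defaultdict(int)
    earlier = defaultdict(int)
    for user, m_e in first_explicit.items():
        totals[m_e] += 1
        m_i = first_implicit.get(user)
        if m_i is not None and m_i < m_e:
            earlier[m_e] += 1
    return {m_e: earlier[m_e] / totals[m_e] for m_e in totals}

# Hypothetical first-activity months for three users.
shares = implicit_before_explicit(
    {"a": 10, "b": 10, "c": 12},
    {"a": 4, "b": 11, "c": 12},
)
```

Only user "a" was active in an implicitly partisan community before their first explicit activity, so the share is 0.5 for month 10 and 0.0 for month 12.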

This file contains Supplementary Tables 1 and 2.

Waller, I., Anderson, A. Quantifying social organization and political polarization in online platforms. Nature (2021). https://doi.org/10.1038/s41586-021-04167-x

Received: 30 September 2020

Accepted: 19 October 2021

Published: 01 December 2021

DOI: https://doi.org/10.1038/s41586-021-04167-x

Nature (Nature) ISSN 1476-4687 (online) ISSN 0028-0836 (print)