Social Data Science Lab Publishes Ethics Guide for Using Social Media Data

7th January 2018


The journal Sociology publishes the Lab’s work on ethics.

Communications and connections harvested from social media networks are becoming part of the social scientist’s data diet. Since 2011 the Social Data Science Lab at Cardiff University has been collecting tweets posted around national and global events using the in-house developed COSMOS software. These data, amounting to over five billion individual tweets, have been subject to analysis using an innovative blend of computational and social science techniques.

The research portfolio has focused on the area of risk and safety, in particular social tensions, online hate speech, mental health, demographic estimation and crime and security. Tweets collected around these topics create datasets that contain sensitive content, such as extreme political opinion, grossly offensive comments, and threats to life. Handling these data in the process of analysis (such as classifying content as hateful and potentially illegal) and writing about them has brought the ethics of using social media in social research into sharp focus.

Early in the research we realised that many of the learned societies’ ethical resources offered little guidance, given their focus on non-digital data. Where addenda on using Internet data had been written, they had little to say about social media. Papers were being published in reputable journals with tweets quoted verbatim, with unacceptable and ineffective methods of anonymisation, and without informed consent from users. These researchers may have been satisfied by Twitter’s Terms of Service, which specifically state that users’ public posts will be made available to third parties, and that by accepting these terms users legally consent to this. However, given the sensitive nature of some of these data, we argue that researchers must interpret and engage with these commercially motivated terms of service through the lens of social science research, which implies a more reflexive approach than that provided in legal accounts of the permissible use of these data in publications. This necessitates taking account of users’ expectations, the effect of context collapse and online disinhibition on the behaviour of users, and the functioning of algorithms in generating potentially sensitive personal information.

Research on users’ views of the repurposing of their social media data consistently shows that the majority wish to be asked for informed consent if their content is to be published outside of the platform for which it was intended. This expectation may be at odds with the perceived ‘public’ nature of these networks, but we know that users’ conceptions of what is public and what is private are blurred in online communications. Internet interactions are shaped by ephemerality, anonymity, a reduction in social cues and time–space distanciation. The disinhibiting effect of computer-mediated communication means Internet users, while acknowledging the environment as a (semi-)public space, often use it to engage in what could be considered private talk. Twitter folds multiple audiences into a flattened context. This ‘context collapse’ creates tensions when behaviours and utterances intended for an imagined limited audience are exposed to whole actual audiences.

Online information is often intended only for a specific (imagined) public made up of peers, a support network or specific community, not necessarily the Internet public at large, and certainly not for publics beyond the Internet. When it is presented to unintended audiences it has the potential to cause harm, as the information is flowing out of the context it was intended for. Informed consent to publish is further warranted given the abundance of sensitive data that are generated and contained within these online networks. Potential for harm in social media research increases when sensitive data are published along with the content of identifiable communications without consent. In some cases, such information is knowingly placed online, while in other cases, sensitive information is not knowingly created by users, but it can often come to light in analysis where associations are identified between users and personal characteristics are estimated by algorithms. If published alongside identifiable posts without consent, these classifications may stigmatise users and potentially cause further harm.

In line with the points raised above, we propose that researchers conduct a risk assessment ahead of publishing tweets in research outputs. The decision flow chart here is designed to assist researchers in reaching a decision on whether or not to publish a tweet, and in what contexts informed consent (opt-in or opt-out) may be required.

Text taken from: Williams, M. L., Burnap, P. & Sloan, L. (2017) ‘Towards an ethical framework for publishing Twitter data in social research: taking into account users’ views, online context and algorithmic estimation’, Sociology, Vol. 51(6), pp. 1149–1168.