Lab Online Guide to Social Media Research Ethics
Ethics is an issue that is becoming increasingly salient in research using social media data. The digital revolution has outpaced parallel developments in research governance and agreed good practice. Codes of ethical conduct that were written in the mid twentieth century are being relied upon to guide the collection, analysis and representation of digital data in the twenty-first century. Social media is particularly ethically challenging because of the open availability of the data (particularly from Twitter). Many platforms’ terms of service specifically state that users’ public data will be made available to third parties, and by accepting these terms users legally consent to this. However, researchers must interpret and engage with these commercially motivated terms of service through the lens of social science, which implies a more reflexive approach than that provided in legal accounts of the permissible use of these data.
Social media researchers have experimented with data from a range of sources, including Facebook, YouTube, Flickr, Tumblr and Twitter to name a few. Twitter is by far the most studied of all these networks. This is because Twitter differs from other networks such as Facebook that are organised around groups of ‘friends’, in that it is more ‘open’ and the data (in part) are freely available to researchers. This makes Twitter a more public digital space that promotes the free exchange of opinions and ideas. Twitter has become the primary space for online citizens to publicly express their reaction to events of national significance, and also the primary source of data for social science research into digital publics. The Twitter streaming Application Programming Interface (API) provides three levels of data access: the free random 1% that provides ~5M tweets daily and the random 10% and 100% (chargeable or free to academic researchers upon request). Datasets on social interactions of this scale, speed and ease of access have been hitherto unrealisable in the social sciences, and have led to a flood of conference papers and journal articles, many of which include tweets with full text content and author identity without informed consent. This is presumably because of Twitter’s ‘open’ nature, which leads to the assumption that ‘these are public data’ and therefore do not warrant the rigour and scrutiny of an ethics panel. Even when these data are scrutinised, ethics panels may be convinced by the ‘public data’ argument, due to the lack of a framework to evaluate the potential harms to users. The Social Data Science Lab takes a more ethically reflexive approach to the use of social media data in social research, and carefully considers users’ perceptions, online context and the role of algorithms in estimating potentially sensitive user characteristics.
A recent Lab survey conducted into users’ perceptions of the use of their social media posts (Williams et al. 2017) found the following:
These survey findings show that there may be a disjuncture between the current practices of social researchers in relation to publishing the content of Twitter posts, and users’ views of the fair use of their online communications in publications and their rights as research subjects. Much of this disconnection seems to stem from what is perceived as public in online communications, and therefore what can be published as data without consent or the protection of anonymisation. Existing ethical guidelines that provide principles for research in public places focus on traditional forms of data and data collection. Most guidelines (e.g. BPS, BSA, International Visual Sociology Association) stress that consent, confidentiality and anonymity are often not required where the research is conducted in a public place where people would reasonably expect to be observed by strangers. However, the perceptions of the majority of users of Twitter clearly differ from this viewpoint. This is most likely because Twitter blurs the boundary between public and private space.
A social media researcher’s point of view must take into account the unique nature of this online public environment. Internet interactions are shaped by ephemerality, anonymity, a reduction in social cues and the realisation of time-space distanciation, leading individuals to reveal more about themselves within online environments than would be done in offline settings, blurring the public and the private (Joinson 1998; Lash 2001; Williams 2006). Research has highlighted the disinhibiting effect of computer-mediated communication, meaning Internet users, while acknowledging the environment as a (semi) public space, often use it to engage in what would be considered private talk. Online information is often intended only for a specific networked public made up of peers, a support network or specific community, not necessarily the Internet public at large, and certainly not for publics beyond the Internet (boyd 2014). When it is viewed by unintended audiences it has the potential to cause harm, as the information is flowing out of the context it was intended for (Nissenbaum 2008; Barocas & Nissenbaum 2014). In the final analysis, we may be satisfied with the AoIR (2012: 7) guidelines, which state that social, academic and regulatory delineations of the public-private divide may not hold in online contexts and as such ‘privacy is a concept that must include a consideration of expectations and consensus’ within context.
Informed consent and anonymity are further warranted given the abundance of sensitive data that are generated and contained within these online networks. Lab research shows associations between sexual orientation, ethnicity and gender and feelings of concern and expectations of anonymity. A principal ethical consideration in most learned society guidelines on digital social research is to ensure the maximum benefit from findings whilst minimising the risk of actual or potential harm during data collection, analysis and publication. Potential for harm in social media research increases when sensitive data are estimated. These data can include personal demographic information (such as ethnicity and sexual orientation), information on associations (such as membership of particular groups or links to other individuals known to belong to such groups) and communications of an overly personal or harmful nature (such as details on morally ambiguous or illegal activity and expressions of extreme opinion). In some cases such information is knowingly placed online, whether or not the user is fully aware of who has access to this information and how it might be repurposed. In other cases sensitive information is not knowingly created by users, but it can often come to light in analysis where associations are identified between users and personal characteristics are estimated by algorithms (van Dijck 2013; Sloan et al. 2015).
If we are to balance the privacy of Twitter users (accepting the disinhibiting nature of the environment and the abundance of sensitive information) with the needs of research, a sensible way forward would be to collect data without explicit consent but to seek informed consent for all directly quoted content in publications. The alternative of providing anonymity to directly quoted users is not practical in this form of research, due to Twitter guidelines and the issue of online search (where quoted text is easily searchable, rendering users and their partners in conversation identifiable). In the case of the reproduction of tweets (public display of tweets by any and all means of media) Twitter (2016) Broadcast guidelines state that publishers should:
If researchers are to abide by these guidelines, informed consent should be sought from each tweeter to directly quote their post in research outputs, given that anonymity is not advised. This is particularly important considering Twitter’s view that users retain rights to the content they post. The issue of deletion and the ‘right to be forgotten’ further buttress the need for consent to directly quote. Twitter (2015) terms of service for the use of their APIs by developers require that data harvesters honour any future changes to user content, including deletion. As academic papers cannot be edited continuously post publication, this condition further complicates direct quotation without consent (not to mention the burden of checking content changes on a regular basis). However, researchers should not conclude that conventional representation of qualitative data in social media research is precluded. As in conventional qualitative research, researchers can make efforts to gain informed consent from a limited number of posters if verbatim examples of text are required.
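For researchers who manage collected tweets programmatically, the two conditions discussed above (honouring deletions and quoting only with consent) can be expressed as a simple filter over a stored dataset. The following is a minimal sketch only, not a description of the Lab's own tooling. The `StoredTweet` record, the `consent_to_quote` flag and both function names are hypothetical, and the set of deleted tweet IDs is assumed to come from whatever compliance check the researcher already runs. No call to any real Twitter API is shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredTweet:
    """Hypothetical record for a tweet held in a research dataset."""
    tweet_id: str
    author: str
    text: str
    # Assumption: set to True only once the author has given informed
    # consent for this specific post to be quoted verbatim.
    consent_to_quote: bool = False


def prune_deleted(stored, deleted_ids):
    """Drop tweets that the author has since deleted from the platform."""
    deleted = set(deleted_ids)
    return [t for t in stored if t.tweet_id not in deleted]


def quotable_tweets(stored, deleted_ids):
    """Return only tweets that are still live and have consent to quote."""
    return [t for t in prune_deleted(stored, deleted_ids) if t.consent_to_quote]
```

In this sketch the full pruned dataset remains available for aggregate analysis, while only the (typically much smaller) consented subset is eligible for verbatim quotation in a publication.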
References and Further Reading:
AOIR (2012) Ethical Decision-Making and Internet Research: Version 2.0 – Recommendations from the Association of Internet Researchers Working Committee. Available here
Barocas, S. and Nissenbaum, H. (2014) ‘Big Data’s End Run Around Procedural Privacy Protections’ Communications of the ACM, 57(11): 3-33
Bassett, E. & O’Riordan (2002) ‘Ethics of Internet Research: Contesting the human subjects research model’, Ethics and Information Technology, 4:3 Available here
Beninger, K., Fry, A., Jago, N., Lepps, H., Nass, L. and Silvester, H. (2014) Research Using Social Media: Users’ Views, London: NatCen.
boyd, d (2014) It’s Complicated: Social Lives of Networked Teens, New Haven, CT: Yale University Press
Burnap, P. Williams, M. L. Rana, O., Edwards, A., Avis, N., Morgan, J. et al. (2014) COSMOS: Towards an integrated and scalable service for analysing social media on demand, International Journal of Parallel, Emergent and Distributed Systems.
D’Arcy, A. & Young, T. M. (2012) ‘Ethics and Social Media: Implications for sociolinguistics in the networked public’, Journal of Sociolinguistics, 14:4 Available here
Duncan-Daston, R., Hunter-Sloan, M., Fulmer, E. (2013) ‘Considering the ethics implications of social media in social work education’, Ethics and Information Technology, 15:1. Available here
Eurobarometer Survey 359. (2011) Attitudes on Data Protection and Electronic Identity in the European Union. Brussels, June.
Henderson, M., Johnson, N., Glenn, A. (2013) ‘Silences of ethics practice: Dilemmas for researchers using social media’, Educational Research and Evaluation,19:6. Available here
International Visual Sociology Association (2009). “IVSA Code of Research Ethics and Guidelines”. Visual Studies, Vol. 24(3), p. 250-257.
Markham, A. (2013) ‘Fieldwork in social media: What would Malinowski do?’, Journal of Qualitative Communication Research, 2:4. Available here
Markham, A. (2013) ‘Remix culture, remix methods: Reframing qualitative inquiry for social media contexts’, In Denzin, N., & Giardina, M. (Eds.). Global Dimensions of Qualitative Inquiry, Walnut Creek, CA: Left Coast Press.
Markham, A. & Lindgren, S. (2013) ‘From Object to Flow: Network Analysis, Symbolic Interaction, and Social Media’, Studies in Symbolic Interactionism. Available here
Markham, A. (2012) ‘Moving into the flow: Using a network perspective to explore complexity in Internet contexts’ In S. Lomborg (Ed.) Network Analysis: Methodological Challenges, Aarhus, Denmark: University of Aarhus Center for Internet Research Monograph Series. Available here
Markham, A. (2012) ‘Fabrication as ethical practice: Qualitative inquiry in ambiguous internet contexts’, Information, Communication and Society, 15:3. Available here
Markham, A. (2011) ‘Internet Research’, In Silverman, D. (Ed.). Qualitative Research: Theory, Method, and Practices, 3rd Edition. London: Sage.
Metcalf, J. and Crawford, K. (2016) ‘Where are Human Subjects in Big Data Research? The Emerging Ethics Divide’, Big Data and Society, 1-14.
Murthy, D. (2008) ‘Digital Ethnography: An Examination of the Use of New Technologies for Social Research’, Sociology, 42:5. Available here
Narayanan, A., & Shmatikov, V. (2009) ‘De-anonymizing social networks’, IEEE Symposium on Security & Privacy, Oakland, CA. Available here
Narayanan, A., & Shmatikov, V. (2008) ‘Robust de-anonymization of large sparse datasets (How to break anonymity of the Netflix prize dataset.)’, IEEE Symposium on Security & Privacy, Oakland, CA. Available here
Neuhaus, F. & Webmoor, T. (2012) ‘Agile Ethics for Massified Research and Visualisation’, Information, Communication and Society, 15:1. Available here
NatCen (2014) Researchers Using Social Media; Users’ Views, London: NatCen. Available here
NatCen (2013) Social Media, Social Science & Research Ethics, London. Available here
Nissenbaum, H. (2010) Privacy in context: Technology, Policy, and the Integrity of Social Life. Stanford: Stanford University Press.
O’Connor, D. (2013) ‘The apomediated world: Regulating research when social media has changed research’, Journal of Law, Medicine and Ethics, 41:2. Available here
O’Neill, N. (2013) ‘Who cares? Practical ethics and the problem of underage users on social networking sites’, Ethics and Information Technology, 15:4. Available here
Parker, M. (2010) ‘Ethical and moral dimensions of e-research’, in Dutton, W. & Jeffreys, P. (Eds), World Wide Research: Reshaping the Sciences and Humanities, Cambridge, MA: MIT Press.
Pentland, A. (2009) ‘Reality Mining of Mobile Communication: Toward a New Deal on Data’, in Soumitra Dutta and Irene Mia (eds.) Global Information Technology Report 2008-2009: Mobility in a Networked World, World Economic Forum.
Ruppert, E. (2015) ‘Who Owns Big Data’, Discover Society, 23.
Ruppert, E., Law, J. and Savage, M. (2013) ‘Reassembling social science methods: the challenge of digital devices’, Theory, Culture & Society, 30(4), 22-46.
Sloan, L., Morgan, J., Williams, M.L., Burnap, P., Rana, O. et al. (2013) ‘Knowing the Tweeters: Deriving Sociologically Relevant Demographics from Twitter’, Sociological Research Online, 18:3. Available here
Stewart, K. F. and Williams, M. L. 2005. Researching online populations: The use of online focus groups for social research. Qualitative Research 5(4), pp. 395-416.
Tinati, R., Halford, S., Carr, L. and Pope, C. (2014) ‘Big Data: Methodological Challenges and Approaches for Sociological Analysis’, Sociology, 48(4): 663-681.
Townsend, L. and Wallace, C. (2016) Social Media Research: A Guide to Ethics, University of Aberdeen.
Twitter (2015) Developer Agreement, San Francisco: Twitter, Available at: https://dev.twitter.com/overview/terms/agreement-and-policy
Twitter (2016) Broadcast Guidelines, San Francisco: Twitter, Available at: https://about.twitter.com/en-gb/company/broadcast
Wasike, J, (2013) ‘Social Media Ethical Issues: Role of a Librarian’, Library Hi Tech News, 30:1. Available here
Williams, M. L., Burnap, P. & Sloan, L. (2017) ‘Towards an ethical framework for publishing Twitter data in social research: taking into account users’ views, online context and algorithmic estimation’, Sociology, Available here
Wilson, R. E., Gosling, S. D., and Graham, L. T. (2012) ‘A Review of Facebook Research in the Social Sciences’, Perspectives on Psychological Science, 7:3. Available here
Zimmer, M. (2010) ‘“But the data is already public”: on the ethics of research in Facebook’, Ethics and Information Technology, 12:4. Available here