Social Data Science Lab Publish Ethics Guide for Using Social Media Data
7th January 2018
The journal Sociology publishes the Lab’s work on ethics.
Communications and connections harvested from social media networks are becoming part of the social scientist’s data diet. Since 2011 the Social Data Science Lab at Cardiff University has been collecting tweets posted around national and global events using the in-house developed COSMOS software. These data, amounting to over five billion individual tweets, have been subject to analysis using an innovative blend of computational and social science techniques.
The research portfolio has focused on the area of risk and safety, in particular social tensions, online hate speech, mental health, demographic estimation and crime and security. Tweets collected around these topics create datasets that contain sensitive content, such as extreme political opinion, grossly offensive comments, and threats to life. Handling these data in the process of analysis (such as classifying content as hateful and potentially illegal) and writing about them has brought the ethics of using social media in social research into sharp focus.
Early on in the research we quickly realised that many of the learned society ethical resources were of little guidance, given their focus on non-digital data. Where addendums on using Internet data were written, they had little to say about social media. Papers were being published in reputable journals with tweets quoted verbatim, with unacceptable and ineffective methods of anonymisation, and without informed consent from users. These researchers may have been satisfied by Twitter’s Terms of Service that specifically state users’ posts that are public will be made available to third parties, and by accepting these terms users legally consent to this. However, given the sensitive nature of some of these data, we argue researchers must interpret and engage with these commercially motivated terms of service through the lens of social science research that implies a more reflexive approach than provided in legal accounts of the permissible use of these data in publications. This necessitates taking account of users’ expectations, the effect of context collapse and online disinhibition on the behaviour of users, and the functioning of algorithms in generating potentially sensitive personal information.
Research on users’ views of the repurposing of their social media data consistently shows that the majority wish to be asked for informed consent if their content is to be published outside of the platform which it was intended for. This expectation may be at odds with the perceived ‘public’ nature of these networks, but we know that users’ conceptions of what is public and private are blurred in online communications. Internet interactions are shaped by ephemerality, anonymity, a reduction in social cues and time–space distanciation. The disinhibiting effect of computer-mediated communication means Internet users, while acknowledging the environment as a (semi-)public space, often use it to engage in what could be considered private talk. Twitter folds multiple audiences into a flattened context. This ‘context collapse’ creates tensions when behaviours and utterances intended for an imagined limited audience are exposed to whole actual audiences.
Online information is often intended only for a specific (imagined) public made up of peers, a support network or specific community, not necessarily the Internet public at large, and certainly not for publics beyond the Internet. When it is presented to unintended audiences it has the potential to cause harm, as the information is flowing out of the context it was intended for. Informed consent to publish is further warranted given the abundance of sensitive data that are generated and contained within these online networks. Potential for harm in social media research increases when sensitive data are published along with the content of identifiable communications without consent. In some cases, such information is knowingly placed online, while in other cases, sensitive information is not knowingly created by users, but it can often come to light in analysis where associations are identified between users and personal characteristics are estimated by algorithms. If published alongside identifiable posts without consent, these classifications may stigmatise users and potentially cause further harm.
In line with the points raised here we propose that researchers conduct a risk assessment ahead of publishing tweets in research outputs. The decision flow chart here is designed to assist researchers in reaching a decision on whether or not to publish a tweet, and in what contexts informed consent (opt-in or opt-out) may be required.
Text taken from: Williams, M. L., Burnap, P. & Sloan, L. (2017) ‘Towards an ethical framework for publishing Twitter data in social research: taking into account users’ views, online context and algorithmic estimation’, Sociology, Vol. 51(6) 1149–1168. Available here
Social Data Science Lab is a Shortlisted Finalist in the 2017 Cardiff University Celebrating Excellence Awards
7 November 2017
Lab Directors Professor Matt Williams and Dr. Pete Burnap recognised for excellence in Innovation and Enterprise.
The Social Data Science Lab is a finalist in the 2017 Cardiff University Celebrating Excellence Awards. The awards recognise groups and individuals who go the extra mile, and acknowledge the contributions made by colleagues from across the University. The Lab is one of three finalists in the Innovation and Enterprise category that recognises an individual or team that has made an outstanding contribution to a thriving innovation culture. This year 195 nominations were received across 15 categories of award. The winners will be announced at an awards dinner on November 9th.
The Lab was nominated by the Schools of Social Science and Computer Science & Informatics because of its innovative work with big social data applied to the fields of crime and security and risk and safety. The Lab has received research funding from EPSRC, ESRC, JISC, DoJ US, DoH, Met Police, Google, Admiral Insurance, and Airbus Group (amounting to 26 grants worth £7.64 million, bringing in over £4.1 million directly to Cardiff University).
The Lab has provided social media analytics tools and services to a range of organisations in over 20 countries including the Metropolitan Police Service, London Mayor’s Office for Policing and Crime, National Police Chiefs’ Council, Police Foundation, Crown Prosecution Service, College of Policing, Ministry of Defence, Foreign and Commonwealth Office, Detroit Crime Commission, US Army and NATO.
Engagement with industry is a key part of the Lab’s research and innovation strategy, and recent partnerships include two ESRC KTPs with Admiral on using Big Data in insurance and the Airbus Centre of Excellence in Cyber Security Analytics. Lab technology co-produced with Airbus won the Lloyds Science of Risk Runner Up Prize in 2015.
Within the past year the Lab has received funding amounting to £700K from the ESRC to study the use of Open Source Communications in academic, government and third sector research settings. Lab staff are also working with the RAND Corporation (LA) on a US Department of Justice grant ($885K FEC) that examines the use of social media communications to estimate offline hate crime patterns in LA County.
The Social Data Science Lab forms part of the ESRC £64M Big Data Network and is part of the University’s Data Innovation Research Institute. The Lab will form part of the University’s Social Science Research Park (SPARK) in its new £300m Innovation Campus.
Professor Matt Williams Presents Exclusive Lab Brexit-related Hate Crime Findings on BBC One’s Panorama
10 October 2017
ESRC funded Lab research forms key evidence in Panorama’s ‘Hate on the Streets’
BBC One’s Panorama programme ‘Hate on the Streets’ takes a hard look at the rising tide of hate since the referendum on the UK’s future in the EU. Producers made contact with the Social Data Science Lab to get expert opinion on the statistical trends on hate crime both on and offline. Home Office and social media data being analysed as part of the Lab’s ESRC New and Emerging Forms of Data (NEFD) Policy Demonstrator Grant formed the empirical evidence for the programme.
On the focus of the programme, Professor Matt Williams said “Official police recorded crime figures show a clear rise in hate offences around the time of the referendum. While increased reporting and better recording practices may be partly accountable, they are unlikely to be solely responsible for the highest recorded spike in hate crime since records began. The significant rise in social media hate speech around the same time period also indicates that rates of perpetration increased. When we observe acts of hate speech online, we see perpetration in action, meaning we are not reliant upon reports from victims or witnesses to inform trends.”
The Lab’s ESRC NEFD project “Centre for Cyberhate Research and Policy” is developing a Hate Speech Dashboard in collaboration with the Metropolitan Police Service, the London Mayor’s Office for Policing and Crime (MOPAC) and the Home Office. The Dashboard will assist analysts in identifying areas that require policy attention and will help improve interventions to stop hate speech from spreading online following ‘trigger events’.
“What recent events have shown us, such as terror attacks and political votes, is that hate crime has a significant temporal component. The referendum acted as a tipping point for some people who hold prejudiced views, but may have never acted upon them. The pro-Brexit, xenophobic narrative from some campaigners, and the vote result, helped galvanise and legitimise prejudiced beliefs, mobilising an otherwise silent minority to take to the streets,” Professor Williams said.
The Lab has already undertaken several preliminary studies regarding the spread of hate speech on social media, most notably around the murder of Lee Rigby in Woolwich in 2013. Within the past year the Lab has received funding amounting to £700K from the ESRC to study the use of Open Source Communications in academic, government and third sector research settings. The Social Data Science Lab forms part of the ESRC £64M Big Data Network and is part of the University’s Data Innovation Research Institute. The Lab will form part of the University’s Social Science Research Park (SPARK) in its new £300m Innovation Campus.
Social Data Science Lab awarded ESRC grant to set up new Centre to monitor Brexit-related hate crime on social media
6 February 2017
Matt Williams and Pete Burnap have been awarded half a million pounds to help the UK Government monitor Brexit-related hate crime on social media.
The new Centre for Cyberhate Research and Policy, funded by the Economic and Social Research Council, will focus on the development of a monitoring tool that displays a live feed of the propagation of hate speech as it happens on Twitter.
It is hoped the UK Government will be able to use the tool to identify areas that require policy attention and to improve interventions to stop hate crime from spreading.
Professor Matthew Williams, the principal investigator on the project and Co-Director of the Social Data Science Lab, said: “Hate crimes have been shown to cluster in time and tend to increase, sometimes significantly, in the aftermath of ‘trigger’ events. The referendum on the UK’s future in the European Union has galvanized certain prejudiced opinions held by a minority of people, resulting in a spate of hate crimes. Many of these crimes are taking place on social media. Over the coming period of uncertainty relating to the form of the UK’s exit, decision makers, particularly those responsible for minimising the risk of social disorder through community reassurance, local policing and online governance, will require near-real-time information on the likelihood of escalation of hateful content spread on social media. This new funding will provide the system and evidence needed to achieve this.”
The research team will use Brexit as a demonstrator of how a certain “trigger” event can quickly lead to the spreading of hate related to religion, immigration and xenophobia online.
The team are collecting data over a 12-month period, starting from 23 June 2016 when the UK voted to leave the European Union. They will use state-of-the-art machine learning technologies to classify, analyse, and evaluate tweets in real-time.
The key innovation stemming from the project is an online monitoring tool that can identify hate speech on social media as soon as it happens after a certain “trigger” event.
This tool will include a dashboard for policy makers and analysts that will provide details of precursors to hate speech, such as type of social media user, characteristics of their network, the type of hate expressed, the content that is posted (such as URLs and hashtags) and external factors such as mass media reporting.
The Centre is working in close partnership with the UK Head of the Cross-Government Hate Crime Programme at the National Police Chiefs’ Council (NPCC), the Online Hate Crime Hub at the London Mayor’s Office for Policing and Crime (MOPAC) and the Metropolitan Police Service, and several leading hate crime charities including Tell MAMA, Faith Matters and Community Security Trust.
Dr Pete Burnap, computational lead on the project and Co-Director at the Social Data Science Lab, said: “To date the information available to government on topics such as hate speech around Brexit has been post-hoc and descriptive. What is needed are open and transparent methods that are replicable, interpretable and applicable in real-time as events are unfolding. We will be enhancing our existing language models using cutting edge computational methods to mine massive amounts of public reaction and provide meaningful insights into hateful and antagonistic commentary within minutes of an event occurring.”
Reports of hate crime both on and offline since the referendum on the UK’s future in the European Union have increased dramatically. In response and as part of the UK Government’s Hate Crime Action Plan, additional resources are being made available to protect places of worship and a review of policing hate crime will be conducted by Her Majesty’s Inspectorate of Constabulary.
Cardiff University’s Social Data Science Lab has world-leading expertise in the use of social media to monitor crime, and has successfully partnered with the Metropolitan Police Service, the Ministry of Justice, the Home Office and the Los Angeles Police Department.
The team have already undertaken several preliminary studies regarding the spreading of hate speech on social media, most notably around the murder of Lee Rigby in Woolwich in 2013. Within the past few months the Lab has received funding amounting to £700K from the ESRC to study the use of Open Source Communications in academic, government and third sector research settings. The Social Data Science Lab forms part of the ESRC £64M Big Data Network and is part of the University’s Data Innovation Research Institute. The Lab will form part of the University’s Social Science Research Park (SPARK) in its new £300m Innovation Campus.
Social Data Science Lab nominated for a National Hate Crime Award
17 November 2016
The Social Data Science Lab has been shortlisted as one of three nominees for the inaugural National Hate Crime Awards in the Upstanding Research & Innovation Award category. This Award is for a researcher or academic department at a University who has brought significant awareness, as well as new concepts and innovative ideas, to the hate crime cause.
The Lab was shortlisted because of its recent work on cyberhate across multiple hate crime types. This includes research funded by the Big Lottery Fund, Economic and Social Research Council, Google, Welsh Government, and US Department of Justice.
Dr. Pete Burnap, Director of the Lab said “We are delighted that the work of the Lab has been applauded in this way, outside of the research sector. This recognition means a lot to us because it was members of the public, practitioners and police who nominated us.”
The National Hate Crime Awards were organised by Faith Matters – in partnership with a wide range of agencies – to recognise individuals, activists and organisations who have ‘stood out and spoken up’ against prejudice and intolerance. The awards ceremony was attended by police, practitioners, government officials and academics. Communities Minister Nick Bourne MP and Shahid Malik MP opened the ceremony. The late Jo Cox MP and Paul Giannasi, the UK Hate Crime Programme Lead, both won awards.
Professor Matthew Williams, Director of the Lab said “These awards have been launched at a time when intolerance and hate are on the rise and manifesting in new insidious forms. The Internet is a new frontier for spreading bigotry and vitriol. Social media companies, policymakers, law enforcement, charities and academics need to come together to develop strategies for protecting victims and bringing offenders to justice. These awards represent a step in that direction, and I wish them every success going forward.”
The National Hate Crime Awards ceremony took place on Thursday 17 November 2016 in Central London with the theme ‘Upstanders not Bystanders’. The awards were sponsored by Tell MAMA, the Community Security Trust, GALOP, Stonewall and the ‘No to Hate Crime’ campaign.
Social Data Science Lab Awarded Core Funding from ESRC
11 November 2016
The Social Data Science Lab has been made part of the £64M Big Data Network for the Social Sciences via a £700,000 grant
The new grant from the Economic and Social Research Council will fund the core activities of the Lab over the next 3 years. The Lab brings together social and computer scientists to study the methodological, ethical, theoretical, and technical dimensions of New and Emerging Forms of Data in social and policy contexts. The Lab was established in 2015 and builds on the successful COSMOS programme of research that ran between 2011-2015.
Co-Director of the Lab, Professor Matt Williams, from the School of Social Sciences said “The majority of individuals currently under 20 years of age in the Western world were ‘born digital’ and will not recall a time without access to the Internet. Combined with the migration of the ‘born analogue’ generation onto the Internet, fueled by the rise of social media, we have seen the exponential growth of online spaces for the mass sharing of opinions and sentiments. No study of contemporary society can ignore this dimension of social life. However, there currently exist methodological and infrastructural barriers that prevent the widespread use of ‘big social data’ in the social sciences, and this new funding will help the Lab realise its mission to democratise access to big social data among the academic, public and third sectors.”
The Government’s Policy Paper ‘Seizing the data opportunity: A strategy for UK data capability’ identified big data as one of the UK’s ‘eight great technologies’. The Lab’s empirical social data science programme is complemented by a focus on the development of new methodological tools and technical/data solutions for the UK academic and public sectors to enhance the UK’s capability in big social data analytics. Lab staff are assisting several government departments, law enforcement agencies, private corporations and charities to realize the potential of these data via RCUK research projects and knowledge transfer partnerships.
Dr. Pete Burnap, Co-Director of the Lab and Social Computing Research Priority Area lead in the School of Computer Science and Informatics, said “New forms of digital online social data, handled by computational methods, allow social and computer scientists to gain meaningful insights into contemporary social processes at unprecedented scale and speed. How we marshal these new forms of data presents key challenges for researchers. The potential for world leading computational social science research that uses new forms of data is currently limited by the lack of existing reliable research infrastructure, such as software tools designed for social scientists. This core funding will allow Lab staff to dedicate the required time to develop and test these tools in research and policy contexts”.
The new grant will provide funds for a new Lab Research Fellow, dedicated computer and social science investigator time to develop new big data tools, and an advanced training programme in social data science analytics, that will educate researchers from academic, public and third sectors on how to use both quantitative and qualitative techniques to analyse new and emerging forms of online data.
The funding will see the Lab formalise its strategic research and training partnerships with a range of ESRC investments, including WISERD, ADRC Wales, CLOSER, NCRM, UK Data Archive and the Big Data Network. It will also provide resources to support its existing research partnerships with the Metropolitan Police Service, London Mayor’s Office for Policing and Crime, Office for National Statistics, Food Standards Agency, Department for International Development, Welsh Government, Airbus, Admiral Insurance, and US Department of Justice.
Lab Directors Professor Matthew Williams and Dr. Pete Burnap were appointed to the ESRC Phase 3 Big Data Network Working Group in 2014. The Lab forms part of Cardiff University’s Data Innovation Research Institute and will be located within the Social Science Research Park (SPARK).
Tackling Hate Online: New Guides for the Public
10 October 2016
A team of experts from Cardiff University have joined forces with the Welsh Government to develop three new on-line guides to help stem the rising tide of hate online.
Professor Matthew Williams from the University’s Social Data Science Lab and Dr Pete Burnap from the School of Computer Science and Informatics have worked with the Welsh Government to develop three new bi-lingual on-line hate speech guides targeted at young people, adults and practitioners. Each of the guides has been informed by research into hate crime and cyberhate conducted at the University over the past five years.
The new on-line guides provide information on the nature and patterns of online hate speech, the laws that are used to punish wrongdoers, the impacts of cyberhate on victims, effective ways for the public to use counter-speech, and how to make reports to the police.
“The Internet is the new frontier in crime,” according to Professor Matthew Williams, Director of the Social Data Science Lab. “Cybercrime is one of the few forms of crime to see an increase year on year, while most other crimes are on the decline. Our research into the social media reaction to the murder of Lee Rigby in 2013 shows how people take to social media to spread hateful sentiments following events. A similar online reaction was observed following the referendum on the UK’s future in the European Union. The internet acts as a force amplifier for hate, and these guides, informed by cutting edge research at Cardiff, are aimed to give young people, adults and practitioners the information they need to combat the rising tide of hate online.”
Professor Williams led The All Wales Hate Crime Project (2011-2013) which formed the basis for the Welsh Government’s Framework for Action on Tackling Hate Crime. The project remains the largest study of hate crime in the UK, funded by the Big Lottery Fund for £570,000. Following on from this study, a team from Cardiff University won a grant from the Economic and Social Research Council and Google to study hate speech online. Professor Williams and Dr Pete Burnap, Co-Director of the Social Data Science Lab, have also secured a prestigious grant from the US Department of Justice to study cyberhate in Los Angeles.
Dr Burnap adds: “Every 60 seconds, nearly 300 thousand comments are posted on Facebook worldwide. In the UK alone, 30 million tweets are posted every day. To collect and analyse these data in our studies of hate speech online requires high performance computing and sophisticated machine learning algorithms to identify cyberhate at scale. We have been funded by the Economic and Social Research Council to develop these technologies with social scientists, resulting in outcomes that can have an impact in the lives of real people. Using technology to identify cyberhate, practitioners can now better target harm reduction initiatives to those most in need.”
Cardiff University’s Social Data Science Lab forms part of the University’s Data Innovation Research Institute and will be located in the University’s new Social Science Research Park (SPARK). The Social Data Science Lab brings together social, computer, political, health, statistical and mathematical scientists to study the methodological, theoretical, empirical and technical dimensions of New Forms of Data in social and policy contexts.
The three online guides have been launched during this week’s Hate Crime Awareness Week, which runs from 10th – 17th of October.
Lab wins Prestigious Grant from US Department of Justice
16 September 2016
Partnership between Cardiff University’s Social Data Science Lab and the RAND Corporation awarded National Institute of Justice grant
Together with the RAND Corporation (Santa Monica, LA) and RAND Europe (Cambridge, UK), Professor Matt Williams and Dr. Pete Burnap have been awarded a research grant by the National Institute of Justice, part of the US Department of Justice, for the project ‘Online Hate Speech as a Motivator for Hate Crime’. Over half of the funds are dedicated to supporting the Social Data Science Lab’s involvement in the $885,820 three-year study. The project will investigate the utility of Twitter data for understanding what types, for whom, and where online hate speech acts as a ‘signature’ of offline hate crime. Drawing extensively on previous social media research by Williams & Burnap (2016), Burnap and Williams (2016) and Williams et al. (2016), the study will develop a predictive statistical model that links online hate speech and offline hate crime, using tweets and reported hate crimes in Los Angeles County as a test case.
Professor Matt Williams said “Developing a better understanding of hateful sentiments online and their relationship with crimes on the streets could push law enforcement to better identify, report, and address hate crimes that are occurring offline. The ability to identify locations at greater risk of hate crime would also allow law enforcement and other agencies to identify emerging issues and take a proactive, preventive approach to hate crime. The insights provided by our work will help US localities to design policies to address specific hate crime issues unique to their jurisdiction and allow service providers to tailor their services to the needs of victims, especially if those victims are members of an emerging category of hate crime targets.”
Research conducted at the Social Data Science Lab has shown that Twitter data can be used to identify hot spots—states or cities—of hate sentiment where hate crime victims, such as recent immigrants fearful of deportation, may be unlikely to make official reports of hate crime. Analysing open source communications may also be useful in areas where hate crimes are targeting an emerging category of victims but too few official reports have been made to allow the identification of the emerging trends.
Dr. Burnap, computational lead on the project, said “This is the first study in the United States to use social media data in predictive policing models of hate crime. Predictive policing is a proactive law enforcement model that has become more common partially due to the advent of advanced analytics such as data mining and machine learning methods. New analytic approaches and the ability to process very large data sets have increased the accuracy of predictive models over traditional crime analysis methods and this project will evaluate if police departments can leverage these new data and techniques to reduce hate crimes.”
Social media holds promise as a new source of information on emerging patterns of hateful sentiment that may be linked to offline hate crimes and identify more specific and accurate geographic patterns of changing risk in near-real-time. This project will allow law enforcement, government officials, and victim service providers to become aware of changing trends in online hate sentiment in their service areas, allowing for early prevention efforts and provision of services to victims.
Lab Directors Speak at UK Government Data Science Conference
22 June 2016
Data Science and Government Conference hosted by Oxford’s Blavatnik School of Government and Harvard’s Kennedy School of Government brings together leading experts
On June 22nd the Data Science and Government Conference held at the University of Oxford’s Blavatnik School of Government brought together policymakers and experts from industry and academia to discuss how emerging techniques in data science can best be used to support policy agendas in a range of areas. The conference was organised by the Behavioural Insights Team (or the Nudge Unit) who work with government to apply a better understanding of behavioural science and rigorous evaluation methods to a variety of policy challenges.
Over 150 delegates attended sessions that combined the latest academic findings with results from real-world projects, as well as collaborative sessions on data driven solutions to problems faced by government.
Professor Matt Williams and Dr. Pete Burnap talked about the Lab’s risk and safety and cybersecurity research programmes, presenting the latest empirical findings on cyber hate speech, crime prediction with open source communications, detection of suicidal language on Twitter and malware detection on social media.
Other speakers included David Halpern (What Works National Advisor and CEO of the Behavioural Insights Team), David Spiegelhalter (Winton Professor for the Public Understanding of Risk) and Michael Luca (Harvard Business School).
Tracking Cybercrime at Euro 2016
08 June 2016
Social Data Science Lab will deploy ‘intelligent system’ to track the spreading of malicious viruses across Twitter during Euro 2016
In an attempt to crack down on the spreading of malicious computer viruses, experts from the Social Data Science Lab will be trawling through thousands of suspicious links that are spread across social media during this year’s European Football Championships.
The researchers will be deploying a trained computer, known as an ‘intelligent system’, to trawl through thousands of inconspicuous URLs that are tacked on to a vast array of tweets relating to the ‘Euro 2016’ tournament in France.
The team are using the event as a test bed to further refine their computer system, funded by the Engineering and Physical Sciences Research Council (EPSRC). It will gather more information about the types of malicious viruses and software, collectively known as malware, that are being spread across Twitter.
It is hoped the information can be used to help law enforcement authorities develop a future warning system that can flag a malicious link to the computer user in real-time, which the researchers hope can also be rolled out in the form of an app to mobile users.
The European Football Championships have been specifically chosen to trial the detection systems due to the large volume of tweets that are sent during this time. According to data taken in 2014, the football World Cup that took place in Brazil was the most tweeted about event ever. In the same year, eight of the 10 most tweeted about events were sports related.
As such, sporting events are an ideal opportunity for cybercriminals to target the masses and spread malicious viruses through social media.
The intelligent computer system developed by the researchers at Cardiff will be searching for ‘drive-by downloads’, a name given to malware that is downloaded onto a computer after a user simply visits, or ‘drives by’, a website.
The system has already been tested around two major sporting events – the Cricket World Cup and the Superbowl – and can currently pinpoint the exact time malicious viruses or software attack a computer, within a 30 second window, with 89% accuracy.
Dr Pete Burnap, Director of the Social Data Science Lab and lead scientist on the research, said: “It is well known that people use online social networks such as Twitter to find information about an event. URLs are often shortened on social media due to character limitations in posts, so it’s incredibly difficult to know which are legitimate.”
“Once infected the malware can turn your computer into a zombie computer and become part of a global network of machines used to hide information or route further attacks. At the moment many existing anti-virus solutions identify malware using known code signatures, which makes it difficult to detect previously unseen attacks. Our system is making decisions using code behaviour, which is more difficult for cyber criminals to mask.”
“We are trying to build systems that can help law enforcement authorities make decisions in a changing Cyber Security landscape. Social media adds a whole new dimension to network security risk. This work contributes to new insight into this and we hope to take this forward and develop a real-time system that can protect users as they search for information about real-world events using new forms of information sources.”
Lab Invited to Cambridge Institute of Criminology
19 October 2015
Professor Matt Williams will present work on the social media reaction to the killing of Lee Rigby.
Professor Matthew Williams has been invited to the Cambridge Institute of Criminology to present the ESRC and Google funded work on the propagation of cyberhate in social media networks in the aftermath of the Woolwich terror attack. The research was recently published in the British Journal of Criminology and was covered in the national press. The Institute, founded in 1959, was one of the first criminological institutes in Europe and has exerted a strong influence on the development of the discipline. The presentation will take place on 11th February 2016.
Lab Invited to Duke University, US
19 October 2015
Social and computer scientists at the Social Data Science Lab at Cardiff University will present their work on the estimation of crime patterns using Open Source Communications.
Professor Matthew Williams and Dr. Pete Burnap have been invited to Duke University’s Department of Sociology to present their Economic and Social Research Council and National Centre for Research Methods funded work on the estimation of crime patterns using Open Source Communications. The Lab has identified a link between Social Media posts that relate to signs of neighbourhood degeneration and police recorded crime rates in London boroughs. This is the first study globally to establish measures in social media data based on criminological theory, and to relate them to offline crime patterns. Duke is one of the most prestigious private universities in the US, placed in the top 10 by the National University Rankings. The Lab’s fully funded visit will take place on 27th February 2016.
Scientists combat cyber-attacks on Twitter
30 September 2015
University researchers develop intelligent system to identify malicious links spread through social media
Cyber-criminals are taking advantage of real-world events with high volumes of traffic on Twitter, such as the Superbowl and Cricket World Cup, in order to post links to websites which contain malware.
To combat the threat posed in this ‘perfect environment’, researchers from the University have created an intelligent system to identify malicious links disguised in shortened URLs on Twitter. They will test the system in the European Football Championships next summer. The research is co-funded by the Engineering and Physical Sciences Research Council (EPSRC) and the Economic and Social Research Council (ESRC).
In a recent study the research team, from the Social Data Science Lab, identified potential cyber-attacks within five seconds with up to 83% accuracy and within 30 seconds with up to 98% accuracy, when a user clicked on a URL posted on Twitter and malware began to infect the device.
The scientists collected tweets containing URLs during the 2015 Superbowl and Cricket World Cup finals, and monitored interactions between a website and a user’s device to recognise the features of a malicious attack. Changes made to a user’s machine, such as new processes created, registry files modified or files tampered with, were taken as signs of a malicious attack.
The team subsequently used system activity to train a machine classifier to recognise predictive signals that can distinguish between malicious and benign URLs.
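As a rough illustration of the idea of training a classifier on system activity, the sketch below uses a nearest-centroid classifier, chosen here only for simplicity. It is not the Lab's model, which is not described in this article. The three features (processes created, registry entries modified, files changed) follow the kinds of change mentioned above, but the exact feature set, the sample values and the function names are all assumptions made for illustration.

```python
from math import dist
from statistics import mean

# Hypothetical activity counts recorded after visiting a URL:
# (new processes created, registry entries modified, files changed).
# All values are invented for illustration.
TRAINING = [
    ((0, 0, 0), "benign"),
    ((1, 0, 0), "benign"),
    ((0, 0, 1), "benign"),
    ((1, 0, 1), "benign"),
    ((4, 3, 5), "malicious"),
    ((3, 5, 2), "malicious"),
    ((5, 2, 4), "malicious"),
    ((6, 4, 3), "malicious"),
]


def train(samples):
    """Return one centroid (mean feature vector) per label."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def classify(centroids, features):
    """Assign the label whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


centroids = train(TRAINING)
print(classify(centroids, (5, 4, 4)))  # heavy system activity
print(classify(centroids, (0, 1, 0)))  # almost no system activity
```

A real system would need far more training data, richer features and a properly evaluated model; the sketch only shows how labelled activity records can be turned into a decision rule.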
Dr Pete Burnap, Director of the Social Data Science Lab at Cardiff University, and lead scientist on the research, said: “Unfortunately the high volume of traffic around large scale events creates a perfect environment for cyber-criminals to launch surreptitious attacks. It is well known that people use online social networks such as Twitter to find information about an event.
“Attackers can hide links to malicious servers in a post masquerading as an attractive or informative piece of information about the event.
“URLs are always shortened on Twitter due to character limitations in posts, so it’s incredibly difficult to know which are legitimate. Once infected the malware can turn your computer into a zombie computer and become part of a global network of machines used to hide information or route further attacks.
“In a 2013 report from Microsoft these ‘drive-by downloads’ were identified as one of the most active and commercial risks to Cyber security.
“At the moment many existing anti-virus solutions identify malware using known code signatures, which makes it difficult to detect previously unseen attacks.”
Professor Philip Nelson, Chief Executive, EPSRC said: “Using social media is an integral part of modern life, vital to organisations, businesses and individuals. The UK needs to operate in a resilient and secure environment and this research will help combat these criminal Cyber-attacks.”
Social Data Science Lab finds link between social media and crime patterns
25 August 2015
Social and computer scientists at the Social Data Science Lab at Cardiff University have analysed tweets to help predict crime patterns in London.
In 2013 the Lab was awarded an Economic and Social Research Council grant to examine if ‘big data’ – data sets so large or complex that traditional data processing applications are inadequate – can predict crime. It took 12 months to collect 180 million geocoded tweets and close to 600,000 Metropolitan Police recorded crime incidents, and a further nine months to transform the data in order to build their predictive models.
The inter-disciplinary project team includes academics from Cardiff University’s School of Social Sciences and the School of Computer Science and Informatics.
The research developed new data fusion techniques and improved upon existing mathematical models that have used social media data to predict voting patterns, the spread of disease, the revenue of Hollywood movies, and the estimates of the centres of earthquakes.
Project leader and Lab Director Professor Matthew Williams, who came up with the hypothesis that social media communications are related to offline crime patterns, said: “These studies illustrate how social media generates naturally occurring socially relevant data that can be used to complement and augment conventional curated data to predict offline phenomena.
“In our project, we hypothesised that crime and disorder related tweets would be associated with actual crime rates. Our preliminary statistical results that are driven by criminological theory show that tweets about certain crime types and signatures of crime and disorder help estimate actual patterns of crime, often over and above conventional correlates such as unemployment and proportion of young people in an area”.
The outcomes of the project will be of use to such organisations as the Metropolitan Police Service, the Home Office, the Association of Chief Police Officers, the College of Policing, Police and Crime Commissioners, the Office for National Statistics and various voluntary organisations.
Dr Luke Sloan, Deputy Director, said: “The potential value added by social media data is that it is user-generated in real-time in voluminous amounts. As such it can provide insight into the behaviour of populations on the move; the ‘pulse of the city’. This is in contrast to the necessarily retrospective snapshots of social trends and populations provided by conventional methods such as household surveys and officially recorded data.”
“We have employed advanced statistical analysis that takes into account variation in time and space given that new forms of big data, like social media communications, occur in real-time, unlike conventional data that the police are used to using. These models allow us to re-test classic criminological theories, bringing their explanatory power into the 21st century,” said Professor Williams.
He continues: “Recent claims have been made that big data make theory and scientific method obsolete. Yet high profile failures of big data, such as the inability to predict the US housing bubble in 2008 and the spread of influenza across the US using Google search terms, have resulted in many questioning the power of these new forms of data.”
Dr. Pete Burnap, Director of the Lab and computational lead on the project, commented: “To date the default approach in big data research seems to have been wholly data driven in the effort to predict. However, without theory driven data collection, transformation and analysis we cannot answer the substantive questions about social processes and mechanisms that concern us. Purely data driven approaches tend to produce models and algorithms that are overfit to the idiosyncrasies of a particular data set, leading to spurious results that often do not reflect reality. This is why we have put a series of strict checks and balances in place, such as augmenting big data with conventional sources and using theory to drive our analytical process.”
This work was made possible by a National Centre for Research Methods Methodological Innovations grant and was recently featured in their summer newsletter. Funds from various Economic and Social Research Council programmes including Digital Social Research, Google Data Analytics, Global Uncertainties and National Centre for Research Methods, have also enabled the Lab researchers to detect online racial tension following sporting events, model the propagation of cyberhate following a terrorist attack, and detect the presence of counter-speech as a form of online community based regulation.
Detecting crime using social media
12 August 2015
‘World-leading’ predictive tools will be used by Met Police to monitor real-time crime events
Researchers from the Social Data Science Lab have received research grants from the Centre for Scientific and Engineering Excellence at the Metropolitan Police Service and the ESRC Impact Acceleration Fund, to embed their world-leading and internationally recognised research on predictive analytics using social media into police operational processes.
The Lab Directors, Professor Matthew Williams from the School of Social Sciences and Dr. Pete Burnap from the School of Computer Science & Informatics, have previously developed social media computational predictive models to estimate the emergence of disruptive crime events and the propagation of cyberhate.
Social media is generating high-volume data through multiple forms of online behavior. Estimates put social media membership at approximately 2.5 billion non-unique users. The data produced by these users have been used to predict elections, movie revenues and even the epicenter of earthquakes.
Project lead Dr. Burnap said: “Previous research that has examined the use of social media data in crime and policing contexts has been based in large metropolitan areas, such as Chicago and San Francisco. London, with its 2.5 million Twitter users, is the ideal city to further develop our social media and policing research using the COSMOS software platform. Social media data can be considered a form of open source intelligence that can assist the police in their real-time decision making practices.
“These new grants will allow us to achieve a measurable impact within the Metropolitan Police Service, embedding our predictive social media models into their operational processes. MPS have kindly agreed to provide access to datasets that will allow us to validate our models against real-world crime incidents.”
Professor Williams said: “There is a clear need in industry and the public and third sectors for a greater understanding of how these new forms of big social data can be marshaled to add value to existing practices. With the right statistical checks-and-balances in place and guided by criminological theory, social media data can complement and augment conventional police information to estimate crime patterns.
“The Social Data Science Lab is committed to generating world-leading research in the areas of crime, safety and wellbeing to inform a knowledge-base that can be embedded in commercial, policy and practice domains.”
A senior representative from the Metropolitan Police Service said: “We have valued greatly our engagement with the Social Data Science Lab at Cardiff University to date as they have provided much needed insight into how social media can be exploited for the benefit of security and policing. They have already carried out some extensive research around identifying hate speech and the new project takes this further to develop and critique methodologies and tools for the identification of disruptive events and behavior. We look forward to an innovative future with the Lab.”
Police need to focus on first 24 hours following terrorist event to ‘de-escalate’ cyberhate
22 May 2015
Intervention by police within the first 24 hours of a terrorist event could be key to de-escalating and halting the spread of cyberhate, according to a new study by Cardiff University social and computer scientists
Published in the British Journal of Criminology on the second anniversary of the murder of Lee Rigby, the study, which used Big Data to measure the online reaction to the murder of the Fusilier in 2013, showed that cyberhate has a ‘half-life’ which has significant implications for police and policy interventions in terrorist events.
It found that cyberhate in the aftermath of the Rigby murder peaked in the first 24 hours following the attack before declining sharply over the 15-day analysis window, suggesting that police need to focus their interventions on this stage to maximise the fight against online hate.
The project was carried out using the ESRC-funded Cardiff Online Social Media Observatory (COSMOS) software developed at Cardiff University. The new research focuses specifically on the production and spread of racial and religious cyberhate and the Twitter battle between police and far-right political groups in the first 36 hours following the attack.
The research showed that tweets from police and the media were around five times more likely to be retweeted than tweets from other users following the attack.
It further suggests that the dominance of traditional media and police information flows in social media indicates these are likely effective channels for the countering of rumour, speculation and hate.
Published in the British Journal of Criminology, the article builds upon previous research by Professor Matthew Williams, Cardiff University School of Social Sciences, and Dr. Pete Burnap, Cardiff University School of Computer Science and Informatics, which showed that Tweets containing positive sentiment were more likely to be retweeted than those containing negative sentiment following the attack.
Professor Matthew Williams said: “This pattern reflects offline cases of hate crime following similar events, such as the Bali and Madrid bombings. We concluded that cyberhate has a ‘half-life’ following crime events of national interest. The sharp de-escalation of hate can be explained by post-event media and police Twitter messages that have a defusing effect and counter-speech from everyday Twitter users that challenge abusers.
“Given the recent criminal justice response to cyberhate, our findings have several potential operational and policy implications. The ‘half-life’ of cyberhate and its rapid de-escalation suggests the police need to focus their interventions within this impact stage to increase the rate of de-escalation further. Further, the dominance of traditional media and police information flows in social media indicates these are likely effective channels for the countering of rumour, speculation and hate.”
Dr Pete Burnap said: “The ability to observe a large portion of the population in near real-time via social media networks provides those responsible for ensuring the safety of the public a new window onto mass social reaction. Evidence from our research shows that cyberhate can form part of a social reaction in relation to a terrorist event – therefore these technologies may act as early warning systems for the amplification of deviance beyond the event itself.”
“The small but sustained nature of these hateful tweets indicates that they receive limited endorsement, but where there is support it emanates from a core group of Twitter users who seek out each other’s messages over time. Therefore, contagion of cyberhate on Twitter is contained and unlikely to spread widely beyond such groups.”
The research also showed that far-right political groups and individuals were quick to use the attack to further their cause, and were more likely to produce tweets containing religious and racial cyberhate. Tweets from these groups and individuals were more likely to survive (be retweeted over longer periods) in the first 36 hours following the event, but were less likely to be retweeted by a large number of Twitter users.
The College of Policing are currently putting 6,000 officers through their Mainstreaming Cybercrime Training Course, which covers cyberhate and harassment online.
A pre-print version of the paper can be downloaded here
Academics have recently seen their paper published in an international 4* journal.
Luke Sloan and Matt Williams were two of the authors of ‘Who Tweets? Deriving the Demographic Characteristics of Age, Occupation and Social Class from Twitter User Meta-Data’ which was published in PLOS ONE on March 2.
This paper is a response to a simple question: who uses Twitter? Cardiff is currently leading the field in establishing demographic proxies for Twitter data: age, occupation and class in this paper, and location, gender and language in a previous publication.
Speaking about the publication Luke Sloan said: “Deriving this information about users adds value to the data and enables us to ask important social scientific questions which would otherwise be impossible e.g. does political party support on class lines manifest online? Do people in deprived areas express greater fear of crime online? Demographic information makes Twitter data useful.”
Both Luke and Matt knew there was a knowledge gap in this area but didn’t anticipate the amount of attention it generated.
“By March 19 the paper had nearly 4,000 page views and 150 pdf downloads,” added Luke.
“I guess we’re reaping the rewards of open access publishing via an online journal and I certainly know that organisations outside of academia have read this work, most notably it’s feeding in to some of the work that the Office for National Statistics is doing via their Big Data Project.”
‘Who Tweets? Deriving the Demographic Characteristics of Age, Occupation and Social Class from Twitter User Meta-Data’ is available to view online.
Twitter helps predict hung parliament
27 March 2015
Researchers from Cardiff and Manchester have used Twitter to predict the election result.
A group of researchers from Cardiff and Manchester Universities have used Twitter to predict a hung parliament on 7 May.
Burnap, Sloan, and Williams from Cardiff, along with Gibson and Southern from Manchester University, make up one of twelve groups of leading academics who are showcasing their findings today at a major LSE conference (see table below).
The Cardiff / Manchester group is the only one to have used Twitter to predict the election result. All 12 groups agree no single party will win the 326 seats required for an overall majority.
The Cardiff / Manchester team forecasts the Conservatives will win 286 seats, Labour 306, the Liberal Democrats 21 and UKIP 5, with the SNP on 9, Plaid Cymru on 3, and the Greens on 1.
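The hung-parliament reading of these figures can be checked with simple arithmetic. The short sketch below takes the forecast numbers and the 326-seat threshold quoted above and confirms that no party reaches a majority; the variable names are illustrative only.

```python
# Forecast figures quoted in the article, treated as seats per party.
forecast = {
    "Conservatives": 286,
    "Labour": 306,
    "Liberal Democrats": 21,
    "UKIP": 5,
    "SNP": 9,
    "Plaid Cymru": 3,
    "Greens": 1,
}
MAJORITY = 326  # seats required for an overall majority

largest = max(forecast, key=forecast.get)
shortfall = MAJORITY - forecast[largest]
hung = all(seats < MAJORITY for seats in forecast.values())

print(largest, shortfall, hung)  # Labour 20 True
```

On these numbers the largest party, Labour, falls 20 seats short of the threshold.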
Dr Pete Burnap, Director of the Social Data Science Lab said:
“This is one of the first ventures by the UK political academic community into using Twitter to forecast Elections.
“Predictions have been published with varying levels of accuracy elsewhere in the world, but have received some criticism due to methodological issues.
“We are currently producing long-range forecasts, and will study whether accuracy improves/decreases in the lead-up and after the result emerges.”
Stegmaier and Williams
Ford, Jennings, Pickup and Wlezien (Polling Observatory)
Clarke, Stewart and Whiteley
Rallings, Thrasher and Borisyuk
Hanretty, Lauderdale and Vivyan (electionforecast.co.uk)
Burnap, Gibson, Sloan, Southern and Williams
Lewis-Beck, Nadeau and Belanger
29 January 2014
Professor Matthew Williams and Dr. Pete Burnap invited to ESRC’s Newton Fund seminar and Trans-Atlantic Platform workshop in Washington DC.
Professor Matthew Williams and Dr Pete Burnap were invited to represent the Social Data Science Lab at the ESRC’s Newton Fund seminar in London earlier this month, where they presented to a delegation from the Indian Council of Social Science Research. The seminar was aimed at exploring cross-council strategy in the area of big data. Both were also invited by the ESRC to attend the Trans-Atlantic Platform workshop in Washington DC on Jan 27th. The aim of the workshop was to establish global links for collaborative funding in the area of social media research. The meeting was attended by representatives from Horizon 2020 and the National Science Foundation.
Discovering the impact of the horse meat scandal using social media
21 August 2014
Cardiff University researchers will discover public perceptions of the recent horse meat scandal for the first time by analysing social media data.
The horse meat scandal last year revealed a major breakdown in the traceability of the food supply chain and the adulteration of meat. The extensive media coverage revealed not only widespread fraud but also the complexity of the UK meat supply chain and the extent of meat imports.
The project will investigate how the growing complexity of international food supply chains is giving rise to a new generation of risks and concerns.
The Social Data Science Lab has been awarded an Economic and Social Research Council (ESRC) grant under their Global Food Security Programme; a joint initiative with the Food Standards Agency (FSA). The project is in collaboration with NatCen, the University of Warwick and the University of Westminster.
Dr Luke Sloan from the Cardiff School of Social Sciences, said: “We are delighted to be working on this trail-blazing project funded under the Understanding the Challenges of the UK Food System call. The research will generate new empirical findings on public perceptions of UK food supply chains, what people’s concerns are, what influences these and how they may be best managed in the future.”
Professor Matthew Williams, who was recently appointed to the ESRC’s Social Media Experts working group to represent the Lab, said: “We are honoured to be part of the Global Food Security Programme and look forward to deploying the COSMOS platform and lending our social science expertise in relation to big data to this innovative project.”
The project team are Dr Luke Sloan, Professor Matthew Williams (COSMOS, Cardiff School of Social Sciences), Dr Pete Burnap (COSMOS, Cardiff School of Computer Science and Informatics), Caireen Roberts (NatCen), Professor Elizabeth Dowler (University of Warwick) and Dr Alizon Draper (University of Westminster).
Twitter shows love for Lee Rigby
22 May 2014
Scientists studying the social media activity in the immediate aftermath of Lee Rigby’s murder have found that messages loaded with racial tension and hate were far less likely to spread than those infused with love.
By collecting half a million tweets related to the attack via Twitter, academics from the Social Data Science Lab were able to statistically model how the public reacted and have published their findings in the international peer-reviewed journal Social Network Analysis and Mining. The authors were particularly interested in identifying the tweets that were most likely to spread following the event.
“The results were surprising, and did not support the common conception that social media platforms are havens for those spreading hateful and socially disruptive online content,” said Professor Matthew Williams, from the School of Social Sciences.
“To the contrary, we found that tweets that included high levels of racial tension, such as those spreading hateful content towards those of Muslim faith, were statistically less likely to be retweeted than messages containing positive sentiment, such as tweets of good-wishes to the family of Lee Rigby,” he added.
Messages of love, the academics found, were statistically more likely to be frequently retweeted and form large and long lasting information flows. This was based on a methodology that focused on the emotive content of messages such as negative and positive sentiment, and racial tension, as well as content linking features within messages, such as hashtags and URLs.
Dr Pete Burnap from the School of Computer Science and Informatics said:
“Social media has often been associated with the spread of malicious and antagonistic content that could pose a potential risk to community relations. We frequently hear about trolling and social media being used to harass members of the public or certain groups in society. However, this research provides some evidence that suggests it is actually the more positive and supportive messages that spread to a significant extent following events of this nature.”
The findings are the first to indicate that social media platforms, in particular Twitter, may self-regulate, stemming the flow of negative and hateful information following terrorist and similar events of national interest. The next phase of the research for the Lab team is to investigate if and how social media users engage in counter speech, to stem the spread of negativity online.
Professor Williams continued: “Social scientists at Cardiff University have been conducting research into how people behave online for over three decades. Some of this work on virtual communities has shown how self-regulation, or what criminologists have called responsibilization, is evident online. It seems plausible that this pattern of behaviour is present in social media networks.”
Research Fellow, Dr Claire Wardle from the Tow Center for Digital Journalism at the Columbia Journalism School, said: “It’s tempting for judgments to be made about online conversations, which are based on little more than personal perception, hunches or well-publicised incidents of abuse. This research, based on a rigorous methodology, provides real insight into the conversations and content shared daily on social networks.
“I look forward to the same methodology being applied to other events so we can understand whether these results were a one-off caused by such a shocking incident, or in fact represent a pattern of behaviour around comments on social networks. It would be wonderful to know that, while hate speech and abuse does remain a problem, that actually the vast majority of people use social networks to support one another and their communities.”
The terrorist attack on Lee Rigby was the first in the UK to foster a significant social media reaction. Within 20 minutes of the incident being reported to the police, eye-witnesses were using Twitter to spread information about the event as it unfolded. Disturbing images and video clips emerged online shortly after. These snippets of information were rapidly diffused through the social media eco-system by the act of ‘retweeting’. The Lab also examined whether the speed at which tweets were retweeted affected the eventual number of retweets, and if Twitter users with more followers gained more influence in the spread of messages.
Researchers now plan to apply the same statistical model to several more events, including the Boston bombings, the coming out of Olympian Tom Daley on YouTube, the Paralympic opening ceremony, and the online harassment of Caroline Criado-Perez.
The research is being conducted as part of the Economic and Social Research Council (ESRC) Google Data Analytics grant: ‘Hate’ Speech and Social Media: Understanding Users, Networks and Information Flows’.