Introducing COSMOS 2.0

Social media platforms have become an essential outlet for individuals to express opinions, share experiences and promote ideas in society. According to estimates, approximately 45 million people in the UK, or 66% of the total population, are social media users. Average daily social media usage in the UK is 1 hour 42 minutes (‘UK Social media statistic for 2020’, 2020). This results in large amounts of communications data and metadata being generated and processed by social media platforms. For instance, 350,000 tweets are posted per minute globally (‘The Number of tweets per day in 2020’, 2020).

Naturally, social media platforms are very interesting data sources for social scientists who are trying to understand and explain the ways in which society works. 

One of the biggest challenges social scientists face these days is accessing and analysing the big data generated by social media platforms. While some platforms make their data publicly available, accessing this valuable resource requires computational skills. COSMOS is developed and supported by the Social Data Science Lab to address this very issue.

COSMOS aims to democratise access to big data across the academic, private and public sectors by giving researchers ethical access to social media data.

What is COSMOS?

COSMOS (Collaborative Online Social Media Observatory Software) is a social media analysis tool that can be accessed free of charge by academic and non-profit organisations. COSMOS is an ESRC investment that is funded as part of the Big Data Network.

COSMOS’s main objective is to help researchers who lack technical and computational skills to collect, store, analyse and visualise huge social media datasets for their research. As an all-in-one solution for these activities, it aims to lower the barrier to entry into social media data analytics for social scientists and others.

The first release of COSMOS software dates back to 2015. At the Social Data Science Lab, we are very excited to release a revamped version – COSMOS 2.0.

COSMOS Functionality

COSMOS can be used both for collecting real-time Twitter data and for importing previously collected datasets. Real-time data is collected via the Twitter filter stream API. In most cases, researchers need to analyse social media data within specific parameters such as gender, language, sentiment, keyword or geographical place. They do this to understand how these parameters change (and how they correlate with public mood, tension, cohesion, etc.) around a particular topic or trigger event.

COSMOS provides filtering and querying features with its new user-friendly interface to meet this need. The filtering feature is used for collecting real-time Twitter data, while the querying feature is used for creating subsets of interest from the collected data. For example, a COSMOS user can collect data with the keyword ‘pandemic’ and then extract the tweets that contain the keyword ‘vaccine’ and have positive sentiment within a selected time frame.
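COSMOS performs this kind of query through its interface, so no code is needed. For readers curious about what such a query amounts to, here is a minimal Python sketch of the ‘pandemic’ and ‘vaccine’ example above. The `Tweet` class, the `query` function and the sample tweets are hypothetical and written only for illustration; they are not part of COSMOS.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Tweet:
    # Hypothetical record; field names are assumptions, not COSMOS's schema.
    text: str
    sentiment: int          # assumed overall score between -5 and +5
    created_at: datetime


def query(tweets, keyword, min_sentiment, start, end):
    """Keep tweets that contain `keyword`, score at least `min_sentiment`,
    and were posted in the half-open time window [start, end)."""
    keyword = keyword.lower()
    return [
        t for t in tweets
        if keyword in t.text.lower()
        and t.sentiment >= min_sentiment
        and start <= t.created_at < end
    ]


# Stand-in for a collection gathered with the keyword 'pandemic'.
collected = [
    Tweet("The pandemic vaccine rollout is great news", 3,
          datetime(2020, 12, 8, 9, 30)),
    Tweet("Pandemic fatigue is awful", -3,
          datetime(2020, 12, 8, 10, 0)),
    Tweet("Vaccine trial results look promising during the pandemic", 2,
          datetime(2020, 11, 1, 12, 0)),
]

# Positive 'vaccine' tweets posted in December 2020.
subset = query(collected, "vaccine", 1,
               datetime(2020, 12, 1), datetime(2021, 1, 1))
```

Here `subset` keeps only the first tweet: the second does not mention ‘vaccine’ and the third falls outside the selected time frame.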

COSMOS provides data analysis at both individual tweet and corpus level. Currently, the types of analysis COSMOS supports are gender and language detection, sentiment analysis, qualitative overview, geospatial location analysis, keyword analysis, longitudinal tweet frequency analysis and social network analysis (Pete Burnap, Javier Conejero, 2014). Here are some details for these analysis types:

  • Gender detection: COSMOS derives the gender of the user who posted a tweet using an algorithm that matches the user name of the account against a list of 44,000 names that have been manually classified as male, female or unisex. COSMOS visualises the percentage of tweets posted by female, male, unisex or unknown gender users in the corpus using the pie-chart view.
  • Language detection: COSMOS is able to detect the language of tweet text among 52 languages. Language detection can be used when filtering and querying data in COSMOS, which shortens analysis time when handling large datasets.
  • Sentiment analysis: Sentiment analysis is a process that helps to determine whether a text is positive, negative or neutral. Social scientists and data analysts use sentiment analysis to monitor public opinion on specific topics or events in social media. For this reason, COSMOS integrates the SentiStrength sentiment analysis tool and provides sentiment scores between -5 and +5 for each tweet text (SentiStrength, 2017). It also enables the querying of Twitter data based on sentiment scores.
  • Qualitative overview: COSMOS provides a table view so that users can have an overview of the collection with parameters such as tweet text, gender, sentiment analysis, location, etc.
  • Keyword analysis: COSMOS visualises keyword occurrence in the word cloud view. Users can create a word cloud based on tweet text and drop extremely frequent words that appear in almost all tweets. This feature provides a simple text analysis tool and helps to visualise frequent and dominant words within the tweet corpus.
  • Geospatial distribution: COSMOS visualises geographic data points on a map based on where tweets were posted. This feature provides insight into how the population is responding to a specific event or occasion. Furthermore, the geo-map view displays hotspots of tweets within the collection. COSMOS’s new interface makes it very easy to create subsets based on location using the map view.
  • Social network analysis: COSMOS provides social network analysis within the network view. COSMOS currently allows the user to characterise network structure based on retweets and mentions. Nodes help to visualise connection and interaction between users in the network view. The view also helps to identify dominant user accounts and their level of influence within the information propagation network.
  • Frequency analysis: This view displays the density of tweets over a specific time frame. COSMOS provides three bar charts for visualising the frequency of tweets per day, per hour and per minute. These charts make it easy to spot spikes in tweet volume over time. Chart sliders can be used to create a subset around the dates of these spikes, which helps to remove noise when analysing the data.
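COSMOS runs all of these analyses for you. To give a feel for the ideas behind three of them (name-based gender detection, hourly tweet frequency and retweet-based influence), here is a minimal Python sketch. Everything in it is a simplified assumption made for illustration: the three-entry name list stands in for the 44,000 classified names, matching on the first word of the user name is our own simplification, and none of the function names come from COSMOS.

```python
from collections import Counter
from datetime import datetime

# Tiny hypothetical stand-in for the manually classified name list.
NAME_GENDER = {"emma": "female", "james": "male", "alex": "unisex"}


def detect_gender(user_name):
    """Look up the first word of a user name; 'unknown' if it is not listed."""
    tokens = user_name.strip().lower().split()
    if not tokens:
        return "unknown"
    return NAME_GENDER.get(tokens[0], "unknown")


def gender_shares(user_names):
    """Percentage of tweets per gender category (the pie-chart figures)."""
    counts = Counter(detect_gender(name) for name in user_names)
    total = sum(counts.values())
    return {gender: 100 * n / total for gender, n in counts.items()}


def hourly_frequency(timestamps):
    """Number of tweets per hour, ordered by time."""
    counts = Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )
    return dict(sorted(counts.items()))


def most_retweeted(edges, top=3):
    """Rank authors by how often they are retweeted.

    `edges` holds (retweeter, original_author) pairs.
    """
    return Counter(author for _, author in edges).most_common(top)


shares = gender_shares(["Emma Jones", "james_w", "Alex Smith", "Emma B"])
hourly = hourly_frequency([
    datetime(2020, 12, 8, 9, 5),
    datetime(2020, 12, 8, 9, 40),
    datetime(2020, 12, 8, 10, 15),
])
ranking = most_retweeted([("ann", "bob"), ("cat", "bob"), ("ann", "dan")])
```

In this toy data, "james_w" is classified as unknown because the first word of the user name is not in the list, which shows why an unknown category is needed alongside male, female and unisex.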

Given the above, COSMOS is a compact and useful software package for collecting and analysing social media data. Its simple and accessible user interface allows users to work with large-scale social media datasets without writing a single line of code!

To install COSMOS, please visit http://socialdatalab.net/COSMOS and request a download link.

To learn more about how to use COSMOS, please check out the instruction videos at http://socialdatalab.net/instruction-videos. In the coming months, we will organise online demo sessions and workshops to demonstrate the capabilities of COSMOS. These sessions will be advertised on our website and social media channels, so if you are interested in attending, keep an eye on our website and follow us on Twitter at @socdatalab.

If you have any questions, please feel free to contact us at cosmosprojectuk@gmail.com.

Pete Burnap, Javier Conejero (2014) ‘COSMOS: Towards an integrated and scalable service for analysing social media on demand’.

‘The Number of tweets per day in 2020’ (2020). Available at: https://www.dsayce.com/social-media/tweets-day/.

‘UK Social media statistic for 2020’ (2020) UK Social media statistic for 2020. Available at: https://www.avocadosocial.com/uk-social-media-statistics-for-2020/.

SentiStrength (2017). Softpedia. Available at: http://sentistrength.wlv.ac.uk/.

Serenay Ozalp

Engagement Lead

Social Data Science Lab