CeRch seminar: Webometric Analyses of Social Web Texts: case studies Twitter and YouTube

Herewith a slightly belated report of the recent talk in the CeRch seminar series given by Professor Mike Thelwall of Wolverhampton University. Mike's talk, Webometric Analyses of Social Web Texts: case studies Twitter and YouTube, concerned getting useful information out of social media by social science means: information, specifically, about the sentiment of the communications on those platforms. His group produces software for text-based information analysis, making it easy to gather and process large-scale data, focusing on Twitter, YouTube (especially the textual comments), and the web in general via the Technorati blog search engine and Bing. This shows how a website is positioned on the web, and gives insights into how its users are interacting with it.

In sentiment analysis, a computer programme reads text and predicts whether it is positive or negative in flavour, and how strongly that positivity or negativity is expressed. This is immensely useful in market research, and is widely employed by big corporations. It also goes to the heart of why social media work – they engage human emotions – and it tracks what role sentiments play in social media. The sentiment analysis engine is designed for text that is not written with good grammar. At its heart is a list of 2,489 terms which are either normally positive or negative. Each has a 'normal' value, and ratings from -2 to -5. Mike was asked whether it could be adapted to slang words, which often develop, and sometimes recede, rapidly. Experience is that it copes well with changing language over time – new words don't have a big impact in the immediate term.

However, the engine does not work with sarcastic statements, whose diction, linguistically, may be the opposite of their meaning, as with (for example) 'typical British understatement'. This means that it does not work very well for news fora, where comments are often sarcastic and/or ironic (e.g. 'David Cameron must be very happy that I have lost my job'). There is also a need for contextual knowledge: 'This book has a brilliant cover' means 'this is a terrible book' in the context of the phrase 'don't judge a book by its cover'. Automating the analysis of such contextual minutiae would be a gigantic task, and the project is not attempting to do so.
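To make the mechanics concrete, here is a toy Python sketch of the general lexicon-based approach. The word list and strength values are invented for illustration – this is not SentiStrength's actual 2,489-term lexicon, and real engines also handle spelling variants, boosters and negation that this omits. Note how it scores 'brilliant' positively even in the book-cover example: exactly the context problem described above.

# Toy illustration of a lexicon-based sentiment scorer: each term
# carries a signed strength value, and the scorer reports how strongly
# positive and how strongly negative a text is. Terms and values are
# invented for illustration, not SentiStrength's actual lexicon.
LEXICON = {
    "love": 3, "brilliant": 4, "good": 2, "happy": 3,
    "hate": -4, "terrible": -4, "bad": -2, "awful": -3,
}

def score_text(text):
    """Return (positive, negative) strengths for a piece of text."""
    values = [LEXICON.get(w, 0) for w in text.lower().split()]
    positive = max((v for v in values if v > 0), default=1)
    negative = min((v for v in values if v < 0), default=-1)
    return positive, negative

# 'brilliant' scores +4 here even though, in context, the comment is a
# put-down: the sarcasm/context problem discussed above.
print(score_text("this book has a brilliant cover"))   # (4, -1)
print(score_text("I hate this terrible awful film"))   # (1, -4)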

Mike also discussed the Cyberemotions project. This looked at peaks in the use of individual words on Twitter – 'Chile', for example, when the earthquake struck in February 2010. As might be expected, positivity decreased. But negativity increased by only 9%: it was suggested that this might have had to do with praise for the response of the emergency services, or good wishes to the Chilean people. Also, the very transience of social media means that people might not need to express sentiment one way or another: simply mentioning the earthquake and its context would be enough to convey the message the writer needed to convey. Mike also talked about the sentiment engine's analysis of YouTube. As a whole, most YouTube comments are positive; however, the individual videos which provoke many responses frequently attract negative comments.
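As a hedged sketch of how such a before-and-after comparison might be computed – the data rows, date windows and threshold below are invented purely for illustration, and this is not the Cyberemotions pipeline itself:

from datetime import datetime

# Invented sample records so the sketch runs: each is
# (timestamp, negative strength as reported by a sentiment engine).
scored_tweets = [
    (datetime(2010, 2, 25), -1), (datetime(2010, 2, 26), -2),
    (datetime(2010, 2, 27, 12), -3), (datetime(2010, 2, 28), -1),
    (datetime(2010, 3, 1), -4), (datetime(2010, 3, 2), -1),
]

def negative_share(tweets, start, end, threshold=-2):
    """Share of tweets in [start, end) at or below the negativity threshold."""
    window = [neg for ts, neg in tweets if start <= ts < end]
    return sum(neg <= threshold for neg in window) / len(window) if window else 0.0

quake = datetime(2010, 2, 27)  # the Chile earthquake struck on 27 February 2010
before = negative_share(scored_tweets, datetime(2010, 2, 20), quake)
after = negative_share(scored_tweets, quake, datetime(2010, 3, 6))
print(f"negative share before: {before:.0%}, after: {after:.0%}")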

Try the sentiment engine at http://sentistrength.wlv.ac.uk. One wonders if it might be useful in XML/RDF projects such as SAWS, or indeed for book reviews on publications such as http://www.arts-humanities.net.