Scientists analyse millions of news articles

27-11-2012. A study led by academics at the University of Bristol’s Intelligent Systems Laboratory and the School of Journalism at Cardiff University has used Artificial Intelligence (AI) algorithms to analyse 2.5 million articles from 498 different English-language online news outlets over ten months.

The researchers found that:

  • As expected, readability measures show that online tabloid newspapers are more readable than broadsheets and use more sentimental language. Among 15 US and UK newspapers, the Sun is the easiest to read, comparable to the BBC’s children’s news programme, Newsround, while the Guardian is the most difficult to read.  ‘Sport’ and ‘Arts’ were the most readable topics while ‘Politics’ and ‘Environment’ were the least readable.
  • The Sun is also the most likely to use adjectives with sentiment, while the Wall Street Journal uses the fewest emotional adjectives.
  • The study found that men dominated the content of newspapers during the period analysed. A ranking of topics by the gender bias of their articles found that ‘Sport’ and ‘Financial’ articles were the most male-biased, with sports news mentioning men eight times more often than women. ‘Fashion’ and ‘Arts’ were the least biased, with ‘Fashion’ among the few topics to feature equal proportions of men and women.
  • The most appealing topics to online readers were ‘Disasters’, ‘Crime’ and the ‘Environment’, while the least appealing topics were ‘Fashion’, ‘Markets’ and ‘Prices’. The researchers also found that popular articles tend to be more readable and more linguistically subjective.

Nello Cristianini, Professor of Artificial Intelligence at the University of Bristol, speaking about the research, said: “The automation of many tasks in news content analysis will not replace the human judgement needed for fine-grained, qualitative forms of analysis, but it allows researchers to focus their attention on a scale far beyond the sample sizes of traditional forms of content analysis.”

Professor Justin Lewis, Head of the School of Journalism, Media and Cultural Studies at Cardiff, added: “Even some of the more predictable findings give us pause for thought. The extent to which news is male dominated shows how far we are from gender equity across most areas of public life. The fact that articles about politics are the least readable might also explain widespread public disengagement.”

The study is published online in Digital Journalism.

Paper: Research methods in the age of digital journalism, Ilias Flaounas, Omar Ali, Thomas Lansdall-Welfare, Tijl De Bie, Nick Mosdell, Justin Lewis and Nello Cristianini, Digital Journalism, published online ahead of print 01 Nov 2012.

More information about this research, including images and a link to the paper, is available at:

Further information:

About the study

The study focused on two units of analysis: topics and outlets. The researchers compared topics according to their writing style and the male/female ratio of the most frequently mentioned people in that topic. They also compared 15 major US and UK newspapers according to the same criteria, as well as the popularity, in terms of readers’ preferences, of a sub-set of articles.
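As an illustration of how a writing-style comparison of this kind can be automated, the sketch below computes a readability score for a piece of text. This release does not name the readability measure used in the study, so the widely used Flesch Reading Ease formula is assumed here. The function names and the vowel-group syllable heuristic are hypothetical choices made for this example and are not taken from the paper.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per run of vowels, minimum one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores indicate text that is easier to read."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    simple = "The cat sat on the mat. The dog ran to the park."
    dense = ("Parliamentary deliberations regarding environmental "
             "legislation continued indefinitely.")
    # The short, plain sentences should score higher (more readable).
    print(flesch_reading_ease(simple) > flesch_reading_ease(dense))
```

Averaging such a score over all articles from one outlet, or on one topic, would give a figure that can be ranked against others. A production system would need a more careful syllable counter and sentence splitter than this heuristic.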

State-of-the-art AI techniques, including data mining, machine learning and natural language processing, were used to analyse the news media content. The outlets tracked were mainstream traditional media that offer their content online as news feeds. The researchers monitored the main feed advertised on each outlet’s home page. The researchers suggest that this technology can complement the skills of human scholars, allowing the social sciences to be both more ambitious and more comprehensive in scale.

Professor Nello Cristianini is supported by the PASCAL 2 Network of Excellence, and the CompLACS FP7 project.

About the Intelligent Systems Laboratory (ISL)

The University of Bristol’s Intelligent Systems Laboratory (ISL) is in the Merchant Venturers School of Engineering.

The University has a long tradition of excellence in Artificial Intelligence, with research groups in Engineering dating back to the 1970s and 1980s. Research activities at the ISL include foundational work in machine learning (many of the ISL members work in this central area of research), and applications to web intelligence, machine translation, bioinformatics, semantic image analysis, robotics, as well as natural intelligent systems. More information about the ISL is available at

