About assessing academic performance and the Research Excellence Framework (REF)

A couple of weeks ago, I received a message informing me that the outcome of the latest round of the University of Manchester's preparations for the upcoming Research Excellence Framework (REF) was online and that I could view my results in a dedicated part of the IT system. It shouldn't come as a surprise that this part of the IT system is called REF Preparation Exercise.

Looking at my scores made me reconsider the way in which academic research and research outputs are assessed. Since credibility matters here, let me assure you that I don't see myself as a victim of the system; in fact, I am doing rather well under it. Any criticism voiced here comes from a wish to see the system work better, not from 'sour grapes'.

But let me first set the scene for my readers outside the UK. British universities are in the midst of preparing for the REF, which will take place in 2014 and will assess 'research that has taken place' between 2008 and 2013. In fairness, British universities have been continuously preparing for one round or another of this exercise ever since its inception in the mid-1980s. The REF is the latest incarnation of the Research Assessment Exercise (RAE), which started in 1986 and initiated a set of changes in the relationship between government (the state) and science. The exercise was meant to run every five years and to assess the research performance of university units through a panel-based peer review system; the panels award units a number of 'stars' (currently between 1 and 4). Baseline funding for research depends on this assessment, so universities consider it important to 'do well'. In effect, the exercise is as much about reputation as it is about funding. For more on this, see this article.

Universities have learned to play different games in light of these exercises, from going to great lengths to get their staff onto the various panels to ensuring that faculty members publish regularly in the 'right' journals and that their publications are of the 'right' quality. The University of Manchester is no exception.

We, for instance, have an annual REF preparation exercise in which members of staff select a number of their papers (all pre-entered in eScholar, a web-based system used to keep a record of our publications). These are read by senior members of staff (Senior Lecturers and Professors) and assigned a number of stars corresponding to those awarded by the REF panels (this, mind you, was done even when we were not entirely clear about the rules and criteria of assessment). Plenty of criticisms have been levelled at the practice, mainly questioning whether peer review can work at the level of organisations and organisational units, and pointing to the substantial costs involved and the vague criteria such assessments necessarily employ. Apart from the criticisms directed at the nature of the assessment, there have been concerns about how the outcome of the preparation exercise is used (in promotions, for instance), and the Union has called for a boycott. I don't intend to discuss any of these here.

Instead, I would like to share some observations on the scores my articles received and what I believe to be the fundamental problem with this kind of assessment system, namely its focus on the assessment of published output.

I have two observations:

  1. Every paper published in the top journal of the broad field (Research Policy) received 4*. This is probably one more example of how much reputation matters.
  2. Outputs published in other outlets (not the top-ranking journals of the broad domain) received lower ratings and were probably read more carefully. Here, since knowledge of the narrow field is essential, the date of publication evidently mattered: earlier publications that had already attracted citations were rated much higher. Epistemic difference also likely played a large role: the greater the distance between the narrow research interests and approach of the assessor and those of the assessed, the lower the rating. I suppose additional considerations also came into play.

Irrespective of any other concerns, this led to a couple of anomalies:

  1. A chapter in a book edited by Polish colleagues was rated 1* (the lowest score). Although probably not one of my best pieces, it was key to opening a new and progressive personal (and possibly collective) research line. It was solicited by the editors of the book and set out, for the first time, the notion of science as a relationship between research spaces and research fields. This notion informed the proposal for EURECIA, a research project funded by the European Research Council (ERC) with slightly less than 500K euro; it provided the intellectual foundation for a later Research Policy paper; and it is already being used by other colleagues.
  2. A co-authored article of mine in Science (I am the lead author) was rated 2* (and 1* by the assessor from outside the unit). I do believe that anything published in Science warrants a higher score than that. This particular article was unique in that it managed to put forward a coherent social science argument (that European-level policy is moving from a 'Science in Europe' to a 'European Science' mode) in about 3,000 words. The article was reviewed by four peers and went through two rounds of review. Moreover, it is clearly part of the budding research line mentioned above.

What is the problem?

I believe that the distortions in the ratings stem from the fact that what is assessed are discrete outputs (published articles, chapters and the like). These are treated as independent events rather than as parts of the continuous research lines that we as researchers build, and of the way in which these research lines intersect with the development of the research field (or fields). Concentrating the assessment on discrete outputs has various problematic implications, but here I would like to mention the following:

  • It can lead to inaccuracies that are far too consequential for individuals' careers to be tolerated.
  • It works against starting new individual research lines and taking risks in research.
  • This, in turn, reduces epistemic variety in research fields and works against the possibility of intellectual innovation.
