
Interpreting the numbers of school admissions – is the first preference offers rate too high?

Figures were released yesterday showing how many families received an offer from their first preference school. The headline number was 84% for secondary schools: that is, 84% of families were offered a place in the school that they put top of their application form. The coverage yesterday mostly focussed on the overall supply of school places. Maybe there is also a sense that “only” 84% got their first choice.

Acknowledging the disappointment or worse that individual families will feel at missing out, how should we interpret that 84%? It’s obviously affected by two things: the choices families made, and the number of places available. ‘Demand and supply’ if you like – and that’s not inappropriate here, as the school admissions algorithm is what acts as the market clearer.

Imagine a country a bit like England, but much simpler, more abstract. And also imagine that all that families care about in schools is their academic quality. In that country, as in England, 79% of schools are rated as Outstanding (26%) or Good (53%). If all schools are about the same size, then 79% of school places are in Outstanding or Good schools. Suppose that everyone can access at least one Outstanding or Good school, and that everyone applies to one of those schools. If schools and people are spread around the country in a reasonably even and regular way, then 79% of them will get in and the remainder will be offered places in schools rated less than Good.

In that situation, 84% getting their first choice seems ok.

But what if everyone was a bit more ambitious in their choice, and everyone put an Outstanding school as their top choice? Why not?

Only 26% would get their first choice. Suddenly, in this abstract, regular country, 84% seems to suggest a lot of unambitious choices – ones likely to succeed rather than ‘true’ first choices.
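The arithmetic of this abstract country fits in a few lines, using the text’s own numbers and assuming places within a tier are allocated evenly among applicants:

```python
# Shares of school places by rating in the abstract country (from the text).
shares = {"Outstanding": 0.26, "Good": 0.53, "Less than Good": 0.21}

# Strategy 1: everyone applies to an Outstanding-or-Good school.
# Capacity in that tier covers 79% of families, so 79% get their first choice.
print(round(shares["Outstanding"] + shares["Good"], 2))  # -> 0.79

# Strategy 2: everyone puts an Outstanding school top.
# Only 26% of places are in that tier, so only 26% get their first choice.
print(shares["Outstanding"])  # -> 0.26
```

The observed 84% sits above even the first, unambitious benchmark, which is the puzzle the following paragraphs address.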

Of course, our country is not simple and regular like that. There is geographical clustering of Outstanding schools, so some families will face a much higher chance of getting into an Outstanding school. In other places, there may be no Outstanding schools, so even families looking for the very best school academically can only put a Good school top, and they too have a pretty high chance of getting that. So taking that into account, even “everyone chooses an Outstanding school” would result in more than 26% getting their first preference, but not as high as 84%.

So I think we can say that the 84% might be too high for comfort. Maybe it reflects a school admissions system that favours those who can buy access to the best schools through their home’s location. The key role of proximity in resolving who gets into the popular schools keeps many of the Outstanding schools out of the choice set of poorer families. We need to change this, as I discussed here.

Part of the solution to the teacher shortage?


What can we do about teacher recruitment? Yesterday’s National Audit Office Report has provided some useful clarity and confirmed that the Department for Education has missed its recruitment targets for four years now. A number of commentators have highlighted the shortfall of new teachers through the year, though the DfE has argued to the contrary. The range of views suggests the shortfall may be anywhere between zero and 18% (quite a range!).

A number of suggestions for policy responses have been made: pay all teachers more; pay some teachers more; review the teacher workload; reform the Ofsted process to make it less stressful. These may well be great ideas; but they are all expensive or very expensive, and slow or very slow to implement.

Here’s another idea, which could contribute to raising recruitment, though undoubtedly would not wholly solve the problem. This is more or less costless (apart from the cost of the extra teachers obviously) and easy to implement.

We have imposed a major but pointless restriction on the pool of potential teachers – we can just drop that restriction. We can, effectively, stop shooting ourselves in the foot.

The point is this: there is a general view threading through the teacher recruitment system that applicants with better degrees will make better teachers. I’ll illustrate that in a moment. But all the statistical evidence we have on teacher effectiveness says that that is not true: a teacher’s ability to raise the attainment of her pupils is unrelated to her own academic qualifications.

There are a number of explicit points in the system at which the boundary between a II.1 degree and a II.2 degree is crucial. I would argue that these create a mindset on appropriate qualifications for good candidates that pervades the system much more widely. For example, bursaries for teacher training are, in some subjects, only available to applicants holding a II.1 or better. The official ‘Get into Teaching’ website makes this clear. This is not true in all subjects: for sciences, maths and languages, the applicant’s degree class makes no difference to the bursary, so this proposal would have only an indirect effect there. Another example is Teach First, which requires a II.1 or better from its applicants.

So while the II.1 restriction certainly does not apply universally for all teacher recruitment, it is likely to have a much broader impact on the views of recruiters and selectors on what a good teacher looks like.

Over the last decade or so, economists have focussed a lot of research effort on teacher effectiveness. The research evidence shows clearly that teacher effectiveness is unrelated to the teacher’s own academic qualifications. Teachers who themselves got a First class or a II.1 degree are no more effective teachers than those who got II.2s. The NAO Report hints at this too.

The one study for England that measures this (our own) makes this point. The much more numerous research studies in the US show this (see my review here). Among researchers, it is an uncontroversial finding. Even researchers who set out to show that having a Master’s degree should help, end up finding it doesn’t (Ladd and Sorensen, 2015, reference in here).

So the explicit or implicit restriction of teacher recruitment to those with at least a II.1 is pointless – it does not achieve its aim of raising average effectiveness. But it is harmful: it significantly restricts the hiring pool. At the risk of repetition: this is not about a quantity–quality trade-off in hiring – by relaxing this constraint we can seek more quantity at no cost in quality.

How big a difference might this make? It’s very hard to say at a high level of generality. The NAO commented that uncoordinated data sources make this a difficult area to track.

Reaching for a very small envelope, turning it over and starting to scribble: the percentage increase in recruits equals the percentage increase in the hiring pool, times the relative likelihood of applying from the new group, times the relative likelihood of someone in the new pool being acceptable. If we assume the current hiring pool is everyone with a II.1 or better, and we are proposing to expand it to include people with II.2s as well, then using HESA data (chart 9) that’s an increase of 35%. Of course there are other routes in besides the flow of new graduates, including people from outside the UK, so let’s call it 30%.

We know that students with II.2 degrees have lower rates of return than those with higher classes, so presumably face worse alternative job opportunities. They might therefore be more likely to apply to teaching posts. But to be cautious, and underestimate the likely effect, I assume a relative application rate of 1. For relative acceptability, I need to account for the fact that in some subjects, applicants with II.2s are already accepted. I will also assume that the marginal II.2 candidate is less acceptable than a II.1 candidate. Overall, let’s try a relative acceptability rate of 0.4.

Multiplying these numbers together (0.3 x 1.0 x 0.4) yields a potential increase in recruits of 12%. The previous paragraphs make clear how very rough an estimate that is, but you can choose your own numbers to try. So this proposal might increase recruits by around 10% – recruits of the same expected effectiveness at teaching.
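The envelope calculation above, spelled out in code. All three inputs are the rough assumptions stated in the text, not estimates, so substitute your own:

```python
# Back-of-envelope: % increase in recruits =
#   % increase in hiring pool
#   x relative likelihood of applying (new group vs current pool)
#   x relative likelihood of being acceptable
pool_increase = 0.30          # II.2s added to the pool (35% from HESA, rounded down)
relative_application = 1.0    # cautious assumption: II.2s apply at the same rate
relative_acceptability = 0.4  # assumption: marginal II.2 candidate less acceptable

recruit_increase = pool_increase * relative_application * relative_acceptability
print(f"{recruit_increase:.0%}")  # -> 12%
```

Because the three factors simply multiply, halving any one of the assumptions halves the final figure – a useful sense of how sensitive the ~10% headline is.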

Why not remove the II.1 requirement? And work to counteract the view that teachers with II.2s will be ineffective teachers. It won’t reduce average teacher effectiveness and it will increase the applicant pool.

Just to finish, it is worth re-emphasising that while teacher numbers are important, much more important is the average effectiveness of teachers. All the evidence shows that being taught by an effective teacher relative to an ineffective teacher has a dramatic impact on attainment. To illustrate: having all effective teachers relative to all ineffective teachers for just one GCSE year wipes out half of the poverty gap in attainment. Getting more teachers into our classrooms matters, but understanding how to raise average effectiveness is the big prize.


Teacher performance pay without performance pay schemes

Author:  Simon Burgess

Amid the macroeconomic gloom, the Autumn Statement contained a line about teachers’ pay. The School Teachers’ Review Body recommends “much greater freedom for individual schools to set pay in line with performance”. Consultations and proposals are expected in the near future.

But simply giving schools the freedom to do this may be a rather forlorn hope of anything much happening. It is not clear that there is a substantial demand from schools for performance-related pay (PRP) schemes that has only been thwarted by bureaucratic restrictions. It is hard to see high-powered, tough-minded PRP schemes being introduced by more than a handful of schools, not least because we have not seen large scale deviations from national pay bargaining in academies in England despite their new freedoms to do so.

If that path seems unpromising, there are other ways of facilitating a greater reflection of performance in pay, discussed shortly. But first – is PRP for teachers a good idea in the first place? Does it raise pupil attainment? What are the ‘side effects’?

This is a question that economists have produced a good deal of research on. And to summarise a lot of diverse work briefly, the international evidence is mixed. Those on both sides of the argument can point to high quality studies by leading researchers that find substantial positive effects, or no effects. In both cases, interestingly, there appeared to be little evidence of gaming or other unwanted effects of the incentives.

There is little evidence specifically for England. Our own research found a substantial positive effect of the introduction of a PRP scheme, but given the varied results found elsewhere it would seem unwise to place too much weight on this one study. The underlying performance pay scheme was poorly designed but nevertheless had a positive effect on the progress of pupils taught by eligible teachers relative to ineligible ones.

And design is key. There are many reasons why a simple high-powered incentive pay scheme might be detrimental to pupil progress, which we have discussed here and here. These include the fact that teachers have multiple tasks to do, the problems of measuring the outcomes of some of those tasks, the complex mixture of team and individual contributions, and the potential impacts on implicit motivation. The overall message is that incentives work, but schemes have to be very carefully designed to achieve what the schemes’ proponents truly intend.

There is another way to facilitate a closer link between pay and performance that does not require any school to introduce a performance pay scheme.

Published performance information in a labour market can change the way that the market rewards that performance. The critical features are, first, that the organisation’s own output depends in an important way on this performance characteristic of an individual; second, that the organisation has some discretion in the pay offers it can make to new hires; and third, that the performance information is public – available and verifiable outside the current employer. In this case, the pay structure of the market will reflect the performance rankings: high-performing individuals will be paid more.

In teaching, the first two of these three conditions are met: teacher quality matters hugely for schools, and schools have some discretion over pay. Now, suppose we had a simple, useful and universal measure of each teacher’s performance in raising the attainment of her pupils (obviously we don’t at the moment; I come back to this below), and that this was published nationally, primarily for the attention of Headteachers. The idea is that Headteachers trying to improve the attainment of their pupils would be on the look-out for high-performing teachers when they had a vacancy to fill. Armed with this performance information, they might try offering a higher wage (or something else – it doesn’t have to be money) to tempt them to join their own school. Equally, the teacher’s current school may respond by raising the offer there. Over time, this process will tend to raise the pay of high-performing teachers relative to low-performing ones, whom no-one is trying to bid for.

This idea should not be a strange one. A number of professions have open measures of performance. Just today it is reported that performance measures for more surgeons will be made public in the summer of 2013; this is already true for heart surgeons.

It is well-known that PRP does two things: it motivates and it attracts. The outcome for pay described here will tend to make teaching more attractive to people who are excellent teachers and less attractive to those who aren’t.

There are a number of problems with this idea, though perhaps fewer than might appear at first glance. First, it could be argued that a performance measure derived from teaching in one school is not relevant to teaching in another school. Obviously each child and each school is unique, but it seems very unlikely that there is no commonality of context between one school and the next. Observation suggests this: teachers moving from one school to another are not counted as having zero experience, and Headteachers are often appointed from outside a school.

Second, there might be a fear that the teacher labour market would become chaotic, with everyone churning around from school to school in search of a quick gain. We have to recognise that there is substantial turnover of teachers now (http://www.bristol.ac.uk/cmpo/publications/papers/2012/wp294.pdf). But the main point is that it does not require much actual movement to make the market work. Schools can make counter offers to try to retain their star teachers and the end result is the same – higher salaries for high-performing teachers.

Third, any measure would be noisy, partial and imperfect. Of course, all such measures are. Whether a measure is perfect is not really the question; the question is how noisy and imperfect it is, and whether it contains enough information to be useful. One advantage in this case is that the consumers of these performance indicators are the people best able to judge their usefulness and their shortcomings: Headteachers. If such metrics are not useful, Headteachers will simply ignore them; there would be no compulsion to use them. Even in labour markets with some of the most detailed and finely measured performance indicators (for example, football or baseball) there are many moves between employers that do not work out. It is worth re-emphasising that these performance measures are bound to be imperfect and incomplete, but broad measures of performance may nevertheless be very useful.

There are useful parallels to be drawn from another profession: academics. For academics, the combination of very detailed and public performance information and a context where research performance matters a great deal to universities seems to have had a substantial effect on academics’ pay.

The Research Assessment Exercise (RAE) and more recently the Research Excellence Framework (REF) have made a strong research performance very important to a university’s standing and its income. But the critical factor for academics is that an individual’s research performance is public knowledge, through very detailed recording of the impact of their research papers. Departments and universities aiming to improve their ranking seek out star researchers and attempt to bid them away with higher salaries (plus other things such as research facilities). These offers may well be matched by their current employer, but the end result is that salaries now seem to be much more closely correlated with research productivity than before the RAE/REF (I say “seem” as there does not appear to be any evidence on this, so this is casual empiricism). This is a lot of what drives many young researchers to put in very long work hours: having a paper published in a top scientific journal early in a career has a substantial lifetime payoff even in a world with few or low-powered incentive schemes. If you check out academics’ websites you will invariably see their academic output prominently displayed.

Again, an important feature is that these indices of research output are largely consumed by other academics who are aware of their strengths and weaknesses. So although they are far from perfect, they are used by precisely the people best placed to calibrate their usefulness appropriately.

If we are to go down a path of tying teacher pay more closely to performance, and yet respect the rights of increasingly autonomous schools to determine their own pay systems, then this might be an option to consider. The challenge is to devise a measure that is simple, useful and universal. It would measure the progress made by the pupils each teacher taught; it would have to deal with normal variations in performance by averaging over a number of classes and a few years; and it would be on a common metric. This is not straightforward, but if it gave rise to a robust broad measure of performance it could form a part of performance pay for teachers, and performance management more broadly. It could also have substantial effects on the pay of high-performing teachers.

Changing the exam system


Author: Simon Burgess

It will take a while to fully understand the scope of the proposed changes to the exam system in England.  The exams are obviously very important for students, but also for schools and teachers. Here are some initial thoughts.

A key pledge is to ‘restore rigour’: exams are going to be more rigorous (of course, England has always had at least one exam noted for its rigour). More rigorous might mean different things: it might simply mean more failures or, more broadly, greater differentiation between students.

It seems inevitable that one implication of this (in the short run at least) is greater educational inequality. One outcome of making an exam more rigorous is that more students will fail it. It is hard to see how that could not be true. This is not fatalism; it may well be that over time, teaching brings more students above the line, but there will be a number of cohorts with more failures first. A tougher exam will not be credible if more people pass it. This, coupled with a hint of no retakes, seems particularly harsh. ‘No second chances’ is an uncompromising view of how to run an education system, and perhaps at odds with much of the rest of public life.

An alternative sense of ‘more rigorous’ is that the new exam will allow for greater differentiation between students. This is as much to do with the marking and reporting as the nature of the exam itself. Rather than proliferating letters and multiple stars, one way to do this that has been mooted is to simply report percentages. More or less differentiation – separating or pooling in economics terms – has a number of implications. On the plus side, it may enable more efficient matching of workers and jobs, raising average productivity; it is likely to incentivise greater effort at school as students currently more-or-less assured of an A* grade push on to shoot for 90%+. On the down side, there will be greater inequality of attainment and consequently greater inequality of earnings. It is not obvious how sizeable these pros and cons are, but there is no guarantee that the outcome will be a happy one.

We have a school system described by the Department for Education as increasingly autonomous. Diversity of supply is the maxim, and there is reduced oversight of what schools actually do. But what brings the system together is the common exam system. Autonomous schools can do their own thing with less outside ‘interference’ … as long as their students achieve good grades. Schools’ freedom to operate is constrained by close public scrutiny of their performance in common high stakes exams. If there is less monitoring of the process of education, then regulation of the outcome (attainment measures) becomes even more important.

School performance tables, the ‘league tables’, and Ofsted are the central performance components of the regulatory framework. While Ofsted is important, it is largely driven by the school performance tables reporting exam scores. So getting the exams right is very important. Schools will react to changes in the regulatory environment as they will react to any incentives (implicit or explicit) embedded in the system. So in addition to its implications for students, the exam system will very strongly influence how schools operate. Changing the exam system is a big opportunity to get things right or to get things wrong.

One possibility would be to use the reform to move away from the current sharp threshold measure of attainment – the fraction of students in a school getting at least 5 good passes. If students are given percentage scores rather than letter grades (see above), the metric for the school could simply be the mean score. There is a lot to be said for not having a sharp threshold, or for locating it at the point on which policy makers truly want schools to focus. But this could be done now under the present system – GCSEs have continuous scores that are then converted to letter grades (controversially this year). Switching from a threshold to a continuous average is a separate issue from the rest of the proposed changes, and is not dependent on them.

The proposal to have one exam board per subject seems like a good idea. While monopolies may well raise costs, the nature of the competition between the exam boards was dysfunctional. And the idea of ‘competition to be a monopoly’ is well established (train companies, power transmission and other infrastructure companies) and reasonably well understood.


Focussing now on the students, what is the best form of tests for students? The proposals favour a return to a single exam for each subject at the end of the course, and the end of coursework, retakes and modules. Doing well in such exams will require a particular type of skill, so these are what students will try to master and what schools will try to teach.

What sort of skills should education foster? This is not a straightforward question. One plausible answer is that education should test and certify the levels of competence that students have achieved in the skills that employers want.

No one knows what skills business will need in ten years’ time, and so we can only speculate as to what we should grade students on. But given the ubiquity of the web, it seems very unlikely that ‘remembering large amounts of information and writing it down quickly’ will figure high on the list. It is hard to see this as a highly prized capability. Some other much-lauded education systems (keen on rigour) test abilities more likely to be of value. A great deal of research now focusses on different cognitive and non-cognitive skills, how they are built up and how they relate to inequality. I don’t know whether there is any evidence on whether the ability to remember large amounts of information is more or less socially graded than broader ranges of skills.

One argument for one-shot, high-stakes, closed-book exams is that anything more open is susceptible to parental help, and thus more likely to favour the middle class. But there are other ways to make sure that the summative assessment is based on the student’s own work alone. Furthermore, parents support their children’s education in so many ways (conversation; providing books and computers; a quiet place to study; role models; trips) and this is unlikely to be the most important. It is worth emphasising that parents helping their children to learn is a good thing, and should be encouraged where lacking, not banned where present.

This preliminary set of points suggests that the proposed reforms of assessment will not promote skills likely to be valued in the labour market. They are likely to lead to more students failing and leaving school with nothing, and/or delaying their one-shot take, or just dropping out. This spells greater inequality in educational attainment and so in life chances.

Novices and Veterans: What new data tells us about teacher turnover and school deprivation

Rebecca Allen and Simon Burgess

A new school year has just started, new students have just arrived – what about new teachers? Are there a lot of new faces in the staffrooms? One of the stories frequently told about schools serving poor communities is that they suffer from very high and damaging staff turnover. Few teachers stay a long time, and, relative to schools in the affluent suburbs, there is a constant ‘churn’ of staff. This lack of experienced teachers reduces the chances of new teachers learning the trade on the job, and means that both students and school leaders are forever coping with new names, personalities and teaching styles.

Is this true or urban myth? For the first time, we can start to answer this question systematically, moving beyond a collection of local anecdotes. New data collected from all schools about their workforce has the potential to hugely improve our understanding of teachers and teacher labour markets.

We use this data to analyse the length of time that teachers stay in schools, i.e. their job tenure in a particular school. This is the form in which the problem faces headteachers: how many novices do they have, how many veterans and so on. While there is a good deal of research on teachers leaving the profession as a whole, the issue here is how long teachers stay in a specific school.

Like many urban myths, this one contains a hint of truth – deprived schools do experience somewhat greater teacher turnover – but only a hint.

Overall, it seems that teaching is not a low-turnover profession. In a typical school, about a fifth of teachers have been there for less than 2 years, and over half of the teachers have been in that school for less than 5 years. On the other hand, nearly a fifth have been there at least ten years, and in fact over 5 percent have stayed over 20 years. Of course, teachers vary, and we compare different groups in Table 1. There is very little difference in tenure between female and male teachers, nor between primary and secondary schools. More detail still is available here.

Table 1: Years in current school (%)

                     All teachers      Male    Female    Primary   Secondary
0-2 years                19.4          20.4     19.2      19.5       19.4
2-5 years                36.8          37.0     36.7      36.6       37.0
5-10 years               24.8          23.5     25.2      24.8       24.8
10 years or more         18.9          19.1     18.9      19.1       18.8
Number                343,547        80,704   262,843   172,137    171,410

Note: classroom teachers only, excluding assistant, deputy and headteachers

Averaging over all teachers, the mean time in a school is 6.7 years. Here we need to introduce a technical issue. The data come from teachers currently in post, so this is job tenure so far – ‘elapsed’ tenure. Obviously, a teacher who has just arrived at a school may go on to spend the rest of her career there. Under certain steady-state assumptions, the statistical model implies that average completed tenure is double average elapsed tenure.
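A toy simulation illustrates the simplest case behind that doubling: if every teacher stayed exactly T years and cohorts arrived at a constant rate, a snapshot of the current stock would catch each teacher at a uniformly random point of her stay, so mean elapsed tenure comes out at T/2. The value of T below is chosen purely for illustration, and real tenure distributions are of course not constant:

```python
import random

random.seed(0)
T = 13.4  # hypothetical completed tenure in years, chosen so that T/2 = 6.7

# Snapshot of the current stock: each teacher is at a uniformly random
# point of her stay, so elapsed tenure ~ Uniform(0, T).
elapsed = [random.uniform(0, T) for _ in range(100_000)]
mean_elapsed = sum(elapsed) / len(elapsed)

print(round(mean_elapsed, 1))  # close to T/2 = 6.7
```

With varying completed tenures the elapsed-tenure snapshot is also length-biased towards long stayers, so the factor of two is an approximation rather than an identity.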

That is the overall picture, what of the differences between disadvantaged and affluent neighbourhoods? We find systematic and statistically significant differences in turnover: schools with many poor pupils do have more short-tenure teachers and fewer experienced teachers. However, on average the differences are small: 18% of teachers in the least disadvantaged schools have tenure of 0-2 years, compared to 22% in the most disadvantaged. At the other end of the scale, 20% of teachers in schools in the most affluent neighbourhoods have tenure of at least 10 years, whereas the figure in the most deprived neighbourhoods is 17%.

Figure 1: Distribution of teacher tenure by school deprivation


Figure 1 gives a flavour of the results. It shows the 10th percentile of tenure in school in days (the lowest line in the figure), across the full range of communities in England, from the richest 2% at the far left to the poorest 2% at the far right. The 10th percentile comes out at somewhat less than two years but, more interestingly, is flat: the number barely changes across the entire distribution. There is a very slight slope in the 25th percentile and in the median, reinforcing the point that there are systematic differences but that they are quantitatively small. There is a more noticeable difference in the 75th percentile: in schools serving poor communities, there are slightly fewer experienced teachers.

We also use the richness of the data to decompose the relationship between turnover and poverty. We show that part can be accounted for by pupil characteristics, perhaps because students in schools in more deprived areas are harder to teach. Part also is accounted for by differences in the local teacher labour market around each school. Controlling for school, student and teacher labour market factors reduces the association between school poverty and turnover, but does not eliminate it.

The remaining association between teacher turnover and poverty is largely accounted for by teacher characteristics, with the poorer schools hiring much younger teachers on average. How should we understand this?

We interpret this as deriving either from the preferences of young teachers, or from the low market attractiveness of disadvantaged schools. There are a number of possibilities. First, this could be a desired career path for young teachers: new teachers may look for their first jobs near to where they trained, which implies predominantly urban, and therefore on average more deprived, schools. Alternatively, younger teachers may have more idealistic preferences, and welcome the opportunity to work in deprived schools. Under these interpretations, the allocation reflects the desire of younger teachers to work in deprived schools, and the higher turnover in such schools derives from this.

The alternative interpretation is a matching story in which the more effective teachers sort on average into the more affluent schools, and the disproportionate number of inexperienced teachers in the poorer urban schools reflects the realities of the market that these schools face. Distinguishing between these interpretations is a task for future work; it will need further sweeps of the data and possibly attitudinal data from teachers as well.

It is now widely acknowledged that teacher effectiveness is the single most important factor in raising attainment. Attainment gaps arise in part from students’ exposure to teachers of differing effectiveness. The process by which different teachers end up at different schools in front of different children is little understood. We intend to spend the next few years utilising this new data to address this research programme.

Reforming teacher training

Rebecca Allen and Simon Burgess

This week the House of Commons Education Select Committee published its report on the teaching profession. This post gives the main points of our evidence to the Committee.

We think of Initial Teacher Training (ITT) as encompassing both the initial training and the probationary year. How should this be set up to produce the most effective teachers, those who will have the greatest impact on pupil progress? ITT plays two roles for the profession, training and selection, with the emphasis typically placed on the former. Both are important and neither should be neglected, but we argue that the evidence suggests that, if anything, selection is the more important, and this is our focus here. An important role for selection is completely standard for any professional accreditation system, in either the public or private sector.

The key argument is this: the sharpest selection should be made at the point when the evidence on ability is strongest. The final decision on who can become a teacher should be made when we have accumulated enough evidence on the candidate’s teaching effectiveness. Where is this point in teaching? The two central relevant facts are that variations in teacher effects on pupil progress are very substantial, and that the future effectiveness of a potential teacher is hard to judge from their own academic record.

We believe that the current operation of selection in ITT (tight at the beginning, negligible thereafter) is the wrong way round. Instead, we should let a broader group try out to be teachers, but enforce a much stricter probation policy based around measures of teacher effectiveness in facilitating pupil progress. Full certification and an open-ended first job would only be granted once performance data showed a teacher to be effective. The expectation would be that only the most effective teachers would make it through to full certification.

Selection into ITT is about gaining a place on a course. The difficulty of identifying people likely to be good teachers is very relevant here. It is very hard to tell who will be a good teacher, and therefore a high degree of agnosticism is appropriate when faced with applicants. This is certainly true for selection based on objective criteria from the applicants’ own academic records: we know that these are unrelated to teaching ability, and so they should be irrelevant in selection into ITT. Beyond that, even if selectors are highly skilled at spotting potential (and it is not clear that they are), it is impractical to ask each applicant to teach a practice lesson. Therefore, selection into ITT should be very broad, with a relatively low academic entry requirement. This of course is not the situation now, nor the direction of travel of current policy. The tightening of academic entry requirements into teaching is not helpful: it will restrict the quantity of recruits and have no impact at all on average teaching effectiveness.

Graduation from ITT should also be tough. Given that much of an ITT course is now school-based, time spent in the classroom will form an important part of the assessment. Arguably the classroom experience is the key part of the course. However, in such a short space of time it will not generate sufficient data for a robust and objective view of the trainee’s effectiveness. It will nevertheless allow the trainee to discover whether teaching is for them.

Once in a job in a school, the progression to being a qualified teacher should be very different to the typical experience now. The key decision on final certification should be made after a probation period of, say, three years; ideally, the probation should involve classes of varying abilities and year groups. The period probably cannot be shorter, though the appropriate length of the probation would need to be analysed properly, depending on the statistical reliability of any pre-hire indicators, school-based performance data, and the cost of being wrong. This is the point at which enough data are available to make a reliable judgement on the effectiveness of the teacher. There should be an expectation that not all will make it through to final certification; indeed, only the most effective should be retained. The key judgement should be a minimum threshold of progress made by the probationer’s pupils. Obviously, the measurement of that progress and the parameters of the threshold require a great deal of careful work. Like any statistical data, estimates of teacher effectiveness will never be perfect, and a good deal of evidence over a number of years will be necessary to reach a decision, but this is clearly necessary to raise the average effectiveness of the teaching profession in England.
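The point about probation length is at heart statistical: averaging a teacher’s measured value-added over more years raises the reliability of the estimate. A minimal sketch of that logic, using the classical signal-to-total-variance formula and purely illustrative (assumed) variance figures, not estimates from our data:

```python
def reliability(var_teacher, var_noise, years):
    """Share of the variance in a `years`-long average of annual
    value-added estimates that reflects the true teacher effect
    (classical reliability: signal variance over total variance)."""
    return var_teacher / (var_teacher + var_noise / years)

# Assumed figures: a single year's estimate is dominated by
# classroom-level noise, so reliability starts low and climbs
# as more years of evidence accumulate.
for years in (1, 2, 3, 5):
    print(years, round(reliability(var_teacher=0.02, var_noise=0.08, years=years), 2))
```

With these assumed variances, one year of data gives a reliability of only 0.2, rising above one half by year five; the right probation length depends on where one sets the threshold and on the true variances, which is exactly the analysis called for above.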

Another innovative route into teaching is through Teach First. In some ways this is a positive development, as it allows a lot of people to try out teaching and also gives the schools which employ them an ‘out’ at the end of the two years. On the other hand, it restricts entrants based on their academic background.

It is important to see the teacher labour market as a whole, and to see how the different stages of a teacher career fit together. It seems to be very hard to fire ineffective teachers. While the regulations on this have recently changed, generating a culture that encourages headteachers to take a more proactive stance seems harder. While this may change, it may be that the best way to reduce the problem of low-performing teachers is to make it very difficult for ineffective teachers to get into the profession in the first place.

These changes would make starting out on a teaching career much more risky financially. In order to maintain the same expected lifetime income from the profession, the pay of those making it through to final full certification will need to be higher. And the lower the chance of making it through, the higher the fully certified pay will need to be.
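That compensation logic can be made concrete. A stylised sketch, assuming risk-neutral entrants, no discounting and no outside option; all figures are hypothetical round numbers, not policy proposals:

```python
def required_certified_pay(expected_lifetime_income, probation_pay,
                           probation_years, career_years, pass_prob):
    """Annual post-certification pay that keeps an entrant's expected
    lifetime income at expected_lifetime_income when only a fraction
    pass_prob survives probation (simple, undiscounted sketch)."""
    earned_in_probation = probation_pay * probation_years
    certified_years = career_years - probation_years
    return (expected_lifetime_income - earned_in_probation) / (pass_prob * certified_years)

# Hypothetical numbers: a £1.2m expected career, £25k probation pay,
# 3 probation years in a 33-year career. Lower pass rates force up
# the certified salary.
for pass_prob in (0.9, 0.7, 0.5):
    print(pass_prob, round(required_certified_pay(1_200_000, 25_000, 3, 33, pass_prob)))
```

Under these assumptions, halving the pass rate roughly doubles the post-certification salary needed, which is the trade-off the paragraph above describes.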

In summary, we think that the evidence shows that the selection aspect of ITT is completely the wrong way round. Selection is tight to get into ITT in the first place, but once in, progression to full certification is normal and expected. The process needs to be more appropriately agnostic about likely teaching ability at the start. It should also allow a broader group of people to try out teaching, but impose a much tougher probation regime before trainees are given final certification. It makes much more sense to take final decisions later, once more evidence on effectiveness has accrued.

Who fails wins? The impact of failing an Ofsted Inspection

Rebecca Allen and Simon Burgess

What is the best way to deal with under-performing schools? This is a key policy concern for an education system. There clearly has to be a mechanism for identifying such schools. But what should then be done with schools which are highlighted as failing their pupils? There are important trade-offs to be considered: rapid intervention may be an over-reaction to a freak year of poor performance, but a more measured approach may condemn many cohorts of students to under-achieve.

This is the issue that Ofsted tackles. Its inspection system identifies failing schools and supervises their recovery. How effective is this? Is it even positive, or does labelling a school as failing push it to ever lower outcomes for its students?

It’s not clear what to expect. Ofsted inspections are often dreaded, and a fail judgement is seen as disastrous. It has been argued that failure triggers a ‘spiral of decline’, with teachers and pupils deserting the school, leading to further falls in performance. But it might also be a fresh start, with renewed focus on teaching and learning, leading to an improvement in exam scores. Equally, we might expect nothing much to happen: after all, the policy ‘treatment’ for those schools given a Notice to Improve is very light touch. It is neither strongly supportive (typically no or few extra resources) nor strongly punitive or directive (schools face no sanctions or restrictions on their actions). Schools are instructed to focus intensively on pupil performance, and are told to expect a further inspection within a year. In addition, and possibly the most important factor, the judgement that the school is failing is a public one, usually widely reported in the local press.

Our research shows that the Ofsted inspection system works. Schools that just failed their Ofsted significantly improved their performance over the next few years, relative to schools that just passed. The impact is statistically significant and sizeable. In terms of the internationally comparable metric of effect sizes, our main results suggest an improvement of around 10% of a standard deviation of pupil scores. This is a big effect, with a magnitude similar to a number of large-scale education interventions. Translated into an individual pupil’s GCSE grades, this amounts to a one grade improvement (for example, B to A) in one or two GCSEs. From the school’s perspective, the gain is an extra five percentage points in the proportion of pupils gaining five or more GCSEs at grades A*-C.

Our findings suggest that the turn-around arises from genuine improvements in teaching and learning, not from gaming to boost exam performance by switching to easier courses. First, the impact is significantly higher in the second year after the visit than in the first, and remains level into the third and fourth years after the inspection. So it is not simply a quick fix to satisfy the inspectors when they return twelve months later. Second, we find a stronger effect on the school’s average GCSE score than on the headline measure of the percentage of students gaining at least 5 good passes; if the schools’ responses were aimed at cosmetic improvement, we would expect the reverse. We also find similar positive effects on maths results and on English results.

It could be argued that these results are implausibly large given that the ‘treatment’ is so light touch and schools are given no new resources to improve their performance. The instruction to the school to improve its performance may empower headteachers and governors to take a tougher and more proactive line about school and teacher performance. This may not be a minor channel for improvement. Behavioural economics has provided a good deal of evidence on the importance of norms: the school management learning that what they might have considered satisfactory performance is unacceptable may have a major effect. The second part of the treatment derives from the fact that the judgement is a public statement and so provides a degree of public shame for the school leadership. Ofsted fail judgements are widely reported in local press and this is usually not treated as a trivial or ignorable announcement about the school. It seems plausible that this too will be a major spur to action for the school.

Where do we go from here? Our results suggest that Ofsted’s identification of just-failing schools and the use of Notice to Improve measures is an effective policy, triggering the turn-around of these schools. We need to be clear that our research does not address the question of what to do about schools that comprehensively fail their Ofsted inspection. Possibly this light-touch approach can be extended. Since leaving the headship of Mossbourne Academy to become the new Chief Inspector at Ofsted, Sir Michael Wilshaw has argued that schools just above the fail grade should also be tackled: that ‘satisfactory’ performance is in fact unsatisfactory. Such interventions in ‘coasting’ or ‘just-ok’ schools are very likely to take the same form as Notice to Improve. Our results suggest that this is potentially a fruitful development with some hope of significant returns.

This research is available on the CMPO website and the IoE website.

Why the new school league tables are much better … but could be better still

Rebecca Allen (IOE) and Simon Burgess (CMPO)

Tomorrow the new school league tables are published, with the usual blitz of interest in the rise and fall of individual schools. The arguments for and against the publication of these tables are now so familiar as to excite little interest.

But this year there is a significant change in the content of the tables. For the first time, GCSE results for each school will be reported for groups of pupils within the school, groups defined by their Key Stage 2 (KS2) scores. Specifically, for each school the tables will report the percentage of pupils attaining at least 5 A*-C grades (including English and maths) separately for low-attaining pupils, high-attaining pupils and a middle group. This change has potentially far-reaching implications, which we describe below.

This is a change for the better, one that we have proposed and supported elsewhere.  Why? We believe that in order to support parents choosing a school, league tables need to be functional, relevant and comprehensible. The last of these is straightforward (though not all league table measures in the past have been comprehensible: Contextualised Value-Added (CVA) being the perfect example). ‘Relevant’ means that a measure has some relevance to the family’s specific child. A simple school average, such as the standard whole-cohort % 5 A* to C, is not very informative about how one specific pupil is likely to get on there. By ‘functional’ we mean a measure that does actually help a family to predict the likely GCSE attainment of their child in different schools. If a measure is not functional it should not be published at all.

The new group-specific component is comprehensible and is more relevant than the whole-cohort %5 A* to C measure.  In our analysis of functionality, we show that it is as good as the standard measure, and much better than CVA.

It also addresses in a very straightforward way the critique of the standard league tables that they simply reflect the ability of the intake into schools, and not the effectiveness of the school.  By reporting the attainment of specific groups of students of given ability, this measure automatically corrects for prior attainment, and in a very transparent way. This is therefore much more informative to parents about the likely outcome for their own children than a simple average.  This of course is what value-added measures are meant to do, but they have never really become popular, and as we show they are not very functional.

However, the details of the new measure now published are problematic in one way. The choice of groups is important. We defined groups by quite narrow ten percentile bands, the low attaining group lying between the 20th and 30th percentiles in the KS2 distribution, the high attaining group between the 70th and 80th percentiles, and the middle group between the 45th to 55th percentiles. While clearly there is still variation in student ability within each band, it is second order and the main differences between schools in performance for any group will come from variation in schools’ teaching effectiveness.

However, the DfE has chosen much broader bands, and has defined the groups so that they cover the entire pupil population: the low-attaining group are students below the expected level (Level 4) in the KS2 tests; the middle-attaining are those at the expected level; and the high-attaining group comprises students above the expected level.

This has one significant disadvantage, set out in detail by Rebecca Allen here. The middle group contains around 45% of all pupils, and so there is very significant variation in average ability within that group across schools. This in turn means that differences in league table performance between schools will reflect differences in intake as well as effectiveness, even within the group, thus partly undermining the aim of group-specific reports.

The chart below illustrates this for the middle attainment group (see here for more details).  Each of the three thousand or so tiny blue dots shows the capped GCSE attainment for a group of mid-attaining pupils (on the DfE’s measure of achieving at the expected level at KS2) against the average KS2 score (i.e. prior attainment) of pupils at the school. The red dots plot the same relationship for our narrow group of middle attainers (the 45th to the 55th percentile). The chart shows very clearly that the performance among our narrow band is essentially unrelated to prior attainment, but the DfE measure for the very broad group does still favour schools with higher prior ability pupils.

We can speculate as to why the DfE chose much broader groups. There may be statistical reasons, pragmatic reasons or what can be termed “look and feel” reasons. Using narrow KS2 bands will correctly identify the effectiveness of the school, but will almost always involve averaging over a small number of students. So the estimates will tend to be “noisy”, showing more variation from year to year than averages over bigger groups. The trade-off is then between a noisy measure of something very useful and a more stable measure of something less useful. Our original measure was intended to balance these; the DfE has gone all the way to the latter.
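The noise side of that trade-off is easy to see in a small simulation. This is a sketch with an assumed score distribution and assumed group sizes, not the DfE data: the year-to-year variability of a school’s group average shrinks roughly with the square root of the group’s size.

```python
import random
import statistics

random.seed(42)

def group_mean_sd(n_pupils, n_years=2000, sd=1.0):
    """SD across simulated 'years' of a school's group-average score,
    when the group holds n_pupils pupils with i.i.d. Normal(0, sd) scores."""
    yearly_means = [statistics.fmean(random.gauss(0, sd) for _ in range(n_pupils))
                    for _ in range(n_years)]
    return statistics.stdev(yearly_means)

# Assumed group sizes: a narrow ten-percentile band might leave ~10
# pupils in a school's cohort; a broad band covering ~45% of pupils
# might leave ~80. The narrow band's league-table entry is far noisier.
print(round(group_mean_sd(10), 2))
print(round(group_mean_sd(80), 2))
```

With these assumptions the ten-pupil band’s average bounces around nearly three times as much from year to year as the eighty-pupil band’s, which is precisely the instability that broader groups buy off, at the cost described in the next paragraphs.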

A pragmatic reason is that some schools may not have any pupils in a particular narrow percentile band of the KS2 distribution. The narrower the band the more likely this is to be true. This would mean either null entries in the league tables, which might be confusing, or some complex statistical imputation procedure, which might be more confusing. The broad groups that cover the entire pupil population are likely to have very few null entries. Finally, the broad groups feel more ‘inclusive’, they report the performance of all of a school’s students. This is a red herring – the point of the tables is to inform parents in choosing a school, not to generate warm glows.

The new measures hold out the promise of improvements in two areas: choices by parents and behaviour by schools. First, parents will have better information on the likely academic attainment of their child in a range of schools. Second, parents will be able to see more directly whether school choice actually matters a great deal for them: whether there are worthwhile differences in attainment within the ability group of their child.

The key point for schools is that performance measures have consequences for behaviour. If this new measure is widely used, it will give schools more of an incentive to focus across the ability distribution. It is still the %5 A* – C measure that is the focus of attention for each group, but now schools will have to pay attention to improving this metric for high and low ability groups as well as simply the marginal children with the highest chance of getting that crucial fifth C grade.

If one believes that gaming and focussing of resources within schools is a very big deal (and there is little quantitative evidence either way) then the new measures could have a major impact on such behaviour. Even if such resource focussing is second order, performance measures send signals on what is valued. These new league table measures will explicitly draw widespread media and public attention to the performance of low- and high-ability children in every school in England.

“All we want is a good local school”

Rebecca Allen and Simon Burgess

Two articles in the Times Education Supplement (TES) last Friday nicely illustrate the debate on school choice and school competition.

The first reports results from the British Social Attitudes Survey (BSAS), citing research by Sonia Exley, at the LSE, showing that most respondents thought that school choice was not a priority.

A familiar refrain in the school choice debate is that “all we want is a good local school”. There should be little doubt that this is indeed what most parents do want. We have used data from the Millennium Cohort Study to estimate the relative weights that parents place on the characteristics of primary schools. Unsurprisingly, school academic quality is positively valued, and distance between home and school is strongly negatively valued. This makes a lot of sense: many parents have to make this journey four times a day. So, yes, people do want a good local school.

But where does this take us? It is often said to imply that school choice is a distraction, an irrelevance. There is a side issue of whether choice is a good thing per se, as opposed to being functionally good. This is the thrust of the point above, that choice itself was not a priority, though the study also reports that 68% agreed that parents should have a basic right to choose their child’s school. Choice per se may become valuable once contrasted with the alternative of no choice.

But the main issue should be whether using school choice is a better way to allocate children to schools than alternatives. One alternative is implicit in the statement – children should go to their local school. In fact this gets a lot of support in the survey: the TES reports that 85% of respondents in the BSAS believe parents “should send their children to their local school”.

This idea would only work well if families were not permitted to move house after the school admissions rule was changed, and it is surely obvious why. We know that parents care a great deal about the school their child goes to. If the school allocation rule were simply “you will attend your local school”, then parents who were able to would ensure that their local school was the one they wanted by moving house.

It is quite possible that this would in fact lead to no less social segregation in schools, and almost certainly greater social segregation by neighbourhoods. While we found the relationship between school quality and moving house to be weaker than many might expect, this would undoubtedly be stronger in a world where your residence determined your school. It also does not do away with the concern about having to actually exercise a choice – it simply transfers it to a choice of neighbourhood and school combined.

So neighbourhood-based schooling would be very unlikely to resolve the issues of social segregation and choice-angst associated with choice-based schooling. It would also hand each school a local monopoly and, in the case of poorer families at least, a captive audience with no escape.

This connects to the second TES article, a leader on school competition. As the article notes, “Few things exercise critics of education policy more than the spectre of increased competition in our school system.” The argument balances the “un-school”-like, unorganised, chaotic and generally messy nature of competition with the potential for this to improve outcomes for students.

In fact, there is some evidence on this trade-off and on what the net result of competition is (the article is mostly about competition for 6th form entrants and allegations of mis-information, but the available evidence is about compulsory schooling). While the international evidence is mixed, the UK evidence suggests that there is at best a weak and small positive effect of competition on student outcomes; a review is here. The interesting question is why competition doesn’t appear to do much. The answer appears to lie in market failures in the schools market. If these could be addressed, a competitive threat might do more to raise standards in poorly performing schools.

Much of the furore about school ‘choice’ or ‘competition’ is misplaced. It is not choice between schools per se, relative to other allocation rules, that causes the perceived unfairness. The focus for objections should be the way that places in over-subscribed schools are allocated. The proximity criterion – who lives closest gets in – is operated in almost every non-selective school. This directly relates the chances of getting in to the most popular schools to family income, damaging social mobility in a very clear way. If some or all places at an over-subscribed school were filled by a random ballot, then school choice would seem a very different beast.

Finally, the competition article talks of ruined lives: “If no authority oversees admissions, plots likely pupil numbers or configures special needs support, the results won’t just be missed targets or dicey operating margins, but ruined, real pupil lives.” It is also true that leaving poor communities trapped with low-performing schools ruins lives, and that unaccountable and coasting schools ruin lives too. The debate is about how best to avoid ruined lives, not about whether or not they should be ruined.

A Report of Two Halves

Simon Burgess

We published some research last Friday showing that students perform less well in their crucial GCSE exams in years when there is a major international football tournament taking place at the same time. For example, the FIFA World Cup in the summer of 2010, or the UEFA European Championship next summer, both overlap in part with the GCSE exam timetable.

With the draw for the groups in the European Championship taking place earlier that day, much of the comment naturally and sensibly focussed on the specific issue of the impact of next year’s tournament on exam scores. This is important: we estimate that the concurrence of the exams and saturation media coverage of the football reduces exam scores on average by around 0.12 standard deviations of pupil performance, and by a lot more for some groups who reduce their effort substantially. These groups tend to be from poorer areas and are predominantly (but by no means exclusively) male students. Since these groups are already lower-performing, this means that education gaps will widen. We think of this impact as arising through a reduction in student effort, with that time being spent instead on watching the football tournament. The variation in impact arises from differing tastes for football, rooted in cultural norms and idiosyncratic factors, and from the differential effectiveness of an hour of study on exam performance.

However, there is also a broader significance to the research: finding that effort matters matters.

Recent research by economists has broadened out from the previous focus on cognitive ability, and a great deal of work has investigated the role of non-cognitive factors in educational attainment. Non-cognitive factors can be identified with personality traits (see Heckman), and one of the ‘big 5’ personality traits is ‘conscientiousness’, with the related traits of self-control, accepting delayed gratification, and a strong work ethic. Conscientiousness has been shown to be an excellent predictor of educational attainment and course grades. These aspects of self control and ability to concentrate are clearly related to the broad notion of effort we are using here. Our results on the importance of effort strengthen this evidence by isolating the effect of decisions on effort and time allocation in addition to the general ability to concentrate and exert self-control.

There is a great deal of policy interest in England arising from recent studies of US Charter schools with what is called a “No Excuses” ethos. This includes the KIPP (Knowledge Is Power Program) network of schools and schools in the Harlem Children’s Zone. These schools all feature a long school day, a longer school year, very selective teacher recruitment, strong norms of behaviour, as well as other characteristics. Some of the profession’s very top researchers have produced evidence showing that such schools produce very powerful positive effects on student achievement. While this overall effect could be due to different aspects of the KIPP/HCZ ethos, part of it is very likely to be increased effort from the students. Our results complement this by showing the impact of just a change in effort, and that that can have very substantial effects.

This matters for a number of reasons. First, unlike genetic characteristics, cognitive ability or non-cognitive traits, effort is almost immediately changeable, and our results suggest that changing it could have a big effect. The fact that we find changes in student effort to be very potent in affecting test scores suggests that policy levers to raise effort, through incentives or changes in school ethos, are worth considering seriously. Such interventions would be justified if the low effort resulted from market failures due to lack of information on the returns to schooling, or from time-inconsistent discounting. Second, the importance of a manipulable factor such as effort for adolescents’ educational performance points to potentially high-value policy interventions much later than “early years” policies. This is encouraging, offering some hope that low-performing students’ trajectories in life can be effectively improved even after a difficult environment early in life.