Who Gets Heard? Permission to Appeal Decisions
Abstract and Keywords
This chapter examines the “permission to appeal” (PTA) process at the Supreme Court. Each year more than two hundred litigants seek permission to appeal from the Supreme Court. Around one-third of these applications are successful. This chapter tries to explain rates of success. The key factors are the importance of the case the litigants are appealing, and the number of judges the appellants have convinced in lower courts. This matches the court’s own description of the cases it selects (“cases that raise arguable points of law of general importance”). However, the chapter also finds that governmental actors are more likely to gain permission to appeal even when controlling for importance and the balance of judicial opinion in lower courts.
On Thursday, April 19, 2016, lawyers acting for a man known only as PJS appeared before the Supreme Court. They wanted the Supreme Court to agree to hear PJS’s case. In the language of the court, they sought permission to appeal (PTA).
Usually, PTA decisions arrive several months after appellants have lost their case in the High Court or the Court of Appeal. PJS’s case was different. PJS started litigation in January to stop a tabloid newspaper (the Sun on Sunday) from publishing details about his sex life. When those details (and PJS’s name) surfaced in American media, the Sun asked the Court of Appeal to lift the injunction on printing the story, on the grounds that the information was now effectively in the public domain. The Sun was successful: the Court of Appeal decided in its favor on April 18. The Sun’s success was fleeting. PJS’s lawyers obtained permission to appeal the next day,1 and the permission to appeal hearings became a full hearing on the merits of PJS’s appeal. One month later, the Supreme Court found in PJS’s favor, preventing the Sun from publishing its story.
This case was exceptional. Most applications for permission to appeal are not successful. Few indeed are heard within a week of the original judgment being handed down. Those that are normally involve life-or-death decisions. This case did not. Lord Mance recognized that “[s]ome may . . . question whether the case merits the weight of legal attention which it has received” (UKSC 26, at 2).
Why then did the Supreme Court grant permission to appeal? One interpretation is that the court granted PTA because this case raised an “arguable point of law of general public importance.” This, according to the court, is the sole criterion used in deciding whether to grant PTA.2 In this case, this might mean that PJS’s case raised important points concerning the balance between the right of the press to freedom of expression and the right of individuals and their children to be respected in their private lives.
Another interpretation is that the court granted PTA because the applicant PJS was a wealthy man who was able to use his wealth to secure better legal representation. This better legal representation may have had a direct effect on the court, because it was impressed by the legal talent assembled, or it may have had an indirect effect, through the better legal arguments that more expensive lawyers adduced. Ultimately, PJS’s success would have depended on resources not available to many other litigants apart from governmental or large corporate litigants.
It is not possible to say, for any particular case, what general factors led to permission to appeal being granted. It is, however, possible to test, across a large number of PTA applications, which factors are associated with success. The previous chapter showed how different political, organizational, and legal factors can be measured. Although I cannot test every factor that may be relevant to success in PTA applications (in particular, I cannot test the impact of litigants’ personal wealth), I am able to test a broad range of accounts of the PTA process. To test these accounts, it is first necessary to describe the process by which cases arrive at the Supreme Court.
The Outcome, and Its Prevalence
Almost all of the cases heard by the Supreme Court are appeals cases. The court does hear arguments in reference questions. These are cases where the court has been asked to decide a point of law relating to devolution. Although these cases are important, they make up a small proportion of the court’s caseload. In eight years, the court has answered reference questions in just ten cases—or 1.5 percent of its overall caseload.
The majority of appeals cases heard by the court are cases it has decided to hear. In England and Wales, and in Northern Ireland, the process is as follows. Decisions of the Court of Appeal, and the Northern Ireland Court of Appeal, may be appealed, but require permission. Permission may be granted by the Supreme Court or (more rarely) from the Court of Appeal itself. Cases from the High Court (or courts of equivalent rank) may be appealed, but these “leapfrog” appeals must meet additional requirements: they must either involve a point of statutory construction, or must involve a point of law in respect of which the High Court judge is bound by a decision of the Court of Appeal or Supreme Court.3 Not only must the High Court judge certify that the case meets these conditions, but the Supreme Court must also decide to hear the appeal.
The process in Scotland is different. Litigants may only appeal civil cases, and may not appeal criminal cases, except where the case raises a devolution issue, or an issue relating to the compatibility of the actions of public authorities with the European Convention on Human Rights or EU law. In practice, this has led to a number of criminal appeals that deal with the right to a fair trial.
Scottish appeals are different for a second reason. Until 2015, the Supreme Court exercised no discretion over which Scots civil cases it heard. All appellants needed was certification by two Scots counsel that the appeal was “reasonable.” On occasion, this rather generous approach led to cases that were almost entirely without merit. In Anderson v. Shetland Islands Council and another (UKSC 7), the pursuer (Mrs. Anderson) sought damages against the local council for failing to deal with sewage and drainage issues. Mrs. Anderson lost at first instance and on appeal. The first-instance judges “had some difficulty in making sense of Mrs. Anderson’s averments [but were] satisfied that they were completely irrelevant” (Lord Hope, 5). Despite this, Mrs. Anderson was eventually able to find two advocates prepared to certify that her case was reasonable. Given that one of the two advocates was Mrs. Anderson’s son, this was not strong evidence that Mrs. Anderson had a reasonable case.
This situation—which had historically plagued the House of Lords (Brodie 2009, 282–84; Chalmers 2004, 10–11)—was resolved by the passage of the Courts Reform (Scotland) Act 2014, which required permission for appeals to be granted either by the Inner House of the Court of Session, or by the Supreme Court itself. This act applies to decisions of the Court of Session delivered after September 22, 2015, and so does not apply for much of the period considered in this book.
In this chapter, I examine cases where an application for permission to appeal was made. I ignore cases where the Court of Appeal or (more recently) the Court of Session has granted leave to appeal. Of the cases heard by the Supreme Court from 2011 onward, around 69 percent were granted permission to appeal by the court itself. This average is slightly misleading, as the figure for appeals from England, Wales, and Northern Ireland is much higher than the figure for appeals from Scotland (78 percent compared to 6 percent). Accordingly this percentage is likely to increase following the 2015 reforms to the way in which Scots appeals arrive at the Supreme Court.
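The sense in which the 69 percent average is misleading can be made concrete with a little arithmetic. The sketch below treats the overall figure as a weighted average of the two regional figures and backs out the share of Scottish cases that the rounded percentages imply. The function name is illustrative, and because the inputs are rounded the result is only approximate.

```python
# Rounded percentages taken from the text; the result is therefore approximate.
OVERALL = 0.69      # all cases heard from 2011 onward
REST_OF_UK = 0.78   # appeals from England, Wales, and Northern Ireland
SCOTLAND = 0.06     # appeals from Scotland


def implied_share(overall: float, high: float, low: float) -> float:
    """Share of the low-rate group implied by a weighted average.

    Solves overall = (1 - s) * high + s * low for s.
    """
    return (high - overall) / (high - low)


scots_share = implied_share(OVERALL, REST_OF_UK, SCOTLAND)
print(f"Implied Scottish share of cases heard: {scots_share:.1%}")
```

On these rounded figures, Scottish appeals would make up roughly one in eight of the cases heard, which is enough to pull the overall average well below the figure for the rest of the United Kingdom.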
The Supreme Court therefore exercises less control over its caseload than the High Court of Australia or Supreme Court of the United States, both of which have almost complete control of their docket.4 The court is comparable to the Supreme Court of Canada, where around three-quarters of cases are discretionary rather than cases heard as of right (Supreme Court of Canada 2018, 6–7). The court does, however, exercise more discretion than the House of Lords did. In the early postwar period, only 22 percent of cases heard by the House of Lords were granted permission to appeal by the Lords themselves (Drewry, Blom-Cooper, and Blake 2007, 146). Explaining the PTA process is therefore more important than it once would have been.
The PTA process has become more transparent over time. Applications for permission to appeal are ordinarily heard by panels of three judges. Applications are grouped into batches of three to six cases and assigned to a panel. These panels are not fixed at the start of the legal year, but change over time. The practice is for each panel to be headed by one of the four most senior judges on the court, with the remaining space allocated according to “specialism, availability, and conflicts” (Paterson 2013, 69). Justices who are not included in panels may communicate their thoughts to their colleagues, though this is described as rare (ibid.). In eight cases, extra interest was accommodated by adding one or two judges to the PTA panel.
Panel members decide on the basis of written applications (no more than ten pages in length),5 and (at least partly) on the basis of shorter (three- to four-page) summaries produced by the court’s judicial assistants. Judicial assistants do not recommend an outcome, but individual judges sometimes ask their judicial assistant their opinion on the application (Nesterchuk 2013, 171, 173).
Dickson (2007) studied the PTA process in the House of Lords, and concluded that “Law Lords are given no precise criteria against which to measure the petitions they examine” (586–87). The criteria used by the Supreme Court are not much better. The court’s practice directions refer to cases which “raise an arguable point of law of general public importance that ought to be considered by the Supreme Court at that time,” but the key terms (“arguable point,” “general public importance”) are not further defined. Applicants are, however, asked to state clearly whether or not they are asking the Supreme Court to “depart from one of its own decisions or from one made by the House of Lords or the Court of Appeal of England and Wales,” issue “a declaration of incompatibility under the Human Rights Act 1998,” or make “a reference to the Court of Justice of the European Union.”
The decision-making rule on PTA panels is also unclear. “Even to describe the arrangements as a ‘system’ may be misleading”: Dickson (2007), 586. There is certainly no analogue of the “rule of four” found on the Supreme Court of the United States (Lax 2003). The normal decision-making rule seems to be majority rule, except that a single judge may be able to persuade her colleagues if she is “pretty determined” (Paterson 2013, 249, 30).
Panels may decide to refuse permission to appeal, to grant permission, or to grant permission in specific terms. These decisions are not accompanied by reasoned judgments. Between 2009 and 2011, no explanation of any kind was given for the panel’s decision. From 2012 onward, the court has given reasons for refusing permission, but these are often formulaic. Three common formulae, from least to most conciliatory, are that the case:
• raises “no arguable point of law”;
• raises “no arguable point of law of general public importance”;
• raises “no arguable point of law of general public importance which ought to be considered by the Supreme Court at this time.”
In the past two years, the court has on occasion given longer reasoned judgments that explain the reasons for refusing permission to appeal. In In the matter of Secretary of State for Exiting the European Union (Appellant) v. Wightman and others (Respondents) (UKSC 2018/0209), the court gave a six-paragraph judgment explaining its reasons for refusing the Secretary of State permission to appeal the orders of the First Division of the Inner House of the Court of Session, in which the court had referred to the Court of Justice of the European Union the question whether Article 50 could be revoked by a member state exiting the European Union. In In the matter of Charlie Gard, the court published a five-page judgment (a transcript of a judgment handed down orally by Baroness Hale) in which it explained its decision to refuse permission to appeal an order of the Court of Appeal. The Court of Appeal’s order permitted staff at the Great Ormond Street Hospital to discontinue treatment for Charlie Gard, a severely brain damaged child. These two cases are exceptional, in that they deal with politically controversial matters and matters of life and death respectively. However, the court has dealt with exceptional cases before, and the fuller descriptions of PTA decisions given in these two cases are consistent with a general trend toward more detailed explanations.
On average, slightly more than one in three cases that seek permission to appeal are successfully granted permission. The court releases information on which applications have been successful roughly nine times a year. Information on the success rate of applications is plotted in Figure 3.1, which shows the success rate for each release (say, the release for January to February 2015, or the release for May 2016). As the figure shows, there is no obvious trend in success rates over time.
Figure 3.2 shows the rates of success by different areas of law, together with the number of applications made. In particular, it shows the much lower success rates of applicants in Scots and Northern Irish cases. The low success rate in Scots appeals can be partly explained by the differences in procedure already discussed. Since (prior to 2015) non-criminal cases did not need permission to appeal, the figure reflects low rates of success in Scots criminal appeals. Not only do these appeals require permission, they require the appellant to demonstrate that the case involves a devolution issue—typically an issue relating to the human rights of defendants. Although this requirement can be met, many of these applications have raised human rights claims that can politely be described as being speculative in nature.
The high success rate of applications in criminal appeals is unusual given the desire for certainty in criminal law, a desire that can sometimes manifest itself as an inclination to leave lower court decisions undisturbed. However, these applications are themselves the result of a selection process, and the criminal cases that gave rise to applications need not be representative of the broader mass of criminal cases.
Finally, Figure 3.3 shows the rates of success according to the importance of the case below, using the measure introduced in the previous chapter. The rates of success in permission to appeal applications increase steadily as one moves from cases of lower importance to cases of higher importance, which is what one would expect if the court’s own account of the permission to appeal process were correct.
The Applicable Theory
In the previous section, I showed how likely applicants were to be granted permission to appeal, both in general and according to particular characteristics of the case and of the appellant. In this section I give reasons why some applicants are more likely to succeed than others. I group these reasons into three categories: legal, organizational, and political. Before I do so, I must justify not discussing strategic decision-making.
The Role of Strategy
In Chapter 1, I claimed that the study of judicial behavior is dominated by the study of the Supreme Court of the United States. This is particularly true of the study of judges’ decisions concerning which cases to hear. There have been two studies of the “leave to appeal” process in Canada (Flemming 2005; Flemming and Krutz 2002), one in Australia (Stewart and Stuhmcke 2019), and one (unpublished) study of the process in New Zealand (Evans and Eissler 2015). By contrast, studies of the certiorari process in the United States have dealt with the role of dissent in lower courts (Cameron, Segal, and Songer 2000), the presence of amici curiae (Caldeira and Wright 1988) and economic underdogs (Ulmer 1978), under both sincere and strategic voting (Black and Owens 2009).
The current state of the art (Black and Owens 2009) suggests that judges consider the likely outcome of a case in a full hearing, when deciding whether or not to grant certiorari. If the appeal is likely to succeed, and if the judge prefers the outcome of a successful appeal to the status quo, s/he will vote to hear the case. This requires judges to have a sophisticated mental model of other judges’ behavior (“if this goes to the full court, then Kennedy will find for the appellants, and I don’t want that—so I’ll not let it go to the full court”).
This kind of strategic action would be difficult on the UK Supreme Court. First, it is not clear that judges’ views are predictable in the way that the views of justices of the United States Supreme Court are predictable. Second, even if judges’ views were predictable, the use of panels of judges means that judges who wanted to act strategically would not only have to forecast their colleagues’ views, but would also have to forecast which five, seven, or nine judges would decide the case on the merits. In other words, strategic judges would also have to second-guess the panel formation process. Because the panel formation process is complicated (see Chapter 5), I follow the route taken by Flemming (2005), and discount strategic models of the case selection process. Whatever preferences judges enact through their case selection, or whatever roles they adopt when hearing applications, they do so sincerely. Of course, excluding strategic accounts still leaves room for a great many potential explanatory factors. Next, I set out three different accounts, which draw upon previous accounts of case selection decisions and judicial decisions more generally.
The first group of factors I examine is the group of legal factors. I discuss three factors: case importance, opinion below, and the area of law. Each of these factors is, at least in theory, independent of the others: cases can be important without being finely balanced. Cases can also be important whether they deal with public law or private law.
Case importance is an obvious factor to consider when attempting to explain PTA decisions. It is explicit in the language that the court itself uses to describe its decision-making: the court selects cases that raise “points of law of general public importance.” I understand the court to mean by this that it selects cases that are of public importance in the sense that they have consequences that go beyond the consequences for the particular litigants involved (and which therefore go beyond merely “private” interests), and that it selects cases that raise important points of law, rather than cases that are important in some other sense (because it affects a large number of people, or because lots of money is involved).
I argue that the weight given to this factor in granting PTA can be included in a model of PTA decisions by using the measurement of case importance described in the previous chapter. That measure was a measure of the legal importance of the originating case. In using this measure, I am therefore assuming not just that the court is interested in legal importance, but that PTA applications that raise important points typically come from cases that raise important points. To the extent that PTA applications raise different points from those heard in the originating court, this measure will be less effective.
The hypothesis that more important cases are more likely to be granted permission to appeal may seem obvious. However, I am unable to find other direct tests of this hypothesis in the literature. Early studies of certiorari on the US Supreme Court examined the effect of different case “cues” on the chances of being selected for a full hearing (Tanenhaus et al. 1963; Teger and Kosinski 1980; Ulmer, Hintze, and Kirklosky 1972), including the presence of dissent, or the presence of amici curiae. These cues, however, are only proxies for importance, and since these cues are “only surrogates . . . cue theory ends up saying that the Justices tend to accept cases that they think are important” (Teger and Kosinski 1980, 845). My measure of case importance, however, allows a direct test.
Although case importance is the most obvious legal factor in explaining PTA success, importance is not sufficient for an application to be granted PTA. The application must also raise an arguable point of law. Some applications have been rejected on the sole grounds that they raised no arguable point, still less one of importance.
Whether or not a PTA application raises an arguable point can be assessed by looking at what happened in the courts below. If the central issue in a permission to appeal application has always been the central issue in the case, then the balance of opinion in lower courts can be a useful guide to PTA outcomes: if the applicants were not able to convince any judges of their case, then it is less likely that their case is arguable. If the applicants were able to convince at least some judges, then it is much more likely that the PTA application involves an arguable point.
Like the measure of case importance, this measure assumes that there is a connection between the originating case and the PTA application. If the PTA application raises entirely novel issues, then this measure will be less effective.
Because my measure is novel, there are no existing tests of the impact of opinion below. My measure does extend one measure that has already been widely used in the study of certiorari decisions on the US Supreme Court (Tanenhaus et al. 1963; but note Teger and Kosinski 1980, 838) and the Supreme Court of Canada (Flemming 2005; Flemming and Krutz 2002), namely the presence of dissent in the court below. Dissenting judges alter the balance of opinion below, but my measure also incorporates disagreement between different levels of the judicial hierarchy.
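The difference between the older dissent cue and a broader balance-of-opinion measure can be illustrated with a toy example. The sketch below is not the measure defined in the previous chapter; it assumes, for illustration only, that the balance of opinion is the count of judges at any level below who found for the eventual applicant. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Decision:
    """One court's decision in the history of a case (field names are illustrative)."""
    court: str              # label only, e.g. "High Court" or "Court of Appeal"
    for_applicant: int      # judges who found for the eventual PTA applicant
    against_applicant: int  # judges who found against the eventual PTA applicant


def dissent_below(history: List[Decision]) -> bool:
    """Older cue: was the court immediately below split?"""
    last = history[-1]
    return last.for_applicant > 0 and last.against_applicant > 0


def judges_for_applicant(history: List[Decision]) -> int:
    """Assumed broader measure: judges at any level who found for the applicant."""
    return sum(d.for_applicant for d in history)


# Hypothetical case: the applicant won before a single first-instance judge,
# then lost before a unanimous three-judge appellate panel.
history = [Decision("High Court", 1, 0), Decision("Court of Appeal", 0, 3)]
print(dissent_below(history))         # False: the appellate panel was unanimous
print(judges_for_applicant(history))  # 1: the first-instance judge still counts
```

In this hypothetical case the dissent cue registers nothing, while the broader count records that one judge was persuaded. That is the kind of disagreement between levels of the hierarchy that a dissent-only measure misses.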
Area of Law
The Supreme Court is a court of last resort, and not (just) a constitutional court. In comparison to many other courts of final appeal in common law jurisdictions, the court hears fewer cases of a purely constitutional nature. It does, however, attach importance to the public law and constitutional law cases it does hear. Although the court’s practice directions refer only to points of law “of general public importance,” the court website states that the court “concentrates on cases of the greatest public and constitutional importance.”6
This language suggests that the Supreme Court will be more likely to select a public law case rather than a commercial law case even where the two cases are equal in terms of the balance of opinion below and their general legal importance. I therefore include the area of law variable in my analysis. A second reason for including the area of law variable is that it allows for Scottish appeals to have different rates of success, before and after the introduction of a “leave to appeal” process in 2015.
I have described this variable as a legal factor, but it also appears in other organizational and political models. A preference for a particular area of law can be regarded as a form of political preference: if the Supreme Court were to hear the same diet of tax and shipping cases that the House of Lords heard in the sixties and seventies, such a choice would be regarded as an act of political quietism. The choice would not be entirely in the hands of the Supreme Court, because the court is called upon to decide cases in the areas of human rights and devolution, which could not have existed in the sixties and seventies. Yet across different PTA panels, it ought to be possible to discern the different preferences of judges for areas of law.
The next group of factors includes two factors: workload and specialization. This grouping is slightly inconsistent with the grouping given in the previous chapter, where (in a discussion of areas of law) I placed specialization among the group of legal factors.
Workload, Individual and Collective
The decision to grant permission to appeal is a decision that has workload implications for the court and for its members. The more cases that are granted permission to appeal, the more cases the court eventually must hear, and the more work that members of the court will have to do in deciding those cases.
Judges, although they work hard, do not have infinite capacity. Each additional hour of work subtracts from the time available for other activities. At some point, judges must prefer an additional hour of leisure (understood in the broadest way to include all nonwork activities, including sleeping and eating) to an additional hour of work. Where current workloads are already high, this point is more likely to be reached. If judges take future work/leisure trade-offs seriously, then higher current workloads ought to make creating future work for oneself less attractive. This is another way of saying that higher workloads ought to make judges less likely to grant permission to appeal.
I have described the impact of this relative preference for leisure in terms of a judge’s individual workload: but it may be that the effect operates through the workload of the court collectively. As Posner (1993) has argued in the slightly different context of opinion writing, “[a]ny effort by one judge to hear more than his proportional share of cases or snag more than his proportional share of writing assignments is not only rebuffed but resented.” Similarly, any attempt to grant permission to appeal at higher rates when the court is already overworked may be resented by other judges. This would have as a consequence that judges with low workloads would act on the basis of the court’s workload, rather than their own.
Judges are, for understandable reasons, reluctant to admit workload as a factor in their decision-making. When Donald Songer and colleagues interviewed judges about the leave to appeal process on the Supreme Court of Canada, one judge “was most emphatic: to his knowledge, the court had never turned away a case just because its docket was full” (Songer, Johnson, and Ostberg 2012, 76). The models that follow will not test whether judges turn cases away just because of workload—but they will be able to say whether workload is one factor among others.
When I discussed the effects of area of law, I discussed the preference that the Supreme Court as an institution may have for certain types of case. Individual judges can also have preferences over types of case. One source of preferences is judges’ specialization in different areas of law. While it is possible for judges to specialize in criminal or family law, all the while wishing they were deciding cases of a different type, this would be extremely unusual.
Using a measure of judges’ relative specialization, we can test whether success in PTA applications depends on the specialization of the panel, and whether (for example) a panel that features three “public law lawyers” is more likely to select public law cases than is a panel with just one or no “public law lawyers.”
There is some preliminary evidence to suggest that this might be the case. Paterson (2013, 68) provides a gloss on the analysis of PTA decisions in human rights cases found in Poole and Shah (2009): the rise in the success rate of PTA applications was probably due to the involvement of “the ‘A’ team for human rights cases (and Lord Bingham in particular).” If this pattern applies to human rights cases, it is possible that it might also apply to the seven areas of law I have identified.
The final “group” of factors is the group of political factors. However, I consider only one factor under this heading, namely the type of litigants.
Types of Litigants
Most political accounts of judging reduce political preferences to preferences over case outcomes in a unidimensional policy space that runs from left to right (Epstein and Knight 2013, 13–14). The dominance of these policy preferences is unfortunate, because there are other political preferences that can affect case outcomes. Preferences over types of litigants, including litigants of higher and lower status, and government litigants versus all others, are particularly important.
Many authors have found that higher status litigants—sometimes called “top dogs”—are more likely to have their case heard (Ulmer 1978; Flemming and Krutz 2002; Evans and Eissler 2015) than are litigants of lower status, or underdogs. This fits a general pattern identified by Galanter (1974), who concluded that the law often benefited litigants with greater resources.
The finding that better resourced litigants are more likely to come out ahead is not, by itself, evidence for a preference for litigants of certain types. Individual or corporate litigants with higher incomes may hire better lawyers, and those lawyers may identify better arguments. If judges react to better arguments, then higher success rates for wealthier litigants may indicate a preference for getting the law right, rather than a preference for wealthy individuals. Alternatively, litigants with higher incomes may be favored by the rules judges are required to interpret (Galanter 1974, 123).
These problems are less severe when dealing with appellate courts. If the rules generally favor high-status litigants, why shouldn’t this factor already be “priced-in” at the level of the first-instance court, and why should high-status litigants enjoy any additional advantage at the appellate level? For this reason, any additional status advantage at the appellate level does seem to suggest that preferences over types of litigants must play some role.
Identifying status advantage can be complicated by difficulties of measurement. Many articles on status advantage assume a hierarchy that puts central government at the top; individuals at the bottom; and associations, companies, and local government ranked in between in that order. The difference in status between any two opposing litigants is equal to their difference in ranks, and this variable is included in a regression model of case outcomes.7 This way of measuring status advantage conflates cases of different kinds. When differences in status are disaggregated, researchers often find that companies and associations enjoy no advantage over individuals—but that governments enjoy significant advantages over all other litigants (Hanretty 2014; Smyth 2000).
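A short sketch shows why a single rank-difference score conflates cases of different kinds. The numeric ranks below are an assumption: they read the ordering in the text from lowest to highest status, and the decision to treat both local and central government as "government" is likewise illustrative. None of the names come from the studies cited.

```python
# Assumed ranking, lowest to highest status (illustrative values only).
RANK = {
    "individual": 0,
    "association": 1,
    "company": 2,
    "local government": 3,
    "central government": 4,
}
# Assumption: both tiers of government count as governmental litigants.
GOVERNMENT = {"local government", "central government"}


def status_gap(appellant: str, respondent: str) -> int:
    """Aggregated measure: appellant's rank minus respondent's rank."""
    return RANK[appellant] - RANK[respondent]


def government_appellant(appellant: str) -> bool:
    """Disaggregated alternative: is the appellant a governmental litigant?"""
    return appellant in GOVERNMENT


pairs = [("company", "individual"), ("central government", "company")]
for appellant, respondent in pairs:
    gap = status_gap(appellant, respondent)
    gov = government_appellant(appellant)
    print(f"{appellant} v. {respondent}: gap={gap}, government appellant={gov}")
```

Both pairings receive the same gap of 2, yet only the second involves a governmental appellant. A regression on the gap alone cannot distinguish them, whereas a separate government indicator can.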
This finding may reveal a more specific preference—a preference for governmental litigants. Discussions of this preference are common, but they are usually couched in slightly different language. In the United Kingdom it is common to talk about judges having different attitudes toward the proper degree of judicial deference to the executive, or public authority more generally. This way of talking is more precise, since it draws attention to the way that this preference operates only in cases involving public law, rather than all cases involving the government as a litigant. In practice, however, those judges who are inclined to defer to public authority in public law cases may also be more inclined to give the benefit of the doubt in private law disputes involving the government.
Just as with status advantage, governments generally might come out on top because the rules are written to benefit governments. This would not, however, explain a government advantage at the appellate level. If the rules favor the government, why then should this effect not be exhausted by the time the case reaches the Supreme Court?
There are good reasons for thinking that judges on the Supreme Court should have a relative preference for governmental litigants compared to other levels of the judicial hierarchy—perhaps less so than when they sat as Law Lords in Parliament, but still more than, say, judges at the level of the Court of Appeal.
Table 3.1 List of Expectations
Expectation: The more important the case, the greater the chances of permission being granted
Expectation: The more judges who have found for the appellant, the greater the chances of permission being granted
Expectation: Public law cases will have greater chances of permission being granted than all other cases
    Operationalization: Area of law dummies
Expectation: The greater the workload of panel judges, the lower the chances of permission being granted
    Operationalization: (alternately) median workload
Expectation: The greater the workload of the court, the lower the chances of permission being granted
Expectation: The greater the specialization of panel judges, the higher the chances of permission being granted
    Operationalization: Maximum specialization in area; (alternately) median specialization in area
Expectation: Cases where government is the appellant will have greater chances of permission being granted
The expectations put forward here are listed for convenience in Table 3.1, together with an indication of how each of the key concepts is operationalized. Operationalization is the subject of the next section.
Explanatory Variables Used
To test these expectations, it is necessary to measure the different concepts used. In the previous chapter, I set out different measures of legal, organizational, and political variables. In this section, I reintroduce these measures, noting any differences that result from applying them to PTA applications.
The measure of case importance I use is based on the number of generalist law reports that reported the case in the originating court. This measure can be applied to PTA applications in exactly the same way. In recent years the Supreme Court has published a vendor-neutral reference to the appealed decision; for earlier years I have had to search out cases by name. Because permission to appeal applications can be speculative, a larger number of cases are not reported in the generalist law reports I consider, or are not even indexed by Westlaw. These cases, far from representing missing data, constitute “structural zeros.” Average case importance on the zero to four scale is 0.7; the standard deviation is 0.9.
The second legal factor canvassed was opinion below, or the ratio of (one plus) the number of judges finding for the appellant, to (two plus) the number of judges who heard the dispute. This measure is calculated in just the same way as was described in Chapter 2. There are complications where both sides to a dispute seek permission to appeal (i.e., where there is a cross-appeal). In these cases I have taken the first application for permission to appeal and discarded the subsequent applications. The mean of this variable is 0.2; the standard deviation is 0.07.
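The opinion below ratio can be made concrete with a short sketch. The function name and the example tallies are illustrative assumptions, not taken from the chapter's replication materials; only the formula itself comes from the text.

```python
from fractions import Fraction


def opinion_below(judges_for_appellant: int, judges_total: int) -> Fraction:
    """Return (1 + judges for the appellant) / (2 + judges who heard the dispute)."""
    if not 0 <= judges_for_appellant <= judges_total:
        raise ValueError("judges_for_appellant must lie between 0 and judges_total")
    return Fraction(1 + judges_for_appellant, 2 + judges_total)


# Hypothetical appellant who lost before all four judges below.
unanimous_loss = opinion_below(0, 4)   # 1/6
# Same hypothetical appellant, but with one judge below in their favor.
one_dissent = opinion_below(1, 4)      # 2/6, i.e. 1/3

print(float(unanimous_loss), float(one_dissent))
```

Adding one to the numerator and two to the denominator keeps the ratio strictly between zero and one, so an appellant who has convinced no judge below still receives a small positive score.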
The third and final legal variable concerns the area of law. This is a categorical variable that can ordinarily be read off from the case’s appellate history (thus the case is a family case if it has gone through the family courts, a criminal case if it has gone through the criminal courts, and so on). Civil cases are the reference category for this variable, so that the effects associated with other areas are effects relative to civil cases.
The legal factors mentioned all refer to properties of the case. The organizational factors discussed here refer instead to properties of judges, either collectively or individually. This causes a problem for the analysis. Judge-level properties affect judge-level outcomes—and yet there are no judge-level outcomes, but only aggregate outcomes that describe the panel’s decision. How then should we link judge-level properties to aggregate outcomes?
This link must depend both on our theory relating each property to the judge-level outcome, and on the decision-making rule employed by each panel. I begin by illustrating this with the case of specialization, before moving on to workload.
In Chapter 2, I set out “specialization profiles” for each of the judges, measured by the proportion of cases heard by each judge in each of seven areas of law prior to their appointment to the Supreme Court. For each permission to appeal application, I use the relevant part of the specialization profile: thus, if the application concerns a family law case, I take the proportion of family law cases heard by each judge prior to their appointment.
The tentative assumption described was that greater specialization in an area would make judges more likely to grant permission to appeal. That is, there ought to be a positive association between (judge-level) specialization and (judge-level) outcomes.
Now consider the decision-making rule used on the court, and in particular the rule of one, where a single “pretty determined” judge is all that is required to grant permission to appeal. With this rule, the pivotal judge is the judge most favorably disposed to the application. Under our theory relating specialization to judge-level outcomes, this is the judge with the greatest specialization in the relevant area of law. The aggregation rule for a positive judge-level association and a rule-of-one decision-making rule is to take the maximum value. If this maximum value is still low, then not even a rule-of-one decision-making rule will result in a favorable outcome for the would-be appellant.
Thus, to test the link between specialization and PTA outcomes, we can take the maximum value. Alternately, if we believe that the decision-making rule is majority rule, then the relevant judge becomes the median judge. Whether or not we expect a positive or negative judge-level association, the aggregation rule for majority decision-making is to take the median value. I therefore use two alternative measures—maximum specialization, and median specialization—where the specialization profiles are as described in Chapter 2. Although the mean values for the two measures differ for obvious reasons (0.4 using the maximum, 0.2 using the median), the standard deviations are much more comparable (0.3 and 0.2 respectively).
The measure of individual workload uses the same logic in the opposite direction. Workload, unlike specialization, is expected to have a negative association with judge-level outcomes, as overworked judges become less likely to grant permission to appeal. The rule-of-one decision-making rule suggests (as before) that the pivotal judge is the one most favorably disposed toward the case, which means the judge with the lowest workload. The relevant aggregation rule therefore becomes the minimum. If the decision-making rule is majority rule, then as before the median value is used. When we take the minimum, then the mean value and standard deviation are 19.6 and 7.6 respectively. When we take the median, these figures become 24.3 and 8.4.
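The aggregation logic for specialization and workload can be summarized in a few lines. This is a minimal sketch under stated assumptions: the function name, the rule labels, and the panel values are hypothetical, and the chapter does not say how the aggregation was actually coded.

```python
from statistics import median


def pivotal_value(values, rule, direction):
    """Aggregate judge-level values into one panel-level value.

    rule      -- "one" (a single determined judge suffices) or "majority"
    direction -- +1 if higher values make a judge more likely to grant
                 permission, -1 if higher values make a judge less likely
    """
    if rule == "majority":
        # Under majority rule the median judge is pivotal in either direction.
        return median(values)
    if rule == "one":
        # Under the rule of one, the most favorably disposed judge is pivotal.
        return max(values) if direction > 0 else min(values)
    raise ValueError("rule must be 'one' or 'majority'")


# Hypothetical three-judge panel.
specialization = [0.10, 0.25, 0.40]   # share of prior cases in the relevant area
workload = [18, 24, 31]               # pending assignments per judge

print(pivotal_value(specialization, "one", +1))   # maximum specialization
print(pivotal_value(workload, "one", -1))         # minimum workload
print(pivotal_value(workload, "majority", -1))    # median workload
```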
The final organizational factor is the workload of the court, which I defined in Chapter 2 as just the sum of the workloads of the individual judges on the court. Because this variable is supposed to operate in the same way on all judges on the panel, there is no need to consider alternative operationalizations. The mean value is 278; the standard deviation is 91.
Finally, the different political factors discussed earlier traded on the identity of the different litigants. I have classified both appellant and respondent into one of five categories:
• local government or other public authority;
• central government.
In order to avoid collapsing these different categories into an assumed hierarchy, I have instead created a series of trichotomous variables. Each type of litigant corresponds to a particular variable, which is scored 1 when the appellant belongs to this category, -1 when the respondent belongs to this category, and 0 in cases where either both appellant and respondent are in this category, or where neither is. This means that the coefficients attached to these variables will be positive where this type of litigant is more likely to have their case selected for a full hearing when they appeal, and less likely to have their case selected when they are the respondents to the appeal. The coefficients on these variables thus reflect a preference on the part of the Supreme Court as a whole, relative to courts below.
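The trichotomous coding described above amounts to a difference of two indicators. The sketch below assumes category labels stored as plain strings; the function name and the example parties are hypothetical.

```python
def trichotomous(appellant: str, respondent: str, category: str) -> int:
    """Score +1 if only the appellant is in `category`, -1 if only the
    respondent is, and 0 if both or neither belong to it."""
    return int(appellant == category) - int(respondent == category)


# Hypothetical application: central government appeals against an individual.
appellant, respondent = "central government", "individual"

for category in ("central government", "individual", "company"):
    print(category, trichotomous(appellant, respondent, category))
```

Because each variable is scored separately, no ordering of litigant types has to be assumed in advance; the model estimates one coefficient per type.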
The data on the decision whether or not to grant permission come from the PDF files made available on the court website. These files list the name of the case, the unique case identifier (unique to the Supreme Court), the judges who decided on the application, the date, and the outcome. Later files include the vendor-neutral reference to the decision being appealed.
The decisions reached may include permission being refused, being granted, or being granted in part. I have collapsed the last two categories into one. Without access to the submissions made by the parties, it is difficult to say whether much is lost by this decision: whereas it is good practice for counsel to concentrate on well-founded grounds for appeal, some counsel may adopt a more scattergun approach that might lead more of their applications to be granted in part.
An additional complication arises from the possibility of multiple appeals and cross-appeals. In the case of multiple appeals—that is, applications made by different parties on the same side of a legal dispute—I have collapsed these multiple observations into a single observation by taking the most favorable value for the applicants, on the basis of the theories above. Thus, in a case involving an appeal by both a governmental actor and an individual, I take the application by the governmental actor. Cross-appeals are much harder to identify: I have removed these from the data.
In this section, I describe a series of logistic regression models of case selection. These are all logistic regression models because the dependent variable—whether or not the application was granted—is dichotomous, and because logistic regression is appropriate for variables of this type. I show the results from a series of models because each group of factors discussed corresponds to a different set of foundations for judicial behavior, foundations that are not easily combined. I therefore present separate legal, organizational, and political models—although I do also present a combined model at the end.
Each model is plotted as a graph. The plotted points represent the value of the coefficient in the model. The meaning of each coefficient depends on the units in which each variable is measured. We can transform these variables to make them comparable by dividing each continuous variable by two standard deviations. Thus, the coefficient for importance now reflects the change associated with a change of 2 × 0.9 = 1.8 on the original, untransformed scale; the coefficient for opinion below, the change associated with a change of 2 × 0.07 = 0.14 on the original scale; and so on. Where a variable is not a numeric measure, but a categorical measure (either the area of law is public law or it is family law), the coefficients reflect the change associated with “switching this variable to the ‘on’ position.” Where a given categorical feature is as likely to be “on” as it is “off,” its standard deviation is one-half, so switching it on is a change equivalent in size to a two standard deviation change in a continuous variable (Gelman 2008, 2867). It is therefore possible to look at the absolute magnitude of each coefficient and work out which variable has the largest impact, without knowing how each variable is measured or how dispersed it is.
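The rescaling can be illustrated with a small sketch. The data are hypothetical and the helper name is an assumption; the only step taken from the text is division by two standard deviations.

```python
from statistics import pstdev


def rescale_two_sd(values):
    """Divide a continuous variable by twice its standard deviation."""
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("a constant variable cannot be rescaled")
    return [v / (2 * sd) for v in values]


# Hypothetical importance scores on the zero to four scale.
importance = [0, 0, 0, 1, 1, 2, 3, 4]
scaled = rescale_two_sd(importance)

# After rescaling, the standard deviation is one-half by construction,
# so a one-unit change in the scaled variable is a two-SD change in the original.
print(pstdev(scaled))

# A dummy that is "on" half the time has a standard deviation of one-half,
# so switching it on (a change of 1) is itself a two-SD change.
balanced_dummy = [0, 1, 0, 1]
print(2 * pstdev(balanced_dummy))
```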
In a logistic regression, the value of each coefficient gives the change in the log odds of a positive outcome (being granted PTA). The odds of a positive outcome are the probability of that outcome divided by the probability of a negative outcome, and the exponentiated coefficient is the ratio of the odds after the change to the odds before it (the odds ratio). If the value of the coefficient is 0.7, then I shall say that the odds of success are e^0.7 ≈ 2 times better. Unfortunately, because of the way odds are calculated, this is not the same as the probability of success being twice as high. I discuss odds ratios when discussing each separate model, but when I discuss the combined model, I also show changes in the probability of PTA success.
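The gap between doubling the odds and doubling the probability is easy to check numerically. The sketch below is illustrative only; the baseline probabilities are hypothetical, chosen so that one of them is close to the one-in-three success rate reported in this chapter.

```python
import math


def shift_probability(p_baseline: float, coefficient: float) -> float:
    """Probability obtained after adding `coefficient` to the log odds
    implied by `p_baseline`."""
    odds = p_baseline / (1 - p_baseline) * math.exp(coefficient)
    return odds / (1 + odds)


# A coefficient of 0.7 roughly doubles the odds at every baseline,
# but the change in probability depends on where we start.
for p in (0.10, 0.35, 0.60):
    print(p, round(shift_probability(p, 0.7), 3))
```

Starting from a baseline of 0.35, roughly doubling the odds raises the probability of success to about 0.52, not to 0.70.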
An illustrative example of the types of plot I show is given in Figure 3.4. This figure shows the most likely estimate for each coefficient, together with 95 percent credible intervals (the thin line) and 83 percent credible intervals. Where the 95 percent credible intervals are either all positive (or all negative), we can be confident that, under this model, the true value of the coefficient is positive (or negative) (variable A in the plot; variable C in the plot). Where the 83 percent credible intervals are all positive (or all negative), this is good evidence that the true value of the coefficient under this model is positive (or negative), but it is not as good as if the 95 percent intervals were all clear of zero (variable B). Where the lower end of the 83 percent credible interval for one coefficient is greater than the upper end of the 83 percent credible interval for another coefficient, that is good evidence that the first coefficient is bigger than the second coefficient (variables D and E).
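The chapter does not say why 83 percent is used as the second interval level. One common rationale, offered here as an assumption rather than as the author's stated reasoning, is that for two independent and approximately normal estimates with equal standard errors, non-overlap of intervals at roughly this level corresponds to a two-sided 5 percent test of the difference between them. The sketch checks that arithmetic.

```python
import math
from statistics import NormalDist


def nonoverlap_level(alpha: float = 0.05) -> float:
    """Interval level at which non-overlap of two intervals matches a
    two-sided test of their difference at `alpha`.

    Assumes independent, normally distributed estimates with equal
    standard errors (an illustrative simplification).
    """
    std_normal = NormalDist()
    z_difference = std_normal.inv_cdf(1 - alpha / 2)
    # The difference has standard error sqrt(2) * se, and two intervals of
    # half-width z * se fail to overlap when the gap exceeds 2 * z * se.
    z_each = z_difference / math.sqrt(2)
    return 2 * std_normal.cdf(z_each) - 1


print(round(nonoverlap_level(), 3))
```

Under these assumptions the level comes out close to 0.83. With unequal standard errors or correlated estimates the correspondence is only approximate.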
The Legal Model
Figure 3.5 plots the results of a legal model of case selection with three variables: case importance, opinion below, and area of law. Our expectations regarding case importance and opinion below are clearly borne out. An increase of two standard deviations in case importance (which, at 1.8, is approximately the difference between being reported in two generalist law reports and not being reported at all) is associated with odds of success that are e^0.77 ≈ 2.16 times greater. This effect is relatively precisely estimated.
The effect of a two standard deviation change in opinion below is smaller in magnitude, but as with importance we can be confident that the effect is positive. The greater the proportion of judges below who have found for the appellant, the more likely the appellant is to be granted permission to appeal. Because the opinion below variable is constructed as a fraction, a two standard deviation change is difficult to understand intuitively, but a change of this magnitude (0.14) is like moving from the situation where four judges have found against the appellant to a situation where three judges have found against the appellant and one judge for the appellant (compare with Table 2.1).
The third legal factor that we expected to have an effect was the area of law, and in particular whether or not a case was a public law case. As the figure shows, our best point estimate of the effect of being a public law case is positive, but we cannot be confident about this.
I have included the other areas of law as additional variables. Of these, the only area of law that has an effect that approaches statistical significance is Scots law. Our best estimate suggests that the odds of a Scots law case being chosen are roughly one-quarter (0.25) of the odds for a civil case, making Scots law a greater handicap than being reported in a generalist law report was a boon.
The Organizational Model
Figure 3.6 shows the effects of organizational variables. Because the operationalization of these variables depended on different assumptions concerning panels’ decision-making rule, I plot the results of two different models, which correspond to the two different decision-making rules described earlier (majority rule and the “rule of one”).
My expectation was that higher workload, at both the individual and panel levels, would make it less likely that permission to appeal would be granted. Our best estimate of the effect of court workload points in this direction, but we cannot be confident that the effect really is negative. Our best estimate of the effect of individual workload points in the opposite direction. It seems therefore that there is little consistent evidence for an impact of workload.
My second expectation was that specialization in an area would make panel members more likely to grant permission to appeal. The results of the model that assumes a “rule of one” suggest that, if anything, specialization in an area makes panel members less likely to grant permission to appeal: the coefficient on this variable is negative, and although the effect is not substantively as important as case importance or opinion below (discussed in the earlier legal model), it is not negligible. Panel members do not shape the court’s docket so as to favor their own areas of expertise.
The Political Model
A political model of case selection is shown in Figure 3.7. It includes both the variables that encode the type of appellant and information on the area of law. I have included information on the area of law because otherwise it might appear as though governments were more likely to be successful, when this might merely be an artifact of a general preference for public law cases, an area of law in which governmental litigants are, for obvious reasons, more likely to feature.
The figure shows that even when allowance is made for the different effects of different areas of law, governmental litigants are significantly more likely than the reference category (companies) to succeed in permission to appeal applications. Central government has odds of success that are 1.63 times better than those of companies. The advantage enjoyed by local governments is smaller, but not significantly so. There is some evidence for individuals being less likely to succeed than companies, but the evidence is not as clear as it is for government advantage.
The Combined Model
I have presented these models separately, since they flow from separate accounts of judicial decision-making that build on different foundations. The legal model emphasizes characteristics of the case, while the organizational account emphasizes characteristics of the judges. This has advantages, because it keeps these accounts of judicial decision-making separate.
The disadvantage of estimating these models separately is that some of the findings of individual models may be artifacts of patterns identified in other models. For example: in the political model, I found that governmental litigants enjoyed an advantage compared to companies. I did not, however, control for case quality in that model. If governmental litigants were more selective litigants than companies, this might mean that government advantage is an artifact of higher case quality.
For these reasons, the penultimate figure in this chapter, Figure 3.8, presents a combined model that includes all of the variables that featured in previous models. Since some of the organizational variables depend on assumptions regarding the decision-making rule used by panels, I have plotted the results for the majority-rule versions of these variables: the results for the rule-of-one versions are indistinguishable from the results plotted here.
Consider first the legal factors. When these factors were examined separately, both case importance and opinion below had significant positive effects on success. They retain those effects in the combined model, and their magnitudes are similar to those found in the original model. Public law cases were, in the original legal model, more likely to be granted permission to appeal, but it was not possible to rule out an effect that was negative or close to zero. In the combined model that controls for other factors (such as the type of appellant), the effect of a case being a public law case is comparable in magnitude to the effect of the appellant coming from central government. Compared to an ordinary commercial case, the odds of a public law case being granted permission to appeal are 1.55 times greater.
Now consider the different organizational factors. Previously, none of these factors had effects that were appreciably different from zero. That remains the case in the combined model. In the case of specialization and panel workload, we can be relatively confident that the effect of these variables is tightly clustered around zero.
Finally, the identity of the appellant matters in this model just as it did in the previous, political, model. Central government litigants (and to a lesser extent local government) remain more likely to be successful in obtaining permission to appeal. Given the effect of public law cases, central government enjoys an added advantage when it seeks permission to appeal in public law cases.
Goodness of Fit
These models exist to test different hypotheses about the PTA process. These hypotheses in turn are part of a broader theoretical approach to judicial behavior. In describing these models, I have paid particular attention to individual variables and their effects. When we shift to considering the models as a whole, it is natural to ask which model is “best,” in the sense of providing the best fit to the data.
To answer this question, Figure 3.9 shows two different goodness-of-fit statistics: the percentage correctly predicted (PCP) and the leave-one-out information criterion (LOOIC).
PCP is a poor but easily understood measure of model fit. Logistic regression models generate probabilities of certain outcomes. If the model predicts permission to appeal in any given case will be granted with a probability of greater than one-half, and if permission to appeal is in fact granted, then we call this a correct prediction. Conversely, if the model predicts PTA will be granted with probability less than one-half, and PTA is not granted, this too would be a correct prediction. We can average over all of these predictions and calculate the percentage that were correct. This is the PCP.
Because the coefficients in the model are subject to uncertainty, it is helpful to incorporate this uncertainty in our measures of model fit. I thus calculate the PCP using multiple draws from the posterior probability distribution of the coefficients. If the PCP is high, but the interval surrounding the PCP is large, then we may have either an exceptional or a merely good model.
PCP is a poor measure of model fit for two reasons. First, PCP throws away extra information: probabilities of success of 51 percent and 91 percent are very different, but PCP treats them as the same. Second, measures of PCP can look good even if the model is not. A null model is a model that simply predicts that the most common outcome (in this case, refusal to grant PTA) will always happen. Where one outcome is common, the null model can predict a large proportion of decisions correctly. For this reason, it is important to interpret the PCP values together with the values for the null model. The figures given for PCP can, with this proviso, be interpreted in the same way for all of the binary outcomes modeled in this book.
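A small sketch shows how the PCP and its null-model benchmark are computed. The predicted probabilities and outcomes are hypothetical, and a single set of probabilities is used rather than multiple posterior draws, purely to keep the example short.

```python
def pcp(probabilities, outcomes, threshold=0.5):
    """Percentage of cases where the predicted side of `threshold`
    matches the observed outcome (1 = granted, 0 = refused).
    A probability exactly equal to the threshold counts as a predicted refusal."""
    hits = sum((p > threshold) == bool(y) for p, y in zip(probabilities, outcomes))
    return 100 * hits / len(outcomes)


def null_pcp(outcomes):
    """PCP of a null model that always predicts the most common outcome."""
    granted = sum(outcomes)
    return 100 * max(granted, len(outcomes) - granted) / len(outcomes)


# Ten hypothetical applications, three of which were granted.
outcomes      = [1,    0,    0,    1,    0,    0,    0,    1,    0,    0]
probabilities = [0.91, 0.20, 0.30, 0.51, 0.40, 0.10, 0.55, 0.45, 0.25, 0.35]

print(pcp(probabilities, outcomes))   # model
print(null_pcp(outcomes))             # always predict refusal
```

In this toy data the model classifies 80 percent of applications correctly against 70 percent for the null model. The predictions of 0.91 and 0.51 both count as a single correct "grant," which is the information PCP discards.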
LOOIC is a more complicated but ultimately more useful measure of model fit (Vehtari, Gelman, and Gabry 2017). LOOIC takes account of how good or bad each prediction is (not just whether it predicted something would or wouldn’t happen), and approximates the error we would get if we were repeatedly to estimate the model on all but one of our data points (hence, “leave one out”). LOOIC does not have units or ranges of possible outcomes, but depends on the number of cases and the details of each model. For these reasons, LOOIC cannot be compared across different data sets, but only between different models of the same data. The smaller the value of LOOIC, the better the model fits the data.
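The idea behind leave-one-out scoring can be sketched by brute force on a toy model. This is not the procedure of Vehtari, Gelman, and Gabry, which approximates the leave-one-out fit from posterior draws without refitting; here each case is literally held out and predicted from a smoothed success rate among the remaining cases. The data, the grouping variable, and the function name are all hypothetical.

```python
import math


def loo_ic(outcomes, groups=None):
    """Brute-force leave-one-out information criterion for a toy model.

    Each held-out case is predicted from the smoothed grant rate of the
    other cases in the same group (one group means an intercept-only model).
    Returns -2 times the summed log predictive probability: lower is better.
    """
    if groups is None:
        groups = [0] * len(outcomes)
    total_log_prob = 0.0
    for i, (y, g) in enumerate(zip(outcomes, groups)):
        others = [yj for j, (yj, gj) in enumerate(zip(outcomes, groups))
                  if j != i and gj == g]
        # Add-one smoothing keeps the predicted probability away from 0 and 1.
        p_grant = (1 + sum(others)) / (2 + len(others))
        total_log_prob += math.log(p_grant if y else 1 - p_grant)
    return -2 * total_log_prob


# Hypothetical data: group 1 applications are granted more often than group 0.
outcomes = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
groups   = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

print(loo_ic(outcomes))           # intercept-only model
print(loo_ic(outcomes, groups))   # model that uses the grouping variable
```

Unlike PCP, this score rewards a confident correct prediction more than a marginal one and penalizes a confident wrong prediction heavily, which is why the two statistics can rank models differently.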
These two statistics give different rank orderings of models. Generally the different models are able to predict between 65 and 68 percent of decisions correctly. This is a slight improvement over the null model, which simply predicts that all applications will be refused permission to appeal, and which classifies 65 percent of applications correctly. Of the four models, one model (the organizational model) is clearly inferior, and delivers no improvement (on this statistic) over the null model. The combined model fares best in terms of PCP, followed by the political model.
The combined model also fares best in terms of LOOIC. What changes is the relative performance of the other models. The legal model is now preferable, on grounds of the LOOIC, to either the organizational or political models. These differences, however, are small relative to the uncertainty of the estimates. If we were initially indifferent between models, we would pick the combined model. If instead we had to evaluate other models relative to the legal model, and if we had to be confident that an alternative model was going to improve on the legal model, we would not be able to do so. Phrased differently, we would pick a combined model over the legal model on the balance of probabilities, but not if we had to prove the superiority of the combined model over the legal model beyond reasonable doubt.
In this chapter, I examined the way in which cases arrive at the Supreme Court. The selection of cases is an important part of the Supreme Court’s overall decision-making process. Most of the cases heard by the court are chosen by the court, and the court’s choices therefore have considerable implications for the development of the law. In choosing some cases, but not others, the court contributes to the development of certain aspects of the law, and not others. The court’s choices can also have dramatic consequences for individual litigants, each of whom seeks their day in court.
I also provided statistics on the rate of success of applications for permission to appeal. These statistics showed that success rates tend to be around one in three, but that success is more likely in more important cases and in public and criminal law cases.
I tried to explain these patterns using three broad families of explanation: a legal account, an organizational account, and a political account. I built and tested statistical models based on these accounts. Since a combined model, which included elements of all three accounts, was the model that provided the best fit for the data, I use this model to summarize my findings.
Considering legal factors, I expected that the more important the case (as judged by the number of generalist law reports reporting the case in the court below), the more likely the case would be granted permission to appeal. It is virtually certain that more important cases are more likely to be chosen, and the substantive magnitude of this effect is large. This effect is greater than the effect of opinion below, which however also works in the same direction. Knowing whether or not the case is a public law case, however, is a more important predictor than tallying the judges who have previously decided for or against the appellant.
Concerning organizational factors, I expected that increases in panel and court workload would decrease the chances of permission to appeal being granted, and that panel members would be more likely to choose cases in areas in which they specialized. These expectations were not borne out. All of these variables had effects that were close to zero. The only good evidence for a nonzero effect came from a purely organizational model in which specialization had an effect in the “wrong” direction (i.e., panels discriminated against cases in their area).
Finally, concerning political factors, I expected that governmental appellants would be more likely to win permission to appeal than all other actors, even controlling for the area of law. This is borne out: our best estimate of the effect of the appellant being a central government department (rather than a private company) is greater than the effect of a one standard deviation increase in case importance. There is some weak evidence that individual appellants are disadvantaged (relative to private companies).
These findings matter in three ways. First, they confirm the court’s own account of the permission to appeal process. The three factors that the court identifies (important, arguable points of law, of general public or constitutional importance) do matter, and indeed importance and public law are the first and third most consequential factors in the model.
Second, these findings suggest that judges on the Supreme Court are not engaged in “bureau-shaping” (Dunleavy 2014). They do not accept fewer assignments when they already have a large number of pending assignments, and they display no particular preference for their own areas of law. This last finding is of particular interest, since it goes against previous findings with respect to human rights issues and family law (Paterson 2013, 68; see also Poole and Shah 2009), and against the more general claim that judges are interested in cultivating an audience (Baum 2009).
Third, these findings suggest that the identity of litigants does matter, and that governments get a second bite at the cherry. The pattern found here of government advantage is one which will be tested later on, when I look at the final outcome.
Quite why this pattern of government advantage should arise is not clear. I have given a preference-based argument: that judges on the Supreme Court, compared to judges in lower courts, have a preference for governments being given a second or third chance to argue their case. This preference might either be direct or induced. If, for example, Supreme Court judges collectively felt more politically exposed than Court of Appeal judges, and more exposed to legislative overrule, then judges might have good reasons to be generous to governments in order to preserve the court’s authority (particularly for those moments when it really must rule against the government). It is easier to be generous in permission to appeal decisions than it is in deciding upon the merits of the case. These two accounts of why governments fare better in permission to appeal decisions are thus observationally equivalent, but if the basis for this pattern lies in judges’ preferences, then the pattern ought also to emerge when I turn to the decision on the final outcome.
(1) The exact date is unclear. The court, in publishing the results of applications for Permission to Appeal for March and April 2016, described permission to appeal as having been granted on the 19th; in the court’s judgment, Lord Mance (10) described permission to appeal as having been granted on the 21st April, the first day of the hearing on the merits.
(2) Practice Direction 3, “Applications for Permission to Appeal,” §3.3.
(3) Practice Direction 1, “The Supreme Court of the United Kingdom,” §§1.2.17–1.2.19.
(5) Practice Direction 3, §3.1.2.
(6) See https://www.supremecourt.uk/about/role-of-the-supreme-court.html, last accessed 18 June 2018.