An in-depth analysis of individual-level data gathered from respondents to Understanding Society, the UK’s largest household survey, which included the EU referendum question.
By Eleonora Alabrese, Sascha O. Becker, Thiemo Fetzer and Dennis Novy
Published on ScienceDirect: European Journal of Political Economy, Volume 56, January 2019, Pages 132-150
Previous analyses of the 2016 Brexit referendum used region-level data or small samples based on polling data. The former might be subject to ecological fallacy and the latter might suffer from small-sample bias. We use individual-level data on thousands of respondents in Understanding Society, the UK’s largest household survey, which includes the EU referendum question. We find that voting Leave is associated with older age, white ethnicity, low educational attainment, infrequent use of smartphones and the internet, receiving benefits, adverse health and low life satisfaction. These results coincide with corresponding patterns at the aggregate level of voting areas. We therefore do not find evidence of ecological fallacy. In addition, we show that prediction accuracy is geographically heterogeneous across UK regions, with strongly pro-Leave and strongly pro-Remain areas easier to predict. We also show that among individuals with similar socio-economic characteristics, Labour supporters are more likely to support Remain while Conservative supporters are more likely to support Leave.
Populism has been on the rise across Europe and the United States in recent years, culminating in the election of Donald Trump as US President and the Brexit vote in the 2016 EU referendum. The Brexit vote came as a shock to many observers and triggered early attempts to understand the voting patterns. These studies relied almost exclusively on aggregate data at the level of voting areas. Regressing vote shares across voting areas on average population characteristics risks falling into the ecological fallacy trap of inferring individual associations from aggregate data (see Robinson, 1950).
We use detailed individual-level data from the Understanding Society survey containing the EU referendum question to address three interrelated questions. First, we investigate the relationship between voters’ personal characteristics and their expressed voting intentions. In particular, we address whether ecological fallacy may be driving the associations documented in the aggregate data. Second, building a predictive model of Leave support, we assess which voting determinants have the most power to predict voting behavior out of sample. Third, we investigate the classification errors that this predictive model makes by region and by voters’ closeness to political parties.
First, we find that individual and aggregate coefficients point in a similar direction, suggesting that ecological fallacy is of limited concern. Second, we document that the predictive models exhibit a significant gain in accuracy when exploiting both individual and regional variables. Lastly, we document that a predictive model performs best in parts of the UK with the most extreme referendum outcomes: Lincolnshire (highest Leave share) and London (lowest Leave share across mainland Britain). Furthermore, a decomposition of classification errors reveals that closeness to a political party is likely an important omitted variable, suggesting that unobservable traits and identity are further key correlates.
The paper is structured as follows. Section 2 lays out the literature background, describes the data and explains our empirical approach. We present graphical summaries of our results in Section 3, and we conclude in Section 4. Underlying regression results and further details are relegated to an appendix.
2. Background, data and empirical approach
This paper builds on Becker et al. (2017) who analyze the Brexit vote shares across UK voting areas, using a wide range of explanatory variables. They show that the Leave vote shares are systematically correlated with older age, lower educational attainment, unemployment, or employment in certain industries such as manufacturing, as well as with a lack of quality of public service provision.
These results fit in with other evidence on the Brexit vote. An early attempt to explain the referendum outcome was made by Ashcroft (2016) whose polling data indicated that the typical Leave voter is white, middle class and lives in the South of England. Sampson (2017) reviews the literature on the likely economic consequences of Brexit for the British economy and other countries.
Our paper also relates to the wider literature on political polarization as well as on voting for far-right parties. Ferree et al. (2014) provide an extensive review of academic works which link voting patterns to demographic, economic and political features. Voters’ behavior has also been shown to be strongly associated with individual scepticism towards institutions (e.g. Euroscepticism) or intolerance against foreigners (see Whitaker and Lynch, 2011; Clarke and Whittaker, 2016; Arzheimer, 2009). Additional studies claim that ethnic minorities may engage in ‘ethnic’ or ‘policy’ voting depending on the issue they are called to vote upon (see Bratton and Kimenyi, 2008; Tolbert and Hero, 1996).
Polarization has also been related to immigration (see Barone et al., 2016) as well as trade integration (Dippel et al., 2015; Burgoon, 2012; Autor et al., 2016). In the UK context, Becker and Fetzer (2016) examine immigration from Eastern Europe as a potential driver of support for the UK Independence Party, while Fetzer (2018) explores the role of austerity policies since 2010.
Overall, the voting patterns in the Brexit referendum are complex. One possible – albeit not the only – interpretation of the empirical literature on Brexit so far is that some people who favor Leave may feel ‘left behind’, be it economically or culturally (see Hobolt, 2016; Clarke et al., 2017). This is consistent with sociological studies which demonstrate similar patterns for the Tea Party Movement and the 2016 US presidential election, e.g. Hochschild (2016).
Are these aggregate patterns found by Becker et al. (2017) and others a fair reflection of individual-level relationships? The individual-level data from wave 8 of the Understanding Society survey makes it possible to investigate this question. Our focus is on individual socio-economic variables for which region-level equivalents are used in Becker et al. (2017). Our approach of combining individual-level and aggregate data allows us (a) to check whether ecological fallacy is an important factor in aggregate analyses of the Brexit vote, and (b) to exploit the combined predictive power of individual-level and aggregate variables. This opens up insights into (c) geographic heterogeneity in predictive power across UK regions.
The Understanding Society data cover a wide range of topics, in particular basic demographic data for all household members such as sex, age and ethnicity, place of birth, family background including marital status, educational attainment, current job characteristics, housing characteristics (owning vs. renting), health status and life satisfaction. We describe the sampling design in more detail in the appendix, and how we construct our sample (also see Knies, 2016; Buck and McFall, 2012).
2.3. Descriptive statistics
According to the summary statistics in Table 1, 42.2% of the 13,136 individuals in our sample indicate that the UK should leave the EU in response to the survey question “Should the UK remain a member of the EU or leave the EU?” This compares to 51.9% of the electorate voting Leave in the referendum. We refer to Becker et al. (2017, section 3.1) for a discussion of the aggregate voting and turnout patterns in the 2016 referendum.
Notes: The table reports the number of observations (N), their mean, standard deviation (sd) as well as the minimum and maximum values. The summary statistics for the aggregate variables are reported based on the raw data, whereas in the regression tables these variables are used in standardized form.
As for demographics, males make up 45.4% of all individuals in the sample, while just about three out of ten respondents are aged 60 or above. People with no qualification account for about 8% of the sample. Roughly 90% of respondents were born in the UK. Asians are the largest ethnic minority, amounting to 5.8% of the sample, followed by blacks (2.5%). Over half of respondents are married or in a civil partnership. In terms of current employment, roughly four out of ten people report having no paid job or not having worked in the seven days prior to being questioned.
2.4. Understanding Society: Research in progress
We gained access to Understanding Society data in the summer of 2017, at the same time as other groups of researchers in a pilot ‘early access’ project. We briefly summarize related preliminary findings reported by other researchers in short presentations in the summer of 2017. For instance, Creighton and Amaney (2017) find that opposition to immigration played a key role. Martin and Sobolewska (2017) explore racial determinants and find that ethnic minorities are strongly in favor of remaining in the EU. De Vries and Solaz (2017) attempt to explain voters’ behavior by analyzing socio-economic determinants such as asset holdings, sources of income and skills, whereas Doebler et al. (2017) explore additional potential drivers such as personal economic struggle and regional economic decline.
As far as we are aware, only one other paper using Understanding Society data has come out as a working paper so far. Liberini et al. (2017) show that individuals dissatisfied with their own financial situation were more likely to vote Leave and that the very young were most likely to vote Remain. In related work, Pollock (2017) uses the Innovation Panel to argue that the rise in populism and the vote in favor of Brexit can be attributed to generational shifts away from mainstream political parties over the past three decades.
2.5. Empirical approach
We start with a simple model where the dependent variable $y_{ic}$ is a dummy for individual $i$ in local authority $c$ which takes on the value 1 if the interviewed person answers “Leave” in response to the question “Should the UK remain a member of the EU or leave the EU?” and 0 if the answer is “Remain”:
$$y_{ic} = \alpha + x_{ic}'\beta + z_c'\gamma + \varepsilon_{ic} \qquad (1)$$

The independent variables in the model are the Understanding Society cross-sectional individual covariates $x_{ic}$ on the one hand, and area-specific aggregate variables $z_c$ from Becker et al. (2017) on the other. Our overall sample contains 13,136 respondents for our baseline regressions. We also analyze smaller samples and subgroups of variables since not all Understanding Society respondents were asked each survey module. As the summary statistics in Table 1 show, roughly 42% of respondents are in favor of Leave.
We relegate the details of the underlying regression results to the appendix. For ease of interpretation, throughout the regression tables in the appendix we provide coefficients obtained from a simple linear probability model estimation of Eq. (1). However, each model is also estimated using the corresponding logistic regression model to provide an estimate of the success rate at the bottom of each table.
Since our interest centers on prediction, we need a metric to assess predictive accuracy of our regression models. We perform a simple validation exercise known from the machine learning literature. Our sample is divided into a random training set (2∕3 of the sample) and a validation set. Logistic regressions are conducted on the training set, and we use the validation set to perform classification. We follow Bayes’ optimal decision rule and classify an observation as “Leave” if the predicted posterior probability exceeds 50%. In essence, this simple rule allocates the label (“Leave” or “Remain”) to an observation that, conditional on our predictors/features, is most likely. This decision rule minimizes the error rate or maximizes overall accuracy. Yet, it does so putting an equal penalty or cost on false positives versus false negatives. The comparison of the predicted to the actual assignments allows us to estimate the out-of-sample predictive power and to shed light on the two types of prediction errors (false positives versus false negatives). For instance, individual A in the validation set may, based on her characteristics, look like a typical Remain voter but is in reality a Leave voter, so we have a case of a false negative. Individual B in the validation set may, based on her characteristics, look like a typical Leave voter but is in reality a Remain voter, so we have a case of a false positive.
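The validation exercise described above can be illustrated with a minimal sketch. It does not fit a logistic regression; it assumes that predicted Leave probabilities for the validation set are already available, and only shows the random 2/3 split, the 50% decision rule and the resulting error types. All function names and the toy probabilities are hypothetical and are not taken from the paper.

```python
import random

def split_indices(n, train_fraction=2 / 3, seed=0):
    """Randomly split range(n) into training and validation index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = round(n * train_fraction)
    return indices[:cut], indices[cut:]

def classify(prob_leave, threshold=0.5):
    """Label 1 ("Leave") if the predicted probability exceeds the threshold, else 0 ("Remain")."""
    return [1 if p > threshold else 0 for p in prob_leave]

def confusion_counts(actual, predicted):
    """Tally outcomes, treating "Leave" (1) as the positive class."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["tp"] += 1
        elif a == 0 and p == 0:
            counts["tn"] += 1
        elif a == 0 and p == 1:
            counts["fp"] += 1  # looks like Leave, actually Remain
        else:
            counts["fn"] += 1  # looks like Remain, actually Leave
    return counts

def accuracy(counts):
    """Share of observations classified correctly (the success rate)."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())

# Hypothetical predicted probabilities and stated intentions for six respondents.
probs = [0.81, 0.35, 0.62, 0.48, 0.20, 0.55]
actual = [1, 1, 0, 0, 0, 1]
predicted = classify(probs)
counts = confusion_counts(actual, predicted)
```

In this toy example the second respondent corresponds to individual A in the text (a false negative) and the third to individual B (a false positive).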
We stress that causality is beyond the scope of our paper. Instead, our results reflect a broad range of correlation patterns relating voting intentions to fundamental socio-economic features. In our earlier work (Becker et al., 2017), we grouped variables by four topics: (1) EU exposure: immigration, trade and EU transfers; (2) Public service provision and fiscal consolidation; (3) Demography, education and life satisfaction; (4) Economic structure, wages and unemployment. Those groupings follow from prominent hypotheses that have been proposed to explain the EU referendum result. That is, the first grouping looks at the relationship between EU exposure and Leave voting. Here, we follow the same logic and look at groups of variables that correspond to one specific set of explanations for the referendum result. For each variable grouping, we assess its predictive power by itself, and compare this to the joint predictive power of all groups of variables combined. We discuss the different groupings in more detail in the appendix (the regression tables using the groups of variables under discussion are described in A.4 Demographics, technology, education and employment, A.5 Health, A.6 Housing, A.7 Employment, A.8 Unearned income and state benefits, A.9 Life satisfaction, A.10 Nationality and ethnicity).
As Becker et al. (2017) explain, the fundamental difference between prediction, as pursued in this paper, and causal inference is as follows. Causal inference focuses on the internal validity of causally estimated reduced-form (or structural) parameters β. In contrast, prediction is concerned with the external validity of the estimated fitted values ŷ. Causal inference seeks to obtain a set of estimated parameters that are usually studied in isolation. Such parameters often do not lend themselves to prediction because the out-of-sample model fit is generally poor. Instead, good model fit typically requires a multitude of regressors, and machine learning can often substantially improve out-of-sample predictive performance (Mullainathan and Spiess, 2017). The underlying estimated parameters that yield good model fit are typically of limited interest per se. For this reason, we only show coefficient estimates in appendix tables, while in the main text we focus on graphical representation.
3. Predicting the vote
In order to focus on prediction quality, we relegate the discussion of individual regression tables to the appendix. First, we focus on the relative predictive power of individual-level and aggregate variables. Second, we examine the predictive power of our best-performing model across regions and lastly, we investigate the classification error structure.
3.1. Individual vs. aggregate variables
Fig. 1 reports the proportion of correct predictions (success rates) for each variable grouping estimated in the hold-out sample. In particular, Fig. 1(a) illustrates success rates for (groupings of) aggregate variables and Fig. 1(b) for individual-level variables. Fig. 1(c) combines aggregate and individual-level variables. Fig. 1(d) reports success rates for non-comparable individual variables.
The overall classification success rate when we rely on aggregate data in Fig. 1(a) is 58.8%. In the narrow individual-level sample for which employment and related individual data is collected in the Understanding Society sample, the overall accuracy reaches 62.9% when we use the aggregate level area employment characteristics. The improvement in terms of accuracy relative to a naive classification rule that classifies everyone as Remain (generating a success rate of 57.8%, i.e. one minus the sample ‘Leave’ share) thus is only modest. When focusing on all comparable individual-level covariates in Fig. 1(b), we see that individual-level variables have stronger predictive power than aggregate ones. The improvement in accuracy up to 63.4% with all variables included suggests an improvement in prediction accuracy relative to the naive benchmark by 9.7%.
Furthermore, an inspection of the tables in the appendix confirms that the individual-level predictors yield broadly similar sign patterns to their aggregate-level equivalents. This suggests that ecological fallacy is not a major concern for the results in Becker et al. (2017).
The combination of individual and aggregate characteristics yields a further slight improvement in prediction accuracy. Relative to the naive classification rule, accuracy can improve up to 64.6% with all covariates included, representing an improvement of 11.7% in relative terms. Adding further individual-level characteristics that are included in the Understanding Society sample (but for which no aggregate proxy measures exist) suggests that overall accuracy is not further improved.
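As a worked check of the relative improvements quoted in this section, the short sketch below recomputes them from the rounded success rates given in the text (57.8% for the naive rule, 63.4% for individual-level variables, 64.6% for the combined model). The function name is illustrative only.

```python
def relative_gain(success_rate, benchmark):
    """Improvement over the benchmark, expressed in percent of the benchmark."""
    return (success_rate - benchmark) / benchmark * 100

naive = 57.8            # classify everyone as Remain
individual_only = 63.4  # all comparable individual-level covariates
combined = 64.6         # individual-level plus aggregate covariates

gain_individual = relative_gain(individual_only, naive)
gain_combined = relative_gain(combined, naive)
```

With these rounded inputs the first gain comes out at roughly 9.7% and the second at roughly 11.8%, marginally above the 11.7% reported in the text. The small gap is presumably due to rounding in the published success rates.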
In fact, our best model including all characteristics sees a small drop in the success rate. In terms of the bias-variance trade-off inherent in such predictive models, the improvements in terms of bias are therefore likely offset by an inflation in terms of variance, resulting in worse out-of-sample performance. We refer to James et al. (2013) for a discussion of the bias-variance trade-off.
As explained in the appendix, we explore a number of novel individual determinants. We find that marital status, technology use and dependence on income support and state benefits are all systematically linked to individual voting behavior. In particular, individuals who do not possess smartphones and who use the internet infrequently appear more inclined to support Leave. Those repeatedly seeking health care or receiving income support also tend to be more in favor of Brexit. Similarly, support for Brexit is considerably stronger among white respondents than among ethnic minorities.
3.2. Geographical heterogeneity
An instructive next step is to examine in which regions our model does a good job of correctly classifying the voting intentions in the Understanding Society sample. Among all NUTS2 regions in Fig. 2, Inner London displays the lowest error rate (21%) followed by Lincolnshire and North Eastern Scotland (with 23% and 26%, respectively). Lincolnshire and Inner London had among the highest and lowest Leave vote shares in the referendum. Thus, it is hardly surprising that the empirical model performs well in separating voters in these regions.
The model has the lowest performance in Tees Valley and Durham, East Anglia, and Merseyside (with error rates around 43–44%). Generally, the picture that emerges suggests that purely based on the socio-economic characteristics, areas that are more disadvantaged are the ones where it is most difficult to separate Leave from Remain voters. Non-economic factors may therefore be particularly helpful in capturing variation between voters in these areas.
3.3. Types of errors
We turn to decomposing errors into false positives and false negatives. The results presented in Fig. 2 suggest that the regions of Inner and Outer London, Berkshire, Buckinghamshire and Oxford as well as North Eastern Scotland stand out as having the highest rate of false negatives (blue bars). False negatives are cases in which our model identifies an individual as a Remain voter, while in fact they state an intention to vote Leave. The false negatives in Fig. 2 suggest that there are non-negligible proportions of voters who, based on their socio-economic characteristics, look like Remain voters but actually express an intention to vote Leave. In Outer London, 80% of all classification errors are false negatives. The same holds true for many of the other regions in London’s wealthy commuter belt.
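The regional decomposition discussed above can be sketched as follows. The example assumes a list of validation-set records of the form (region, actual, predicted), with 1 denoting Leave and 0 denoting Remain; the function name and the toy records are hypothetical and do not reproduce the Understanding Society data.

```python
from collections import defaultdict

def errors_by_region(records):
    """Return the error rate and the false-negative share of errors per region."""
    tally = defaultdict(lambda: {"n": 0, "fp": 0, "fn": 0})
    for region, actual, predicted in records:
        t = tally[region]
        t["n"] += 1
        if actual == 1 and predicted == 0:
            t["fn"] += 1  # classified as Remain, states Leave
        elif actual == 0 and predicted == 1:
            t["fp"] += 1  # classified as Leave, states Remain
    summary = {}
    for region, t in tally.items():
        errors = t["fp"] + t["fn"]
        summary[region] = {
            "error_rate": errors / t["n"],
            # Undefined (None) when a region has no classification errors.
            "fn_share_of_errors": t["fn"] / errors if errors else None,
        }
    return summary

# Toy records for two made-up regions.
records = (
    [("A", 1, 0)] * 4 + [("A", 0, 1)] * 1 + [("A", 0, 0)] * 5
    + [("B", 1, 1)] * 3 + [("B", 0, 1)] * 1
)
summary = errors_by_region(records)
```

In the toy region "A", four of the five errors are false negatives, which mirrors the kind of pattern reported for Outer London.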
We next investigate whether classification errors can be related to individual political party preferences. From previous Understanding Society survey rounds which asked participants what party they felt closest to, we obtain that measure for 65% of our estimation sample. Fig. 3 highlights that, while overall accuracy across the stated historical party preferences is similar, the type of classification error is quite heterogeneously distributed. In particular, Labour voters are more likely to contribute to the false positive errors – cases where our model classifies an individual as a Leave voter when in fact they favor Remain – accounting for 51.27% of all false positives. By contrast, Conservative party supporters account for 44.8% of all false negatives – individuals who look like Remain voters but actually intend to vote Leave.
Overall, our findings indicate that Labour voters with observables that put them in the Leave camp – male, older, less educated, less likely to be in employment, etc. – are significantly more likely to express a preference for the status quo of remaining in the EU. Voters with similar socio-economic profiles who identify with the Conservative Party are more likely to vote Leave. This suggests the potential importance of other characteristics not in the data set, for instance psychological traits such as openness as well as attitudes towards national identity.
Individual-level regressors from the British Understanding Society survey containing the 2016 EU referendum question give similar results to corresponding aggregate variables at the level of local authority areas analyzed by Becker et al. (2017). We therefore find no evidence of ecological fallacy effects – individuals appear to behave in similar ways as suggested by the aggregate data.
We also shed light on the predictive power of different determinants of the Leave vote. Demographics and employment characteristics are the most relevant covariates for prediction, while the cumulative power of individual-level and aggregate variables shows a non-negligible gain over aggregate data alone. Geographical heterogeneity is also important as our model performs best in more prosperous areas (London in particular).
Finally, we also find that individuals who support the Labour Party but otherwise have observables that would put them in the Leave camp are significantly more likely to vote Remain. Conversely, supporters of the Conservative Party with Remain-favouring characteristics are more likely to vote Leave.
We are grateful to the editor and two referees for constructive comments. We thank the Understanding Society team based at the Institute for Social and Economic Research at the University of Essex for access and guidance on the data. Support by the ESRC Centre for Competitive Advantage in the Global Economy (CAGE, ESRC grant ES/L011719/1) is gratefully acknowledged.
A. Data and regression results
In this appendix we present our data and empirical regression results in more detail.
A.1. Sampling design
Concerning the design and data collection of Understanding Society, the general population sample is a stratified, clustered, equal probability sample of residential addresses drawn to a uniform design throughout the whole of the UK. For each wave, the data collection is spread over a two-year period, and the overall sample is divided into 24 monthly subsamples, each independently representative of the UK population. Computer assisted personal interviewing (CAPI) was mainly used to collect the data.
A.2. Constructing the sample
The construction of our sample takes place in various steps. Initially, the raw individual survey (wave 8) consists of 21,076 observations. Then, matching the household survey leaves 20,821 individuals. Further matching with local authority codes results in a sample of 17,697 respondents (i.e. over 3000 surveyed individuals get lost because there is no location code associated with their households). Finally, we merge this last sample with the aggregate information used in Becker et al. (2017). In this last step, the number of surveyed individuals is 15,844 across 377 local authorities.
When we consider the initial sample with 21,076 observations, 91% of the individuals provide an answer to the question concerning British EU membership. Among them, the share of those supporting Leave is 35.8%. Of the selected subsample with 15,844 units, 91.4% (14,476 individuals) disclose an answer for the outcome variable, and 42.6% turn out to be Leave supporters.
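The sample construction steps can be checked with simple arithmetic. The sketch below uses only the counts reported in this subsection; the step labels paraphrase the text and the function name is illustrative.

```python
# Sample size after each construction step, as reported in the text.
steps = [
    ("raw individual survey (wave 8)", 21076),
    ("matched to household survey", 20821),
    ("matched to local authority codes", 17697),
    ("merged with aggregate data", 15844),
]

def attrition(steps):
    """Return (label, observations lost, share of raw sample retained) per step."""
    raw = steps[0][1]
    rows = []
    for (_, before), (label, after) in zip(steps, steps[1:]):
        rows.append((label, before - after, after / raw))
    return rows

rows = attrition(steps)

# Share of the final subsample that answers the EU membership question.
response_share = 14476 / 15844
```

The second step loses 3,124 respondents, consistent with the statement that over 3000 individuals drop out for lack of a location code, and the response share rounds to the 91.4% quoted above.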
As a final remark, we want to stress that our estimates come from the analysis of three specific subsamples of the 14,476 selected respondents. The main one contains 13,136 individuals. The sample with housing tenure status contains 6,425 individuals. The subsample on employment characteristics counts 8,434 individuals.
A.3. Regression results
We divide our variables into groupings as follows. The first group of explanatory variables includes basic demographic features such as sex, age, marital status, education and employment. The second group explores data on individuals’ use of health services. The third group captures information on housing (ownership vs. renting) drawn from the household questionnaire. The fourth group refers to employment. This is followed by a focus on unearned income and state benefits. The sixth group consists of life satisfaction indicators. The seventh and final group covers nationality and ethnicity.
The results are reported in Tables A.1a to A.1e and Tables A.2 to A.10. We present linear probability models as the default, with the exception of logit models in Table A.1b, probit models in Table A.1c and weighted OLS models in Tables A.1d and A.1e.
When variables are perfectly comparable at individual and aggregate levels, the first three columns of the tables directly compare those to address the potential ecological fallacy concern.
A.4. Demographics, technology, education and employment
In Tables A.1a to A.1e, A.2 and A.3 we present results from regressions based on different types of demographic characteristics. Tables A.1a to A.1e explore the relationship of voting Leave with sex, age and technology use. Table A.1a presents our baseline results estimated with a linear probability model (OLS). Tables A.1b and A.1c use the same explanatory variables but are estimated with logistic and probit regressions, respectively, where we report marginal effects. Table A.1d reports weighted OLS regressions, with weights provided by Understanding Society. Table A.1e also displays weighted OLS regressions, but here we use artificial weights such that the proportion of Leave supporters in the sample matches the actual Brexit vote share. Overall, the coefficient signs and magnitudes are very similar across Tables A.1a, A.1b and A.1c. They are also similar in comparison to Tables A.1d and A.1e despite the weights and the reduced number of observations. We therefore focus our discussion below on Table A.1a.
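The text does not spell out how the artificial weights in Table A.1e are constructed. One simple possibility, shown here purely as an assumption, is to give every Leave respondent one common weight and every Remain respondent another, chosen so that the weighted Leave share equals the referendum result. The sketch uses the shares from the descriptive statistics (42.2% Leave in the sample, 51.9% in the referendum); the estimation sample behind Table A.1e is smaller, so its actual sample share may differ.

```python
def matching_weights(sample_leave_share, target_leave_share):
    """Per-response weights that move the weighted Leave share to the target."""
    return {
        1: target_leave_share / sample_leave_share,              # Leave
        0: (1 - target_leave_share) / (1 - sample_leave_share),  # Remain
    }

def weighted_leave_share(responses, weights):
    """Weighted proportion of Leave (1) responses."""
    total = sum(weights[r] for r in responses)
    leave = sum(weights[r] for r in responses if r == 1)
    return leave / total

# Toy sample of 1,000 respondents with a 42.2% Leave share.
responses = [1] * 422 + [0] * 578
weights = matching_weights(0.422, 0.519)
share = weighted_leave_share(responses, weights)
```

Under this scheme Leave respondents are weighted up (by a factor of about 1.23) and Remain respondents are weighted down (by about 0.83), so the weighted sample reproduces the 51.9% Leave share.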
Columns 1 to 3 of Table A.1a exhibit positive and significant coefficients for the old-age variables at both individual and aggregate levels, showing no evidence of ecological fallacy. Although the coefficient for the aggregate share of the elderly population is lower in magnitude, it presents a predictive power very similar to the individual counterpart. Column 4 indicates that males are 4.7% more likely to vote Leave. Compared to middle-aged respondents, the tendency to support Leave is substantially lower by 12.3% for younger cohorts up to the age of 30 and notably higher by 9.1% for individuals aged 60 or above. Columns 5 and 7 confirm these results in terms of significance even when we control for the share of the population aged 60 or above at the local authority level. In column 6 we focus on technology use. Individuals who do not use a smartphone are substantially more likely to vote Leave. Using the internet every day is associated with a substantially lower probability to vote Leave. These patterns persist even once we control for sex and age in column 7.
In Table A.2 we explore the predictive power of educational attainment. Again, variables on educational attainment relate to the referendum outcome in the same way and with similar matching power at both individual and aggregate levels although aggregate coefficients have lower magnitude and significance. Hence, highly qualified individuals with university and college degrees are considerably less likely to vote Leave by over 20% compared to people with average qualifications. In contrast, having no qualification is a very strong predictor of voting Leave. These results hold up once we control for aggregate characteristics on educational attainment in columns 3 and 5 as well as sex and age in column 6.
Next, in Table A.3 we analyze individuals’ current employment and marital status. At the individual level, comparison groups are predominantly retired and divorced respondents, respectively. Here, aggregate rates on employment are indistinguishable from zero (although they have the same predictive power as the individual variables, and self-employment and unemployment coefficients have the ‘correct’ sign). Column 1 of Table A.3 shows that self-employed and paid employees are more likely to support Remain (relative to mostly retired people). Column 4 shows that single and married people are significantly less likely to vote Leave (compared to divorcees, separated and widowed people). Again, most of these results hold up once we control for aggregate rates in column 3 as well as for age in column 5. Unemployment now also shows up as highly significant.
To sum up our results on demographic variables, we find that individuals are more likely to support Leave if they are male, older, use less technology, are less qualified, retired or unemployed, and divorced, separated or widowed. These findings are consistent with the results by Becker et al. (2017) based on aggregate data who also find that age, low educational attainment and unemployment are key explanatory variables to predict the Leave vote shares across UK voting areas.
A.5. Health
Table A.4 analyzes the relationship between Brexit support and individuals’ use of health services. Interestingly, columns 1 and 2 show that individuals who visit their general practitioner (GP) very frequently (over ten times in the previous 12 months) are more likely to support Leave. Those are arguably individuals of poor health or older generations. Conversely, those who did not visit the GP even once have a slightly higher probability to support Remain. Controlling for age in column 2 turns the latter result insignificant (possibly because it is young people who do not go to the doctor) but preserves the former result on frequent GP visits.
A similar picture emerges from columns 3 and 4, focusing on individuals who are never or extremely often classified as out-patients. The same holds for people admitted as in-patients at least once during the preceding 12 months. That is, people of poor health as proxied by frequent visits to the GP or hospital are substantially more likely to support Leave. Perhaps it is therefore no coincidence that a key pledge of the pro-Brexit referendum campaign was to invest more in the National Health Service (NHS).
A.6. Housing
When directly comparing individual tenure status to corresponding aggregate shares we see similar paths (columns 1 to 3), in particular with respect to direct ownership which is positively related to Leave support.
In terms of individual housing tenure, owning their own property is associated with a higher likelihood of supporting Leave, although this particular association is barely statistically significant. The omitted category here is renting through a housing association. More importantly, higher property values are significantly related to an increased likelihood of supporting Remain. A one-standard-deviation increase in property values is associated with a roughly 4% higher likelihood of supporting Remain. Property values are arguably positively linked to individuals’ financial status, which would be consistent with earlier evidence on income based on aggregate data (see Becker et al., 2017).
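To make the reading of a coefficient on a standardized regressor concrete, the following minimal sketch standardizes a variable and fits a one-regressor linear probability model by ordinary least squares. It assumes a linear probability specification, which this section does not spell out. The data and the helper names `standardize` and `ols_slope` are hypothetical and purely illustrative; they are not taken from the survey or from the authors' code.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (z-scores)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def ols_slope(x, y):
    """Slope from a one-regressor OLS fit of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

# Hypothetical data (assumption): property values in thousands of pounds
# and a Remain indicator (1 = Remain, 0 = Leave).
property_value = [120, 150, 180, 210, 250, 300, 400, 520]
remain = [0, 0, 1, 0, 1, 1, 1, 1]

z = standardize(property_value)
slope_per_sd = ols_slope(z, remain)

# Because z has unit standard deviation, the slope is the change in the
# probability of supporting Remain for a one-standard-deviation increase.
print(f"Change in P(Remain) per 1 SD: {slope_per_sd:.3f}")
```

Standardizing first means the slope no longer depends on the units of the regressor, which is what makes coefficients comparable across variables measured on different scales.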
This section shifts the focus towards employment-related determinants. To begin with, Table A.6 indicates that individuals who did not work in the week prior to the questionnaire and who did not have a paid job are almost 10% more likely to support Leave than respondents who were either working or had a paid job (a result that is stable across all specifications).
In Table A.7 we narrow our analysis to only those participants who worked or had a paid job. This reduces the number of observations to 8,434. First, columns 1 to 3 compare the individual sector of employment to the respective aggregate controls (manufacturing, construction, retail and finance as used in Becker et al., 2017). Estimates as well as their predictive power are aligned (although aggregate coefficients are lower in magnitude). Indeed, both specifications suggest that workers in the manufacturing, construction and retail industries are significantly more likely to support Leave. Note that individual estimates are fairly stable across all specifications.
In addition, it emerges from column 4 that those with a permanent job compared to those in non-permanent employment have a higher probability of supporting Leave. This result continues to hold qualitatively in column 5 after we control for individuals’ age, sex and education as well as the sectoral distribution and growth of employment at the aggregate level in column 6. This result appears surprising, but we note that the subsample in Table A.7 is highly unbalanced in the sense that 90% of the respondents have a permanent job. Still, 60% of individuals with permanent jobs support Remain versus 70% of those with temporary jobs. It also appears likely that the very young respondents, who are overwhelmingly in favor of Remain, are less likely to hold permanent jobs. Our age dummies in column 5 might not pick up these age patterns appropriately. Finally, self-employed respondents are also more likely to support Leave, even though this association is insignificant for most specifications in the table.
Overall, consistent with the aggregate results in Becker et al. (2017), our findings support the view that individuals are more willing to vote for Brexit if they work in sectors such as manufacturing that have arguably been hit relatively hard by trade openness and international competition (also see Colantone and Stanig, 2018). In addition, workers in the manufacturing, construction and retail sectors have lower educational attainment on average, while the opposite is true for workers in the financial sector.
A.8. Unearned income and state benefits
In Table A.8 we highlight the role of unearned income and state benefits. In column 1 we find that respondents who receive core benefits have a significantly raised probability of supporting Leave compared to those receiving none. These core benefits are broken down into their various components in column 2. In particular, recipients of income support are substantially more likely to be in favor of Leave (by 20%), whereas job seeker’s allowance, child benefit and universal credit do not matter.
Similar results hold for people receiving pensions. This particular finding is likely driven by the overwhelming share of older people amongst pension receivers (see section A.4). The same pattern holds for people on disability benefits, in line with our estimates on health service usage (see section A.5).
Finally, the opposite is true for respondents who receive other sources of income. Those are broken down in column 3. The key income streams are education grants and student loans as well as payments from family members living elsewhere. This suggests a tight link with age and education (see section A.4).
In summary, the forms of income and benefits in Table A.8 are likely correlated with more fundamental characteristics such as age and health, as discussed in previous tables.
A.9. Life satisfaction
In Table A.9 we explore the potential link between Brexit support and indices of health, income and life satisfaction. When looking at overall life satisfaction only (columns 1 to 3), the individual coefficients suggest that dissatisfied people are significantly more likely to favor Leave. The aggregate estimate implies that a higher relative dispersion of well-being across voting areas, which can be interpreted as a measure of life satisfaction inequality, has positive predictive power for Leave support. Success rates of prediction are very similar whichever level of variation is considered.
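One common way to measure relative dispersion is the coefficient of variation, that is, the standard deviation divided by the mean. The sketch below assumes that definition, which the text does not confirm, and uses hypothetical satisfaction scores for two made-up voting areas to show how two areas with the same average well-being can differ in inequality.

```python
from statistics import mean, pstdev

def coefficient_of_variation(scores):
    """Standard deviation relative to the mean (unitless)."""
    return pstdev(scores) / mean(scores)

# Hypothetical life-satisfaction scores on a 1-7 scale (assumption).
# Both areas have the same average score but a different spread.
area_equal = [4, 5, 5, 5, 5, 5, 5, 6]
area_unequal = [2, 3, 4, 5, 5, 7, 7, 7]

cv_equal = coefficient_of_variation(area_equal)
cv_unequal = coefficient_of_variation(area_unequal)

print(f"mean: {mean(area_equal)} vs {mean(area_unequal)}")
print(f"relative dispersion: {cv_equal:.3f} vs {cv_unequal:.3f}")
```

Dividing by the mean makes the measure comparable across areas whose average satisfaction differs, which is why it can be read as an inequality measure instead of a level measure.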
In addition, people dissatisfied with health and income have a higher probability of supporting Leave by 5.5% and 6.4%, respectively. Once again, we can relate these findings to those in Table A.4 on health and Table A.8 on income and benefits. Interestingly, people dissatisfied with their amount of leisure time are significantly more likely to support Remain by 6.3%. This may be linked to the fact that these respondents have on average higher levels of educational attainment and they are generally younger. Note that when these individual variables are considered (columns 4 and 5) the individual estimate of overall life satisfaction is absorbed and becomes insignificant.
A.10. Nationality and ethnicity
Table A.10 provides insights on the importance of individuals’ nationality and ethnicity in shaping their attitudes towards Brexit. Survey participants born in the UK as opposed to elsewhere have a significantly larger probability of supporting Leave by 12.4% (see column 1). It is useful to point out that in the sample, 90% of respondents are born in the UK, and 95% of them are white.
In terms of ethnic minorities compared to whites (see column 2), people of mixed ethnicity, Asians and black respondents all have a significantly larger probability of supporting Remain (in the range of 12%–23%). These results are in line with the preliminary work by Martin and Sobolewska (2017).
- 1 See Burn-Murdoch (2016) in the Financial Times as an example of various correlation plots; more in-depth work followed, for example Clarke and Whittaker (2016), Darvas (2016), Langella and Manning (2016).
- 2 Note that we sourced nationality and ethnicity variables also from earlier waves.
- 3 The aggregate variables in Table 1 are not standardized for descriptive purposes, but they are in all regressions.
- 4 In a fascinating paper, Colantone and Stanig (2018) focus on one specific causal factor behind the Leave vote: rising import competition from China. While papers studying causality are extremely important, they give prominence to one factor at a time, an aim different from ours which is to look at the relative predictive power of different variables.
- 5 One might wonder whether including region fixed effects above and beyond the individual-level and region-level predictors is beneficial in terms of prediction accuracy, but the benefits are very marginal in our case. Since region fixed effects are a ‘black box’, we refrain from including them given the very limited gains.
- 6 While we do not use machine-learning methods in this paper such as best subset selection (BSS) or LASSO, we did so in Becker et al. (2017), i.e. our selection of variables is guided by the (aggregate) variables employed in that earlier paper.
- 7 These details are taken from Understanding Society: Design Overview by Buck and McFall (2012). For further details refer to the Understanding Society User Guide (wave 1–6) by Knies (2016).
- 8 In unreported tables (available upon request), we compare the 14,476 individuals who answer the Brexit question to the 1,368 non-respondents for each group of covariates (i.e. all regressors in Table A.1a, Table A.1b, Table A.1c, Table A.1d, Table A.1e, Table A.2, Table A.3, Table A.4, Table A.5, Table A.6, Table A.7, Table A.8, Table A.9, Table A.10) and establish along which dimension the two groups are statistically different. If anything, non-respondents seem to display most of the characteristics of a typical Leave voter. More specifically, non-respondents are significantly older, less used to technology, with lower educational attainments and more frequently unemployed. In addition, they seek more medical attention, their housing status is more often local authority renting, and more of them receive income support. Finally, non-respondents are less often UK natives and more often members of an ethnic minority.
- 9 We would like to note that sampling weights in Understanding Society, which we use in Table A.1d, are quite homogeneous. In our main estimation sample, the median sampling weight is 0.956, the 25th percentile is 0.770 and the 75th percentile is 1.237. This explains why weighted and unweighted regression results are so similar. [USOC wave 8 data has a substantial number of observations with missing weights. This is because it is a pre-release version. The final version of wave 8 is expected to be released towards the end of 2018 or in early 2019.] In Table A.1e, we mechanically re-weight the sample to align the share of Leave voters with the actual referendum result.
- 10 Excluded categories among current activity feature Retired (64.7%), Looking after family or home (10%), Full-time student (14.3%), Long-term sick or disabled (7.5%), Doing something else (2.2%). Excluded categories among marital status feature Divorced (57.4%), Separated (10.3%), Widowed (31.6%), Other (0.7%).
- 11 To get a sense of whether changes in (un)employment status matter, in unreported regressions, we used additional information based on a short employment history (looking at respondents participating in both wave 7 and the pre-release version of wave 8 with the EU question). The results suggest that the preferences for Remain and Leave are quite static or do not respond in a remarkable fashion to individuals switching employment status (by becoming unemployed or employed between wave 7 and wave 8). Rather, the first-order differences in tendencies to support Leave or Remain for our prediction exercise are driven by individuals who are employed or unemployed in both survey waves, implying that looking at only the cross-section is sufficient to capture the role of employment variables.
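The mechanical re-weighting mentioned in footnote 9 can be illustrated as follows. This is a minimal sketch that assumes a simple post-stratification on the vote itself, which may differ from the procedure actually used for Table A.1e. The function name `leave_share_weights` and the sample composition are hypothetical; the target of 51.9% is the official UK-wide Leave share in the 2016 referendum.

```python
def leave_share_weights(votes, target_leave_share):
    """Return one weight per respondent so that the weighted Leave share
    equals target_leave_share. votes: list of 1 (Leave) / 0 (Remain)."""
    sample_leave_share = sum(votes) / len(votes)
    w_leave = target_leave_share / sample_leave_share
    w_remain = (1 - target_leave_share) / (1 - sample_leave_share)
    return [w_leave if v == 1 else w_remain for v in votes]

# Hypothetical sample (assumption) in which Leave supporters are
# under-represented relative to the referendum outcome.
votes = [1] * 42 + [0] * 58
weights = leave_share_weights(votes, target_leave_share=0.519)

# Weighted share of Leave supporters after re-weighting.
weighted_leave_share = (
    sum(w for w, v in zip(weights, votes) if v == 1) / sum(weights)
)
print(round(weighted_leave_share, 3))
```

Every Leave supporter receives the same weight and every Remain supporter receives the same weight, so the weights sum to the sample size and only the balance between the two groups changes.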