Will summer save us? What can we learn from international COVID-19 league tables

Nicholas Cooper
16 min read · May 26, 2020

I have been morbidly monitoring ‘worldometer’ coronavirus statistics since the start of the Italian outbreak, like a daily weather report. By April, I couldn’t help but notice a stark divide between the northern and southern hemispheres in the number of COVID-19 deaths. It is hard not to be drawn to the idea that influenza has always been seasonal, and that the virus we are currently battling is just a particularly nasty variant. If that is true, perhaps we can hope that summer will coincide with a hiatus from COVID-19 for Europe and the United States. Australia and New Zealand have suffered very few deaths so far, and this could be because the coronavirus appeared at the tail end of their summer, which made containment a lot easier.

Given that it’s always easy to spot trends that confirm your personal biases, I undertook a statistical analysis of the worldometer data to make sure I wasn’t just cherry-picking convenient examples, and to try to account for counterexamples among warm countries like Mexico and Brazil. This turned out to be a much bigger task than I initially imagined, and required digging up more data to address the question with sufficient context. In the end the task comprised a broad look at the trends in death rates between countries, an assessment of the drivers of death rate during lockdown, and an analysis of the drivers of flu seasonality. This led me to a prediction of what our coming summer will entail.

Seasonality of influenza

To set the scene, there are several plausible reasons why colds and flu are seasonal. I’ll list these briefly:

Vitamin D

The most talked about factor is Vitamin D, which we produce most readily in the skin under UV-B rays on days with a UV index above 3, or obtain by eating fatty fish or certain mushrooms. There is a proven link between Vitamin D and immune system function. In a meta-analysis of 11,321 people, supplementation reduced the chance of respiratory infections by 70% versus placebo for those with Vitamin D deficiency, while general supplementation regardless of baseline levels provided a 12% reduction. Higher melanin levels in the skin are associated with lower Vitamin D production from sunlight, so this could be a plausible contributing factor to higher levels of COVID-19 death within ethnic minority populations in the UK.

Crowding and air recirculation

In winter we spend more time indoors and the same air is more likely to be recycled within relatively small spaces. The risk of contracting COVID-19 outdoors while observing social distancing is tiny; likely fewer than 1 in 1000 cases have been contracted outdoors. Large transmission events have included choir practices, restaurants and birthday parties, and it seems that the majority of COVID-19 transmission occurs through superspreader events.

Ideal conditions for virus survival

Common influenza can survive longer outside the body in cold, dry weather. Optimal conditions for COVID-19 are likely close to 5 degrees Celsius within a humidity range of 0.6–1.0 kPa (dry), consistent with European temperatures as the outbreak hit. UV light is also being used for sterilisation against the virus, but this refers only to dangerous UV-C light, which is filtered out by the atmosphere and so is not related to seasonality. Notably, these trends for viral abundance apply only to moderate latitudes. Viral epidemics in tropical and equatorial regions tend to correlate with high rainfall, while influenza risk is similar all year round.

Seasonality of our immune system

You may have heard of circadian rhythms, which involve your sleep/wake cycle and hormonal changes with a 24-hour cycle. It is also believed that we have an equivalent yearly cycle stimulated by changes in daylight, akin to deciduous trees or hibernating animals. One theory of seasonal infection is that viruses are always present to some degree and that what changes is the effectiveness of our immune systems at keeping them at bay. Anyone who looks after plants or fish will know that if they become unhealthy, they will find some virus or fungus to catch — which didn’t suddenly materialise after visiting a bar with strange foliage.

The evolutionary origins of this fluctuation could have been to save energy during winter by down-regulating metabolism and immune function. I was involved in the analysis of a longitudinal gene expression dataset showing that thousands of genes up- and down-regulate in an annual cycle, including key immune system components. The cycle is plotted below for large numbers of genes, and you can see that the Australian cohort nicely inverts the trend of the Northern Hemisphere groups. Areas of extreme climate like Finland show a less distinct pattern because much of the year is so inhospitable that people experience an artificial ‘indoor’ season.

Figure 1: Key graphs of clustered gene expression from the seasonality paper in Nature Communications, showing the rise and fall of summer versus winter genes (c) and the difference between northern and southern hemispheres (d).

Other theories

It has been speculated that viruses could be spread long distances on wind currents, but I don’t think this is seriously entertained any more. There have also been simulations using the dynamics of normal social interaction and travel that could potentially introduce a seasonal pattern of outbreaks. Lastly, cold air can be abrasive to the throat. Localised inflammation due to extended breathing outside, especially while exercising, may provide a breeding ground and entry point for respiratory infections.

A final note on influenza versus COVID-19: structurally, influenza is over 99% different to SARS-CoV-2. But they both cause respiratory symptoms, and the viruses survive outside the body in similar climatic conditions. In studies of winter illness such as the meta-analysis quoted above, all colds and flus usually get rolled into one, given their similar symptoms and the fact that patients are rarely lab tested. Given that I am proposing that the effectiveness of the immune system is what changes seasonally, to some extent it is not crucially important which virus we’re defending against.

Comparing countries on the global league table

Making comparisons using the global table of deaths is a lot more nuanced than meets the eye. Everyone who has done a statistics course has heard the expression ‘correlation is not causation’. Examining differences between countries comes with a host of confounding variables, some of which (for my attempt to do this quickly in my spare time) could not be entered into, such as variation in death classification methodology, variations in national health systems, and comprehensiveness of record keeping between countries. Despite these challenges, I aimed to collect as much relevant data on potentially confounding variables for each country as possible, which included:

National data hypothesised to be correlated with viral spread

National data hypothesised to be correlated with viral seasonality

  • Northern Hemisphere located (yes/no)
  • Within the tropical/equatorial latitude range (yes/no)
  • Distance from the ‘prime latitude risk range’ of 30 to 50 degrees north
  • Fish and seafood consumption
  • Number of months since November, providing a good opportunity for Vitamin D absorption from sunlight (taken from a climatic averages database)
  • Vitamin D ‘score’ — without being able to directly measure vitamin D deficiency, I created a standardised score using the sum of scaled and weighted values from the prime latitude, seafood consumption, Northern Hemisphere location, and tropical/equatorial location.
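A composite score of this kind can be sketched as below: each component is z-scaled, multiplied by a weight, and the results are summed per country. This is only an illustration of the general approach. The weights, the sign convention (higher means Vitamin D sufficiency is more likely) and the three example countries are all hypothetical, since the actual weights used in this analysis are not listed here.

```python
from statistics import mean, pstdev

def prime_latitude_distance(lat, low=30.0, high=50.0):
    """Degrees of latitude between `lat` and the 30-50 N band (0 inside it)."""
    if lat < low:
        return low - lat
    if lat > high:
        return lat - high
    return 0.0

def zscores(values):
    """Scale a list of numbers to mean 0 and standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s if s > 0 else 0.0 for v in values]

def vitamin_d_score(components, weights):
    """Weighted sum of z-scaled components; returns one score per country."""
    n = len(next(iter(components.values())))
    scaled = {name: zscores(vals) for name, vals in components.items()}
    return [sum(weights[name] * scaled[name][i] for name in components)
            for i in range(n)]

# Hypothetical inputs for three made-up countries (A, B, C).
latitudes = [45.0, -35.0, 5.0]
components = {
    "prime_lat_distance": [prime_latitude_distance(lat) for lat in latitudes],
    "seafood_kg_per_year": [20.0, 25.0, 30.0],
    "northern_hemisphere": [1.0, 0.0, 1.0],
    "tropical": [0.0, 0.0, 1.0],
}
# Hypothetical weights; a positive weight pushes the score towards "sufficient".
weights = {
    "prime_lat_distance": 1.0,
    "seafood_kg_per_year": 0.5,
    "northern_hemisphere": -0.5,
    "tropical": 1.0,
}
scores = vitamin_d_score(components, weights)
```

With these made-up numbers, country A (inside the 30 to 50 degrees north band, lowest seafood intake) receives the lowest score of the three.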

NB: All distance metrics were calculated from latitudes/longitudes using the sp package.
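For readers who want to reproduce a distance measure without R, here is a minimal Python stand-in using the haversine formula for great-circle distance. It is not the sp routine itself: it treats the Earth as a sphere of radius 6371 km, and the coordinates shown are rough values chosen for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Guard against tiny floating-point overshoot above 1 before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))

# Approximate coordinates, for illustration only.
VENICE = (45.44, 12.32)
LONDON = (51.51, -0.13)
distance = haversine_km(*VENICE, *LONDON)
```

A spherical model is slightly less accurate than an ellipsoidal one, but the difference is small compared with the country-to-country distances involved here.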

Measuring COVID-19 national mortality

I considered three alternative dependent variables to encapsulate the extent of the problem within each nation. Each has its own strengths and weaknesses so my philosophy was to temper the interpretation by concentrating on analyses showing consistency between all three. All measures are scaled by population size.
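The three measures named in the results are total deaths, peak death rate, and days above 1 death per million. A sketch of how they might be derived from a daily series follows. The exact definitions are assumptions (in particular, that the peak and the threshold both refer to daily deaths per million), and the example series is made up.

```python
def mortality_measures(daily_deaths, population):
    """Return three population-scaled summaries of a daily death series."""
    per_million = [d / population * 1_000_000 for d in daily_deaths]
    return {
        # Cumulative deaths per million inhabitants.
        "total_deaths_per_million": sum(per_million),
        # Worst single day, per million inhabitants.
        "peak_daily_deaths_per_million": max(per_million),
        # Number of days on which daily deaths exceeded 1 per million.
        "days_above_1_per_million": sum(1 for r in per_million if r > 1.0),
    }

# Made-up daily death counts for a hypothetical country of 10 million people.
example = mortality_measures([0, 5, 12, 30, 20, 8], population=10_000_000)
```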

Statistical Results

[NB: If you’re not into numbers and graphs, you might like to skip ahead to the interpretation]

Table 1 below shows the raw correlations between all of these variables and death rates, with yellow shading representing P values below 0.05 and orange shading showing P values significant after a correction for multiple comparisons. There are associations with nearly all of these predictors showing the complexity of the differences between countries. Indeed, there does seem to be some substance to correlations with latitude and Vitamin D. Furthermore, intuitively obvious variables like lockdown dates, globalisation, older populations and density also occur in expected directions. These tabulated correlations are Pearson R values, where larger positive correlations mean an increase in ‘row’ is associated with an increase in ‘column’, or negative correlations mean an increase in ‘row’ is associated with a decrease in ‘column’. The social globalisation index was the strongest raw correlate for peak death rate, percentage over 65 years was the strongest correlation for days above 1 death per million and prime latitude (30–50 degrees North) distance was the strongest correlate of total deaths.

This initial analysis sets the scene, allows discarding of any unrelated measures and selecting of the best representative amongst similar measures. But in order to make a proper comparison multiple linear regression will be used because it is able to assess relative contributions of each measure while controlling for all of the others. Correlations within these ‘predictor’ variables cause biases and problems, but by using iterative variable selection methods the aim is to try to minimise the confounding and try to establish some good representative models.

Table 1: Correlations between potential predictors (rows) and measures of COVID-19 deaths (columns). Orange = statistically significant P < (0.05 ÷ 57).
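The shading rule in the caption amounts to a Bonferroni-style threshold: the usual 0.05 cut-off divided by the 57 comparisons. The sketch below shows that rule next to a plain Pearson correlation. The function names are my own, and the calculation of P values from R is left out.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

ALPHA = 0.05
N_TESTS = 57  # number of comparisons, from the Table 1 caption
bonferroni_threshold = ALPHA / N_TESTS  # roughly 0.00088

def shading(p_value):
    """Map a P value to the table's colour coding."""
    if p_value < bonferroni_threshold:
        return "orange"   # significant after multiple-comparison correction
    if p_value < ALPHA:
        return "yellow"   # nominally significant only
    return "none"
```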

Multiple variable modelling

Rather than fill the page with regression coefficients and p-values, I’ll summarise the main trends uncovered through stepwise regression, show one representative model, and if anyone wants to see the remaining raw statistics I’m happy to email them (or the raw data if you’re that nerdy).

This model shows that the strongest predictor of total death rate, when looking at all variables combined, is the KOFG ‘social global index’. This variable encompasses individual travel, tourism, trade, and cultural globalisation. I’d included this measure because I felt it would be a good proxy for the connectedness between and within countries.

The high correlation supports the obvious consensus that social distancing and travel restrictions are effective ways to contain the virus, as these effectively reduce the concepts captured in this KOFG score.

Two other statistically significant key predictors which are very intuitively obvious are:

  1. The percentage of the population of each country living in urban areas.
  2. The number of days between the first case and lockdown within each country (if lockdown occurred at all), although notably this effect would no longer be significant after correcting for multiple comparisons.

Interestingly, Belgium, which has the highest death rate in Europe, scores very high on globalisation as home to the EU headquarters, and has one of the highest urban percentages globally. Belgian and English tourists are also the largest groups for ski tourism in Italy each winter.

Population density and percentage of the population over 65 were significant in some models, but in general became non-significant in the presence of stronger predictors.

The overall multiple R squared statistic of 53.6% for the model shows that these measures can explain more than half of the variance (a statistical measure of the differences between countries’ death rates), which, given the imprecision of some of the data and the many hard-to-measure factors, is quite an impressive statistic.

Samples: 200 countries that had most of this data (missing replaced with mean)

Dependent Variable: totalDeaths

Predictors: VenetoDistance, VitaminDScore, lockDownMinusFirst, socialGlobalIndex, UrbanPercent
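To make the model description concrete, here is a minimal pure-Python sketch of the same kind of fit: missing values are replaced with the column mean, an ordinary least-squares regression with an intercept is solved, and R squared is reported. It does not reproduce the stepwise variable selection described above, and the five data rows are invented for illustration.

```python
def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    m = sum(observed) / len(observed)
    return [m if v is None else v for v in column]

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(predictors, y):
    """Least-squares fit with an intercept; returns (coefficients, r_squared)."""
    cols = [impute_mean(c) for c in predictors]
    rows = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot

# Made-up data: five hypothetical countries, two predictors, one missing value.
social_global_index = [80.0, 60.0, None, 40.0, 90.0]
urban_percent = [85.0, 55.0, 70.0, 35.0, 98.0]
total_deaths = [400.0, 120.0, 150.0, 20.0, 600.0]
coefficients, r_squared = fit_ols([social_global_index, urban_percent], total_deaths)
```

Mean imputation keeps every country in the sample, at the cost of slightly understating the variability of the imputed predictors.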

Distance from the Italian Epicentre trumps seasonal predictors

The most striking effect of combining these variables rather than analysing separately is that the ‘distance from Veneto’ when combined with the measures of latitude, hemisphere and Vitamin D, tended to overpower those variables in its association. When ‘Veneto’ was removed, the Vitamin D variable was a strong predictor of death alongside globalisation and density measures. This renders the final model less interpretable because when predictors in a regression model are correlated, they can skew one another (‘multicollinearity’ problem). In fact, it can be seen in Table 2 below that our seasonal variables all have very significant correlations with the distance of each nation from the Italian outbreak of the epidemic. In Figure 2 it is clear that the bottom right corner (high death rate, close to Italy) has mostly red countries indicating Vitamin D deficiency and the top left (low death rate, far from Italy) has mostly blue countries indicating likely Vitamin D sufficiency. So, it is hard to distinguish whether the real driver is more strongly proximity based or seasonally based.

Figure 2: Distance from North Italy versus Total death rate, where the country names are more RED where Vitamin D deficiency is likely to be more common. It is clear there is a strong correspondence between the distance measure and Vitamin D score.
Table 2: Confounding variables: Inter-correlations between seasonal predictors (rows) and distance / latitude variables (columns). Orange = statistically significant P < 0.0001.

Due to this confounding, I felt the questions of seasonal effects were not fully addressed by this analysis, so to provide further qualification, I sought to run a similar analysis on US states. The state statistics are easily obtainable and differences in policy, culture and geography should be less extreme than trying to pool every country in the world.

Analysis of US States

Variables chosen for analysis of US States for viral spread, seasonality and death rate

Table 3: Correlations between predictors (rows) and total deaths. Orange = significant P < 0.01. Right: image source.

Examining the data from the United States showed a slight trend of more Vitamin D exposure being associated with fewer deaths if weighted by state population, but not in the overall analysis in the table above. Once again, the strongest predictors were the distance from the outbreak epicentre and the equivalent measure of travel and connectedness (state GDP). Vitamin D exposure opportunity was not related to the total deaths within the model. Once again, there is a serious geographical confound because there is a 0.45 correlation between NYC distance and Vitamin D days. There is a further confound due to a moderate negative correlation between Vitamin D days and GDP (southern states are poorer). This confound is demonstrated in Figure 3, which is the United States equivalent of the international version in Figure 2. The bottom right quadrant (high death rate, close to New York) has mostly red states, indicating fewer days exposed to Vitamin D, while the top left quadrant (low death rate, far from New York) has more blue states, indicating more days of exposure to Vitamin D.

Figure 3: Distance from New York versus Total death rate, where the state names are more RED where Vitamin D deficiency is likely to be more common. It is clear there is a strong correspondence between the distance measure and Vitamin D score.

Samples: 50 states (missing observations replaced with mean)

Dependent Variable: totalDeaths

Predictors: nycDistance, VitaminDDays, GDPperState
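One common way to put a number on correlated predictors is the variance inflation factor (VIF). The sketch below uses the simplest two-predictor form, 1 / (1 - r²), with the 0.45 correlation between NYC distance and Vitamin D days quoted above. This is a simplification: the model here has three predictors, and a full VIF would regress each predictor on all of the others.

```python
def vif_two_predictors(r):
    """Variance inflation factor when two predictors have correlation r."""
    return 1.0 / (1.0 - r ** 2)

r_nyc_vitd = 0.45  # NYC distance vs Vitamin D days, as quoted in the text
vif = vif_two_predictors(r_nyc_vitd)   # about 1.25
se_inflation = vif ** 0.5              # standard errors grow by about 1.12x
```

A VIF only describes how much precision the coefficient estimates lose. It does not settle which of the entangled variables is the real driver.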

Interpreting nation versus nation and state versus state analyses

After all of this analysis I still feel a little unsatisfied, given the three-way entanglement between virus epicentre locations, latitude and the social/financial/global connectedness of the world. However, replication of the same trend within both datasets has led me to down-weight the impact of Vitamin D from my initial expectations. While it would be much better to have directly measured Vitamin D, this was not possible.

In terms of seasonality, there is some support for the alternative seasonal hypothesis that latitude-based associations might be the real driver, because correlations were stronger with latitude based measures than measures based on weather data. This would lend weight to the seasonal immunity hypothesis, which is driven by the annual light and warmth cycle, rather than UV-B exposure. So, the main insights supported by this analysis are:

  • Proximity, connectedness and social factors trump seasonal/climatic effects for COVID-19
  • Seasonal viral patterns may be driven more by cyclical immune system changes than by Vitamin D
  • Countries and states with high death rates are global and social hubs, in high risk latitudes, suggesting apportioning blame or comparison to other regions without accounting for these factors may be unfounded

Linking these findings with what we know about seasonal influenza

The strains of seasonal influenza are different every year with deaths in the UK ranging from the low thousands to numbers nearer 30,000 depending on how bad the strains in circulation are. It can almost always be characterised by a peak between December and March, with six months of regular weekly deaths, and six months of relatively little infection. The quoted R0 for the regular flu is 1.3 in contrast with the substantially higher R0 for SARS-CoV-2 which is thought to be higher than 3.

In order to assess what reduction in R0 for the warmer months would be required to generate this cyclical peak for the regular flu, I used this epidemic simulator to create a curve using the known characteristics of influenza, to closely match the UK data for the 2011 flu season, which was a bad year but not extreme. A reduction of 50% in the R0, to 0.65, was sufficient to generate the seasonal pattern, suggesting that the driver of flu seasonality doesn’t need to eliminate the virus; it just needs to reduce contagion by half. Based on the meta-analysis reported above, population Vitamin D supplementation reduces illness occurrence by around 12%, which is not nearly enough to explain the drop. To me this suggests that other factors, which are likely a mix of ideal viral conditions, seasonal immunity, more time spent indoors, and possibly herd immunity, must also contribute to the reduction in transmission that nearly eliminates influenza every summer.

Figure 4: Left: 1999–2017 England and Wales weekly death rate combined with flu infections presenting to primary care; source. Right: My attempt to model the R0 change to drive the 2011 flu season.
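The simulator used for the right-hand panel is not reproduced here, but the underlying idea can be sketched with a basic daily-step SIR model in which R0 drops from 1.3 to 0.65 part-way through. The 4-day infectious period, the population size, the starting infections and the 90-day phase lengths are all assumptions chosen for illustration, not fitted values.

```python
def simulate_sir(r0_by_day, infectious_days=4.0, population=1_000_000,
                 initial_infected=100):
    """Daily-step SIR model; r0_by_day gives the R0 in force on each day."""
    gamma = 1.0 / infectious_days            # recovery rate per day
    s = float(population - initial_infected)
    i = float(initial_infected)
    infected = []
    for r0 in r0_by_day:
        beta = r0 * gamma                    # transmission rate implied by R0
        new_infections = min(s, beta * s * i / population)
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        infected.append(i)
    return infected

WINTER_R0 = 1.3                # flu R0 quoted in the text
SUMMER_R0 = WINTER_R0 * 0.5    # the 50% reduction, giving 0.65

# 90 days of "winter" followed by 90 days of "summer" (arbitrary durations).
schedule = [WINTER_R0] * 90 + [SUMMER_R0] * 90
curve = simulate_sir(schedule)
```

Once R0 falls below 1, the number of infected people shrinks every day regardless of how many susceptible people remain, which is why halving a flu-like R0 is enough to end the season.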

What can be expected for the summer ahead?

Lockdown and social distancing strategies seem to have reduced the R0 in the UK by around 80%. London and the south of England are faring slightly better than the north. However, the northeast does have less sunshine, more essential workers per head of population, and is typically worse affected by seasonal influenza than the south west.

Figure 5: Excess winter deaths by UK region. Source.

Given that the outbreak occurred at a similar time to peak flu season, we can assume that the R0 of around 3 is the maximum for this strain. If the historical seasonal factors provide similar protection against SARS-CoV-2, then we could infer that the summer R0 for uncontrolled COVID-19 would be around 1.5. This is still more contagious than a moderately bad flu, so this level of transmission would be out of control. It might suggest, however, that whatever distancing measures we take in summer need only be half as effective as those in winter to contain the virus.
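The "half as effective" claim follows from simple arithmetic, sketched below. To push the reproduction number below 1, transmission must be cut by a fraction of at least 1 - 1/R0. The halving factor is carried over from the flu model and is an assumption when applied to SARS-CoV-2.

```python
def required_reduction(r0):
    """Fractional cut in transmission needed to bring R down to 1."""
    return max(0.0, 1.0 - 1.0 / r0)

WINTER_R0 = 3.0                            # COVID-19 estimate used in the text
SEASONAL_FACTOR = 0.5                      # halving inferred from the flu model
summer_r0 = WINTER_R0 * SEASONAL_FACTOR    # 1.5

winter_cut = required_reduction(WINTER_R0)   # about 0.67
summer_cut = required_reduction(summer_r0)   # about 0.33
```

So distancing would need to remove roughly two thirds of transmission in winter but only about one third in summer.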

We have had an atypically warm period of weather since lockdown and due to the ‘one exercise per day’ rule we have had an opportunity to replenish Vitamin D, or perhaps kickstart our summer immune system. Could some of our success in beating the virus be attributed to the weather rather than our social distancing? Even if so, it is likely that the majority of this 80% drop was due to lockdown and it is also likely that we can hope for additional benefit from seasonal factors.

The current situation in the UK

Daily deaths are still in the hundreds, but are definitely on the way down, and we must always remember that today’s deaths reflect the situation several weeks ago. You may also be surprised by the latest figures on how many people in the UK have already had the virus (many without knowing). Estimates suggest that 10% to 25% of the whole country have already been infected despite lockdown and social distancing. About 8% of people get the flu each year, but vaccinations against the flu are common and the regular flu is less contagious than COVID-19. Herd immunity requires a much greater proportion of the population to be immune to be effective, but if the 25% figure is true, this can help a little. It does seem, despite initial fears and some anecdotal cases, that having COVID-19 does provide immunity against reinfection.

Containing the spread perfectly is impossible because essential services must continue to operate and these workers cannot help but spread tiny viral particles. Additionally, people must continue to visit supermarkets and pharmacies. Civil disobedience of lockdown rules is probably a smaller issue than these two. Large percentages of essential workers (who make up 22% of the workforce) have been infected, and some estimates say upwards of one third of NHS staff test positive for COVID-19, mostly asymptomatically. Large proportions of supermarket workers test positive, as do half of the soles of the shoes of NHS workers.

There have also been some key updates on infection rate and mortality risk within age groups, and the age differential is even more stark than first thought. Imperial College had the overall death rate as 5.1% for people in their 70s and 9.3% for over 80s. The latest analysis puts the death rate for the over-75s even higher at 16%, but has reduced estimates for all other ages: 1.8% for 64–75 year olds, 0.28% for 45–64 year olds, and 0.018% for 25–44 year olds. The infection rates are also much higher in the less at-risk age groups, which may relate to factors like viral load and greater general mobility, and could also reflect less caution with social distancing due to the lower perceived personal risk. Given that such a low percentage of over-75s have been infected, this population is still extremely vulnerable.
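To see why 10% to 25% immunity only helps "a little", here is a back-of-envelope sketch using the standard simple-model formulas: the herd immunity threshold is 1 - 1/R0, and the effective reproduction number is R0 multiplied by the susceptible fraction. It assumes an R0 of 3, uniform mixing, and that past infection gives full immunity, all of which are simplifications.

```python
def herd_immunity_threshold(r0):
    """Immune fraction at which each case infects fewer than one other."""
    return max(0.0, 1.0 - 1.0 / r0)

def effective_r(r0, immune_fraction):
    """Reproduction number once part of the population is immune."""
    return r0 * (1.0 - immune_fraction)

R0 = 3.0                                    # assumed, as used elsewhere in the text
threshold = herd_immunity_threshold(R0)     # about 0.67
r_eff_low = effective_r(R0, 0.10)           # about 2.7
r_eff_high = effective_r(R0, 0.25)          # about 2.25
```

Even at the upper estimate of 25%, the effective reproduction number in this simple model stays well above 1 without other measures.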

Figure 6: Source: telegraph.co.uk

Conclusions

After all of that data trawling, the result is fairly consistent with circulating expert and government advice. COVID-19 is unlikely to remain contained this summer unless we maintain some level of social distancing.

Based on the progress we’ve made so far and the leg-up we should get from seasonal factors, I hope we can relax restrictions considerably for low-risk age groups during the peak of summer, while taking care to continue careful sheltering and management of older and at-risk populations. It would likely be just a brief break before things once again ramp up at the approach of winter, when stronger restrictions will almost certainly need re-instigating. During this time, we will probably need to continue building surge capacity for beds, nurses, respirators, PPE, tracing and testing, and prepare for the coming winter. Whether or not the government is willing to temporarily relax core lockdown restrictions is yet to be seen, but I believe this would be a fairly plausible scenario.

Meanwhile, in the Southern Hemisphere, countries like Australia and New Zealand are set to capitalise on their remoteness and should be able to maintain relative normalcy by border quarantine alone. Countries in South America are only likely to worsen, as the southernmost will be heading into winter and the equatorial epidemic will likely be unaffected by the seasons.

Nicholas Cooper

PhD in Statistical Genetics from the University of Cambridge. B Sc Psychology/Mathematics University of Sydney.