Some considerations on Italy’s Coronavirus Outbreak

Introduction

Covid-19 is a particularly aggressive disease caused by a novel coronavirus that originated in China in December 2019 and is rapidly spreading around the globe, in what is likely to be the deadliest pandemic of the century. Italy was one of the first European countries to be hit by Covid-19, and despite the draconian measures in place, the country is having a hard time containing the disease.

Goal

This is a very experimental article based on the scarce data available. Here we try to explore whether the containment measures put in place by the Italian Government are effectively reducing the spread of Covid-19 in Italy.

Pre-Requisites

This article is suitable for readers with an understanding of statistics and experimental setup; in particular, a reasonable understanding of curve fitting is helpful.

Method

Here we run a naive analysis of the available data under the assumption that the disease has an exponential growth and that the containment measures are pushing the spread towards a slower regime.

We model the spread as

y(t) = 2^(a + t/b)

where b is the time for the process to double its impact (i.e. infect twice as many people) and a is a time-lag coefficient that accommodates for the missing information at the very beginning of the process.
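To make the coefficients concrete, here is a minimal numeric sketch (the values of a and b below are purely illustrative, not fitted): moving t forward by b days doubles the predicted count, while a sets the starting level.

a, b = 4.3, 3.0  # illustrative lag coefficient and doubling time, not fitted values
model = lambda t: 2 ** (a + t / b)
print(model(0))  # ~19.7: starting level set by a
print(model(3))  # ~39.4: twice as many after b = 3 days
print(model(6))  # ~78.8: doubled again after another b days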

We also assume the noise in the data can be safely approximated with a Normal distribution (this assumption is not validated here; a quick check is sketched after the code block below).

import numpy as np
from scipy import optimize
import matplotlib.pyplot as plt

# DATA
infections = [20, 76, 153, 229, 322, 400, 650, 888, 1128, 1694, 2036, 2502, 3089, 3858, 4636, 5883, 7357,
9172, 10149, 12462, 15113, 17660, 21157, 24747, 27980, 31506, 35713, 41035, 47021, 53578, 59138]
fatalities = [0, 0, 0, 7, 10, 12, 17, 21, 29, 34, 52, 79, 107, 148, 197, 233, 366, 463, 631, 827, 1016, 1266,
1441, 1809, 2158, 2503, 2978, 3405, 4032, 4825, 5476]
day = np.arange(1, len(infections) + 1)  # day index as a NumPy array so it can be sliced and fed to the model

# Support functions
def r_squared(y, y_fit):
    # residual sum of squares
    ss_res = np.sum((y - y_fit) ** 2)
    # total sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    # r-squared
    r2 = 1 - (ss_res / ss_tot)
    return r2

def test_func(t, a, b):
    """
    A very simple exponential model of base 2, for easier interpretation of the coefficients.
    t: Time in days
    a: Lag coefficient
    b: Doubling time
    return: The expected infections under an exponential regime, given the day and parameters
    """
    return 2 ** (a + t / b)

def go(day, y, n, title, ylim):
    """
    A quite cumbersome function to generate fits and plots at once (not an example of good practice).
    day: The time in days
    y: The observations, e.g. infections, fatalities etc.
    n: The last day we want to use for the fit
    title: A template for the graph title
    ylim: A limit for the y axis of the plot, so that all plots stay on the same scale
    """
    # Fit the exponential model on the data up to day n
    params, params_covariance = optimize.curve_fit(test_func, day[1:n], y[1:n], p0=[-3, 2])
    y_fit = test_func(day[1:n], params[0], params[1])
    r2 = r_squared(y[1:n], y_fit)
    # Plot the data together with the fitted curve extrapolated over the full period
    plt.figure(figsize=(6, 4))
    plt.scatter(day, y, label='Data')
    plt.plot(day, test_func(day, params[0], params[1]), label='Fit: 2^(a+t/b)')
    plt.legend(loc='best')
    plt.title(title.format(n, r2, params[1]))
    plt.ylim(ylim)
    plt.show()
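As noted above, we do not validate the Normal-noise assumption. A quick sanity check could fit the model on the first days and run a Shapiro-Wilk test on the residuals; the snippet below is only a sketch reusing the objects defined above, and is not part of the original analysis.

# Rough check of the Normal-noise assumption on the residuals of a fit on the first 18 days
from scipy import stats

n = 18
params, _ = optimize.curve_fit(test_func, day[1:n], infections[1:n], p0=[-3, 2])
residuals = infections[1:n] - test_func(day[1:n], *params)
stat, p_value = stats.shapiro(residuals)
print('Shapiro-Wilk p-value: {:.3f}'.format(p_value))  # a very small p-value would question the assumption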

Fitting the infections data at 3 points in time

The infections data provide the dataset with the highest volume, and thus the one most likely to be the least noisy. Here we have 31 data points sampled with a frequency of 1 day and spanning values from 20 to about 60k.

We try to fit the data at 3 points in time to see i) if the model is able to produce a reasonable fit and ii) if the out-of-sample data fall behind the model's extrapolation, i.e. if we can see a reasonable slowdown of the process.

# Now we try to fit the infections at different time intervals from the onset of the measures on.
ylim = -5000, 150000

n = 18
title = "Cases up to day {0} (Conte's lockdown)\nr2:{1:.3f}, doubling every {2:.2f} days"
go(day, infections, n, title, ylim)

n=23
title ='Cases up to day {0}\nr2:{1:.3f}, doubling every {2:.2f} days'
go(day, infections, n, title, ylim)

n=28
title ='Cases up to day {0}\nr2:{1:.3f}, doubling every {2:.2f} days'
go(day, infections, n, title, ylim)

Fitting the fatality data at 3 points in time

Now we can try to fit the fatalities data at the same three points in time. In this case the fit is harder as (fortunately) the volumes are lower, and thus the data are noisier. There is also a clear time lag between the onset of the measures and their effect in reducing the fatalities.

Again, we try to see whether an exponential model is a good fit for the process and whether we can observe some deviation in the number of fatalities from the exponential trend after the containment measures.

# Now we try to fit the fatalities at different time intervals from the onset of the measures on.
ylim = -1000, 20000

n = 18
title = "Fatalities up to day {0} (Conte's lockdown)\nr2:{1:.3f}, doubling every {2:.2f} days"
go(day, fatalities, n, title, ylim)

n = 23
title ='Fatalities up to day {0}\nr2:{1:.3f}, doubling every {2:.2f} days'
go(day, fatalities, n, title, ylim)

n=28
title = 'Fatalities up to day {0}\nr2:{1:.3f}, doubling every {2:.2f} days'
go(day, fatalities, n, title, ylim)

Results

Infection data

We can observe that the fit on the initial data is quite accurate, with an r-squared coefficient above 0.99. The initial doubling time is about 3 days, in line with the data and physically meaningful for the process. If we compare the extrapolation with the out-of-sample observations, we can also observe that this model overestimates the expected number of cases.
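A rough way to quantify the overestimation is to extrapolate the fit obtained on the first 18 days to the remaining days and compare it with the observations; the snippet below is only a sketch reusing the objects defined above, not part of the original code.

# Extrapolate the day-18 fit and compare it with the later observations
n = 18
params, _ = optimize.curve_fit(test_func, day[1:n], infections[1:n], p0=[-3, 2])
predicted = test_func(day[n:], *params)
for d, obs, pred in zip(day[n:], infections[n:], predicted):
    print('day {}: observed {}, extrapolated {:.0f}'.format(int(d), obs, float(pred)))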

It is interesting to note that as we add more data, the doubling time grows, meaning that the process is slowing down; we also see a slight reduction in r-squared (not significant in an exponential process with so few data points).
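The growth of the doubling time can be read directly from the fitted parameter b; the small sketch below (again reusing the objects defined above) refits the model on progressively longer windows and prints the fitted doubling time.

# Refit the model on progressively longer windows and print the fitted doubling time
for n in (18, 23, 28, len(infections)):
    params, _ = optimize.curve_fit(test_func, day[1:n], infections[1:n], p0=[-3, 2])
    r2 = r_squared(infections[1:n], test_func(day[1:n], *params))
    print('up to day {}: doubling every {:.2f} days, r2 = {:.3f}'.format(n, params[1], r2))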

We can interpret the result as a noticeable deviation from the full-powered exponential growth we had at the beginning of the spread, and hopefully as a positive sign that the containment measures are effective in containing the spread of the pandemic.

Fatalities data

We can observe that the fit on the initial data is also quite accurate, with an r-squared coefficient above 0.99, although a bit less precise than for the infections dataset. The initial doubling time is about 2.3 days, in line with the data and physically meaningful for the process. If we compare the extrapolation with the out-of-sample observations, we can also observe that this model overestimates the expected number of fatalities.

As for the infections, the doubling time grows as we add more data, meaning that the process is slowing down, and we again see a slight reduction in r-squared (not significant in an exponential process with so few data points).

The process is still faster than the infections, and its slowdown is not as pronounced as for the infections. However, we can observe a noticeable increase in the doubling time, and we can also speculate that the evolution of the fatality figures lags at least 5 days behind the evolution of the infections.
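To put the two processes side by side, one can fit both series on the same window and compare the fitted doubling times; the snippet below is only a sketch reusing the objects defined above.

# Compare the fitted doubling times of infections and fatalities on the same window
n = 28
for name, series in (('infections', infections), ('fatalities', fatalities)):
    params, _ = optimize.curve_fit(test_func, day[1:n], series[1:n], p0=[-3, 2])
    print('{} up to day {}: doubling every {:.2f} days'.format(name, n, params[1]))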

Discussion

From this very simplistic yet surprisingly well-fitting analysis we can conclude that, consistent with our hypothesis, the pandemic in Italy is slowing down after Prime Minister Conte put the country under severe containment measures, which suggests the measures are effective. On the other hand, the coefficient is far from indicating zero growth, let alone a regression of the pandemic. We therefore hope that the containment measures will be strengthened and social distancing strictly observed, so as to further hamper the spread of the disease and allow a faster return to normality for millions of people.

Are you ready to take smarter decisions?

Otherwise you can always drop a comment…