COVID-19 Data

COVID-19 case data from Johns Hopkins, augmented and reformatted

jhu.edu-covid19-2.4.4. Modified 2020-04-02T22:23:50

Resources | Packages | Documentation | Contacts | References | Data Dictionary

Resources

  • confirmed. Confirmed cases by date and country
  • deaths. Deaths by date and country
  • recovered. Recoveries by date and country

Documentation

This dataset processes and augments the COVID-19 data provided by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). The source data is checked into GitHub daily and is collected from a variety of sources.

This dataset reformats the data into tidy format, with dates expressed as values instead of column headings, and adds several fields that are useful for analysis.
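The wide-to-tidy reshaping described above can be sketched in plain Python. This is only an illustration, not the code used to build the package: the sample CSV, its values, and the function name wide_to_tidy are all made up for the example, and the date column format (month/day/two-digit year) is an assumption about the source files.

```python
import csv
import io
from datetime import datetime

# Made-up wide-format sample: one column per date (assumed layout).
WIDE_CSV = """country,province,3/1/20,3/2/20
A,,10,20
B,X,5,8
"""

def wide_to_tidy(text, value_name="confirmed"):
    """Return one dict per (country, province, date) observation."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        for key, value in rec.items():
            if key in ("country", "province"):
                continue  # identifier columns, not dates
            rows.append({
                "country": rec["country"],
                "province": rec["province"],
                # The date moves from a column heading to a value.
                "date": datetime.strptime(key, "%m/%d/%y").date(),
                value_name: int(value),
            })
    return rows

tidy = wide_to_tidy(WIDE_CSV)
for row in tidy:
    print(row)
```

Each input row with two date columns becomes two output rows, so the date can be filtered, sorted, and joined like any other field.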

The ‘rate_t5d’ column is the growth rate from 5 days before the observation to the observation. For example, for a row with a current observation of value x_5 and a past observation of x_0, rate_t5d is calculated as e^((log(x_5) - log(x_0)) / 5) - 1. The result is that x_5 = x_0 * (1 + rate_t5d)^5; that is, rate_t5d is the average daily growth rate over the previous 5 days.
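As a worked example of the rate_t5d formula, the sketch below computes the rate for a count that doubles in 5 days and checks that compounding it recovers the later value. The function name and the sample values (100 and 200) are illustrative assumptions, not taken from the dataset.

```python
import math

def rate_t5d(x_0, x_5, days=5):
    """Average daily growth rate that takes x_0 to x_5 over `days` days."""
    return math.exp((math.log(x_5) - math.log(x_0)) / days) - 1

# Illustrative values: 100 cases five days ago, 200 cases today.
rate = rate_t5d(100, 200)
print(round(rate, 4))  # 0.1487

# Compounding the rate for 5 days recovers the current observation.
print(round(100 * (1 + rate) ** 5, 6))  # 200.0
```

A doubling over 5 days therefore corresponds to a daily growth rate of roughly 14.9%.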

Caveats

  • China’s minimum case count in the dataset is 548, so its value for days since 100 cases is shifted by 6 days. The shift is just a guess, but it looks good.
  • Countries that haven’t reached 100 cases yet will have a days since 100 cases value that is always negative.
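One way to picture the days-since-100-cases value and the shift mentioned in the caveats is the sketch below. It is an assumption about the general approach, not the package's actual code: the function name, the offset parameter, and the sample counts are hypothetical. It also returns None when the threshold is never reached, which differs from the dataset, where such countries get negative values by a method not described here.

```python
from datetime import date

def days_since(observations, threshold=100, offset=0):
    """observations: (date, cumulative_count) pairs sorted by date.

    Returns (date, days) pairs, where days counts from the earliest date
    with more than `threshold` cases. `offset` shifts the result for
    series that already start above the threshold.
    """
    first = next((d for d, n in observations if n > threshold), None)
    if first is None:
        # Hypothetical choice; the dataset itself reports negative values.
        return [(d, None) for d, _ in observations]
    return [(d, (d - first).days + offset) for d, _ in observations]

# Made-up counts for illustration.
obs = [
    (date(2020, 3, 1), 40),
    (date(2020, 3, 2), 90),
    (date(2020, 3, 3), 130),
    (date(2020, 3, 4), 210),
]
result = days_since(obs)
print([days for _, days in result])  # [-2, -1, 0, 1]
```

For a series whose first observation is already above the threshold, passing offset=6 would reproduce a 6-day shift like the one applied to China.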

Contacts

Packages

Accessing Packages in Metapack

import metapack as mp
pkg = mp.open_package('http://library.metatab.org/jhu.edu-covid19-2.4.4.zip')

# Create Dataframes

confirmed_df = pkg.resource('confirmed').dataframe()
deaths_df = pkg.resource('deaths').dataframe()
recovered_df = pkg.resource('recovered').dataframe()

Data Dictionary

confirmed | deaths | recovered

confirmed

| Column Name | Data Type | Description |
|---|---|---|
| country | string | Country |
| province | string | Province, state, country or other region |
| location | string | Combination of county and province |
| date | datetime | Date of observation |
| confirmed | integer | Cumulative number of confirmed positives |
| date_10 | datetime | Earliest date at which there were more than 10 cases |
| days_10 | integer | Number of days since the earliest date of 10 cases |
| date_100 | datetime | Earliest date at which there were more than 100 cases |
| days_100 | integer | Number of days since the earliest date of 100 cases |
| rate_t5d | number | Average daily growth rate over the preceding 5 days |
| confirmed_log | number | Log of the number of confirmed positive cases |

deaths

| Column Name | Data Type | Description |
|---|---|---|
| country | string | Country |
| province | string | Province, state, country or other region |
| location | string | Combination of county and province |
| date | datetime | Date of observation |
| death | integer | Cumulative number of deaths |
| date_10 | datetime | Earliest date at which there were more than 10 cases |
| days_10 | integer | Number of days since the earliest date of 10 cases |
| date_100 | string | Earliest date at which there were more than 100 cases |
| days_100 | integer | Number of days since the earliest date of 100 cases |
| rate_t5d | number | Average daily growth rate over the preceding 5 days |
| death_log | number | Log of the number of deaths |

recovered

| Column Name | Data Type | Description |
|---|---|---|
| country | string | Country |
| province | string | Province, state, country or other region |
| location | string | Combination of county and province |
| date | datetime | Date of observation |
| recovered | integer | Cumulative number of recoveries |
| date_10 | datetime | Earliest date at which there were more than 10 cases |
| days_10 | integer | Number of days since the earliest date of 10 cases |
| date_100 | datetime | Earliest date at which there were more than 100 cases |
| days_100 | integer | Number of days since the earliest date of 100 cases |
| rate_t5d | number | Average daily growth rate over the preceding 5 days |
| recovered_log | number | Log of the number of recoveries |

References

Urls used in the creation of this data package.
