COVID-19 Data

COVID-19 case data from Johns Hopkins, augmented and reformatted

jhu.edu-covid19-2.4.38. Modified 2020-07-08T04:59:37

Resources

• confirmed. Confirmed non-US cases by date and country
• deaths. Non-US deaths by date and country
• recovered. Non-US recoveries by date and country
• confirmed_us. Confirmed US cases by date and county
• deaths_us. US deaths by date and county

Documentation

This dataset processes and augments the COVID-19 data provided by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). The source data is collected from a variety of sources and is checked into GitHub daily.

This dataset reformats the data into tidy format, with dates expressed as values instead of column headings, and adds several fields that are useful for analysis.
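As a rough illustration of that reshaping, the sketch below turns a wide table (one column per date) into tidy rows (one row per place and date). The sample rows, the `melt_wide` helper, and the identifier column names in `ID_COLUMNS` are assumptions made for the example; this is not the package's actual build code.

```python
import csv
import io
from datetime import datetime

# Hypothetical wide-format sample: one column per date (values are made up).
WIDE_CSV = """Province/State,Country/Region,1/22/20,1/23/20,1/24/20
,Exampleland,1,2,4
North,Otherland,10,20,40
"""

ID_COLUMNS = ("Province/State", "Country/Region")  # assumed identifier columns


def melt_wide(text, value_name="confirmed"):
    """Return tidy records: one dict per (place, date) pair."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        for column, value in row.items():
            if column in ID_COLUMNS:
                continue  # identifier columns are copied, not melted
            records.append({
                "country": row["Country/Region"],
                "province": row["Province/State"],
                # The date moves from the column heading into a value
                "date": datetime.strptime(column, "%m/%d/%y").date(),
                value_name: int(value),
            })
    return records


tidy = melt_wide(WIDE_CSV)
for record in tidy[:3]:
    print(record)
```

Each input row with three date columns becomes three tidy records, which is the shape the resources in this package use.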

The ‘rate_t5d’ column is the average daily growth rate from 5 days before the observation to the observation. For example, for a row with a current observation of value x_5 and a past observation of x_0, rate_t5d is calculated as e^((log(x_5)-log(x_0)) / 5) - 1. The result is that x_5 = x_0 * (1+rate_t5d)^5, so rate_t5d is the constant daily rate that would carry x_0 to x_5 over the previous 5 days.
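The formula can be checked with a few lines of Python. This is a minimal sketch: the function name `rate_t5d`, the `days` parameter, and the sample counts are chosen for illustration, and returning `None` for non-positive counts is an assumption, since the source does not say how such rows are handled.

```python
import math


def rate_t5d(x_now, x_past, days=5):
    """Daily growth rate that carries x_past to x_now in `days` days."""
    if x_now <= 0 or x_past <= 0:
        return None  # log is undefined here; the package's handling is not stated
    return math.exp((math.log(x_now) - math.log(x_past)) / days) - 1


x_0, x_5 = 100, 200  # hypothetical counts five days apart
rate = rate_t5d(x_5, x_0)

# Compounding the rate for 5 days recovers the current value
x_5_check = x_0 * (1 + rate) ** 5
print(f"rate_t5d = {rate:.4f}, x_0 * (1 + rate)^5 = {x_5_check:.1f}")
```

A doubling over five days corresponds to a daily rate of 2^(1/5) - 1, a little under 15% per day.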

Caveats

• China’s minimum case count in the dataset is 548, so its value for days since 100 cases is shifted by 6 days. The shift is just a guess, but it looks good.
• Countries that haven’t reached 100 cases yet will have a days since 100 cases value that is always negative.
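The sketch below shows one way a "days since 100 cases" value could be derived from a cumulative series. The helper name `days_since_threshold` and the sample series are hypothetical. It does not reproduce the 6-day shift applied to China, and it returns `None` when the threshold is never reached, because the source does not say how the negative values for those countries are produced.

```python
from datetime import date


def days_since_threshold(observations, threshold=100):
    """observations: list of (date, cumulative_count) pairs, sorted by date.

    Returns (first_date, days), where first_date is the earliest date with a
    count above the threshold and days holds one signed offset per observation.
    Returns (None, None) if the threshold is never exceeded.
    """
    first_date = next((d for d, n in observations if n > threshold), None)
    if first_date is None:
        return None, None
    return first_date, [(d - first_date).days for d, _ in observations]


# Hypothetical cumulative counts for one country
series = [
    (date(2020, 3, 1), 40),
    (date(2020, 3, 2), 90),
    (date(2020, 3, 3), 130),
    (date(2020, 3, 4), 210),
]
date_100, days_100 = days_since_threshold(series)
print(date_100, days_100)
```

Observations before the threshold date get negative offsets, so series from different countries can be aligned on day 0.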

Data Dictionary

confirmed

| Column Name | Data Type | Description |
|---|---|---|
| country | string | Country |
| province | string | Province, state, country or other region |
| location | string | Combination of county and province |
| date | datetime | Date of observation |
| confirmed | integer | Cumulative number of confirmed positives |
| date_10 | datetime | Earliest date at which there were more than 10 cases |
| days_10 | integer | Number of days since the earliest date of 10 cases |
| date_100 | datetime | Earliest date at which there were more than 100 cases |
| days_100 | integer | Number of days since the earliest date of 100 cases |
| rate_t5d | number | Growth rate, averaged over the preceding 5 days |
| confirmed_log | number | Log of the number of confirmed positive cases |

deaths

| Column Name | Data Type | Description |
|---|---|---|
| country | string | Country |
| province | string | Province, state, country or other region |
| location | string | Combination of county and province |
| date | datetime | Date of observation |
| death | integer | Cumulative number of deaths |
| date_10 | datetime | Earliest date at which there were more than 10 cases |
| days_10 | integer | Number of days since the earliest date of 10 cases |
| date_100 | string | Earliest date at which there were more than 100 cases |
| days_100 | integer | Number of days since the earliest date of 100 cases |
| rate_t5d | number | Growth rate, averaged over the preceding 5 days |
| death_log | number | Log of the number of deaths |

recovered

| Column Name | Data Type | Description |
|---|---|---|
| country | string | Country |
| province | string | Province, state, country or other region |
| location | string | Combination of county and province |
| date | datetime | Date of observation |
| recovered | integer | Cumulative number of recoveries |
| date_10 | datetime | Earliest date at which there were more than 10 cases |
| days_10 | integer | Number of days since the earliest date of 10 cases |
| date_100 | datetime | Earliest date at which there were more than 100 cases |
| days_100 | integer | Number of days since the earliest date of 100 cases |
| rate_t5d | number | Growth rate, averaged over the preceding 5 days |
| recovered_log | number | Log of the number of recoveries |

confirmed_us

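| Column Name | Data Type | Description |
|---|---|---|
| uid | integer | |
| date | datetime | |
| confirmed | integer | |
| date_10 | datetime | |
| days_10 | integer | |
| date_100 | string | |
| days_100 | integer | |
| rate_t5d | number | |
| confirmed_log | number | |
| iso2 | string | |
| iso3 | string | |
| code3 | integer | |
| fips | integer | |
| province | string | |
| country | string | |
| lat | number | |
| long | number | |
| location | string | |
| population | integer | |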

deaths_us

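| Column Name | Data Type | Description |
|---|---|---|
| uid | integer | |
| date | datetime | |
| death | integer | |
| date_10 | datetime | |
| days_10 | integer | |
| date_100 | string | |
| days_100 | integer | |
| rate_t5d | number | |
| death_log | number | |
| iso2 | string | |
| iso3 | string | |
| code3 | integer | |
| fips | integer | |
| province | string | |
| country | string | |
| lat | number | |
| long | number | |
| location | string | |
| population | integer | |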

Accessing Data in Vanilla Pandas

import pandas as pd

# Read the deaths_us resource directly from its CSV URL
deaths_us_df = pd.read_csv('http://library.metatab.org/jhu.edu-covid19-2.4.38/data/deaths_us.csv')

Accessing Package in Metapack

import metapack as mp
pkg = mp.open_package('http://library.metatab.org/jhu.edu-covid19-2.4.38.csv')

# Create Dataframes
confirmed_df = pkg.resource('confirmed').dataframe()
deaths_df = pkg.resource('deaths').dataframe()
recovered_df = pkg.resource('recovered').dataframe()
confirmed_us_df = pkg.resource('confirmed_us').dataframe()
deaths_us_df = pkg.resource('deaths_us').dataframe()