Water quality data for San Diego county beaches from CEDEN, with added features for log transformation, quantiles and group codes.
sandiegodata.org-beachwatch-4
Resources
- stations. Measurement stations
- measure_codes. Measurement group codes
- beachwatch. Beachwatch data, with features added
Documentation
This dataset rebuilds ceden.waterboards.ca.gov-beachwatch-sandiego with
constant and null columns removed and many features added. It also breaks out
station information into a separate dataset, and enumerates the many
different combinations of methodname/analyte/unit, adding a code for each
combination to the dataset in measure_code. The measure code identifies sets of
records that have compatible measurements.
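The enumeration of methodname/analyte/unit combinations can be sketched roughly as follows. This is a minimal illustration with pandas, not the package's actual build code: the sample rows are made up, and the integer codes it produces will not match the codes published in measure_codes.

```python
import pandas as pd

def add_measure_code(df: pd.DataFrame) -> pd.DataFrame:
    """Assign one integer code per distinct (methodname, analyte, unit) combination."""
    out = df.copy()
    # ngroup() numbers the groups 0..n-1 in sorted key order.
    out["measure_code"] = out.groupby(
        ["methodname", "analyte", "unit"], sort=True
    ).ngroup()
    return out

# Illustrative rows only; real values come from the beachwatch resource.
sample = pd.DataFrame({
    "methodname": ["Enterolert", "Enterolert", "SM 9222 B", "Enterolert"],
    "analyte": ["Enterococcus", "Enterococcus", "Coliform, Total", "Enterococcus"],
    "unit": ["MPN/100 mL", "MPN/100 mL", "cfu/100mL", "MPN/100 mL"],
})

coded = add_measure_code(sample)

# One row per code, analogous to a lookup table of measurement groups.
measure_codes = (
    coded.drop_duplicates("measure_code")
    .sort_values("measure_code")
    .reset_index(drop=True)
)
```

Rows that share a code use the same method, analyte and unit, so their result values can be compared directly.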
For each combination of station_code and measure_code, the dataset adds the
count, mean, median and quantiles of the rows in that group. These statistics
are computed both for result and for lresult, the log of result.
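A rough sketch of the per-group summary statistics, assuming pandas and NumPy. The output column names (result_mean, lresult_median and so on) are hypothetical, the 25th and 75th percentiles are assumed to be the quantiles of interest, and the log base (base 10 here) is an assumption, since the documentation only says "log".

```python
import numpy as np
import pandas as pd

def add_group_stats(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Attach per-(station_code, measure_code) summary statistics for one column."""
    grp = df.groupby(["station_code", "measure_code"])[col]
    out = df.copy()
    # transform() broadcasts each group's statistic back onto its rows.
    out[f"{col}_count"] = grp.transform("count")
    out[f"{col}_mean"] = grp.transform("mean")
    out[f"{col}_median"] = grp.transform("median")
    out[f"{col}_25"] = grp.transform(lambda s: s.quantile(0.25))
    out[f"{col}_75"] = grp.transform(lambda s: s.quantile(0.75))
    return out

# Made-up measurements for two stations sharing one measure code.
sample = pd.DataFrame({
    "station_code": ["A", "A", "A", "B", "B"],
    "measure_code": [24, 24, 24, 24, 24],
    "result": [10.0, 100.0, 1000.0, 20.0, 80.0],
})

# Assumption: base-10 log. A result of zero would give -inf and need handling.
sample["lresult"] = np.log10(sample["result"])

stats = add_group_stats(add_group_stats(sample, "result"), "lresult")
```

Because the statistics are broadcast back to every row, each record carries its own group's summary values alongside the raw measurement.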
After computing the group summary statistics, the processing creates
dichotomous features for the relationship of result and lresult to the
summary value, including:
- Greater than the median
- Greater than the mean
- Less than or equal to the 25th percentile
- Greater than or equal to the 75th percentile
These variables are particularly useful for doing logistic regressions across the measure code groups or stations.
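The dichotomous features could be derived along these lines. This is a sketch under assumptions: the flag column names are hypothetical, the upper cutoff is assumed to be the 75th percentile, and the flags are encoded as 0/1 integers.

```python
import pandas as pd

def add_dichotomous(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Flag each row by comparing col against its group's summary values."""
    grp = df.groupby(["station_code", "measure_code"])[col]
    out = df.copy()
    out[f"{col}_gt_median"] = (df[col] > grp.transform("median")).astype(int)
    out[f"{col}_gt_mean"] = (df[col] > grp.transform("mean")).astype(int)
    q25 = grp.transform(lambda s: s.quantile(0.25))
    q75 = grp.transform(lambda s: s.quantile(0.75))
    out[f"{col}_lte_25"] = (df[col] <= q25).astype(int)
    out[f"{col}_gte_75"] = (df[col] >= q75).astype(int)
    return out

# Made-up values for a single station and measure code.
sample = pd.DataFrame({
    "station_code": ["A"] * 5,
    "measure_code": [24] * 5,
    "result": [10.0, 20.0, 30.0, 40.0, 100.0],
})

flags = add_dichotomous(sample, "result")
```

In this sample the group median is 30 and the mean is 40, so only the last row is above the mean while the last two rows are above the median. A 0/1 flag of this kind can serve directly as the response variable in a logistic regression.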
Elided Columns
This dataset excludes the constant and empty columns from the source dataset. The constant columns and their values are:
program                      BeachWatch
parentproject                BeachWatch_San Diego County
project                      BeachWatch_San Diego County
locationcode                 SurfZone
collectiondepth              -88
unitcollectiondepth          NR
sampletypecode               Grab
collectionreplicate          1
resultsreplicate             1
labsampleid                  Not Recorded
matrixname                   samplewater
mdl                          -88
rl                           -88
batchverification            NR
compliancecode               NR
eventcode                    WQ
protocolcode                 Not Recorded
collectionmethodname         Water_Grab
collectiondevicedescription  Not Recorded
calibrationdate              0000-00-00
positionwatercolumn          Not Recorded
preppreservationname         Not Recorded
preppreservationdate         0000-00-00 00:00:00
digestextractmethod          Not Recorded
digestextractdate            0000-00-00
analysisdate                 0000-00-00
dilutionfactor               -88
expectedvalue                0
submissioncode               NR
county                       San Diego
county_fips                  73
regional_board               San Diego
rb_number                    9
sampleid                     Not Recorded
The dataset also excludes these Null columns:
- observation
- samplecomments
- collectioncomments
- resultscomments
- batchcomments
- groupsamples
- occupationmethod
- startingbank
- distancefrombank
- unitdistancefrombank
- streamwidth
- unitstreamwidth
- stationwaterdepth
- unitstationwaterdepth
- hydromod
- hydromodloc
- locationdetailwqcomments
- channelwidth
- upstreamlength
- downstreamlength
- totalreach
- locationdetailbacomments
- huc8
- huc8_number
- huc10
- huc10_number
- huc12
- huc12_number
- waterbody_type
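Constant and all-null columns like those listed above can be found with a short pandas check. The following is a minimal sketch on made-up data, not the procedure actually used to build this package.

```python
import pandas as pd

def split_columns(df: pd.DataFrame):
    """Return ({constant column: its value}, [all-null columns])."""
    null_cols = [c for c in df.columns if df[c].isna().all()]
    const_cols = {
        c: df[c].iloc[0]
        for c in df.columns
        # A column is constant when it holds exactly one distinct value.
        if c not in null_cols and df[c].nunique(dropna=False) == 1
    }
    return const_cols, null_cols

# Illustrative frame: two constant columns, one empty column, one informative column.
sample = pd.DataFrame({
    "program": ["BeachWatch"] * 3,
    "county": ["San Diego"] * 3,
    "observation": [None] * 3,
    "result": [10.0, 20.0, 30.0],
})

const_cols, null_cols = split_columns(sample)
reduced = sample.drop(columns=list(const_cols) + null_cols)
```

Recording the constant values before dropping the columns, as the table above does, keeps that information recoverable.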
Notes
The most prevalent measure code in this dataset is 24, for Enterococcus (analyte) measured with Enterolert (methodname) in units of MPN/100 mL. This is probably because, in 2004, the EPA changed its recommendations to use Enterococcus as the primary indicator bacteria in coastal waters:
EPA subsequently recommended the use of E. coli or enterococci for fresh
recreational waters and enterococci for marine recreational waters because
levels of these organisms more accurately predict acute gastrointestinal
illness than levels of fecal coliforms.
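One way to check which measure code is most prevalent is to count rows per code. The sketch below uses a small made-up frame in place of the beachwatch resource; the helper name is hypothetical.

```python
import pandas as pd

def most_common_measure_code(df: pd.DataFrame) -> int:
    """Return the measure_code value that appears on the most rows."""
    return int(df["measure_code"].value_counts().idxmax())

# Made-up codes; in practice df would be the beachwatch dataframe.
sample = pd.DataFrame({"measure_code": [24, 24, 7, 24, 3]})

top_code = most_common_measure_code(sample)
```

Joining the resulting code against the measure_codes resource would then give the corresponding methodname, analyte and unit.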
Documentation Links
- https://ceden.waterboards.ca.gov/AdvancedQueryTool CEDEN advanced query tool page
- https://ceden.waterboards.ca.gov/Metadata/ControlledVocab.php Controlled vocabulary search page
- https://ceden.waterboards.ca.gov/Metadata/get_lu_data.php?format=html&table=AnalyteLU&include_all=yes Analyte controlled vocabulary
- https://ceden.waterboards.ca.gov/Metadata/get_lu_data.php?format=html&table=MethodLU&include_all=yes Method controlled vocabulary
- https://ceden.waterboards.ca.gov/Metadata/get_lu_data.php?format=html&table=ResultQualityLU&include_all=yes Result Qualifier controlled vocabulary
- https://ceden.waterboards.ca.gov/Metadata/get_lu_data.php?format=html&table=QA_LU&include_all=yes QA controlled vocabulary
Contacts
- Wrangler
Packages
- zip http://library.metatab.org/sandiegodata.org-beachwatch-4.zip
- csv http://library.metatab.org/sandiegodata.org-beachwatch-3.csv
- source https://github.com/san-diego-water-quality/water-datasets.git
Accessing Packages in Metapack
import metapack as mp
# Open one of the two packages. Either the ZIP package:
pkg = mp.open_package('http://library.metatab.org/sandiegodata.org-beachwatch-4.zip')
# or the CSV package:
pkg = mp.open_package('http://library.metatab.org/sandiegodata.org-beachwatch-3.csv')
# Get a resource by name: 'stations', 'measure_codes' or 'beachwatch'
resource = pkg.resource('beachwatch')
df = resource.dataframe() # Create a pandas DataFrame
gdf = resource.geoframe() # Create a GeoPandas GeoDataFrame
References
URLs used in the creation of this data package.
- index:ceden.waterboards.ca.gov-beachwatch-sandiego#beachwatch-sd. Beachwatch source data
Last Modified 2018-08-10T22:40:33
