Processed crime incidents for San Diego county, 2007-2013. Of our crime datasets, this one is most useful for analysis.
sandag.gov-crime-2007e2013-2.1.4. Modified 2020-11-26T06:39:42
- incidents. San Diego county crime incidents, 2007 to 2013
- i5y_sample. A sample of 10,000 incidents, from years 2008 to 2012
Processed crime incidents, based on data supplied by SANDAG.
- See Caveats for limitations and warnings regarding this data.
- Use of this data is subject to multiple terms and conditions. See Terms for details.
This dataset includes geocoded crime incidents from 1 Jan 2007 to 31 March 2013 that were returned by SANDAG for Public Records request 12-075.
The integer values in the asr_zone field are taken directly from the SANGIS parcel data. These values are:
╔════╦═════════════════════════════════╗
║ -1 ║ Unset                           ║
║ 0  ║ Unzoned                         ║
║ 1  ║ Single family residential (R-1) ║
║ 2  ║ Minor multiple (R-2)            ║
║ 3  ║ Restricted multiple (R-3)       ║
║ 4  ║ Multiple residential (R-4)      ║
║ 5  ║ Restricted commercial           ║
║ 6  ║ Commercial                      ║
║ 7  ║ Industrial (M zone)             ║
║ 8  ║ Agricultural                    ║
║ 9  ║ Special and/or misc.            ║
╚════╩═════════════════════════════════╝
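When working with the asr_zone field in code, the table above can be kept as a lookup. This is a minimal sketch; the names ASR_ZONE_LABELS and zone_label are illustrative and are not part of the data package.

```python
# Labels for asr_zone codes, copied from the table above.
ASR_ZONE_LABELS = {
    -1: "Unset",
    0: "Unzoned",
    1: "Single family residential (R-1)",
    2: "Minor multiple (R-2)",
    3: "Restricted multiple (R-3)",
    4: "Multiple residential (R-4)",
    5: "Restricted commercial",
    6: "Commercial",
    7: "Industrial (M zone)",
    8: "Agricultural",
    9: "Special and/or misc.",
}


def zone_label(code):
    """Return the label for an asr_zone code, or 'Unknown' if the code is not in the table."""
    return ASR_ZONE_LABELS.get(int(code), "Unknown")


print(zone_label(6))
```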
Addresses and Geocoding
SANDAG returns the position of incidents as a block address, and occasionally as an intersection. Block addresses are the original address of the incident, with the last two digits set to ’00’.
Before geocoding, all of the original block addresses are normalized to be more consistent and to remove different versions of the same address. There are a few transformations performed on the address, including:
- Converting street type synonyms like ‘Avenue’, ‘Avenu’ and ‘ave.’ to standard abbreviations like ‘ave.’
- Converting street directions (‘West main Street’) to abbreviations like ‘W Main st’
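The two transformations listed above could be sketched roughly as follows. This is not the code used to build the dataset: the synonym tables, the function name normalize_block_address, and the choice of output capitalization are all assumptions made for illustration.

```python
import re

# Hypothetical synonym tables; the real normalization tables are not shown in this document.
STREET_TYPES = {"avenue": "Ave", "avenu": "Ave", "ave": "Ave", "street": "St", "st": "St"}
DIRECTIONS = {"north": "N", "south": "S", "east": "E", "west": "W"}


def normalize_block_address(address):
    """Abbreviate direction and street-type words so variants of one address compare equal."""
    words = re.sub(r"[.,]", "", address).split()
    last = len(words) - 1
    out = []
    for i, word in enumerate(words):
        key = word.lower()
        if key in DIRECTIONS and i < last - 1:
            # Treat a direction word as a prefix only when a street name and type follow it.
            out.append(DIRECTIONS[key])
        elif key in STREET_TYPES and i == last:
            # Only the final word is treated as the street type.
            out.append(STREET_TYPES[key])
        else:
            out.append(word.capitalize())
    return " ".join(out)


print(normalize_block_address("100 West main Street"))
```

With these assumed tables, ‘1200 Fifth Avenu’ and ‘1200 fifth ave.’ normalize to the same string, which is the point of the step.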
Many geocoders are designed to work with mailable addresses, and block addresses are not real postal addresses. This data is geocoded with custom code that uses the SANGIS streets database, matching the block addresses to a street segment. This produces more sensible results, because the crime is attributed to an entire block, rather than to an arbitrary point on the block. However, when the crime is represented as a point, it will appear at the center of the street segment, usually in the middle of the block.
This means that all of the incidents on a block will appear at a single location. In most GIS programs, it is difficult to see that there are actually many points in one place. Be aware that each point you see may actually be dozens of incidents.
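One way to make stacked points visible is to count incidents per coordinate pair before mapping. The sketch below uses pandas on a few made-up rows; only the lon and lat column names come from the schema, and the coordinate values are invented for illustration.

```python
import pandas as pd

# Toy rows standing in for geocoded incidents; the coordinates are made up.
df = pd.DataFrame({
    "lon": [-117.16, -117.16, -117.16, -117.25],
    "lat": [32.72, 32.72, 32.72, 32.80],
})

# One row per distinct point, with the number of incidents stacked on it.
stacked = df.groupby(["lon", "lat"]).size().reset_index(name="incidents")
print(stacked)
```

The incidents column can then drive symbol size or color in a GIS program, so a point that represents many incidents does not look like a single one.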
The files that SANDAG returned included 1,008,524 incident records, and 953,824 records were geocoded (95%). The ‘gctype’ field has a value of NONE when the record was not geocoded, and any field that depends on a location, such as x, y, lon, lat, segment_id, community, and others, will have default values.
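The geocoding rate and the NONE marker can be checked with a few lines of pandas. This is a sketch on toy rows: the value "EXAMPLE" is a placeholder for any gctype other than NONE, not a value known to appear in the data.

```python
import pandas as pd

# Counts quoted in the text above.
total_records = 1_008_524
geocoded_records = 953_824
geocoded_share = geocoded_records / total_records
print(f"Geocoded share: {geocoded_share:.1%}")

# Toy rows; "EXAMPLE" is a placeholder for any gctype value other than NONE.
sample = pd.DataFrame({
    "gctype": ["EXAMPLE", "NONE", "EXAMPLE"],
    "year": [2008, 2009, 2010],
})

# Keep only records that were geocoded before doing any spatial analysis.
located = sample[sample["gctype"] != "NONE"]
print(len(located), "of", len(sample), "toy rows are geocoded")
```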
As with most crime data, there are many issues, limitations and problems that users must be aware of to avoid making incorrect conclusions.
Crime incident data is inherently problematic. Crime incident reports are collected by busy officers in stressful situations who are trying to describe complex situations with rigid categories. Virtually every point of the data collection process has multiple opportunities for errors and few opportunities for correction after the fact. Analysts must consider the difficulties of collecting crime data when assessing the validity of any conclusions.
Data is collected by 19 different agencies. While the data is all sourced from SANDAG, it originates with 19 different police departments. These departments may have different policies that can result in different categorizations for the same crime, and they may have different emphases on which crimes they pursue.
Many incidents at a single point. Because all of the crimes on a block are geocoded to the middle of the block, many incidents will appear as a single point.
5% of crimes are not geocoded. GIS users should consider that about 5% of the incidents were not properly geocoded, and are not included in the shapefiles. These crimes appear in the CSV files, and can be included in time series analysis, but they will not be available for spatial analysis.
Times and dates are often unreliable. Times and dates for many incidents are unreliable, with times being more unreliable than dates.
Property crimes that occur while the owner is gone may be recorded as the time a responsible person left the property, arrived at the property to discover the crime, or the average between the two. There is no information available to select among these possibilities, so these incidents have very unreliable times.
For crimes that occurred at night, an unreliable time also makes the date unreliable.
Times may not have been recorded in the original report. These times may be entered as midnight, or as another time.
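Because a missing time may have been entered as midnight, one cautious option is to flag exact-midnight times and treat them as suspect in time-of-day analysis. The sketch below assumes the time field is a string in the H:MM:SS format described in the schema; the helper name is_midnight is hypothetical. A flagged record may still be a genuine midnight incident, so this marks candidates rather than errors.

```python
def is_midnight(time_str):
    """Return True when an H:MM:SS time string is exactly midnight."""
    hours, minutes, seconds = (int(part) for part in time_str.split(":"))
    return hours == 0 and minutes == 0 and seconds == 0


# Toy values in the assumed H:MM:SS format.
times = ["0:00:00", "14:30:00", "00:00:00", "0:05:00"]
suspect = [t for t in times if is_midnight(t)]
print(suspect)
```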
Multiple crime incidents may not have all crimes recorded. If a single person is charged with multiple violations for a single arrest, departments may enter only the most serious charge, the last charge, or all of the charges. There is no information to disambiguate these possibilities.
Locations may be unreliable. Crimes that involve pursuits or violations committed at multiple locations may be recorded at any of many different locations. When the location is ambiguous, it is common for incidents to have the address recorded as the location where the arrested person was charged. Because of this, the highest crime block in San Diego is the downtown police station. Check high crime locations to ensure they are not police stations.
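A quick way to find blocks worth checking by hand is to rank block addresses by incident count. This sketch uses pandas on invented addresses; only the address column name comes from the schema.

```python
import pandas as pd

# Toy block addresses, invented for illustration.
df = pd.DataFrame({
    "address": [
        "100 EXAMPLE ST",
        "100 EXAMPLE ST",
        "100 EXAMPLE ST",
        "200 SAMPLE AVE",
        "300 DEMO RD",
        "200 SAMPLE AVE",
    ]
})

# Blocks with the most incidents; review the top entries against known police station addresses.
top_blocks = df["address"].value_counts().head(2)
print(top_blocks)
```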
Number of incidents by year:
year       count
---------- ----------
2007       186014
2008       178445
2009       163646
2010       160133
2011       147270
2012       141318
2013       31699
Crime types, from the “type” field, and the number of that type
type                     count
------------------------ ----------
DRUGS/ALCOHOL VIOLATIONS 230462
THEFT/LARCENY            138030
VEHICLE BREAK-IN/THEFT   123955
MOTOR VEHICLE THEFT      97498
BURGLARY                 91695
VANDALISM                83912
ASSAULT                  70687
DUI                      58311
FRAUD                    55219
ROBBERY                  22685
SEX CRIMES               22281
WEAPONS                  11117
ARSON                    2145
HOMICIDE                 528
Incidents by city:
name                      code       count
------------------------- ---------- ----------
San Diego                 SndSAN     401787
S.D. County               SndSDO     342282
Oceanside                 SndOCN     44022
Chula Vista               SndCHU     38387
Escondido                 SndESC     26079
Vista                     SndVIS     20044
Carlsbad                  SndCAR     18330
La Mesa                   SndLAM     17871
El Cajon                  SndELC     16548
National City             SndNAT     16509
San Marcos                SndSNM     14230
Santee                    SndSNT     12328
Encinitas                 SndENC     12302
Poway                     SndPOW     8565
Imperial Beach            SndIMP     5442
Del Mar                   SndDEL     4876
Lemon Grove               SndLEM     4198
Coronado                  SndCOR     2466
Solana Beach              SndSOL     2259
Name of file, extracted from clarinova.com-crime-incidents-casnd-7ba4. San Diego Regional Data Library. 2013-08-07 http://sandiegodata.org
This data is released under the following terms and conditions.
Clarinova and the San Diego Regional Data Library disclaim any warranty for this data and shall not be liable for loss or harm. See the SDRDL Disclaimers and Limitations web page for complete details.
This data is based on data from SANGIS, which is subject to its own terms and conditions. See the SANGIS Legal Notice for details.
This data is based on data from SANDAG, which is subject to its own terms and conditions. See the SANDAG Legal Notice for details.
|Column Name||Data Type||Description|
|datetime||datetime||date and time in ISO format|
|year||integer||Four digit year.|
|month||integer||Month number extracted from the date|
|day||integer||Day number, starting from Jan 1, 2000|
|week||integer||ISO week of the year|
|dow||integer||Day of week, as a number. 0 is Sunday|
|time||datetime||Time, in H:MM:SS format|
|hour||integer||Hour number, extracted from the time|
|is_night||integer||1 if time is between dusk and dawn, rounded to nearest hour. All comparisons are performed against the dusk and dawn times for the 15th of the month.|
|category||string||Crime category, provided by SANDAG *This is the short crime type*|
|address||string||Block address, street and city name|
|segment_id||integer||Segment identifier from SANGIS road network data.|
|community||string||Community Name for San Diego, city name outside of San Diego|
|comm_pop||integer||Population of the community area, from the 2010 Census|
|council||string||San Diego City council|
|council_pop||integer||Population of San Diego city council|
|asr_zone||integer||Assessor’s zone code for nearest parcel.|
|lampdist||integer||Distance to nearest streetlamp in centimeters|
|lat||number||Latitude, provided by the geocoder.|
|lon||number||Longitude, provided by the geocoder.|
|desc||string||Long description of incident.|
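The dow column counts from Sunday as 0, while Python's date.weekday() counts from Monday as 0, so values computed in Python need shifting before they are compared with dow. A minimal sketch, assuming dow continues 1 for Monday through 6 for Saturday (the schema only states that 0 is Sunday); the function name sunday_based_dow is hypothetical.

```python
from datetime import date


def sunday_based_dow(d):
    """Convert Python's Monday=0 weekday to a Sunday=0 day-of-week number."""
    return (d.weekday() + 1) % 7


# First and last dates covered by the dataset.
print(sunday_based_dow(date(2007, 1, 1)))
print(sunday_based_dow(date(2013, 3, 31)))
```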
Urls used in the creation of this data package.
- file:data/incidents-2007.csv.zip#&encoding=ascii. San Diego crime incidents 2007
- file:data/incidents-2012.csv.zip#&encoding=ascii. San Diego crime incidents 2012
- file:data/incidents-2013.csv.zip#&encoding=ascii. San Diego crime incidents 2013
- file:data/incidents-2011.csv.zip#&encoding=ascii. San Diego crime incidents 2011
- file:data/incidents-2008.csv.zip#&encoding=ascii. San Diego crime incidents 2008
- file:data/incidents-2009.csv.zip#&encoding=ascii. San Diego crime incidents 2009
- file:data/incidents-2010.csv.zip#&encoding=ascii. San Diego crime incidents 2010
- metapack+http://library.metatab.org/sandiegodata.org-demographics-tract-1.1.1.csv. Demographics data package
- s3 s3://library.metatab.org/sandag.gov-crime-2007e2013-2.1.4.csv
- csv http://library.metatab.org/sandag.gov-crime-2007e2013-2.1.4.csv
- source https://github.com/metatab-packages/sandag.gov-crime-2007e2013.git
Accessing Data in Vanilla Pandas
import pandas as pd

incidents_df = pd.read_csv('http://library.metatab.org/sandag.gov-crime-2007e2013-2.1.4/data/incidents.csv')
i5y_sample_df = pd.read_csv('http://library.metatab.org/sandag.gov-crime-2007e2013-2.1.4/data/i5y_sample.csv')
Accessing Package in Metapack
import metapack as mp

pkg = mp.open_package('http://library.metatab.org/sandag.gov-crime-2007e2013-2.1.4.csv')

# Create Dataframes
incidents_gdf = pkg.resource('incidents').geoframe()
i5y_sample_gdf = pkg.resource('i5y_sample').geoframe()