Processed crime incidents for San Diego county, 2007-2013. Of our crime datasets, this one is most useful for analysis.
sandag.gov-crime-2007e2013-2.1.4. Modified 2020-11-26T06:39:42
Resources
- incidents. San Diego county crime incidents, 2007 to 2013
- i5y_sample. A sample of 10,000 incidents, from years 2008 to 2012
Documentation
Processed crime incidents, based on data supplied by SANDAG.
Important:
- See Caveats for limitations and warnings regarding this data.
- Use of this data is subject to multiple terms and conditions. See Terms for details.
This dataset includes geocoded crime incidents from 1 Jan 2007 to 31 March 2013 that were returned by SANDAG for Public Records request 12-075.
Asr_zone Codes
The integer values in the asr_zone field are taken directly from the SANGIS parcel data. These values are:
╔════╦═════════════════════════════════╗
║ -1 ║ Unset ║
║ 0 ║ Unzoned ║
║ 1 ║ Single family residential (R-1) ║
║ 2 ║ Minor multiple (R-2) ║
║ 3 ║ Restricted multiple (R-3) ║
║ 4 ║ Multiple residential (R-4) ║
║ 5 ║ Restricted commercial ║
║ 6 ║ Commercial ║
║ 7 ║ Industrial (M zone) ║
║ 8 ║ Agricultural ║
║ 9 ║ Special and/or misc. ║
╚════╩═════════════════════════════════╝
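When working with the asr_zone field in code, a small lookup table keeps the integer codes readable. The sketch below copies the codes from the table above. The names ASR_ZONE_LABELS and zone_label, and the fallback text for unlisted codes, are illustrative assumptions and not part of the data package.

```python
# Zone codes copied from the asr_zone table in this documentation.
# ASR_ZONE_LABELS and zone_label are hypothetical helper names.
ASR_ZONE_LABELS = {
    -1: "Unset",
    0: "Unzoned",
    1: "Single family residential (R-1)",
    2: "Minor multiple (R-2)",
    3: "Restricted multiple (R-3)",
    4: "Multiple residential (R-4)",
    5: "Restricted commercial",
    6: "Commercial",
    7: "Industrial (M zone)",
    8: "Agricultural",
    9: "Special and/or misc.",
}


def zone_label(code: int) -> str:
    """Return the label for an asr_zone code, or a fallback for unlisted codes."""
    return ASR_ZONE_LABELS.get(code, f"Unknown code {code}")


print(zone_label(6))
print(zone_label(-1))
```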
Addresses and Geocoding
SANDAG returns the position of incidents as a block address, and occasionally as an intersection. Block addresses are the original address of the incident, with the last two digits set to ’00’.
Before geocoding, all of the original block addresses are normalized to be more consistent and to remove different versions of the same address. There are a few transformations performed on the address, including:
- Converting street type synonyms like ‘Avenue’, ‘Avenu’ and ‘ave.’ to standard abbreviations like ‘ave.’
- Converting street directions (‘West main Street’) to abbreviations like ‘W Main st’
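The two transformations listed above can be sketched roughly as follows. This is not the code used to build the dataset: the lookup tables, the function name normalize_block_address, and the exact output casing are assumptions chosen for illustration, and a real normalizer would need far larger tables.

```python
import re

# Hypothetical lookup tables; the tables used for this dataset are not published here.
STREET_TYPES = {
    "avenue": "ave", "avenu": "ave", "ave": "ave",
    "street": "st", "st": "st",
    "boulevard": "blvd", "blvd": "blvd",
}
DIRECTIONS = {"north": "N", "south": "S", "east": "E", "west": "W"}


def normalize_block_address(address: str) -> str:
    """Abbreviate a leading direction word and a trailing street type."""
    # Treat periods and commas as separators so 'ave.' matches 'ave'.
    tokens = re.sub(r"[.,]", " ", address).split()
    out = []
    for i, tok in enumerate(tokens):
        key = tok.lower()
        is_last = i == len(tokens) - 1
        if key in STREET_TYPES and is_last:
            out.append(STREET_TYPES[key])
        elif key in DIRECTIONS and i < len(tokens) - 2:
            # Only abbreviate when a street name and type still follow,
            # so a street actually named 'North' is left alone.
            out.append(DIRECTIONS[key])
        else:
            out.append(tok.title() if tok.isalpha() else tok)
    return " ".join(out)


print(normalize_block_address("100 West main Street"))
print(normalize_block_address("500 FIFTH AVENU"))
```

Different spellings of the same block then collapse to one string, which is what removes the duplicate versions of an address before geocoding.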
Many geocoders are designed to work with mailable addresses, and block addresses are not real postal addresses. This data is geocoded with custom code that uses the SANGIS streets database, matching the block addresses to a street segment. This produces more sensible results, because the crime is attributed to an entire block, rather than to an arbitrary point on the block. However, when the crime is represented as a point, it will appear at the center of the street segment, usually in the middle of the block.
This means that all of the incidents on a block will appear at a single location. In most GIS programs, it is difficult to see that there are actually many points in one place. Be aware that each point you see may actually be dozens of incidents.
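One way to see how many incidents are stacked on each point is to count records per coordinate pair. The sketch below uses made-up coordinates in place of the lon and lat columns; the variable names are illustrative.

```python
from collections import Counter

# Made-up (lon, lat) pairs standing in for the lon and lat columns.
points = [
    (-117.1611, 32.7157),
    (-117.1611, 32.7157),
    (-117.1611, 32.7157),
    (-117.2500, 32.8000),
]

# Identical coordinates mean the incidents were geocoded to the same segment center.
incidents_per_point = Counter(points)

for (lon, lat), n in incidents_per_point.most_common():
    print(f"{lat:.4f}, {lon:.4f}: {n} incidents")
```

The resulting counts can be used to size or color map symbols so that a single visible point is not mistaken for a single incident.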
The files that SANDAG returned included 1,008,524 incident records, of which 953,824 (95%) were geocoded. The ‘gctype’ field has a value of NONE when the record was not geocoded, and any field that depends on a location, such as x, y, lon, lat, segment_id, community, and others, will have default values.
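The geocoding rate can be checked from the two record counts, and ungeocoded records can be dropped before spatial work by testing gctype. In the sketch below the sample rows are made up, and PLACEHOLDER stands in for whatever gctype values geocoded records carry, since only NONE is documented here.

```python
import csv
import io

TOTAL_RECORDS = 1_008_524      # records returned by SANDAG
GEOCODED_RECORDS = 953_824     # records that were geocoded

geocoded_share = GEOCODED_RECORDS / TOTAL_RECORDS
not_geocoded = TOTAL_RECORDS - GEOCODED_RECORDS
print(f"{geocoded_share:.1%} geocoded, {not_geocoded:,} not geocoded")

# Made-up rows. 'PLACEHOLDER' is an assumed stand-in for any non-NONE gctype.
SAMPLE_CSV = """address,gctype
100 W MAIN ST,PLACEHOLDER
200 E EXAMPLE AVE,NONE
300 N SAMPLE BLVD,PLACEHOLDER
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Keep only records that have a usable location.
geocoded_rows = [row for row in rows if row["gctype"] != "NONE"]
print(len(geocoded_rows), "of", len(rows), "sample rows are geocoded")
```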
Caveats
As with most crime data, there are many issues, limitations and problems that users must be aware of to avoid drawing incorrect conclusions.
Crime incident data is inherently problematic. Crime incident reports are collected by busy officers in stressful situations who are trying to describe complex situations with rigid categories. Virtually every point of the data collection process has multiple opportunities for errors and few opportunities for correction after the fact. Analysts must consider the difficulties of collecting crime data when assessing the validity of any conclusions.
Data is collected by 19 different agencies. While the data is all sourced from SANDAG, it originates with 19 different police departments. These departments may have different policies that can result in different categorizations for the same crime, and they may have different emphases on which crimes they pursue.
Many incidents at a single point. Because all of the crimes on a block are geocoded to the middle of the block, many incidents will appear as a single point.
5% of crimes are not geocoded. GIS users should consider that about 5% of the incidents were not properly geocoded, and are not included in the shapefiles. These crimes appear in the CSV files, and can be included in time series analysis, but they will not be available for spatial analysis.
Times and dates are often unreliable. Times and dates for many incidents are unreliable, with times being more unreliable than dates.
- Property crimes that occur while the owner is away may be recorded with the time a responsible person left the property, the time someone arrived at the property and discovered the crime, or the midpoint between the two. There is no information available to select among these possibilities, so these incidents have very unreliable times.
- Because the time is unreliable, so is the date for crimes that occurred at night.
- Times may not have been recorded in the original report. These times may be entered as midnight, or as another time.
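Because a missing time may have been entered as midnight, it can be useful to flag records whose time is exactly 00:00:00 and treat them with extra suspicion in time-of-day analysis. The sketch below assumes the datetime field parses with Python's datetime.fromisoformat; the sample timestamps and the helper name is_midnight are made up.

```python
from datetime import datetime, time


def is_midnight(iso_datetime: str) -> bool:
    """True when the timestamp's time of day is exactly 00:00:00."""
    return datetime.fromisoformat(iso_datetime).time() == time(0, 0, 0)


# Made-up timestamps standing in for the datetime column.
samples = [
    "2010-05-01T00:00:00",
    "2010-05-01T14:30:00",
    "2011-12-31T00:00:00",
]

# A midnight time is not proof of a missing time, only a reason to look closer.
suspect = [s for s in samples if is_midnight(s)]
print(suspect)
```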
Multiple crime incidents may not have all crimes recorded. If a single person is charged with multiple violations for a single arrest, departments may enter only the most serious charge, the last charge, or all of the charges. There is no information to disambiguate these possibilities.
Locations may be unreliable. Crimes that involve pursuits or violations committed at multiple locations may be recorded at any of many different locations. When the location is ambiguous, it is common for incidents to have the address recorded as the location where the arrested person was charged. Because of this, the highest crime block in San Diego is the downtown police station. Check high crime locations to ensure they are not police stations.
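A quick way to find the locations worth checking is to rank block addresses by incident count and review the top few by hand. The addresses below are made up, and top_blocks is a hypothetical helper name.

```python
from collections import Counter

# Made-up block addresses standing in for the address column.
addresses = [
    "100 EXAMPLE ST",
    "100 EXAMPLE ST",
    "100 EXAMPLE ST",
    "2500 SAMPLE AVE",
    "2500 SAMPLE AVE",
    "900 PLACEHOLDER BLVD",
]


def top_blocks(block_addresses, n=3):
    """Return the n most frequent block addresses with their incident counts."""
    return Counter(block_addresses).most_common(n)


for address, count in top_blocks(addresses, n=2):
    print(f"{count:>4}  {address}")
```

Any address near the top of such a list should be compared against known police station locations before it is reported as a crime hot spot.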
Dataset Statistics
Number of incidents by year:
year count
---------- ----------
2007 186014
2008 178445
2009 163646
2010 160133
2011 147270
2012 141318
2013 31699
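The 2013 count covers only 1 January to 31 March, so it should not be compared directly with the full years. A small sketch of the year-over-year arithmetic using the counts from the table above; the variable and function names are illustrative.

```python
# Counts copied from the table above.
COUNTS_BY_YEAR = {
    2007: 186014,
    2008: 178445,
    2009: 163646,
    2010: 160133,
    2011: 147270,
    2012: 141318,
    2013: 31699,  # partial year: data ends 31 March 2013
}

# Exclude the partial year before computing changes.
FULL_YEARS = [year for year in sorted(COUNTS_BY_YEAR) if year != 2013]


def pct_change(old: int, new: int) -> float:
    """Percent change from old to new."""
    return (new - old) / old * 100


year_over_year = {
    year: pct_change(COUNTS_BY_YEAR[year - 1], COUNTS_BY_YEAR[year])
    for year in FULL_YEARS[1:]
}
overall = pct_change(COUNTS_BY_YEAR[2007], COUNTS_BY_YEAR[2012])

for year, change in year_over_year.items():
    print(f"{year}: {change:+.1f}%")
print(f"2007 to 2012: {overall:+.1f}%")
```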
Crime types, from the “type” field, and the number of incidents of each type:
type count
------------------------ ----------
DRUGS/ALCOHOL VIOLATIONS 230462
THEFT/LARCENY 138030
VEHICLE BREAK-IN/THEFT 123955
MOTOR VEHICLE THEFT 97498
BURGLARY 91695
VANDALISM 83912
ASSAULT 70687
DUI 58311
FRAUD 55219
ROBBERY 22685
SEX CRIMES 22281
WEAPONS 11117
ARSON 2145
HOMICIDE 528
Incidents by city:
name code count
------------------------- ---------- ----------
San Diego SndSAN 401787
S.D. County SndSDO 342282
Oceanside SndOCN 44022
Chula Vista SndCHU 38387
Escondido SndESC 26079
Vista SndVIS 20044
Carlsbad SndCAR 18330
La Mesa SndLAM 17871
El Cajon SndELC 16548
National City SndNAT 16509
San Marcos SndSNM 14230
Santee SndSNT 12328
Encinitas SndENC 12302
Poway SndPOW 8565
Imperial Beach SndIMP 5442
Del Mar SndDEL 4876
Lemon Grove SndLEM 4198
Coronado SndCOR 2466
Solana Beach SndSOL 2259
Citation
Name of file, extracted from clarinova.com-crime-incidents-casnd-7ba4. San Diego Regional Data Library. 2013-08-07 http://www.sandiegodata.org
Terms
This data is released under the following terms and conditions.
Clarinova and the San Diego Regional Data Library disclaim any warranty for this data and shall not be liable for loss or harm. See the SDRDL Disclaimers and Limitations web page for complete details.
This data is based on data from SANGIS, which is subject to its own terms and conditions. See the SANGIS Legal Notice for details.
This data is based on data from SANDAG, which is subject to its own terms and conditions. See the SANDAG Legal Notice for details.
Documentation Links
Contacts
- Wrangler
Data Dictionary
incidents
Column Name | Data Type | Description |
---|---|---|
datetime | datetime | date and time in ISO format |
year | integer | Four digit year. |
month | integer | Month number extracted from the date |
day | integer | Day number, starting from Jan 1, 2000 |
week | integer | ISO week of the year |
dow | integer | Day of week, as a number. 0 is Sunday |
time | datetime | Time, in H:MM:SS format |
hour | integer | Hour number, extracted from the time |
is_night | integer | 1 if time is between dusk and dawn, rounded to nearest hour. All comparisons are performed against the dusk and dawn times for the 15th of the month. |
category | string | Crime category, provided by SANDAG *This is the short crime type* |
address | string | Block address, street and city name |
city | string | City name |
segment_id | integer | Segment identifier from SANGIS road network data. |
nbrhood | string | Neighborhood name |
community | string | Community Name for San Diego, city name outside of San Diego |
comm_pop | integer | Population of the community area, from the 2010 Census |
council | string | San Diego City council |
council_pop | integer | Population of San Diego city council |
geoid | string | Tract geoid |
asr_zone | integer | Assessor’s zone code for nearest parcel. |
lampdist | integer | Distance to nearest streetlamp in centimeters |
lat | number | Latitude, provided by the geocoder. |
lon | number | Longitude, provided by the geocoder. |
desc | string | Long description of incident. |
gctype | string | |
gcquality | integer | |
geometry | string | |
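Several of the date fields follow conventions that differ from common library defaults. For example, dow uses 0 for Sunday, while Python's weekday() uses 0 for Monday. The sketch below shows one way to derive matching values from an ISO timestamp and to convert lampdist to meters. It is an illustration based on the descriptions in the table, not the code used to build the dataset, and the function names are hypothetical.

```python
from datetime import datetime


def derive_date_fields(iso_datetime: str) -> dict:
    """Derive year, month, ISO week, dow (0 = Sunday) and hour from a timestamp."""
    dt = datetime.fromisoformat(iso_datetime)
    return {
        "year": dt.year,
        "month": dt.month,
        "week": dt.isocalendar()[1],   # ISO week of the year
        "dow": dt.isoweekday() % 7,    # isoweekday: Mon=1..Sun=7, so Sunday becomes 0
        "hour": dt.hour,
    }


def lampdist_meters(lampdist_cm: int) -> float:
    """Convert the lampdist value from centimeters to meters."""
    return lampdist_cm / 100


print(derive_date_fields("2013-03-31T22:15:00"))
print(lampdist_meters(1250))
```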
References
URLs used in the creation of this data package.
- file:data/incidents-2007.csv.zip#&encoding=ascii. San Diego crime incidents 2007
- file:data/incidents-2012.csv.zip#&encoding=ascii. San Diego crime incidents 2012
- file:data/incidents-2013.csv.zip#&encoding=ascii. San Diego crime incidents 2013
- file:data/incidents-2011.csv.zip#&encoding=ascii. San Diego crime incidents 2011
- file:data/incidents-2008.csv.zip#&encoding=ascii. San Diego crime incidents 2008
- file:data/incidents-2009.csv.zip#&encoding=ascii. San Diego crime incidents 2009
- file:data/incidents-2010.csv.zip#&encoding=ascii. San Diego crime incidents 2010
- metapack+http://library.metatab.org/sandiegodata.org-demographics-tract-1.1.1.csv. Demographics data package
Packages
- s3 s3://library.metatab.org/sandag.gov-crime-2007e2013-2.1.4.csv
- csv http://library.metatab.org/sandag.gov-crime-2007e2013-2.1.4.csv
- source https://github.com/metatab-packages/sandag.gov-crime-2007e2013.git
Accessing Data in Vanilla Pandas
import pandas as pd
# Read each resource directly from the package's CSV data files
incidents_df = pd.read_csv('http://library.metatab.org/sandag.gov-crime-2007e2013-2.1.4/data/incidents.csv')
i5y_sample_df = pd.read_csv('http://library.metatab.org/sandag.gov-crime-2007e2013-2.1.4/data/i5y_sample.csv')
Accessing Package in Metapack
import metapack as mp
pkg = mp.open_package('http://library.metatab.org/sandag.gov-crime-2007e2013-2.1.4.csv')
# Create Dataframes
incidents_gdf = pkg.resource('incidents').geoframe()
i5y_sample_gdf = pkg.resource('i5y_sample').geoframe()