San Diego Business Clusters

Businesses in San Diego, linked to entertainment clusters and population density.

sandiegodata.org-business_clusters-1.1.5. Modified 2021-03-21T19:41:52


Documentation

This dataset processes the City of San Diego Master Business File to add geocoded addresses and links to business clusters. San Diego publishes two lists of businesses, both based on payment of the San Diego City business tax: the Master Business File and a SANGIS file that includes geographic information. Unfortunately, these files are quite different and cannot be linked. The SANGIS file is oriented toward the tax assessor's parcel that the business occupies, while the Master Business List has account numbers and addresses, and there is no common key between the files.

The files in this package add address geocodes to the Master Business List and link the businesses to clusters of businesses. The clusters are created by collecting nearby businesses from OpenStreetMap (OSM) data. The cluster types are:

  • NA: No cluster, 31787 businesses
  • shop: OSM tags ‘shop’, ‘clothes’, ‘supermarket’, ‘bank’, ‘laundry’, ‘parking’, 14615 businesses
  • ent: Entertainment, OSM tags ‘cafe’, ‘restaurant’, ‘bar’, 14320 businesses
  • casual: Fast food and convenience stores, OSM tags ‘fast_food’, ‘convenience’, 10991 businesses

The sd_business_clusters file has the clusters and their WKT geometries. The sd_custered_businesses file links San Diego businesses to clusters. A single business may be in more than one cluster because clusters of different types overlap.
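Because a business can appear in several overlapping clusters, sd_custered_businesses may hold more than one row per account, so row counts and business counts can differ. The sketch below counts distinct accounts per cluster type. The rows are invented for illustration, and it assumes that an unclustered business carries an empty cluster_type; only the column names come from the data dictionary.

```python
from collections import defaultdict

# Hypothetical rows shaped like sd_custered_businesses; the values are
# invented for illustration and do not come from the dataset.
rows = [
    {"account": 1001, "cluster_n": 12, "cluster_type": "ent"},
    {"account": 1001, "cluster_n": 40, "cluster_type": "shop"},
    {"account": 1002, "cluster_n": 12, "cluster_type": "ent"},
    {"account": 1003, "cluster_n": None, "cluster_type": None},
]


def businesses_per_cluster_type(rows):
    """Count distinct accounts per cluster type; a missing type is reported as 'NA'."""
    accounts = defaultdict(set)
    for row in rows:
        accounts[row["cluster_type"] or "NA"].add(row["account"])
    return {ctype: len(accts) for ctype, accts in accounts.items()}


counts = businesses_per_cluster_type(rows)
distinct_accounts = len({row["account"] for row in rows})
```

With a pandas DataFrame, a comparable count is df.groupby('cluster_type')['account'].nunique().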

NAICS Codes

It appears that the NAICS codes used in the Master Business List are vintage 2007. The code ‘72221’ appears frequently; it is valid in 2007 NAICS, but not in 2012 or 2017 NAICS.

Geocoding

The geocoding was performed with a local installation of Pelias. There are some notable errors in the geocoding. For instance, Ba Ho Liquor and Deli, with address of ‘4031 AVATI DR SUITE I SAN DIEGO 92117-4403, CA’, was geocoded to 4144 Avati, moving the location from a neighborhood mini-mall to a residence. It is unknown how many such errors there are, so use the geocodes with caution.

Data Dictionary

sb_mbl

Column Name     Data Type  Description
business_acct   integer
dba_name        string
ownership_type  string
address         string
city            string
zip             string
state           string
business_phone  string
owner_name      string
creation_dt     date
start_dt        date
exp_dt          date
naics           integer    2007 NAICS code
activity_desc   string

sd_business_clusters

Column Name     Data Type  Description
cluster_n       integer    Cluster number
cluster_type    string     Cluster type: ent, shop, or casual
geometry        string

sd_businesses

Column Name     Data Type  Description
account         integer
gc_address      string     Address used for geocoding
lat             number     Geocoded latitude
lon             number     Geocoded longitude
dba_name        string
ownership_type  string
creation_dt     date
start_dt        date
exp_dt          date
owner_name      string
naics           integer
activity_desc   string
geometry        string     Geocoded position, in WKT format
geoid           string     Geoid of the Census block group that contains the business point.
pop             integer    Population of the block group, from the 2019 5-year ACS
area            integer    Area of the block group in square meters.
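
The pop and area columns can be combined into a population density for the block group that contains each business. The following is a minimal sketch, assuming area is in square meters as described above; the function name and the example numbers are hypothetical.

```python
def density_per_km2(pop, area_m2):
    """People per square kilometer; None when the area is missing or zero."""
    if not area_m2:
        return None
    # 1 square kilometer = 1,000,000 square meters
    return pop / (area_m2 / 1_000_000)


# Hypothetical block group: 1,500 residents in 500,000 square meters.
example = density_per_km2(1500, 500_000)
```

The value is the density of the surrounding block group, not of the business location itself, and it inherits any error in the geocoded point.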

sd_custered_businesses

Column Name     Data Type  Description
account         integer
gc_address      string
lat             number
lon             number
geoid           string
pop             integer
area            integer
dba_name        string
ownership_type  string
creation_dt     date
start_dt        date
exp_dt          date
owner_name      string
naics           integer
activity_desc   string
cluster_n       integer    Cluster number
cluster_type    string     Cluster type: ent, shop, or casual
geometry        string     Point location of the business, in WKT format

naics

Column Name     Data Type  Description
account         integer    Business account number
naics           integer    Full NAICS code
naics_2         integer    2 digit NAICS prefix
naics_3         integer    3 digit NAICS prefix
naics_4         integer    4 digit NAICS prefix
naics_5         integer    5 digit NAICS prefix
naics_6         integer    6 digit NAICS prefix
naics_desc      string     Description of the NAICS code
naics_2_desc    string
naics_3_desc    string
naics_4_desc    string
naics_5_desc    string
naics_6_desc    string
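
The prefix columns are the leading digits of the full NAICS code. As a rough illustration, the sketch below derives them from a single code. The function name is hypothetical, and returning None for prefixes longer than the code is an assumption; how the package itself fills those columns for short codes is not documented here.

```python
def naics_prefixes(code):
    """Return the 2- to 6-digit prefixes of a NAICS code as integers.

    Assumption: a prefix longer than the code itself is reported as None.
    """
    digits = str(code)
    return {
        f"naics_{n}": int(digits[:n]) if len(digits) >= n else None
        for n in range(2, 7)
    }


# '72221' is the 5-digit code mentioned in the NAICS Codes section.
prefixes = naics_prefixes(72221)
```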

References

URLs used in the creation of this data package.

  • index:civicknowledge.com-osm-demosearch-2.1.1#business_clusters. US business clusters
  • sd_businesses_ak. San Diego Businesses A-K
  • sd_businesses_lz. San Diego Businesses L-Z
  • metapack+http://library.metatab.org/sangis.org-business_sites.csv#business_sites. San Diego Business locations, from SANGIS
  • metapack+http://library.metatab.org/sandiegodata.org-geography-2018-13.csv#sd_county_boundary. San Diego County Geo boundary
  • naics_index_2007. NAICS index file, 2007.
  • naics_index_2007_26. NAICS index file, 2007, 2 to 6 digit codes
  • censusgeo://2019/5/CA/blockgroup. CA Census block groups
  • census://2019/5/CA/blockgroup/B01003. Total population by block group

Packages

Accessing Data in Vanilla Pandas

import pandas as pd

sb_mbl_df = pd.read_csv('http://library.metatab.org/sandiegodata.org-business_clusters-1.1.5/data/sb_mbl.csv')
sd_business_clusters_df = pd.read_csv('http://library.metatab.org/sandiegodata.org-business_clusters-1.1.5/data/sd_business_clusters.csv')
sd_businesses_df = pd.read_csv('http://library.metatab.org/sandiegodata.org-business_clusters-1.1.5/data/sd_businesses.csv')
sd_custered_businesses_df = pd.read_csv('http://library.metatab.org/sandiegodata.org-business_clusters-1.1.5/data/sd_custered_businesses.csv')
naics_df = pd.read_csv('http://library.metatab.org/sandiegodata.org-business_clusters-1.1.5/data/naics.csv')
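
When the CSV files are read with plain pandas, the geometry columns arrive as WKT text and not as geometry objects. The sketch below pulls the coordinates out of a point string using only the standard library. It assumes the business geometry is written as `POINT (x y)` with longitude first, which is the usual WKT convention but has not been verified against these files. The function name and sample coordinates are hypothetical.

```python
import re

# Matches WKT such as "POINT (-117.16 32.71)"; other geometry types and
# scientific notation are not handled.
_POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def parse_wkt_point(wkt):
    """Return (x, y) from a WKT POINT string, or None if it does not match."""
    if not isinstance(wkt, str):
        return None  # e.g. a missing value read from the CSV
    match = _POINT_RE.match(wkt)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


# Hypothetical coordinates, for illustration only.
lon, lat = parse_wkt_point("POINT (-117.16 32.71)")
```

For the cluster polygons in sd_business_clusters, a full WKT parser such as shapely.wkt.loads is a better fit than a regular expression.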

Accessing Package in Metapack

import metapack as mp
pkg = mp.open_package('http://library.metatab.org/sandiegodata.org-business_clusters-1.1.5.csv')

# Create Dataframes
sb_mbl_df = pkg.resource('sb_mbl').dataframe()
sd_business_clusters_gdf = pkg.resource('sd_business_clusters').geoframe()
sd_businesses_gdf = pkg.resource('sd_businesses').geoframe()
sd_custered_businesses_gdf = pkg.resource('sd_custered_businesses').geoframe()
naics_df = pkg.resource('naics').dataframe()