San Diego Downtown Homeless Computer Vision Package

Files and code for analyzing San Diego downtown homelessness data with computer vision

sandiegodata.org-downtown_cv-5

Resources | Packages | Documentation | Contacts | Data Dictionary

Resources

Documentation

This dataset collects records from the conversion of five years of paper maps that record the positions of homeless sleepers in downtown San Diego. The San Diego Regional Data Library is converting these paper maps to digital form with a manual process that uses an image annotation tool, and these annotations can be used to train computer vision algorithms to georeference maps and recognize handwritten marks.

These datasets link to map URLs and to three kinds of annotations:

  • Ground Control Points, which identify the map image locations for known intersections, linking image coordinates (in pixels) to geographic coordinates.
  • Image locations of handwritten marks and the number written in the mark.
  • File annotations, for other handwritten notes such as the temperature and presence of rain.

More Information:

  • Blog Post. For more discussion about the GCP and handwritten marks, and the tasks involved in developing computer vision algorithms for these data, see our recent blog post on the subject.
  • Clustering Notebook. For some examples of using OpenCV to extract and match templates in order to georeference maps, see the Templates and Clustering Jupyter Notebook.
  • Extract Marks Notebook. For examples of extracting (but not recognizing) handwritten marks, see this notebook.

Developer notes

After annotation JSON files are copied into S3, the list of S3 URLs must be updated. To refresh the list of URLs, run:

$  bin/update_s3.sh <s3-profile>

Contacts

Packages

Accessing Packages in Metapack

import metapack as mp
pkg = mp.open_package('http://library.metatab.org/sandiegodata.org-downtown_cv-5.zip')

# Create Dataframes

gcp_df = pkg.resource('gcp').dataframe()
intersection_regions_df = pkg.resource('intersection_regions').dataframe()
intersections_gdf = pkg.resource('intersections').geoframe()

file_annotations_df = pkg.resource('file_annotations').dataframe()
counts_df = pkg.resource('counts').dataframe()

Data Dictionary

gcp | intersections | intersection_regions | file_annotations | counts

gcp

Column Name | Data Type | Description
image_url | string | Map image URL
x | integer | X position of upper left of region rectangle, in pixels
y | integer | Y position of upper left of region rectangle, in pixels
width | integer | Width of selection region rectangle, in pixels
height | integer | Height of selection region rectangle, in pixels
intersection | string | Name of intersection
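
Each gcp row describes a rectangle by its upper-left corner and size. A minimal sketch of reducing that rectangle to a single pixel location follows. It assumes the center of the rectangle is the point of interest, which the source does not state, and the row values and intersection name are hypothetical.

```python
def region_center(x, y, width, height):
    """Return the (cx, cy) center of a region rectangle, in pixels."""
    return (x + width / 2, y + height / 2)


# Hypothetical gcp row, for illustration only; not taken from the dataset
row = {
    "x": 400,
    "y": 1200,
    "width": 60,
    "height": 40,
    "intersection": "Example St & Example Ave",
}

center = region_center(row["x"], row["y"], row["width"], row["height"])
print(row["intersection"], center)
```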

intersections

Column Name | Data Type | Description
geometry | string | WKT format geometry of intersection point
neighborhood | string | Neighborhood intersection is in
intersection | string | Name of intersection

intersection_regions

Column Name | Data Type | Description
image_url | string | URL to a map image
neighborhood | string | Name of the neighborhood for the maps
year | integer | Year portion of the data collection date
month | integer | Month portion of the data collection date
intersections_id | string | A string composed of the names of the four intersections
intersection_group | string | A name, based on the neighborhood, that identifies distinct intersections_id strings
map_name | string | A name based on the neighborhood and map changes in 2016 and 2017
source_inv | string | The intersection polygon, formed from the intersection points, in WKT format, in the pixel coordinate space. This version is inverted, with the Y coordinate subtracted from 2000, so the orientation of the Y axis is the same as in the EPSG:2230 geographic coordinate space.
source | string | Like source_inv, but the Y axis is not inverted, so the coordinates are the same as in the image
source_area | number | Area of source shape, in square pixels
source_shape | string | (X,Y) shape of source polygon bounding box
source_shape_x | integer | X value of source_shape
source_shape_y | integer | Y value of source_shape
dest | string | The intersection polygon, but in EPSG:2230 (State Plane 6, California, feet) coordinates
matrix | string | An affine transformation matrix that transforms from the coordinates of source_inv to dest. When pixel locations are properly inverted, this matrix transforms from pixel locations to geographic locations.
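
The matrix column can be used to move a pixel location into EPSG:2230 once the Y coordinate is inverted. The sketch below shows the arithmetic only. It assumes the matrix has already been parsed from its string form into a 2x3 row layout [[a, b, c], [d, e, f]], which the source does not specify, and the matrix values are hypothetical. The constant 2000 comes from the source_inv description.

```python
IMAGE_HEIGHT = 2000  # source_inv subtracts the Y coordinate from 2000


def pixel_to_geo(x, y, matrix):
    """Map an image pixel (x, y) to geographic coordinates.

    `matrix` is assumed to be a 2x3 affine matrix [[a, b, c], [d, e, f]];
    the layout stored in the `matrix` column may differ.
    """
    (a, b, c), (d, e, f) = matrix
    y_inv = IMAGE_HEIGHT - y  # invert Y to match the source_inv coordinates
    geo_x = a * x + b * y_inv + c
    geo_y = d * x + e * y_inv + f
    return (geo_x, geo_y)


# Hypothetical matrix (a uniform scale plus an offset), for illustration only
example_matrix = [
    [2.0, 0.0, 6280000.0],
    [0.0, 2.0, 1830000.0],
]

geo = pixel_to_geo(100, 1500, example_matrix)
print(geo)
```

Pixel locations taken from the source column, or directly from the image, need the Y inversion shown here; locations already in source_inv coordinates do not.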

file_annotations

Column Name | Data Type | Description
image_url | string | URL to a map image
url_year | integer | Year, from URL
url_month | integer | Month, from URL
date | datetime | Date, from file annotation, or from URL if the annotation is empty
neighborhood | string | Neighborhood
url_neighborhood | string | Neighborhood, from URL
total_count | number | Total count of handwritten marks, or possibly the processed value, with the structure and vehicle counts multiplied by conversion factors
temp | integer | Temperature, if it was given on the map
rain | string | Rain, if it was recorded on the map

counts

Column Name | Data Type | Description
image_url | string | Map image URL
cx | integer | X value of the center of the circle region, in pixels
cy | integer | Y value of the center of the circle region, in pixels
r | integer | Radius of the circle region, in pixels
type | string | Type of sleeper: Individual, Vehicle or Structure
count | string | Count of sleepers
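
Because count is stored as a string, it has to be converted before it can be summed. The following is a minimal sketch of a per-type tally. It assumes every count value is a plain integer string, which may not hold for all annotations, and the rows shown are hypothetical.

```python
from collections import defaultdict


def total_by_type(rows):
    """Sum the `count` field per sleeper `type`, converting strings to int."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["type"]] += int(row["count"])
    return dict(totals)


# Hypothetical counts rows, for illustration only; not taken from the dataset
rows = [
    {"type": "Individual", "count": "3"},
    {"type": "Vehicle", "count": "1"},
    {"type": "Individual", "count": "2"},
]

totals = total_by_type(rows)
print(totals)
```

Rows in this shape could be produced from the counts dataframe with `counts_df.to_dict('records')`. This tally does not apply the conversion factors mentioned for total_count.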

Last Modified 2019-09-13T04:52:40
