About this document

This project was prepared as part of an independent study in the Master of Urban Spatial Analytics program, directed by Professor Ken Steif. It is a proof of concept for using deep learning techniques in R, specifically image classification with Keras.

Abstract

Philadelphia’s Zero Waste plan aims to reduce litter and waste in the city by 2035. One hurdle to overcome is locating dispersed and unmonitored trash cans, which often accumulate trash without being emptied. Using the City’s dataset of wire trash cans, I created an image set of trash cans around Philadelphia. I combined this set with a set of images without trash cans to train a convolutional neural network using the Keras library to classify images as containing a trash can or not, with an accuracy of 92%.

1: Introduction

Philadelphia has a litter problem that needs solving. The City implemented Zero Waste Philadelphia in 2009. This program aims “to fully eliminate the use of landfills and conventional incinerators by 2035.” One mission of this initiative is to reduce litter throughout the city, yet until recently little had been done to meet this challenge. In February 2018, the City released a new block-by-block litter index which helps visualize the widespread litter problem, as shown in the map below, with blue points representing currently located trash cans. The data was collected by surveyors who rated blocks from 1 to 4, where a score of 1 indicated no litter and a score of 4 indicated the need for heavy machinery to remove trash. The City is using this data to direct street cleaners more effectively.

Map of Philadelphia

While this data is a step in the right direction towards alleviating litter on the streets, it does not necessarily address the issue of trash buildup in public waste bins. A recent Philly.com opinion piece expounds on the litter problem caused by unemptied trash cans, noting that even the Big Belly trash cans, which are supposed to alert the City three times weekly, often fail.

Even more concerning is that the City has no database to keep track of most trash can locations, save for the Big Belly cans and some wire bins. Without a robust geolocated database, it would be hard to expect the City to create a systematic method for emptying trash cans.

I decided to take on a project using publicly available Google Street View images and a simple deep learning model built with Keras to classify street corners as having trash cans or not, in an attempt to help the City better understand its trash can locations for the purposes of litter reduction. The Keras method uses computer vision to examine an image and classify whether it contains an object the model has been trained to detect; in this case, the presence of trash cans. The model infers an object’s presence by generalizing the object to a bounding box learned from a training set. This bounding box is then applied to a test set, and the model generates a probability for each image which rates its likelihood of containing the object. I hypothesized that Google Street View images would be of a high enough quality for the model to learn what a trash can looks like and classify its presence in an image.

2: Methods

I relied heavily on the googleway and keras packages to create my models.

I used the City’s open data wire trash cans dataset to generate images at coordinates that ostensibly contained trash cans. These points are represented on the map above in blue. Here is an example of what one of these trash cans looks like when pulled from Google Street View.

As you can see in the map above, the trash cans are mostly distributed outside of Center City. Below is a table showing the number of trash cans per neighborhood (a code sketch for reproducing a count like this follows the table).

##                   Neighborhood Number_Of_trashcans
## 1                    Frankford                  46
## 2                Oxford Circle                  22
## 3                     Lawndale                  18
## 4                   Rhawnhurst                  18
## 5                       Tacony                  17
## 6                North Central                  15
## 7                  Wissinoming                  14
## 8                 Hunting Park                  13
## 9                    Hartranft                  11
## 10                     Mayfair                  10
## 11                    Burholme                   8
## 12                    Nicetown                   8
## 13                  Summerdale                   8
## 14            Upper Kensington                   8
## 15                 Feltonville                   7
## 16                  Harrowgate                   7
## 17                     Elmwood                   6
## 18               Franklinville                   6
## 19                  Holmesburg                   6
## 20                 Kingsessing                   6
## 21               West Oak Lane                   6
## 22                     Belmont                   5
## 23                  Cedar Park                   5
## 24                    Richmond                   5
## 25                 Cobbs Creek                   4
## 26             East Germantown                   4
## 27                   Fox Chase                   4
## 28                       Logan                   4
## 29                    Paschall                   4
## 30        Southwest Germantown                   4
## 31             Haverford North                   3
## 32                  Roxborough                   3
## 33     West Central Germantown                   3
## 34              Allegheny West                   2
## 35             Bartram Village                   2
## 36               Dearnley Park                   2
## 37                   Fern Rock                   2
## 38         Germantown - Morton                   2
## 39      Germantown - Penn Knox                   2
## 40                     McGuire                   2
## 41                   Northwood                   2
## 42        Southwest Schuylkill                   2
## 43                     Stanton                   2
## 44             West Kensington                   2
## 45                 Brewerytown                   1
## 46                Carroll Park                   1
## 47               Chestnut Hill                   1
## 48                  East Falls                   1
## 49             East Mount Airy                   1
## 50                 East Poplar                   1
## 51 Fishtown - Lower Kensington                   1
## 52       Germantown - Westside                   1
## 53                Germany Hill                   1
## 54           Graduate Hospital                   1
## 55                      Mantua                   1
## 56          Northern Liberties                   1
## 57                      Ogontz                   1
## 58                       Olney                   1
## 59                Point Breeze                   1
## 60                   Sharswood                   1
## 61          Strawberry Mansion                   1
## 62                       Tioga                   1
## 63            Upper Roxborough                   1
## 64             West Mount Airy                   1
## 65                      Wister                   1
## 66                    Yorktown                   1
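As a rough sketch of how a table like this can be reproduced, the wire basket points can be spatially joined to a neighborhoods polygon layer and counted. The shapefile name (Neighborhoods.shp), its NAME field, and the basket_coords data frame below are hypothetical placeholders for illustration, not the exact objects used to build the table above:

# Sketch: count wire baskets per neighborhood with a spatial join using sf.
# "Neighborhoods.shp", its "NAME" field, and basket_coords (one row per basket
# with lon/lat columns) are assumed names, not the exact files used here.
hoods <- st_read("Neighborhoods.shp")
baskets_sf <- st_as_sf(basket_coords, coords = c("lon", "lat"), crs = 4326) %>%
  st_transform(st_crs(hoods))
baskets_per_hood <- st_join(baskets_sf, hoods) %>%
  st_drop_geometry() %>%
  count(NAME, sort = TRUE) %>%
  rename(Neighborhood = NAME, Number_Of_trashcans = n)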

It is clear that the trash cans cluster in the Northeast section of Philadelphia. After examining the images, I decided that this clustering was acceptable: the backdrops where the trash cans were located (industrial, commercial, residential, and wooded areas) were varied enough to provide the model with different landscapes in which trash cans could appear, which means the model would have to rely heavily on the bounding box of a trash can to classify its presence.

To create a dataset of images without trash cans, I used the street nodes shapefile to generate 24,706 images of street corners, locations which I hypothesized would be the most likely spots for trash cans. I selected 648 of those points without trash cans and combined them with the 352 trash can images to create a dataset of 1,000 images to train the model.
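The selection of the no-trash-can points can be done as a simple random sample of the street-node table; a minimal sketch (the seed is arbitrary, and this random draw is illustrative rather than the exact set of points I used):

# Sample 648 street-corner points to serve as the "no trash can" class;
# nodes4 is the street-node data frame built in the scraping code below.
set.seed(1234)
no_trash_points <- nodes4[sample(nrow(nodes4), 648), ]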

Below is the code I used to scrape the images from Google Streetview:

library(jsonlite)
library(sf)
library(ggplot2)
library(tidyverse)
library(googleway)
library(keras)
library(leaflet)
library(htmltools)
library(dplyr)
library(plotROC)
library(pROC)
library(gridExtra)
library(ROCR)

#install_keras() ##only need to run once

wire_baskets <- fromJSON(
  "http://data.phl.opendata.arcgis.com/datasets/5cf8e32c2b66433fabba15639f256006_0.geojson")
wire_baskets <- as.data.frame(wire_baskets$features)
wire_attri <- as.data.frame(wire_baskets$properties)
wire_geo <- as.data.frame(wire_baskets$geometry)
wire_baskets <- cbind(wire_attri,wire_geo)
coords <- as.data.frame(wire_baskets$coordinates)
coords <- t(coords)

nodes1 <- st_read("C:/Users/Evan Cernea/Box Sync/MUSA Independent Study/Street_Nodes.shp") 

nodes2 <- nodes1 %>%  
  select(geometry) %>% 
  # Note: st_coordinates()[,1] is X (longitude) and [,2] is Y (latitude), so the
  # "lat"/"lon" names below are swapped; the streetview call further down
  # compensates by passing c(lon, lat), which evaluates to c(latitude, longitude).
  mutate(lat = as.numeric(st_coordinates(nodes1)[,1])) %>% 
  mutate(lon = as.numeric(st_coordinates(nodes1)[,2]))
nodes3 <- nodes2 %>% 
  dplyr::select(-geometry)

nodes4 <- as.data.frame(nodes3)
#ggplot()+geom_point(data=nodes, aes(x=(st_coordinates(nodes)[,1]), y=(st_coordinates(nodes)[,2])))

# (Exploratory snippet: WasteBaskets_Wire refers to the raw wire-basket table;
# df is not used in the scraping loops below.)
df <- data.frame(lat = as.numeric(WasteBaskets_Wire$Y),
                 lon = as.numeric(WasteBaskets_Wire$X))
df$lat[1:nrow(df)]

# Note: this assumes coords has one row per basket with columns (lon, lat);
# google_streetview() expects location = c(lat, lon)
for (i in 1:nrow(coords)){
  mypath <- file.path("C:","Users","Evan Cernea","Box Sync","MUSA Independent Study",paste("trash can_", coords[i,2],"_",coords[i,1],".jpeg", sep = ""))
  jpeg(file=mypath)
  google_streetview(location = c(coords[i,2], coords[i,1]),
                    size = c(800,1200),
                    panorama_id = NULL,
                    heading = 90,
                    output = "plot",
                    fov = 120,
                    pitch = 0,
                    response_check = TRUE,
                    key = "AIzaSyAHPB8q_KueuMEglgzV9N9k_QLka2A_fPM")
  dev.off()
}

for (i in 1:nrow(nodes4)){
mypath <- file.path("C:","Users","Evan Cernea","Box Sync","MUSA Independent Study",paste("randomPoint_",nodes4$lon[i],"_",nodes4$lat[i],".jpeg", sep = ""))
jpeg(file=mypath)
google_streetview(location = c(nodes4$lon[i], nodes4$lat[i]),
                  size = c(800,1200),
                  panorama_id = NULL,
                  heading = 90,
                  output = "plot",
                  fov = 120,
                  pitch = 0,
                  response_check = TRUE,
                  key = "AIzaSyAHPB8q_KueuMEglgzV9N9k_QLka2A_fPM")
dev.off()
}

After successfully scraping these images, I had to create directories on my computer so that the model would know how the images were classified. The code below creates these directories from the RStudio console, although a user may choose to do this manually.

original_dataset_dir <- "C:/Users/Evan Cernea/Box Sync/MUSA Independent Study/All"

base_dir <- "C:/Users/Evan Cernea/Box Sync/MUSA Independent Study"
dir.create(base_dir)

train_dir <- file.path(base_dir, "train")
dir.create(train_dir)

validation_dir <- file.path(base_dir, "validation")
dir.create(validation_dir)

test_dir <- file.path(base_dir, "test")
dir.create(test_dir)

train_trash_dir <- file.path(train_dir, "trash")
dir.create(train_trash_dir)

train_notrash_dir <- file.path(train_dir, "notrash")
dir.create(train_notrash_dir)

validation_trash_dir <- file.path(validation_dir, "trash")
dir.create(validation_trash_dir)

validation_notrash_dir <- file.path(validation_dir, "notrash")
dir.create(validation_notrash_dir)

test_trash_dir <- file.path(test_dir, "trash")
dir.create(test_trash_dir)

test_notrash_dir <- file.path(test_dir, "notrash")
dir.create(test_notrash_dir)

3: Creating the model

To allow the model to know which images to look at and their original classifications (containing a trash can or not), you have to sort your files into the directories that were previously created. There are training, validation, and testing sets, each with trash and notrash subfolders.

The training set is the group of images that the model is given to learn what an image that contains a trash can looks like and what an image that does not contain a trash can looks like. In the case of Keras, knowing what a trash can “looks like” means understanding what bounding box defines a trash can.

The validation set is a group of images used to make sure that the model is not too heavily reliant on the training data to make its classifications (in statistical terms, overfitting). While the model is trained on the training data, it is simultaneously evaluated on the validation data.

The test set is the group of images that the model does not see initially. The model is then used to predict classifications of the images in the test set, using the information it learned from the training set.

This in-console copying did not work for me through R, so I did it manually. However, I am including the code here in the hopes that someone will be able to replicate it successfully:

fnames <- paste0("myplot_", 1:117, ".jpeg")
file.copy(file.path(original_dataset_dir, fnames), 
          file.path(train_trash_dir)) 

fnames <- paste0("myplot_",118:235, ".jpeg")
file.copy(file.path(original_dataset_dir, fnames), 
          file.path(validation_trash_dir))

fnames <- paste0("myplot_", 236:352, ".jpeg")
file.copy(file.path(original_dataset_dir, fnames),
          file.path(test_trash_dir))

fnames <- paste0("randomPoint_",1:214, ".jpeg")
file.copy(file.path(original_dataset_dir, fnames),
          file.path(train_notrash_dir))

fnames <- paste0("randomPoint_", 215:429, ".jpeg")
file.copy(file.path(original_dataset_dir, fnames),
          file.path(validation_notrash_dir)) 

fnames <- paste0("randomPoint_", 430:642, ".jpeg")
file.copy(file.path(original_dataset_dir, fnames),
          file.path(test_notrash_dir))

I relied heavily on Chapter 5 of the recently released book Deep Learning with R to set up my model. An excerpt of this chapter can be found here.

I used a model pretrained on ImageNet to guide my predictions. ImageNet is a large image database run by researchers at Stanford University that is widely used to train and benchmark computer vision classification models. The VGG16 model pretrained on ImageNet that I used is ideal because it has been trained on a large, varied dataset and can be reused for most classification problems. Using this feature extraction approach to deep learning allows for the highest accuracy in the shortest amount of time with the least processing power on both the computer and user side, especially if you do not have access to a GPU.

Below, I set up the convolutional base in conv_base, which is the feature extraction architecture the model uses to make its classifications from the data it is fed. I then create a sequential model that will classify the images, stacking a flattening layer, a dense hidden layer, and a sigmoid output on top of the base, a standard setup for binary image classification.

conv_base <- application_vgg16(
  weights = "imagenet",
  include_top = FALSE,
  input_shape = c(150, 150, 3)
)
model <- keras_model_sequential() %>% 
  conv_base %>% 
  layer_flatten() %>% 
  layer_dense(units = 256, activation = "relu") %>% 
  layer_dense(units = 1, activation = "sigmoid") 

It is then crucial to freeze the weights of the model’s convolutional base using the freeze_weights() function. This prevents the pretrained base’s weights from being updated, and its learned representations from being destroyed, while the new dense layers are trained on our data.

freeze_weights(conv_base)
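As an optional sanity check, you can confirm the freeze took effect by looking at the number of trainable weight tensors left in the model; after freezing the convolutional base, only the dense layers on top should remain trainable:

# Only the weights of the two dense layers should remain trainable
# after freeze_weights(conv_base) has been called.
length(model$trainable_weights)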

The next step is a process called data augmentation. The number of images with trash cans in the dataset is limited. To prevent the model from overfitting on the images of trash cans, we transform the images into different but believable variations using stretching, rotation, flipping, and zoom. This effectively increases the size of the training set given the limited number of images available to us, which should improve the model’s performance. We use the image_data_generator() function to set up these operations, as seen below:

train_datagen = image_data_generator(
  rescale = 1/255,
  rotation_range = 40,
  width_shift_range = 0.2,
  height_shift_range = 0.2,
  shear_range = 0.2,
  zoom_range = 0.2,
  horizontal_flip = TRUE,
  fill_mode = "nearest"
)

Now we are ready to train the model using the images from the training set. The code below sets up the training and validation generators, augmenting the training images using the process described above. The validation images are only rescaled (not augmented), so that the model is evaluated on unmodified images whose pixel values are on the same 0–1 scale as the training images.

test_datagen <- image_data_generator(rescale = 1/255)  

train_generator <- flow_images_from_directory(
  train_dir,                  # Target directory  
  train_datagen,              # Data generator
  target_size = c(150, 150),  # Resizes all images to 150 × 150
  batch_size = 20,
  class_mode = "binary"       # binary_crossentropy loss for binary labels
)

validation_generator <- flow_images_from_directory(
  validation_dir,
  test_datagen,
  target_size = c(150, 150),
  batch_size = 20,
  class_mode = "binary"
)

model %>% compile(
  loss = "binary_crossentropy",
  optimizer = optimizer_rmsprop(lr = 2e-5),
  metrics = c("accuracy")
)

history <- model %>% fit_generator(
  train_generator,
  steps_per_epoch = 100,
  epochs = 30,
  validation_data = validation_generator,
  validation_steps = 50
)

Plotting the history object returned by fit_generator() allows you to see how the model improves its accuracy and reduces its loss on both the training set and the validation set.
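A minimal way to view these curves is to plot the history object directly; keras provides a plot() method for it:

# Plot training/validation accuracy and loss by epoch
plot(history)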

Accuracy is the rate at which the model is able to correctly classify images as containing a trash can or not containing a trash can. Ideally, the accuracy should be as high as possible.

Loss is a value that we attempt to minimize during our training of the model. The lower the loss, the closer our predictions are to the true labels.

Below, the graph shows the improvement of the model’s classification with each epoch, meaning each full pass in which the model is retrained on the training data. As specified in the fit_generator() call, 30 epochs were used to train this model.

You can see that the model’s accuracy improves and its loss decreases on both the training and validation data with each epoch. This outcome indicates that the model is not badly overfit on the training data. Ideally, however, the validation data would show higher accuracy and lower loss than the training data, as that would indicate the model generalizes well once it has learned how to classify images.

4: Creating predictions from the model

You can run the model on your testing data and generate accuracy and loss scores with the code below. An important note when making predictions: make sure that batch_size multiplied by the steps argument equals the total number of files in your prediction set.

test_generator <- flow_images_from_directory(
  test_dir,
  test_datagen,
  target_size = c(150, 150),
  batch_size = 13,
  class_mode = "binary",
  shuffle = FALSE ## shuffle must be FALSE so each image stays matched to the correct probability
)

model %>% evaluate_generator(test_generator, steps = 43)

This generated a result of 92% accuracy on the test set, with a loss of about 0.2. This is not optimal, but it is a good start for a simple computer vision model.

Running this evaluation does not generate a prediction for each image. I wrote a simple script which generates a data frame that has the predictions and original classifications of the images.

predict_model <- model %>%
  predict_generator(test_generator, steps = 43, verbose = 1)

stat_df2 <- as.data.frame(cbind(predict_model, test_generator$filenames)) %>%
  # assign prediction probability for filenames
  rename(
    predict_proba = V1,
    filename = V2
  ) %>%
  mutate(pred = as.numeric(as.character(predict_proba))) 
stat_df2 <- stat_df2 %>% 
  mutate(reallabel = substr(filename, 0, 5)) %>% 
  mutate(label_number = ifelse(reallabel == "notra", 0, 1))

From this data frame, I generated the ROC curve seen below. An ROC curve shows the tradeoff between true positive classifications (the image is classified as having a trash can and it actually has one) and false positive classifications (the image is classified as having a trash can but it actually does not) at different probability cutoffs.

The area under the curve (AUC) indicates the strength of the model’s ability to discriminate between images that did and did not contain trash cans, and is a common measure of a classifier’s goodness of fit. An AUC of .96 indicates a very good ability to discriminate, a 92 percent improvement over the .5 expected from a coin flip.
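One way to reproduce this curve and the AUC from stat_df2 is with the pROC package loaded above; a sketch, assuming the label_number and pred columns created earlier:

# ROC curve and AUC from the test-set predictions
roc_obj <- roc(stat_df2$label_number, stat_df2$pred)
plot(roc_obj)   # ROC curve
auc(roc_obj)    # area under the curve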

I then needed to choose a probability cutoff to translate each probability into a classification of an image as containing or not containing a trash can. I wanted a cutoff that maximizes both the sensitivity and the specificity of the model. Sensitivity is the model’s ability to correctly classify images with trash cans; specificity is its ability to correctly classify images without trash cans. Using the code below, I calculate the point on the ROC curve at which sensitivity and specificity are jointly optimized.

pred <- prediction(stat_df2$pred, stat_df2$label_number)
#Below, tpr = true positive rate, another term for sensitivity
#fpr = false positive rate, or 1-specificity
roc.perf = performance(pred, measure = "tpr", x.measure="fpr")

opt.cut = function(perf, pred){
  cut.ind = mapply(FUN=function(x, y, p){
    d = (x - 0)^2 + (y-1)^2
    ind = which(d == min(d))
    c(sensitivity = y[[ind]], specificity = 1-x[[ind]], 
      cutoff = p[[ind]])
  }, perf@x.values, perf@y.values, pred@cutoffs)
}
print(opt.cut(roc.perf, pred))

The optimal cutoff generated was .3787147. An image assigned a probability higher than this cutoff would be classified as having a trash can. Using this cutoff, I augmented the stat_df2 data frame to classify each image as containing or not containing a trash can, and then flagged each prediction as correct or incorrect.

stat_df2 <- stat_df2 %>% 
  mutate(predicted_label = ifelse(pred > .3787147, 1, 0)) %>%
  mutate(predicted_label = as.integer(predicted_label)) %>%
  mutate(predicted_label_name = ifelse(predicted_label == 0, "notra", "trash")) %>% 
  mutate(correct = ifelse(predicted_label_name == reallabel, "Correct", "Incorrect"))
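A quick confusion matrix from the augmented data frame summarizes how the classifications break down (true labels in rows, predicted labels in columns):

# Cross-tabulate true labels against predicted labels at the chosen cutoff
table(truth = stat_df2$reallabel, predicted = stat_df2$predicted_label_name)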

I then mapped the classifications of the 133 trash can images in the test set to see if there was any spatial clustering of the correct and incorrect classifications. There does not seem to be any particular pattern, which is encouraging because I hope to generalize the model to the rest of the City.
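The map itself can be drawn with the leaflet package loaded above; a sketch, assuming lat and lon columns (hypothetical names) have been parsed out of each filename and added to stat_df2:

# Map correct vs. incorrect test-set classifications; the lat/lon columns are
# assumed to have been parsed from the image filenames.
leaflet(stat_df2) %>%
  addTiles() %>%
  addCircleMarkers(lng = ~lon, lat = ~lat, radius = 4,
                   color = ~ifelse(correct == "Correct", "green", "red"))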

Below, you can see sample images and their classifications of containing or not containing trash cans.

Image has trash can, classified as trash can image

Image has no trash can, classified as no trash can image

Image has trash can, classified as no trash can image

Image has no trash can, classified as trash can image

In inspecting these images, there didn’t seem to be much rhyme or reason as to why some images were classified correctly and some were not. The most consistent theme was that many of the misclassified images without trash cans contained fire hydrants or other objects similar in shape to a trash can. Because the neural network only “sees” bounding boxes, it is possible that fire hydrants look the same as trash cans to the model and are classified accordingly.

5: Predicting for all of Philadelphia

I then wanted to create predictions for the number of trash cans in all of Philadelphia using the street nodes shapefile. I applied the model to the 24,706 images generated from the street nodes shapefile and used the same cutoff as before to classify whether each image contained a trash can. I then created a count and a rate of trash cans for each neighborhood, which is explorable on the map found here.
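The citywide scoring step follows the same pattern as the test set. A sketch, assuming the 24,706 street-node images sit in a hypothetical citywide_dir folder (with a single subdirectory, as flow_images_from_directory() requires), and noting that batch_size times steps must cover every image exactly:

# Score every street-corner image with the trained model; citywide_dir is a
# hypothetical directory holding the 24,706 street-node images.
citywide_generator <- flow_images_from_directory(
  citywide_dir,
  test_datagen,
  target_size = c(150, 150),
  batch_size = 22,
  class_mode = NULL,   # no labels, predictions only
  shuffle = FALSE      # keep predictions aligned with filenames
)
citywide_proba <- model %>%
  predict_generator(citywide_generator, steps = 1123)      # 22 * 1123 = 24,706
citywide_label <- ifelse(citywide_proba > .3787147, 1, 0)  # same cutoff as before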

Rate of Trash Cans By Neighborhood

You can click on each neighborhood to see the number of trash cans, the percent of street corners that have a trash can, and the litter score for the neighborhood.

I created a correlation graph to see if there was a statistically significant relationship between litter score and the rate of trash cans. I chose the rate because it accounts for the size of the neighborhood: it is calculated as the number of street corners with trash cans divided by the total number of street corner images generated. Trash can rate had no significant relationship with litter score.
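A sketch of this kind of check, assuming a hypothetical neighborhood-level data frame hood_summary with litter_score and trashcan_rate columns:

# Correlation between neighborhood litter score and predicted trash can rate;
# hood_summary and its column names are assumed for illustration.
cor.test(hood_summary$litter_score, hood_summary$trashcan_rate)

ggplot(hood_summary, aes(x = trashcan_rate, y = litter_score)) +
  geom_point() +
  geom_smooth(method = "lm")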

In an ideal world, we would expect a neighborhood with a higher litter index score to have more trash cans to meet the demand created by more litter; what I actually expected to see was that neighborhoods with a higher litter index would have fewer trash cans. Instead, the lack of any correlative relationship indicates that a more sophisticated trash can allocation approach could be used to address the City’s litter problem in the neighborhoods with the most public detritus.

It is important to emphasize that this prediction only looks at street corners. Trash cans also exist away from street corners, so this very likely undercounts the number of trash cans per neighborhood. Even with these data processing limitations, however, this represents a good start in understanding where trash cans are located in Philadelphia.

6: Concluding Remarks

This exercise has two purposes. The main purpose is to demonstrate to the City of Philadelphia that it is possible to use a data-driven approach to identify trash can locations. A Keras model, while imperfect, does have the ability to look at an image and determine whether it contains a trash can.

This approach could be used to create a more robust dataset of trash cans throughout the City for use by the Streets Department. If all images of sidewalks in Philadelphia were fed into a Keras classification model that was well trained on many images, it would be possible to create a geolocated database of trash cans. Understanding trash can locations would help the City curtail litter by creating a plan to empty these public trash cans on a regular basis.

Although the model’s metrics, loss and accuracy, were far from perfect, I was able to demonstrate that the model can still classify the presence of trash cans reasonably well. I hypothesize that with better imagery and a more robust dataset of trash cans, I could further develop this model and generate more accurate predictions with a lower loss. The City of Philadelphia has a license to Cyclo Media, which provides high-resolution street view imagery; access to that database may allow me to improve my classification model.

The second purpose of this assignment was to learn how computer vision works and to create a bespoke example in RStudio. I encountered many difficulties along the way, mostly because running these neural network models requires a great deal of computational power. A computer with a more powerful CPU and GPU would be able to run these models in mere seconds, while each model run took me upwards of 5 hours. I also learned that the googleway package, while robust, does not scrape the highest quality images from the Street View functionality. In the future, it would be better to simply take screenshots of each location that might have a trash can, rather than trying to do so through the console.