Data Exploration

Data from Google Street View, Google Places, Craigslist Boston, Zillow, and Data Boston.


Both the bottom-up data (post-processed Google Street View images, Google Places, and Craigslist Boston posts) and the top-down data (data generated by the local government, such as crime data from Data Boston, together with housing price data from Zillow) can be mapped and used to train and build the analytical model for predicting housing prices. To process the Google Street View data, we use two data structures, a pixel structure and a graph structure, in which individual data are populated and computed. The pixel data structure is a matrix that discretizes a city or district into a finite grid for analysis. Each pixel is related to its neighbors and computes its own data from its neighbors' values, so that urban data can be addressed and computed in its spatial context.
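The neighbor-based computation described above can be sketched as follows. This is a minimal illustration, not the project's C# Pixel map class: the function name `neighbor_mean` and the averaging rule are assumptions chosen only to show how a pixel can derive a value from its surrounding cells.

```python
from typing import List


def neighbor_mean(grid: List[List[float]]) -> List[List[float]]:
    """For each pixel, return the mean of its (up to 8) adjacent pixels.

    Hypothetical update rule: the source only says each pixel computes
    its data from its neighbors, not which rule it uses.
    """
    rows, cols = len(grid), len(grid[0])
    result = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Collect the values of all in-bounds neighbors of (r, c).
            values = [
                grid[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
                and 0 <= r + dr < rows
                and 0 <= c + dc < cols
            ]
            # A pixel with no neighbors (1x1 grid) keeps its own value.
            result[r][c] = sum(values) / len(values) if values else grid[r][c]
    return result


# Example: a 3x3 district grid holding one numeric value per pixel.
district = [
    [0.0, 1.0, 2.0],
    [3.0, 4.0, 5.0],
    [6.0, 7.0, 8.0],
]
smoothed = neighbor_mean(district)
```

Pixels on the edge of the grid simply average over fewer neighbors, so no padding value has to be invented for the area outside the district.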

Data from Craigslist Boston

[keyword: house and room]

As bottom-up data, house and room posts from Craigslist, each with geographic coordinates and text, can be used as training data for the housing prediction model.

MA scale

Boston scale


number of posts used, by keyword:
house - 2,500 posts including text data
room - 2,500 posts including text data
JSON: download

Data from Google Places

An interesting pattern in the data visualization is that, at the MA scale, the housing posts are sparse and evenly distributed, in contrast to the room posts, which are concentrated around particular areas of Boston. When it comes to text data, there are...


number of data used: 20,559 places

Type of query:

'parking', 'veterinary_care', 'airport', 'plumber', 'roofing_contractor', 'atm', 'meal_takeaway', 'hair_care', 'insurance_agency', 'school', 'synagogue', 'stadium', 'movie_theater', 'doctor', 'zoo', 'electrician', 'establishment', 'funeral_home', 'spa', 'aquarium', 'storage', 'casino', 'park', 'courthouse', 'hospital', 'subway_station', 'painter', 'moving_company', 'movie_rental', 'embassy', 'fire_station', 'gym', 'bicycle_store', 'local_government_office', 'book_store', 'police', 'florist', 'museum', 'lawyer', 'car_rental', 'real_estate_agency', 'physiotherapist', 'electronics_store', 'hindu_temple', 'car_dealer', 'jewelry_store', 'gas_station', 'mosque', 'liquor_store', 'campground', 'library', 'university', 'accounting', 'travel_agency', 'finance', 'locksmith', 'bank', 'convenience_store', 'health', 'church', 'bakery', 'lodging', 'laundry', 'shopping_mall', 'dentist', 'store', 'cemetery'

JSON: download

Data from Google Street View

Deep learning for semantic segmentation

Caffe framework with the ADE20K dataset (MIT CSAIL)


"path": imgFinal/StreetView_theBoston_0-1024_292.0_6_70.png
"coordinates": -71.14012330788046, 42.35124604676936
"node_id": 0-1024

semantic segmentation

"building, edifice": 410351 pixel
"sky": 114653 pixel
"plant, flora, plant life": 59085 pixel
"car, auto, automobile, machine, motorcar": 5375 pixel
"tree": 33864 pixel
"road, route": 28512 pixel
"sidewalk, pavement": 20407 pixel
"house": 18637 pixel
"grass": 15550 pixel
"fence, fencing": 14803 pixel
"wall": 8737 pixel
"stairs, steps": 1979 pixel
"earth, ground": 1329 pixel
"signboard, sign": 1244 pixel
"bridge, span": 896 pixel
"path": 393 pixel
"ashcan, trash can, garbage can, wastebin, ash bin, ash-bin, ashbin, dustbin, trash barrel, trash bin": 374 pixel
"person, individual, someone, somebody, mortal, soul": 346 pixel
"bus, autobus, coach, charabanc, double-decker, jitney, motorbus, motorcoach, omnibus, passenger vehicle": 315 pixel
"truck, motortruck": 297 pixel
"stairway, staircase": 98 pixel
"railing, rail": 30 pixel
"van": 5 pixel

Semantic segmentation with deep learning makes it possible to count the pixels of each class and to calculate the precise ratio of each object class in a Google Street View image. This data is used to train the prediction model.
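As a minimal sketch of the ratio calculation, the per-class pixel counts of the sample image listed above can be normalized by their total (the listed counts sum to 737,280 pixels). The function name `class_ratios` is an assumption for illustration and is not taken from the project's code.

```python
from typing import Dict


def class_ratios(pixel_counts: Dict[str, int]) -> Dict[str, float]:
    """Convert per-class pixel counts into fractions of the whole image."""
    total = sum(pixel_counts.values())
    if total == 0:
        # An empty segmentation yields all-zero ratios instead of dividing by zero.
        return {label: 0.0 for label in pixel_counts}
    return {label: count / total for label, count in pixel_counts.items()}


# Pixel counts of the sample image (node_id 0-1024) as listed in this section.
sample_counts = {
    "building, edifice": 410351,
    "sky": 114653,
    "plant, flora, plant life": 59085,
    "car, auto, automobile, machine, motorcar": 5375,
    "tree": 33864,
    "road, route": 28512,
    "sidewalk, pavement": 20407,
    "house": 18637,
    "grass": 15550,
    "fence, fencing": 14803,
    "wall": 8737,
    "stairs, steps": 1979,
    "earth, ground": 1329,
    "signboard, sign": 1244,
    "bridge, span": 896,
    "path": 393,
    "ashcan, trash can, garbage can, wastebin, ash bin, ash-bin, ashbin, dustbin, trash barrel, trash bin": 374,
    "person, individual, someone, somebody, mortal, soul": 346,
    "bus, autobus, coach, charabanc, double-decker, jitney, motorbus, motorcoach, omnibus, passenger vehicle": 315,
    "truck, motortruck": 297,
    "stairway, staircase": 98,
    "railing, rail": 30,
    "van": 5,
}

sample_ratios = class_ratios(sample_counts)
for label, ratio in sorted(sample_ratios.items(), key=lambda item: -item[1])[:5]:
    print(f"{label}: {ratio:.3f}")
```

Because the ratios of one image always sum to 1, images of different resolutions become comparable features.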


Image data used: 18,588 images

Semantic type:

airplane, aeroplane, plane, animal, animate being, beast, brute, creature, fauna, apparel, wearing apparel, dress, clothes, arcade machine, armchair, ashcan, trash can, garbage can, wastebin, ash bin, ash-bin, ashbin, dustbin, trash barrel, trash bin, awning, sunshade, sunblind, bag, ball, bannister, banister, balustrade, balusters, handrail, bar, barrel, cask, base, pedestal, stand, basket, handbasket, bathtub, bathing tub, bath, tub, bed, bench, bicycle, bike, wheel, cycle, blanket, cover, blind, screen, boat, book, bookcase, booth, cubicle, stall, kiosk, bottle, box, bridge, span, buffet, counter, sideboard, building, edifice, bulletin board, notice board, bus, autobus, coach, charabanc, double-decker, jitney, motorbus, motorcoach, omnibus, passenger vehicle, cabinet, canopy, car, auto, automobile, machine, motorcar, case, display case, showcase, vitrine, ceiling, chair, chandelier, pendant, pendent, clock, column, pillar, computer, computing machine, computing device, data processor, electronic computer, information processing system, conveyer belt, conveyor belt, conveyer, conveyor, transporter, counter, crt screen, curtain, drape, drapery, mantle, pall, cushion, desk, door, double door, earth, ground, escalator, moving staircase, moving stairway, fan, fence, fencing, field, fireplace, hearth, open fireplace, flag, floor, flooring, flower, food, solid food, fountain, glass, drinking glass, grandstand, covered stand, grass, hill, house, hovel, hut, hutch, shack, shanty, kitchen island, lake, lamp, land, ground, soil, light, light source, microwave, microwave oven, minibike, motorbike, mirror, monitor, monitoring device, mountain, mount, ottoman, pouf, pouffe, puff, hassock, oven, painting, picture, palm, palm tree, path, person, individual, someone, somebody, mortal, soul, pier, wharf, wharfage, dock, pillow, plant, flora, plant life, plate, plaything, toy, pole, pool table, billiard table, snooker table, poster, posting, placard, notice, bill, card, pot, 
flowerpot, railing, rail, refrigerator, icebox, river, road, route, rock, stone, rug, carpet, carpeting, runway, sand, sconce, screen door, screen, sculpture, sea, seat, shelf, ship, shower, sidewalk, pavement, signboard, sign, sink, sky, skyscraper, sofa, couch, lounge, stage, stairs, steps, stairway, staircase, step, stair, stool, stove, kitchen stove, range, kitchen range, cooking stove, streetlight, street lamp, swimming pool, swimming bath, natatorium, table, tank, storage tank, television, television receiver, television set, tv, tv set, idiot box, boob tube, telly, goggle box, tent, collapsible shelter, toilet, can, commode, crapper, pot, potty, stool, throne, towel, tower, trade name, brand name, brand, marque, traffic light, traffic signal, stoplight, tray, tree, truck, motortruck, van, vase, wall, wardrobe, closet, press, washer, automatic washer, washing machine, water, waterfall, falls, windowpane, window

JSON: download

Crime data visualization

The crime data visualization shows the relationship between location and violence as a social problem in the urban context. Although the data ranges from trivial to serious incidents, it can be categorized and used as training data.


number of data used: 5,001 crimes

JSON: download

Housing Price data visualization

Most importantly, the housing price data from Zillow is the key data for training the model, because one of our assumptions is that there should be a strong relationship between prices and urban environmental conditions.


number of data used: 988 houses

JSON: download

Rent Price data visualization

Additionally, rent is important socioeconomic data: Boston is known as a city of education, so many of its students are not able to buy a house. This perspective gives us a different lens for viewing and understanding Boston.


number of data used: 13,049 rents

JSON: download

Property Assessment and Energy Use Intensity visualization from City of Boston

Lastly, socioeconomic (top-down) data is taken into consideration, so that the features are trained together with the urban data.

number of EUI (Energy Use Intensity) data: 1,806
JSON: download

number of Property Assessment data: 50,869
JSON: download


On top of the pixel and graph data, we query the selected pixels by longitude and latitude to generate the synthesized data matrix for training the machine learning model.
Libraries we made: Pixel map class (C#), Data visualization (Python)
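The longitude/latitude query can be sketched as a mapping from a coordinate to a row and column in the pixel grid. This is an illustration only, not the project's Pixel map class: the function name `lonlat_to_pixel`, the bounding box, and the 100 x 100 grid size are all hypothetical values chosen for the example.

```python
from typing import Optional, Tuple

# Hypothetical bounding box (min_lon, min_lat, max_lon, max_lat) and grid size.
EXAMPLE_BOUNDS = (-71.20, 42.22, -70.98, 42.40)
EXAMPLE_ROWS, EXAMPLE_COLS = 100, 100


def lonlat_to_pixel(
    lon: float,
    lat: float,
    bounds: Tuple[float, float, float, float] = EXAMPLE_BOUNDS,
    rows: int = EXAMPLE_ROWS,
    cols: int = EXAMPLE_COLS,
) -> Optional[Tuple[int, int]]:
    """Return the (row, col) of the pixel containing a point, or None if outside."""
    min_lon, min_lat, max_lon, max_lat = bounds
    if not (min_lon <= lon <= max_lon and min_lat <= lat <= max_lat):
        return None
    # Columns grow eastward; rows grow southward so that row 0 is the north edge.
    col = int((lon - min_lon) / (max_lon - min_lon) * cols)
    row = int((max_lat - lat) / (max_lat - min_lat) * rows)
    # Points exactly on the east or south edge fall into the last pixel.
    return min(row, rows - 1), min(col, cols - 1)


# The Street View sample from this page, looked up in the hypothetical grid.
sample_pixel = lonlat_to_pixel(-71.14012330788046, 42.35124604676936)
```

Each data source (places, posts, crimes, segmented images) can be assigned to pixels this way, and the per-pixel values then form the rows of the synthesized data matrix.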

Rent Price Data for Machine Learning[CSV]: download
data type: numeric, string, and category
features: date, title of post, content of post, geographic coordinate

Features Vs Price/SQM

Features in Different Classes (Housing) | High,Middle,Low

Features in Different Classes (Rent) | High,Middle,Low

Housing Price Data for Machine Learning[CSV]: download
data type: numeric, string, and category
features: date, title of post, content of post, geographic coordinate


Finally, two CSV files, one for housing data and one for rent data, are generated together with the Built Urban Environment Assessment data for the City of Boston. The data is now ready for training the analytical models.