Robotic House Hunting

March 3rd, 2009  |  Published in ruby  |  5 Comments

So, we’re looking for a house. One fun part of the process is looking at houses and wondering what it would be like to live in them. One tedious part is finding the places and rounding up all the info on them.

We start by looking at RMLS.com, which is o.k. for ferreting out what’s available within our parameters. The listings there are useful for the basics: rooms, age, asking price, misc. features. The problem is that the listings aren’t bookmarkable: once you find a listing, you have to keep its listing number around so you can find it again, or print it out and keep it that way.

To pad out our information on a place, we go to a few more sites:

  • http://portlandmaps.com, which provides a bunch of good stuff: assessed price, taxes, permit history, crime maps, lot illustrations, the name of the neighborhood, etc.

  • http://zillow.com, which provides some supplemental information and rudimentary comps

  • http://maps.google.com, where we save the address in a map we’re keeping of what we’re interested in, what we’ve gone to look at and what we’ve rejected. GMaps Streetview is also good, because you can get a look at the angles the realtor didn’t take pictures of. In a lot of infill situations, the listing pictures hide just how crowded a place is. Streetview is also a good way to pan around and see what neighboring houses are like. We also have our map points in Google Earth, which is a more fluid way to look at aerial photos; those also help with judging how crowded a house is on its lot, how big the yard is, etc.

  • http://walkscore.com, which provides a walkability score for a given address by tallying up nearby amenities like restaurants, grocery stores, bookstores, schools, parks, etc.
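
For the Walk Score step, the script below sends a geocoded point to the Walk Score API and reads two fields back. Here is a minimal sketch of just that request-and-parse step, assuming the endpoint, parameter names (`format`, `lat`, `lon`, `wsapikey`) and response fields (`walkscore`, `ws_link`) that the 2009 script uses; the live API may expect more, so check its documentation. `walkscore_url` and `parse_walkscore` are hypothetical names, and the sample response is made up so the sketch runs without a network connection.

```ruby
require 'json'
require 'uri'

# Endpoint and parameter names as used in the 2009 script; treat them as
# assumptions and check the current Walk Score API docs before relying on them.
WALKSCORE_ENDPOINT = "http://api.walkscore.com/score"

# Build the request URL for a geocoded point, letting URI do the encoding.
def walkscore_url(lat, lng, api_key)
  query = URI.encode_www_form(format: "json", lat: lat, lon: lng, wsapikey: api_key)
  "#{WALKSCORE_ENDPOINT}?#{query}"
end

# Pull the score and the walkscore.com link out of a JSON response body.
def parse_walkscore(body)
  data = JSON.parse(body)
  { score: data["walkscore"], link: data["ws_link"] }
end

# Usage with a made-up response body, so no network call is needed:
sample = '{"walkscore": 82, "ws_link": "http://example.com/ws"}'
info = parse_walkscore(sample)
puts "Walk Score: #{info[:score]} (#{info[:link]})"
```

Fetching the real body is then one `Net::HTTP.get_response(URI.parse(url)).body` call, as in the script.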

I should pause at this point and note that a. I know there has to be more and better information out there, but b. even so, those four sites provide a lot of good information.

Initially, when we’d get together with our agent to go look at places, I’d print the RMLS listings to PDFs, which I’d store in Evernote to take along on the iPhone. That was slow, and the information on hand while looking at a place was limited.

To speed things up, my first iteration of automating some of the process was to scrape the RMLS listings to get just the basic information off of a page and save it as text. From there, things kind of snowballed into a very crude pocket mashup: I scrape the RMLS listing and use the results either to seed API calls to Zillow, WalkScore and Google Maps, or just to create links to canned searches on PortlandMaps.com.
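
The canned-search half of that is just string building: the scraped street or full address is URL-encoded and dropped into a fixed query URL. A minimal sketch, assuming the three URL patterns the script below uses (they date from 2009 and may no longer resolve). `form_escape` and `canned_links` are hypothetical names, and `URI.encode_www_form_component` stands in for the script’s `CGI.escape`; for ordinary street addresses the two encode the same way (spaces become `+`, commas `%2C`).

```ruby
require 'uri'

# Form-encode a value (spaces become "+"), much like CGI.escape does.
def form_escape(value)
  URI.encode_www_form_component(value.to_s)
end

# Build the canned-search links from scraped listing data.
# URL patterns are copied from the 2009 script and may be out of date.
def canned_links(street, address, listing_no)
  {
    portland_maps: "http://portlandmaps.com/parse_results.cfm?action_override=&query=#{form_escape(street)}",
    google_maps:   "http://maps.google.com/maps?q=#{form_escape(address)}",
    photo_viewer:  "http://rmls.com/RC2/UI/photoViewer.asp?ml=#{form_escape(listing_no)}"
  }
end

# Usage with a hypothetical address and listing number:
canned_links("123 SE Example St", "123 SE Example St, Portland, OR", "12345678").each do |name, url|
  puts "#{name}: #{url}"
end
```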

The resulting HTML gets imported into Evernote and provides the basic RMLS listing, a map of the area around the house, the walk score, a link to the RMLS slideshow for the house, and links to all the other sites we consult. On the iPhone, the Google Maps links open the Maps app instead of the Web page, so it’s easy to look a place up and get directions to it. Because Evernote is the target, the HTML is pretty plain jane. It would be easy, for instance, to embed the photo slideshow or a dynamic Google Map, but that stuff gets stripped out anyhow, so there was no point in bothering.
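
Because the import throws away anything fancy, the note only needs a handful of plain tags. A minimal sketch of that idea, assuming a `house` that responds to `[]` (a Hash, or an OpenStruct like the one the script builds) and a hash of link labels to URLs; `plain_note` is a hypothetical name. Unlike the script’s heredoc, it HTML-escapes every value with `ERB::Util.html_escape`, so an `&` in an address or URL can’t mangle the markup.

```ruby
require 'erb'

# Build a deliberately plain HTML note; Evernote strips embeds anyway.
# `house` can be a Hash or an OpenStruct (both respond to #[]).
def plain_note(house, links)
  h = ->(value) { ERB::Util.html_escape(value.to_s) }
  link_row = links.map { |label, url| "<a href=\"#{h.(url)}\">#{h.(label)}</a>" }.join(" | ")
  <<~HTML
    <html>
    <head><title>#{h.(house[:address])} (#{h.(house[:listing_no])})</title></head>
    <body>
    <p><b>#{h.(house[:address])}</b> (#{h.(house[:listing_no])})<br />
    #{h.(house[:price])}, #{h.(house[:beds])} br / #{h.(house[:baths])} baths<br />
    Walk Score: #{h.(house[:walkscore])}</p>
    <p>#{link_row}</p>
    </body>
    </html>
  HTML
end

# Usage with hypothetical listing data:
house = { address: "123 SE Example St, Portland, OR", listing_no: "12345678",
          price: "$300,000", beds: "3", baths: "2", walkscore: 75 }
puts plain_note(house, { "Zillow Link" => "http://example.com/zillow" })
```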

I was using appscript to create the notes right in Evernote, but it got crabby about importing the HTML programmatically. Now I just write the HTML out to a file, drag it into Evernote, and it imports fine. The other bit of inefficiency is that I haven’t bothered to make getting the listing numbers into it particularly easy: I just put them in an array at the top of the script instead of popping up a dialog box or taking them from ARGV.
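
Both of those loose ends are small. A minimal sketch, assuming the notes go into a directory you choose (the script hard-codes `/Users/mph/Sites/listings`): take listing numbers from `ARGV` when any are given, fall back to the array otherwise, and write one `new_<listing>.html` file per listing, ready to drag into Evernote. `listing_numbers`, `write_note` and `DEFAULT_LISTINGS` are hypothetical names.

```ruby
require 'fileutils'

# Fallback list, standing in for the array at the top of the script.
DEFAULT_LISTINGS = []

# Prefer listing numbers passed on the command line; otherwise use the fallback.
def listing_numbers(argv = ARGV, fallback = DEFAULT_LISTINGS)
  numbers = argv.map(&:strip).reject(&:empty?)
  numbers.empty? ? fallback : numbers
end

# Write one note per listing (creating the directory if needed) and return its path.
def write_note(dir, listing_no, html)
  FileUtils.mkdir_p(dir)
  path = File.join(dir, "new_#{listing_no}.html")
  File.write(path, html)
  path
end

# Usage (script name hypothetical): ruby listings.rb 12345678 23456789
listing_numbers.each { |l| puts "would fetch listing #{l}" }
```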

Anyhow, I put up a sample of the HTML output.

As usual, there are some great gems involved, in particular:

  • rillow, which provides access to the Zillow API. I don’t make the most of this at the moment: it just pulls in a reliable Zillow link for a given address.

  • Geokit, which provides geocode information for a given address

Mechanize is good because it can handle redirects, which PortlandMaps and RMLS both throw into the mix when searching. OpenStruct is handy to keep everything straight.
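
To make the OpenStruct point concrete: the script reads each field out of a table cell’s text with a regular expression and hangs it on one `house` object. A minimal sketch of that pattern, assuming the cell text has already been pulled out of the page (the sample strings are invented; real RMLS cells may read differently). It uses `String#[]` with a regex, which returns `nil` on a miss instead of raising the way `match(...)[0]` does, and a price pattern that also accepts seven-figure prices. `house_from_cells` and `FIELD_PATTERNS` are hypothetical names.

```ruby
require 'ostruct'

# Patterns adapted from the script; price uses [\d,]+ so $1,250,000 matches too.
FIELD_PATTERNS = {
  price: /\$[\d,]+/,
  beds:  /\d+/,
  baths: /\d+/,
  year:  /\d{4}/,
  area:  /\d{3,}/
}

# Hang each extracted value on one OpenStruct. A field whose text doesn't
# match is left nil rather than raising.
def house_from_cells(listing_no, cells)
  house = OpenStruct.new(listing_no: listing_no)
  FIELD_PATTERNS.each do |field, pattern|
    house[field] = cells[field].to_s[pattern]
  end
  house
end

# Usage with invented cell text:
house = house_from_cells("12345678", {
  price: "List Price: $349,900", beds: "Bedrooms: 3", baths: "Baths: 2",
  year: "Year Built: 1925", area: "Approx SqFt: 1850"
})
puts "#{house.price}, #{house.beds} br / #{house.baths} baths, built #{house.year}"
```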

So I suppose this was the other fun part of looking for a house:

    

    #!/usr/bin/env ruby
    require 'rubygems'
    require 'mechanize'
    require 'hpricot'   # Hpricot() parses the RMLS report page below
    require 'rillow'
    require 'cgi'
    require 'ostruct'
    require 'open-uri'
    require 'geokit'
    require 'json'
    require 'net/http'
    require 'pp'        # pp dumps the raw Walk Score response below

    include Geokit::Geocoders

    listings = []  # AN ARRAY OF RMLS LISTING NUMBERS goes here
    agent = WWW::Mechanize.new
    rillow = Rillow.new("YOUR ZILLOW API KEY")
    ws_api_key = "YOUR WALKSCORE API KEY"
    gmaps_api_key = "YOUR GOOGLE MAPS API KEY"

    listings.each do |l|
      # scrape rmls.com
      page = agent.get('http://rmls.com/RC2/UI/search_mlsnumber.asp')
      rmls_form = page.form('Mainform')
      rmls_form.fields[5].value = l
      page = agent.submit(rmls_form)

      report_form = page.form('mainform')
      report_form.ID = l
      report_form.RID = 'RC_DETAIL_001'
      report_form.PMD = '1'
      report_page = agent.submit(report_form)

      # set up the tables/rows for parsing
      doc = Hpricot(report_page.body)
      root_table = doc.search("//div[2]/div/table/tr[1]/td/table[1]")
      info_table = root_table.search("/tr[1]/td/table/tr/td[2]/table")
      info_table2 = root_table.search("/tr[2]/td/table")
      features_table = doc.search("//div[2]/div/table/tr[1]/td/table[2]")

      house = OpenStruct.new
      house.listing_no = l
      # [\d,]+ also matches seven-figure prices like $1,250,000
      house.price = info_table.search("/tr[1]/td").inner_text.match(/\$[\d,]+/)[0].to_s
      house.beds = info_table.search("/tr[2]/td").inner_text.match(/\d+/)[0].to_s
      house.baths = info_table.search("/tr[3]/td").inner_text.match(/\d+/)[0].to_s
      house.year = info_table.search("/tr[6]/td").inner_text.match(/\d{4}/)[0].to_s
      house.area = info_table.search("/tr[8]/td").inner_text.match(/\d{3,}/)[0].to_s
      # non-bang sub/strip: gsub! and strip! return nil when nothing changes
      house.address = info_table2.search("/tr[2]/td").inner_text.sub(/Address: /, "").strip
      house.comments = info_table2.search("/tr[10]/td").inner_text.strip

      # geocode the address for the static map and the Walk Score lookup
      geoinfo = MultiGeocoder.geocode(house.address)
      house.lng = geoinfo.lng
      house.lat = geoinfo.lat
      house.ll = geoinfo.ll

      image = doc.search("//div[2]/div/table/tr[1]/td/table[1]/tr[1]/td/table/tr/td[1]/table/tr[1]/td/table/tr/td/img[1]")
      image.set(:height => "120", :width => "160")
      house.image = image
      house.street = house.address.gsub(/^(.+?)Portland.*/, "\\1")

      # get the zillow information
      result = rillow.get_search_results(house.street, "Portland, OR")
      house.zillow_url = result.find_attribute("homedetails").to_s.gsub(/\/$/, "")
      zillow_link = "<a href='#{house.zillow_url}'>Zillow Link</a>"

      # get the portlandmaps.com information
      maps_url = "http://portlandmaps.com/parse_results.cfm?action_override=&query=#{CGI.escape(house.street)}"
      maps_link = "<a href='#{maps_url}'>Portland Maps Link</a>"

      # google maps: a link for the Maps app plus a static map image
      gmaps_url = "http://maps.google.com/maps?q=#{CGI.escape(house.address)}"
      google_maps_link = "<a href='#{gmaps_url}'>Google Maps Link</a>"
      google_map = "http://maps.google.com/staticmap?center=#{house.ll}&zoom=15&size=256x128&frame=true&maptype=mobile&markers=#{house.ll}&key=#{gmaps_api_key}&sensor=false"

      # slideshow from rmls.com
      photo_viewer_link = "<a href='http://rmls.com/RC2/UI/photoViewer.asp?ml=#{l}'>Photo Viewer Link</a>"

      # get the walkscore
      ws_base_url = "http://api.walkscore.com/score?format=json"
      ws_url = "#{ws_base_url}&lat=#{house.lat}&lon=#{house.lng}&wsapikey=#{ws_api_key}"
      ws_data = Net::HTTP.get_response(URI.parse(ws_url)).body
      pp ws_data
      ws_json = JSON.parse(ws_data)
      house.walkscore = ws_json["walkscore"]
      house.walkscore_link = ws_json["ws_link"]
      walkscore_link = "<a href='#{house.walkscore_link}'>Walkscore Link</a>"

    listing = <<HERE
    <html>
    <head>
      <title>#{house.address} (#{house.listing_no})</title>
    </head>
    <body>
    <p><b>#{house.address}</b> (#{house.listing_no})<br />
    #{house.price}, #{house.beds} br / #{house.baths} baths, #{house.area} sq. ft.<br />
    Built in #{house.year}<br />
    Walk Score: #{house.walkscore}
    </p>

    <table>
    <tr>
    <td width="175">
    #{house.image}
    </td>
    <td width="256">
    <img src="#{google_map}" width="256" height="128" border="0" alt="#{CGI.escapeHTML(house.address)}" />
    </td>
    </tr>
    <tr>
    <td colspan="2">
    #{zillow_link} | #{maps_link} | #{google_maps_link} | #{photo_viewer_link} | #{walkscore_link}
    </td>
    </tr>
    </table>

    #{info_table2}
    </body>
    </html>
    HERE

      puts "Creating note for #{house.address}, listing #{l}"
      docname = "#{l}.html"
      # block form closes the file even if the write fails
      File.open("/Users/mph/Sites/listings/new_#{docname}", "w") do |listing_file|
        listing_file << listing
      end
    end



    

Responses

  1. Colin says:

    March 6th, 2009 at 11:19 am

    Mike, this is fantastic work. It would be great to see this as a public web app. It’d be super useful to any house hunter.

    Add some jQuery snazziness to the front end, maybe a database for users to make comments and tags, and you’ve just launched your own genuine web2.0 company. :)

  2. mph says:

    March 6th, 2009 at 12:00 pm

    Thanks, Colin. :-)

    The Portland Maps, Zillow, Walkscore and Google Maps stuff all seem like they’d be pretty stable. The latter three offer actual APIs, and Portland Maps has been the way it is for years.

    The wildcard is the RMLS.com site. No API and strictly regional. I’ve always wondered why a lot of mashups* are “coming soon to your region” and I guess I know why, now: there are a lot of things like RMLS.com in every community and they feel much more like they’re written “good enough” for the context the designer expects them to be consumed in. Our agent tells us the realtor-facing part of RMLS.com is even more hostile to unintended use: IE-only, unless you lie about your user agent, etc.

    Still and all … like you said: locals might like it if it got prettied up and pointed public. I’d just put a note at the bottom explaining that it’s run by a guy who has 9/10 of a philosophy major and some attentional issues, which ought to tell visitors how reliable it will be after I’m done needing it for my own selfish ends. :-)

    We’re going to find out today if we need it anymore, I think. We’re waiting around for a seller to come back on the results of the inspection.

    * I think this is more of a “smooshup”, but still.

  3. J Graham says:

    March 20th, 2009 at 8:05 pm

    Mike,

    Good stuff! I’m doing something similar with a site I’ve got called housemob.com, although mine takes the custom RMLS report you get from your agent (http://www.rmlsweb.com/v2/public/report.asp?type=CR&CRPT2=longnumberstring), scrapes it, and places all the homes on a Google map for you. I also scrape and display the Zillow Zestimate. Unfortunately I haven’t worked on it in some time, so some of the data isn’t being scraped correctly – need to fix that.

    Eventually I’d like to let the user overlay various Portland Maps GIS layers and create house-to-house driving directions if you wanted to go look at a few of the places. Adding the Walk Score would be good too.

    -Jonathan

  4. Hugh says:

    January 10th, 2010 at 4:46 pm

    Hi Mike,

    I stumbled upon this post as a current house hunter and love the usefulness of this web app. What I’m afraid is keeping me from using it is my lack of knowledge of Ruby and of how to execute the code above to output the HTML (making it easy to move to my Evernote as you have). Is the code simply an .rb file that you run via Terminal? You’ll have to forgive my naivety with this, but I’m trying to learn a bit of Ruby to understand how to get this to work in our house hunt.

    If you have the time, I’d love to hear from you about how to get this going.

    Thanks,

    Hugh

  5. mph says:

    January 10th, 2010 at 5:04 pm

    Hi, Hugh,

    Yep … it’s just a Ruby script, so you can run it from Terminal.app.

    If you haven’t already, you’ll want to make sure you have all the needed gems:

        sudo gem install mechanize
        sudo gem install rillow
        sudo gem install geokit
        sudo gem install json
        sudo gem install hpricot

    The other problem with scripts like this, which are heavily dependent on screen scraping, is that a minor change can kill the whole thing pretty easily. RMLS.com looks to be the kind of site that won’t ever change much, but if you get a lot of Hpricot errors, I’d look at RMLS.com first.
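
On that last point: with `match(...)[0]`, a layout change surfaces as an undefined-method error on `nil` that doesn’t say which field broke. One way to make those failures easier to read, sketched here as a hypothetical helper (`extract!` and `LayoutChanged` are not part of the script), is to raise an error that names the field and shows the start of the text it was looking at. The cell text in the usage lines is invented.

```ruby
# Raised when a scraped cell no longer contains what we expect.
class LayoutChanged < StandardError; end

# Return the first match of pattern in text, or raise naming the field.
def extract!(field, text, pattern)
  value = text.to_s[pattern]
  return value if value
  snippet = text.to_s.strip[0, 40]
  raise LayoutChanged, "#{field}: no match for #{pattern.inspect} in #{snippet.inspect}"
end

# Usage with made-up cell text:
puts extract!(:price, "List Price: $349,900", /\$[\d,]+/)
begin
  extract!(:beds, "Bedrooms: see remarks", /\d+/)
rescue LayoutChanged => e
  warn "RMLS layout may have changed (#{e.message})"
end
```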


© Michael Hall, licensed under a Creative Commons Attribution-ShareAlike 3.0 United States license.