House hunting can be a daunting experience because there is so much information to consider. rhousehunter aims to simplify the information collection process for end-users with four simple functions in R.

This document shows you how to use the functions of the rhousehunter package to gather rental information on Craigslist with ease.

Scraping

The first function in our package is scraper(). It takes the URL of the main housing and apartment rentals page of Craigslist BC. Set the argument online = TRUE to scrape directly from the internet. With online = FALSE, scraper() reads from a local HTML file instead, which is handy if the Craigslist website is down or for internal development and testing. Please note that you cannot input the URL of an individual listing.

url <- "https://vancouver.craigslist.org/d/apartments-housing-for-rent/search/apa"

scraped_data <- scraper(url, online = FALSE)
head(scraped_data)
#> # A tibble: 6 x 3
#>   listing_url                                                  price  house_type
#>   <chr>                                                        <chr>  <chr>     
#> 1 https://vancouver.craigslist.org/bnc/apa/d/burnaby-must-see~ $1,250 1br-600ft~
#> 2 https://vancouver.craigslist.org/rds/apa/d/surrey-bedroom-b~ $1,300 2br-      
#> 3 https://vancouver.craigslist.org/van/apa/d/vancouver-furnis~ $1,850 1br-500ft~
#> 4 https://vancouver.craigslist.org/van/apa/d/vancouver-yaleto~ $3,695 2br-900ft~
#> 5 https://vancouver.craigslist.org/van/apa/d/vancouver-bed-ba~ $2,390 2br-748ft~
#> 6 https://vancouver.craigslist.org/van/apa/d/cozy-bedroom-apa~ $1,675 1br-500ft~
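Because scraper() expects the main rentals page rather than an individual listing, it can help to check the URL before calling it. The following is a minimal sketch in base R, not part of rhousehunter: the helper is_search_url() is hypothetical, the rule that search pages contain a "/search/" path segment is an assumption based on the URL used above, and the listing URL is made up for illustration.

```r
# Hypothetical helper (not part of rhousehunter): guess whether a URL points
# at a Craigslist search results page rather than at a single listing.
is_search_url <- function(url) {
  # Assumption: search result pages contain a "/search/" path segment.
  grepl("craigslist.org", url, fixed = TRUE) &&
    grepl("/search/", url, fixed = TRUE)
}

url <- "https://vancouver.craigslist.org/d/apartments-housing-for-rent/search/apa"

# Made-up listing URL, for illustration only.
listing_url <- "https://vancouver.craigslist.org/van/apa/d/example-listing/0000000000.html"

is_search_url(url)          # TRUE for the search page used in this document
is_search_url(listing_url)  # FALSE for the made-up listing URL
```

A check like this only guards against an obvious mix-up; it does not confirm that the page exists or that it can be scraped.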

Cleaning

Our data_cleaner() function is a straightforward and powerful tool. It turns the tibble generated by the scraper() function into a clean and tidy tibble object. It has a single input, which is the output of the scraper() function.

cleaned_data <- data_cleaner(scraped_data)
head(cleaned_data)
#> # A tibble: 6 x 5
#>   listing_url                                 price num_bedroom area_sqft city  
#>   <chr>                                       <int>       <int>     <int> <chr> 
#> 1 https://vancouver.craigslist.org/bnc/apa/d~  1250           1       600 burna~
#> 2 https://vancouver.craigslist.org/rds/apa/d~  1300           2        NA surrey
#> 3 https://vancouver.craigslist.org/van/apa/d~  1850           1       500 vanco~
#> 4 https://vancouver.craigslist.org/van/apa/d~  3695           2       900 vanco~
#> 5 https://vancouver.craigslist.org/van/apa/d~  2390           2       748 vanco~
#> 6 https://vancouver.craigslist.org/van/apa/d~  1675           1       500 <NA>
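To give a feel for the kind of transformation involved, here is a rough sketch in base R of how price and house_type strings could be turned into integer columns. It is not the package's implementation: extract_int() is a hypothetical helper, and the input strings are simplified stand-ins, since the real house_type values are truncated in the printout above.

```r
# Simplified strings shaped like the scraper() output (assumed, not real data).
price_raw <- c("$1,250", "$1,300", "$1,850")
house_type_raw <- c("1br-600ft", "2br-", "1br-500ft")

# Price: drop everything that is not a digit, then convert to integer.
price <- as.integer(gsub("[^0-9]", "", price_raw))

# Hypothetical helper: return the first capture group of `pattern` for each
# string as an integer, or NA when the pattern does not match.
extract_int <- function(x, pattern) {
  m <- regmatches(x, regexec(pattern, x))
  as.integer(vapply(
    m,
    function(g) if (length(g) >= 2) g[2] else NA_character_,
    character(1)
  ))
}

num_bedroom <- extract_int(house_type_raw, "([0-9]+)br")
area_sqft <- extract_int(house_type_raw, "([0-9]+)ft")

data.frame(price, num_bedroom, area_sqft)
```

Returning NA for a missing area, rather than dropping the row, matches the NA shown in the area_sqft column of the cleaned output.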

Filtering

The data_filter() function lets you filter the cleaned data to find rentals that meet your specifications. Its inputs are the tibble object generated by data_cleaner(), numeric values for the minimum price, maximum price, minimum square footage, and minimum number of bedrooms, and a string giving the city of the desired rentals. It outputs a tibble object with the matching results.

filtered_data <- data_filter(cleaned_data, 
                             min_price = 1000, 
                             max_price = 2000, 
                             sqrt_ft = 500, 
                             num_bedroom_input = 1, 
                             city_input = 'Vancouver')
filtered_data
#> # A tibble: 44 x 5
#>    listing_url                                price num_bedroom area_sqft city  
#>    <chr>                                      <int>       <int>     <int> <chr> 
#>  1 https://vancouver.craigslist.org/van/apa/~  1850           1       500 vanco~
#>  2 https://vancouver.craigslist.org/van/apa/~  1675           1       500 <NA>  
#>  3 https://vancouver.craigslist.org/van/apa/~  1700           3        NA vanco~
#>  4 https://vancouver.craigslist.org/van/apa/~  1575           1       500 <NA>  
#>  5 https://vancouver.craigslist.org/van/apa/~  1550           1       500 <NA>  
#>  6 https://vancouver.craigslist.org/van/apa/~  2000           2       850 vanco~
#>  7 https://vancouver.craigslist.org/bnc/apa/~  1450           1       900 <NA>  
#>  8 https://vancouver.craigslist.org/rds/apa/~  1500           2        NA <NA>  
#>  9 https://vancouver.craigslist.org/van/apa/~  1650           1        NA vanco~
#> 10 https://vancouver.craigslist.org/nvn/apa/~  1750           1       505 vanco~
#> # ... with 34 more rows
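The filtering criteria can be approximated with plain logical indexing in base R. The sketch below is only an illustration under stated assumptions: the data frame is made up, the city comparison is assumed to ignore case, and rows with a missing area or city are assumed to be kept, which is a guess based on the NA rows that appear in the output above rather than a documented rule of data_filter().

```r
# Made-up listings shaped like the cleaned tibble.
listings <- data.frame(
  price = c(1850L, 3695L, 1700L, 1300L),
  num_bedroom = c(1L, 2L, 3L, 2L),
  area_sqft = c(500L, 900L, NA, NA),
  city = c("vancouver", "vancouver", "vancouver", "surrey"),
  stringsAsFactors = FALSE
)

# Assumption: a missing area or city does not exclude a listing.
keep <- listings$price >= 1000 & listings$price <= 2000 &
  (is.na(listings$area_sqft) | listings$area_sqft >= 500) &
  listings$num_bedroom >= 1 &
  (is.na(listings$city) | tolower(listings$city) == tolower("Vancouver"))

matches <- listings[keep, ]
matches
```

Here the second listing is dropped for exceeding the maximum price and the fourth for being in a different city, while the third is kept despite its missing area.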

Emailing

At this stage, you can choose to email your filtered results as a .csv file. You will need to input the email address you wish to send the results to and the filtered tibble object. You can also set the optional email_subject argument to change the email subject. If the function runs without error, you should receive an email in the chosen address’s inbox.

send_email(email_recipient = "elabandari@gmail.com", 
           filtered_data = filtered_data,
           email_subject = 'Results from RHouseHunter')
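If you would rather inspect or keep the results locally, the same kind of table can be written to a .csv file with base R. This sketch does not send any email and says nothing about how send_email() builds its attachment; the data frame, file name, and example.com URLs are assumptions for illustration.

```r
# Made-up results shaped like the filtered tibble.
results <- data.frame(
  listing_url = c("https://example.com/listing-a", "https://example.com/listing-b"),
  price = c(1850L, 1675L),
  num_bedroom = c(1L, 1L),
  stringsAsFactors = FALSE
)

# Write to a temporary location; row.names = FALSE avoids an extra index column.
csv_path <- file.path(tempdir(), "filtered_results.csv")
write.csv(results, csv_path, row.names = FALSE)

# Read the file back to confirm the contents survived the round trip.
round_trip <- read.csv(csv_path, stringsAsFactors = FALSE)
round_trip
```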

We do hope rhousehunter makes your house hunting process easier.