Analyze Price to Rent Ratio Across Markets in Python
Want to analyze housing data across markets? With Quandl's API you can access Zillow Housing Data for free to get information on home values, inventory, and sales. In this video, I'll walkthrough how to use the API in python and get price to rent ratio.
Transcription
Ariel Herrera 0:00
Hey everyone, are you looking for a way to be able to get price to rent ratios for Metro and ZIP Code areas throughout the United States. That way you can compare different markets accurately and fast. Well, I'm going to show you how to do that using Python And The quandl API. My name is Ariel Herrera with the analytics erielle channel where we bridge the gap between real estate and technology. This is part two of a two part series, and being able to get this data with quandl. And use Python to quickly create a chart that's going to allow us to see what areas particularly zip codes may have the best price rent ratios. This is the kind of concept that you enjoy them, please subscribe to this channel as well like this video, so I know to make more of it. Alright, let's get started. In the last video, we explored Zelos housing data which does encompass of house values, inventory, as well as rental values. They publish this data for free, and it's publicly available by going onto their site and downloading files on different geographies, all the way down to the zip code. Now, in this channel, we tried to make things as efficient as possible. And this really wouldn't be a good solution in the long term, since working with different CSV files, and just an Excel application could open itself to error. And it's better for us to do this programmatically. So we were able to find the same exact data on NASDAQ data link and one of their databases, particularly with the quandl API quandl. You might have heard of their API before, as they're one of the leaders in order to extract data programmatically, like stock prices. In this particular dataset. It's called Zillow real estate data. They mentioned here and documentation that it has information on markets such as rental sales and inventory. And they also go deeper into what those regions are, and some more information on what the actual data sets are. But for this purpose, we're going to dive straight into the information for price rent ratio. So you might be completely new to price to rent ratio, and that's okay, let's go over it pretty quickly. So from stessa, you can read more about it on their site, but they basically go into it and say price to rent ratio is a calculation real estate investors use to forecast the potential demand for rental property in a given market, the simple formula is dividing the median home price by the median annual rent in the market. So for example, say if the median home price in a market is 120,000, the median rent is 11,000. Price to rent ratio is 10.91. In a market that's cheaper to rent than to own, if we increase that home price value, we could see that the price rent ratio increases. So the way we think about this is say if you're a landlord maybe wants to get into markets where it's difficult for tenants to be able to afford their own home. So thus, you might think that you're going to have a better tenant pool, and it's going to have a better opportunity of having renters in the future. An example of a market with a high price to rent ratio I could imagine would be New York City as well. So this can also be an indicator to see is an area starting to become unaffordable is an area that we could still jump into on an earlier phase to make an impact as an investor. So you could go more into this article, if you want to understand this a little bit more. And I personally don't use this as a sole calculation. But it is a good attribute to at least have handy. Now let's dive into the notebook on how we get price to rent ratio using the cuando API and Python. So for our first part, here, we're going to select the region of interest. And in this case, I'm going to my original home state New Jersey, and I'm using Pandas to select that by state, then I use a group by and I aggregate how many zip codes are actually in these metro areas, because I want to just focus in on one particular metro area and see how price to rent ratio is. So in this case, the top five that we get back is Newark, New York and Jersey City Metro Area with about 350 threes of coats. And that's very interesting to know. So what I'm going to do here is just filter on this metro area, just so we can have our sample size more down instead of looking at the whole United States, but feel free to do that as well. So here the next thing that I want to do is get the price to rent ratio. But first I'm going to get home values and then rent
Unknown Speaker 5:00
information. So I created this function called get latest solo data by region. So in my particular case, I don't really care what the previous home values were, say in 2010, I really just care about right now, what does it look like for the zip codes in this metro area for the price to rent ratio. So this function, what I do is, I go to this quandl. Get table Zillow data, I pass in an indicator, what I noticed is when you actually pass in more than 200 regions to the API, it stalls either does not complete the request, or it takes a very long time, which was unusual. So what I did to fix that is I first get the length of how many regions are passed through. And if it's over 200, then I basically do a for loop, or I only pass in 200 region IDs at a time. If not, then I just straight go and just query the quandl dot get table function to pass in all of my regions at once. Then I look to see per region, what is the latest state, and I only want to get the latest state. So this is why over here, I do a merge an inner join so that it's only joining on those regions with their latest date, and that information. So now that I have that function, I can now easily call this data. So if I want to get the median home price, I just send in all of my regions, which in this case is going to be all the zip codes from the New York Newark Jersey City area. And I want to get z all which is going to be comprising of all types of properties, condo, co op and single family to get home value. So if I look at the first item within that data frame, I can see that for a particular region, I was able to get that home value. And this is a longer data frame, of course, but I just want to see the first one to make sure it worked. Then I repeat this process, I also send in the same exact regions. And then instead I specify the rent code, so you get the rent value. Now the next step here is if you notice, I'm able to get the data back. But region ID doesn't really help me What is this actual name of the city and what's the zip code, that will be a little bit better of context. So this third part is where I merge these tables together. And now, if you look down at the bottom, we're able to get the indicators that we have the region associated median price median rent, which you see there's a no value here. And we'll get into that zip code state, county, city and Metro. So now we have a more full dataset.
Unknown Speaker 7:53
And the next step is adding features. So the price to rent ratio feature is actually super easy. It's just taking this median price that we got. And then the median rent is multiplied by 12 for 12 months, and voila, we have the price rent ratio.
Unknown Speaker 8:10
Then I want to plot this data. And I was looking for a good library that could help me pinpoint where to plot this was looking at many different types of Python plotting tools, but some of them were just a little bit more intricate for these purposes. So what I did was, look to find the general latitude and longitude per zip code, which from that library that we installed in the last video, I'm able to get the query postal code and this case, I get the latitude and longitude. And now we have two more columns that have that information, and then specify which columns I want to look at. And then I sort this by the top PRR, which is a top price to rent ratio. Next, I take this data frame and I just remove whenever price to rent ratio is no which we'll touch on in a moment. I send in my Mapbox token, which is really it's a Plotly easy to get a key. And then I use Plotly express the scatter map box function. And this is super easy. I just pass in my data frame. what columns are going to be the latitude and longitude. What is the color so what's going to be the variable that changes here, which is price rent ratio. So we could see if we look at the price rent ratio, where it's most affordable, would be on the lower end here and on the higher end would be higher price to rent ratio. Most people choose to rent probably versus own because of the high rents relative to the price and the number one hears for Somerset County. Not as familiar with that county and more familiar with those that are
Unknown Speaker 10:00
to the ocean. But it would definitely be something to look at further and at least get some oversight. But one of the things that frustrates me at this point is why do we have only a couple bubbles here? Shouldn't there be so many more zip codes? I'm a little bit confused. So I want to check why is this data missing. And for the zip codes within this metro area that I was focused on, only 12 of them were plotted. And when I look at this, I pick one city in particular that had no values, which was Patterson. So for Patterson, Paterson, we were able to get the median price, but not the median rent. And these are all the zip codes within Paterson, the city.
Unknown Speaker 10:47
So I drill down a little bit more just to look at, say, Patterson at a city type and not zip code. So my theory was maybe zip code is too granular. And for some reason, not all the rental data comes through. So let me just try high level at City. And I queried again, that quandl data I put in the region for that city, and then it still didn't come up. So I guess there is still some missing data here. So you're not going to be able to get a full dataset for all zip codes, just using the Zillow housing data. However, I think this is still a great step. And if we can use other resources to get average or median rent for ZIP code, or metro area, whatever we're looking for, we can append this, we could be a slew say, all right, from Zillow, we were getting no values. But from another data set, we have that let's append that and then let's rerun the price to rent ratio. So I hope this has been useful just to be able to understand how you can use this dataset to actually calculate some metrics, how you can plot it, and some ideas and how you can improve upon the dataset. If you haven't already, please subscribe. Thanks