Get Zillow Real Estate Data from Quandl API | Part 1
Want to analyze housing data across markets? With Quandl's API you can access Zillow Housing Data for free to get information on home values, inventory, and sales. In this video, I'll walkthrough how to use the API in python.
Transcription
Ariel Herrera 0:00
Hey fellow real estate investors. Are you looking to be able to analyze markets across the United States?
Ariel Herrera 0:06
Well look no further because Zillow is research economic data does just that. on a frequent basis. Zillow publishes information like home values, and rental rates across the US and they go as deep as zip code level. In this video, I'm going to show you how to query that data using the quandl, API and Python. And this is going to be a two part series where we actually do a case study in the second half to get price to rent ratios, and zip code, so don't miss it. My name is Ariel Herrera with the analytics area channel where we bridge the gap between real estate and technology, I have a huge passion in real estate and being able to simplify it, build tools and help those like you to systemize your processes. So if this is the kind of content that you enjoy them, please subscribe as well as like this content. So I know to make more of it for you. Alright, let's get started. On zils research page, they offer a lot of information in terms of trends analysis, for particularly, we're going to be focusing on the data aspect housing data. So it's really useful is that on a frequent basis, Zillow posts, housing information, including home values, rental, inventory, sales cuts, and more. So this is super useful. If we want to get a sense of how our home values looking within a particular market, do we see it increasing relative to other markets, is this something that can signify to us this might be a new market to venture into. And at the minimum, it just gives us clarity to understand how different markets are performing. So currently, in order to be able to get this data, you would come to zillow.com research slash data, and then based on what you're looking for. So for example, home values, if you want to look at all homes. So this includes single family rentals, condo and Co Op, you could select this data type. And the second piece is the geography. So what's really cool about this is that Zillow goes as deep as the zip code level, and hadn't been exposed to a lot of these aggregated datasets. For example, in the northeast, one of the metro areas is newer, Jersey City, I think New Arkansas, consider one metro area. However, there's so many, many sub markets and neighborhoods within that. So it's really useful to be able to get down to that level with this data. So in this case, you would click zip code, for example, download it, and then you'd be able to open this up in a CSV file. So either open it up for free in Google Sheets, or if you have Excel, and you could just quickly open it up within there. From this data, we get a region ID, we get information on the state name, city, Metro county name, and then information going back to 2000, all the way up to the most recently published data, which is January 31, of 2022. And as of this recording in February, we are only backdated by one full month. So that's completely normal, even though the data is useful. And I could start working with it either in a UI like Tableau or I can start working on it in Python, it's not really useful to do this on a frequent basis every single month. And this is published. And this is where actually utilizing API's comes into play because we could programmatically get this data, have it refreshed, say every month, so that we're not manually going through the process of extracting this running V lookups, or formulas. It's just all automated. So how do we actually do this? Well, Zillow doesn't actually provide this information in the API to my knowledge, so an alternative is going to NASDAQ data link. Within this NASDAQ has a list of API's, and one of them being by cuando, which they have a lot of information as well on the stock market. But in this browser for databases, you could see that once I type in Zillow, that there's a Sylow real estate dataset. And if we click through it, you need to be able to first log in as a free account. So do that quickly. And then you'll be able to see the list of tables. The first of the tables, we have solid data. So this is ultimately what we're trying to get at. And in this case, the indicator ID that we have is ZSF h which is for single family home or a particular region. And we have information going back for every single month here as well as the home value.
Then we have another table, which is actual indicators. So even though the dataset was looking at single family home, there's other indicators that we could look at, for example, five plus bedrooms. And this is the equivalent of when we go back to Zillow site, we select this data type here, it's basically the same thing, selecting it over here. Then we have regions. So this region that was selected was 99999, for the example, but for every zip code, state city pairing, there's a different region ID, which we're going to need to use in order to query the data. And don't worry, this will make a lot more sense as we dive into the actual notebook. But it's good for us to understand what the database is, where the documentation is, if you have further questions as to what the coverage and different codes actually mean. Awesome. So now we have an overview of what the data is where you can find information for documentation. And now let's actually jump into the code. Great. So what you want to do is clone this notebook yourself, the link is below in the show notes. In order to clone you're going to copy the link, go to File, and then save a copy to your drive. Assuming that you have a free Google account. Then, once you do, you're going to see some notes up here. And what's important for you, if you haven't already is to log in with that account for NASDAQ, and request a NASDAQ Data API key. Even though this dataset is free, I highly recommend to be able to get this key. So you can maybe use it for other different tables in the future. And in this example, I'm going to be showing how to actually request the data with the API key, then I have some other useful resources. This includes Google collab cheat sheet, NASA database, and some more. First thing to start off with is running the cells. If you're new to Google collab, it's very similar to Jupyter Notebook. It's just an IDE to run your code. And you don't have to have Python locally installed, which is the best part. So here, we could run this by either clicking the play button, or by clicking runtime, run all. And what we're doing here is just installing the packages that we need. So we need to install quandl. So we can use their functions. And then later in the second half of us actually getting this price to rent ratio analysis, we're going to want to get coordinates. So you'll see why we need that package later. Next, we want to import different libraries we're going to be using, this includes the quandl API. So that's very important, as well as some data manipulation libraries like NumPy, and pandas, and then later, we'll be visualizing this information in Plotly have some functions here that are going to make some of our repeatable code a little bit easier to run for right now just run that. And then in the next two sections, this is very specific to Google collab. So I have it set up that I have a file where all my API keys are. And it makes it a little bit easier to be able to get that data right away. So in this next line, what I do is I say go find that file that has all my API keys. And then I want you to locate where there's quandl. And that box, since we're going to be plotting something later, then get those keys and assign them to these variables, which are strings. And I then reassign this quandl API key to quandl API config, that API key was just how I honestly have seen it done in other documentation. So I have that set there. Great. So for the next part, and the data sets that we saw, we were able to get Zillow indicators here. And if we expand that, we could see all the different indicators that we can get data for. So for example, median, days, depending median list price, single family, home values, etc. Now, if we go back here, it's very easy. We do get table, and then we specify Zillow, and indicators. So within the zillo space, we want to get the indicators table, then when I run this, I'm able to quickly get how many indicators there
are, as well as grouped by what category they're in. So in this case, we have 56 different indicators we could get data for, which is pretty cool, because I'm pretty sure within just the Zillow page, we don't see all of those so we actually get more in the API, which is neat. Now, if we do a group by which is a pandas function, we could see that the home values, inventory and sales and rentals are the indicators available with inventory and sales being the most and it was really neat, I skip this part, but when we actually query from quando, it automatically gets our data into a panda's data frame, which is huge. We don't have to actually transform or wrangle this dataset, it's already easy in a row, column type tabular format. Now, if we look at that indicator table, whenever that category is home values, we could see all those home value indicators that we saw before in that drop down, we have home values from one bed to five plus bed, all single family homes, all homes in general. So this is a good way for us to understand what are indicators we can use to grab data. Now if we use the category rent, we could see here that there's two options. There's our SSA, Rs, Na, and their meanings are seasonally adjusted, and one is not. So you could decide which one you'd like to use as well. There's more information on Zillow say, I believe, on what some of these definitions are, and they believe they have documentation outside of this too. Great. So now we have our indicators that we're going to want to get, because we want to get home values and we want to be able to get rental data. Because in the end, we're looking to get price to rent ratio, which I'll describe a little bit more in the next video. Okay, so we know what data points we want to get. But now we need to understand how do we actually select the regions that we want. Now, if we go to that get table, so we go back to what that table looks like down here, we could see that there's this region type. So the way this data is split is by zip code, city, county, neighborhoods, state and Metro. So the way that works is an example. Douglas County, so Douglas County is the overall County. But within that county, if we want to look at specific zip codes that are going to have different region IDs, if we want to look at the list County as a whole, that's going to be its own region ID. So based on how you want to cut the data, how granular you want to get is what regions IDs you'll select, I want to get as granular as possible. So I'm going to select that. And when we go down down here, I'm basically selecting that I want to look at region types up by I did notice that it's hard for me to be able to query what exact city I want out of this. So for example, everything here is separated by a semicolon. But it'd be a lot easier if it's actually in its own columns. So that I could say, for example, get all zip codes with XYZ city. So what I do here is I have different functions. And these were the functions are up top. So we have check state, so I check to see if the state is in the string, county, city and Metro, you might be asking, Well, why don't you just split the semi colon and then just have your five or six columns that you need? Well, sometimes this region field has six different attributes five, or four, which is why I have this piece here called region string length. So I can actually determine how many attributes are in that string. So you can see here
in the second zip code, it's actually missing, I believe, the metro area. So that's why to use some logic in order to figure out is the Metro present, if not, then put none and then the same thing for the rest. Hopefully, I didn't lose you there. But overall, we're just making the data a little bit easier in the structured form, so we could query it later. Now for the third part. So this is actually getting the data, which is probably what you've been looking forward to the most. So in this example, let's say I want to look at my home's original home state, New Jersey. And one of the cities I used to live in South Amboy. I could query this a lot more easily now because I separated a different columns. And I could say state equals New Jersey and city contain something around South and space a so we get one row that returns and it South Amboy and the county of Middlesex County, which is correct zip codes, correct. And it gives us region ID. So this region ID is how we get all data for South Amboy. Next, we want to get the median home price. So if we remember when we go up to the indicators, we have all these different ways to get home values. And this case, we're just going to go for all we're gonna keep it simple. We want to get the home value for for all which should be a median of those. So what we do here is we got quandl, get table, Zillow and then forward slash data. Then we put our indicator which is z all and then our region ID which is 61233. and wallah. Once we do that, we're able to get all the information for South Amboy going back many months on what the home value is. And remember, this is the general population of all the homes in South Amboy. So this is not comprehensive of like one particular property. So you could have someone that is on the low end, and Zillow is able to get that data into a median so we can more easily analyze it. And we could see from March all the way to July of the year 2021, that this home value has increased. Now we're going to want to dive into our case study for the price to rent ratio, we need one more factor. We also need rent, we have home values, not rent yet. So we do the same exact thing was where we have quandl get table it's going to be Zillow forward slash data. Again, our indicator now changes this is only thing that changes it is now RS S A. and that stands for that indicator up here that we have our SSA, which is rentals, seasonally adjusted for all homes including multifamily, then we are going to select the same region. And I asked to show back the first five. And here we have what the rental median is for each of these months. So that's how you're able to use cuando. In order to query that information as Zillow provides for free on a monthly basis. This is really useful if you want to build it into a tool you're doing some sort of research, or you just need to have a programmatic way to analyze things for a larger model that you're working on. Helps has been useful. And don't forget to check out part two, where I go through the case study and we actually look at price to rent ratio. And we've created a really cool graph at the end so we could analyze our data fast. If you haven't already, please subscribe. Thanks