How to Get Rent Data by Zip Code | Part 1
This video will be split into two parts. First, we will get the rental data for all zip codes in a county. Second, we will visualize the data in Tableau so that we can easily spot trends and emerging neighborhoods, we'll be able to do this all for FREE!
Ariel Herrera 0:00
Are you looking to analyze rent prices for an entire city or county? Well in this video I'm going to show you how to get rental data at the zip code level using Python. My name is Ariel Herrera, the analytics area channel, we bridge the gap between real estate and technology. Current rental data is hard to get for free. But now it does not have to be using API's or update data monthly. This video will be split into two parts. First, we will get the rental data for all zip codes in a county. Second, we will visualize the data in Tableau so that we can easily spot trends and emerging neighborhoods, we'll be able to do this all for free. And if you also enjoy data driven solutions like me, then please subscribe as well like this content if you want to see more rent data solutions. Alright, let's get started. For this tutorial, we're going to be using Python or apt to get rental data. So make sure you make a clone of the notebook, which is in the link below. This will open within your own Google Drive so that you could run Python in the cloud. That way, you don't have to have it personally installed on your own machine. So for the first part, what we want to do is run our imports, we're going to use outside libraries. So we can either press play, or we could do Ctrl, Shift Enter in order to run the cell. Now, once this is run, we've loaded in our libraries. And our next step is now optional. This is where I actually pull in my own Google Drive file, where I have all my API keys stored since API keys needs to be private. And don't worry, we'll get to the part where you bring in your API key. So right now I'm just loading into connect with my Google Drive, I gave permissions. And the second cell, what I'm going to be doing is assigning my API key to a variable called Rapid API key. And this is the part where you're going to replace with your own key. So let's look at that. Now, we're going to be using Realty mall property, which is an API that gets information monthly by zip code for rent data. And real Tmall allows us to actually use our API for free, we can make 50 API request anything after that is five cents afterwards. So we're going to do is make a free account with rapid API. And then once you do, you'll be able to come to Realty mogul site. And their about page if you want to learn more about them, you can look on their website, they have not just rental owner data, but they also have property data absentee owner list and more. Now, if we go over to the endpoints, which is what we care about, towards the bottom, we are going to look at the rental market data and points since we want to get rental data for entire zip code. But they do have other endpoints to be aware of, if you wanted to get comparables just for one particular property. So for us, we're just going to need zip code, and we could do is go over to Python. And we can copy the code that's going to be here. Now once you sign up, you'll see that you have your own API key on the right hand side, make sure you copy that and replace it where I have my rapid API key. So the first step we want to do is get data for a single property. So we paste in the sample code that came from real Tmall. And we replace the string with our own rapid API key, we can also replace the zip code that they have. So in this case, I'm going to use the zip code in the Tampa market, which is my Market. So create a variable called zip code. And here I'm going to type in the zip code that I want to get data for 336 10. Now I run this. And once we run this, we're basically passing into the endpoint, please get data for our zip code, and we get lots of information back down to the bedroom level. So for each month, we're going to have information on what the average rent is, whether it's a studio, single bedroom, two bedroom, and more. Next, what we want to do is actually see this data in an easier format. So if we look at it on JSON format, we can now see it's a little bit more readable. If you go all the way to the top,
we can see that we have our zip code as an ID, we also have our average rent. And we have more detail on what that looks like. We could look at Studio bedroom, two bedroom and see max and min rents as well as how many rentals currently exist within that space in the market. And we also have historical data. So let's bring this over into a table that we can actually read with rows and columns. So here we're using Pandas, library and JSON normalize in order to normalize this response. So let's look at the number of rows as well. also the number of columns that we're going to have in our dataset. So we could just modify this quickly to calls and look at the length of the columns. Now we can view the top results, which in this case, we're just going to get one result, because it's just one zip code that we got data for.
Now that we have the data, we could see here that we have a single row. And we have 141 columns, which is a lot. So what it's in these actual columns, well, we could see we have rental data history, dating back to 2020 of April. And we have data that goes all the way to June of 2022, which is the most recent month as of this recording, what we want to do is actually get this historical data, wouldn't it be awesome if we can have a trend analysis to actually see what this rent has changed from from two years ago till now we could see a merging zip codes if we do this. So the first thing that we want to do here is see how can we dynamically pull these columns. So we're going to do is look at all the columns that we have within our data set. And we want to pull the ones that have history, and that are detailed. So what we do here is we're going to loop through each of these columns. So for each C, so C is going to be our variable here, if detailed, and history is in C, and we want to keep those columns. So here, we can wrap this into a list. And now we see we have all of the rental data history columns for every single month, based on us just filtering it out using a for loop. So let's set this to a variable called rent data, Grant detail history column. And now we can actually iterate through each of these columns that we can pull the data from it. Our goal, again, is to have one single table or one data frame that has all historical detail for these rental prices. Now we want to actually pull that date that comes from the column. So we're going to do is look at our first element within the list. And when we run it, we see here that we have 20 2004. So let's split this by a period, so we get closer to getting our element. And this isn't the second piece of our list. So let's index it to two. And now we see we're able to get our date. So we can copy this and just modify it slightly, this will be x instead. And we could set this to a variable so that we have our date as a string. And if we want, we could print this so that we could actually make sure our script is working. I frequently do this when I'm creating for loops is each time I add a step, I just test it using print. And we can see we have all of our columns now have month, as well as here. Next, what we want to do is basically get our column name dynamically. So if we look at one of these columns, their call rental data history with the month and year, and then it has average rent, which in our case, we only care about average rent for this example, so that we could look at average rent across history. So here we set our variable called detail column. And we're going to modify this part. And instead, we're going to replace it with our date string. That way, as we go through all the columns, we're dynamically pulling the average rent for that date. And now we can see that we've set all of our months to have that information. Let's now get this data into a data frame, so that we'll be able to actually view it as rows and columns. So we're going to convert this historical data to a data frame. And we can set up our data frame as underscore DF, since it's temporary, and then wrap it in Panda's data frame. And for our other data frame DF friend, we're going to pull that new detail column that we specified a top once we have our column that we want to get data for wrapping to a data frame, and we're selecting the first element there, we could run this, this works fine. So now let's create a couple of columns that we also want to have in our data frame. And then we'll run it so we can see what this looks like. So our date string that we were able to pull, let's make this into a column so that we know which month and year our data is associated with second, let's do the same for ZIP code as well. Now we want to do is each time we get information on a particular year and month, we want to put this into one single table. So we want to have all historical data together. Therefore we create an empty list that's going to store each of these sub tables that we've created per month in year. So once we run this, we can now look at our results. So if we look at a single data frame for that month and year, which is June of 2022 We have all of our average rent by bedroom. And we also have the zip code associated. So net, let's now join all these data frames together, we use PD dot concat. To do so. And now we have 765 rows of historical data, we can set this to its own data frame name as well, and add some commentary up top so that we can remember why we created this. So now we have our new data frame called DF detail. That's basically all the historical data that we were looking to get. Although we have our data, we do see our important columns, date, string, and zip code all the way at the end. Because these are our unique identifiers, we want to see those columns towards the top. So if we go to functions, we'll see that I already created a function to move columns to front of our table, so make sure to run that function. And then we could go back down so that we can move those two columns to the front part of
our data frame. Great, we can now see that we have our date string at the front of our table. So next, let's just copy the previous code that we have. And we can change the column that we want to move to the front, which is the code. And now we see the zip code is our first column, then date string exactly how we wanted it. But now we want to add some more features, we want to see how has our rent changed from the previous year has a decrease has it increased? Does this mean that it's been increasing for several years, and maybe it's an emerging market relative to other zip codes in the area? Well, here, what we're going to do is first label what is the previous year. So in our case, I'm going to use a lambda function. And before we finish our lambda function, let's just try this out with one single row. So if we grab our date string, and we grab the first item, we'll see here that we're going to have 2020 of oh four, so we want to get the previous year, but this isn't a string. So now we're going to have to actually modify this and bring it back as a date time object. So we're going to wrap this variable with daytime. And we're going to specify what is the format. So here we have year and map. So once we run this, we could see now our object is a date time format, where we have our year, our month, and we want to subtract one single year. So we're going to use relative delta so that we can subtract one year. And if we run this, we'll see now that we have 2019, exactly what we're looking for. But we need to get this back into a string format. So we use our date time string F time function. And now we have one year going back. Awesome. So we could bring that into our lambda function, which will allow us to apply this to every single row within our data frame. And we'll always do is replace test with the column that we're going to be using. And for readability, I'm just going to move this down, run this and now for every single row, we have a column for previous year date, perfect. Next, we're going to add our features so that we can have the previous year data. So we're going to copy our current data frame. And we want to have a second data frame that has data from the previous year. So what we're going to do first is we want to rename our columns. Therefore, we're going to get all of our columns into a list just like we did previously. So we're going to label each column variable SC. So for each column within our table, what we want to do is add a prefix or suffix to it called previous year. Once we correct our variable and change it to C, then run this. You'll see now we have our data frame. But it has previous year for each of our columns that we were going to be able to differentiate which columns are from the last year versus the current ones, we merge these two data frames back together. Great. So now we're going to use DF merge. So it's pandas function called merge. And if we go look at it quickly, you can search it into Google pandas merch, it should be the first link for documentation. And we can look at all the different parameters that we can enter, as well we can see examples of how it's been used. So I highly recommend to go to the docs when you're using new functions so that you can understand them a bit better. So the first step to merge two tables together, is we're going to have our left side So our DF detail than our DF detail previous, we want to join these preserving the left hand side. And for our left data frame, we want to join on previous year date column as well as on bedrooms. So we could take, see that previous year column, and then bedrooms. Next
for our right hand side, so our previous year data frame is going to be a bit different. The columns are instead date string, previous year. And it's going to be bedrooms underscore previous year, it's because that was the suffix that we gave those columns. Now we can look to run this and view the top five, which in this case, we just get no values because you don't have data for 2019. So let's look at the last values of our data frame, which is 2022 of June. And now we could see we have additional columns for 2021 of June, we could see your average rent of this year and our average rent from the previous year. This is huge. This is going to allow us to see trends year over year. So let's remove any duplicates that we have. And now we could see the information for each bedroom. We could see bedroom floor. And this is great. Now our next step is we want to add several features so that we could actually look at year over year changes. Before we do that, let's quickly just look at more months within our dataset. We could see here that for studios, we have a big issue, the average rent is 121,000, which we know cannot be true, we must have some bad data there. How do we identify this bad data and do it in a statistical way? Well, we can do is identify outliers using Z score. So we're going to do here is set up a new column where we have average rent, and we're going to get the z score, then we're going to get our column average rent, we're going to apply a z score here. And we're going to see that we have a new column that calculates the z score for our Z score, we could see that that value that's very high is added three and another value 3.6. So we can easily filter these out by removing any records of average rent of a standard deviation that's greater than two. By filtering on our data frame by two different parameters, we can filter on bedrooms, as well as the average rent score so that we can have our Z score less than two. And we look at this here, we have eliminated studios, because studios in the zip code 336 10 was above our threshold have a z score of two. And let's correct our bedrooms here. So we're not having anything, it's more than five bedrooms. As I've seen with this dataset, there's a lot of missing data, or also anomalies when it's over five, or at five. So now what we could do that we have a filtered dataset, we can drop any columns that aren't really necessary for us. And there are several here that we obtained when we did that join with the current year and the previous year data frames. So let's just remove anything that's not related to average rent price. When we run the sell, we could see we have less columns for our previous year data, we just have information that we care about like the average rent, min, and Max. Awesome. So now we have a data set that is clean and filtered. And we could add our last features here, which are going to analyze our year over year change in average rent for each of these bedroom types. So we do as we set our new column, it's going to be called year over year average rent. And what we want to do is take our newest average rent, which is this case 2022 for July, and we want to do is create the formula that will subtract from last year's rent and divide by it. So I'm going to quickly show you what this formula looks like as well. So that you have an idea of how to do this on your own too. But once we have these columns complete
look over to percentage change calculator. And here this is just a public free percentage change calculator and I use this just to make sure I do my formulas correctly. So we're going to do is go to our table, and we could select the average rent for one of these rows. So for three bedrooms, let's just take this number. So this should be for July sorry June of 2022. And now let's get the pre Previous Year, which is about 1500. So let's copy this, put it into the calculator, hit Calculate. And we could see we should get a value that the rent went up by about 33, or 34%. So now we go back into our notebook. Let's quickly fix this column over here, run it, and once we run it, we can look at our new column. And for that particular row, we are able to see that we have an increase of 33%. So we know that our formula is correct. And now we could just duplicate this to also look not just at average rent year over year, but also total number of rentals.
On a cell, we can now see if we go all the way to the right, we have our additional column of year over year change in total rentals. This is going to help us to see if demand is exceeding supply and how hot an emerging market may be great. So we've been able to get information on a particular zip code, and look historically of how average rent has changed by the bedroom. But we want to be able to get this information not just for one zip code, but for all zip codes for a particular city or county. So let's do that next. In order for us to find all zip codes that are relevant for a city or county, we're going to need to get a data file of all zip codes mapped to city or county. And in order to do that, we can do a quick Google search, which I'm going to show you right now how to do and then I've already put our previous code into functions. So we'll be able to quickly run the same steps that we just took within a matter of seconds to get our dataset that we will then populate into our Tableau visualization within the next video. The first step to obtaining our dataset is by looking on Google US state county zip code data. So I'm gonna click the first link, which is from GitHub, and it says sorry, data is too large, you need to view it raw. So if you look at the data set raw, we can see if we type in Hillsborough County, which is the county for Tampa, that we're able to get all the information on the county, including the cities and the zip codes. Great. So we want to pull this information from within our notebook, I'm going to be copying over some previous code that I've written. So here we have the URL that we just looked at. And by using Pandas read, we're able to see there's 33,000 rows of zip codes and six columns where we get county city and state abbreviation we want to do is get all the information for Hillsborough County within Tampa. So here I take our data frame DF Geo, and I look for Florida and Hillsborough. Now we see that we have all of the zip codes in Hillsborough County, great. However, we do have an issue as we're looking at it, there's some invalid zip codes. It's not actually a real city. And there's no it's not numeric for all five of the elements within the zip code variable. So here we're going to check if it's a valid zip code or not by testing, if it's numeric. So see we have here, check for the column zip code is numeric. And if it's not, then it should be false. So we should see that all of these are true, since all the numbers 335 10 are numeric, but then we see false. Since h h is not numeric, we'll be able to filter on all the bad zip codes, so that we can have those removed from our dataset. And if we quickly look to see how many of those invalid zip codes we have, we can run this quickly when we have that filter. And here we're going to see that of all of our zip codes labeled Hillsborough, there's 4745 are true that they're valid, and the invalid ones are just two of them.
So we could do next is filter to make sure that we only are obtaining those valid zip codes. So I do this here in the next cell. I state for the column valid zip code, only select those that are true. And we can see that we have our data set. And in total, we have 45 valid zip codes, which matches what we saw in the group bought. Perfect. Now we can literally run these zip codes through the code that we've worked with previously. In order to not waste API credits. I'm going to show you a notebook that I've already created that goes through this information and obtains all data for the zip code. So first in the functions above, we want to make sure that We run each of these, we've now taken our getting the market rental data into a zip code where all we do is pass the rapid API key and zip code, we check to make sure the zip code is valid and that it has five elements, then we return the data. We also have functions to get the latest rental data. And also we have a function to get the historical rental data that was the meat of our code. In the last section, we also have our features towards the very bottom. So make sure you run each of those functions, and then go all the way back down to where we just just got all of our valid zip codes. Please be aware before you run this code to look at the pricing. So if you are looking at say 100 zip codes, you're going to exceed the free API. So be aware of that. And if you're going to look for, say zip codes throughout the entire United States, you may want to upgrade to pro Ultra omega for your pricing package. So next we're we're going to do here is iterate through each ZIP code in our list. And there's going to be only a few steps within our for loop. It's pretty clean. Since we have functions. First, we are going to get the rental data once we pass in our key and zip code. Once we have that response, there are two things we were doing, we're going to get the latest rental data. So only for the current month, in this case 2022 of June, we get our letter, latest rental data, and we append it into an empty list. That way we can later consolidate for all zip codes. Also, we want to get historical data. And historical data is going to follow the same steps where we append it to a list, we do a try and accept. So in case there's any errors, we print out which zip code had the error, and we have to sleep for one second. If we go back to the documentation, real team will states that you can only make two requests per second. Therefore, after making those two requests, we pause for one second, we could see here that when I ran this, I was able to get all the data except for one single zip code, which I could always troubleshoot in the future. But at least everything ran. Then I use PD dot concat in order to get all of our data frames into one single large data frame. So we have all zip code data. Now, what's really interesting is being able to look at this information at a historical level, this dataset is a lot larger, it's over 1000 rows, where we have all the zip codes and its historical information for average rent by bedroom, we could see here the breakout of those as well. Then we want to get this into a CSV format so that we can upload this into Tableau for our visualization. So I have a dynamically that things are being saved down or downloaded based on the latest date, as well for the historical context. I merge this with our file that came from GitHub so that we could see county, city and state for our zip code. So next meet me in part two, we're we're going to go over how to create the same exact visualization, but you'll be able to do it with your own city or county. And make sure at the very end of this file that you download it so that you could see all the information and it's going to come up within a CSV file. You're going to have a lot of data here, but we'll be able to work with it really easily within Tableau. And we can see within the spreadsheet, we have our information on average rent by bedroom by zip code. If you found this tutorial useful then please comment below and like this video as a helps to promote for others on YouTube to find this video to help them with their own analysis. All right, and if you haven't already, please subscribe. Thanks