How to Get Zillow Agent Contact Information | Python Tutorial
Ariel Herrera 0:00
How to get a list of agents into a spreadsheet using Python. In this video, I'm going to show you how you can use the Zillow Agent Finder and the zillow.com API to get information on agents, including name, phone number, reviews, agency, and more, into a spreadsheet. We'll be able to do this for multiple cities. My name is Ariel Herrera with the Analytics Ariel channel, where we bridge the gap between real estate and technology. My passion is to deliver analytics solutions for real estate. If you want the latest content, then make sure that you subscribe, and like this video if you want to see more like it. Stick around to the end, because I'll show you how to automate getting this data for multiple cities. You could use it to feed into your CRM, and more. Alright, let's get started.
Right now I'm on Zillow's Agent Finder. This allows first-time homebuyers, or any homebuyers, to find agents, as well as sellers to see who they could possibly sell through. Here, you can enter a location for a city; in my case, I'll select Tampa, Florida. Once selected, I can see all agents listed here, up to 25 pages in total, and I can get information on agents such as their name, phone number, number of reviews, agent license, latest review, as well as the review itself. This is really important if you want to aggregate information about agents in an area. It can be useful if you're looking to see who's the right person for you, or if you're a wholesaler and you want to market your deals to agents that may be working with investors.

So how do we actually do this with Python? It would be tedious if we had to copy each of these rows individually. That's where we use the zillow.com API. This API basically web scrapes information from Zillow pages. As a disclosure, it is not made by Zillow; it is made by a third party. All you have to do to get started is sign up for a RapidAPI account, which is free. Once you do, you'll be able to come to this page. On the left-hand side, you'll see all of the endpoints that you can get data for. This includes properties that are currently for sale, those that have sold, and information on properties that are off market. Those are covered in prior videos of mine, which I highly suggest you check out. But in this case, we're going to look directly at agents. What we want to do is find agents by location. We essentially want to replicate the same exact steps as the Agent Finder, but instead of building our own web scraping bot, we want to get this data programmatically using an API and Python.

So once you sign up, make sure you subscribe to this endpoint. The pricing structure is pretty straightforward. You can use this API for free, although I highly suggest going to a pro membership so that you can request more data at a lower cost, and within a shorter span of time. Now go back to Endpoints. On the left-hand side we have all of our endpoints, and in the middle we can see the parameters we can select. So if we go back to Find Agents, once you sign up, you'll see your API key here as well. If we go down to the parameters, we can search on location, we can search a specific agent name, or a latitude/longitude to get all agents within given coordinates. What's important here is the page parameter, because as we could see, there are multiple pages, and we want to be able to extract all agents, not just the first ten.

Then on the right-hand side, we have sample code. If we look over, we can go to Python Requests, and we can copy this code if we want to try querying this information directly. Here the sample is looking up a specific name, even though for our use case we're going to look at an entire city. If you go over to the example responses, you can see the exact information that you'll receive. This includes the agent's details, as well as reviews, phone numbers, contact information, and more.
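For reference, the copy-paste sample on that page looks roughly like this. The host and endpoint path here are my best guess, and the agent name is a placeholder; use whatever the Code Snippets panel shows for your account:

    import requests

    # Host and endpoint path are assumptions -- copy the exact snippet
    # from the Code Snippets panel on RapidAPI
    url = "https://zillow-com1.p.rapidapi.com/findAgent"

    querystring = {"name": "John Smith"}  # the sample searches by agent name

    headers = {
        "X-RapidAPI-Key": "YOUR_RAPID_API_KEY",  # shown once you're signed in
        "X-RapidAPI-Host": "zillow-com1.p.rapidapi.com",
    }

    response = requests.get(url, headers=headers, params=querystring)
    print(response.text)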
Now that you have a good overview of what this API is about, let's open up the Google Colab notebook, which is linked below. Google Colab is a free environment where we can run Python code without having to install anything on our machine; we can also use resources on the cloud, so we're not bombarding our own machines with all this work. The next step you're going to take is to go to File and save a copy of this notebook into your own Drive. This will allow you to run the same code that I'm going to run here.

Next, we want to make sure that we import all the libraries that we're going to use. To run the cell, we can click the play button here, or Ctrl+Enter. Once the cell has completed, we'll see a check mark on the left-hand side and how many seconds it took. Next, we have functions, which I'll get to in just a moment, but let's go straight into trying to get this data. In my case, I have my RapidAPI key stored in a file. You can find your RapidAPI key in Code Snippets, next to the X-RapidAPI-Key parameter. You want to copy this key and store it in a safe location, but if you want to go straight into it, you can replace this line of code with your key. Here, I'm running the next two cells. This allows Google Colab to connect to my Drive. I go in, select my name, and allow access for this notebook to read the files in my Drive. I have a CSV file called API keys, and that's where I have my RapidAPI key stored.
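A minimal sketch of those two cells, assuming your keys live in a CSV with "service" and "key" columns (the file path and column names are placeholders for your own setup):

    from google.colab import drive
    import pandas as pd

    # Mount Google Drive; Colab will prompt you to authorize access
    drive.mount('/content/drive')

    # Path and column names are assumptions -- point this at wherever
    # you stored your own key
    keys_df = pd.read_csv('/content/drive/MyDrive/api_keys.csv')
    rapid_api_key = keys_df.loc[keys_df['service'] == 'rapid_api', 'key'].iloc[0]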
I'm going to run the cell to get the information and assign the RapidAPI key to the string that's there. The next step is to make a single API request.
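That request cell looks something like this; the host and endpoint path are my best guess from the RapidAPI page, so copy the exact values from your own Code Snippets panel:

    import requests

    url = "https://zillow-com1.p.rapidapi.com/findAgent"  # path is an assumption

    # Query for one city, one page of results
    querystring = {"location": "Tampa, FL", "page": "1"}

    headers = {
        "X-RapidAPI-Key": rapid_api_key,  # the key we just loaded
        "X-RapidAPI-Host": "zillow-com1.p.rapidapi.com",
    }

    response = requests.get(url, headers=headers, params=querystring)
    print(response.text)  # raw text -- hard to read, which is why we parse it next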
Let's just test this out. We want to get all agents for a specific location, and I'm going to use the same example as before: Tampa, Florida. There are different parts within our request. In our query, we have the location, which is Tampa, Florida, and we're going to select page number one, which should be this exact page. Our URL is the endpoint that we're getting the data from, in this case Find Agent. Next, we pass in our headers, including our RapidAPI key, and we make our request. So let's hit play. Here we get our response, and we can see there's an agent called Duncan Duo, with information on their last review, which was in 2022, and some more details. But this is not friendly to read at all. What we're going to do next is transform this text into a JSON format and view it as a table.

Once we click play, we can see that we have different keys. This includes a status, which tells us our request was a success; the location that we entered, which was Tampa, Florida; the total number of pages, which is 25; and how many agents per page, which is 10. So what this is telling us is that for our city, Tampa, Florida, there are 25 pages in total with 10 agents per page, which means we can get 250 agents. There may be more agents; however, this API only grabs those that appear in this search, and from my view it seems as though Zillow ranks the top agents higher, so we're likely going to get the more active agents in our list, which is a good thing. All the agent data is under this key called agents, so we want to extract the data from it. That's why, in step three, we're going to use json_normalize to get the agents key from our response into a DataFrame. If we run this, we'll be able to see all the data that was squished into that first column, now in separate rows. We have a total of 10 rows and 40 columns. This includes the agent name, the agent's latest review and when it was, the total number of reviews, the agent's star rating, and their contact phone number. There's some other information that isn't as relevant, and we'll filter that out in just a bit. But this is a great start: we've been able to query a single location and get the information for a single page.
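In code, those two steps, parsing the text into JSON and flattening the agents key, look roughly like this:

    import json
    import pandas as pd

    response_json = json.loads(response.text)
    print(response_json.keys())  # status, location, page count, agents, etc.

    # Step 3: flatten the list under the "agents" key into one row per agent
    df_agents = pd.json_normalize(response_json["agents"])
    print(df_agents.shape)  # about 10 rows (one page) and ~40 columns
    df_agents.head()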
But how useful is it if we're just getting these 10 agents? We want to get all the agents, so that we can maybe use this as a marketing list if we're going to sell a product, or a deal that we have, to these agents. So the next step is section two: get all pages for a single city. Now, I've made this extremely easy for you; I've wrapped all the code to do this within a single function called get_agents. Let's go back up to the top of the notebook where we have our functions. Here we have two separate functions. get_single_agent_request is what we did in that prior section: we pass a single location and our RapidAPI key, and we get the data back. So let's run this here. Now, get_agents is a little different. It has a new parameter called all_pages. If all_pages is set to true, we go get every single page available to us, in this case all 25 pages. The code below has an if/else statement. If all_pages is set to true, then we need to go get each page. If there's just one page, we can just return the response. If there are two pages, we only need to go get the second page. But in our case, where there are 25 pages, we need to loop through each page, going back 24 more times to get that data with the function above. The thing to know here is that we need to force our program to sleep. We could technically get all the pages within a couple of seconds, if even that, but the API has limitations. If we go back to the API and look at pricing, we can see there's a rate limit of two requests per second, so that the servers aren't overloaded. So in our case, I've set the script to pause for half a second before going to get the next page. Run this here.
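Here's a minimal sketch of what those two helpers might look like. The endpoint path and the "pages" key name are assumptions based on the response we saw earlier:

    import time
    import requests
    import pandas as pd

    def get_single_agent_request(location, rapid_api_key, page=1):
        """One call to the Find Agent endpoint (host/path are assumptions)."""
        url = "https://zillow-com1.p.rapidapi.com/findAgent"
        headers = {"X-RapidAPI-Key": rapid_api_key,
                   "X-RapidAPI-Host": "zillow-com1.p.rapidapi.com"}
        params = {"location": location, "page": str(page)}
        return requests.get(url, headers=headers, params=params).json()

    def get_agents(location, rapid_api_key, all_pages=False):
        """Fetch page 1, and optionally walk every remaining page."""
        first = get_single_agent_request(location, rapid_api_key, page=1)
        frames = [pd.json_normalize(first["agents"])]
        if all_pages:
            # "pages" as the key for the total page count is an assumption
            total_pages = int(first.get("pages", 1))
            for page in range(2, total_pages + 1):
                time.sleep(0.5)  # stay under the ~2 requests/second rate limit
                resp = get_single_agent_request(location, rapid_api_key, page=page)
                frames.append(pd.json_normalize(resp["agents"]))
        return frames  # a list with one DataFrame per page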
And the last function joins all these pages together, because we have 25 separate tables, and we definitely don't want that; we want this condensed into one view. Jumping back to section two, we can now use our function get_agents and set all_pages to true. Let's run this cell, and we're going to see our script start to get every single page. We notice there's a little bit of a pause, and that's because we set our script to sleep every half second. Now, be mindful that every time you get a page, you are making a request to the API. In my case, I had to go for a higher plan, or at least pay several cents to get more data, because in total there are 25 pages, but on the basic plan you can only get 20 requests a month. In total, it took 54 seconds to get all the data, which is not bad at all.

Now in step two, I'm actually going to cut down the number of columns. We have 40 columns, but not all of them are useful, and some are duplicates. So I have here the set of columns that I care about. I also want to transform our list: we have a list of 25 DataFrames, which isn't that useful, so this function condenses it into one single table. If we run the cell, we can see we have one table with 250 rows, so 250 agents, and 13 columns. Now we can see the fields we care about a little more closely: the agent name, the latest review and when it was, how many reviews, rating, phone number, and agent license.

At this stage, we're thinking: yes, we've been able to get all the information from the page for a single city. But what if we want to get agents for multiple cities? What if you're looking to market not just across Tampa, Florida, but, in my case, cities like St. Petersburg, Florida; Lakeland, Florida; and Spring Hill, Florida? Then we're going to need to adjust our script just a little. So in section three, I'm getting data for multiple cities. The first step is that I set all the cities I want in a list. I also have an empty list that I'll be adding DataFrames to. And this is pretty simple: I loop through each city I have, so we're running the script five different times. I get the agents, meaning the data for all pages for the city. Next, I transform it with the same steps I did above, and I add another column for the location, so I can keep track of which city this was for. Then I bring this all into a single list and combine all the DataFrames.
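A sketch of that multi-city loop, reusing the get_agents helper from above (the notebook loops over five cities; the fifth isn't named in the video):

    import pandas as pd

    cities = ["Tampa, FL", "Saint Petersburg, FL",
              "Lakeland, FL", "Spring Hill, FL"]

    frames = []
    for city in cities:
        # One DataFrame per page for this city, every page
        pages = get_agents(city, rapid_api_key, all_pages=True)
        df_city = pd.concat(pages, ignore_index=True)
        df_city["location"] = city  # remember which search returned each agent
        frames.append(df_city)

    # Combine every city into one table (this is also where the notebook
    # cuts the 40 columns down to the 13 it cares about)
    df_all = pd.concat(frames, ignore_index=True)
    print(df_all.shape)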
The reason why I'm not running this live is that it would eat up a lot of my API requests, so I pre-ran it. The result is that I have 250 agents returned for each of these locations. Now, some of these locations are closer to each other than others, so there may be overlapping agents. That's why we need to clean up our data, as well as add some features, so we can view distributions and visualizations of what our data actually looks like.

The next cell is a little overwhelming, a lot of code, but essentially what I'm doing is creating new columns. I'm just going to walk through the first one, because it's pretty simple. Here we have the review excerpt / review link text column. If we look at that column, it's this one right here, and we see "Review, June 13 2022". This is basically the last time the agent was reviewed on Zillow. That's useful, but it's in a text format, and I want the actual date: if I want to market to agents that are very active, I can look at agents that were reviewed in, say, the last two years. That's why I have this function here, which goes to my new DataFrame, which I copied and called df_feat, for features. If there's no value, so if we don't have any reviews, it just returns nothing; but if there is something, I take the last 10 characters, which should be this information here. If we look at this new column, last review date, we can see that yes, we do have the date, and I've transformed it into a datetime object so I can also get things like the last year the agent was reviewed. The additional columns I created were the number of reviews as an integer, the number of local reviews, whether the agent was ever reviewed, the number of listings, phone number, agent license, and first and last name. First and last name is going to be a bit tricky, because sometimes it's not an individual person but rather an alias.

Next, we can go down and visualize what our dataset even looks like. Here, I'm looking to see how many agents were even reviewed, and a good portion were: 93% of the agents we found on those pages have some reviews. Those that do have reviews usually have between one and 19 of them. Then, on the tail end, we do have some superstar agents that have been reviewed many times. For our use case, I'm going to look through a wider lens, but you may only want to look at, say, the top 10% of reviewed agents, and that's something you could do now with this dataset. We can also see the number of listings, and most agents have only one active listing currently on Zillow, which is pretty normal.

Now we're at the step where we have our data, we have our features, and we understand the dataset. We want to download this file so that we can market to these agents. But how do we actually reduce the number of agents in our file? Well, I only care about agents that were reviewed, so I'm selecting that to be true. Next, I only care about agents that were recently reviewed, because maybe an agent was reviewed back in 2019 and hasn't really been active since; if that's the case, I may be wasting my marketing dollars on them. So I'm going to filter to only those agents that have had reviews in the last two years. Next, I'm dropping the location column, and that's because we may have agents that are in both Tampa, Florida and St. Petersburg, Florida, since they're so close; once location is dropped, I can also drop those duplicate rows.
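Condensed, those cleanup steps might look like the sketch below. The column names reviewLinkText and numTotalReviews are hypothetical stand-ins for the real response fields, and the 10-character slice assumes the date sits at the end of the review text in a fixed-width format:

    import pandas as pd
    from google.colab import files

    df_feat = df_all.copy()

    # The review-link text ends in the review date; take the trailing
    # characters and parse them. errors="coerce" leaves NaT where parsing fails.
    df_feat["last_review_date"] = pd.to_datetime(
        df_feat["reviewLinkText"].str[-10:], errors="coerce")

    df_feat["num_reviews"] = (
        pd.to_numeric(df_feat["numTotalReviews"], errors="coerce")
          .fillna(0).astype(int))
    df_feat["is_reviewed"] = df_feat["num_reviews"] > 0

    # Keep only agents reviewed within roughly the last two years
    cutoff = pd.Timestamp.today() - pd.DateOffset(years=2)
    df_recent = df_feat[df_feat["is_reviewed"]
                        & (df_feat["last_review_date"] >= cutoff)]

    # Nearby cities return overlapping agents: drop location, then dedupe
    df_final = df_recent.drop(columns=["location"]).drop_duplicates()

    # Write to CSV and pull the file down from Colab
    df_final.to_csv("zillow_agents.csv", index=False)
    files.download("zillow_agents.csv")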
And here we can view our table: we've actually gone from 1,250 rows down to about 600, so we've reduced our dataset by over 50%. Now we can really be confident that we're honing our marketing dollars in on each agent only once, and on agents who are more active. The last step is actually downloading this data, which you can do directly in Google Colab. You can upload it into your own CRM, or even automate it further, say by using a tool like Zapier. I hope this has been super useful. I love being able to provide data analytics tips, especially with Python. If you find this type of tutorial useful, or you have some ideas for how it can be improved upon, then please leave comments below, as well as subscribe if you haven't already. Thanks so much.