How to Web Scrape Real Estate Websites in 2022

Need to access real estate data from the internet? In this video, I'll cover how to leverage APIs, bulk data providers, and Fiverr solutions to scrape data. This can help you find leads for your real estate business.

Ariel Herrera 00:00

Hey everyone, are you looking to get property data at your fingertips so that you can analyze whether a property is a good deal to purchase or not? Well, the trouble is, this data isn't readily available if you're not a real estate agent. However, there is a loophole: web scraping. Whether you're a programmer or not, I'm going to show you the tools that you need in order to do web scraping in 2022, so that you can get properties at your fingertips and be one of the first in your market to extract this data. My name is Ariel Herrera with the Analytics Ariel channel, where we bridge the gap between real estate and technology. I'm a data scientist who loves to bring data and analytics forward in a simple fashion that anyone can understand. So if that's the kind of content that you enjoy, then please subscribe, and like this video so I know to make more of it. Alright, let's get started.

Ariel Herrera 01:05

Best Proxy Reviews came out with an awesome article recently by John Grimes that goes over the current tools available in real estate for extracting data. I'm going to walk a little bit through this article, but also give you some tips beyond it as to how you can really utilize web scraping, other tools, and APIs. So let's check this out. In this article, John first goes over why we even do web scraping. Well, all the data for sales and purchases is on the MLS. MLS stands for Multiple Listing Service, and it's accessed through agents. In order to get your agent license, you have to go through a series of trainings and pass an exam. So for everyone who just wants to get their hands on property data, whether you are an investor or a researcher, it's not ideal to go through that process of becoming an agent. However, over the last couple of decades, we've gained access to the same data that's in the MLS through online sites like Zillow, realtor.com, Trulia, as well as Redfin. Now, the issue here is that these sites don't allow us to download properties so that we can review them at scale. The reason is that they're really meant for homebuyers, and homebuyers aren't looking to do crazy manipulation with calculations and math. They're looking to see if the home is pretty (usually for first-time homebuyers), if it's somewhere that has good schools if they're planning to have a family or already have one, and if it's going to be relatively close to attractions. So it's a very different use case, which is why these tools, as I just mentioned, don't offer the ability to download things into Excel. Now, this is where web scraping comes into play. If you're totally brand new to web scraping, no worries. Simply put, web scraping is just getting data that's on an internet page and putting it into a structured form.
What is that structured form? It could be a CSV file that you open up in Google Sheets or Excel, it could be something called XML format, JSON format, and a lot more. Let's just go through a quick walkthrough. So say on this actual site that I'm on, I want to copy the title, maybe because for some reason I want to scrape all titles that have ever been on this website. I can highlight this and go to Inspect, and this part on the right-hand side is all code called HTML. And if I look at our header, we can see "Realtor Scraper 2022", the same exact title. So these elements are what web scrapers grab: they go and take these elements from the webpage, and that eliminates the whole process of manually copying things down. Can you imagine if you had to manually copy and paste property attributes, like bedrooms and bathrooms, for everything in your market? It would just be too crazy. So let's see how we can do web scraping today. Now, before we go into the tools themselves, web scraping does have some negative effects. You may hear websites say they don't allow bots or web scrapers, and that's because bots can spam their traffic. Say it's a small website, maybe a county website that has information on pre-foreclosures. Well, if you have a bot that's trying to check when the next pre-foreclosure comes up, and it's pinging the server every second, that could potentially crash everything, so the actual people who would use the site can't use it. This is why there's a lot of regulation around the space, and why web scraping isn't as easy as it used to be. For every web scraping technique that's out there, there's also a combative way to take it down as well. So this is why there are certain tools that you can use for web scraping. Of those tools, you could use Python, writing code from scratch. You could also use readily available tools like Apify. Apify creates web scraping bots that allow you to pull elements from a web page.
I've used it before; it's super useful. Then there are other web scrapers like Octoparse, which is $75 a month, ScrapeStorm, WebHarvy, and ParseHub.
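To make the inspect-element walkthrough above concrete, here is a minimal Python sketch of what a scraper does with those HTML elements, using the BeautifulSoup library. The HTML string, tag names, and CSS classes below are made up for illustration; on a real site you would fetch the page with `requests` first and use whatever selectors Inspect shows you.

```python
from bs4 import BeautifulSoup

# Stand-in for the HTML you'd see under "Inspect" in the browser.
# On a real site you'd fetch it first, e.g.:
#   import requests, time
#   html = requests.get("https://example.com", timeout=10).text
#   time.sleep(1)  # be polite: don't hammer small servers
html = """
<html>
  <body>
    <h1 class="entry-title">Realtor Scraper 2022</h1>
    <div class="listing"><span class="beds">3</span> bd</div>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# Grab the same elements you'd find via Inspect: the page header
# and a property attribute, without any manual copy-pasting.
title = soup.select_one("h1.entry-title").get_text(strip=True)
beds = soup.select_one("span.beds").get_text(strip=True)

print(title)  # Realtor Scraper 2022
print(beds)   # 3
```

Scaled across every listing page in a market, this is all a scraper is: fetch, select elements, save them in structured form.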

Ariel Herrera 05:42

Now you might be thinking, okay, great, which is the best one? Which one do I use to get properties in my market? Well, hold up, there are professionals who already do this. There are people, usually out of CS, who have the latest up-to-date knowledge of how to beat a lot of the blockers and walls on these web pages to extract this data, and then consume this data across an entire website like zillow.com. And they provide that in a structured way, via an API. That way, we can get this data by just passing in some parameters, like "I want all for-sale properties in Tampa," rather than having to develop the whole web scraping application ourselves. And this is important because, if we were writing this in Python, we could end up spending maybe 10, 20, even 50 hours developing a script, and then the website might change and our script is done. So we should really try to rely on those who are working on this stuff every single day. This is why I highly suggest moving away from tools like Octoparse and, if you're looking for property data, looking for APIs that are already available. So this one in particular is built by a team called apimaker. What they do is scrape real estate data from zillow.com in real time. They make this data publicly available at a pretty low cost: you can use their data for free for 20 pulls a month, which I'll explain what that is, or you can go up in their tiers, which are pretty affordable. So let's just explore what these endpoints are. Think of endpoints as folders, folders with data, and here you get to select which folder you want to get data from. So in this case, the zillow.com API provides information on for-sale data, so everything for sale within a given market, as well as property details. This will get you information on property estimates like the Zestimate or rent estimate.
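Calling one of these RapidAPI-hosted scraping APIs looks roughly like the sketch below: you authenticate with headers and pass your search as query parameters instead of scraping anything yourself. The host name, endpoint path, and parameter names here are illustrative placeholders, not the real API's; check the provider's page on rapidapi.com for the actual ones.

```python
import requests

RAPIDAPI_KEY = "YOUR_RAPIDAPI_KEY"  # from your RapidAPI account dashboard

def build_search_request(location: str, status: str = "ForSale"):
    """Assemble URL, auth headers, and query params for a for-sale search.

    All names below are hypothetical stand-ins for whatever the
    real API documents on its RapidAPI page.
    """
    url = "https://example-zillow-api.p.rapidapi.com/propertyExtendedSearch"
    headers = {
        "X-RapidAPI-Key": RAPIDAPI_KEY,
        "X-RapidAPI-Host": "example-zillow-api.p.rapidapi.com",
    }
    params = {"location": location, "status_type": status}
    return url, headers, params

url, headers, params = build_search_request("Tampa, FL")
# One call like this counts as one "pull" against your monthly quota:
# resp = requests.get(url, headers=headers, params=params, timeout=10)
# listings = resp.json()  # structured JSON instead of raw HTML
```

The point of the design is that the hard part, beating the site's anti-bot walls, lives behind the endpoint; your side is just an authenticated HTTP request.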
So say you're a brand-new investor and you're thinking, I want to automatically get properties and their details and calculate the cash flow. You could do that with this specific API. There are several others as of today on RapidAPI that scrape realtor.com and some other sites, so definitely explore those in more detail. But do be aware that there can sometimes be conflicts between these API creators and their originating sites, so they aren't always up and available. Now, what if at this stage you're thinking, okay, great, but Ariel, I'm not really looking for just for-sale or sold properties, I'm looking to find properties with motivated sellers? These are people who are maybe behind on their payments and are more likely to want to sell to me. In this case, a particular niche to look at is pre-foreclosures, and pre-foreclosures are freely available on county websites. This one in particular is for my county, which is the Tampa area, Hillsborough County. And what we can see here is a calendar that shows us the foreclosures within the area. So if we look at the date of recording, which is the 23rd, we can see that we have information on foreclosures, and we have the address as well. So technically, we can go and try to scrape this data and target our list of potential sellers based on this information. Now, this is something that I used to do in the past. I built a web scraping tool to do this, and I thought it was great because it was particular to my market. However, as I wanted to expand into other markets and see pre-foreclosures not just in Tampa but in other surrounding areas, I realized I would have to keep building these web scraping tools on my own. This is when I started to leverage Fiverr. So in my data collection process video that I created several months back, which I'd encourage you to check out, I actually detail when you should try to create these web scraping tools on your own.
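A county pre-foreclosure calendar like the one described is typically just an HTML table, so scraping it reduces to walking the rows and writing them out in structured form. Below is a minimal sketch; the markup, table id, and column names are invented to mirror the calendar, and a real county site will differ (you'd fetch its page with `requests` first).

```python
import csv
from bs4 import BeautifulSoup

# Stand-in markup mimicking a county foreclosure calendar page.
html = """
<table id="foreclosure-calendar">
  <tr><th>Date of Recording</th><th>Address</th></tr>
  <tr><td>2022-06-23</td><td>123 Palm Ave, Tampa, FL</td></tr>
  <tr><td>2022-06-23</td><td>456 Bay St, Tampa, FL</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", id="foreclosure-calendar")

columns = [th.get_text(strip=True) for th in table.find_all("th")]
rows = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in table.find_all("tr")[1:]  # skip the header row
]

# Write the structured result to a CSV you can open in Excel or Sheets.
with open("pre_foreclosures.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(columns)
    writer.writerows(rows)
```

Run on a schedule, a script like this becomes the motivated-seller feed for one county; the catch, as noted above, is that every county's page is different, which is exactly why aggregators exist.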
So whether it's for your own county or you want to just hire it out, you want to work smarter, not harder. And web scraping isn't that technical of a skill;

Ariel Herrera 10:13

there are a lot of people overseas who are able to do this at a very low cost. All you need to do is create a requirements document, which is basically some screenshots where you point arrows at the data you want pulled. And you can typically get someone to pull this information for you for less than $30, and then you can run that script over and over again. Now at this stage, you may be thinking, awesome, I'm gonna go to Fiverr and create my requirements doc. Or maybe you're thinking, hey, I don't know exactly which real estate market I want to focus in on, or I'm looking at many different markets; hiring out many different web scrapers is going to be time-consuming, and I need to get this data fast. Well, in this case, a platform like PropStream is very, very useful. So going back to that example when I was on my county's pre-foreclosure site and I wanted to get this data, I had built my own web scraping bot. However, sites like PropStream actually aggregate the data across counties to be able to extract pre-foreclosures. As background, we don't have one US API or data source for collecting all pre-foreclosures; it is county-based. This is why these aggregators are so important, because they allow us to look at everything at scale. So say I select this property, which is 14810 Daisy Lane in Tampa. I can copy it, bring it over to PropStream, paste that address, and select it. And we can see that this property comes up on the right-hand side. Now if we look at the details, we can see that this property is already labeled as distressed, meaning that the sellers are more motivated for various reasons, and in this case, they are motivated because this house is in pre-foreclosure. So this data is already available; I didn't even need to create a web scraping bot. And with a tool like PropStream, you can look at pre-foreclosures at scale.
So say I want to see all the pre-foreclosures in this area. I could just search by the zip code here, or I could search by the city itself, and then I could look at pre-foreclosures, and I'm gonna get a list of 1012 pre-foreclosures. What you can do is take it a step further by saving your list. This is essentially a marketing list that you want to continue to observe, and every time a new property hits this list, you're going to get an email. You've basically bypassed everything a web scraping tool would do, because now you're automatically getting that information. And you can do a lot with it: you can quickly analyze your deal on a spreadsheet, you can feed it into your CRM system, or you can reach out directly to the seller. So as a recap, the reason why we do web scraping is because real estate data is not publicly available outside of sites like realtor.com, Zillow, and Redfin. And in order to get this information, web scraping does take a lot of effort, because there are a lot of blockers in the way. So we have APIs, developed by web scrapers who are very technical and do this probably every day, that can pull this data. However, they're not getting information like pre-foreclosures, divorcees, information like that. Therefore, it's useful to use tools like PropStream to extract that. PropStream does start at $100 a month; however, you can use its seven-day free trial to get started. I would highly suggest using the seven-day trial for your market: compare it against your county's foreclosures and see whether PropStream covers everything that you expect. If you're in a smaller market, it may not, and in that case, definitely go to Fiverr and request a web scraper to develop a custom tool for you to extract this data. That way, you can stay ahead in your market in extracting the most motivated sellers within your area. I hope this has really been useful.
And if you have any tips, suggestions, or questions on web scraping and what approach to take, please leave comments below as well. If you haven't already, please subscribe. Thanks!
