How to Get Real Estate Listing Data for Puerto Rico | Web Scraping
Are you looking to find data on real estate listings for Puerto Rico, whether it's for properties that are being sold, or up for rent, short term and long term vacation rentals? Well in this video, I'm going to show you how we can extract that data using web scraping.
Ariel Herrera 0:00
Are you looking to find data on real estate listings for Puerto Rico, whether it's for properties that are being sold or up for rent, short-term and long-term vacation rentals? Well, in this video, I'm going to show you how we can extract that data using web scraping. This is going to be a no-code solution using Browse AI. My name is Ariel Herrera with the Analytics Ariel channel, where we bridge the gap between real estate and technology. I love data-driven solutions and using automation so that you can focus on analysis. And if that's the type of content that you enjoy, then please subscribe, and like this video so I know to make more web scraping videos like it. Alright, let's get started.
So in our last video, we covered why you would want to invest or live in Puerto Rico, as well as how to find rental properties on the Clasificados website. Now in this video, we're going to take that a step further. Instead of looking at properties on a one-off basis, how about we get a list of all of the properties that are available? That way, as investors, we can better understand the market and see what the minimum, maximum, median, and average prices are. How is that changing year over year? What is cash flow going to look like if I were to invest in Puerto Rico? All these questions start to get answered once we have data.
Now, web scraping can be challenging, and you typically need programming skills. However, with tools like Browse AI that use artificial intelligence to scrape data from the web, we don't need to use code. And this is great because it saves us time on the front end. Then we can pull this data and actually start analyzing it, whether it's in our CRM system, within a dashboard, or even in a Jupyter Notebook. So the first step here is you're going to want to click the link below. Even though Browse AI is free, you may move to a subscription later on. And if you use the link below, it will be 10% off.
So once you use the link, you'll be able to come in here and log in. If you don't already have an account, you'll be able to quickly set one up. And as a quick recap, if you're new to Browse AI, it is a no-code solution with one-click automation. So as an overview, say we wanted to scrape Twitter. We'd be able to mimic the exact same steps that we take manually. And if you see the bot right here, basically Browse AI is analyzing the steps that we're taking, and it's going to replicate them automatically. So here we're logging into Twitter, and we're looking to capture certain fields. And ultimately, we could save this data and have it scheduled on a routine basis so that we're getting the latest information.
Now, going back to your personal dashboard: if you already have one, you'll see tasks. Either way, you want to create a brand new task, because we want to set this up to take real estate listings from Puerto Rico's Clasificados. So we're going to select "Create a new task." There are two options here: extract structured data and monitor site changes. In our case, we're going to focus on extracting structured data. This will allow us to download the listings into a spreadsheet. If you're new to Browse AI, I highly suggest that you quickly look at the demo at this stage. But if you're comfortable moving on, then we can go straight into entering the URL that we want to scrape.
So let's go back to Clasificados. On the left-hand side we see for-sale properties. Let's click that. This brings us to the search page. In this case, I'm going to focus on San Juan. San Juan is one of the largest cities in Puerto Rico, if not the largest, and has a large tourism industry centered around it. So under area or towns, I'm going to scroll down to where I see San Juan. And once I get to San Juan, I'm going to select all of the towns. In this case,
let's imagine that I'm only interested in an apartment, and the maximum price can't be any more than half a million. Let's look at our listings here. Here we see different condos that we could possibly purchase. Now we could do one of two things. We could start our web scraper here, on the results page. Or we could start our web scraper back at the search page so that in the future we can put in different towns. That depends on you. In this case, I'm going to go with starting at the search page so I have more flexibility. So I'm going to copy the URL up top. And then within Browse AI, I'm going to enter our origin URL.
Now let's start recording our task. We're going to see a dialog box pop up. Browse AI is now going to start recording our actions within this window. If any issues arise, you can reach out directly to support at Browse AI. So click "OK, understood." Now let's mimic the exact same actions that I just took. Let's go to area and select, in this case, one of the San Juan towns. I'm going to select apartments. And I'm going to view the listings.
Next, I want to get the information that's on this page. Because there are multiple properties here, when I select Browse AI on the right-hand side, I need to select "Capture List." So let's click that now. When we hover over the page, we can see that there are now boxes around the properties. This is the information that we want to pull. So let's click Enter. And now we can start capturing the text that we want. In this case, we want to get the street name or the name of the listing. So let's click this. We also want the number of bedrooms and bathrooms. So let's click here. As well, we want to select that we want the visible text. So let's click that. Next, the price, then the type of building, apartment, as well as the town. The last one that we want here is the initial image of the property. It's going to ask whether we want the image URL. In this case, yes, so let's select that here.
And now we can click Enter, since we have all the variables we want to obtain. Once we click Enter, Browse AI is going to ask us to label each of these fields. So I'm going to call the first one Street Address. Then next, Bedrooms and Bathrooms, Listing Price, Property Type, Town, and lastly, Property Image. Now once you hit Enter, you're going to see that all of the properties are within a list. We can name this list. We could also scroll down to load more items, or click Next to navigate to the next page, so we can actually get all the listings if we'd like. But just for simplicity in this video, I'm going to stick with the first page, and I'm going to name my list San Juan Listings.
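To make those six labels a bit more concrete, here is a minimal Python sketch of how one captured row could be represented once you pull it out of Browse AI. This is only an illustration under assumptions: the `Listing` class and the `parse_price` helper are hypothetical names, not something Browse AI provides, and it assumes the captured price text looks something like "$450,000".

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Listing:
    """One captured row, using the field labels chosen in the video."""
    street_address: str
    bedrooms_and_bathrooms: str  # kept as raw text; its exact format is unknown
    listing_price: Optional[float]
    property_type: str
    town: str
    property_image: str  # image URL


def parse_price(text: str) -> Optional[float]:
    """Turn price text such as "$450,000" into 450000.0.

    Returns None when the text holds no usable number. The "$450,000"
    format is an assumption about how the site displays prices.
    """
    digits = re.sub(r"[^\d.]", "", text)
    try:
        return float(digits) if digits else None
    except ValueError:
        return None


# Made-up example values, not a real listing.
example = Listing(
    street_address="Example Condo",
    bedrooms_and_bathrooms="2 beds, 1 bath",
    listing_price=parse_price("$450,000"),
    property_type="Apartment",
    town="San Juan",
    property_image="https://example.com/photo.jpg",
)
```

Keeping the price as a number rather than text is what makes the later market questions (minimum, maximum, median) easy to answer.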
Then I'll select "Capture List" on the bottom right. Browse AI congratulates us: well done, we just showed the robot how to capture the list. If we want to capture more data, we can continue to extract from this page. Otherwise, we can finish recording. Click "OK, understood." In this case, we're done. So we're going to click Browse AI, and we're going to select "Finish recording."
Our next steps here are very quick, just two steps. First, name the task. In this case, I'll just keep the original name that was given, and click Save. And then lastly, Browse AI is going to run the task that we just created. So it's navigating to the website, it's going to pull the data, and afterwards we'll be able to review the results.
Great. So once Browse AI has finished running the recorded task, we can see all of our elements within a table, which we can also download as a CSV file. Here we can see the Street Address, Bedrooms and Bathrooms, Listing Price, Property Type, Town, and Property Image. This is great because it's a start for us to have listing data across Puerto Rico, which currently isn't really available on platforms like Zillow, Redfin, and Realtor.com. Really, you have to go to Clasificados. And now we can consume the data in a more structured way.
Now when you go back to Browse AI, scroll to the bottom. In my case, my video recording was unsuccessful, likely because I had several steps going on, and the URL was a little bit tricky to pull the data from. But ultimately, this worked. So what we can do next is say "Yes, looks good." And now our task is saved. So in the future, we'd be able to run our task, and we could change the limit and make it larger so we can accumulate more listings. We could also take it a step further by using the integration features.
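Once you have the CSV, the market questions from the start of the video (minimum, maximum, median, and average price) take only a few lines of Python. The sketch below is a rough example under assumptions: the column headers are assumed to match the labels chosen during recording, the prices are assumed to be text like "$250,000", and the sample rows are made up so the example runs on its own. The function names are hypothetical.

```python
import csv
import io
import statistics

# Made-up sample rows standing in for the downloaded CSV; not real listings.
SAMPLE_CSV = """Street Address,Bedrooms and Bathrooms,Listing Price,Property Type,Town,Property Image
Example Condo A,2 beds 1 bath,"$250,000",Apartment,San Juan,https://example.com/a.jpg
Example Condo B,3 beds 2 baths,"$400,000",Apartment,San Juan,https://example.com/b.jpg
Example Condo C,1 bed 1 bath,"$175,000",Apartment,San Juan,https://example.com/c.jpg
"""


def price_to_float(text):
    """Convert text like "$250,000" to 250000.0, or None if it is not a number."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        return None


def summarize_prices(csv_file, price_column="Listing Price"):
    """Read listing rows from an open CSV file and summarize the price column."""
    prices = []
    for row in csv.DictReader(csv_file):
        price = price_to_float(row.get(price_column) or "")
        if price is not None:
            prices.append(price)
    if not prices:
        return {}
    return {
        "count": len(prices),
        "min": min(prices),
        "max": max(prices),
        "median": statistics.median(prices),
        "mean": statistics.mean(prices),
    }


summary = summarize_prices(io.StringIO(SAMPLE_CSV))
print(summary)
```

To run this against a real export, you would open the downloaded file (for example with `open("your_export.csv", newline="")`, where the filename is a placeholder) and pass it to `summarize_prices` instead of the sample text.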
Browse AI is able to integrate with Google Sheets, with Zapier, which allows you to integrate with so many more applications, as well as with REST APIs. I hope this has been useful as an introduction to how you could start accumulating property listings for Puerto Rico by using web scraping on Clasificados. If you'd like to see future videos analyzing return on investment and cash flow for properties in Puerto Rico, then please let me know in the comments below. If there's enough interest, then I'll look to create them. Thanks so much for watching. If you haven't already, please subscribe.