Python Join Data | Real Estate Analytics
In this video, we will cover joins in Python. This allows us to combine datasets from separate tables, you will learn how to union and merge datasets.
Ariel Herrera 0:00
In this video, we will cover joins in Python. This allows us to combine datasets from separate tables, you will learn how to union and merge datasets. Let's jump in to Lesson Eight notebook from our table of contents. What are Python joins? I have two different tables in Python, but I'm not sure how to join them. What criteria should I consider? What are the different ways I can join these tables? Well, using Pandas, we'll be able to merge different data frames. So let's start by running our first cell, which is importing our pandas library. Next, we're going to have three different data frames that we import. Let's run the cells. And then we'll view the contents. So the first two are coming from our property estimate sample in GitHub. The first is for single family homes. And second is for condos. Let's add some context to view our data. For each of these data frames, let's view number of rows columns and the first five rows. So in our first data frame, here, S F, we have single family homes. And we looked at this also in the last section. These are single family properties in California varying in different cities, we have information like bedrooms, bathrooms, property, estimate, Min estimate, and some other location data. And we also have a mirror of the same dataset. But instead its property type is condos, we have nine rows and total and 13 columns, whereas in the previous we had 98. So most of our data is in the single family space. Then for our last here, we have tax information. So it also relates to those addresses in our other data frames. But instead of having information like bedrooms, bathrooms and property estimates, we have lot size annual tax and the last sold date. So the first way that we want to join data is by unionizing two datasets. So in the case where we have two data frames with the same columns, and we just want to stack them on top of one another, we can use pandas dot concat, and put our data frames that we want to stack into a list, we can then run this and view the head or look at the length of our data frame. And we could see that we now have 107 rows, this should represent the combination of both of our tables, which if we add 98, and nine, it is 107, we can view our contents of a group by just to make sure that we did this correctly. So let's group by property type. And let's count the number of street addresses.
Here we could see in our data frame that we do now have on one single table are condos and single family properties. So that was your first merge of two different tables. And it was so simple to do with just one line of code. Now we're going to take it a step further by merging datasets that have different amount of columns. So in this case, we want to take information from our properties. And we want to be able to merge this with lot size, annual tax and last sold. So essentially, we want to add three more columns to the end of this so that we could look up maybe all two bedroom homes and see what the annual tax is right now these are separate, so we can't do it. And the way we're going to merge this data is via a left join, which is how you mostly will be working with in this course. Another popular way of joining is an inner join. So let's just dive quickly into what a left join is, we have two different tables. In our case, our left table, for example, would be properties. So all of our listed properties, including the details of bedrooms, bathrooms, and estimates, and our right table is going to be information of properties. But this time with tax information and last sold. When you do a left join, we're saying preserve everything on the left hand side. And if it's included in the right hand side, then bring it in the right table. But if it's not in the left table excluded, imagine our information for the taxes table was for all states, so street addresses nationwide. That would be a problem. If you were to just join on an outer join, which would include everything. We only want to include information from the second table as it relates to the first. This will be more clear as we walk through an example right now. So whenever I do a join, I like to first just view the first row of each of the data frames to remember what columns exist. So I will do DF prop dot head and look at the first row and copy this code over and changes to DF tax. We could see here that there's some similar columns between each of these data frames, we can get these columns that are similar in a more programmatic way. So if we look at DF prop dot columns, we can see all of our columns. Now in this index, let's transform it into a list. Let's give this list a name. Let's call it list one. Let's do the same for DF tax, except we'll call it list two. Now we have two different lists, let's print them each. Each of these lists represent the columns between each of these data frames, what we want to see is the intersection, or what columns are the same between each of these data frames. That way we could join the data between them at this stage. If you're familiar with Excel, this is the same thing as a V lookup, I just copied over this code that is going to allow us to see the intersection between list one and list two. Once we get the intersection, we're going to label this as distinct calls. And we could see here at the State calls is state street address, and city, these are the columns, we can actually join for our dataset because they match and between the data frames go into our panas documentation, we could see what parameters are required when we do a merge, this is going to merge our datasets. And we could say whether we're merging on left or the right table, there's always going to be two tables here, we could look at examples further down. But let's just jump right into our specific example. We're going to use PD dot merge. And this is going to have different parameters. So the first is the data frame, it's going to be on the left side, this will be DF prop, then the right side data frame is going to be DF tax. And we want to join as a left join that way, if there's other data for tax for different states, we're not going to be including that in our data frame. Now how do we want to join this data, we want to join on the columns that they have that are similar, which will be street address, city and state in our data frame. This is our unique key, meaning that we're not going to have the same address multiple times in our data set. So let's copy the stinked columns, paste it. And let's set this to a new variable called DF merge.
Here we can view the contents of our data frame, we now have three additional columns. This is lot size, annual tax and last sold, we were able to bridge these two data frames together by joining them on similar columns that they each had. This was state street address and City. In this lesson we covered union joins and doing a left join for merge, we will touch a lot more on joining different tables. And if it's a little bit fuzzy right now, don't worry, it will be clear as we go along. It's just good for you to understand as of right now that we can merge different datasets together with only a few lines of code and Python. If you'd like to see more detail of pandas versus Excel, then definitely check out the video below. Now we can merge different data sets together with only a single line of code. Congratulations because you have completed the beginners section for Python. Next, we will be covering web scraping. See you there want full access to the introduction to real estate Data Analytics course. Then sign up for the course in the link below. You will learn Python programming all with real estate related examples. This includes web scraping, retrieving data from sources like Zillow, realtor, Redfin, Yahoo Finance, US census and more. If you haven't already, check out the introduction video to the course on YouTube to get a full understanding of what the course has to offer. Also members of the free tech and real estate group on Facebook receive a 20% off the course. See in the next lesson
Transcribed by https://otter.ai