2020年11月22日 星期日

[Python]Datafram merge (inner join, outer join, left, right)

Pandas Merge Tutorial

Basic Merge Using a Dataframe Column

In [28]:
  1. import pandas as pd
  2. df1 = pd.DataFrame({
  3. "city": ["new york","chicago","orlando"],
  4. "temperature": [21,14,35],
  5. })
  6. df1
Out[28]:
city temperature
0 new york 21
1 chicago 14
2 orlando 35
In [29]:
  1. df2 = pd.DataFrame({
  2. "city": ["chicago","new york","orlando"],
  3. "humidity": [65,68,75],
  4. })
  5. df2
Out[29]:
city humidity
0 chicago 65
1 new york 68
2 orlando 75
In [30]:
  1. df3 = pd.merge(df1, df2, on="city")
  2. df3
Out[30]:
city temperature humidity
0 new york 21 68
1 chicago 14 65
2 orlando 35 75

Type Of DataBase Joins

<img src="db_joins.jpg" height="800", width="800">

In [31]:
  1. df1 = pd.DataFrame({
  2. "city": ["new york","chicago","orlando", "baltimore"],
  3. "temperature": [21,14,35, 38],
  4. })
  5. df1
Out[31]:
city temperature
0 new york 21
1 chicago 14
2 orlando 35
3 baltimore 38
In [32]:
  1. df2 = pd.DataFrame({
  2. "city": ["chicago","new york","san diego"],
  3. "humidity": [65,68,71],
  4. })
  5. df2
Out[32]:
city humidity
0 chicago 65
1 new york 68
2 san diego 71
In [33]:
  1. df3=pd.merge(df1,df2,on="city",how="inner")
  2. df3
Out[33]:
city temperature humidity
0 new york 21 68
1 chicago 14 65
In [34]:
  1. df3=pd.merge(df1,df2,on="city",how="outer")
  2. df3
Out[34]:
city temperature humidity
0 new york 21.0 68.0
1 chicago 14.0 65.0
2 orlando 35.0 NaN
3 baltimore 38.0 NaN
4 san diego NaN 71.0
In [35]:
  1. df3=pd.merge(df1,df2,on="city",how="left")
  2. df3
Out[35]:
city temperature humidity
0 new york 21 68.0
1 chicago 14 65.0
2 orlando 35 NaN
3 baltimore 38 NaN
In [36]:
  1. df3=pd.merge(df1,df2,on="city",how="right")
  2. df3
Out[36]:
city temperature humidity
0 new york 21.0 68
1 chicago 14.0 65
2 san diego NaN 71

indicator flag

In [37]:
  1. df3=pd.merge(df1,df2,on="city",how="outer",indicator=True)
  2. df3
Out[37]:
city temperature humidity _merge
0 new york 21.0 68.0 both
1 chicago 14.0 65.0 both
2 orlando 35.0 NaN left_only
3 baltimore 38.0 NaN left_only
4 san diego NaN 71.0 right_only

suffixes

In [38]:
  1. df1 = pd.DataFrame({
  2. "city": ["new york","chicago","orlando", "baltimore"],
  3. "temperature": [21,14,35,38],
  4. "humidity": [65,68,71, 75]
  5. })
  6. df1
Out[38]:
city humidity temperature
0 new york 65 21
1 chicago 68 14
2 orlando 71 35
3 baltimore 75 38
In [39]:
  1. df2 = pd.DataFrame({
  2. "city": ["chicago","new york","san diego"],
  3. "temperature": [21,14,35],
  4. "humidity": [65,68,71]
  5. })
  6. df2
Out[39]:
city humidity temperature
0 chicago 65 21
1 new york 68 14
2 san diego 71 35
In [40]:
  1. df3= pd.merge(df1,df2,on="city",how="outer", suffixes=('_first','_second'))
  2. df3
Out[40]:
city humidity_first temperature_first humidity_second temperature_second
0 new york 65.0 21.0 68.0 14.0
1 chicago 68.0 14.0 65.0 21.0
2 orlando 71.0 35.0 NaN NaN
3 baltimore 75.0 38.0 NaN NaN
4 san diego NaN NaN 71.0 35.0

join

In [58]:
  1. df1 = pd.DataFrame({
  2. "city": ["new york","chicago","orlando"],
  3. "temperature": [21,14,35],
  4. })
  5. df1.set_index('city',inplace=True)
  6. df1
Out[58]:
temperature
city
new york 21
chicago 14
orlando 35
In [59]:
  1. df2 = pd.DataFrame({
  2. "city": ["chicago","new york","orlando"],
  3. "humidity": [65,68,75],
  4. })
  5. df2.set_index('city',inplace=True)
  6. df2
Out[59]:
humidity
city
chicago 65
new york 68
orlando 75
In [60]:
  1. df1.join(df2,lsuffix='_l', rsuffix='_r')
Out[60]:
temperature humidity
city
new york 21 68
chicago 14 65
orlando 35 75

沒有留言:

張貼留言