My toy DataFrame df has a three-level index: name, name, year. The first two levels, both called name, hold identical values, so I need to keep only one of them.
import pandas as pd
# create MultiIndex
index = pd.MultiIndex.from_tuples([
    ('name1', 'name1', '2020'),
    ('name1', 'name1', '2021'),
    ('name2', 'name2', '2020'),
    ('name2', 'name2', '2021'),
    ('name3', 'name3', '2020'),
    ('name3', 'name3', '2021')
], names=['name', 'name', 'year'])
df = pd.DataFrame({
    'quantity': [10, 15, 20, 25, 30, 35],
    'price': [100, 150, 200, 250, 300, 350]
}, index=index)
print(df)
Out:
                  quantity  price
name  name  year
name1 name1 2020        10    100
            2021        15    150
name2 name2 2020        20    200
            2021        25    250
name3 name3 2020        30    300
            2021        35    350
I tried the following code, but it did not work:
# Build a Boolean array where True marks a duplicated index entry
duplicates = df.index.duplicated(keep='first')
# Use the Boolean mask to keep only the rows that are not duplicated
df = df[~duplicates]
df
Out:
                  quantity  price
name  name  year
name1 name1 2020        10    100
            2021        15    150
name2 name2 2020        20    200
            2021        25    250
name3 name3 2020        30    300
            2021        35    350
If I call reset_index() and then drop the duplicated column, I get ValueError: cannot insert name, already exists.
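For reference, here is a minimal reproduction of that error on a two-row version of the data. As far as I can tell, the exception is raised by reset_index() itself, before any column can be dropped, because both index levels would become a column called name. The variable error_message is only a name introduced for this sketch.

```python
import pandas as pd

# Two-row version of the toy data, with the same duplicated level name
index = pd.MultiIndex.from_tuples(
    [("name1", "name1", "2020"), ("name1", "name1", "2021")],
    names=["name", "name", "year"],
)
df = pd.DataFrame({"quantity": [10, 15], "price": [100, 150]}, index=index)

# reset_index() tries to insert two columns with the same label and fails
try:
    df.reset_index()
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```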
How can I get the following result? Thanks.
            quantity  price
name  year
name1 2020        10    100
      2021        15    150
name2 2020        20    200
      2021        25    250
name3 2020        30    300
      2021        35    350