Python 3.x: Drop a duplicated index level and keep one in a MultiIndex DataFrame

Posted on December 27

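My toy DataFrame df has a three-level index: name, name, year. The two name levels are duplicates of each other (same level name and same values), so I need to keep only one of them.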

import pandas as pd

# create MultiIndex
index = pd.MultiIndex.from_tuples([
    ('name1', 'name1', '2020'),
    ('name1', 'name1', '2021'),
    ('name2', 'name2', '2020'),
    ('name2', 'name2', '2021'),
    ('name3', 'name3', '2020'),
    ('name3', 'name3', '2021')
], names=['name', 'name', 'year'])

df = pd.DataFrame({
    'quantity': [10, 15, 20, 25, 30, 35],
    'price': [100, 150, 200, 250, 300, 350]
}, index=index)

print(df)

Output:

                  quantity  price
name  name  year                 
name1 name1 2020        10    100
            2021        15    150
name2 name2 2020        20    200
            2021        25    250
name3 name3 2020        30    300
            2021        35    350

I tried the following code, but it did not work:

# Build a Boolean array where True marks an index entry that is repeated
duplicates = df.index.duplicated(keep='first')

# Use the Boolean mask to keep only the rows that are not repeated
df = df[~duplicates]
df

Output:

                  quantity  price
name  name  year                 
name1 name1 2020        10    100
            2021        15    150
name2 name2 2020        20    200
            2021        25    250
name3 name3 2020        30    300
            2021        35    350
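
A likely reason the attempt above changes nothing: Index.duplicated compares whole index entries (one tuple per row), not the levels against each other. Every (name, name, year) tuple in this index is unique, so the mask is all False. The short sketch below illustrates this on a reduced copy of the index; comparing the two levels directly with get_level_values is what shows the redundancy.

```python
import pandas as pd

# Reduced version of the index from the question (four rows instead of six).
index = pd.MultiIndex.from_tuples(
    [
        ("name1", "name1", "2020"),
        ("name1", "name1", "2021"),
        ("name2", "name2", "2020"),
        ("name2", "name2", "2021"),
    ],
    names=["name", "name", "year"],
)

# duplicated() flags repeated rows of the index (whole tuples), not repeated levels.
mask = index.duplicated(keep="first")
print(mask)

# Comparing the two levels by position is what reveals that they hold the same values.
levels_match = index.get_level_values(0).equals(index.get_level_values(1))
print(levels_match)
```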

If we call reset_index() and then try to drop the duplicated column, we get ValueError: cannot insert name, already exists.
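
The error can be reproduced with a small sketch. reset_index() tries to turn every index level into a column, and the second level called name collides with the first. The exact wording of the message may differ between pandas versions, so the code only captures and prints it.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("name1", "name1", "2020"), ("name1", "name1", "2021")],
    names=["name", "name", "year"],
)
df = pd.DataFrame({"quantity": [10, 15], "price": [100, 150]}, index=index)

error_message = None
try:
    # Both "name" levels would become columns with the same label.
    df.reset_index()
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```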

How can I get the following result? Thanks.

            quantity  price
name  year                 
name1 2020        10    100
      2021        15    150
name2 2020        20    200
      2021        25    250
name3 2020        30    300
      2021        35    350


Recommended answer
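
One way to get this result, sketched under the assumption that the two name levels always carry identical values, is to drop one of them by position with DataFrame.droplevel. The label 'name' is ambiguous because it occurs twice, so an integer position is used instead. The equality check before the drop is an added safeguard, not part of the question.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [
        ("name1", "name1", "2020"),
        ("name1", "name1", "2021"),
        ("name2", "name2", "2020"),
        ("name2", "name2", "2021"),
        ("name3", "name3", "2020"),
        ("name3", "name3", "2021"),
    ],
    names=["name", "name", "year"],
)
df = pd.DataFrame(
    {
        "quantity": [10, 15, 20, 25, 30, 35],
        "price": [100, 150, 200, 250, 300, 350],
    },
    index=index,
)

# Safeguard (an assumption of this sketch): only drop a level if both
# name levels really hold the same values row by row.
if not df.index.get_level_values(0).equals(df.index.get_level_values(1)):
    raise ValueError("the two 'name' levels differ; dropping one would lose data")

# Refer to the level by position, since the name 'name' occurs twice.
result = df.droplevel(0)
print(result)
```

The result keeps the rows and columns unchanged and has a two-level index (name, year). Dropping position 1 instead of 0 would give the same outcome here, since both levels are identical.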
