I have thousands of rows of data in a dataframe as below:

[Input table: image not included]

I want to extract only the rows that lie between each platform start (where the "Frame" column is "platform" and the "Type" column is "start") and the matching platform end ("Frame" is "platform" and "Type" is "end"), as in the output below. I also want to label those rows in a Class column: P1 for all rows within the first platform start/end pair, P2 for the second pair, then P3, P4, etc.:

[Output table: image not included]

Answer

Welcome to Stack Overflow!

Here's how I would do it. It probably isn't the cleanest solution, but I think it does what you want to do.

# Set up a reproducible example (reprex)
import pandas as pd

df = pd.DataFrame({
    "Frame": ["liner", "platform", "liner", "liner", "platform",
              "liner", "platform", "liner", "platform", "liner"],
    "Type": ["group", "start", "single", "single", "end",
             "single", "start", "group", "end", "single"]
})

#    Frame      Type
# 0  liner      group
# 1  platform   start
# 2  liner      single
# 3  liner      single
# 4  platform   end
# 5  liner      single
# 6  platform   start
# 7  liner      group
# 8  platform   end
# 9  liner      single

Step 1: select rows from start to end of platform

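# Row labels of every platform start and every platform end
start_indices = df.index[(df.Frame == "platform") & (df.Type == "start")]
end_indices = df.index[(df.Frame == "platform") & (df.Type == "end")]

# Pair the n-th start with the n-th end (assumes the markers alternate);
# .loc slicing is label-based and includes the end row itself
df = pd.concat([
    df.loc[start:end] for start, end in zip(start_indices, end_indices)
])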
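The zip(start_indices, end_indices) call pairs the n-th start with the n-th end, so if a start or an end is missing somewhere the blocks silently come out wrong. A minimal sanity check you could run first, assuming platform blocks are never nested (markers_are_paired is just a hypothetical helper name, not a pandas function):

```python
import pandas as pd


def markers_are_paired(df: pd.DataFrame) -> bool:
    """Return True if the platform markers read start, end, start, end, ... in row order."""
    is_marker = (df["Frame"] == "platform") & df["Type"].isin(["start", "end"])
    sequence = df.loc[is_marker, "Type"].tolist()
    expected = ["start", "end"] * (len(sequence) // 2)
    # An odd number of markers or any out-of-order marker makes the lists differ
    return sequence == expected


example = pd.DataFrame({
    "Frame": ["liner", "platform", "liner", "platform", "platform", "liner", "platform"],
    "Type": ["group", "start", "single", "end", "start", "single", "end"],
})
print(markers_are_paired(example))  # True
```

If it returns False, the zip-based pairing should not be trusted for that data.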

Step 2: add column with platform number

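# Put P1, P2, ... on the start rows, then forward-fill each label
# down to the remaining rows of its block
df["Class"] = (
    pd.Series(
        [f"P{n}" for n in range(1, len(start_indices) + 1)],
        index=start_indices
    )
    .reindex(df.index)
    .ffill()  # fillna(method="ffill") is deprecated in recent pandas
)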
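Steps 1 and 2 can also be folded into one pd.concat call: its keys= argument adds an outer index level with one label per block, which can then be turned into the Class column. This is a sketch of that alternative (extract_platform_blocks is a hypothetical name), assuming the markers alternate and there is at least one start/end pair, since pd.concat raises on an empty list:

```python
import pandas as pd


def extract_platform_blocks(df: pd.DataFrame) -> pd.DataFrame:
    """Keep rows from each platform start to its end and label them P1, P2, ..."""
    is_platform = df["Frame"] == "platform"
    start_indices = df.index[is_platform & (df["Type"] == "start")]
    end_indices = df.index[is_platform & (df["Type"] == "end")]
    # Pair the n-th start with the n-th end (assumes the markers alternate)
    pairs = list(zip(start_indices, end_indices))
    blocks = pd.concat(
        [df.loc[start:end] for start, end in pairs],
        # keys= adds an outer index level holding each block's label
        keys=[f"P{n}" for n in range(1, len(pairs) + 1)],
        names=["Class", None],
    )
    # Move the "Class" level out of the index into a column, placed last
    out = blocks.reset_index(level="Class")
    return out[list(df.columns) + ["Class"]]


df = pd.DataFrame({
    "Frame": ["liner", "platform", "liner", "liner", "platform",
              "liner", "platform", "liner", "platform", "liner"],
    "Type": ["group", "start", "single", "single", "end",
             "single", "start", "group", "end", "single"],
})
print(extract_platform_blocks(df))
```

Unlike the two-step version, this one returns a new frame instead of overwriting df.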

And here's what you get:

df

#    Frame      Type    Class
# 1  platform   start   P1
# 2  liner      single  P1
# 3  liner      single  P1
# 4  platform   end     P1
# 6  platform   start   P2
# 7  liner      group   P2
# 8  platform   end     P2
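If you'd rather avoid the Python-level loop over pairs (with thousands of rows either approach should be fine), a fully vectorised variant uses running counts: a row is inside a block when exactly one more start than end has been seen before it. This is a sketch under the assumption of well-formed, alternating, non-nested markers; label_platform_blocks is a hypothetical name:

```python
import pandas as pd


def label_platform_blocks(df: pd.DataFrame) -> pd.DataFrame:
    """Vectorised variant: keep rows between platform start/end markers, label P1, P2, ..."""
    is_start = (df["Frame"] == "platform") & (df["Type"] == "start")
    is_end = (df["Frame"] == "platform") & (df["Type"] == "end")
    # Running count of starts seen so far, including the current row
    starts_so_far = is_start.cumsum()
    # Running count of ends seen strictly before the current row,
    # so an end row still counts as inside its own block
    ends_before = is_end.cumsum().shift(fill_value=0)
    inside = (starts_so_far - ends_before) == 1
    out = df[inside].copy()
    # The running start count doubles as the block number
    out["Class"] = "P" + starts_so_far[inside].astype(str)
    return out


df = pd.DataFrame({
    "Frame": ["liner", "platform", "liner", "liner", "platform",
              "liner", "platform", "liner", "platform", "liner"],
    "Type": ["group", "start", "single", "single", "end",
             "single", "start", "group", "end", "single"],
})
print(label_platform_blocks(df))
```

One behavioural difference: if the last start has no matching end, this version keeps every row after it, whereas the zip-based version drops that unmatched start.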
