I have thousands of rows of data in a dataframe as below:

input table

I want to extract only the rows that lie between each platform start (a row with "platform" in the "Frame" column and "start" in the "Type" column) and the matching platform end ("platform" in "Frame", "end" in "Type"), inclusive, as in the output below. I also want to label those rows in a Class column: P1 for all rows within the first platform start and end, P2 for the second, then P3, P4, and so on:

Output table

Recommended answer

Welcome to Stack Overflow!

Here's how I would do it. It probably isn't the cleanest solution, but I think it does what you want to do.

# Set up reproducible example (reprex)
import pandas as pd

df = pd.DataFrame({
    "Frame": ["liner", "platform", "liner", "liner", "platform",
              "liner", "platform", "liner", "platform", "liner"],
    "Type": ["group", "start", "single", "single", "end",
             "single", "start", "group", "end", "single"]
})

#    Frame     Type
# 0  liner     group
# 1  platform  start
# 2  liner     single
# 3  liner     single
# 4  platform  end
# 5  liner     single
# 6  platform  start
# 7  liner     group
# 8  platform  end
# 9  liner     single

Step 1: select rows from start to end of platform

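# Index labels of the rows that open and close each platform block
start_indices = df.index[(df.Frame == "platform") & (df.Type == "start")]
end_indices = df.index[(df.Frame == "platform") & (df.Type == "end")]

# .loc slices by label and includes both endpoints, so no "+ 1" is needed
df = pd.concat([
    df.loc[start:end] for start, end in zip(start_indices, end_indices)
])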
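Note that `zip` pairs the n-th start with the n-th end and silently drops any unmatched marker, so Step 1 assumes the start and end rows alternate cleanly. If that is not guaranteed in your data, a quick check along these lines might help. This is only a sketch: `check_platform_markers` is a hypothetical helper name, not something from pandas or from the question.

```python
import pandas as pd

def check_platform_markers(df: pd.DataFrame) -> bool:
    """Return True if platform start/end rows strictly alternate, beginning with a start."""
    # Keep only the "Type" values of platform rows that are start/end markers
    markers = df.loc[df["Frame"] == "platform", "Type"]
    markers = markers[markers.isin(["start", "end"])].tolist()
    # A well-formed sequence is start, end, start, end, ...
    expected = ["start", "end"] * (len(markers) // 2)
    return len(markers) % 2 == 0 and markers == expected
```

Calling it on the original dataframe before Step 1 returns `False` if, for example, two starts appear in a row or a final end is missing.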

Step 2: add column with platform number

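# Label each start row "P1", "P2", ... and forward-fill down to the matching end.
# Series.ffill() replaces fillna(method="ffill"), which is deprecated in recent pandas.
df["Class"] = (
    pd.Series(
        [f"P{n}" for n in range(1, len(start_indices) + 1)],
        index=start_indices
    )
    .reindex(df.index)
    .ffill()
)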

And here's what you get:

df

#    Frame      Type    Class
# 1  platform   start   P1
# 2  liner      single  P1
# 3  liner      single  P1
# 4  platform   end     P1
# 6  platform   start   P2
# 7  liner      group   P2
# 8  platform   end     P2
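If you have a lot of platforms, the same result can probably be reached without a Python-level loop by counting starts and ends with `cumsum`. The following is a sketch of that alternative, not the approach used above; it rebuilds the sample data under the name `sample` so it runs on its own, and it assumes starts and ends strictly alternate.

```python
import pandas as pd

sample = pd.DataFrame({
    "Frame": ["liner", "platform", "liner", "liner", "platform",
              "liner", "platform", "liner", "platform", "liner"],
    "Type": ["group", "start", "single", "single", "end",
             "single", "start", "group", "end", "single"]
})

is_start = (sample["Frame"] == "platform") & (sample["Type"] == "start")
is_end = (sample["Frame"] == "platform") & (sample["Type"] == "end")

# Starts seen up to and including each row
starts_seen = is_start.cumsum()
# Ends seen strictly before each row, so the end row itself still counts as inside
ends_before = is_end.cumsum().shift(fill_value=0)

# A row is inside a platform block while more starts than ends have been seen
inside = starts_seen > ends_before

# The running start count doubles as the platform number
result = sample[inside].assign(Class="P" + starts_seen[inside].astype(str))
```

Here `result` keeps rows 1 to 4 and 6 to 8 of the sample, labelled P1 and P2 respectively, which matches the output above.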
