I have a pandas dataframe like this:

    col
0   3
1   5
2   9
3   5
4   6
5   6
6   11
7   6
8   2
9   10

It can be created in Python with the following code:

import pandas as pd

df = pd.DataFrame(
    {
        'col': [3, 5, 9, 5, 6, 6, 11, 6, 2, 10]
    }
)

I want to find the rows whose value is greater than 8 and that are preceded by at least one row whose value is less than 4.

So the output should be:

    col
2   9
9   10

Index 0 has a value of 3 (less than 4), and index 2 has a value greater than 8, so index 2 is added to the output. After that, the search starts over: indexes 0, 1, and 2 are no longer considered when checking the following rows.

Index 6 has a value equal to 11, but none of the indexes 3, 4, 5 has a value less than 4, so we don't add index 6 to the output.

Index 8 has a value equal to 2 (less than 4) and index 9 has a value equal to 10 (greater than 8), so index 9 is added to the output.
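
To pin the rule down, here is a plain-loop sketch of it. It is meant only as a reference for checking results, since the goal is a loop-free solution. The function name `select_rows_loop` and the parameters `low` and `high` are hypothetical names introduced for illustration.

```python
import pandas as pd


def select_rows_loop(df, low=4, high=8):
    """Plain-loop reference for the rule described above (hypothetical helper)."""
    seen_low = False  # has a value < low appeared since the last reset?
    keep = []
    for idx, value in df['col'].items():
        if value > high and seen_low:
            keep.append(idx)
            seen_low = False  # reset: earlier rows are no longer considered
        elif value < low:
            seen_low = True
    return df.loc[keep]


df = pd.DataFrame({'col': [3, 5, 9, 5, 6, 6, 11, 6, 2, 10]})
expected = select_rows_loop(df)
print(expected)
```

For the sample data this selects indexes 2 and 9, matching the desired output.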

I would prefer a solution that does not use any for loops.

Do you have any idea how to do this?

Recommended answer

Boolean indexing to the rescue:

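# m1: rows with a value > 8
m1 = df['col'].gt(8)

# m2: rows for which a value < 4 occurred earlier in the same group;
# a new group starts right after each row with a value > 8
m2 = df['col'].shift().lt(4).groupby(m1[::-1].cumsum()).cummax()

# keep the rows that satisfy both conditions
df[m1 & m2]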

Output:

   col
2    9
9   10
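
To see why this works, it can help to look at the intermediate masks. The sketch below follows the same steps as the answer; the names `grp`, `prev_lt4`, and `steps` are introduced here only for illustration. The cast to `int` before `cummax` is an extra precaution added in this sketch, and `sort_index()` is there just so the columns line up when printed.

```python
import pandas as pd

df = pd.DataFrame({'col': [3, 5, 9, 5, 6, 6, 11, 6, 2, 10]})

# True where the value is greater than 8
m1 = df['col'].gt(8)

# Group label: counting the > 8 rows from the bottom up means each
# > 8 row closes its own group, and a new group starts right after it
grp = m1[::-1].cumsum().sort_index()

# True where the previous row's value is less than 4
prev_lt4 = df['col'].shift().lt(4)

# Within each group, stay True once a value < 4 has been seen.
# Casting to int is a precaution added in this sketch only.
m2 = prev_lt4.astype(int).groupby(grp).cummax().astype(bool)

# Side-by-side view of every step
steps = pd.DataFrame({
    'col': df['col'],
    'm1': m1,
    'grp': grp,
    'prev_lt4': prev_lt4,
    'm2': m2,
})
print(steps)

result = df[m1 & m2]
print(result)
```

In the sample data, index 6 is greater than 8 but sits in a group (indexes 3 to 6) where no earlier value is less than 4, so `m2` is False there and the row is dropped.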
