I have the following code:

df = pd.read_csv("some_data.csv")

candles = [Candle(candle["close"].iloc[0], candle["close"].iloc[-1], max(candle["close"]), min(candle["close"]))
           for _, candle in df.groupby(df.index // ticks)]
candles.reverse()

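It runs on a DataFrame full of tick data. It works, but it feels a bit clunky, so my question is: is it possible to do the grouping in reverse order in the first place, so the list does not have to be reversed afterwards?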


Here is a snippet of the actual data:

timestamp,close,security_code,volume,bid_volume,ask_volume
2024-02-28 01:00:00.358537+00:00,18002.5,NQ,1,0,1
2024-02-28 01:00:00.890809+00:00,18002.75,NQ,1,1,0
2024-02-28 01:00:00.890809+00:00,18002.75,NQ,1,1,0
2024-02-28 01:00:01.696411+00:00,18002.5,NQ,1,0,1
2024-02-28 01:00:02.268716+00:00,18002.25,NQ,1,0,1
2024-02-28 01:00:02.513397+00:00,18002.5,NQ,1,1,0
2024-02-28 01:00:03.716795+00:00,18002.5,NQ,1,0,1
2024-02-28 01:00:03.892441+00:00,18002.75,NQ,1,1,0
2024-02-28 01:00:03.893664+00:00,18002.25,NQ,1,0,1
2024-02-28 01:00:06.956017+00:00,18002.25,NQ,1,0,1
2024-02-28 01:00:08.144158+00:00,18002.25,NQ,1,1,0
2024-02-28 01:00:08.144158+00:00,18002.25,NQ,1,1,0
2024-02-28 01:00:08.772717+00:00,18002.0,NQ,1,0,1
2024-02-28 01:00:08.772717+00:00,18002.0,NQ,3,0,3
2024-02-28 01:00:09.966515+00:00,18002.25,NQ,1,1,0
2024-02-28 01:00:10.051715+00:00,18002.0,NQ,1,0,1
2024-02-28 01:00:11.053980+00:00,18001.75,NQ,1,0,1
2024-02-28 01:00:11.053980+00:00,18001.75,NQ,1,0,1
2024-02-28 01:00:11.296008+00:00,18002.0,NQ,1,1,0
2024-02-28 01:00:12.050765+00:00,18001.75,NQ,1,0,1
2024-02-28 01:00:12.050765+00:00,18001.5,NQ,1,0,1
2024-02-28 01:00:12.050765+00:00,18001.5,NQ,1,0,1
2024-02-28 01:00:12.050765+00:00,18001.5,NQ,1,0,1
2024-02-28 01:00:12.050765+00:00,18001.5,NQ,1,0,1
2024-02-28 01:00:12.050765+00:00,18001.5,NQ,1,0,1
2024-02-28 01:00:12.050765+00:00,18001.25,NQ,1,0,1
2024-02-28 01:00:12.050765+00:00,18001.25,NQ,1,0,1
2024-02-28 01:00:12.050765+00:00,18001.25,NQ,1,0,1
2024-02-28 01:00:12.050765+00:00,18001.25,NQ,2,0,2

Recommended answer

Intuitively, doing the aggregation inside the groupby seems like it would be more efficient. A small example:

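import random

import pandas as pd

class Candle:
    def __init__(self, open, close, high, low):
        self.open = open
        self.close = close
        self.high = high
        self.low = low

ticks = 5  # rows per candle; inferred from the sample output (10 candles from 50 rows)

df = pd.DataFrame({'close': random.choices(range(50, 70), k=50)})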

df['close'].values
#
# array([66, 67, 57, 65, 64, 63, 59, 54, 57, 50, 58, 67, 69, 53, 54, 53, 54,
#        62, 53, 67, 69, 51, 65, 64, 56, 63, 58, 54, 50, 51, 63, 69, 55, 66,
#        54, 54, 64, 52, 52, 58, 57, 61, 64, 63, 53, 64, 50, 52, 68, 63],
#       dtype=int64)

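# Reverse the frame and group with sort=False so the groups come out newest first.
# Within each reversed group, 'last' is the earliest tick (open) and 'first' the latest (close).
candles = (df[::-1]
    .groupby(df.index[::-1] // ticks, sort=False)['close']
    .agg(open='last', close='first', high='max', low='min')
    .apply(lambda g: Candle(*g), axis=1)
    .tolist()
)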

for c in candles:
    print(c.__dict__)

Sample output:

{'open': 64, 'close': 63, 'high': 68, 'low': 50}
{'open': 57, 'close': 53, 'high': 64, 'low': 53}
{'open': 54, 'close': 58, 'high': 64, 'low': 52}
{'open': 63, 'close': 54, 'high': 69, 'low': 54}
{'open': 63, 'close': 51, 'high': 63, 'low': 50}
{'open': 69, 'close': 56, 'high': 69, 'low': 51}
{'open': 53, 'close': 67, 'high': 67, 'low': 53}
{'open': 58, 'close': 54, 'high': 69, 'low': 53}
{'open': 63, 'close': 50, 'high': 63, 'low': 50}
{'open': 66, 'close': 64, 'high': 67, 'low': 57}
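
As a sanity check, the two approaches can be compared side by side. The sketch below is only an illustration: the helper names `candles_loop`, `candles_agg` and `as_tuple`, the random seed, and the 53-row frame (chosen so the newest group is partial) are assumptions, not part of the question. It also builds the objects with `itertuples` instead of a row-wise `apply`, which is an optional variation on the answer above.

```python
import random

import pandas as pd


class Candle:
    def __init__(self, open, close, high, low):
        self.open = open
        self.close = close
        self.high = high
        self.low = low

    def as_tuple(self):
        # Hypothetical helper: plain ints make the two results easy to compare.
        return tuple(int(v) for v in (self.open, self.close, self.high, self.low))


def candles_loop(df, ticks):
    """Approach from the question: one Candle per group, then reverse the list."""
    candles = [
        Candle(g["close"].iloc[0], g["close"].iloc[-1], g["close"].max(), g["close"].min())
        for _, g in df.groupby(df.index // ticks)
    ]
    candles.reverse()
    return candles


def candles_agg(df, ticks):
    """Approach from the answer: group the reversed frame, aggregate inside groupby."""
    ohlc = (
        df[::-1]
        .groupby(df.index[::-1] // ticks, sort=False)["close"]
        .agg(open="last", close="first", high="max", low="min")
    )
    # Column order (open, close, high, low) matches Candle's signature.
    return [Candle(*row) for row in ohlc.itertuples(index=False, name=None)]


random.seed(0)  # arbitrary seed, only to make the run repeatable
ticks = 5
df = pd.DataFrame({"close": random.choices(range(50, 70), k=53)})

a = [c.as_tuple() for c in candles_loop(df, ticks)]
b = [c.as_tuple() for c in candles_agg(df, ticks)]
print(a == b)
```

Both versions use the same group labels (`index // ticks`); only the order in which the groups are emitted changes, so a partial group at the end of the data stays the newest candle in either case.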
