Python numba jitclass with a record type containing a string

Posted on April 20

The `v3` field holds string values. I cannot get the code below to run; it fails with an error.

import numpy as np
import pandas as pd
from numba.experimental import jitclass
from numba import types
import os

os.environ['NUMBA_VERBOSE'] = '1'

# ----- BEGINNING OF THE MODIFIED PART ----- #
recordType = types.Record([
    ('v', {'type': types.int64, 'offset': 0, 'alignment': None, 'title': None}),
    ('v2', {'type': types.float64, 'offset': 8, 'alignment': None, 'title': None}),
    ('v3', {'type': types.bytes, 'offset': 16, 'alignment': None, 'title': None})
], 32, False)
spec = [
    ('data', types.Array(recordType, 1, 'C', False))
]
# ----- END OF THE MODIFIED PART ----- #

@jitclass(spec)
class Test:
    def __init__(self, data):
        self.data = data

    def loop(self):
        v = self.data['v']
        v2 = self.data['v2']
        v3 = self.data['v3']
        print("Inside loop:")
        print("v:", v)
        print("v2:", v2)
        print("v3:", v3)

# Create a dictionary with the data
data = {'v': [1, 2, 3], 'v2': [1.0, 2.0, 3.0], 'v3': ['a', 'b', 'c']}

# Create the DataFrame
df = pd.DataFrame(data)

# Define the structured array dtype
dtype = np.dtype([
    ('v', np.int64),
    ('v2', np.float64),
    ('v3', 'S10')  # Byte string with maximum length of 10 characters
])

print(df.to_records(index=False))

# Create the structured array
data_array = np.array(list(df.to_records(index=False)), dtype=dtype)

print("Original data array:")
print(data_array)

# Create an instance of the Test class
test = Test(data_array)
test.loop()

Error:

/home/totaljj/miniconda3/bin/conda run -n bt --no-capture-output python /home/totaljj/bt_lite_strategies/test/test_units/test_numba_obj.py 
Traceback (most recent call last):
  File "/home/totaljj/bt_lite_strategies/test/test_units/test_numba_obj.py", line 13, in <module>
    ('v3', {'type': types.bytes, 'offset': 16, 'alignment': None, 'title': None})
AttributeError: module 'numba.core.types' has no attribute 'bytes'
ERROR conda.cli.main_run:execute(124): `conda run python /home/totaljj/bt_lite_strategies/test/test_units/test_numba_obj.py` failed. (See above for error)

Process finished with exit code 1

The modified version below declares `v3` as `types.CharSeq(10)` instead of `types.bytes` and sets the record size to 26 bytes:

import numpy as np
import pandas as pd
from numba.experimental import jitclass
from numba import types
import os

os.environ['NUMBA_VERBOSE'] = '1'

# ----- BEGINNING OF THE MODIFIED PART ----- #
recordType = types.Record([
    ('v', {'type': types.int64, 'offset': 0, 'alignment': None, 'title': None}),
    ('v2', {'type': types.float64, 'offset': 8, 'alignment': None, 'title': None}),
    ('v3', {'type': types.CharSeq(10), 'offset': 16, 'alignment': None, 'title': None})
], 26, False)
spec = [
    ('data', types.Array(recordType, 1, 'C', False))
]
# ----- END OF THE MODIFIED PART ----- #

@jitclass(spec)
class Test:
    def __init__(self, data):
        self.data = data

    def loop(self):
        v = self.data['v']
        v2 = self.data['v2']
        v3 = self.data['v3']
        print("Inside loop:")
        print("v:", v)
        print("v2:", v2)
        print("v3:", v3)

# Create a dictionary with the data
data = {'v': [1, 2, 3], 'v2': [1.0, 2.0, 3.0], 'v3': ['a', 'b', 'c']}

# Create the DataFrame
df = pd.DataFrame(data)

# Define the structured array dtype
dtype = np.dtype([
    ('v', np.int64),
    ('v2', np.float64),
    ('v3', 'S10')  # Byte string with maximum length of 10 characters
])

print(df.to_records(index=False))

# Create the structured array
data_array = np.array(list(df.to_records(index=False)), dtype=dtype)

print("Original data array:")
print(data_array)

# Create an instance of the Test class
test = Test(data_array)
test.loop()
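The offsets (0, 8, 16) and the record size (26) passed to `types.Record` can be checked against the NumPy dtype itself. The following is a minimal sketch using NumPy only; the variable names `packed`, `aligned`, `offsets` and `record_size` are chosen here for illustration and do not come from the original post.

```python
import numpy as np

# Same field list as the structured dtype in the post.
fields = [
    ('v', np.int64),
    ('v2', np.float64),
    ('v3', 'S10'),  # fixed-width byte string, 10 bytes
]

# Packed layout (align=False is NumPy's default), as used for data_array.
packed = np.dtype(fields)

# dtype.fields maps each name to a tuple starting with (field dtype, byte offset).
offsets = {name: packed.fields[name][1] for name in packed.names}
record_size = packed.itemsize

print(offsets)      # {'v': 0, 'v2': 8, 'v3': 16}
print(record_size)  # 26

# With align=True NumPy pads the record to a multiple of the largest alignment.
aligned = np.dtype(fields, align=True)
print(aligned.itemsize)  # 32
```

The packed size of 26 matches the value used in the modified record type, while 32 (the size in the failing version) corresponds to the aligned layout. Numba also provides `numba.from_dtype`, which builds a record type from a NumPy dtype and may be less error-prone than writing offsets by hand.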

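Note

Be aware that converting a DataFrame to records can be expensive when the DataFrame has many columns, because the default internal layout in Pandas (the one used here) is generally a dict of (NumPy) arrays. Records use a transposed layout, which is only well suited to iterating over every row and reading most of the fields. In addition, records tend to prevent low-level vectorization, that is, the use of SIMD instructions (which can make code much faster), although not all code benefits from it. For a small number of columns, it is better to use multiple arrays, as Pandas does internally (especially when strings are involved). Please read this and this for more information about structure of arrays (SoA) versus array of structures (AoS).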
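The layout difference described in the note can be observed directly in NumPy. This is a small sketch, not a benchmark; the names `records`, `aos_field`, `columns` and `soa_field` are illustrative assumptions, and the data mirrors the three rows from the question.

```python
import numpy as np

dtype = np.dtype([('v', np.int64), ('v2', np.float64), ('v3', 'S10')])

# Array of structures (AoS): one 26-byte record per row.
records = np.array(
    [(1, 1.0, b'a'), (2, 2.0, b'b'), (3, 3.0, b'c')],
    dtype=dtype,
)

# Selecting a field gives a strided view into the records, not a dense array.
aos_field = records['v2']
print(aos_field.strides, aos_field.flags['C_CONTIGUOUS'])

# Structure of arrays (SoA): one contiguous array per column,
# similar in spirit to how Pandas stores columns internally.
columns = {name: np.ascontiguousarray(records[name]) for name in dtype.names}
soa_field = columns['v2']
print(soa_field.strides, soa_field.flags['C_CONTIGUOUS'])
```

In the AoS case consecutive `v2` values sit 26 bytes apart, whereas in the SoA case they are adjacent in memory, which is the layout that vectorized loops generally prefer.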
