Problem

I am given a large list of probabilities and a list of percentages to take:

probabilities = [0.1, 0.4, 0.7, 0.2, 0.9, 0.5, 0.6]
N_percentages = [20, 30, 40]  # Percentage of the list size

For each percentage, I want to efficiently compute a 0/1 list that marks the top N% of probabilities:

{20:[0, 0, 0, 0, 1, 0, 0], 30:[0, 0, 1, 0, 1, 0, 0], 40:[0, 0, 1, 0, 1, 0, 1]}

I must not lose the index values: every value has to stay in its original position.

My attempts so far are as follows:

Solution number 1

def mark_probabilities_for_multiple_N1(probabilities, N_percentages):
    marked_lists = {}
    sorted_indices = sorted(range(len(probabilities)), key=lambda i: probabilities[i], reverse=True)
    list_size = len(probabilities)
    
    for N_percentage in N_percentages:
        # round() rather than int(): 40% of 7 items is 2.8, and the expected output marks 3 items
        N = round(N_percentage * list_size / 100)
        
        marked_lists[N_percentage] = [1 if i in sorted_indices[:N] else 0 for i in range(len(probabilities))]
        
        # Utilize previously calculated marked lists for smaller N values
        for prev_N_percentage in [prev_N for prev_N in marked_lists if prev_N < N_percentage]:
            marked_lists[N_percentage] = [1 if marked_lists[prev_N_percentage][i] == 1 or marked_lists[N_percentage][i] == 1 else 0 for i in range(len(probabilities))]

    return marked_lists

Solution number 2 - use heapq

Map each (idx, probability_value) pair into a heapq, ordered by probability_value:

import heapq


def indicies_n_largest(values_with_indicies, percentage) -> set[int]:  # a set gives O(1) membership tests
    """
    Returns the set of indicies of the n largest probabilities.

    :param values_with_indicies: list of (index, probability) pairs
    :param percentage: percentage of the largest probabilities to be returned
    :returns: set of indicies of the largest probabilities
    """
    fraction = percentage / 100
    samples_num = int(len(values_with_indicies) * fraction)
    result = heapq.nlargest(samples_num, values_with_indicies, key=lambda x: x[1])
    # Build a set so the return value matches the annotation
    return {x[0] for x in result}


def percentage_indicies_map(action_probs, percentages) -> dict[int, set[int]]:
    """
    Given action probabilities and a list of percentages, return a map of actions' indicies that are considered good,
    for each percentage.
    """
    values_wth_indicies = [(i, x) for i, x in enumerate(action_probs)]

    percentage_indicies_map: dict[
        int, set[int]
    ] = {}  # list of indicies of the largest probabilities

    for percentage in percentages:
        percentage_indicies_map[percentage] = indicies_n_largest(values_wth_indicies, percentage)

    return percentage_indicies_map

Recommended answer

You can try:

probabilities = [0.1, 0.4, 0.7, 0.2, 0.9, 0.5, 0.6]
N_percentages = [20, 30, 40]

out, s = {}, sorted(enumerate(probabilities), key=lambda k: -k[1])
for p in N_percentages:
    ones = set(i for i, _ in s[:round((p / 100) * len(probabilities))])
    out[p] = [int(i in ones) for i in range(len(probabilities))]

print(out)

Prints:

{
  20: [0, 0, 0, 0, 1, 0, 0], 
  30: [0, 0, 1, 0, 1, 0, 0], 
  40: [0, 0, 1, 0, 1, 0, 1]
}
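
If the list is large and there are many percentages, a possible variation is to sort once, record each element's rank, and then compare ranks against the cutoff, so no set has to be rebuilt per percentage. This is only a sketch: the function name `mark_top_percentages` and the variable names are made up for illustration, and it keeps `round()` as in the snippet above (with `int()`, 40% of 7 items would truncate to 2 and the last expected row would not be reproduced). Note that Python's `round()` rounds exact halves to the nearest even integer, which may or may not be what you want at the boundaries.

```python
def mark_top_percentages(probabilities, percentages):
    """Return {percentage: 0/1 list} marking the top share of probabilities.

    Hypothetical helper for illustration; positions are preserved.
    """
    n = len(probabilities)

    # Sort the indices once, largest probability first.
    # sorted() is stable, so ties keep their original index order.
    order = sorted(range(n), key=lambda i: -probabilities[i])

    # rank[i] is the position of element i in the descending order
    # (0 means the largest probability).
    rank = [0] * n
    for position, index in enumerate(order):
        rank[index] = position

    out = {}
    for p in percentages:
        cutoff = round(p / 100 * n)  # number of elements to mark
        out[p] = [int(r < cutoff) for r in rank]
    return out


probabilities = [0.1, 0.4, 0.7, 0.2, 0.9, 0.5, 0.6]
N_percentages = [20, 30, 40]
print(mark_top_percentages(probabilities, N_percentages))
```

The sort costs O(n log n) once, and each percentage then needs a single O(n) pass over `rank`.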
