Python 中具有相同键的哈希字典

发布于12月31日

Context:个

我有一组具有相同密钥但值不同的日志(log).密钥保证为str，值可以是str或None.

例如:

logs = [
 {'id': '1234', 'error': None, 'fruit': 'orange'},
 {'id': '12345', 'error': None, 'fruit': 'apple'}
]

有时，这些日志(log)会被复制.相同的键和值对于词典是相同的.

{'id': '1234', 'error': None, 'fruit': 'orange'}
{'id': '1234', 'error': None, 'fruit': 'orange'}

我正在按如下方式处理它们:

already_seen = set()
for log in logs:
    log_hash = compute_hash(log)
    if log_hash in already_seen:
        continue
    already_seen.add(log_hash)

最初，我使用了以下散列函数:

def compute_hash(log_dict: dict):
    return hash(log_dict.values())

但这gave the same hash for different dictionaries美元.因此，我将该函数更改为:

def compute_hash(log_dict: dict):
    return hash(frozenset(log_dict.values()))

现在它工作正常(据我所知).

Questions:

为什么最初的方法没有奏效？
目前的做法正确吗？其中，正确性是:
- 具有相同值的词典的相同哈希值.
- 具有不同值的词典的哈希值不同.
有没有更好或更简单的方法来解决这个问题？

示例:

dict_1 = {
    "date": "2022-12-30 18:04:35.488",
    "tenant": "hz",
    "action": "REFRESH_KUBE_WORKLOAD",
    "code": "SERVER_SENT_REQ_TO_AGENT",
    "message_id": "ae62aa57-b155-452f-b7bd-740c18110aa7",
    "error": None,
}
dict_2 = {
    "date": "2022-12-30 18:04:35.479",
    "tenant": "hz",
    "action": "REFRESH_KUBE_WORKLOAD",
    "code": "SERVER_FAIL",
    "message_id": "5b0eb8d9-9c6a-4c6d-bd27-b03ac0b865e8",
    "error": " Timeout",
}

print(hash(dict_1.values()) == hash(dict_2.values()))

Python 3.8.10 (default, Nov 14 2022, 12:59:47) [GCC 9.4.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> dv = {1:2, 3:4}.values() >>> bin(id(dv)) '0b11111110101110011010010110000001010101011110000' >>> bin(hash(dv)) '0b1111111010111001101001011000000101010101111'

from collections import defaultdict # We can't use a set to remember dicts with the same custom hash - # that's the problem we were trying to work around in the first place! already_seen = defaultdict(list) for log in logs: log_hash = compute_hash(log) if log in already_seen.get(log_hash, ()): continue already_seen[log_hash].append(log)

from dataclasses import dataclass from typing import Optional # It's not very well explained in the documentation, but # when the type is declared to be immutable using `frozen=True`, # it will have a __hash__ generated as well. @dataclass(frozen=True) class LogEntry: id: str error: Optional[str] fruit: str # convert existing logs logs = [LogEntry(**d) for d in logs] # Now, since the type is hashable, we can directly deduplicate with a set: set(logs)

Python 中具有相同键的哈希字典

推荐答案

Python相关问答推荐

使用新的类型语法正确注释ParamSecdecorator (3.12)

如何根据参数推断对象的返回类型？

未删除映射表的行

如何在python xsModel库中定义一个可选[December]字段，以产生受约束的SON模式

如何在python polars中停止otherate()，当使用when()表达式时？

无论输入分辨率如何，稳定扩散管道始终输出512 * 512张图像

ruamel.yaml dump：如何阻止map标量值被移动到一个新的缩进行？

巨 Python ：逆向猜谜游戏

需要帮助使用Python中的Google的People API更新联系人的多个字段'

我可以不带视频系统的pygame，只用于游戏手柄输入吗？''

将字节序列解码为Unicode字符串

一维不匹配两个数组上的广义ufunc

将像素信息写入文件并读取该文件

如何在Quarto中的标题页之前创建序言页

为什么在更新Pandas 2.x中的列时，数据类型不会更改，而在Pandas 1.x中会更改？

Pandas ，快速从词典栏中提取信息到新栏

PYODBC错误(SQL包含-26272个参数标记，但提供了235872个参数，HY 000)

对列中的数字进行迭代，得到n次重复开始的第一个行号

Pandas查找给定时间戳之前的最后一个值

从`end_date`回溯，如何计算以极为单位的滚动统计量？