What is the correct way to model a recursive data structure schema in Cerberus?

Attempt #1:

from cerberus import Validator, schema_registry
schema_registry.add("leaf", {"value": {"type": "integer", "required": True}})
schema_registry.add("tree", {"type": "dict", "anyof_schema": ["leaf", "tree"]})
v = Validator(schema = {"root": {"type": "dict", "schema": "tree"}})

Error:

cerberus.schema.SchemaError: {'root': [{
    'schema': [
        'no definitions validate', {
            'anyof definition 0': [{
                'anyof_schema': ['must be of dict type'], 
                'type': ['null value not allowed'],
            }],
            'anyof definition 1': [
                'Rules set definition tree not found.'
            ],
        },
    ]},
]}

Attempt #2:

The error above suggests that a rules set definition for tree is needed:

from cerberus import Validator, schema_registry, rules_set_registry
schema_registry.add("leaf", {"value": {"type": "integer", "required": True}})
rules_set_registry.add("tree", {"type": "dict", "anyof_schema": ["leaf", "tree"]})
v = Validator(schema = {"root": {"type": "dict", "schema": "tree"}})

v.validate({"root": {"value": 1}})
v.errors
v.validate({"root": {"a": {"value": 1}}})
v.errors
v.validate({"root": {"a": {"b": {"c": {"value": 1}}}}})
v.errors

Output:

False
{'root': ['must be of dict type']}

for all 3 examples.

Expected behavior

Ideally, I would like all of the following documents to pass validation:

v = Validator(schema = {"root": {"type": "dict", "schema": "tree"}})
assert v.validate({"root": {"value": 1}}), v.errors
assert v.validate({"root": {"a": {"value": 1}}}), v.errors
assert v.validate({"root": {"a": {"b": {"c": {"value": 1}}}}}), v.errors
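For reference, the semantics I am after can be written as a plain recursive predicate, independent of Cerberus. This is only a sketch of the expected behavior, not a Cerberus schema. The helper names is_leaf and is_tree are hypothetical, and the sketch assumes that a leaf is any dict with an integer "value" (extra keys are not checked) and that an empty dict counts as a tree.

```python
from typing import Any


def is_leaf(node: Any) -> bool:
    # A leaf is a dict carrying a required integer "value".
    # bool is excluded explicitly because it is a subclass of int in Python.
    if not isinstance(node, dict) or "value" not in node:
        return False
    value = node["value"]
    return isinstance(value, int) and not isinstance(value, bool)


def is_tree(node: Any) -> bool:
    # A tree is either a leaf, or a dict whose values are all trees.
    if is_leaf(node):
        return True
    if not isinstance(node, dict):
        return False
    return all(is_tree(child) for child in node.values())
```

With these helpers, the check for a whole document would be is_tree(document["root"]).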

Recommended answer

Warning

The below is not a complete solution.
If someone has a full working solution with cerberus, please share it, and I will happily mark your answer as the solution.

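Additional constraints from the actual problem

The leaves of the tree contain some keys that must match another part of the document I am validating. My custom Validator therefore also has an additional is_in validation method. However, I could not find a good way to set up a child validator for the leaves while still keeping a reference to the other part of the document in the root.

Observation

I have now spent more time "fighting" cerberus than it would take to implement custom input validation myself, so it is time to either do that or try jsonschema. (Edit: see Attempt #4 below.)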

Attempt #3: cerberus custom validator

Hopefully the logic below is still useful to someone.

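from typing import Any

from cerberus import Validator

# NOTE: KEYS and SCHEMA are used below but are not defined in this snippet;
# presumably they hold the leaf's required keys and the leaf's schema.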


class ManifestValidator(Validator):
    def _validate_type_tree(self: Validator, value: Any) -> bool:
        if not isinstance(value, dict):
            return False
        for v in value.values():
            if isinstance(v, dict):
                if all(key in v for key in KEYS):
                    schema = self._resolve_schema(SCHEMA)
                    validator = self._get_child_validator(
                        document_crumb=v,
                        schema_crumb=(v, "schema"),
                        root_document=self.root_document,
                        root_schema=self.root_schema,
                        schema=schema,
                    )
                    if not validator(v, update=self.update) or validator._errors:
                        self._error(validator._errors)
                        return False
                elif not self._validate_type_tree(v):
                    return False
            else:
                return False
        return True

    def _validate_is_in(self: Validator, path: str, field: str, value: str) -> bool:
        """{'type': 'string'}"""
        document = self.root_document
        for element in path.split("."):
            if element not in document:
                self._error(field, f"{path} does not exist in {document}")
                return False
            document = document[element]
        if not isinstance(document, list):
            self._error(
                field,
                f"{path} does not point to a list but to {document} of type {type(document)}",
            )
            return False
        if value not in document:
            self._error(field, f"{value} is not present in {document} at {path}.")
            return False
        return True

Attempt #4: jsonschema + custom validation logic

from jsonschema import validate


SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type" : "object",
    "properties" : {
        "root": {
            "oneOf": [
                {"$ref": "#/$defs/tree",}, 
                {"$ref": "#/$defs/leaf",},
            ],
        },
    },
    "required": [
        "root",
    ],
    "$defs": {
        "tree": {
            "type": "object",
            "patternProperties": {
                "^[a-z]+([_-][a-z]+)*$": {
                    "oneOf": [
                        {"$ref": "#/$defs/tree",}, 
                        {"$ref": "#/$defs/leaf",},
                    ],
                },
            },
            "additionalProperties": False,
        },
        "leaf": {
            "type": "object",
            "properties": {
                # In reality, the leaf is a more complex object, but as a reduction of my problem:
                "value": {
                    "type": "number",
                },
            },
            "required": [
                "value",
            ],
        },
    },
}


TREES = [
    {"root": {"value": 1}},
    {"root": {"a": {"value": 1}}},
    {"root": {"a": {"b": {"c": {"value": 1}}}}},
    {"root": {"a-subtree": {"b-subtree": {"c-subtree": {"value": 1}}}}},
]


for tree in TREES:
    validate(tree, SCHEMA)

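For my additional constraint (is_in), JSON pointers / relative JSON pointers / $data seem useful in simpler cases, but for my needs I decided to implement custom validation logic that runs after the jsonschema validation, which serves as a first step proving that the document is well-formed.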
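A minimal sketch of what such a post-validation step could look like, in plain Python. Everything here is an illustrative assumption rather than the real schema: the helper names resolve_path, iter_leaves and check_is_in are hypothetical, as are the leaf key "group" and the top-level "groups" list in the usage example. The sketch also assumes the document has already passed schema validation, so every node under "root" is a dict.

```python
from typing import Any, Iterator


def resolve_path(document: dict, path: str) -> Any:
    """Follow a dotted path such as "meta.groups"; raise KeyError if missing."""
    node: Any = document
    for element in path.split("."):
        if not isinstance(node, dict) or element not in node:
            raise KeyError(f"{path} does not exist in the document")
        node = node[element]
    return node


def iter_leaves(node: dict, crumbs: tuple = ()) -> Iterator[tuple]:
    """Yield (crumbs, leaf) for every leaf at or below node.

    Assumes a well-formed tree: a node with a "value" key is a leaf,
    any other node is a dict of subtrees.
    """
    if "value" in node:
        yield crumbs, node
        return
    for key, child in node.items():
        yield from iter_leaves(child, crumbs + (key,))


def check_is_in(document: dict, path: str, leaf_key: str) -> list:
    """Return error messages for leaves whose leaf_key is not listed at path."""
    allowed = resolve_path(document, path)
    if not isinstance(allowed, list):
        return [f"{path} does not point to a list"]
    errors = []
    for crumbs, leaf in iter_leaves(document["root"]):
        if leaf.get(leaf_key) not in allowed:
            location = ".".join(("root",) + crumbs)
            errors.append(
                f"{location}: {leaf.get(leaf_key)!r} is not present at {path}"
            )
    return errors


# Hypothetical document: leaf "group" values must appear in the "groups" list.
document = {
    "groups": ["x", "y"],
    "root": {
        "a": {"value": 1, "group": "x"},
        "b": {"c": {"value": 2, "group": "z"}},
    },
}
errors = check_is_in(document, "groups", "group")
```

Collecting messages in a list instead of raising on the first mismatch makes it possible to report every offending leaf in one pass, similar to how v.errors is used in the Cerberus attempts.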
