I have code that relies heavily on yaml for cross-language serialization and while working on speeding some stuff up I noticed that yaml was insanely slow compared to other serialization methods (e.g., pickle, json).
所以真正让我吃惊的是,在输出几乎相同的情况下,JSON比YAML快得多.
>>> import yaml, cjson; d={'foo': {'bar': 1}}
>>> yaml.dump(d, Dumper=yaml.SafeDumper)
'foo: {bar: 1}\n'
>>> cjson.encode(d)
'{"foo": {"bar": 1}}'
>>> import yaml, cjson;
>>> timeit("yaml.dump(d, Dumper=yaml.SafeDumper)", setup="import yaml; d={'foo': {'bar': 1}}", number=10000)
44.506911039352417
>>> timeit("yaml.dump(d, Dumper=yaml.CSafeDumper)", setup="import yaml; d={'foo': {'bar': 1}}", number=10000)
16.852826118469238
>>> timeit("cjson.encode(d)", setup="import cjson; d={'foo': {'bar': 1}}", number=10000)
0.073784112930297852
PyYaml的CSafeDumper和cjson都是用C编写的,所以这不是C与Python的速度问题.我甚至向它添加了一些随机数据,以查看cjson是否正在进行缓存,但它仍然比PyYaml快得多.我意识到yaml是json的超集,但在如此简单的输入下,yaml序列化程序怎么会慢两个数量级呢?