Recent PostgreSQL versions have introduced various features for JSON content, but I'm unsure whether I should really use them. There is no established "best practice" yet on what works and what doesn't, or at least I can't find one.

I have a specific example: a table of objects which, among other things, contains a list of alternate names for each object. All that data will also be included in a JSON column for retrieval purposes. For example (skipping all the other irrelevant fields):

create table stuff (id serial primary key, data json);
insert into stuff (data) values ('{"AltNames":["Name1","Name2","Name3"]}');

I will need some queries in the form "list all objects where one of altnames is 'foobar'." The expected table size is on the order of a few million records. Postgres JSON queries can be used for that, and it can also be indexed (Index for finding an element in a JSON array, for example). However, SHOULD it be done that way or is it a perverse workaround that's not recommended?

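Of course, the classic alternative is to add an extra table for that one-to-many relation, containing the name and a foreign key to the main table; the performance of that approach is well understood. However, it has its own disadvantages: it means either duplicating data between that table and the JSON (with a possible integrity risk), or building the JSON dynamically on every request, which carries its own performance penalty.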

Recommended answer

I will need some queries in the form "list all objects where one of altnames is 'foobar'." The expected table size is on the order of a few million records. Postgres JSON queries can be used for that, and it can also be indexed (Index For Finding Element in JSON array, for example). However, SHOULD it be done that way or is it a perverse workaround that's not recommended?

It can be done that way but that doesn't mean that you should. In some sense, the best practice is well documented already (see e.g. using hstore vs using XML vs using EAV vs using a separate table) with a new datatype which, for all intents and practical purposes (besides validation and syntax), is no different from prior unstructured or semi-structured options.

Put another way, it's the same old pig with new makeup.

JSON offers the ability to use inverted search tree indexes, in the same way as hstore, array types and tsvectors do. They work fine, but keep in mind that they're primarily designed for extracting points in a neighborhood (think geometry types) ordered by distance, rather than for extracting a list of values in lexicographical order.
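
As a rough sketch of what such an index could look like for the question's data, assuming the inverted index meant here is PostgreSQL's GIN and that the column is stored as jsonb rather than json (GIN operator classes exist for jsonb, not for plain json). The table name stuff_jsonb and the index name are hypothetical, chosen only for this illustration:

```sql
-- Hypothetical variant of the question's table, using jsonb instead of json.
CREATE TABLE stuff_jsonb (id serial PRIMARY KEY, data jsonb);

INSERT INTO stuff_jsonb (data)
VALUES ('{"AltNames":["Name1","Name2","Name3"]}');

-- Expression index over just the AltNames array.
CREATE INDEX stuff_jsonb_altnames_idx
    ON stuff_jsonb USING gin ((data -> 'AltNames'));

-- "List all objects where one of the alt names is 'Name2'."
-- The ? operator tests whether the string is an element of the jsonb array.
SELECT id
FROM stuff_jsonb
WHERE data -> 'AltNames' ? 'Name2';
```

Whether the planner actually uses the index depends on table size and statistics; on a handful of rows a sequential scan is likely to be chosen instead.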

To illustrate, take the two plans that Roman's answer outlines:

  • The one that does an index scan goes through disk pages directly, retrieving rows in the order indicated by the index.
  • The one that does a bitmap index scan starts by identifying every disk page that might contain a row, and reads them as they appear on disk, as if it was (and in fact, precisely like) doing a sequence scan that skips useless areas.
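
One way to see which of the two plans you get is to ask the planner with EXPLAIN. The following is only a sketch: plan_demo and its index are hypothetical names, the column is assumed to be jsonb, and no output is shown because the chosen plan depends on row counts and statistics (an empty or tiny table may simply be scanned sequentially).

```sql
-- Hypothetical demo table; names are illustrative only.
CREATE TABLE plan_demo (id serial PRIMARY KEY, data jsonb);

CREATE INDEX plan_demo_altnames_gin
    ON plan_demo USING gin ((data -> 'AltNames'));

-- Lookup on the btree primary key: eligible for a plain index scan.
EXPLAIN SELECT * FROM plan_demo WHERE id = 1;

-- Array membership through the GIN index: GIN only supports bitmap scans,
-- so when the index is used it appears as Bitmap Index Scan + Bitmap Heap Scan.
EXPLAIN SELECT * FROM plan_demo WHERE data -> 'AltNames' ? 'foobar';
```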

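Getting back to your question: the cluttered and oversized inverted tree indexes will indeed improve your app's performance if you use Postgres tables as giant JSON stores. But they're not a silver bullet either, and they won't get you as far as a proper relational design when you're dealing with bottlenecks.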

The bottom line, in the end, is no different from what you'd get when deciding whether to use hstore or an EAV:

  1. If it needs an index (i.e. it frequently appears in a where clause or, even more importantly, in a join clause), you likely want the data in a separate field.
  2. If it's primarily cosmetic, JSON/hstore/EAV/XML/whatever-makes-you-sleep-at-night works fine.
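
Applied to the question's example, point 1 would amount to something like the sketch below. It builds on the stuff table from the question; the stuff_altname table, its columns, and the index name are assumptions made up for illustration, not part of the original schema.

```sql
-- Hypothetical one-to-many table holding the alternate names.
CREATE TABLE stuff_altname (
    stuff_id integer NOT NULL REFERENCES stuff (id) ON DELETE CASCADE,
    name     text    NOT NULL,
    PRIMARY KEY (stuff_id, name)
);

-- Plain btree index for "which objects have this alt name?" lookups.
CREATE INDEX stuff_altname_name_idx ON stuff_altname (name);

-- "List all objects where one of the alt names is 'foobar'."
SELECT s.id, s.data
FROM stuff AS s
JOIN stuff_altname AS a ON a.stuff_id = s.id
WHERE a.name = 'foobar';
```

The trade-off raised in the question still applies: if the names are also kept inside the JSON column, the two copies have to be kept in sync.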
