Trim trailing spaces with PostgreSQL

Posted on March 28

I have a column eventDate which contains trailing spaces. I am trying to remove them with the PostgreSQL function TRIM(). More specifically, I am running:

SELECT TRIM(both ' ' from eventDate) 
FROM EventDates;

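However, the trailing spaces don't go away. Furthermore, when I try to trim another character from the date (such as a number), it doesn't get trimmed either. If I'm reading the manual correctly, this should work. Any ideas?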

Recommended answer

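There are many different invisible characters. Many of them have the property WSpace=Y ("whitespace") in Unicode. But some special characters are not considered "whitespace" and still have no visible representation. The excellent Wikipedia articles about space (punctuation) and whitespace characters should give you an idea.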

<rant>Unicode sucks in this regard: introducing lots of exotic characters that mainly serve to confuse people.</rant>

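The standard SQL trim() function by default only trims the basic Latin space character (Unicode: U+0020 / ASCII 32). The same goes for the rtrim() and ltrim() variants. Your call also only targets that particular character.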
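A minimal sketch of this behaviour, assuming the default standard_conforming_strings = on; the value 'date \t' is made-up sample input. Without a character list, trim() leaves a trailing tab in place, while an explicit list of characters removes it:

```sql
-- Default trim(): only U+0020 is stripped. This string ends in a tab,
-- so nothing is removed from the end (the space before the tab stays too).
SELECT '|' || trim(both from E'date \t') || '|' AS default_trim;

-- Explicit character list: both space and tab are stripped.
SELECT '|' || trim(both E' \t' from E'date \t') || '|' AS explicit_list;  -- |date|
```

Listing characters explicitly only helps if you know in advance which ones occur in the data.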

Use regular expressions with regexp_replace() instead.

Trailing

To remove all trailing white space (but not white space inside the string):

SELECT regexp_replace(eventdate, '\s+$', '') FROM eventdates;

The regular expression explained:
\s ... regular expression class shorthand for [[:space:]]
- which is the set of white-space characters - see limitations below
+ ... 1 or more consecutive matches
$ ... end of string

Demo:

SELECT regexp_replace('inner white   ', '\s+$', '') || '|';

Returns:

inner white|

Yes, that's a single backslash (\). Details in this related answer:

SQL select where column begins with \
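The \s class in the trailing pattern also covers tabs, carriage returns and line feeds, not just spaces. A small check with made-up sample input, assuming the default standard_conforming_strings = on so that '\s' reaches the regex engine unchanged:

```sql
-- Trailing space, tab, CR and LF are all members of [[:space:]].
SELECT regexp_replace(E'2013-03-28 \t\r\n', '\s+$', '') || '|' AS cleaned;  -- 2013-03-28|
```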

Leading

To remove all leading white space (but not white space inside the string):

regexp_replace(eventdate, '^\s+', '')

^ .. start of string
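For instance (sample input only), the leading pattern strips just the whitespace at the start and keeps inner and trailing whitespace:

```sql
-- Leading space/tab/space removed; inner space and the two trailing spaces stay.
SELECT regexp_replace(E' \t inner white  ', '^\s+', '') || '|' AS cleaned;  -- inner white  |
```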

Both

To remove both, you can chain the above function calls:

regexp_replace(regexp_replace(eventdate, '^\s+', ''), '\s+$', '')
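Applied to a sample value with a leading tab and a trailing newline (illustrative input only), the chained calls clean both ends while the inner space survives:

```sql
-- Inner call strips the leading run, outer call strips the trailing run.
SELECT regexp_replace(regexp_replace(E'\t inner white \n', '^\s+', ''), '\s+$', '') || '|' AS cleaned;  -- inner white|
```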

Or you can combine both in a single call with two branches.
Add 'g' as 4th parameter to replace all matches, not just the first:

regexp_replace(eventdate, '^\s+|\s+$', '', 'g')
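To see why the 'g' flag matters, compare both forms on the same sample string: without it, only the first match (the leading run) is replaced.

```sql
SELECT regexp_replace('  inner white  ', '^\s+|\s+$', '')      || '|' AS without_g,  -- inner white  |
       regexp_replace('  inner white  ', '^\s+|\s+$', '', 'g') || '|' AS with_g;     -- inner white|
```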

But substring() should typically be faster for this:

substring(eventdate, '\S(?:.*\S)*')

\S ... everything but white space
(?:re) ... non-capturing set of parentheses
.* ... any string of 0-n characters
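A quick check of the substring() variant on sample input. One behavioural difference is worth knowing: a value consisting only of whitespace contains no \S, so substring() finds no match and returns NULL, whereas the regexp_replace() form returns an empty string.

```sql
SELECT substring('  inner white  ', '\S(?:.*\S)*') || '|' AS trimmed,           -- inner white|
       substring('   ', '\S(?:.*\S)*') IS NULL            AS blank_gives_null,   -- true
       regexp_replace('   ', '^\s+|\s+$', '', 'g') = ''   AS regexp_gives_empty; -- true
```

If NULL is unwanted for blank values, wrapping the call in coalesce(..., '') restores the empty-string result.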

Or one of these:

substring(eventdate, '^\s*(.*\S)')
substring(eventdate, '(\S.*\S)')  -- only works for 2+ printing characters

(re) ... capturing set of parentheses

Effectively, this takes the first non-whitespace character and everything up to the last non-whitespace character, if available.
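The caveat on the second variant can be checked with a one-character sample value (illustrative only): the first variant still extracts it, while the second needs two non-whitespace characters, finds no match and returns NULL.

```sql
SELECT substring(' x ', '^\s*(.*\S)')       AS variant_1,       -- x
       substring(' x ', '(\S.*\S)') IS NULL AS variant_2_null;  -- true
```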

Whitespace?

There are a few more related characters which are not classified as "whitespace" in Unicode, so they are not contained in the character class [[:space:]].

These print as invisible glyphs in pgAdmin for me: "Mongolian vowel separator", "zero width space", "zero width non-joiner", "zero width joiner":

SELECT E'\u180e', E'\u200B', E'\u200C', E'\u200D';

'᠎' | '' | '‌' | '‍'

Two more print as visible glyphs in pgAdmin, but are invisible in my browser: "word joiner", "zero width no-break space":

SELECT E'\u2060', E'\uFEFF';
'⁠' | ''

Ultimately, whether characters are rendered invisible or not also depends on the font used for display.
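Since what you see depends on the font, it can help to inspect code points instead. A sketch under stated assumptions: it needs a UTF8 database (where ascii() returns the Unicode code point of a character), and the eventdates rows here are hypothetical sample data built with a CTE rather than the real table:

```sql
-- Hypothetical sample rows: one ends in a plain space, one in U+200B (zero width space).
WITH eventdates(eventdate) AS (
   VALUES ('2013-03-28 '), (E'2013-03-28\u200B')
)
SELECT eventdate,
       length(eventdate)                  AS chars,      -- 11 for both rows
       ascii(right(eventdate, 1))         AS last_code,  -- 32 and 8203
       to_hex(ascii(right(eventdate, 1))) AS last_hex    -- 20 and 200b
FROM   eventdates;
```

Both values look identical on screen, but last_hex tells you which character has to go into the pattern.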

To remove all of these as well, replace '\s' with '[\s\u180e\u200B\u200C\u200D\u2060\uFEFF]' or '[\s᠎‌‍⁠]' (note trailing invisible characters!).
Example, instead of:

regexp_replace(eventdate, '\s+$', '')

use:

regexp_replace(eventdate, '[\s\u180e\u200B\u200C\u200D\u2060\uFEFF]+$', '')

or:

regexp_replace(eventdate, '[\s᠎‌‍⁠]+$', '')  -- note invisible characters
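A quick check of the extended character class, using a made-up value that ends in a space, a zero width space (U+200B) and a zero width no-break space (U+FEFF). It assumes a UTF8 database, so that Unicode escapes in E'' strings are accepted:

```sql
SELECT regexp_replace(E'2013-03-28 \u200B\uFEFF',
                      '[\s\u180e\u200B\u200C\u200D\u2060\uFEFF]+$', '') || '|' AS cleaned;  -- 2013-03-28|
```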

Limitations

There is also the POSIX character class [[:graph:]], which is supposed to represent "visible characters". Example:

substring(eventdate, '([[:graph:]].*[[:graph:]])')
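With only ASCII whitespace around the value (sample input only), this [[:graph:]] pattern should give the same result on any setup, since that range is not subject to the locale differences described next:

```sql
SELECT substring(E'\t 2013-03-28 \n', '([[:graph:]].*[[:graph:]])') || '|' AS cleaned;  -- 2013-03-28|
```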

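It works reliably for ASCII characters in every setup (where it boils down to [\x21-\x7E]), but beyond that you currently (including Postgres 10) depend on information provided by the underlying OS (to define ctype) and possibly on locale settings.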

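Strictly speaking, that's the case for every reference to a character class, but there seems to be more disagreement with the less commonly used ones like graph. On the other hand, you may have to add more characters to the character class [[:space:]] (shorthand \s) to catch all whitespace characters. For example, \u2007, \u202f and \u00a0 also seem to be missing for @XiCoN JFS.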
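One way to do that, sketched here with the three characters just mentioned (NO-BREAK SPACE U+00A0, FIGURE SPACE U+2007, NARROW NO-BREAK SPACE U+202F) added next to \s inside a bracket expression. The sample value is made up and a UTF8 database is assumed:

```sql
-- Extra characters are listed explicitly, so the result no longer depends
-- on whether the locale puts them in [[:space:]].
SELECT regexp_replace(E'2013-03-28\u00a0\u2007\u202f ',
                      '[\s\u00a0\u2007\u202f]+$', '') || '|' AS cleaned;  -- 2013-03-28|
```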

The manual:

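Within a bracket expression, the name of a character class enclosed in [: and :] stands for the list of all characters belonging to that class. Standard character class names are: alnum, alpha, blank, cntrl, digit, graph, lower, print, punct, space, upper, xdigit. These stand for the character classes defined in ctype. A locale can provide others.

Emphasis mine.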

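Also note this limitation, which was fixed with Postgres 10:

Fix regular expressions' character class handling for large character codes

Previously, such characters were never recognized as belonging to locale-dependent character classes such as [[:alpha:]].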
