Best way to test SQL queries

Posted on April 16

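I have run into a problem: our complex SQL queries keep going out with errors. In practice, this leads to emails being sent to the wrong customers and other "problems" like that.

What is everyone's experience with creating SQL queries like these? We create a new set of data every other week.

Here are some of my ideas and their limitations:

Creating test data: While this would prove that we have all the correct data, it does not enforce the exclusion of anomalies in production. These are data that would be considered wrong today but may have been correct 10 years ago; this was never documented, so we only find out about it after the data has been extracted.
Creating Venn diagrams and data maps: This seems to be a solid way to test the design of a query, but it does not guarantee that the implementation is correct. It gets developers to plan ahead and to think about what is happening as they write.

Thanks for any input you can give.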

Recommended answer

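You wouldn't write an application with functions 200 lines long. You'd decompose those long functions into smaller functions, each with a single clearly defined responsibility.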

Why write your SQL like that?

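Decompose your queries, just as you decompose your functions. This makes them shorter, simpler, easier to comprehend, and easier to refactor. It also allows you to add "shims" between them and "wrappers" around them, just as you do in procedural code.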

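How do you do this? By making each significant thing a query does into a view. Then you compose more complex queries out of these simpler views, just as you compose more complex functions out of more primitive functions.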

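And the great thing is, for most compositions of views, you'll get exactly the same performance out of your RDBMS. (For some you won't; so what? Premature optimization is the root of all evil. Code correctly first, then optimize if you need to.)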

Here's an example of using several views to decompose a complicated query.

In the example, because each view adds only one transformation, each can be tested independently to find errors, and the tests are simple.

Here's the base table in the example:

create table month_value( 
    eid int not null, month int, year int,  value int );

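This table is flawed, because it uses two columns, month and year, to represent one datum, an absolute month. Here's our specification for the new, calculated column: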

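We'll do that as a linear transform, such that it sorts the same as (year, month), such that for any (year, month) tuple there is one and only one value, and such that all values are consecutive: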

create view cm_absolute_month as 
select *, year * 12 + month as absolute_month from month_value;
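As a quick sanity check of that specification, here is a minimal sketch that loads the table and the view into an in-memory SQLite database from Python. The sample rows are invented for illustration, and SQLite is only a stand-in for whatever RDBMS is actually in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table month_value(
    eid int not null, month int, year int, value int );
create view cm_absolute_month as
select *, year * 12 + month as absolute_month from month_value;
""")

# Hypothetical sample rows: (eid, month, year, value), deliberately unsorted.
rows = [(1, 12, 2020, 10), (1, 1, 2021, 20), (1, 11, 2020, 5), (1, 2, 2021, 30)]
conn.executemany("insert into month_value values (?, ?, ?, ?)", rows)

# Ordering by the computed column should match ordering by (year, month).
by_absolute = conn.execute(
    "select year, month, absolute_month from cm_absolute_month "
    "order by absolute_month"
).fetchall()
by_year_month = conn.execute(
    "select year, month, absolute_month from cm_absolute_month "
    "order by year, month"
).fetchall()

for year, month, absolute_month in by_absolute:
    print(year, month, absolute_month)
```

With these rows, December 2020 and January 2021 map to adjacent values, which is the property the year boundary is most likely to break.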

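Now what we have to test is inherent in our spec, namely that for any tuple (year, month) there is one and only one (absolute_month), and that the (absolute_month)s are consecutive. Let's write some tests.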

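Our test will be a SQL select query with the following structure: a test name and a case statement catenated together. The test name is just an arbitrary string. The case statement is just case when test statements then 'passed' else 'failed' end.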

The test statements will just be SQL selects (subqueries) that must be true for the test to pass.

Here's our first test:

-- a select statement that catenates the test name and the case statement
select concat( 
-- the test name
'For every (year, month) there is one and only one (absolute_month): ', 
-- the case statement
   case when 
-- one or more subqueries
-- in this case, an expected value and an actual value 
-- that must be equal for the test to pass
  ( select count(distinct year, month) from month_value) 
  -- expected value,
  = ( select count(distinct absolute_month) from cm_absolute_month)  
  -- actual value
  -- the then and else branches of the case statement
  then 'passed' else 'failed' end
  -- close the concat function and terminate the query 
  ); 
  -- test result.

Running that query produces this result: For every (year, month) there is one and only one (absolute_month): passed
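To see the test both pass and fail, here is a rough sketch that runs it from Python against in-memory SQLite. It makes several assumptions: the sample rows are invented, `run_first_test` is a hypothetical helper, the deliberately broken expression `year * 10 + month` is only there to provoke a failure, and the query is adapted to SQLite (`||` in place of `concat`, and a `select distinct` subquery in place of `count(distinct year, month)`, which SQLite does not accept).

```python
import sqlite3

# The first test, adapted to SQLite syntax (see the assumptions above).
TEST = """
select 'For every (year, month) there is one and only one (absolute_month): ' ||
  case when
    ( select count(*) from (select distinct year, month from month_value) )
    = ( select count(distinct absolute_month) from cm_absolute_month )
  then 'passed' else 'failed' end
"""

def run_first_test(view_expression):
    """Build the view from the given expression and return the test's result line."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table month_value(eid int not null, month int, year int, value int)"
    )
    # Hypothetical sample rows: (eid, month, year, value).
    conn.executemany(
        "insert into month_value values (?, ?, ?, ?)",
        [(1, 11, 2020, 5), (1, 12, 2020, 10), (1, 1, 2021, 20)],
    )
    conn.execute(
        "create view cm_absolute_month as "
        f"select *, {view_expression} as absolute_month from month_value"
    )
    result = conn.execute(TEST).fetchone()[0]
    conn.close()
    return result

good = run_first_test("year * 12 + month")
# Broken on purpose: (2020, 11) and (2021, 1) both map to 20211.
bad = run_first_test("year * 10 + month")
print(good)
print(bad)
```

The broken expression collapses two distinct (year, month) pairs into one value, so the two counts differ and the test reports a failure.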

As long as there is sufficient test data in month_value, this test works.

We can add a test for sufficient test data, too:

select concat( 'Sufficient and sufficiently varied month_value test data: ',
   case when 
      ( select count(distinct year, month) from month_value) > 10
  and ( select count(distinct year) from month_value) > 3
  and ... more tests 
  then 'passed' else 'failed' end );

Now let's test that it's consecutive:

select concat( '(absolute_month)s are consecutive: ',
case when ( select count(*) from cm_absolute_month a join cm_absolute_month b 
on (     (a.month + 1 = b.month and a.year = b.year) 
      or (a.month = 12 and b.month = 1 and a.year + 1 = b.year) )  
where a.absolute_month + 1 <> b.absolute_month ) = 0 
then 'passed' else 'failed' end );

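Now let's put our tests, which are just queries, into a file, and run that script against the database. Indeed, if we store our view definitions in a script (or scripts; I recommend one file per group of related views) to be run against the database, we can add the tests for each view to the same script, so that the act of (re-)creating a view also runs that view's tests. That way, we get regression tests whenever we re-create views, and, when the view creation runs against production, the view is also tested in production.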
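One way this could look in practice is sketched below, with the view definition and its tests kept together and run by a small Python function against in-memory SQLite. This is an illustration under assumptions, not a prescribed layout: `recreate_view_and_test` is a hypothetical helper, the sample rows are invented, and the queries are adapted to SQLite syntax (`||` in place of `concat`, and a `select distinct` subquery in place of `count(distinct year, month)`).

```python
import sqlite3

# The view definition, as it might appear in a per-view script.
VIEW_SCRIPT = """
drop view if exists cm_absolute_month;
create view cm_absolute_month as
select *, year * 12 + month as absolute_month from month_value;
"""

# The view's tests: each query returns one line ending in 'passed' or 'failed'.
TESTS = [
    """
    select 'For every (year, month) there is one and only one (absolute_month): ' ||
      case when
        ( select count(*) from (select distinct year, month from month_value) )
        = ( select count(distinct absolute_month) from cm_absolute_month )
      then 'passed' else 'failed' end
    """,
    """
    select '(absolute_month)s are consecutive: ' ||
      case when ( select count(*)
                  from cm_absolute_month a join cm_absolute_month b
                  on (    (a.month + 1 = b.month and a.year = b.year)
                       or (a.month = 12 and b.month = 1 and a.year + 1 = b.year) )
                  where a.absolute_month + 1 <> b.absolute_month ) = 0
      then 'passed' else 'failed' end
    """,
]

def recreate_view_and_test(conn):
    """(Re-)create the view, then run its tests and return their result lines."""
    conn.executescript(VIEW_SCRIPT)
    return [conn.execute(test).fetchone()[0] for test in TESTS]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table month_value(eid int not null, month int, year int, value int)"
)
# Hypothetical sample rows spanning a year boundary: (eid, month, year, value).
conn.executemany(
    "insert into month_value values (?, ?, ?, ?)",
    [(1, 11, 2020, 5), (1, 12, 2020, 10), (1, 1, 2021, 20), (1, 2, 2021, 30)],
)

results = recreate_view_and_test(conn)
failures = [line for line in results if line.endswith("failed")]
for line in results:
    print(line)
```

A deployment step could refuse to continue when `failures` is non-empty, which gives the regression check described above each time the view is re-created.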
