我只使用了两个临时内联表来计算表大小和Google BigQuery的总数据集大小.这里的数据集称为测试.
一旦创建了这两个临时表(分别称为TABLE_SIZES和DATASET_SIZE),我进行了笛卡尔连接,这应该是可以的,因为DATSET_SIZE只有一列,即DB_SIZE.
WITH table_sizes AS (
SELECT
table_id
, row_count
, ROUND(SUM(size_bytes) / (1024*1024), 2) AS size_in_mb
FROM test.__TABLES__
GROUP BY
table_id,
row_count
), dataset_size AS
(
SELECT
ROUND(SUM(size_bytes)/1024/1024, 2) AS DB_size
FROM test.__TABLES__
)
SELECT
t.table_id
, t.row_count
, t.size_in_mb
, ROUND(t.size_in_mb*100/d.DB_size,1) AS percent
FROM
table_sizes t,
dataset_size d
ORDER BY
t.size_in_mb DESC
LIMIT 5
;
它会产生以下输出
[{
"table_id": "randomdata",
"row_count": "11100330",
"size_in_mb": "1651.42",
"percent": "97.6"
}, {
"table_id": "ocod_full",
"row_count": "95535",
"size_in_mb": "39.98",
"percent": "2.4"
}, {
"table_id": "DUMMY",
"row_count": "10000",
"size_in_mb": "0.99",
"percent": "0.1"
}, {
"table_id": "abcd",
"row_count": "2",
"size_in_mb": "0.0",
"percent": "0.0"
}, {
"table_id": "json",
"row_count": "0",
"size_in_mb": "0.0",
"percent": "0.0"
}]
如果任何人有更好的建议,请添加