SQL: GROUP BY only one of the SELECT columns

Posted on May 27

This is the dataset I'm using:

country_or_area	year	comm_code	commodity	flow	trade_usd	weight_kg	quantity_name	quantity	category
Belgium	2016	920510	Brass-wind instru...	Export	571297	3966.0	Number of items	4135.0	92_musical_instru...
Guatemala	2008	660200	Walking-sticks, s...	Export	35022	5575.0	Number of items	10089.0	66_umbrellas_walk...
Barbados	2006	220210	Beverage waters, ...	Re-Export	81058	44458.0	Volume in litres	24113.0	22_beverages_spir...
Tunisia	2016	780411	Lead foil of a th...	Import	4658	121.0	Weight in kilograms	121.0	78_lead_and_artic...
Lithuania	1996	560110	Sanitary towels, ...	Export	76499	5419.0	Weight in kilograms	5419.0	56_wadding_felt_n...

This is the question I need to answer:

Most commercialized commodity in 2016 (by number of occurrences), for each flow type

I need to group by flow only, but I don't know how to do it.

query = '''
        SELECT flow, commodity, MAX(quantity) quantity
        FROM (
          SELECT flow, commodity, COUNT(*) quantity
          FROM transactions
          WHERE year = 2016
          GROUP BY flow, commodity
        )
        GROUP BY flow
        '''
spark.sql(query).show(10)

The result I'm expecting is something like this:

[('Export', ('Sweet biscuits, waffles and wafers', 24)),
 ('Import', ('Baking powders, prepared', 27)),
 ('Re-Export', ('Glues or adhesives, prepared nes, package > 1kg', 8)),
 ('Re-Import', ('Footwear,sole rubber/plastic,upper textile, not sport', 5))]

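You can rank the commodities within each flow with ROW_NUMBER() and keep only the top row. The window must be partitioned by flow alone and ordered by the count in descending order; partitioning by both flow and commodity would give every row rn = 1.

SELECT flow, commodity, cnt
FROM (
  SELECT flow, commodity, COUNT(*) AS cnt,
         ROW_NUMBER() OVER(PARTITION BY flow ORDER BY COUNT(*) DESC) rn
  FROM transactions
  WHERE year = 2016
  GROUP BY flow, commodity
) t
WHERE rn = 1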

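Alternatively, compare each count with the maximum count of its flow. Unlike ROW_NUMBER(), this returns every commodity tied for first place. Here too the window is partitioned by flow only.

SELECT flow, commodity, cnt
FROM (
  SELECT flow, commodity, COUNT(*) AS cnt,
         MAX(COUNT(*)) OVER(PARTITION BY flow) maxcnt
  FROM transactions
  WHERE year = 2016
  GROUP BY flow, commodity
) t
WHERE cnt = maxcnt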
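To sanity-check the "rank within each flow, keep the top row" idea without a Spark session, here is a minimal sketch using Python's built-in sqlite3 module (assuming the bundled SQLite is 3.25 or newer, which is when window functions were added). The rows are made-up toy data, not the real dataset, and the function name `top_commodity_per_flow` is hypothetical. This variant computes the counts in an inner query first and ranks them in an outer one, with commodity as an arbitrary tie-breaker so the result is deterministic.

```python
import sqlite3

# Made-up toy rows (flow, commodity, year); NOT taken from the real dataset.
rows = [
    ("Export", "Sweet biscuits", 2016),
    ("Export", "Sweet biscuits", 2016),
    ("Export", "Lead foil", 2016),
    ("Import", "Baking powders", 2016),
    ("Import", "Baking powders", 2016),
    ("Import", "Baking powders", 2016),
    ("Import", "Lead foil", 2016),
    ("Import", "Lead foil", 2015),  # other years must be ignored
    ("Import", "Lead foil", 2015),
    ("Import", "Lead foil", 2015),
]

QUERY = """
SELECT flow, commodity, cnt
FROM (
    SELECT flow, commodity, cnt,
           -- rank commodities inside each flow, most frequent first
           ROW_NUMBER() OVER (PARTITION BY flow
                              ORDER BY cnt DESC, commodity) AS rn
    FROM (
        SELECT flow, commodity, COUNT(*) AS cnt
        FROM transactions
        WHERE year = 2016
        GROUP BY flow, commodity
    ) AS counts
) AS ranked
WHERE rn = 1
ORDER BY flow
"""


def top_commodity_per_flow(rows):
    """Load the rows into an in-memory table and return one row per flow."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions (flow TEXT, commodity TEXT, year INTEGER)"
    )
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(top_commodity_per_flow(rows))
```

The same SQL text should also be usable through spark.sql(...) once a transactions view is registered, though that is not verified here.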
