Based on the question "How to query GHTorrent's (SQL-like language) for country/city/users number/repositories number?" and the first query at https://ghtorrent.org/gcloud.html, I am trying to write an SQL query that returns the most common programming language per country, ideally broken down by month and year, from the GHTorrent BigQuery dataset. I have tried to adapt the code from this answer, https://stackoverflow.com/a/65460166/10624798/, but I cannot get the join right. My ideal output would look something like this:

country  Year  Month  Language  Number of commits  total_bytes
US       2016  Jan    Python    10000              46789390
CH       2016  Jan    Java      20000              5679304

Basically, I am not very good at writing SQL queries.

Recommended answer

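I looked at the two example queries you linked and found that they share a common value, project_id. I modified the second example so that it also returns the project_id and created_at of each commit. Then, as you mentioned, I formatted created_at as year and month and grouped the commit counts by those values.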

Next, I combined the two examples as CTEs and joined them in a third CTE that simply SELECTs the required columns by name.

Finally, I used ROW_NUMBER to keep, for each country/year/month, the language row with the largest number of bytes.

-- ltb: bytes per language and project, taken from each project's latest language snapshot
WITH ltb AS (
  SELECT pl3.lang, SUM(pl3.size) AS total_bytes, pl3.project_id
  FROM (
    SELECT pl2.bytes AS size, pl2.language AS lang, pl2.project_id
    FROM (
      -- latest snapshot per (language, project), skipping deleted projects and forks
      SELECT pl.language AS lang, MAX(pl.created_at) AS latest, pl.project_id AS project_id
      FROM `ghtorrent-bq.ght.project_languages` pl
      JOIN `ghtorrent-bq.ght.projects` p ON p.id = pl.project_id
      WHERE p.deleted IS FALSE
        AND p.forked_from IS NULL
      GROUP BY lang, project_id
    ) pl1
    JOIN `ghtorrent-bq.ght.project_languages` pl2
      ON pl1.project_id = pl2.project_id
     AND pl1.latest = pl2.created_at
     AND pl1.lang = pl2.language
  ) pl3
  GROUP BY pl3.lang, pl3.project_id
),
-- fprt: number of commits per country, project, and month (real users with a known country only)
fprt AS (
  SELECT
    country_code,
    COUNT(*) AS NoOfCommits,
    c.project_id,
    FORMAT_TIMESTAMP("%m", c.created_at) AS formattedmonth,      -- e.g. 01, used for sorting
    FORMAT_TIMESTAMP("%b", c.created_at) AS formattedmonthname,  -- e.g. Jan, used for display
    FORMAT_TIMESTAMP("%Y", c.created_at) AS formattedyear
  FROM `ghtorrent-bq.ght.commits` AS c
  JOIN `ghtorrent-bq.ght.users` AS u ON c.committer_id = u.id
  WHERE NOT u.fake AND country_code IS NOT NULL
  GROUP BY country_code, c.project_id, formattedmonth, formattedyear, formattedmonthname
),
-- almst: attach the language bytes of each project to its commit counts via project_id
almst AS (
  SELECT country_code, formattedmonth, formattedmonthname, formattedyear, lang, NoOfCommits, total_bytes
  FROM fprt
  JOIN ltb ON ltb.project_id = fprt.project_id
  WHERE country_code IS NOT NULL
)
-- rn = 1 keeps, per country/year/month, the row with the largest total_bytes.
-- Note: total_bytes and NoOfCommits are per project here, not summed per language.
SELECT country_code, formattedyear AS year, formattedmonthname AS month, lang, NoOfCommits, total_bytes
FROM (
  SELECT *,
    ROW_NUMBER() OVER (
      PARTITION BY country_code, formattedyear, formattedmonth
      ORDER BY total_bytes DESC
    ) AS rn
  FROM almst
) t
WHERE rn = 1
ORDER BY formattedyear ASC, formattedmonth ASC
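
The ROW_NUMBER step can be tried in isolation before running it against the full GHTorrent tables. The following is only a minimal sketch in BigQuery Standard SQL: the `sample` rows, the `per_language` CTE, and the `commits` and `size_bytes` column names are made up for illustration and are not part of the GHTorrent schema. It also assumes that "most common" means the largest byte total per language, so unlike the query above it sums commits and bytes per language before ranking.

```sql
-- Hypothetical inline rows standing in for the joined commits/languages data
-- (one row per country, month, project, and language).
WITH sample AS (
  SELECT * FROM UNNEST([
    STRUCT('US' AS country_code, '2016' AS year, '01' AS month,
           'Python' AS lang, 120 AS commits, 5000 AS size_bytes),
    ('US', '2016', '01', 'Java',   80, 7000),
    ('US', '2016', '01', 'Python', 60, 4000),
    ('CH', '2016', '01', 'Java',   40, 3000),
    ('CH', '2016', '01', 'Go',     10, 1000)
  ])
),
-- Assumption: collapse projects so each language has one row per country/month.
per_language AS (
  SELECT country_code, year, month, lang,
         SUM(commits)    AS commits,
         SUM(size_bytes) AS total_bytes
  FROM sample
  GROUP BY country_code, year, month, lang
)
-- Rank languages inside each country/year/month and keep the top one.
SELECT country_code, year, month, lang, commits, total_bytes
FROM (
  SELECT *,
    ROW_NUMBER() OVER (
      PARTITION BY country_code, year, month
      ORDER BY total_bytes DESC
    ) AS rn
  FROM per_language
)
WHERE rn = 1
ORDER BY country_code
```

With these sample rows, Python should come out on top for US (its two project rows sum to 9000 bytes against 7000 for Java) and Java for CH. If you would rather rank by activity than by code size, ordering the window by `commits DESC` instead is a one-line change.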

Output: [image]
