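In short, I'm having trouble combining a count and aggregate functions in one summarize() call when they all depend on the same conditions.
Suppose I have this data frame: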
library(dplyr)
df = tbl_df(data.frame(
  company=c("Acme", "Meca", "Emca", "Acme", "Meca", "Emca"),
  year=c("2011", "2010", "2009", "2011", "2010", "2013"),
  product=c("Wrench", "Hammer", "Sonic Screwdriver", "Fairy Dust",
            "Kindness", "Helping Hand"),
  price=c("5.67", "7.12", "12.99", "10.99", NA, FALSE)))
This creates (essentially) the following data frame:
    company year product           price
1   Acme    2011 Wrench            5.67
2   Meca    2010 Hammer            7.12
3   Emca    2009 Sonic Screwdriver 12.99
4   Acme    2011 Fairy Dust        10.99
5   Meca    2010 Kindness          NA
... ...     ...  ...               ...
n   Emca    2013 Helping Hand      FALSE
Suppose I run df <- group_by(df, company, year, product) and then want to get the following in a single result (i.e. one data frame):
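- a count of every price entry (including NA and FALSE)
- a count of the entries where the price is NA
- the average price, excluding NA and FALSE
- the maximum price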
summarize(df, count = n()) #satisfies first item obviously
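I'm having trouble working out the others. I suppose I need to use the pipe operator? If so, could someone offer some guidance?
Here is what I have tried. It is obviously wrong, but I'm not sure what to do next: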
summarize(df,
  total.count = n(),
  count = filter(df, is.na(price)),
  avg.price = filter(df, !is.na(price), price != FALSE),
  max.price = max(filter(df, !is.na(price), price != FALSE)))
Yes, I have read the documentation, and I'm sure the answer is in there, but it may be too advanced for me to follow. Thanks in advance!
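For reference, here is a minimal sketch of one direction that might work: computing each item with a plain vector expression inside a single summarize() call, rather than calling filter() on the whole data frame. Several things here are assumptions rather than part of the question: the helper column price_num, the name na.count, treating the string "FALSE" as an invalid price, and grouping by company and year only (so that some groups contain more than one row).

```r
library(dplyr)

# Same data as in the question; c() coerces FALSE to the string "FALSE"
df <- data.frame(
  company = c("Acme", "Meca", "Emca", "Acme", "Meca", "Emca"),
  year    = c("2011", "2010", "2009", "2011", "2010", "2013"),
  product = c("Wrench", "Hammer", "Sonic Screwdriver", "Fairy Dust",
              "Kindness", "Helping Hand"),
  price   = c("5.67", "7.12", "12.99", "10.99", NA, "FALSE"),
  stringsAsFactors = FALSE
)

result <- df %>%
  # Assumption: "FALSE" marks an invalid price, so map it to NA before
  # converting the character column to numeric (price_num is a made-up name)
  mutate(price_num = as.numeric(na_if(price, "FALSE"))) %>%
  # Assumption: grouping by company and year only, for illustration
  group_by(company, year) %>%
  summarize(
    total.count = n(),                            # all rows, incl. NA and "FALSE"
    na.count    = sum(is.na(price)),              # rows where price is NA
    avg.price   = mean(price_num, na.rm = TRUE),  # ignores NA and "FALSE"
    max.price   = max(price_num, na.rm = TRUE)    # ignores NA and "FALSE"
  ) %>%
  ungroup()

print(result)
```

Note that a group with no valid price at all (Emca 2013 in this data) ends up with NaN for the mean and -Inf, plus a warning, for the maximum, so that case may need separate handling.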