Calculate the column sums for each combination of two grouping variables
I have a dataset that looks like this:
    Type Age count1 count2 Year  Pop1  Pop2 TypeDescrip
    A     35      1      1 1990 30000 50000 alpha
    A     35      3      1 1990 30000 50000 alpha
    A     45      2      3 1990 20000 70000 alpha
    B     45      2      1 1990 20000 70000 beta
    B     45      4      5 1990 20000 70000 beta
I want to add up the counts for rows that match in both the Type and Age columns. Ideally, I would end up with a dataset like this:
    Type Age count1 count2 Year  Pop1  Pop2 TypeDescrip
    A     35      4      2 1990 30000 50000 alpha
    A     45      2      3 1990 20000 70000 alpha
    B     45      6      6 1990 20000 70000 beta
I tried using nested duplicated() statements, such as the following:
    typedup = duplicated(df$Type)
    bothdup = duplicated(df[(typedup == TRUE),]$Age)
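But this returns the indices of rows where either the age or the type is duplicated, not necessarily rows where both are duplicated together.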
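For reference, one way to flag rows whose Type and Age are duplicated together is to call duplicated() on the two key columns at once. This is a minimal base R sketch, assuming a data frame named df that holds the example rows from the question (only the count columns are included, for brevity). It only identifies the repeated combinations; it does not sum anything.

```r
# Example rows copied from the question (count columns only, for brevity).
df <- data.frame(
  Type   = c("A", "A", "A", "B", "B"),
  Age    = c(35L, 35L, 45L, 45L, 45L),
  count1 = c(1L, 3L, 2L, 2L, 4L),
  count2 = c(1L, 1L, 3L, 1L, 5L)
)

# duplicated() on a data frame compares whole rows, so passing just the two
# key columns flags rows whose (Type, Age) pair has already appeared.
key_dup <- duplicated(df[, c("Type", "Age")])
key_dup
# FALSE  TRUE FALSE FALSE  TRUE

# The first row of each distinct (Type, Age) combination.
first_rows <- df[!key_dup, c("Type", "Age")]
first_rows
```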
I also tried tapply:
    tapply(c(df$count1, df$count2), c(df$Age, df$Type), sum)
But this output is hard to work with. I want to end up with a data.frame.
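Part of the difficulty is that c(df$Age, df$Type) concatenates the two grouping columns into one vector, so the count1 values end up grouped by Age and the count2 values by Type instead of by their combination. The sketch below shows how tapply() could be used for a single count column by passing the grouping variables as a list, and how the resulting matrix might be turned into a data.frame. It assumes the same example df as in the question; the names sums1 and long1 are illustrative only.

```r
# Example rows copied from the question (count columns only, for brevity).
df <- data.frame(
  Type   = c("A", "A", "A", "B", "B"),
  Age    = c(35L, 35L, 45L, 45L, 45L),
  count1 = c(1L, 3L, 2L, 2L, 4L),
  count2 = c(1L, 1L, 3L, 1L, 5L)
)

# Pass the grouping variables as a list so tapply() cross-classifies them.
# The result is a Type x Age matrix; combinations with no rows (B/35) are NA.
sums1 <- tapply(df$count1, list(Type = df$Type, Age = df$Age), sum)
sums1

# Convert the matrix to long form and drop the empty combinations.
# Note: Type and Age come back as factors in this long form.
long1 <- as.data.frame(as.table(sums1), responseName = "count1")
long1 <- long1[!is.na(long1$count1), ]
long1
```

This handles one column at a time, which is one reason the grouped-summary approaches in the answer are usually more convenient.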
I don't want to use a for loop because my dataset is large.
Try
    library(dplyr)
    df1 %>%
      group_by(Type, Age) %>%
      summarise_each(funs(sum))
    #  Type Age count1 count2
    #1    A  35      4      2
    #2    A  45      2      3
    #3    B  45      6      6
In newer versions of dplyr
    df1 %>%
      group_by(Type, Age) %>%
      summarise_all(sum)
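As far as I know, summarise_each() and funs() are deprecated in current dplyr, and the scoped verbs such as summarise_all() were superseded by across() in dplyr 1.0.0. A minimal sketch of the same grouped sum with across(), assuming dplyr 1.0.0 or later is installed and using the df1 example data from this answer (res is an illustrative name):

```r
library(dplyr)

# Same example data as df1 in this answer.
df1 <- data.frame(
  Type   = c("A", "A", "A", "B", "B"),
  Age    = c(35L, 35L, 45L, 45L, 45L),
  count1 = c(1L, 3L, 2L, 2L, 4L),
  count2 = c(1L, 1L, 3L, 1L, 5L)
)

# across(everything(), sum) applies sum() to every non-grouping column;
# .groups = "drop" returns an ungrouped tibble.
res <- df1 %>%
  group_by(Type, Age) %>%
  summarise(across(everything(), sum), .groups = "drop")
res
```

To sum only some columns, everything() can be replaced with a selection helper such as starts_with("count").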
Or using base R
    aggregate(. ~ Type + Age, df1, FUN = sum)
    #  Type Age count1 count2
    #1    A  35      4      2
    #2    A  45      2      3
    #3    B  45      6      6
Or
    library(data.table)
    setDT(df1)[, lapply(.SD, sum), .(Type, Age)]
    #   Type Age count1 count2
    #1:    A  35      4      2
    #2:    A  45      2      3
    #3:    B  45      6      6
Update
Based on the new dataset,
    df2 %>%
      group_by(Type, Age, Pop1, Pop2, TypeDescrip) %>%
      summarise_each(funs(sum), matches('^count'))
    #  Type Age  Pop1  Pop2 TypeDescrip count1 count2
    #1    A  35 30000 50000       alpha      4      2
    #2    A  45 20000 70000        beta      2      3
    #3    B  45 20000 70000        beta      6      6
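The same idea can be sketched in base R without any packages: put the count columns on the left-hand side of the formula with cbind() and the descriptive columns on the right. This assumes the df2 example data from this answer and, like the dplyr version, leaves Year out of the result; res2 is an illustrative name.

```r
# Same example data as df2 in this answer.
df2 <- data.frame(
  Type        = c("A", "A", "A", "B", "B"),
  Age         = c(35L, 35L, 45L, 45L, 45L),
  count1      = c(1L, 3L, 2L, 2L, 4L),
  count2      = c(1L, 1L, 3L, 1L, 5L),
  Year        = rep(1990L, 5),
  Pop1        = c(30000L, 30000L, 20000L, 20000L, 20000L),
  Pop2        = c(50000L, 50000L, 70000L, 70000L, 70000L),
  TypeDescrip = c("alpha", "alpha", "beta", "beta", "beta")
)

# Sum only count1 and count2; every variable on the right-hand side is
# treated as a grouping variable and carried into the result.
res2 <- aggregate(cbind(count1, count2) ~ Type + Age + Pop1 + Pop2 + TypeDescrip,
                  data = df2, FUN = sum)

# Sort by the two main keys for a predictable row order.
res2 <- res2[order(res2$Type, res2$Age), ]
res2
```

Grouping by the descriptive columns only works cleanly when they are constant within each Type and Age combination, as they are in this example.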
data
    df1 <- structure(list(Type = c("A", "A", "A", "B", "B"),
        Age = c(35L, 35L, 45L, 45L, 45L),
        count1 = c(1L, 3L, 2L, 2L, 4L),
        count2 = c(1L, 1L, 3L, 1L, 5L)),
        .Names = c("Type", "Age", "count1", "count2"),
        class = "data.frame", row.names = c(NA, -5L))

    df2 <- structure(list(Type = c("A", "A", "A", "B", "B"),
        Age = c(35L, 35L, 45L, 45L, 45L),
        count1 = c(1L, 3L, 2L, 2L, 4L),
        count2 = c(1L, 1L, 3L, 1L, 5L),
        Year = c(1990L, 1990L, 1990L, 1990L, 1990L),
        Pop1 = c(30000L, 30000L, 20000L, 20000L, 20000L),
        Pop2 = c(50000L, 50000L, 70000L, 70000L, 70000L),
        TypeDescrip = c("alpha", "alpha", "beta", "beta", "beta")),
        .Names = c("Type", "Age", "count1", "count2", "Year", "Pop1",
                   "Pop2", "TypeDescrip"),
        class = "data.frame", row.names = c(NA, -5L))