Python sum of values in a dataset

Posted on May 14

I have this dataframe (ID is a string and Value a float):

ID              Value
1               0.0
1.1             0.0
1.2             0.0
1.2.1           27508.42
1.2.2           25861.82
1.3             0.0
1.3.1           0.0
1.3.1.1         0.0
1.3.1.2         0.0
1.3.1.3         30396.25

The IDs form a hierarchy like this:

1
├── 1.1  
├── 1.2  
│   ├── 1.2.1  
│   └── 1.2.2  
└── 1.3  
    └── 1.3.1  
        ├── 1.3.1.1
        ├── 1.3.1.2    
        └── 1.3.1.3

I need the value of each 'parent' node to be the sum of its children. So:

ID              Value
1               83766.49     (1.1 + 1.2 + 1.3)
1.1             0.0
1.2             53370.24     (1.2.1 + 1.2.2)
1.2.1           27508.42
1.2.2           25861.82
1.3             30396.25     (1.3.1)
1.3.1           30396.25     (1.3.1.1 + 1.3.1.2 + 1.3.1.3)
1.3.1.1         0.0
1.3.1.2         0.0
1.3.1.3         30396.25

How can I group the IDs? Using groupby won't work, since all the IDs are unique. Should I change the structure of the dataframe to better reflect the logic of the schema?

Recommended answer

Another solution (assuming column ID is sorted):

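def counter(x):
    # x is the Value column indexed by ID, passed in reverse order,
    # so every child is visited before its parent.
    out = []
    for id_, value in zip(x.index, x):
        # Sum the already-computed totals of the direct children of id_.
        # The trailing "." keeps "1.1" from matching "1.10.1".
        children = sum(
            total
            for child, total in out
            if child.startswith(id_ + ".") and child.count(".") == id_.count(".") + 1
        )
        out.append((id_, children + value))
    return [total for _, total in out]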


print(df.set_index("ID")[::-1].apply(counter)[::-1].reset_index())

Prints:

        ID     Value
0        1  83766.49
1      1.1      0.00
2      1.2  53370.24
3    1.2.1  27508.42
4    1.2.2  25861.82
5      1.3  30396.25
6    1.3.1  30396.25
7  1.3.1.1      0.00
8  1.3.1.2      0.00
9  1.3.1.3  30396.25
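
If the ID column cannot be assumed to be sorted, a different approach (not the one above) is to push each node's total up to its parent, visiting the deepest IDs first. The following is only a minimal sketch under some assumptions: the helper name subtree_sums is made up for illustration, the dataframe is rebuilt from the question's sample data, the parent of an ID is taken to be everything before its last ".", and the result is rounded to two decimals to hide floating-point noise.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": ["1", "1.1", "1.2", "1.2.1", "1.2.2",
           "1.3", "1.3.1", "1.3.1.1", "1.3.1.2", "1.3.1.3"],
    "Value": [0.0, 0.0, 0.0, 27508.42, 25861.82,
              0.0, 0.0, 0.0, 0.0, 30396.25],
})


def subtree_sums(frame):
    """Hypothetical helper: return each ID's own value plus all descendants."""
    totals = dict(zip(frame["ID"], frame["Value"]))
    # Deepest IDs first, so a child's total is final before it is
    # added to its parent.
    for id_ in sorted(totals, key=lambda s: s.count("."), reverse=True):
        parent, sep, _ = id_.rpartition(".")
        # Root IDs have no "." and therefore no parent; IDs whose parent
        # row is missing from the frame are skipped as well.
        if sep and parent in totals:
            totals[parent] += totals[id_]
    return frame["ID"].map(totals)


result = df.assign(Value=subtree_sums(df).round(2))
print(result)
```

Because the totals are kept in a dictionary keyed by ID, the row order of the dataframe does not matter, and each row is visited once after the sort instead of being compared against every earlier row.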