I need to regroup consecutive rows that share the same values across a list of columns. Thanks to this I worked out how to do it for a single column, but I can't get it to work for more than one.
My question is quite close to this one, but I can't get that approach to do what I want either.
Here is a working snippet in which the columns user, group, value1 and value2 must all be identical for rows to be regrouped:
#! /bin/python3
import pandas as pd
data = [{"user":"paul","group":"accounting","value1":"foo","value2":3,"value3":"random123"},{"user":"paul","group":"accounting","value1":"foo","value2":3,"value3":"random456"},{"user":"paul","group":"accounting","value1":"foo","value2":3,"value3":"random789"},{"user":"paul","group":"accounting","value1":"foo","value2":5,"value3":"random789"},{"user":"paul","group":"accounting","value1":"foo","value2":5,"value3":"random789"},{"user":"paul","group":"accounting","value1":"foo","value2":5,"value3":"random158"},{"user":"jack","group":"administration","value1":"foo","value2":5,"value3":"random487"},{"user":"jack","group":"administration","value1":"foo","value2":5,"value3":"random435"},{"user":"jack","group":"administration","value1":"bar","value2":3,"value3":"random483"},{"user":"jack","group":"administration","value1":"foo","value2":3,"value3":"random431"},{"user":"jack","group":"administration","value1":"foo","value2":3,"value3":"random478"},{"user":"paul","group":"accounting","value1":"foo","value2":5,"value3":"random759"},{"user":"jack","group":"administration","value1":"bar","value2":3,"value3":"random431"},{"user":"jack","group":"administration","value1":"foo","value2":3,"value3":"random478"}]
df = pd.DataFrame(data)
print(df)
print("----")
grouped = df.groupby(((df['value2'].shift()!= df['value2'])).cumsum())
for k, v in grouped:
    print(f'[group {k}]')
    print(v)
It prints this:
[group 1]
user group value1 value2 value3
0 paul accounting foo 3 random123
1 paul accounting foo 3 random456
2 paul accounting foo 3 random789
[group 2]
user group value1 value2 value3
3 paul accounting foo 5 random789
4 paul accounting foo 5 random789
5 paul accounting foo 5 random158
6 jack administration foo 5 random487
7 jack administration foo 5 random435
[group 3]
user group value1 value2 value3
8 jack administration bar 3 random483
9 jack administration foo 3 random431
10 jack administration foo 3 random478
[group 4]
user group value1 value2 value3
11 paul accounting foo 5 random759
[group 5]
user group value1 value2 value3
12 jack administration bar 3 random431
13 jack administration foo 3 random478
But this is what I need:
[group 1]
user group value1 value2 value3
0 paul accounting foo 3 random123
1 paul accounting foo 3 random456
2 paul accounting foo 3 random789
[group 2]
user group value1 value2 value3
3 paul accounting foo 5 random789
4 paul accounting foo 5 random789
5 paul accounting foo 5 random158
[group 3]
user group value1 value2 value3
6 jack administration foo 5 random487
7 jack administration foo 5 random435
[group 4]
user group value1 value2 value3
8 jack administration bar 3 random483
[group 5]
user group value1 value2 value3
9 jack administration foo 3 random431
10 jack administration foo 3 random478
[group 6]
user group value1 value2 value3
11 paul accounting foo 5 random759
[group 7]
user group value1 value2 value3
12 jack administration bar 3 random431
[group 8]
user group value1 value2 value3
13 jack administration foo 3 random478
I tried passing several columns to the groupby, but without success:
grouped = df.groupby(((df[['user', 'value2']].shift()!= df[['user', 'value2']])).cumsum())
# raises
ValueError: Grouper for '<class 'pandas.core.frame.DataFrame'>' not 1-dimensional
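The error message points at the shape of the grouping key. As a minimal sketch on a hypothetical three-row frame (the name `small` and its values are made up for illustration), comparing two columns with their shifted copies yields a boolean DataFrame, and its cumulative sum is still two-dimensional, whereas groupby expects a one-dimensional key here:

```python
import pandas as pd

# Hypothetical three-row frame, used only to inspect shapes.
small = pd.DataFrame({"user": ["paul", "paul", "jack"], "value2": [3, 5, 5]})
cols = ["user", "value2"]

# One boolean column per compared column: still a DataFrame.
changed = small[cols].shift() != small[cols]

# cumsum runs column by column, so the result is a DataFrame as well.
key = changed.cumsum()

print(type(key).__name__, key.ndim, key.shape)
```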
Solution
Build the consecutive groups by comparing the listed columns with their shifted values, collapsing the per-column result to one boolean per row with DataFrame.any, and then taking the cumulative sum:
cols = ['user','group','value1','value2']
grouped = df.groupby(((df[cols].shift()!= df[cols]).any(axis=1)).cumsum())
for k, v in grouped:
    print(f'[group {k}]')
    print(v)
[group 1]
user group value1 value2 value3
0 paul accounting foo 3 random123
1 paul accounting foo 3 random456
2 paul accounting foo 3 random789
[group 2]
user group value1 value2 value3
3 paul accounting foo 5 random789
4 paul accounting foo 5 random789
5 paul accounting foo 5 random158
[group 3]
user group value1 value2 value3
6 jack administration foo 5 random487
7 jack administration foo 5 random435
[group 4]
user group value1 value2 value3
8 jack administration bar 3 random483
[group 5]
user group value1 value2 value3
9 jack administration foo 3 random431
10 jack administration foo 3 random478
[group 6]
user group value1 value2 value3
11 paul accounting foo 5 random759
[group 7]
user group value1 value2 value3
12 jack administration bar 3 random431
[group 8]
user group value1 value2 value3
13 jack administration foo 3 random478
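The same idea can be wrapped in a small helper so the intermediate key is easy to inspect. This is only a sketch: the function name `consecutive_group_ids` and the five-row `demo` frame are assumptions made for illustration, not part of the original code.

```python
import pandas as pd

def consecutive_group_ids(frame, cols):
    """Return a 1-D Series of run ids.

    A new id starts on every row where at least one column in `cols`
    differs from the previous row.
    """
    # True where any of the compared columns changed relative to the row above.
    changed = (frame[cols].shift() != frame[cols]).any(axis=1)
    # Each True bumps the counter, so rows of the same run share one id.
    return changed.cumsum()

# Hypothetical data: "paul"/3 appears twice, but not consecutively.
demo = pd.DataFrame({
    "user":   ["paul", "paul", "jack", "jack", "paul"],
    "value2": [3, 3, 3, 5, 3],
    "value3": ["a", "b", "c", "d", "e"],
})

ids = consecutive_group_ids(demo, ["user", "value2"])
print(ids.tolist())    # [1, 1, 2, 3, 4]

sizes = demo.groupby(ids).size()
print(sizes.tolist())  # [2, 1, 1, 1]
```

The last row gets its own id even though its values match the first two rows, because only adjacent rows are compared. One caveat: missing values never compare equal to each other, so consecutive rows with NaN in a key column would each start a new group with this approach.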