I have two tables.
Employees table:
+----+-------+--------+--------------+
| id | name  | salary | departmentId |
+----+-------+--------+--------------+
| 1  | Joe   | 70000  | 1            |
| 2  | Jim   | 90000  | 1            |
| 3  | Henry | 80000  | 2            |
| 4  | Sam   | 60000  | 2            |
| 5  | Max   | 90000  | 1            |
+----+-------+--------+--------------+
Department table:
+----+-------+
| id | name  |
+----+-------+
| 1  | IT    |
| 2  | Sales |
+----+-------+
Required output:
+------------+----------+--------+
| Department | Employee | Salary |
+------------+----------+--------+
| IT         | Jim      | 90000  |
| Sales      | Henry    | 80000  |
| IT         | Max      | 90000  |
+------------+----------+--------+
The task is to find the employees who earn the highest salary in each department.
My Python code looks like this:
import pandas as pd

def department_highest_salary(employee: pd.DataFrame, department: pd.DataFrame) -> pd.DataFrame:
    df = employee.merge(department, how="inner", left_on="departmentId", right_on="id")
    df["rank"] = employee.groupby("departmentId")["salary"].rank(method="dense", ascending=False)
    return (
        df.query("rank == 1")[["name_y", "name_x", "salary"]]
        .rename(columns={"name_y": "Department", "name_x": "Employee", "salary": "Salary"})
    )
This is wrong, because groupby.rank returns incorrect results:
   id_x name_x  salary  departmentId  id_y name_y  rank
0     1    Joe   70000             1     1     IT   2.0
1     2    Jim   90000             1     1     IT   1.0
2     5    Max   90000             1     1     IT   1.0
3     3  Henry   80000             2     2  Sales   2.0
4     4    Sam   60000             2     2  Sales   1.0
Why is Henry's rank 2? Shouldn't it be 1? What exactly am I doing wrong?
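
The output above looks consistent with an index-alignment problem rather than a fault in groupby.rank itself. The ranks are computed on `employee`, so they carry `employee`'s original index, while the merged `df` has a fresh 0..4 index and, in this output, a different row order. Assigning a Series to a DataFrame column aligns by index label, not by position, so row 3 of `df` (Henry) would receive the rank computed for row 3 of `employee` (Sam, rank 2.0). A minimal sketch of one possible fix, assuming the same column names as in the question, is to compute the rank on the merged frame so both sides share one index:

```python
import pandas as pd


def department_highest_salary(employee: pd.DataFrame, department: pd.DataFrame) -> pd.DataFrame:
    df = employee.merge(department, how="inner", left_on="departmentId", right_on="id")
    # Rank on the merged frame itself, so the resulting Series has
    # exactly the same index as df and aligns row for row.
    df["rank"] = df.groupby("departmentId")["salary"].rank(method="dense", ascending=False)
    top = df[df["rank"] == 1]
    return top[["name_y", "name_x", "salary"]].rename(
        columns={"name_y": "Department", "name_x": "Employee", "salary": "Salary"}
    )


# Sample data copied from the tables in the question.
employee = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["Joe", "Jim", "Henry", "Sam", "Max"],
    "salary": [70000, 90000, 80000, 60000, 90000],
    "departmentId": [1, 1, 2, 2, 1],
})
department = pd.DataFrame({"id": [1, 2], "name": ["IT", "Sales"]})

result = department_highest_salary(employee, department)
print(result)
```

The row order of the result may differ between pandas versions, but the selected employees should be Jim, Max and Henry. Comparing each salary against `df.groupby("departmentId")["salary"].transform("max")` would be another way to select the same rows without a rank column.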