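I have the US Baby Names dataset, which contains a "Name" column. Although it does not make much practical sense, I am trying to find the median name in that column. That is, if the names were arranged in ascending order, with each name repeated as often as it occurs, there would be a "middle value", and that is what I want to find without actually sorting the entire column (a pandas Series) and then picking the middle element. So I am looking for a simple, built-in way to find the median name.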
~*~
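Edit [5:51 UTC]: The median of the names should be based on their alphabetical/lexicographic order. Also, here is part of the CSV file (the first line is the header):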
,Id,Name,Year,Gender,State,Count
11349,11350,Emma,2004,F,AK,62
11350,11351,Madison,2004,F,AK,48
11351,11352,Hannah,2004,F,AK,46
11352,11353,Grace,2004,F,AK,44
11353,11354,Emily,2004,F,AK,41
11354,11355,Abigail,2004,F,AK,37
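To make the goal concrete, here is a minimal sketch of one way to get the lexicographic median without sorting the whole column. It counts each distinct name, sorts only the distinct names, and walks the running totals to the middle position. The helper name `lexicographic_median` is hypothetical, and the choice of the lower median for an even number of rows is an assumption (strings cannot be averaged). The sample holds only the six names from the rows above.

```python
import pandas as pd


def lexicographic_median(names: pd.Series) -> str:
    """Return the lower median of a string Series in lexicographic order.

    Hypothetical helper for illustration; not a pandas built-in.
    """
    # Count occurrences of each distinct name, then order the distinct names.
    counts = names.dropna().value_counts().sort_index()
    if counts.empty:
        raise ValueError("no names to take a median of")
    # 0-based position of the lower median in the (virtual) sorted column.
    k = (int(counts.sum()) - 1) // 2
    # Running totals show which distinct name covers position k.
    running = counts.cumsum().to_numpy()
    pos = running.searchsorted(k + 1, side="left")  # first total >= k + 1
    return counts.index[pos]


# The six names from the CSV excerpt.
sample = pd.Series(["Emma", "Madison", "Hannah", "Grace", "Emily", "Abigail"])
print(lexicographic_median(sample))  # Emma
```

The ordering here is Python's default string comparison, which is case-sensitive and based on code points. If the intent is to weight each name by the Count column rather than by the number of rows, the counts would have to be built differently, for example by summing Count per name.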
~*~
I tried the built-in pandas median() method, but it does not really work for non-numeric values, even with the numeric_only parameter set to False:
import pandas as pd
baby_names = pd.read_csv(
"Pandas_DataMart\\DataMart\\06_Stats\\US_Baby_Names\\US Baby Names.xlsx")
print(baby_names['Name'].median(numeric_only=False))
The traceback passes through a series of calls inside the median() method, but in the end the result I get is:
TypeError: could not convert string to float: 'Emma'
So it seems that median() does not work for non-numeric values. Or am I doing something wrong?
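Since median() tries to convert the values to floats, one possible workaround is to map each name to an integer rank first. The sketch below is an illustration under assumptions, not a built-in pandas feature for this purpose. It relies on an ordered `pd.Categorical`, whose default categories are the sorted distinct values, and on `np.partition` to select the middle code without fully sorting the codes. Picking the lower median for an even number of rows is again an assumption.

```python
import numpy as np
import pandas as pd

# Sample data: the six names from the CSV excerpt.
names = pd.Series(["Emma", "Madison", "Hannah", "Grace", "Emily", "Abigail"])

# An ordered categorical stores each name as an integer code. With no explicit
# categories, they default to the sorted distinct names, so the order of the
# codes matches the lexicographic order of the names.
cat = pd.Categorical(names.dropna(), ordered=True)

# Select the k-th smallest code without fully sorting the codes.
k = (len(cat.codes) - 1) // 2  # lower median position (0-based)
median_code = np.partition(cat.codes, k)[k]

print(cat.categories[median_code])  # Emma
```

Building the categorical still sorts the distinct names internally, so this avoids sorting the full column but not the set of unique values.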
Here is the full error message for reference:
Traceback (most recent call last):
File "C:\Users\JohnDoe\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pandas\core\nanops.py", line 720, in nanmedian
values = values.astype("f8")
ValueError: could not convert string to float: 'Emma'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "c:\Users\...\BabyNames.py", line 18, in <module>
print(baby_names['Name'].median(numeric_only=False))
File "C:\Users\JohnDoe\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pandas\core\generic.py", line 10802, in median
return NDFrame.median(self, axis, skipna, level, numeric_only, **kwargs)
File "C:\Users\JohnDoe\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pandas\core\generic.py", line 10374, in median
return self._stat_function(
File "C:\Users\JohnDoe\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pandas\core\generic.py", line 10354, in _stat_function
return self._reduce(
File "C:\Users\JohnDoe\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pandas\core\series.py", line 4392, in _reduce
return op(delegate, skipna=skipna, **kwds)
File "C:\Users\JohnDoe\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pandas\core\nanops.py", line 156, in f
result = alt(values, axis=axis, skipna=skipna, **kwds)
File "C:\Users\JohnDoe\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pandas\core\nanops.py", line 723, in nanmedian
raise TypeError(str(err)) from err
TypeError: could not convert string to float: 'Emma'