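I'm still a beginner, and just for practice I'm trying to convert a comprehension (from S. Molin's Hands-On Data Analysis with Pandas) into a "regular" for loop.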
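The data originally comes from a CSV file and is loaded with NumPy. The result is a structured array in which each CSV row is a single element (of type void), as shown below: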
array([('2018-10-13 11:10:23.560', '262km NW of Ozernovskiy, Russia', 'mww', 6.7, 'green', 1),
       ('2018-10-13 04:34:15.580', '25km E of Bitung, Indonesia', 'mww', 5.2, 'green', 0),
       ('2018-10-13 00:13:46.220', '42km WNW of Sola, Vanuatu', 'mww', 5.7, 'green', 0),
       ('2018-10-12 21:09:49.240', '13km E of Nueva Concepcion, Guatemala', 'mww', 5.7, 'green', 0),
       ('2018-10-12 02:52:03.620', '128km SE of Kimbe, Papua New Guinea', 'mww', 5.6, 'green', 1)],
      dtype=[('time', '<U23'), ('place', '<U37'), ('magType', '<U3'), ('mag', '<f8'), ('alert', '<U5'), ('tsunami', '<i4')])
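To experiment without the original CSV file, you can build a structured array with the same dtype directly from the five rows above. This is a sketch: calling `np.array(..., dtype=...)` is an assumption that stands in for the CSV loading step, which isn't shown here. It also shows that a single row (an `np.void` record) can be indexed by position, which is what `row[i]` relies on.

```python
import numpy as np

# Same field names and types as the dtype printed above.
dtype = [('time', '<U23'), ('place', '<U37'), ('magType', '<U3'),
         ('mag', '<f8'), ('alert', '<U5'), ('tsunami', '<i4')]

# Rows copied from the output above. Building the array directly
# replaces the CSV loading step (an assumption for this sketch).
data = np.array([
    ('2018-10-13 11:10:23.560', '262km NW of Ozernovskiy, Russia', 'mww', 6.7, 'green', 1),
    ('2018-10-13 04:34:15.580', '25km E of Bitung, Indonesia', 'mww', 5.2, 'green', 0),
    ('2018-10-13 00:13:46.220', '42km WNW of Sola, Vanuatu', 'mww', 5.7, 'green', 0),
    ('2018-10-12 21:09:49.240', '13km E of Nueva Concepcion, Guatemala', 'mww', 5.7, 'green', 0),
    ('2018-10-12 02:52:03.620', '128km SE of Kimbe, Papua New Guinea', 'mww', 5.6, 'green', 1),
], dtype=dtype)

print(data.dtype.names)  # ('time', 'place', 'magType', 'mag', 'alert', 'tsunami')

row = data[0]            # one record (np.void), i.e. one CSV row
print(type(row))         # <class 'numpy.void'>
print(row[1], row[3])    # 262km NW of Ozernovskiy, Russia 6.7
```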
I'm trying to transform it so that I get each column as an array of values, keyed by the column name:
{'time': array(['2018-10-13 11:10:23.560', '2018-10-13 04:34:15.580', '2018-10-13 00:13:46.220', '2018-10-12 21:09:49.240', '2018-10-12 02:52:03.620'], dtype='<U23'),
 'place': array(['262km NW of Ozernovskiy, Russia', '25km E of Bitung, Indonesia', '42km WNW of Sola, Vanuatu', '13km E of Nueva Concepcion, Guatemala', '128km SE of Kimbe, Papua New Guinea'], dtype='<U37'),
 'magType': array(['mww', 'mww', 'mww', 'mww', 'mww'], dtype='<U3'),
 'mag': array([6.7, 5.2, 5.7, 5.7, 5.6]),
 'alert': array(['green', 'green', 'green', 'green', 'green'], dtype='<U5'),
 'tsunami': array([1, 0, 0, 0, 1])}
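For comparison, NumPy structured arrays already let you select a whole column by field name (`data['time']`). That means the same dictionary can be built without looping over rows at all. Below is a minimal sketch that uses only two of the sample rows to keep it short.

```python
import numpy as np

dtype = [('time', '<U23'), ('place', '<U37'), ('magType', '<U3'),
         ('mag', '<f8'), ('alert', '<U5'), ('tsunami', '<i4')]
# Two of the rows above, enough to show the idea.
data = np.array([
    ('2018-10-13 11:10:23.560', '262km NW of Ozernovskiy, Russia', 'mww', 6.7, 'green', 1),
    ('2018-10-13 04:34:15.580', '25km E of Bitung, Indonesia', 'mww', 5.2, 'green', 0),
], dtype=dtype)

# Indexing by field name returns that column as a 1-D array (a view into data).
by_field = {name: data[name] for name in data.dtype.names}

print(by_field['mag'])          # [6.7 5.2]
print(by_field['time'].dtype)   # <U23
```

Because each column here is a view, writing into `by_field['mag']` also changes `data`. Call `.copy()` on each column if you need independent arrays.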
The comprehension that does this is:
array_dict = {col: np.array([row[i] for row in data]) for i, col in enumerate(data.dtype.names)}
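Read from the outside in, the comprehension loops over the column names. For each column it builds a fresh list from every row and then converts that list with `np.array`. Written out as explicit loops, one possible equivalent is the sketch below. It rebuilds the sample array from two of the rows shown earlier so the snippet runs on its own.

```python
import numpy as np

dtype = [('time', '<U23'), ('place', '<U37'), ('magType', '<U3'),
         ('mag', '<f8'), ('alert', '<U5'), ('tsunami', '<i4')]
data = np.array([
    ('2018-10-13 11:10:23.560', '262km NW of Ozernovskiy, Russia', 'mww', 6.7, 'green', 1),
    ('2018-10-13 04:34:15.580', '25km E of Bitung, Indonesia', 'mww', 5.2, 'green', 0),
], dtype=dtype)

array_dict = {}
for i, col in enumerate(data.dtype.names):
    values = []                          # new, empty list for this column
    for row in data:
        values.append(row[i])            # i-th field of each record
    array_dict[col] = np.array(values)   # same conversion as in the comprehension

# Check against the original comprehension.
expected = {col: np.array([row[i] for row in data])
            for i, col in enumerate(data.dtype.names)}
print(all(np.array_equal(array_dict[c], expected[c]) for c in expected))  # True
```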
The solution I have so far is:
d = {}
for i,col in enumerate(data.dtype.names):
    for row in data:
        d[col].append(row[i])
I get the following error:
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Input In [51], in <cell line: 2>()
      2 for i,col in enumerate(data.dtype.names):
      3     for row in data:
----> 4         d[col].append(row[i])
KeyError: 'time'
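A plain `dict` raises `KeyError` when you read a key that was never stored, and `d[col].append(...)` reads `d[col]` first. Because `d` starts out empty, the very first lookup (`d['time']`) fails before anything is appended. The sketch below reproduces that with a trimmed-down one-row array (an assumption to keep it short). It then shows two standard ways to make sure the key exists first: assigning an empty list, or using `dict.setdefault`.

```python
import numpy as np

# Trimmed-down sample: one row, two fields.
data = np.array([('2018-10-13 11:10:23.560', 6.7)],
                dtype=[('time', '<U23'), ('mag', '<f8')])

d = {}
try:
    d['time'].append(data[0][0])     # d is empty, so reading d['time'] fails
except KeyError as exc:
    print('KeyError:', exc)          # KeyError: 'time'

# Fix 1: create the list for each column before the inner loop.
d1 = {}
for i, col in enumerate(data.dtype.names):
    d1[col] = []
    for row in data:
        d1[col].append(row[i])

# Fix 2: setdefault inserts an empty list only if the key is missing.
d2 = {}
for i, col in enumerate(data.dtype.names):
    for row in data:
        d2.setdefault(col, []).append(row[i])

print(d1 == d2)   # True
```

Either way, `d1[col]` is still a plain list. Wrapping it with `np.array(...)` after the inner loop gives the same arrays as the comprehension.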
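I did some searching online, and it might have something to do with the data type of the 'time' column. My guess is that in the comprehension each column is created directly as a NumPy array, whereas here I haven't set it up as a NumPy array beforehand (hence the data type problem), but I'm sure I'm wrong.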
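In the comprehension, each column is also first collected into an ordinary Python list. Only then does `np.array(...)` turn it into an array, inferring the dtype from the values (for example `<U23` for the 23-character timestamps). So collecting into a list and converting afterwards ends up with the same dtype. The sketch below checks this on a trimmed two-row, two-field sample (an assumption to keep it short).

```python
import numpy as np

# Trimmed sample: two rows, two fields.
data = np.array([('2018-10-13 11:10:23.560', 6.7),
                 ('2018-10-13 04:34:15.580', 5.2)],
                dtype=[('time', '<U23'), ('mag', '<f8')])

times = []                   # plain list: the np.str_ values are stored as-is
for row in data:
    times.append(row[0])
print(type(times[0]))        # <class 'numpy.str_'>

times_arr = np.array(times)  # the dtype is inferred at this point
print(times_arr.dtype)       # <U23
```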
Any help would be greatly appreciated. Thanks a lot!