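I am trying to winsorize a dataset, and I am doing it in several steps.
First: I need to winsorize on a ratio, where each item is divided by TotalAssets (a column in my dataset).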
FirmMonthlyAccountingData[item].div(FirmMonthlyAccountingData['TotalAssets'])
Then I use the same expression (to use less memory), extract the values as a NumPy array, and apply winsorization (I need to remove the top/bottom 5%).
winsorize(FirmMonthlyAccountingData[item].div(FirmMonthlyAccountingData['TotalAssets'], axis=0).values,limits=[0.05,0.05])
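To make this step concrete, here is a minimal sketch on made-up data. The `toy` frame and its values are hypothetical stand-ins for FirmMonthlyAccountingData, and `winsorize` is assumed to be `scipy.stats.mstats.winsorize`. As far as I understand, it replaces the lowest and highest 5% of values with the nearest remaining value rather than dropping them, so the array keeps its length.

```python
import numpy as np
import pandas as pd
from scipy.stats.mstats import winsorize

# Hypothetical toy data standing in for FirmMonthlyAccountingData.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "Mcap": rng.normal(100.0, 30.0, size=200),
    "TotalAssets": rng.uniform(50.0, 150.0, size=200),
})

# Ratio of the item to total assets, row by row.
ratio = toy["Mcap"].div(toy["TotalAssets"])

# Clip (not drop) the lowest and highest 5% of the ratio values.
clipped = winsorize(ratio.values, limits=[0.05, 0.05])

# The result is a NumPy masked array with the same length as the input.
print(type(clipped).__name__, clipped.shape)
```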
Then I need to convert it back to a DataFrame. This is where the error actually occurs.
pd.DataFrame(winsorize(FirmMonthlyAccountingData[item].div(FirmMonthlyAccountingData['TotalAssets'],axis=0).values,limits=[0.05,0.05]),columns=item)
Then I multiply it by FirmMonthlyAccountingData['totalAssets'] so that I get the original values back.
Copy_of_firmmonthlydata[item]=pd.DataFrame(winsorize(FirmMonthlyAccountingData[item].div(FirmMonthlyAccountingData['TotalAssets'],axis=0).values,limits=[0.05,0.05]),columns=item)*FirmMonthlyAccountingData['totalAssets']
Finally, I need to do this for all columns in a for loop, to save as much memory as possible.
columns_to_winsorize= ['Mcap', 'first', 'second', 'third']
for item in columns_to_winsorize:
    Copy_of_firmmonthlydata[item]=pd.DataFrame(winsorize(FirmMonthlyAccountingData[item].div(FirmMonthlyAccountingData['TotalAssets'], axis=0).values,limits=[0.05,0.05]),columns=item)*FirmMonthlyAccountingData['totalAssets']
But I get this error:
TypeError Traceback (most recent call last)
Cell In[27], line 10
3 columns_to_winsorize= ['Mcap', 'first', 'second']
9 for item in columns_to_winsorize:
---> 10 Copy_of_firmmonthlydata=pd.DataFrame(winsorize(FirmMonthlyAccountingData[f'{item}'].div(FirmMonthlyAccountingData['TotalAssets'], axis=0).values,limits=[0.05,0.05]),columns=item)*FirmMonthlyAccountingData['totalAssets']
File c:\Users\anaconda3\envs\PythonCourse2023\Lib\site-packages\pandas\core\frame.py:722, in DataFrame.__init__(self, data, index, columns, dtype, copy)
720 # a masked array
721 data = sanitize_masked_array(data)
--> 722 mgr = ndarray_to_mgr(
723 data,
724 index,
725 columns,
726 dtype=dtype,
727 copy=copy,
728 typ=manager,
729 )
731 elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):
732 if data.dtype.names:
733 # i.e. numpy structured array
File c:\Users\anaconda3\envs\PythonCourse2023\Lib\site-packages\pandas\core\internals\construction.py:333, in ndarray_to_mgr(values, index, columns, dtype, copy, typ)
324 values = sanitize_array(
325 values,
326 None,
(...)
329 allow_2d=True,
330 )
332 # _prep_ndarraylike ensures that values.ndim == 2 at this point
--> 333 index, columns = _get_axes(
334 values.shape[0], values.shape[1], index=index, columns=columns
335 )
337 _check_values_indices_shape_match(values, index, columns)
339 if typ == "array":
File c:\Users\anaconda3\envs\PythonCourse2023\Lib\site-packages\pandas\core\internals\construction.py:738, in _get_axes(N, K, index, columns)
736 columns = default_index(K)
737 else:
--> 738 columns = ensure_index(columns)
739 return index, columns
...
5066 f"{cls.__name__}(...) must be called with a collection of some "
5067 f"kind, {repr(data)} was passed"
5068 )
TypeError: Index(...) must be called with a collection of some kind, 'Mcap' was passed
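A stripped-down sketch on made-up data seems to reproduce the same message, which suggests that `columns` expects a list-like rather than a bare string. The `toy` frame, its values, and the `result` frame are hypothetical. The second half is only one possible workaround under that assumption, not a confirmed fix: it keeps each winsorized column as a Series on the original index, so that the multiplication by total assets aligns row by row.

```python
import numpy as np
import pandas as pd
from scipy.stats.mstats import winsorize

# Hypothetical toy frame; the real FirmMonthlyAccountingData is not shown here.
rng = np.random.default_rng(1)
toy = pd.DataFrame({
    "Mcap": rng.normal(100.0, 30.0, size=100),
    "first": rng.normal(10.0, 3.0, size=100),
    "TotalAssets": rng.uniform(50.0, 150.0, size=100),
})

# 1) Passing a bare string as `columns` raises the TypeError from the traceback.
message = None
try:
    pd.DataFrame(toy["Mcap"].values, columns="Mcap")
except TypeError as exc:
    message = str(exc)
print(message)

# 2) Possible workaround (an assumption): build a Series per column on the
#    original index, then scale back by total assets.
result = pd.DataFrame(index=toy.index)
for item in ["Mcap", "first"]:
    ratio = toy[item].div(toy["TotalAssets"])
    clipped = winsorize(ratio.values, limits=[0.05, 0.05])
    result[item] = pd.Series(np.asarray(clipped), index=toy.index) * toy["TotalAssets"]
print(result.shape)
```

If a DataFrame is really needed at that point, wrapping the name in a list (`columns=[item]`) avoids this particular TypeError, although multiplying a one-column DataFrame by a Series aligns the Series index against the column labels, which is probably not what is intended here.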
Any help would be greatly appreciated.