My DataFrame looks like this:

car_sales DataFrame

I tried the following command:

car_sales['Price'] = car_sales['Price'].str.replace('[\$\,\.]', '').astype(int)

and also

car_sales['Price'] = car_sales['Price'].astype(str).str.replace('[\$\,\.]', '').astype(int)

but I get the following error:

ValueError                                Traceback (most recent call last)
Cell In[87], line 1
----> 1 car_sales['Price'] = car_sales['Price'].astype(str).str.replace('[\$\,\.]', '').astype(int)
      2 car_sales

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\generic.py:6324, in NDFrame.astype(self, dtype, copy, errors)
   6317     results = [
   6318         self.iloc[:, i].astype(dtype, copy=copy)
   6319         for i in range(len(self.columns))
   6320     ]
   6322 else:
   6323     # else, only a single dtype is given
-> 6324     new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
   6325     return self._constructor(new_data).__finalize__(self, method="astype")
   6327 # GH 33113: handle empty frame or series

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\internals\managers.py:451, in BaseBlockManager.astype(self, dtype, copy, errors)
    448 elif using_copy_on_write():
    449     copy = False
--> 451 return self.apply(
    452     "astype",
    453     dtype=dtype,
    454     copy=copy,
    455     errors=errors,
    456     using_cow=using_copy_on_write(),
    457 )

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\internals\managers.py:352, in BaseBlockManager.apply(self, f, align_keys, **kwargs)
    350         applied = b.apply(f, **kwargs)
    351     else:
--> 352         applied = getattr(b, f)(**kwargs)
    353     result_blocks = extend_blocks(applied, result_blocks)
    355 out = type(self).from_blocks(result_blocks, self.axes)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\internals\blocks.py:511, in Block.astype(self, dtype, copy, errors, using_cow)
    491 """
    492 Coerce to the new dtype.
    493 
   (...)
    507 Block
    508 """
    509 values = self.values
--> 511 new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
    513 new_values = maybe_coerce_values(new_values)
    515 refs = None

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\dtypes\astype.py:242, in astype_array_safe(values, dtype, copy, errors)
    239     dtype = dtype.numpy_dtype
    241 try:
--> 242     new_values = astype_array(values, dtype, copy=copy)
    243 except (ValueError, TypeError):
    244     # e.g. _astype_nansafe can fail on object-dtype of strings
    245     #  trying to convert to float
    246     if errors == "ignore":

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\dtypes\astype.py:187, in astype_array(values, dtype, copy)
    184     values = values.astype(dtype, copy=copy)
    186 else:
--> 187     values = _astype_nansafe(values, dtype, copy=copy)
    189 # in pandas we don't store numpy str dtypes, so convert to object
    190 if isinstance(dtype, np.dtype) and issubclass(values.dtype.type, str):

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\dtypes\astype.py:138, in _astype_nansafe(arr, dtype, copy, skipna)
    134     raise ValueError(msg)
    136 if copy or is_object_dtype(arr.dtype) or is_object_dtype(dtype):
    137     # Explicit copy, or required since NumPy can't view from / to object.
--> 138     return arr.astype(dtype, copy=True)
    140 return arr.astype(dtype, copy=copy)

ValueError: invalid literal for int() with base 10: '$4,000.00'

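Recommended answer

You must pass regex=True to str.replace. Since pandas 2.0 the default is regex=False, so the pattern is matched as a literal string and nothing is removed. But why strip the decimal point? Convert to float instead of int, chain the two conversions, or use pd.to_numeric with downcast.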

car_sales['P1'] = car_sales['Price'].str.replace(r'[$,]', '', regex=True).astype(float)

car_sales['P2'] = car_sales['Price'].str.replace(r'[$,]', '', regex=True).astype(float).astype(int)

car_sales['P3'] = pd.to_numeric(car_sales['Price'].str.replace(r'[$,]', '', regex=True), downcast='integer')

Output:

>>> car_sales
       Price      P1    P2    P3
0  $4,000.00  4000.0  4000  4000
1  $5,000.00  5000.0  5000  5000
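
To see why regex=True matters, here is a minimal self-contained sketch. The two-row DataFrame is made-up sample data standing in for the real car_sales, and the names pattern, literal, cleaned and prices are only illustrative. regex=False is passed explicitly so the comparison behaves the same regardless of the pandas version's default.

```python
import pandas as pd

# Hypothetical sample data standing in for the Price column from the question.
car_sales = pd.DataFrame({"Price": ["$4,000.00", "$5,000.00"]})

pattern = r"[$,]"  # character class: a dollar sign or a comma

# regex=False treats the pattern as the literal text "[$,]", which never
# occurs in the data, so the strings come back unchanged.
literal = car_sales["Price"].str.replace(pattern, "", regex=False)

# regex=True interprets the pattern as a character class and strips both symbols.
cleaned = car_sales["Price"].str.replace(pattern, "", regex=True)

# The cleaned strings still contain the decimal point, so convert to float.
prices = cleaned.astype(float)
```

The unchanged strings in literal are exactly what int() was handed in the traceback, which is why the error message quotes '$4,000.00'.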

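Of course, if you also want to remove the decimal point, as @wjandrea described, just modify the regex (the values then become amounts in cents):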

car_sales['P4'] = pd.to_numeric(car_sales['Price'].str.replace(r'[$,.]', '', regex=True), downcast='integer')

Output:

>>> car_sales
       Price      P1    P2    P3      P4
0  $4,000.00  4000.0  4000  4000  400000
1  $5,000.00  5000.0  5000  5000  500000
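
Stripping the decimal point leaves only the digits, so the result counts cents rather than dollars. The sketch below illustrates this on made-up sample data. The names digits, cents and dollars are hypothetical, and the division step assumes every price has exactly two decimal places.

```python
import pandas as pd

# Hypothetical sample data standing in for the Price column from the question.
car_sales = pd.DataFrame({"Price": ["$4,000.00", "$5,000.00"]})

# Removing "$", "," and "." leaves digits only, so the number is in cents.
digits = car_sales["Price"].str.replace(r"[$,.]", "", regex=True)
cents = pd.to_numeric(digits, downcast="integer")

# Integer division by 100 recovers whole dollars
# (assumption: every price has exactly two decimal places).
dollars = cents // 100
```

Keeping integer cents can be handy when exact arithmetic matters, but it only works if the input format is consistent.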
