My test.csv contains lots of NaNs:

"Time","Y1","Y2","Y3"
"s","celsius","celsius","celsius"
"0.193","","",""
"0.697","","1",""
"1.074","","","-27"
"1.579","10","",""
"2.083","","5",""
"3.123","15","","-28"
"5.003","","",""

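When I try to use interpolate to fill the missing data between valid points, it fills them with whole numbers (the previous value is just repeated):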

import pandas as pd
df = pd.read_csv("test.csv")
df.loc[1:, "Y3"] = pd.to_numeric(df.loc[1:, "Y3"])
df.loc[1:, "Y3"] = df.loc[1:, "Y3"].interpolate(method='linear').ffill()  # also tried method='time', method='index'

>>> print (df)
    Time       Y1       Y2       Y3
0      s  celsius  celsius  celsius
1  0.193      NaN      NaN      NaN
2  0.697      NaN        1      NaN
3  1.074      NaN      NaN      -27
4  1.579       10      NaN      -27  <<-----
5  2.083      NaN        5      -27  <<-----
6  3.123       15      NaN      -28
7  5.003      NaN      NaN      -28

I can fix the NaNs at the start of the column with bfill, but how do I fill the points between -27 and -28 with fractional values such as -27.3 and -27.6?

Recommended answer

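The problem is that the units line ("s", "celsius", ...) is read as the first data row, so every column contains strings.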

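df.loc[1:, "Y3"] = pd.to_numeric(df.loc[1:, "Y3"]) does not change the dtype to a numeric type: writing numbers into part of an object column leaves the whole column as object. Linear interpolation is therefore not applied to it, and the repeated -27 values come from the trailing .ffill() alone.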
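A quick way to see this, as a sketch with the question's CSV inlined through io.StringIO (the CSV variable name is only for illustration): the converted slice is float64 on its own, but once it is written back into the column, the column's dtype is still object.

```python
import io

import pandas as pd

# The question's test.csv, inlined so the sketch runs without a file
CSV = '''"Time","Y1","Y2","Y3"
"s","celsius","celsius","celsius"
"0.193","","",""
"0.697","","1",""
"1.074","","","-27"
"1.579","10","",""
"2.083","","5",""
"3.123","15","","-28"
"5.003","","",""
'''

df = pd.read_csv(io.StringIO(CSV))

# The conversion itself works: the slice without the units row becomes float64
converted = pd.to_numeric(df.loc[1:, "Y3"])
print(converted.dtype)

# Writing it back into part of the column keeps the column object-typed,
# because row 0 still holds the string "celsius"
df.loc[1:, "Y3"] = converted
print(df["Y3"].dtype)
```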

Don't read the units line as a data row; read both header lines into a MultiIndex instead:

df = pd.read_csv("test.csv", header=[0, 1])

Then:

df['Y3'] = df['Y3'].interpolate(method='linear').ffill()
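Putting the answer together as one self-contained script, a minimal sketch that assumes the same test.csv content (inlined with io.StringIO instead of read from disk): with both header lines consumed by header=[0, 1], the data columns are parsed as float64, and df["Y3"] selects the sub-frame under the Y3 label.

```python
import io

import pandas as pd

# Same content as test.csv in the question
CSV = '''"Time","Y1","Y2","Y3"
"s","celsius","celsius","celsius"
"0.193","","",""
"0.697","","1",""
"1.074","","","-27"
"1.579","10","",""
"2.083","","5",""
"3.123","15","","-28"
"5.003","","",""
'''

# Both header lines become column levels, e.g. ("Y3", "celsius")
df = pd.read_csv(io.StringIO(CSV), header=[0, 1])
print(df.dtypes)  # only data rows remain, so the columns are numeric

# Linear interpolation between valid points, then forward-fill what is left
df["Y3"] = df["Y3"].interpolate(method="linear").ffill()
print(df)
```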

Output:

    Time      Y1      Y2         Y3
       s celsius celsius    celsius
0  0.193     NaN     NaN        NaN
1  0.697     NaN     1.0        NaN
2  1.074     NaN     NaN -27.000000
3  1.579    10.0     NaN -27.333333
4  2.083     NaN     5.0 -27.666667
5  3.123    15.0     NaN -28.000000
6  5.003     NaN     NaN -28.000000
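The linear method treats the rows as evenly spaced, which is where -27.333333 and -27.666667 come from. If the filled values should instead follow the Time column (the question's code comment mentions method='index'), one possible variant, sketched here and not part of the answer above, drops the units line with skiprows=[1], uses Time as the index, and interpolates with method="index". For the two gaps this gives roughly -27.25 and -27.49.

```python
import io

import pandas as pd

# Same content as test.csv in the question
CSV = '''"Time","Y1","Y2","Y3"
"s","celsius","celsius","celsius"
"0.193","","",""
"0.697","","1",""
"1.074","","","-27"
"1.579","10","",""
"2.083","","5",""
"3.123","15","","-28"
"5.003","","",""
'''

# skiprows=[1] skips the units line, so every column is parsed as numeric;
# index_col="Time" turns the time stamps into the index
df = pd.read_csv(io.StringIO(CSV), skiprows=[1], index_col="Time")

# method="index" weights each filled value by its position on the Time axis
# instead of assuming equally spaced rows
y3 = df["Y3"].interpolate(method="index")
print(y3)
```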
