Good morning,

I am working with the following code:

import pandas as pd

df1 = pd.DataFrame()
df1['Date'] = ["29/07/2021", "29/07/2021", "29/07/2021", "29/07/2021", "30/07/2021", "30/07/2021", "30/07/2021", "30/07/2021", "31/07/2021", "31/07/2021", "01/08/2021", "01/08/2021", "02/08/2021", "02/08/2021"]
df1['Time'] = ["06:48:00", "06:48:00", "06:56:00", "06:56:00", "07:14:00", "07:14:00", "07:40:00", "07:40:00", "08:42:00", "08:42:00", "08:52:00", "08:52:00", "09:07:00", "09:07:00"]
df1["Column1"] = ['NaN', 'NaN', 0.038807581, 0.018807581, 0.025931434, 0.025163517, 0.026561283, 0.027743659, 0.028854, 0.000383506, 0.000543031, 0.000342, 'NaN', 'NaN']
df1["Column2"] = [0.000270475, 0.000313769,  'NaN', 'NaN', 0.000483506, 0.000643031,  0.000533131,  0.000543031,  0.000342, 0.056263517, 0.042163517, 0.035163517, 0.025163517, 0.026363517]
df2 = pd.DataFrame()
df2['Date'] = ["29/07/2021", "29/07/2021", "29/07/2021", "29/07/2021", "30/07/2021", "30/07/2021", "30/07/2021", "30/07/2021", "31/07/2021", "31/07/2021", "01/08/2021", "01/08/2021", "02/08/2021", "02/08/2021"]
df2['Time'] = ["06:48:00", "06:48:00", "06:56:00", "06:56:00", "07:14:00", "07:14:00", "07:40:00", "07:40:00", "08:42:00", "08:42:00", "08:52:00", "08:52:00", "09:07:00", "09:07:00"]
df2["Column1"] = [0.041807581, 0.019607581, 'NaN', 'NaN', 0.025931434, 0.025163517, 0.026561283, 0.027743659, 0.028854, 0.000383506, 'NaN', 'NaN', 0.000313769, 0.000413769]
df2["Column2"] = [0.000270475, 0.000313769,  0.000383506,  0.000583506, 'NaN', 'NaN',  0.000533131,  0.000543031, 'NaN', 'NaN', 0.042163517, 0.035163517, 0.025163517, 0.026363517]
diff_df = pd.concat([df1, df2]).drop_duplicates().reset_index(drop=True)

The output is:

+------------+----------+-------------+-------------+
|    Date    |   Time   |   Column1   |   Column2   |
+------------+----------+-------------+-------------+
| 29/07/2021 | 06:48:00 | NaN         | 0,000270475 |
| 29/07/2021 | 06:48:00 | NaN         | 0,000313769 |
| 29/07/2021 | 06:56:00 | 0,038807581 | NaN         |
| 29/07/2021 | 06:56:00 | 0,018807581 | NaN         |
| 30/07/2021 | 07:14:00 | 0,025931434 | 0,000483506 |
| 30/07/2021 | 07:14:00 | 0,025163517 | 0,000643031 |
| 30/07/2021 | 07:40:00 | 0,026561283 | 0,000533131 |
| 30/07/2021 | 07:40:00 | 0,027743659 | 0,000543031 |
| 31/07/2021 | 08:42:00 | 0,028854    | 0,000342    |
| 31/07/2021 | 08:42:00 | 0,000383506 | 0,056263517 |
| 01/08/2021 | 08:52:00 | 0,000543031 | 0,042163517 |
| 01/08/2021 | 08:52:00 | 0,000342    | 0,035163517 |
| 02/08/2021 | 09:07:00 | NaN         | 0,025163517 |
| 02/08/2021 | 09:07:00 | NaN         | 0,026363517 |
| 29/07/2021 | 06:48:00 | 0,041807581 | 0,000270475 |
| 29/07/2021 | 06:48:00 | 0,019607581 | 0,000313769 |
| 29/07/2021 | 06:56:00 | NaN         | 0,000383506 |
| 29/07/2021 | 06:56:00 | NaN         | 0,000583506 |
| 30/07/2021 | 07:14:00 | 0,025931434 | NaN         |
| 30/07/2021 | 07:14:00 | 0,025163517 | NaN         |
| 31/07/2021 | 08:42:00 | 0,028854    | NaN         |
| 31/07/2021 | 08:42:00 | 0,000383506 | NaN         |
| 01/08/2021 | 08:52:00 | NaN         | 0,042163517 |
| 01/08/2021 | 08:52:00 | NaN         | 0,035163517 |
| 02/08/2021 | 09:07:00 | 0,000313769 | 0,025163517 |
| 02/08/2021 | 09:07:00 | 0,000413769 | 0,026363517 |
+------------+----------+-------------+-------------+

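What I need is this: where a value is NaN, it should be replaced by the value from the other DataFrame's row with the same Date and Time, so that the result looks like this: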

+------------+----------+-------------+-------------+
|    Date    |   Time   |   Column1   |   Column2   |
+------------+----------+-------------+-------------+
| 29/07/2021 | 06:48:00 | 0,041807581 | 0,000270475 |
| 29/07/2021 | 06:48:00 | 0,019607581 | 0,000313769 |
| 29/07/2021 | 06:56:00 | 0,038807581 | 0,000383506 |
| 29/07/2021 | 06:56:00 | 0,018807581 | 0,000583506 |
| 30/07/2021 | 07:14:00 | 0,025931434 | 0,000483506 |
| 30/07/2021 | 07:14:00 | 0,025163517 | 0,000643031 |
| 30/07/2021 | 07:40:00 | 0,026561283 | 0,000533131 |
| 30/07/2021 | 07:40:00 | 0,027743659 | 0,000543031 |
| 31/07/2021 | 08:42:00 | 0,028854    | 0,000342    |
| 31/07/2021 | 08:42:00 | 0,000383506 | 0,056263517 |
| 01/08/2021 | 08:52:00 | 0,000543031 | 0,042163517 |
| 01/08/2021 | 08:52:00 | 0,000342    | 0,035163517 |
| 02/08/2021 | 09:07:00 | 0,000313769 | 0,025163517 |
| 02/08/2021 | 09:07:00 | 0,000413769 | 0,026363517 |
+------------+----------+-------------+-------------+

Thank you for your time, and have a nice day!

EDIT########

diff_df.Column1 = diff_df.Column1.fillna(diff_df.Column2)
diff_df.Column2 = diff_df.Column2.fillna(diff_df.Column1)

This gives output that is not what I need:

+------------+----------+-------------+-------------+
|    Date    |   Time   |   Column1   |   Column2   |
+------------+----------+-------------+-------------+
| 29/07/2021 | 06:48:00 | 0,000270475 | 0,000270475 |
| 29/07/2021 | 06:48:00 | 0,000313769 | 0,000313769 |
| 29/07/2021 | 06:56:00 | 0,038807581 | 0,038807581 |
| 29/07/2021 | 06:56:00 | 0,018807581 | 0,018807581 |
| 30/07/2021 | 07:14:00 | 0,025931434 | 0,000483506 |
| 30/07/2021 | 07:14:00 | 0,025163517 | 0,000643031 |
| 30/07/2021 | 07:40:00 | 0,026561283 | 0,000533131 |
| 30/07/2021 | 07:40:00 | 0,027743659 | 0,000543031 |
| 31/07/2021 | 08:42:00 | 0,028854    | 0,000342    |
| 31/07/2021 | 08:42:00 | 0,000383506 | 0,056263517 |
| 01/08/2021 | 08:52:00 | 0,000543031 | 0,042163517 |
| 01/08/2021 | 08:52:00 | 0,000342    | 0,035163517 |
| 02/08/2021 | 09:07:00 | 0,025163517 | 0,025163517 |
| 02/08/2021 | 09:07:00 | 0,026363517 | 0,026363517 |
| 29/07/2021 | 06:48:00 | 0,041807581 | 0,000270475 |
| 29/07/2021 | 06:48:00 | 0,019607581 | 0,000313769 |
| 29/07/2021 | 06:56:00 | 0,000383506 | 0,000383506 |
| 29/07/2021 | 06:56:00 | 0,000583506 | 0,000583506 |
| 30/07/2021 | 07:14:00 | 0,025931434 | 0,025931434 |
| 30/07/2021 | 07:14:00 | 0,025163517 | 0,025163517 |
| 31/07/2021 | 08:42:00 | 0,028854    | 0,028854    |
| 31/07/2021 | 08:42:00 | 0,000383506 | 0,000383506 |
| 01/08/2021 | 08:52:00 | 0,042163517 | 0,042163517 |
| 01/08/2021 | 08:52:00 | 0,035163517 | 0,035163517 |
| 02/08/2021 | 09:07:00 | 0,000313769 | 0,025163517 |
| 02/08/2021 | 09:07:00 | 0,000413769 | 0,026363517 |
+------------+----------+-------------+-------------+

Recommended answer

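If the datetimes may differ between the two DataFrames, move Date and Time into the index and use DataFrame.fillna, so that values are matched on those keys: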

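import numpy as np

# Turn the 'NaN' strings into real missing values
df1 = df1.replace('NaN', np.nan)
df2 = df2.replace('NaN', np.nan)

# Align both frames on Date and Time, then fill df1's gaps from df2
keys = ['Date', 'Time']
df = df1.set_index(keys).fillna(df2.set_index(keys)).reset_index()
print(df)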

          Date      Time   Column1   Column2
0   29/07/2021  06:48:00  0.041808  0.000270
1   29/07/2021  06:48:00  0.019608  0.000314
2   29/07/2021  06:56:00  0.038808  0.000384
3   29/07/2021  06:56:00  0.018808  0.000584
4   30/07/2021  07:14:00  0.025931  0.000484
5   30/07/2021  07:14:00  0.025164  0.000643
6   30/07/2021  07:40:00  0.026561  0.000533
7   30/07/2021  07:40:00  0.027744  0.000543
8   31/07/2021  08:42:00  0.028854  0.000342
9   31/07/2021  08:42:00  0.000384  0.056264
10  01/08/2021  08:52:00  0.000543  0.042164
11  01/08/2021  08:52:00  0.000342  0.035164
12  02/08/2021  09:07:00  0.000314  0.025164
13  02/08/2021  09:07:00  0.000414  0.026364
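
The sample data repeats every (Date, Time) pair twice, so the pair alone does not identify a single row. As a rough sketch only, one way to make the keys unique is to number the repeats with groupby(...).cumcount() before filling. The helper name fill_from_other and the temporary column _n are made up for this illustration, and the sketch assumes the 'NaN' strings have already been turned into real NaN values:

```python
import numpy as np
import pandas as pd


def fill_from_other(left, right, keys=("Date", "Time")):
    """Fill NaN in `left` from the row of `right` with the same keys.

    Hypothetical helper: repeated key combinations are matched by their
    order of appearance (first with first, second with second, ...).
    """
    keys = list(keys)
    # Number the repeats of each key combination: 0, 1, 2, ...
    l = left.assign(_n=left.groupby(keys).cumcount()).set_index(keys + ["_n"])
    r = right.assign(_n=right.groupby(keys).cumcount()).set_index(keys + ["_n"])
    # Put right's rows in left's order; keys missing from right become NaN
    filled = l.fillna(r.reindex(l.index))
    return filled.reset_index().drop(columns="_n")


# Small demo: `b` holds the same keys as `a`, but in a different row order
a = pd.DataFrame({
    "Date": ["29/07/2021", "29/07/2021", "30/07/2021"],
    "Time": ["06:48:00", "06:48:00", "07:14:00"],
    "Column1": [np.nan, np.nan, 0.5],
    "Column2": [0.1, 0.2, np.nan],
})
b = pd.DataFrame({
    "Date": ["30/07/2021", "29/07/2021", "29/07/2021"],
    "Time": ["07:14:00", "06:48:00", "06:48:00"],
    "Column1": [0.9, 0.3, 0.4],
    "Column2": [0.7, np.nan, np.nan],
})
out = fill_from_other(a, b)
print(out)
```

Values already present in the first frame are kept; only its NaN cells are taken from the second frame.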

If the two DataFrames always have the same index and the same rows:

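# Turn the 'NaN' strings into real missing values
df1 = df1.replace('NaN', np.nan)
df2 = df2.replace('NaN', np.nan)

# Rows are matched by index label, so no set_index is needed
df = df1.fillna(df2)
print(df)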
          Date      Time   Column1   Column2
0   29/07/2021  06:48:00  0.041808  0.000270
1   29/07/2021  06:48:00  0.019608  0.000314
2   29/07/2021  06:56:00  0.038808  0.000384
3   29/07/2021  06:56:00  0.018808  0.000584
4   30/07/2021  07:14:00  0.025931  0.000484
5   30/07/2021  07:14:00  0.025164  0.000643
6   30/07/2021  07:40:00  0.026561  0.000533
7   30/07/2021  07:40:00  0.027744  0.000543
8   31/07/2021  08:42:00  0.028854  0.000342
9   31/07/2021  08:42:00  0.000384  0.056264
10  01/08/2021  08:52:00  0.000543  0.042164
11  01/08/2021  08:52:00  0.000342  0.035164
12  02/08/2021  09:07:00  0.000314  0.025164
13  02/08/2021  09:07:00  0.000414  0.026364
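
Filling by index only pairs the right rows if the two frames really do line up. Below is a small sketch of how that could be checked first. It uses pd.to_numeric as an alternative way to turn the 'NaN' strings into real missing values; the helper names to_float_columns and same_keys are hypothetical, not part of pandas:

```python
import pandas as pd


def to_float_columns(df, columns):
    """Return a copy of `df` with `columns` converted to float.

    Anything that cannot be parsed as a number becomes NaN.
    """
    out = df.copy()
    for col in columns:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out


def same_keys(left, right, keys=("Date", "Time")):
    """True if both frames share the same index and the same key columns."""
    keys = list(keys)
    return left.index.equals(right.index) and left[keys].equals(right[keys])


# Small demo with 'NaN' stored as a string, as in the question
a = pd.DataFrame({
    "Date": ["29/07/2021", "30/07/2021"],
    "Time": ["06:48:00", "07:14:00"],
    "Column1": ["NaN", 0.2],
})
b = pd.DataFrame({
    "Date": ["29/07/2021", "30/07/2021"],
    "Time": ["06:48:00", "07:14:00"],
    "Column1": [0.1, "NaN"],
})

a_num = to_float_columns(a, ["Column1"])
b_num = to_float_columns(b, ["Column1"])

if same_keys(a_num, b_num):
    result = a_num.fillna(b_num)
    print(result)
```

If same_keys returns False, the rows do not correspond one to one, and matching on Date and Time is the safer route.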
