I have data in CSV format:

"/some/page-1.md","title","My title 1"
"/some/page-1.md","description","My description 1"
"/some/page-1.md","type","Tutorial"
"/some/page-1.md","index","True"
"/some/page-2.md","title","My title 2"
"/some/page-2.md","description","My description 2"
"/some/page-2.md","type","Tutorial"
"/some/page-2.md","index","False"
"/some/page-2.md","custom_1","abc"
"/some/page-3.md","title","My title 3"
"/some/page-3.md","description","My description 3"
"/some/page-3.md","type","Tutorial"
"/some/page-3.md","index","True"
"/some/page-3.md","custom_2","def"

I'm reading it into a Pandas DataFrame:

df = pd.read_csv(csvFile, index_col=False, dtype=object, header=None)
print(df)

The output is:

                  0            1                 2
0   /some/page-1.md        title        My title 1
1   /some/page-1.md  description  My description 1
2   /some/page-1.md         type          Tutorial
3   /some/page-1.md        index              True
4   /some/page-2.md        title        My title 2
5   /some/page-2.md  description  My description 2
6   /some/page-2.md         type          Tutorial
7   /some/page-2.md        index             False
8   /some/page-2.md     custom_1               abc
9   /some/page-3.md        title        My title 3
10  /some/page-3.md  description  My description 3
11  /some/page-3.md         type          Tutorial
12  /some/page-3.md        index              True
13  /some/page-3.md     custom_2               def

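I want to convert it into a DataFrame with the following layout, where the first header is "file" and its values come from column 0. The other headers are taken from column 1, and their values from column 2: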

              file       title       description      type  index  custom_1  custom_2
0  /some/page-1.md  My title 1  My description 1  Tutorial   True       NaN       NaN
1  /some/page-2.md  My title 2  My description 2  Tutorial  False       abc       NaN
2  /some/page-3.md  My title 3  My description 3  Tutorial   True       NaN       def

Is there a way to do this with Pandas?

Recommended answer

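I named your three columns file, header and value, which makes the reshaping straightforward. Use the pivot_table method: it creates one row per file and one column per distinct header, filled from the value column. The final code looks like this: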

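import pandas as pd

# Read the three unnamed CSV columns and give them names
df = pd.read_csv(csvFile, header=None, names=["file", "header", "value"], dtype=object)

# One row per file, one column per header; 'first' keeps the first value if a header repeats
result = df.pivot_table(index='file', columns='header', values='value', aggfunc='first').reset_index()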
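Because every file/header pair in the sample occurs only once, `DataFrame.pivot` should also work here. Unlike `pivot_table`, it does no aggregation and raises a `ValueError` if a pair repeats, which can be a useful safety check. A minimal sketch, assuming a subset of the sample rows inlined with `io.StringIO` in place of the question's `csvFile`:

```python
import io

import pandas as pd

# A few of the question's rows, inlined so the sketch runs on its own
csv_text = '''"/some/page-1.md","title","My title 1"
"/some/page-1.md","index","True"
"/some/page-2.md","title","My title 2"
"/some/page-2.md","custom_1","abc"
'''

df = pd.read_csv(io.StringIO(csv_text), header=None,
                 names=["file", "header", "value"], dtype=object)

# pivot() reshapes without aggregating; a repeated (file, header) pair raises ValueError
result = df.pivot(index="file", columns="header", values="value").reset_index()
print(result)
```

The column axis of this result is still labelled "header", just like the `pivot_table` version.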

Your output will look like this. The columns still carry a "header" label, which we need to remove:

header             file custom_1 custom_2       description  index       title      type
0       /some/page-1.md      NaN      NaN  My description 1   True  My title 1  Tutorial
1       /some/page-2.md      abc      NaN  My description 2  False  My title 2  Tutorial
2       /some/page-3.md      NaN      def  My description 3   True  My title 3  Tutorial

To remove the "header" label, use:

result.columns.name = None
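If you prefer a single method chain, `rename_axis(columns=None)` removes the same label and returns a new DataFrame instead of modifying `result` in place. A minimal sketch, again assuming a few sample rows inlined with `io.StringIO`:

```python
import io

import pandas as pd

csv_text = '''"/some/page-1.md","title","My title 1"
"/some/page-2.md","title","My title 2"
"/some/page-2.md","custom_1","abc"
'''

df = pd.read_csv(io.StringIO(csv_text), header=None,
                 names=["file", "header", "value"], dtype=object)

result = (
    df.pivot_table(index="file", columns="header", values="value", aggfunc="first")
    .reset_index()
    # Drop the leftover "header" name on the column axis
    .rename_axis(columns=None)
)
print(result)
```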

The final output will be:

              file custom_1 custom_2       description  index       title      type
0  /some/page-1.md      NaN      NaN  My description 1   True  My title 1  Tutorial
1  /some/page-2.md      abc      NaN  My description 2  False  My title 2  Tutorial
2  /some/page-3.md      NaN      def  My description 3   True  My title 3  Tutorial
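Note that `pivot_table` sorts the new columns alphabetically, while the question asks for title, description, type, index, custom_1, custom_2. One way to get that order, sketched here on the full sample data inlined with `io.StringIO`, is to reindex the columns by the order in which each header first appears in the CSV:

```python
import io

import pandas as pd

csv_text = '''"/some/page-1.md","title","My title 1"
"/some/page-1.md","description","My description 1"
"/some/page-1.md","type","Tutorial"
"/some/page-1.md","index","True"
"/some/page-2.md","title","My title 2"
"/some/page-2.md","description","My description 2"
"/some/page-2.md","type","Tutorial"
"/some/page-2.md","index","False"
"/some/page-2.md","custom_1","abc"
"/some/page-3.md","title","My title 3"
"/some/page-3.md","description","My description 3"
"/some/page-3.md","type","Tutorial"
"/some/page-3.md","index","True"
"/some/page-3.md","custom_2","def"
'''

df = pd.read_csv(io.StringIO(csv_text), header=None,
                 names=["file", "header", "value"], dtype=object)

wide = df.pivot_table(index="file", columns="header", values="value", aggfunc="first")

# unique() keeps first-appearance order, unlike the alphabetical pivot columns
header_order = df["header"].unique()
result = wide.reindex(columns=header_order).rename_axis(columns=None).reset_index()
print(result)
```

This relies on the first file listing its headers in the desired order; headers that only appear later (such as custom_1 and custom_2) are appended in the order they are first seen.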
