Python 在Polars中， Select 所有以Pattern结尾的列，并添加不带Pattern的新列

发布于02月24日

我有以下数据框:

import polars as pl
import numpy as np

df = pl.DataFrame({
    "nrs": [1, 2, 3, None, 5],
    "names_A0": ["foo", "ham", "spam", "egg", None],
    "random_A0": np.random.rand(5),
    "A_A2": [True, True, False, False, False],
})
digit = 0

对于每个名称以字符串suf =f'_A{digit}'结尾的列X，我想将一个相同的列添加到df，其名称与X相同，但没有suf.

在本例中，我需要向原始数据帧df添加列names和random，其内容分别与列names_A0和random_A0相同.

字符串后缀操作

此方法使用

- polars.selectors.ends_with # find columns ending with string
- string.removesuffix        # remove suffix from end of string

正在翻译为

import polars as pl
from polars import selectors as cs
import numpy as np
import re
from functools import partial

df = pl.DataFrame(
    {
        "nrs": [1, 2, 3, None, 5],
        "names_A0": ["foo", "ham", "spam", "egg", None],
        "random_A0": np.random.rand(5),
        "A_A2": [True, True, False, False, False],
    }
)
digit = 0
suffix = f'_A{digit}'

print(
    # keep original A0 columns
    df.with_columns(
        cs.ends_with(suffix).name.map(lambda s: s.removesuffix(suffix))
    ),
    # shape: (5, 6)
    # ┌──────┬──────────┬───────────┬───────┬───────┬──────────┐
    # │ nrs  ┆ names_A0 ┆ random_A0 ┆ A_A2  ┆ names ┆ random   │
    # │ ---  ┆ ---      ┆ ---       ┆ ---   ┆ ---   ┆ ---      │
    # │ i64  ┆ str      ┆ f64       ┆ bool  ┆ str   ┆ f64      │
    # ╞══════╪══════════╪═══════════╪═══════╪═══════╪══════════╡
    # │ 1    ┆ foo      ┆ 0.713324  ┆ true  ┆ foo   ┆ 0.713324 │
    # │ 2    ┆ ham      ┆ 0.980031  ┆ true  ┆ ham   ┆ 0.980031 │
    # │ 3    ┆ spam     ┆ 0.242768  ┆ false ┆ spam  ┆ 0.242768 │
    # │ null ┆ egg      ┆ 0.528783  ┆ false ┆ egg   ┆ 0.528783 │
    # │ 5    ┆ null     ┆ 0.583206  ┆ false ┆ null  ┆ 0.583206 │
    # └──────┴──────────┴───────────┴───────┴───────┴──────────┘


    # drop original A0 columns
    df.select(
        ~cs.ends_with(suffix),
        cs.ends_with(suffix).name.map(lambda s: s.removesuffix(suffix))
    ),
    # shape: (5, 4)
    # ┌──────┬───────┬───────┬──────────┐
    # │ nrs  ┆ A_A2  ┆ names ┆ random   │
    # │ ---  ┆ ---   ┆ ---   ┆ ---      │
    # │ i64  ┆ bool  ┆ str   ┆ f64      │
    # ╞══════╪═══════╪═══════╪══════════╡
    # │ 1    ┆ true  ┆ foo   ┆ 0.713324 │
    # │ 2    ┆ true  ┆ ham   ┆ 0.980031 │
    # │ 3    ┆ false ┆ spam  ┆ 0.242768 │
    # │ null ┆ false ┆ egg   ┆ 0.528783 │
    # │ 5    ┆ false ┆ null  ┆ 0.583206 │
    # └──────┴───────┴───────┴──────────┘

    sep='\n\n'
)

正则表达式

或者，您可以使用正则表达式来检测一系列后缀模式

- polars.selectors.matches  # find columns matching a pattern
- re.sub                    # substitute in string based on pattern

我们需要确保我们的模式以'$'结束，以锚定模式到字符串的末尾.

import polars as pl
from polars import selectors as cs
import numpy as np
import re
from functools import partial

df = pl.DataFrame(
    {
        "nrs": [1, 2, 3, None, 5],
        "names_A0": ["foo", "ham", "spam", "egg", None],
        "random_A0": np.random.rand(5),
        "A_A2": [True, True, False, False, False],
    }
)
digit=0
suffix = fr'_A{digit}$'

print(
    # keep original A0 columns
    df.with_columns(
        cs.matches(suffix).name.map(lambda s: re.sub(suffix, '', s))
    ),
    # shape: (5, 6)
    # ┌──────┬──────────┬───────────┬───────┬───────┬──────────┐
    # │ nrs  ┆ names_A0 ┆ random_A0 ┆ A_A2  ┆ names ┆ random   │
    # │ ---  ┆ ---      ┆ ---       ┆ ---   ┆ ---   ┆ ---      │
    # │ i64  ┆ str      ┆ f64       ┆ bool  ┆ str   ┆ f64      │
    # ╞══════╪══════════╪═══════════╪═══════╪═══════╪══════════╡
    # │ 1    ┆ foo      ┆ 0.713324  ┆ true  ┆ foo   ┆ 0.713324 │
    # │ 2    ┆ ham      ┆ 0.980031  ┆ true  ┆ ham   ┆ 0.980031 │
    # │ 3    ┆ spam     ┆ 0.242768  ┆ false ┆ spam  ┆ 0.242768 │
    # │ null ┆ egg      ┆ 0.528783  ┆ false ┆ egg   ┆ 0.528783 │
    # │ 5    ┆ null     ┆ 0.583206  ┆ false ┆ null  ┆ 0.583206 │
    # └──────┴──────────┴───────────┴───────┴───────┴──────────┘


    # drop original A0 columns
    df.select(
        ~cs.matches(suffix),
        cs.matches(suffix).name.map(lambda s: re.sub(suffix, '', s))
    ),
    # shape: (5, 4)
    # ┌──────┬───────┬───────┬──────────┐
    # │ nrs  ┆ A_A2  ┆ names ┆ random   │
    # │ ---  ┆ ---   ┆ ---   ┆ ---      │
    # │ i64  ┆ bool  ┆ str   ┆ f64      │
    # ╞══════╪═══════╪═══════╪══════════╡
    # │ 1    ┆ true  ┆ foo   ┆ 0.713324 │
    # │ 2    ┆ true  ┆ ham   ┆ 0.980031 │
    # │ 3    ┆ false ┆ spam  ┆ 0.242768 │
    # │ null ┆ false ┆ egg   ┆ 0.528783 │
    # │ 5    ┆ false ┆ null  ┆ 0.583206 │
    # └──────┴───────┴───────┴──────────┘

    sep='\n\n'
)

Python 在Polars中， Select 所有以Pattern结尾的列，并添加不带Pattern的新列

推荐答案

字符串后缀操作

正则表达式

Python相关问答推荐

Pandas 第二小值有条件

DataFrame groupby函数从列返回数组而不是值

试图找到Python方法来部分填充numpy数组

如何在Python中将returns.context. DeliverresContext与Deliverc函数一起使用？

对于一个给定的数字，找出一个整数的最小和最大可能的和

如果值不存在，列表理解返回列表

Polars：用氨纶的其他部分替换氨纶的部分

删除字符串中第一次出现单词后的所有内容

如何请求使用Python将文件下载到带有登录名的门户网站？

OR—Tools中CP—SAT求解器的IntVar设置值

如何保持服务器发送的事件连接活动？

让函数调用方程

在两极中过滤

matplotlib + python foor loop

干燥化与列姆化的比较

有没有办法让Re.Sub报告它所做的每一次替换？

对于数组中的所有元素，Pandas SELECT行都具有值

合并Pandas中的数据帧，但处理不存在的列

极地数据帧：ROLING_SUM向前看

是否从Python调用SHGetKnownFolderPath？