I'm trying to clean up the following list:

I want to split words that are run together, but my approach doesn't handle all-caps words that stand for acronyms correctly (e.g. PVP, MMORPG, MOBA, DeFi).

Currently, my regex code looks like this:

[re.sub(r"(\w)([A-Z])", r"\1 \2", ele) for ele in type_list]

As you can see below, it sometimes works and sometimes doesn't:

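['Collectible Open-World Virtual-World', 'Breeding Card PV P', 'Auto-Battler Breeding Strategy', 'Minigame Open-World Virtual-World', 'Action Simulation Sports', 'Adventure MM OStrategy', 'Adventure Casual Puzzle', 'Sports', 'Collectible Sci-Fi Virtual-World', 'Battle-Royalee Sports MO BA', 'Action PV PShooter', 'P VP Sci-Fi Tower-Defense', 'Action Battle-Royale', 'P VP Sci-Fi Shooter', 'Breeding Collectible Mining', 'Collectible De Fie Sports', 'Action Adventure Shooter', 'City-Building Collectible Simulation', 'Action Strategy', 'Adventure Open-World', 'Breeding Racing Sports', 'Open-World Virtual-World', 'Collectible Idle', 'Action Adventure', 'Card Collectible PV P', 'Battle-Royale Fantasy MO BA', 'City-Building', 'Building MM OStrategy', 'Adventure MM OR PG', 'Action Adventure Idle', 'M OB AR PG Strategy', 'M MO RP GStrategy', 'Card Collectible Idle', 'Open-World PV PR PG', 'De Fi MM OSpace', 'Collectible', 'Card Collectible PV P', 'Auto-Battler De Fi RP G', 'Adventure MM OOpen-World', 'Collectible Open-World Virtual-World', 'Collectible Idle RP G', 'Card Collectible PV P', 'Action Adventure PV P', 'Sci-Fi Shooter Survival', 'Action Strategy', 'Arcade Minigame', 'Breeding PV PRacing', 'M OB AP VP', 'Action Sports', 'P VP Space Turn-based', 'M MO Strategy Tower-Defense']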

Can you help me figure out which regex would work best here? Or is regex not the right tool for this list? Thanks.

Recommended answer

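This is hard because some of the all-caps words can end up glued to each other or to a neighboring word. If you have a list of those words, it is solvable.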
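To see why the pattern from the question misfires, here is a minimal sketch. It assumes the replacement string in the question was meant to contain a space (r"\1 \2"), which is consistent with the list shown there; the name naive_split is introduced only for this illustration.

```python
import re

def naive_split(s):
    # Each match of (\w)([A-Z]) consumes two characters and matches cannot
    # overlap, so inside a run of capitals a space lands between the two
    # letters of every consumed pair, whether or not a word ends there.
    return re.sub(r"(\w)([A-Z])", r"\1 \2", s)

for s in ["PVP", "CardPVP", "MMOStrategy"]:
    print(repr(s), "->", repr(naive_split(s)))
```

The pattern has no notion of where an acronym starts or ends, so the damage depends only on whether the acronym begins at an even or odd offset within the run of capitals.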

Here is a piece of code you can use and extend to make the output more accurate:

import re
l = ['Collectible Open-World Virtual-World', 'Breeding Card PV P', 'Auto-Battler Breeding Strategy', 'Minigame Open-World Virtual-World', 'Action Simulation Sports', 'Adventure MM OStrategy', 'Adventure Casual Puzzle', 'Sports', 'Collectible Sci-Fi Virtual-World', 'Battle-Royalee Sports MO BA', 'Action PV PShooter', 'P VP Sci-Fi Tower-Defense', 'Action Battle-Royale', 'P VP Sci-Fi Shooter', 'Breeding Collectible Mining', 'Collectible De Fie Sports', 'Action Adventure Shooter', 'City-Building Collectible Simulation', 'Action Strategy', 'Adventure Open-World', 'Breeding Racing Sports', 'Open-World Virtual-World', 'Collectible Idle', 'Action Adventure', 'Card Collectible PV P', 'Battle-Royale Fantasy MO BA', 'City-Building', 'Building MM OStrategy', 'Adventure MM OR PG', 'Action Adventure Idle', 'M OB AR PG Strategy', 'M MO RP GStrategy', 'Card Collectible Idle', 'Open-World PV PR PG', 'De Fi MM OSpace', 'Collectible', 'Card Collectible PV P', 'Auto-Battler De Fi RP G', 'Adventure MM OOpen-World', 'Collectible Open-World Virtual-World', 'Collectible Idle RP G', 'Card Collectible PV P', 'Action Adventure PV P', 'Sci-Fi Shooter Survival', 'Action Strategy', 'Arcade Minigame', 'Breeding PV PRacing', 'M OB AP VP', 'Action Sports', 'P VP Space Turn-based', 'M MO Strategy Tower-Defense']
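# Undo the bad splits by removing all whitespace, then re-split from scratch.
l = [''.join(s.split()) for s in l]
# Known all-caps acronyms that must stay in one piece.
allcaps = ['RPG', 'MOBA', 'PVP', 'MMO']
# Split points: lower -> Upper, or an uppercase letter followed by Upper + lower.
rx_1 = re.compile(r'[a-z](?=[A-Z])|[A-Z](?=[A-Z][a-z])')
# An acronym at the start of a word that is directly followed by another letter.
rx_2 = re.compile(fr"\b(?:{'|'.join(allcaps)})(?=[A-Za-z])")
# An acronym at the end of a word that is directly preceded by another letter.
rx_3 = re.compile(fr"(?<=[A-Za-z])(?:{'|'.join(allcaps)})\b")
for s in l:
    print('{} => {}'.format(s, rx_3.sub(r" \g<0>", rx_2.sub(r"\g<0> ", rx_1.sub(r"\g<0> ", s)))))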

See the Python demo. Output:

CollectibleOpen-WorldVirtual-World => Collectible Open-World Virtual-World
BreedingCardPVP => Breeding Card PVP
Auto-BattlerBreedingStrategy => Auto-Battler Breeding Strategy
MinigameOpen-WorldVirtual-World => Minigame Open-World Virtual-World
ActionSimulationSports => Action Simulation Sports
AdventureMMOStrategy => Adventure MMO Strategy
AdventureCasualPuzzle => Adventure Casual Puzzle
Sports => Sports
CollectibleSci-FiVirtual-World => Collectible Sci-Fi Virtual-World
Battle-RoyaleeSportsMOBA => Battle-Royalee Sports MOBA
ActionPVPShooter => Action PVP Shooter
PVPSci-FiTower-Defense => PVP Sci-Fi Tower-Defense
ActionBattle-Royale => Action Battle-Royale
PVPSci-FiShooter => PVP Sci-Fi Shooter
BreedingCollectibleMining => Breeding Collectible Mining
CollectibleDeFieSports => Collectible De Fie Sports
ActionAdventureShooter => Action Adventure Shooter
City-BuildingCollectibleSimulation => City-Building Collectible Simulation
ActionStrategy => Action Strategy
AdventureOpen-World => Adventure Open-World
BreedingRacingSports => Breeding Racing Sports
Open-WorldVirtual-World => Open-World Virtual-World
CollectibleIdle => Collectible Idle
ActionAdventure => Action Adventure
CardCollectiblePVP => Card Collectible PVP
Battle-RoyaleFantasyMOBA => Battle-Royale Fantasy MOBA
City-Building => City-Building
BuildingMMOStrategy => Building MMO Strategy
AdventureMMORPG => Adventure MMO RPG
ActionAdventureIdle => Action Adventure Idle
MOBARPGStrategy => MOBA RPG Strategy
MMORPGStrategy => MMO RPG Strategy
CardCollectibleIdle => Card Collectible Idle
Open-WorldPVPRPG => Open-World PVP RPG
DeFiMMOSpace => De Fi MMO Space
Collectible => Collectible
CardCollectiblePVP => Card Collectible PVP
Auto-BattlerDeFiRPG => Auto-Battler De Fi RPG
AdventureMMOOpen-World => Adventure MMO Open-World
CollectibleOpen-WorldVirtual-World => Collectible Open-World Virtual-World
CollectibleIdleRPG => Collectible Idle RPG
CardCollectiblePVP => Card Collectible PVP
ActionAdventurePVP => Action Adventure PVP
Sci-FiShooterSurvival => Sci-Fi Shooter Survival
ActionStrategy => Action Strategy
ArcadeMinigame => Arcade Minigame
BreedingPVPRacing => Breeding PVP Racing
MOBAPVP => MOBA PVP
ActionSports => Action Sports
PVPSpaceTurn-based => PVP Space Turn-based
MMOStrategyTower-Defense => MMO Strategy Tower-Defense

The [a-z](?=[A-Z])|[A-Z](?=[A-Z][a-z]) regex (see its demo) matches:

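  • [a-z](?=[A-Z]) - a lowercase letter immediately followed by an uppercase letter
  • | - or
  • [A-Z](?=[A-Z][a-z]) - an uppercase letter followed by an uppercase letter and then a lowercase letter.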

We add a space after each of these matches.
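As a rough illustration of the rule just described, the sketch below applies only this first pattern to a few of the de-spaced inputs. The helper name split_case is made up for the example. Acronyms glued to each other (MMORPG) are left untouched at this stage, because a run of capitals with no lowercase letter after it contains no match.

```python
import re

rx_1 = re.compile(r'[a-z](?=[A-Z])|[A-Z](?=[A-Z][a-z])')

def split_case(s):
    # Insert a space after every match; \g<0> stands for the whole match.
    return rx_1.sub(r"\g<0> ", s)

for s in ["ActionPVPShooter", "Sci-FiShooterSurvival", "AdventureMMORPG"]:
    print(s, "=>", split_case(s))
```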

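The rx_2 and rx_3 regexes are built from the list of ALLCAPS words. They add a space on the right or on the left of such a word, depending on which side another letter appears.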
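The output above still splits DeFi into "De Fi", although the question lists it among the terms to keep whole. One possible enhancement, offered as a sketch and not as part of the original answer, is to rejoin a short list of mixed-case terms after the three substitutions have run. The MIXED list and the resplit function are names introduced here as assumptions; the three patterns are the ones from the answer.

```python
import re

ALLCAPS = ['RPG', 'MOBA', 'PVP', 'MMO']   # acronym list from the answer
MIXED = ['DeFi']                          # assumption: mixed-case terms to keep whole

alt = '|'.join(ALLCAPS)
rx_1 = re.compile(r'[a-z](?=[A-Z])|[A-Z](?=[A-Z][a-z])')
rx_2 = re.compile(fr'\b(?:{alt})(?=[A-Za-z])')
rx_3 = re.compile(fr'(?<=[A-Za-z])(?:{alt})\b')

def resplit(s):
    s = ''.join(s.split())         # drop the badly placed spaces
    s = rx_1.sub(r'\g<0> ', s)     # case-based split
    s = rx_2.sub(r'\g<0> ', s)     # space to the right of a leading acronym
    s = rx_3.sub(r' \g<0>', s)     # space to the left of a trailing acronym
    for term in MIXED:
        # Work out how rx_1 breaks the term ("DeFi" -> "De Fi"), then undo
        # exactly that split wherever it appears as whole words.
        broken = rx_1.sub(r'\g<0> ', term)
        s = re.sub(fr'\b{re.escape(broken)}\b', term, s)
    return s

for s in ['Auto-Battler De Fi RP G', 'De Fi MM OSpace', 'M OB AR PG Strategy']:
    print(s, '=>', resplit(s))
```

Because the rejoin step requires word boundaries on both sides, an entry such as 'Collectible De Fie Sports' is left as it is and would still need a manual fix in the data.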
