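I have some hyperlinks in a text file. I want to compare the link on each line with the link on the next adjacent line and generate the numbered links that fall between them. For example,

consider the following adjacent links: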

https://gp.to/ab/394/las69-02-09-2020/
https://gp.to/ab/394/las69-02-09-2020/4/

Here the output file would be:

https://gp.to/ab/394/las69-02-09-2020/
https://gp.to/ab/394/las69-02-09-2020/2/
https://gp.to/ab/394/las69-02-09-2020/3/
https://gp.to/ab/394/las69-02-09-2020/4/

Likewise, I need to do the same for the other lines.

Sample input:

https://gp.to/ab/394/las69-02-09-2020/
https://gp.to/ab/394/las69-02-09-2020/4/
https://gp.to/ab/563/dimp-02-07-2023/
https://gp.to/ab/39443/omegs-02-07-2023/
https://gp.to/ab/39443/omegs-02-07-2023/3/
https://gp.to/ab/39443/lis-22-04-2018/
https://gp.to/ab/39443/lis-22-04-2018/2/
https://gp.to/ab/39443/madi-22-04-2018/
https://gp.to/ab/39443/madi-22-04-2018/5/

Sample output:

https://gp.to/ab/394/las69-02-09-2020/
https://gp.to/ab/394/las69-02-09-2020/2/
https://gp.to/ab/394/las69-02-09-2020/3/
https://gp.to/ab/394/las69-02-09-2020/4/
https://gp.to/ab/563/dimp-02-07-2023/
https://gp.to/ab/39443/omegs-02-07-2023/
https://gp.to/ab/39443/omegs-02-07-2023/2/
https://gp.to/ab/39443/omegs-02-07-2023/3/
https://gp.to/ab/39443/lis-22-04-2018/
https://gp.to/ab/39443/lis-22-04-2018/2/
https://gp.to/ab/39443/madi-22-04-2018/
https://gp.to/ab/39443/madi-22-04-2018/2/
https://gp.to/ab/39443/madi-22-04-2018/3/
https://gp.to/ab/39443/madi-22-04-2018/4/
https://gp.to/ab/39443/madi-22-04-2018/5/

I tried...

# Function to extract the number from a URL
def extract_number(url):
    parts = url.split('/')
    for part in parts[::-1]:
        if part.isdigit():
            return int(part)
    return None

# Read the input file
with open('input.txt', 'r') as input_file:
    lines = input_file.readlines()

output_lines = []

# Iterate through the input lines and generate output lines
for i in range(len(lines)):
    current_url = lines[i].strip()
    output_lines.append(current_url)

    if i + 1 < len(lines):
        next_url = lines[i + 1].strip()
        current_number = extract_number(current_url)
        next_number = extract_number(next_url)

        if current_number is not None and next_number is not None:
            for num in range(current_number + 1, next_number):
                new_url = current_url.rsplit('/', 1)[0] + '/' + str(num) + '/'
                output_lines.append(new_url)

# Write the output to a file
with open('output.txt', 'w') as output_file:
    output_file.writelines(output_lines)

But I am not getting the desired output.

Recommended answer

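Here is another option. It assumes that order matters: if the same URL (or its base, i.e. the URL without the trailing number) shows up again after a different URL in between, it is treated as "new" and starts its own group.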

library(dplyr)
tibble(url=vec) %>%
  mutate(
    urlbase = sub("/\\d+/?$", "/", url),
    num = as.integer(sub("/$", "", stringr::str_extract(url, "(?<=/)(\\d+)/?$"))),
    grp = consecutive_id(urlbase)
  ) %>%
  group_by(grp, urlbase) %>%
  mutate(
    num = if (n() > 1) coalesce(num, row_number()) else num
  ) %>%
  reframe(
    num = seq.int(max(coalesce(num, 1L))),
    url = paste0(urlbase, if_else(num == 1L, "", paste0(as.character(num), "/")))
  )
# # A tibble: 15 × 4
#      grp urlbase                                    num url                                       
#    <int> <chr>                                    <int> <chr>                                     
#  1     1 https://gp.to/ab/394/las69-02-09-2020/       1 https://gp.to/ab/394/las69-02-09-2020/    
#  2     1 https://gp.to/ab/394/las69-02-09-2020/       2 https://gp.to/ab/394/las69-02-09-2020/2/  
#  3     1 https://gp.to/ab/394/las69-02-09-2020/       3 https://gp.to/ab/394/las69-02-09-2020/3/  
#  4     1 https://gp.to/ab/394/las69-02-09-2020/       4 https://gp.to/ab/394/las69-02-09-2020/4/  
#  5     2 https://gp.to/ab/563/dimp-02-07-2023/        1 https://gp.to/ab/563/dimp-02-07-2023/     
#  6     3 https://gp.to/ab/39443/omegs-02-07-2023/     1 https://gp.to/ab/39443/omegs-02-07-2023/  
#  7     3 https://gp.to/ab/39443/omegs-02-07-2023/     2 https://gp.to/ab/39443/omegs-02-07-2023/2/
#  8     3 https://gp.to/ab/39443/omegs-02-07-2023/     3 https://gp.to/ab/39443/omegs-02-07-2023/3/
#  9     4 https://gp.to/ab/39443/lis-22-04-2018/       1 https://gp.to/ab/39443/lis-22-04-2018/    
# 10     4 https://gp.to/ab/39443/lis-22-04-2018/       2 https://gp.to/ab/39443/lis-22-04-2018/2/  
# 11     5 https://gp.to/ab/39443/madi-22-04-2018/      1 https://gp.to/ab/39443/madi-22-04-2018/   
# 12     5 https://gp.to/ab/39443/madi-22-04-2018/      2 https://gp.to/ab/39443/madi-22-04-2018/2/ 
# 13     5 https://gp.to/ab/39443/madi-22-04-2018/      3 https://gp.to/ab/39443/madi-22-04-2018/3/ 
# 14     5 https://gp.to/ab/39443/madi-22-04-2018/      4 https://gp.to/ab/39443/madi-22-04-2018/4/ 
# 15     5 https://gp.to/ab/39443/madi-22-04-2018/      5 https://gp.to/ab/39443/madi-22-04-2018/5/ 

Data

vec <- c("https://gp.to/ab/394/las69-02-09-2020/", "https://gp.to/ab/394/las69-02-09-2020/4/", "https://gp.to/ab/563/dimp-02-07-2023/", "https://gp.to/ab/39443/omegs-02-07-2023/", "https://gp.to/ab/39443/omegs-02-07-2023/3/", "https://gp.to/ab/39443/lis-22-04-2018/", "https://gp.to/ab/39443/lis-22-04-2018/2/", "https://gp.to/ab/39443/madi-22-04-2018/", "https://gp.to/ab/39443/madi-22-04-2018/5/")
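Since the question was asked in Python, here is a rough Python sketch of the same idea: group consecutive URLs by their base and emit pages 1 through the highest number seen in each group. It is not a line-for-line port of the dplyr code. The names `TRAILING_NUM`, `split_url` and `expand_links` are made up for illustration, and the sketch assumes every URL ends with a trailing slash, as in the sample data. As far as I can tell, the attempt in the question goes wrong in two places: `extract_number` picks up any all-digit path segment (for example `394` in the base URL) rather than only a trailing one, and `writelines` does not add newlines by itself.

```python
import re
from itertools import groupby

# Matches only a trailing page number such as ".../4/" (assumed URL layout).
TRAILING_NUM = re.compile(r"/(\d+)/?$")


def split_url(url):
    """Return (base, number); a URL without a trailing number counts as page 1."""
    match = TRAILING_NUM.search(url)
    if match is None:
        return url, 1
    return url[:match.start()] + "/", int(match.group(1))


def expand_links(urls):
    """Fill in pages 1..max for each run of consecutive URLs sharing a base."""
    expanded = []
    parsed = (split_url(u.strip()) for u in urls if u.strip())
    # groupby only merges adjacent items, so a base that reappears later
    # after a different URL starts a new group.
    for base, run in groupby(parsed, key=lambda pair: pair[0]):
        highest = max(number for _, number in run)
        expanded.append(base)
        expanded.extend(f"{base}{n}/" for n in range(2, highest + 1))
    return expanded
```

To use it with files, pass the lines read from `input.txt` to `expand_links` and write the result with something like `output_file.write("\n".join(links) + "\n")`, so that each URL ends up on its own line.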
