This code does the job. We separate the headings that span the two lines using the following assumptions: text in the two lines whose indices overlap or are immediately adjacent belongs to the same heading; when both lines have a space at a particular position, we can assume that the material on either side belongs to separate headings. No regular expressions are needed.
# read in the 2 lines:
line1 = ' CLASS CLOCK NET '
line2 = ' ID# PLACE PLACE FINISHER TIME TIME PACE '
# pad the shorter among the lines, so that both are equally long:
linediff = len(line1) - len(line2)
if linediff > 0:
    line2 += ' ' * linediff
else:
    line1 += ' ' * (-linediff)
length = len(line1)
# go through both lines character-by-character:
top, bottom = [], []
i = 0
while i < length:
    # skip indices where both lines have a space:
    if line1[i] == ' ' and line2[i] == ' ':
        i += 1
    else:
        # find the first j to the right of i for which
        # both lines have a space:
        j = i
        while (j < length) and (line1[j] != ' ' or line2[j] != ' '):
            j += 1
        # copy the lines from position i (inclusive)
        # to j (exclusive) into top and bottom:
        top.append(line1[i:j])
        bottom.append(line2[i:j])
        # we are done with one heading and advance i:
        i = j
# top:
# [' ', ' ', 'CLASS', ' ', ' CLOCK', ' NET', ' ']
# bottom:
# ['ID#', 'PLACE', 'PLACE', 'FINISHER', 'TIME ', 'TIME ', 'PACE']
headers = []
for str1, str2 in zip(top, bottom):
    # remove leading/trailing spaces from each partial heading:
    s1, s2 = str1.strip(), str2.strip()
    # merge partial headings
    # (strip is needed because one of the two might be empty):
    headers.append((s1 + ' ' + s2).strip())
# headers:
# ['ID#', 'PLACE', 'CLASS PLACE', 'FINISHER', 'CLOCK TIME', 'NET TIME', 'PACE']
Note that the problem actually has nothing to do with HTML, so no special HTML handling is required.
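
For reuse, the same column-splitting logic can be wrapped in a function. This is only a minimal sketch: the function name `merge_heading_lines` is made up for illustration, and the sample lines use spacing chosen for the example rather than the exact spacing of the original table. The first line is deliberately shorter than the second, which exercises the padding step.

```python
def merge_heading_lines(line1, line2):
    """Merge a two-line table heading into a list of column headings.

    A column boundary is any position where both lines have a space.
    (Hypothetical helper name; not part of any library.)
    """
    # Pad the shorter line so both have the same length.
    length = max(len(line1), len(line2))
    line1 = line1.ljust(length)
    line2 = line2.ljust(length)

    headers = []
    i = 0
    while i < length:
        # Skip positions where both lines have a space.
        if line1[i] == ' ' and line2[i] == ' ':
            i += 1
            continue
        # Advance j to the next position where both lines have a space.
        j = i
        while j < length and (line1[j] != ' ' or line2[j] != ' '):
            j += 1
        # Join the two partial headings; either one may be empty.
        top, bottom = line1[i:j].strip(), line2[i:j].strip()
        headers.append((top + ' ' + bottom).strip())
        i = j
    return headers


# Illustrative input only: the column spacing here is an assumption.
line1 = '              CLASS            CLOCK    NET'
line2 = '  ID#  PLACE  PLACE  FINISHER   TIME   TIME   PACE'
print(merge_heading_lines(line1, line2))
# ['ID#', 'PLACE', 'CLASS PLACE', 'FINISHER', 'CLOCK TIME', 'NET TIME', 'PACE']
```

Using `str.ljust` replaces the explicit length comparison, and collecting the merged headings directly avoids the intermediate `top` and `bottom` lists. The splitting rule itself is unchanged.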