I'm having trouble creating a regular expression that extracts the phrases associated with times in "XX:XX am or pm" format
import re
hh, mm, am_pm = "", "", "" #each hour group element in str format
times_output = [] #list that must accumulate all times in "XX:XX am or pm" format, where each 'X' is an int value
Pseudo-regex patterns (for this kind of example input string):
"sense" \s* entre las \s* "XX:XX am or pm" \s* y las \s* "XX:XX am or pm"
"XX:XX am or pm" ---> "sense"
"XX:XX am or pm" ---> "sense"
"sense" a las "XX:XX am or pm"
"XX:XX am or pm" ---> "sense"
"sense" a las "XX:XX am or pm", a las "XX:XX am or pm" o a las "XX:XX am or pm"
"XX:XX am or pm" ---> "sense"
"XX:XX am or pm" ---> "sense"
"XX:XX am or pm" ---> "sense"
(...|.|,|;) \s* "sense1" \s* (a las|de las|) \s* "XX:XX am or pm" \s* "sense2"
"XX:XX am or pm" ---> "sense1" + "sense2"
"22:00 pm" ---> "ya que a las" + "empieza el show"
In this case, "ya que" and "a las" would be removed.
Regex pattern to extract the times from the input sentence, no matter what comes before or after the time pattern.
Example 1:
input_text = "puede ser peligroso salir entre las 18:00 pm y las 20:00 pm hs, por ello yo pienso que seria mejor salir a las 21:00 pm, a las 21:15 pm o a las 21:30 pm ya que a las 22:00 pm empezaria el show"
#sense_pattern = r"(?P()\s.+?)" #THE REGEX THAT I NEED
civil_time_pattern = r'(\d{1,2})[\s|:]*(\d{0,2})\s*(am|pm)?'
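input_text_all_in_minus = input_text.lower() #lowercased copy of the input, so that "am"/"pm" match
#civil_time_unit_list = re.search(civil_time_pattern, input_text_all_in_minus)
civil_time_unit_list = re.findall(civil_time_pattern, input_text_all_in_minus)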
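As a quick check of what the time pattern above actually captures, here is a minimal sketch. The stricter `strict_time_pattern` and the sentence "tengo 3 perros" are assumptions added for illustration; they are not part of the original code.

```python
import re

# Pattern from the question: the minutes and am/pm parts are both optional.
civil_time_pattern = r'(\d{1,2})[\s|:]*(\d{0,2})\s*(am|pm)?'

# Hypothetical stricter variant: requires "HH:MM" followed by am or pm.
strict_time_pattern = r'\b(\d{1,2}):(\d{2})\s*(am|pm)\b'

sample = "salir entre las 18:00 pm y las 20:00 pm hs"
loose_hits = re.findall(civil_time_pattern, sample)
strict_hits = re.findall(strict_time_pattern, sample)

# The loose pattern also accepts any bare number; the strict one does not.
loose_noise = re.findall(civil_time_pattern, "tengo 3 perros")
strict_noise = re.findall(strict_time_pattern, "tengo 3 perros")

print(loose_hits, strict_hits)
print(loose_noise, strict_noise)
```

Both patterns find the two times in the sample, but only the loose one reports a "time" for a sentence that merely contains a number, which may or may not be what is wanted here.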
In this case, this part matters only for the time regex:
try:
    hh = civil_time_unit_list[0][0]
    if (hh == ""): hh = "00"
except IndexError: hh = "00"
try:
    mm = civil_time_unit_list[0][1]
    if (mm == ""): mm = "00"
except IndexError: mm = "00"
try:
    am_pm = civil_time_unit_list[0][2]
    if (am_pm == ""): am_pm = "am"
except IndexError: am_pm = "am"
time_output = (hh + ":" + mm + " " + am_pm).strip()
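The code above only reads the first match (`civil_time_unit_list[0]`). A minimal sketch of applying the same defaults to every match could look like this; `format_time` is a hypothetical helper name, and the sample text is a fragment of the input sentence.

```python
import re

civil_time_pattern = r'(\d{1,2})[\s|:]*(\d{0,2})\s*(am|pm)?'

def format_time(hh, mm, am_pm):
    """Hypothetical helper: same defaults as the try/except blocks ("00", "00", "am")."""
    return f"{hh or '00'}:{mm or '00'} {am_pm or 'am'}"

text = "mejor salir a las 21:00 pm, a las 21:15 pm o a las 21:30 pm"

# re.findall returns one (hh, mm, am_pm) tuple per match, so no IndexError handling is needed.
times_output = [format_time(*parts) for parts in re.findall(civil_time_pattern, text)]
print(times_output)
```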
#remove unnecessary connectors in the <<sense>>
sense = sense.replace("entre las", "")
sense = sense.replace("y las", "")
sense = sense.replace("a las", "")
sense = sense.replace("ya que", "")
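The chain of `replace()` calls could also be written as one regex substitution. This is only a sketch: `strip_connectors` is a hypothetical helper, and the word boundaries (`\b`) are an addition that keeps a connector such as "a las" from being cut out of the middle of a longer word sequence like "para las".

```python
import re

# Connectors taken from the replace() calls above.
CONNECTORS = re.compile(r"\b(?:entre las|ya que|y las|a las)\b")

def strip_connectors(sense):
    """Hypothetical helper: drop the connectors, then tidy the leftover spaces."""
    without = CONNECTORS.sub(" ", sense)
    return re.sub(r"\s+", " ", without).strip()

print(strip_connectors("ya que a las empezaria el show"))
print(strip_connectors("para las fiestas"))
```

With plain `str.replace`, "para las fiestas" would lose the "a las" inside "para las"; the boundary-aware pattern leaves it untouched.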
Then, simply create a file named after the time and write the associated sense into it:
time_output_file = time_output + ".txt"
with open(time_output_file, 'w') as f:
    f.write(sense)
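For several times at once, the file-writing step might be wrapped as below. This is a sketch under assumptions: `write_sense_files` and the `(time, sense)` pair layout are hypothetical, and the demo writes into a temporary directory. Note that file names containing ":" are not valid on Windows, so the names would need adjusting there.

```python
from pathlib import Path
import tempfile

def write_sense_files(pairs, out_dir):
    """Hypothetical helper: write one "<time>.txt" file per (time, sense) pair."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for time_output, sense in pairs:
        path = out_dir / (time_output + ".txt")
        path.write_text(sense, encoding="utf-8")
        written.append(path)
    return written

pairs = [("18:00 pm", "puede ser peligroso salir"),
         ("20:00 pm", "puede ser peligroso salir")]

with tempfile.TemporaryDirectory() as tmp:
    for path in write_sense_files(pairs, tmp):
        print(path.name, "---->", path.read_text(encoding="utf-8"))
```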
In the end, the files should look like this (for this example):
18:00 pm.txt ----> 'puede ser peligroso salir'
20:00 pm.txt ----> 'puede ser peligroso salir'
21:00 pm.txt ----> 'por ello yo pienso que seria mejor salir'
21:15 pm.txt ----> 'por ello yo pienso que seria mejor salir'
21:30 pm.txt ----> 'por ello yo pienso que seria mejor salir'
22:00 pm.txt ----> 'empezaria el show'
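One possible way to get this pairing, offered as a heuristic sketch rather than a general solution: treat every run of times joined only by connectors as one group, split the sentence on those groups, and give each group the text before it, falling back to the text after it when nothing is left once the filler is removed. All names here (`TIME`, `CONNECT`, `GROUP`, `FILLER`, `clean`, `times_with_sense`) are hypothetical, and the connector and filler lists are assumptions taken from this one example sentence. Unlike the "sense1" + "sense2" pseudo-pattern, it does not join the text on both sides of a time.

```python
import re

# A time such as "18:00 pm".
TIME = r"\d{1,2}:\d{2}\s*(?:am|pm)\b"
# Optional words that may introduce a time: "y las", "o a las", "entre las", ...
CONNECT = r"(?:(?:y|o)\s+)?(?:(?:entre|a|de)\s+)?(?:las\s+)?"
# A run of times joined only by connectors and commas (no capturing groups,
# so re.split returns just the text between the runs).
GROUP = re.compile(r"\b" + CONNECT + TIME + r"(?:\s*,?\s*" + CONNECT + TIME + r")*")
TIME_PARTS = re.compile(r"(\d{1,2}):(\d{2})\s*(am|pm)\b")
# Filler dropped from a sense; "hs" and "ya que" come from the example sentence.
FILLER = re.compile(r"\b(?:hs|ya que)\b")

def clean(segment):
    """Remove filler words, collapse spaces, and trim punctuation at the edges."""
    segment = FILLER.sub(" ", segment)
    return re.sub(r"\s+", " ", segment).strip(" ,.;")

def times_with_sense(text):
    text = text.lower()
    segments = [clean(s) for s in GROUP.split(text)]
    pairs = []
    for i, match in enumerate(GROUP.finditer(text)):
        # Prefer the text before the run of times; fall back to the text after it.
        sense = segments[i] or segments[i + 1]
        for hh, mm, am_pm in TIME_PARTS.findall(match.group()):
            pairs.append((f"{hh}:{mm} {am_pm}", sense))
    return pairs

input_text = "puede ser peligroso salir entre las 18:00 pm y las 20:00 pm hs, por ello yo pienso que seria mejor salir a las 21:00 pm, a las 21:15 pm o a las 21:30 pm ya que a las 22:00 pm empezaria el show"
for time_output, sense in times_with_sense(input_text):
    print(time_output + ".txt", "---->", repr(sense))
```

The patterns are built by string concatenation rather than f-strings because quantifiers like `{1,2}` would otherwise be read as format fields. Sentences with other connectors or filler words would need the `CONNECT` and `FILLER` lists extended.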