Here is an example of how to parse the text with the re module:
import re

import pandas as pd
mystr = " Here is a summary of the key financial trends for XYZ based on the earnings call transcript:\n\nHeader 0:\n- Q4 revenue was $2.7 billion, down 12% sequentially and 16% year-over-year due to broad-based weakness\n- Gross margin was 70.2%, down sequentially and year-over-year\n- Operating margin was 44.7%. FY2023 operating margin was 48.9%, down 50 basis points \n- Q4 EPS was $2.01, slightly above outlook\n\nHeader 1: \n- Q4 inventory decreased by $70 million sequentially to 188 days\n- Reduced Q4 OpEx by $60M sequentially through discretionary cuts and lower variable comp\n\nHeader 2:\n- Industrial revenue down 19% sequentially and 20% year-over-year in Q4 on broad-based weakness\n- Automotive revenue down slightly sequentially, up 14% year-over-year in Q4\n- Communications revenue down 6% sequentially, 32% year-over-year in Q4 \n- Consumer revenue down 6% sequentially, 28% year-over-year in Q4\n\nHeader 3:\n- Q1 revenue guidance $2.5 billion +/- $100 million. Expect all end markets down sequentially\n- Expect inventory correction to taper through 1H of FY2024"
all_data = []
# Pass 1: capture each "header:" line together with the block of text that follows it.
for header, group in re.findall(
    r"^([^-].*?):(.*?)(?=^[^-].*?:|\Z)", mystr, flags=re.S | re.M
):
    header = header.strip()
    # Pass 2: pull every "- bullet" line out of that block.
    for line in re.findall(r"^\s*-\s*(.+?)\s*$", group, flags=re.M):
        all_data.append((header, line))

df = pd.DataFrame(all_data, columns=["header", "point"])
print(df)
Prints:
header point
0 Header 0 Q4 revenue was $2.7 billion, down 12% sequentially and 16% year-over-year due to broad-based weakness
1 Header 0 Gross margin was 70.2%, down sequentially and year-over-year
2 Header 0 Operating margin was 44.7%. FY2023 operating margin was 48.9%, down 50 basis points
3 Header 0 Q4 EPS was $2.01, slightly above outlook
4 Header 1 Q4 inventory decreased by $70 million sequentially to 188 days
5 Header 1 Reduced Q4 OpEx by $60M sequentially through discretionary cuts and lower variable comp
6 Header 2 Industrial revenue down 19% sequentially and 20% year-over-year in Q4 on broad-based weakness
7 Header 2 Automotive revenue down slightly sequentially, up 14% year-over-year in Q4
8 Header 2 Communications revenue down 6% sequentially, 32% year-over-year in Q4
9 Header 2 Consumer revenue down 6% sequentially, 28% year-over-year in Q4
10 Header 3 Q1 revenue guidance $2.5 billion +/- $100 million. Expect all end markets down sequentially
11 Header 3 Expect inventory correction to taper through 1H of FY2024
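
To make the header pattern easier to follow, here is a minimal sketch that spells out the same regular expression with re.VERBOSE and runs it on a small made-up string. The names HEADER_BLOCK, BULLET, sample and rows are illustrative only and not part of the original answer. Note that the introductory line also ends in a colon, so it is matched as a "header", but its block contains no bullets and therefore contributes no rows.

```python
import re

# Same header pattern as in the answer, split across lines so each part can be commented.
HEADER_BLOCK = re.compile(
    r"""
    ^([^-].*?):          # from a line start that is not a dash, lazily up to the first colon
    (.*?)                # the block body: everything after that colon, taken lazily
    (?=^[^-].*?:|\Z)     # stop right before the next header-like line, or at end of string
    """,
    flags=re.S | re.M | re.X,
)
# One bullet per line: optional indentation, a dash, then the text without trailing spaces.
BULLET = re.compile(r"^\s*-\s*(.+?)\s*$", flags=re.M)

# Hypothetical input shaped like the transcript summary, only shorter.
sample = "Intro line:\n\nHeader 0:\n- first point\n- second point \n\nHeader 1: \n- third point"

rows = [
    (header.strip(), point)
    for header, body in HEADER_BLOCK.findall(sample)
    for point in BULLET.findall(body)
]
print(rows)
```

Because re.S lets the dot cross newlines, a captured header can start with a leading newline, which is why the header is passed through strip() before use.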