Python: How to split a string on \n\n and \n into separate columns of a Pandas DataFrame

Posted on December 13

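I have a string with several headers and bullet points under each header. How can I convert it into a DataFrame with two columns, header and point? I would also like to clean the text in the point column to remove bullet characters such as ":" and "-".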

```python
mystr = ' Here is a summary of the key financial trends for XYZ based on the earnings call transcript:\n\nHeader 0:\n- Q4 revenue was $2.7 billion, down 12% sequentially and 16% year-over-year due to broad-based weakness\n- Gross margin was 70.2%, down sequentially and year-over-year\n- Operating margin was 44.7%. FY2023 operating margin was 48.9%, down 50 basis points  \n- Q4 EPS was $2.01, slightly above outlook\n\nHeader 1:  \n- Q4 inventory decreased by $70 million sequentially to 188 days\n- Reduced Q4 OpEx by $60M sequentially through discretionary cuts and lower variable comp\n\nHeader 2:\n- Industrial revenue down 19% sequentially and 20% year-over-year in Q4 on broad-based weakness\n- Automotive revenue down slightly sequentially, up 14% year-over-year in Q4\n- Communications revenue down 6% sequentially, 32% year-over-year in Q4  \n- Consumer revenue down 6% sequentially, 28% year-over-year in Q4\n\nHeader 3:\n- Q1 revenue guidance $2.5 billion +/- $100 million. Expect all end markets down sequentially\n- Expect inventory correction to taper through 1H of FY2024'
```
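Because the title asks about splitting on \n\n and \n, here is a minimal plain-Python sketch that does exactly that: split the string into blocks on blank lines, then split each block into lines. It assumes sections are separated by exactly one blank line, that the first line of each block is the header (ending in ":"), and that every point starts with "-". The shortened sample string and the name rows are illustrative only.

```python
import pandas as pd

# Shortened copy of the question's string, with the same layout
# (leading space, trailing spaces after some lines).
mystr = (
    " Here is a summary of the key financial trends for XYZ:\n\n"
    "Header 0:\n- Q4 revenue was $2.7 billion\n- Gross margin was 70.2%  \n\n"
    "Header 1:  \n- Q4 inventory decreased by $70 million"
)

rows = []
# Blocks are separated by a blank line ("\n\n"); lines inside a block by "\n".
for block in mystr.strip().split("\n\n"):
    first, *rest = block.split("\n")
    # Header: drop surrounding whitespace, then the trailing ":".
    header = first.strip().rstrip(":")
    # Points: keep only bullet lines, then drop the leading "-" and whitespace.
    points = [
        line.strip().lstrip("-").strip()
        for line in rest
        if line.strip().startswith("-")
    ]
    # The intro block has no bullet lines, so it contributes no rows.
    rows.extend((header, point) for point in points)

df = pd.DataFrame(rows, columns=["header", "point"])
print(df)
```

If the blank lines might contain stray spaces, re.split(r"\n\s*\n", text) is a more forgiving block separator than a literal "\n\n".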

Expected output:

```python
import re

import pandas as pd

mystr = " Here is a summary of the key financial trends for XYZ based on the earnings call transcript:\n\nHeader 0:\n- Q4 revenue was $2.7 billion, down 12% sequentially and 16% year-over-year due to broad-based weakness\n- Gross margin was 70.2%, down sequentially and year-over-year\n- Operating margin was 44.7%. FY2023 operating margin was 48.9%, down 50 basis points \n- Q4 EPS was $2.01, slightly above outlook\n\nHeader 1: \n- Q4 inventory decreased by $70 million sequentially to 188 days\n- Reduced Q4 OpEx by $60M sequentially through discretionary cuts and lower variable comp\n\nHeader 2:\n- Industrial revenue down 19% sequentially and 20% year-over-year in Q4 on broad-based weakness\n- Automotive revenue down slightly sequentially, up 14% year-over-year in Q4\n- Communications revenue down 6% sequentially, 32% year-over-year in Q4 \n- Consumer revenue down 6% sequentially, 28% year-over-year in Q4\n\nHeader 3:\n- Q1 revenue guidance $2.5 billion +/- $100 million. Expect all end markets down sequentially\n- Expect inventory correction to taper through 1H of FY2024"

all_data = []
# Each match is (header, group): a line that does not start with "-" up to
# its ":" (the header), then everything up to the next header or end of text.
for header, group in re.findall(
    r"^([^-].*?):(.*?)(?=^[^-].*?:|\Z)", mystr, flags=re.S | re.M
):
    # The header may carry a leading newline or space, so strip it.
    header = header.strip()
    # Pull out each bullet, dropping the leading "-" and surrounding whitespace.
    for line in re.findall(r"^\s*-\s*(.+?)\s*$", group, flags=re.M):
        all_data.append((header, line))

df = pd.DataFrame(all_data, columns=["header", "point"])
print(df)
```

```
    header    point
0   Header 0  Q4 revenue was $2.7 billion, down 12% sequentially and 16% year-over-year due to broad-based weakness
1   Header 0  Gross margin was 70.2%, down sequentially and year-over-year
2   Header 0  Operating margin was 44.7%. FY2023 operating margin was 48.9%, down 50 basis points
3   Header 0  Q4 EPS was $2.01, slightly above outlook
4   Header 1  Q4 inventory decreased by $70 million sequentially to 188 days
5   Header 1  Reduced Q4 OpEx by $60M sequentially through discretionary cuts and lower variable comp
6   Header 2  Industrial revenue down 19% sequentially and 20% year-over-year in Q4 on broad-based weakness
7   Header 2  Automotive revenue down slightly sequentially, up 14% year-over-year in Q4
8   Header 2  Communications revenue down 6% sequentially, 32% year-over-year in Q4
9   Header 2  Consumer revenue down 6% sequentially, 28% year-over-year in Q4
10  Header 3  Q1 revenue guidance $2.5 billion +/- $100 million. Expect all end markets down sequentially
11  Header 3  Expect inventory correction to taper through 1H of FY2024
```
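If you would rather stay inside pandas instead of using regular expressions, one possible alternative (a sketch, not the method above) is to put one line per row, mark the header lines, forward-fill each header down onto the bullets beneath it, and keep only the bullet rows. It assumes the same layout as the question: headers end with ":" and points start with "-". The shortened sample string is illustrative only.

```python
import pandas as pd

# Shortened sample with the same layout as the question's string.
mystr = (
    " Intro line:\n\n"
    "Header 0:\n- Q4 revenue was $2.7 billion\n- Gross margin was 70.2%  \n\n"
    "Header 1:  \n- Q4 inventory decreased by $70 million"
)

# One stripped line per row; blank lines become empty strings.
lines = pd.Series(mystr.splitlines()).str.strip()

is_point = lines.str.startswith("-")
is_header = lines.str.endswith(":") & ~is_point

# Header text (trailing ":" removed) on header rows, NaN elsewhere,
# then forward-fill so every bullet row inherits the header above it.
header = lines.where(is_header).str.rstrip(":").ffill()

df = pd.DataFrame(
    {
        "header": header[is_point],
        # Drop the leading "-" and any whitespace after it.
        "point": lines[is_point].str.lstrip("-").str.strip(),
    }
).reset_index(drop=True)
print(df)
```

The intro sentence is treated as a header too, but because no bullet follows it, it produces no rows.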

