Python - 处理非结构化数据

Python - 处理非结构化数据 首页 / 数据科学入门教程 / Python - 处理非结构化数据

以行和列格式存在的数据,或可以轻松转换为行和列的数据,以便以后可以很好地适合数据库的数据称为结构化数据,例如CSV,TXT,XLS文件等。

读取数据

在下面的示例中,无涯教程获取一个文本文件并读取该文件,其中分离了其中的每一行,接下来,可以将输出分为更多的行和单词。

无涯教程网

链接:https://www.learnfk.comhttps://www.learnfk.com/python-data-science/python-processing-unstructured-data.html

来源:LearnFk无涯教程网

filename = 'path\input.txt'  

with open(filename) as fn:  

# Read each line
   ln = fn.readline()

# Keep count of lines
   lncnt = 1
   while ln:
       print("Line {}: {}".format(lncnt, ln.strip()))
       ln = fn.readline()
       lncnt += 1

当执行上面的代码时,它将产生以下输出。

Line 1: Python is an interpreted high-level programming language for general-purpose programming. Created by Guido van Rossum and first released in 1991, Python has a design philosophy that emphasizes code readability, notably using significant whitespace. It provides constructs that enable clear programming on both small and large scales.
Line 2: Python features a dynamic type system and automatic memory management. It supports multiple programming paradigms, including object-oriented, imperative, functional and procedural, and has a large and comprehensive standard library.
Line 3: Python interpreters are available for many operating systems. CPython, the reference implementation of Python, is open source software and has a community-based development model, as do nearly all of its variant implementations. CPython is managed by the non-profit Python Software Foundation.

计数词频

可以使用计数器功能如下计算文件中单词的频率。

from collections import Counter

with open(r'pathinput2.txt') as f:
               p = Counter(f.read().split())
               print(p)

当无涯教程执行上面的代码时,它将产生以下输出。

Counter({'and': 3, 'Python': 3, 'that': 2, 'a': 2, 'programming': 2, 'code': 1, '1991,': 1, 'is': 1, 'programming.': 1, 'dynamic': 1, 'an': 1, 'design': 1, 'in': 1, 'high-level': 1, 'management.': 1, 'features': 1, 'readability,': 1, 'van': 1, 'both': 1, 'for': 1, 'Rossum': 1, 'system': 1, 'provides': 1, 'memory': 1, 'has': 1, 'type': 1, 'enable': 1, 'Created': 1, 'philosophy': 1, 'constructs': 1, 'emphasizes': 1, 'general-purpose': 1, 'notably': 1, 'released': 1, 'significant': 1, 'Guido': 1, 'using': 1, 'interpreted': 1, 'by': 1, 'on': 1, 'language': 1, 'whitespace.': 1, 'clear': 1, 'It': 1, 'large': 1, 'small': 1, 'automatic': 1, 'scales.': 1, 'first': 1})

祝学习愉快!(内容编辑有误?请选中要编辑内容 -> 右键 -> 修改 -> 提交!)

技术教程推荐

Service Mesh实践指南 -〔周晶〕

Go语言核心36讲 -〔郝林〕

深入剖析Kubernetes -〔张磊〕

趣谈Linux操作系统 -〔刘超〕

张汉东的Rust实战课 -〔张汉东〕

物联网开发实战 -〔郭朝斌〕

恋爱必修课 -〔李一帆〕

Web漏洞挖掘实战 -〔王昊天〕

商业思维案例笔记 -〔曹雄峰〕

好记忆不如烂笔头。留下您的足迹吧 :)