无涯教程网

Ruby - XML, XSLT

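What is XML?

The Extensible Markup Language (XML) is a markup language much like HTML or SGML.

XML is a portable, open standard. It lets programmers build applications whose data can be read by other applications, regardless of operating system or development language.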

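XML Parser Architectures and APIs

XML parsers come in two flavors:

SAX - Here, you register callbacks for the events you are interested in and then let the parser work through the document. This is useful when the document is large or memory is limited: the parser processes the file as it reads it from disk, and the entire file is never held in memory.
DOM - This is a World Wide Web Consortium recommendation in which the entire file is read into memory and stored in a hierarchical (tree-based) form that represents all the features of the XML document.

SAX is read-only, while DOM allows changes to the XML file. Because the two APIs complement each other, there is no reason you cannot use both in a large project.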

Parsing and Creating XML with Ruby

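A common way to manipulate XML in Ruby is the REXML library by Sean Russell. REXML has been part of the standard Ruby distribution since 2002. Since Ruby 3.0 it ships as a bundled gem, so a project managed with Bundler may need to list rexml in its Gemfile.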

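REXML is a pure-Ruby XML processor that conforms to the XML 1.0 standard. It is a non-validating processor and passes all of the OASIS non-validating conformance tests.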
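
Since REXML can build documents as well as read them, a short creation example may help here. This is a minimal sketch, not the only way to do it: the helper name build_collection is hypothetical, and the element and attribute names are borrowed from the sample file used in the rest of this chapter.

```ruby
require 'rexml/document'

# build_collection is a hypothetical helper used only for this sketch.
# It assembles a small <collection> document in memory, node by node.
def build_collection
  doc = REXML::Document.new
  # add_element takes the tag name and an optional hash of attributes
  root = doc.add_element("collection", { "shelf" => "New Arrivals" })
  movie = root.add_element("movie", { "title" => "Enemy Behind" })
  # add_element returns the new element, so its text can be set directly
  movie.add_element("type").text = "War, Thriller"
  movie.add_element("format").text = "DVD"
  doc
end

# Write the document to standard output, indented by two spaces
build_collection.write($stdout, 2)
puts
```

The same write call accepts any object that responds to <<, such as an open File, so the document can be saved to disk instead of printed.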

All of the XML code examples below use this simple XML file, saved as movies.xml, as input:

<collection shelf="New Arrivals">
   <movie title="Enemy Behind">
      <type>War, Thriller</type>
      <format>DVD</format>
      <year>2003</year>
      <rating>PG</rating>
      <stars>10</stars>
      <description>Talk about a US-Japan war</description>
   </movie>
   <movie title="Transformers">
      <type>Anime, Science Fiction</type>
      <format>DVD</format>
      <year>1989</year>
      <rating>R</rating>
      <stars>8</stars>
      <description>A schientific fiction</description>
   </movie>
   <movie title="Trigun">
      <type>Anime, Action</type>
      <format>DVD</format>
      <episodes>4</episodes>
      <rating>PG</rating>
      <stars>10</stars>
      <description>Vash the Stampede!</description>
   </movie>
   <movie title="Ishtar">
      <type>Comedy</type>
      <format>VHS</format>
      <rating>PG</rating>
      <stars>2</stars>
      <description>Viewable boredom</description>
   </movie>
</collection>

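DOM-like Parsing

Let's first parse the XML data in tree fashion. We begin by requiring the rexml/document library. For convenience, we often also do an include REXML to import its names into the top-level namespace.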

#!/usr/bin/ruby -w

require 'rexml/document'
include REXML

# File.open with a block closes the file once the document has been parsed
xmldoc = File.open("movies.xml") { |xmlfile| Document.new(xmlfile) }

# Now get the root element
root = xmldoc.root
puts "Root element : " + root.attributes["shelf"]

# This will output all the movie titles.
xmldoc.elements.each("collection/movie") do |e|
   puts "Movie Title : " + e.attributes["title"]
end

# This will output all the movie types.
xmldoc.elements.each("collection/movie/type") do |e|
   puts "Movie Type : " + e.text
end

# This will output all the movie descriptions.
xmldoc.elements.each("collection/movie/description") do |e|
   puts "Movie Description : " + e.text
end

This produces the following output:

Root element : New Arrivals
Movie Title : Enemy Behind
Movie Title : Transformers
Movie Title : Trigun
Movie Title : Ishtar
Movie Type : War, Thriller
Movie Type : Anime, Science Fiction
Movie Type : Anime, Action
Movie Type : Comedy
Movie Description : Talk about a US-Japan war
Movie Description : A schientific fiction
Movie Description : Vash the Stampede!
Movie Description : Viewable boredom
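
Because the whole tree sits in memory, it can also be modified, which is what the earlier remark about DOM allowing changes refers to. The following is a minimal sketch under a few assumptions: the helper name update_collection and the constant UPDATE_SOURCE_XML are hypothetical, and the XML is an inline excerpt of the sample file so the snippet runs on its own.

```ruby
require 'rexml/document'

# Inline excerpt of the sample data (assumption: kept short for the sketch)
UPDATE_SOURCE_XML = <<~XML
  <collection shelf="New Arrivals">
     <movie title="Ishtar">
        <format>VHS</format>
        <stars>2</stars>
     </movie>
  </collection>
XML

# update_collection is a hypothetical helper. It parses the XML, renames the
# shelf, changes Ishtar's format to DVD, removes its <stars> element, and
# returns the modified document.
def update_collection(xml)
  doc = REXML::Document.new(xml)
  doc.root.attributes["shelf"] = "Classics"
  movie = doc.elements["collection/movie[@title='Ishtar']"]
  movie.elements["format"].text = "DVD"
  movie.delete_element("stars")
  doc
end

puts update_collection(UPDATE_SOURCE_XML)
```

Nothing is written back to disk here. To persist the change, the returned document would have to be written to a file explicitly.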

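SAX-like Parsing

To process the same data file, movies.xml, in a stream-oriented way, we will define a listener class whose methods are the targets of callbacks from the parser.

NOTE - SAX-like parsing is not recommended for a small file. This is only a demonstration.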

#!/usr/bin/ruby -w

require 'rexml/document'
require 'rexml/streamlistener'
include REXML

class MyListener
   include REXML::StreamListener

   # Called for every opening tag with its name and a hash of attributes
   def tag_start(*args)
      puts "tag_start: #{args.map { |x| x.inspect }.join(', ')}"
   end

   # Called for every run of character data between tags
   def text(data)
      return if data.match?(/\A\s*\z/) # skip whitespace-only text
      abbrev = data.length > 40 ? data[0, 40] + "..." : data
      puts "   text   :   #{abbrev.inspect}"
   end
end

list = MyListener.new
File.open("movies.xml") { |xmlfile| Document.parse_stream(xmlfile, list) }

This produces the following output (Ruby 3.4 and later print hashes with spaces around =>):

tag_start: "collection", {"shelf"=>"New Arrivals"}
tag_start: "movie", {"title"=>"Enemy Behind"}
tag_start: "type", {}
   text   :   "War, Thriller"
tag_start: "format", {}
   text   :   "DVD"
tag_start: "year", {}
   text   :   "2003"
tag_start: "rating", {}
   text   :   "PG"
tag_start: "stars", {}
   text   :   "10"
tag_start: "description", {}
   text   :   "Talk about a US-Japan war"
tag_start: "movie", {"title"=>"Transformers"}
tag_start: "type", {}
   text   :   "Anime, Science Fiction"
tag_start: "format", {}
   text   :   "DVD"
tag_start: "year", {}
   text   :   "1989"
tag_start: "rating", {}
   text   :   "R"
tag_start: "stars", {}
   text   :   "8"
tag_start: "description", {}
   text   :   "A schientific fiction"
tag_start: "movie", {"title"=>"Trigun"}
tag_start: "type", {}
   text   :   "Anime, Action"
tag_start: "format", {}
   text   :   "DVD"
tag_start: "episodes", {}
   text   :   "4"
tag_start: "rating", {}
   text   :   "PG"
tag_start: "stars", {}
   text   :   "10"
tag_start: "description", {}
   text   :   "Vash the Stampede!"
tag_start: "movie", {"title"=>"Ishtar"}
tag_start: "type", {}
   text   :   "Comedy"
tag_start: "format", {}
   text   :   "VHS"
tag_start: "rating", {}
   text   :   "PG"
tag_start: "stars", {}
   text   :   "2"
tag_start: "description", {}
   text   :   "Viewable boredom"
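
A listener usually does more than print events: since the parser keeps no tree, the listener has to remember where it is in the document. The sketch below illustrates that idea by collecting each movie's format into a hash. The class name FormatCollector, the helper collect_formats, and the constant STREAM_SOURCE_XML are all hypothetical, and the XML is an inline excerpt of the sample file so the snippet runs on its own.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Inline excerpt of the sample data (assumption: kept short for the sketch)
STREAM_SOURCE_XML = <<~XML
  <collection shelf="New Arrivals">
     <movie title="Enemy Behind">
        <format>DVD</format>
     </movie>
     <movie title="Ishtar">
        <format>VHS</format>
     </movie>
  </collection>
XML

# FormatCollector is a hypothetical listener that tracks the current movie
# and records the text found inside its <format> element.
class FormatCollector
  include REXML::StreamListener

  attr_reader :formats

  def initialize
    @formats = {}          # movie title => format
    @current_title = nil   # title of the <movie> being processed
    @in_format = false     # true while inside a <format> element
  end

  def tag_start(name, attrs)
    @current_title = attrs["title"] if name == "movie"
    @in_format = (name == "format")
  end

  def text(data)
    return unless @in_format
    # Append rather than assign, in case the text arrives in several pieces
    @formats[@current_title] = @formats.fetch(@current_title, "") + data
  end

  def tag_end(name)
    @in_format = false if name == "format"
  end
end

# Runs the stream parser over the XML and returns the collected hash
def collect_formats(xml)
  listener = FormatCollector.new
  REXML::Document.parse_stream(xml, listener)
  listener.formats
end

p collect_formats(STREAM_SOURCE_XML)
```

The three instance variables are the only state kept during parsing, which is why this style suits large inputs.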

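XPath and Ruby

An alternative way to view XML is XPath. This is a kind of pseudo-language that describes how to locate specific elements and attributes in an XML document, treating the document as a logical ordered tree.

REXML supports XPath through the XPath class. As seen above, it assumes tree-based parsing (the Document Object Model).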

#!/usr/bin/ruby -w

require 'rexml/document'
include REXML

# File.open with a block closes the file once the document has been parsed
xmldoc = File.open("movies.xml") { |xmlfile| Document.new(xmlfile) }

# Info for the first movie found
movie = XPath.first(xmldoc, "//movie")
p movie

# Print out all the movie types
XPath.each(xmldoc, "//type") { |e| puts e.text }

# Get an array of all of the movie formats.
formats = XPath.match(xmldoc, "//format").map { |x| x.text }
p formats

This produces the following output:

<movie title='Enemy Behind'> ... </>
War, Thriller
Anime, Science Fiction
Anime, Action
Comedy
["DVD", "DVD", "DVD", "VHS"]
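
XPath expressions can also filter on attribute values with a predicate in square brackets. The sketch below looks up one field of one movie that way. The helper name movie_field and the constant XPATH_SOURCE_XML are hypothetical, and the XML is an inline excerpt of the sample file so the snippet runs on its own.

```ruby
require 'rexml/document'

# Inline excerpt of the sample data (assumption: kept short for the sketch)
XPATH_SOURCE_XML = <<~XML
  <collection shelf="New Arrivals">
     <movie title="Enemy Behind">
        <format>DVD</format>
        <stars>10</stars>
     </movie>
     <movie title="Ishtar">
        <format>VHS</format>
        <stars>2</stars>
     </movie>
  </collection>
XML

# movie_field is a hypothetical helper. It returns the text of the given child
# element of the movie whose title attribute matches, or nil if none is found.
# The title is interpolated into the expression, so this sketch assumes that
# titles contain no single quotes.
def movie_field(doc, title, field)
  node = REXML::XPath.first(doc, "//movie[@title='#{title}']/#{field}")
  node && node.text
end

doc = REXML::Document.new(XPATH_SOURCE_XML)
puts movie_field(doc, "Ishtar", "format")
puts movie_field(doc, "Enemy Behind", "stars")
```

XPath.first returns nil when nothing matches, so the helper guards against that before calling text.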

XSLT4R

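XSLT4R was written by Michael Neumann and was listed in the RAA (Ruby Application Archive), in the Library section under XML. XSLT4R has a simple command-line interface, though it can also be used from within another application to transform an XML document.

XSLT4R needs XMLScan to operate. XMLScan is included in the XSLT4R archive and is also a 100 percent Ruby module. Both modules can be installed using the standard Ruby installation method (that is, ruby install.rb). The command-line syntax is: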

ruby xslt.rb stylesheet.xsl document.xml [arguments]

To use XSLT4R from within an application, require xslt and pass in the parameters you need. Here is an example:

require "xslt"

# File.read returns the whole file as a single String. On Ruby 1.9 and later,
# File.readlines(...).to_s would return the array's inspect form instead.
stylesheet = File.read("stylesheet.xsl")
xml_doc = File.read("document.xml")
arguments = { 'image_dir' => '/....' }
sheet = XSLT::Stylesheet.new(stylesheet, arguments)

# output to StdOut
sheet.apply(xml_doc)

# output to 'str'
str = ""
sheet.output = [str]
sheet.apply(xml_doc)
