Ruby XML

Ruby XML (REXML)

XML (eXtensible Markup Language) is a markup language like HTML. It lets programmers exchange data between applications regardless of the operating system or programming language used.

It is also a convenient way to store small to medium amounts of data without an SQL-based backend.

REXML is a pure-Ruby XML processor. A REXML::Document represents a full XML document, including processing instructions (PIs), the doctype, and so on. A document has a single root element, which is accessed with root(). If you want a created document to have an XML declaration, you must add one yourself; REXML does not write a default declaration for you.
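
As a minimal sketch of the point about declarations, the snippet below parses one document without a declaration and builds another by hand with an explicit REXML::XMLDecl. The element name collection and the shelf value are illustrative assumptions, not part of REXML.

```ruby
require "rexml/document"

# Parsing a string that has no declaration: REXML does not add one on output.
plain = REXML::Document.new("<collection shelf='New Arrivals'/>")

# Building a document by hand and adding the declaration explicitly.
doc = REXML::Document.new
doc << REXML::XMLDecl.new("1.0", "UTF-8")
doc.add_element("collection", "shelf" => "New Arrivals")

puts plain.root.name   # name of the single root element
puts plain.to_s        # serialized without an XML declaration
puts doc.to_s          # serialized with the declaration first
```

Adding the declaration is a one-line step, so it is easy to forget when a document is generated rather than parsed.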

REXML was inspired by the Electric XML library for Java. Its API is small and easy to use, and it follows Ruby conventions for method naming and code flow.

It supports both tree and stream parsing. Stream parsing is about 1.5 times faster than tree parsing. However, with stream parsing you lose access to some features, such as XPath.
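
To illustrate what tree parsing offers, here is a small sketch that runs XPath queries against a parsed document. The XML content is hypothetical sample data; REXML::XPath.first and REXML::XPath.match are the library calls being shown.

```ruby
require "rexml/document"

# Hypothetical sample data used only for this illustration.
xml = <<~XML
  <collection shelf="New Arrivals">
    <clothing title="Shirt"><type>Formal</type></clothing>
    <clothing title="Jeans"><type>Casual</type></clothing>
  </collection>
XML

doc = REXML::Document.new(xml)

# First node matching the expression.
first_type = REXML::XPath.first(doc, "//clothing/type").text

# All nodes matching the expression, mapped to their title attribute.
titles = REXML::XPath.match(doc, "//clothing").map { |e| e.attributes["title"] }

puts first_type
puts titles.join(", ")
```

Queries like these need the whole tree in memory, which is why they are not available while streaming.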


REXML features:

  • It is written 100 percent in Ruby.
  • It contains fewer than 2000 lines of code, which keeps it lightweight.
  • Its methods and classes are easy to understand.
  • It ships with the standard Ruby installation (as a bundled gem since Ruby 3.0), so it usually does not need to be installed separately.
  • It supports both DOM-like and SAX-like parsing.

Parsing XML and accessing elements

Let's start with parsing an XML document:

    require "rexml/document"
    file = File.new("trial.xml")
    doc = REXML::Document.new(file)

In the above code, the last line parses the supplied file.

Example:

    require 'rexml/document'

    include REXML

    file = File.new("trial.xml")
    doc = Document.new(file)
    puts doc

In the above code, the require statement loads the REXML library. The include REXML line lets us refer to classes such as Document without the REXML:: prefix. The script reads the trial.xml file we created, and the parsed document is printed to the screen.


The Document.new method takes an IO object, a String, or another Document as its argument. This argument specifies the source from which the XML document is read.

If the constructor is given a Document, the new Document is cloned from it. Note that recent REXML releases document this as a copy of the context and attributes only, without the child nodes. If the constructor is given a String, the string is expected to contain an XML document.
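
The String and IO cases can be sketched as follows. StringIO stands in for a real file here so that the example is self-contained, and the XML content is a hypothetical sample.

```ruby
require "rexml/document"
require "stringio"

# Hypothetical sample document.
xml = "<collection shelf='New Arrivals'><clothing title='Shirt'/></collection>"

# Source given as a String.
from_string = REXML::Document.new(xml)

# Source given as an IO-like object (a File would work the same way).
from_io = REXML::Document.new(StringIO.new(xml))

puts from_string.root.attributes["shelf"]
puts from_io.root.elements["clothing"].attributes["title"]
```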


XML with "Here Document"

A here document is a way to specify a block of text while preserving its line breaks, whitespace, and indentation.

A here document is constructed with a command or expression followed by "<<" and a token string. The text runs until a line that contains only the token.

In Ruby, there must be no space between "<<" and the token string.

Example:

Here, we use a here document with the token info. All the characters, including newlines, between <<info and the closing info line become part of the string.
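
A minimal sketch of a here document feeding REXML is shown below. The uppercase token INFO follows common Ruby convention and is an arbitrary choice, and the XML content is hypothetical sample data.

```ruby
require "rexml/document"

# Everything between <<INFO and the closing INFO line is kept verbatim,
# including newlines and leading spaces.
info = <<INFO
<collection shelf="New Arrivals">
  <clothing title="Shirt">
    <type>Formal</type>
  </clothing>
</collection>
INFO

# The resulting String can be passed straight to Document.new.
doc = REXML::Document.new(info)

puts info.lines.length
puts doc.root.attributes["shelf"]
```

Because the text is an ordinary String, this is a convenient way to embed small XML fixtures in a script.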

For the XML parsing examples, we will use a file named trial.xml as input. The parsing code expects a root element named collection with a shelf attribute. The root contains clothing elements, each with a title attribute and with type and description child elements.
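
The contents of trial.xml are not shown here, so the sketch below writes a hypothetical sample file whose structure matches the paths used by the parsing code (collection/clothing, the shelf and title attributes, and the type and description children). All of the values are invented for illustration.

```ruby
require "rexml/document"

# Hypothetical sample data; only the element and attribute names are taken
# from the parsing code in this tutorial.
sample = <<~XML
  <collection shelf="New Arrivals">
    <clothing title="Shirt">
      <type>Formal</type>
      <description>Cotton shirt with full sleeves</description>
    </clothing>
    <clothing title="Jeans">
      <type>Casual</type>
      <description>Slim-fit denim</description>
    </clothing>
  </collection>
XML

File.write("trial.xml", sample)

# Sanity check: re-read the file and count the clothing elements.
doc = REXML::Document.new(File.read("trial.xml"))
clothing_count = doc.get_elements("collection/clothing").length
puts "Wrote trial.xml with #{clothing_count} clothing elements"
```

Running this once in the working directory gives the later examples a file to read.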

Ruby XML DOM-Like Parsing

We will now parse the XML data as a tree, using the trial.xml file as input.

    #!/usr/bin/ruby -w

    require 'rexml/document'
    include REXML

    xmlfile = File.new("trial.xml")
    xmldoc = Document.new(xmlfile)

    # Get the root element and print its shelf attribute
    root = xmldoc.root
    puts "Root element : " + root.attributes["shelf"]

    # Print the title attribute of every clothing element
    xmldoc.elements.each("collection/clothing") do |e|
      puts "cloth Title : " + e.attributes["title"]
    end

    # Print the text of every type element
    xmldoc.elements.each("collection/clothing/type") do |e|
      puts "cloth Type : " + e.text
    end

    # Print the text of every description element
    xmldoc.elements.each("collection/clothing/description") do |e|
      puts "cloth Description : " + e.text
    end


Ruby XML SAX-Like Parsing

We will now parse the XML data as a stream, again using trial.xml as input. Here, we define a listener class whose methods receive callbacks from the parser.

SAX-like parsing is not recommended for small files.

    #!/usr/bin/ruby -w

    require 'rexml/document'
    require 'rexml/streamlistener'
    include REXML

    class MyListener
      include REXML::StreamListener

      # Called for each opening tag with the tag name and its attributes
      def tag_start(*args)
        puts "tag_start: #{args.map { |x| x.inspect }.join(', ')}"
      end

      # Called for each run of character data
      def text(data)
        return if data =~ /\A\s*\z/     # skip whitespace-only text
        abbrev = data[0...40] + (data.length > 40 ? "..." : "")
        puts "  text   :   #{abbrev.inspect}"
      end
    end

    list = MyListener.new

    # Stream the file through the listener
    xmlfile = File.new("trial.xml")
    Document.parse_stream(xmlfile, list)
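
As a self-contained variation on the listener idea, the sketch below collects events into arrays instead of printing them and parses a String rather than a file. The class name TagCollector and the sample XML are hypothetical; REXML::StreamListener and Document.parse_stream are the library pieces being shown.

```ruby
require "rexml/document"
require "rexml/streamlistener"

# Hypothetical listener that records what the parser reports.
class TagCollector
  include REXML::StreamListener

  attr_reader :tags, :texts

  def initialize
    @tags = []
    @texts = []
  end

  # Callback for each opening tag.
  def tag_start(name, attrs)
    @tags << name
  end

  # Callback for character data; whitespace-only runs are ignored.
  def text(data)
    stripped = data.strip
    @texts << stripped unless stripped.empty?
  end
end

xml = "<collection><clothing title='Shirt'><type>Formal</type></clothing></collection>"

collector = TagCollector.new
REXML::Document.parse_stream(xml, collector)

puts collector.tags.inspect
puts collector.texts.inspect
```

Collecting into arrays keeps the listener easy to test, since the results can be inspected after parsing finishes.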
