XmlParser Class

  + Object
    + XmlParser

Description

The XmlParser class implements a small, fast, non-validating XML parser. It accepts any data stream as input and simply raises events as tags are encountered. This class is instantiable and derivable but not cloneable.

For simplicity, this parser ignores !DOCTYPE tags. Other <! tags, such as !ENTITY or !ELEMENT, are not supported, with the exception of <![CDATA[ sections, which are processed correctly. Standard HTML entities, such as &eacute;, are recognized.

Parsing XML content

An XML document is always structured as a tree. The parser uses an internal stack to remember its state while traversing the tree. Each time an opening tag is encountered, a new state is pushed onto the stack, along with some information such as attribute values and space handling mode, then a StartElement event is raised. Each time a closing tag is encountered, the EndElement event is raised, then the relevant state (the one at the top of the stack if the XML document is well-formed) is popped off the stack. Regular data, that is, text that is not part of any tag, is passed to your application through the CharData event.

At any moment, the stack reflects which tags have been opened and not closed yet. For example, consider the following file:

<html>
  <head>
    <title>Foo</title>
  </head>
  <body>
    <h1>Headline</h1>
    <p>Sample text</p>
  </body>
</html>

When the parser encounters Foo, the content of the stack is:

html
head
title

When the parser encounters Sample text, the content of the stack is:

html
body
p

And so on. From any event handler, you can read the Tag and Attributes properties to crawl the stack and determine which tags are currently open, while the StackDepth property returns the number of states on the stack. You can also use the UserData property to associate a user value with each state on the stack.
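
The stack contents listed above can be reproduced with a short sketch. It is written in Python and uses the standard xml.parsers.expat module as a stand-in for XmlParser, so the handler names and the explicit `stack` list are assumptions of the sketch, not HB++ members. It only illustrates the push-on-open, pop-on-close bookkeeping described in this section.

```python
import xml.parsers.expat

DOCUMENT = """<html>
  <head>
    <title>Foo</title>
  </head>
  <body>
    <h1>Headline</h1>
    <p>Sample text</p>
  </body>
</html>"""

def trace_stack(document):
    """Map each non-blank text run to the tags open when it was met."""
    stack = []   # tags opened and not closed yet, outermost first
    seen = {}    # text -> snapshot of the stack

    def start_element(tag, attrs):
        stack.append(tag)            # opening tag: push a state

    def end_element(tag):
        stack.pop()                  # closing tag: pop it again

    def char_data(text):
        if text.strip():             # ignore whitespace between tags
            seen[text.strip()] = list(stack)

    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True        # deliver each text run in one piece
    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.CharacterDataHandler = char_data
    parser.Parse(document, True)
    return seen

snapshots = trace_stack(DOCUMENT)
print(snapshots["Foo"])          # ['html', 'head', 'title']
print(snapshots["Sample text"])  # ['html', 'body', 'p']
```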

Typically, you'll build a finite state automaton whose transitions are driven by the StartElement event to process an XML file. Consider the following content:

<phonebook>
  <contact>
    <name>John</name>
    <phone>+1555713951</phone>
  </contact>
  <contact>
    <name>Jack</name>
    <phone>+1555253456</phone>
  </contact>
</phonebook>

The following code can be used to parse this file. Note that the state of the automaton is stored on the parser stack itself, using the UserData property. This removes the need to handle the EndElement event to restore the state of the automaton when a tag is closed:

Private Xml As New XmlParser  ' The parser itself
Private sName As String       ' The name of the contact being parsed
Private sPhone As String      ' The phone number of the contact being parsed

Private Sub Button1_Click()
  Dim vfs As New VFSVolume
  Dim f As New StreamFile

  ' Open the file on the expansion card and initiate 
  ' the parse.

  vfs.FindFirstVolume
  f.Open vfs.Reference, "/phonebook.xml", hbModeOpenExisting+hbModeReadOnly
  Xml.Parse f
  f.Close 
End Sub

Private Sub Xml_StartElement(ByVal sTag As String, ByRef eSpaceMode As HbSpaceMode)
  Dim state As Integer
  
  ' An opening tag is encountered. Determine the current state: if
  ' this is the first tag, it is zero, otherwise retrieve it from the
  ' parser stack.

  state = 0
  If Xml.StackDepth > 1 Then state = Xml.UserData(1)

  ' Depending on the state, check the tag name and switch
  ' to another state, or abort.

  Select Case state
    Case 0
      If sTag="phonebook" Then
        state = 1
      Else
        Xml.Abort
      End If
      
    Case 1
      If sTag="contact" Then
        sName = ""
        sPhone = ""
        state = 2
      Else
        Xml.Abort
      End If
      
    Case 2
      If sTag="name" Then
        state = 3
      ElseIf sTag = "phone" Then
        state = 4
      Else
        Xml.Abort
      End If
      
    Case Else
      Xml.Abort
  End Select
  
  ' Store the new state onto the parser stack, and set the space
  ' mode to hbSpaceTrim to trim all blank characters.

  Xml.UserData(0) = state
  eSpaceMode = hbSpaceTrim
End Sub

Private Sub Xml_EndElement(ByVal sTag As String)
  ' A closing </contact> tag is encountered. Simply
  ' display the retrieved name and phone number.

  If sTag = "contact" Then MsgBox "Name = " & sName & "\nPhone = " & sPhone
End Sub

Private Sub Xml_CharData(ByVal sText As String)
  ' Text data is encountered. Depending on the current state,
  ' store it as either the name or the phone number.

  Select Case Xml.UserData(0)
    Case 3
      sName = sText
    Case 4
      sPhone = sText
    Case Else
      Xml.Abort
  End Select
End Sub

In this example, the parser aborts if an unexpected tag or unexpected data is encountered; a more realistic example would add dedicated states to handle errors and recover properly after an unexpected tag.
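
One way such a recovery state could look is sketched below. This is Python with the standard xml.parsers.expat module standing in for XmlParser, not HB++ code: the `SKIP` state, the `TRANSITIONS` table and the `<email>` tag in the sample are all hypothetical. The idea is that an unexpected tag enters `SKIP`, no transition leaves `SKIP`, and popping the stack on the closing tag restores the enclosing state.

```python
import xml.parsers.expat

# Automaton states. SKIP is a hypothetical recovery state added for this sketch.
START, BOOK, CONTACT, NAME, PHONE, SKIP = range(6)

# (state of the enclosing tag, tag name) -> state of the new tag
TRANSITIONS = {
    (START, "phonebook"): BOOK,
    (BOOK, "contact"): CONTACT,
    (CONTACT, "name"): NAME,
    (CONTACT, "phone"): PHONE,
}

def parse_phonebook(document):
    """Return a list of (name, phone) pairs, skipping unexpected tags."""
    states = []     # one state per open tag, playing the role of UserData
    contacts = []
    fields = {}

    def start_element(tag, attrs):
        parent = states[-1] if states else START
        # No transition defined: enter SKIP. Nothing leaves SKIP, so
        # everything nested inside an unexpected tag is skipped as well.
        state = TRANSITIONS.get((parent, tag), SKIP)
        if state == CONTACT:
            fields.clear()
        states.append(state)

    def end_element(tag):
        # Popping restores the enclosing state, so recovery is automatic.
        if states.pop() == CONTACT:
            contacts.append((fields.get(NAME, ""), fields.get(PHONE, "")))

    def char_data(text):
        state = states[-1] if states else START
        if state in (NAME, PHONE):
            fields[state] = fields.get(state, "") + text
        # Text met in any other state is ignored instead of aborting.

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.CharacterDataHandler = char_data
    parser.Parse(document, True)
    return contacts

SAMPLE = """<phonebook>
  <contact>
    <name>John</name>
    <email><b>john</b>@example.com</email>
    <phone>+1555713951</phone>
  </contact>
</phonebook>"""

print(parse_phonebook(SAMPLE))   # [('John', '+1555713951')]
```

Whether silently skipping is appropriate depends on the application; a stricter variant could record each skipped tag and report it after the parse.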

The parser does not automatically concatenate successive text and <![CDATA[ sections. For example, if the document contains the following line:

<tag>This is <![CDATA[raw]]> text</tag>

Then the CharData event will be raised three times: first with the string "This is ", then with the string "raw", and finally with the string " text".
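
If the application needs the whole text of an element, it therefore has to join the fragments itself. The sketch below shows one possible approach: keep a buffer per open tag, append each fragment as it arrives, and join once when the tag is closed. It is Python using the standard xml.parsers.expat module as a stand-in for XmlParser; the function and variable names are assumptions of the sketch.

```python
import xml.parsers.expat

def element_texts(document):
    """Return (tag, full text) pairs, in the order the tags are closed."""
    buffers = []   # one list of fragments per open tag
    texts = []

    def start_element(tag, attrs):
        buffers.append([])              # fresh buffer for the new element

    def char_data(text):
        if buffers:
            buffers[-1].append(text)    # keep each fragment as it arrives

    def end_element(tag):
        # Join once, when the element is complete.
        texts.append((tag, "".join(buffers.pop())))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.CharacterDataHandler = char_data
    parser.Parse(document, True)
    return texts

result = element_texts("<tag>This is <![CDATA[raw]]> text</tag>")
print(result)   # [('tag', 'This is raw text')]
```

In the phonebook example, the equivalent change would be to append sText to sName or sPhone in the CharData handler rather than overwrite them; both variables are already reset when a contact tag is opened.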

Text encoding

The XmlParser class can handle UTF-8, UTF-16 in both big endian and little endian byte order, Latin-1, Latin-9 and Windows CP 1252 code pages. However, HB++ does not support wide characters, and all strings are first converted to the Palm OS code page before being returned to your code. The SubstChar property defines which character replaces characters in the input stream that cannot be mapped to the Palm OS code page.
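
The effect of a substitution character can be illustrated with a few lines of Python. This is only a sketch of the general technique: Windows CP 1252 is used as a stand-in for the Palm OS code page (an assumption, since the two may not match for every character), and `to_code_page` and its parameters are hypothetical names.

```python
def to_code_page(data, source_encoding, code_page="cp1252", subst="?"):
    """Convert encoded bytes to a single-byte code page.

    Characters that have no equivalent in the target code page are
    replaced by subst, in the spirit of the SubstChar property.
    """
    out = bytearray()
    for ch in data.decode(source_encoding):
        try:
            out += ch.encode(code_page)
        except UnicodeEncodeError:
            out += subst.encode(code_page)   # unmappable: use the substitute
    return bytes(out)

# "café" followed by a CJK character that CP 1252 cannot represent.
converted = to_code_page("caf\u00e9 \u65e5".encode("utf-8"), "utf-8")
print(converted)   # b'caf\xe9 ?'
```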

The encoding scheme is detected automatically when the first line of the document is read, either because a byte order mark (BOM) was found, because the encoding is unambiguous (UTF-16 big endian and little endian), or because it is specified in the initial <?xml... ?> tag. The Charset property can be used to change the encoding scheme afterward. If the scheme you specify does not match the one the file actually uses, the parser may crash.
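
The three detection routes can be sketched as follows. This is a Python illustration of the general technique, not the actual XmlParser implementation: `detect_encoding` is a hypothetical name, and the sketch returns None when nothing decides, because the default that XmlParser applies in that case is not stated here.

```python
import codecs
import re

def detect_encoding(head):
    """Guess the encoding from the first bytes of an XML document."""
    # 1. A byte order mark settles the question.
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if head.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if head.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    # 2. Without a BOM, "<?" encoded as UTF-16 is still unambiguous.
    if head.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    if head.startswith(b"<\x00?\x00"):
        return "utf-16-le"
    # 3. Otherwise, read the encoding given in the <?xml ... ?> tag.
    match = re.match(
        rb"<\?xml[^>]*?encoding\s*=\s*['\"]([A-Za-z0-9._-]+)['\"]", head)
    if match:
        return match.group(1).decode("ascii").lower()
    return None   # undecided: the caller has to choose

print(detect_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?>'))
# iso-8859-1
```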

Error handling

Errors are reported to your code through the ParseError event. Some errors, such as a missing closing tag, are recoverable: parsing continues after the error is reported. Other errors, such as a missing double quote (") around an attribute value, cause the parser to abort immediately.

Members

Members                 Description
Abort                   Interrupts the parser.
Attributes              Attributes associated with a tag.
CharData                Raised when the parser encounters raw data.
Charset                 Encoding scheme of the document.
EndElement              Raised when the parser encounters a closing tag.
Parse                   Starts parsing the document.
ParseError              Raised when an error occurs while parsing the document.
StackDepth              Current stack depth.
StartElement            Raised when the parser encounters an opening tag.
SubstChar               Substitution character.
Tag                     Tag name.
UserData                User data associated with a tag.

Inherited from Object   Description
ClassID                 Returns the type identifier corresponding to the actual class of the object.
Implements              Determines whether the object implements the features of a given class.
Iterate                 Event raised to iterate over the elements of a container object.
Recipient               Recipient of events sent by the object.
Serialize               Event raised to serialize the object content into a stream.

System requirements

System    Minimal version   Remarks
Palm OS   Palm OS 3.0       N/A