XmlParser Class
+ Object
+ XmlParser
Description
The XmlParser class implements a small, fast, non-validating XML parser. It accepts any data stream as input and simply raises events as tags are encountered. This class is instantiable and derivable but not cloneable.
For simplicity, this parser ignores !DOCTYPE tags. Other <! tags, such as !ENTITY or !ELEMENT, are not supported, with the exception of <![CDATA[ sections, which are correctly processed. Standard HTML entities, such as &eacute;, are recognized.
Parsing XML content
An XML document is always structured as a tree. The parser uses an internal stack to remember its state while traversing the tree. Each time an opening tag is encountered, a new state is pushed onto the stack, along with some information such as attribute values and space handling mode, then a StartElement event is raised. Each time a closing tag is encountered, the EndElement event is raised, then the relevant state (the one at the top of the stack if the XML document is well-formed) is popped off the stack. Regular data, that is, text that is not part of any tag, is passed to your application through the CharData event.
At any moment, the stack reflects which tags have been opened and not closed yet. For example, consider the following file:
<html>
  <head>
    <title>Foo</title>
  </head>
  <body>
    <h1>Headline</h1>
    <p>Sample text</p>
  </body>
</html>
When the parser encounters Foo, the content of the stack is:
html head title
When the parser encounters Sample text, the content of the stack is:
html body p
And so on. From any event handler, you can use the Tag and Attributes properties to crawl the stack and determine which tags are currently open, while the StackDepth property returns the number of states on the stack. You can also use the UserData property to associate a user-defined value with each state on the stack.
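The stack behaviour described above can be reproduced outside HB++. The following is a minimal sketch in Python, using the standard xml.parsers.expat module rather than the XmlParser class; the names DOC and trace_stack are hypothetical and exist only for this illustration. It records which tags are open each time non-blank text is seen.

```python
import xml.parsers.expat

# The sample document from the text above.
DOC = """<html>
<head>
<title>Foo</title>
</head>
<body>
<h1>Headline</h1>
<p>Sample text</p>
</body>
</html>"""


def trace_stack(document):
    """Return (text, open_tags) pairs, one per non-blank text run.

    Hypothetical helper: it models the stack described above and is
    not part of the HB++ XmlParser API.
    """
    stack = []  # currently open tags, outermost first
    seen = []

    def start_element(name, attrs):
        stack.append(name)  # opening tag: push a new state

    def end_element(name):
        stack.pop()  # closing tag: pop the state at the top

    def char_data(text):
        if text.strip():  # skip the line breaks between tags
            seen.append((text, list(stack)))

    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver each text run in one piece
    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.CharacterDataHandler = char_data
    parser.Parse(document, True)
    return seen


for text, tags in trace_stack(DOC):
    print(text, "->", " ".join(tags))
```

For the text Foo the recorded stack is html head title, and for Sample text it is html body p, matching the two cases listed above.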
Typically, you'll build a finite state automaton whose transitions are driven by the StartElement event to process an XML file. Consider the following content:
<phonebook>
  <contact>
    <name>John</name>
    <phone>+1555713951</phone>
  </contact>
  <contact>
    <name>Jack</name>
    <phone>+1555253456</phone>
  </contact>
</phonebook>
The following code can be used to parse this file. Note that the state of the automaton is stored on the parser stack itself, using the UserData property. This saves us from handling the EndElement event to restore the state of the automaton when a tag is closed:
Private Xml As New XmlParser ' The parser itself
Private sName As String ' The name of the contact being parsed
Private sPhone As String ' The phone number of the contact being parsed
Private Sub Button1_Click()
  Dim vfs As New VFSVolume
  Dim f As New StreamFile
  ' Open the file on the expansion card and start
  ' parsing.
  vfs.FindFirstVolume
  f.Open vfs.Reference, "/phonebook.xml", hbModeOpenExisting+hbModeReadOnly
  Xml.Parse f
  f.Close
End Sub
Private Sub Xml_StartElement(ByVal sTag As String, ByRef eSpaceMode As HbSpaceMode)
  Dim state As Integer
  ' An opening tag is encountered. Determine the current state: if
  ' this is the first tag, it is zero, otherwise retrieve it from the
  ' parser stack.
  state = 0
  If Xml.StackDepth > 1 Then state = Xml.UserData(1)
  ' Depending on the state, check the tag name and switch
  ' to another state, or abort.
  Select Case state
    Case 0
      If sTag="phonebook" Then
        state = 1
      Else
        Xml.Abort
      End If
    Case 1
      If sTag="contact" Then
        sName = ""
        sPhone = ""
        state = 2
      Else
        Xml.Abort
      End If
    Case 2
      If sTag="name" Then
        state = 3
      ElseIf sTag = "phone" Then
        state = 4
      Else
        Xml.Abort
      End If
    Case Else
      Xml.Abort
  End Select
  ' Store the new state on the parser stack, and set the space
  ' mode to hbSpaceTrim to trim all blank characters.
  Xml.UserData(0) = state
  eSpaceMode = hbSpaceTrim
End Sub
Private Sub Xml_EndElement(ByVal sTag As String)
  ' When a closing </contact> tag is encountered, simply
  ' display the retrieved name and phone number.
  If sTag = "contact" Then MsgBox "Name = " & sName & "\nPhone = " & sPhone
End Sub
Private Sub Xml_CharData(ByVal sText As String)
  ' Text data is encountered. Depending on the current state,
  ' store it as either the name or the phone number.
  Select Case Xml.UserData(0)
    Case 3
      sName = sText
    Case 4
      sPhone = sText
    Case Else
      Xml.Abort
  End Select
End Sub
In this example, the parser aborts if an unexpected tag or unexpected data is encountered; a more realistic example would use dedicated states to handle errors and recover properly after an unexpected tag.
The parser does not automatically concatenate successive text and <![CDATA[ sections. For example, if the document contains the following line:
<tag>This is <![CDATA[raw]]> text</tag>
Then the CharData event will be raised three times, once passing the string "This is ", another time passing the string "raw" and a last time passing the string " text".
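Because text can arrive in several pieces, a handler that needs the full content of an element usually has to accumulate the fragments and join them when the element closes. The following is a minimal sketch of that pattern in Python, using the standard xml.parsers.expat module in place of the XmlParser class; collect_text is a hypothetical helper, and it assumes the element of interest is not nested inside itself.

```python
import xml.parsers.expat


def collect_text(document, tag):
    """Return the full text of every <tag> element, fragments joined.

    Hypothetical helper for illustration; it is not part of the HB++
    XmlParser API.
    """
    fragments = []  # pieces of text received for the current element
    results = []

    def start_element(name, attrs):
        if name == tag:
            fragments.clear()  # start a fresh buffer

    def char_data(text):
        fragments.append(text)  # may be called several times per element

    def end_element(name):
        if name == tag:
            results.append("".join(fragments))  # join on the closing tag

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.CharacterDataHandler = char_data
    parser.EndElementHandler = end_element
    parser.Parse(document, True)
    return results


print(collect_text("<tag>This is <![CDATA[raw]]> text</tag>", "tag"))
```

In the HB++ example shown earlier, the equivalent change would be to append sText to sName or sPhone in the CharData handler instead of assigning it.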
Text encoding
The XmlParser class can handle UTF-8, UTF-16 in both big endian and little endian byte order, Latin-1, Latin-9 and Windows CP 1252 code pages. However, HB++ does not support wide characters, so all strings are first converted to the Palm OS code page before being returned to your code. The SubstChar property defines which character replaces characters in the input stream that cannot be mapped to the Palm OS code page.
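The effect of a substitution character can be pictured with a short Python sketch. It is only an illustration of the idea: the function to_code_page and its parameters are hypothetical, and Windows CP 1252 is used as a stand-in for the Palm OS code page, which is an assumption made for the example.

```python
def to_code_page(text, codec="cp1252", subst="?"):
    """Replace every character that `codec` cannot represent with `subst`.

    Hypothetical helper: cp1252 stands in for the Palm OS code page,
    and `subst` plays the role of the SubstChar property.
    """
    converted = []
    for ch in text:
        try:
            ch.encode(codec)  # representable in the target code page?
            converted.append(ch)
        except UnicodeEncodeError:
            converted.append(subst)  # not representable: substitute
    return "".join(converted)


# U+2192 (rightwards arrow) has no CP 1252 equivalent, so it is replaced.
print(to_code_page("caf\u00e9 \u2192 ok", subst="?"))
```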
The encoding scheme is detected automatically when the first line of the document is read: either a byte order mark (BOM) is found, or the encoding is not ambiguous (UTF-16 big endian and little endian), or it is specified in the initial <?xml... ?> tag. The Charset property can be used to change the encoding scheme afterward. If the scheme you specify does not match the scheme the file actually uses, the parser may crash.
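The three detection paths listed above (BOM, unambiguous byte pattern, declared encoding) can be sketched as follows. This is a generic Python illustration of common XML encoding sniffing, not the actual XmlParser implementation; the function name sniff_encoding, the returned labels and the UTF-8 fallback are all assumptions made for the example.

```python
import codecs
import re

# Matches encoding="..." or encoding='...' inside an initial <?xml ... ?> tag.
_DECLARED = re.compile(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def sniff_encoding(head, default="utf-8"):
    """Guess the encoding from the first bytes of a document.

    Hypothetical helper; the fallback to `default` is an assumption.
    """
    # 1. A byte order mark identifies the encoding directly.
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if head.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if head.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    # 2. Without a BOM, the bytes of the leading "<" reveal UTF-16.
    if head.startswith(b"\x00<"):
        return "utf-16-be"
    if head.startswith(b"<\x00"):
        return "utf-16-le"
    # 3. Otherwise, look for an encoding name in the initial <?xml ... ?> tag.
    match = _DECLARED.match(head)
    if match:
        return match.group(1).decode("ascii").lower()
    return default


print(sniff_encoding(b'<?xml version="1.0" encoding="ISO-8859-15"?><a/>'))
```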
Error handling
Errors are reported to your code through the ParseError event. Some errors, such as a missing closing tag, are recoverable: parsing continues after the error is reported. Other errors, such as a missing double quote (") around a tag attribute, cause the parser to abort immediately.
Members
| Members | Description |
| Abort | Interrupts the parser. |
| Attributes | Attributes associated with a tag. |
| CharData | Raised when the parser encounters raw data. |
| Charset | Encoding scheme of the document. |
| EndElement | Raised when the parser encounters a closing tag. |
| Parse | Starts parsing the document. |
| ParseError | Raised when an error occurs while parsing the document. |
| StackDepth | Current stack depth. |
| StartElement | Raised when the parser encounters an opening tag. |
| SubstChar | Substitution character. |
| Tag | Tag name. |
| UserData | User data associated with a tag. |
| Inherited from Object | Description |
| ClassID | Returns the type identifier corresponding to the actual class of the object. |
| Implements | Determines whether the object implements the features of a given class. |
| Iterate | Event raised to iterate over the elements of a container object. |
| Recipient | Recipient of events sent by the object. |
| Serialize | Event raised to serialize the object content into a stream. |
System requirements
| System | Minimal version | Remarks |
| Palm OS | Palm OS 3.0 | N/A |