alnds
February 20th, 2009, 12:19 AM
I have a large number of very large XML documents. I need to parse them and extract the content of the elements, but because the files are so large I intend to use a streaming, SAX-style parser rather than DOM. However, when I use XmlTextReader, it is difficult to parse elements with mixed content: some text, then child elements, then more text. How can I do this? I'm using C#.
I have put up an example of the type of XML file I am referring to.
<?xml version="1.0"?>
<catalog>
<book id="bk101">This is the first book by
<author>Gambardella, Matthew</author> which has the name
<title>XML Developer's Guide</title> on
<genre>Computer</genre>and is available at a price of
<price>$44.95</price>. It was published on
<publish_date>2000-10-01</publish_date>
in<place>London</place>
</book>
....
...
...
...
</catalog>
I need to extract the inner elements separately, and also the full text: "This is the first book by Gambardella, Matthew which has the name XML Developer's Guide on Computer and is available at a price of $44.95. It was published on 2000-10-01 in London."
Could someone please guide me on how to do this? A code snippet would be more than welcome. Thanks in advance to anyone who replies.
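For reference, here is a minimal sketch of one way this kind of mixed content could be handled with a forward-only reader. It is not the only approach. It assumes each <book> contains only text and one level of simple child elements, as in the sample above. The names MixedContentDemo, ReadBook and Normalize are made up for illustration. The idea is to append every text node to one buffer for the full sentence, and to also capture the text of whichever child element is currently open.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;
using System.Xml;

public static class MixedContentDemo
{
    // Hypothetical helper: call with the reader positioned on a <book> start tag.
    // Returns the flattened text of the book and stores each child element's
    // text in 'fields', keyed by element name.
    // Assumption: children are one level deep (no grandchildren).
    public static string ReadBook(XmlReader reader, IDictionary<string, string> fields)
    {
        StringBuilder fullText = new StringBuilder();
        StringBuilder childText = new StringBuilder();
        string currentChild = null;
        int bookDepth = reader.Depth;

        if (reader.IsEmptyElement) return string.Empty;

        while (reader.Read())
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Element:
                    // An empty element such as <x/> produces no EndElement node.
                    if (!reader.IsEmptyElement)
                    {
                        currentChild = reader.Name;
                        childText.Length = 0;
                    }
                    break;

                case XmlNodeType.Text:
                case XmlNodeType.CDATA:
                case XmlNodeType.Whitespace:
                case XmlNodeType.SignificantWhitespace:
                    // Every piece of text goes into the full sentence...
                    fullText.Append(reader.Value);
                    // ...and also into the open child element, if there is one.
                    if (currentChild != null) childText.Append(reader.Value);
                    break;

                case XmlNodeType.EndElement:
                    // An end tag reports the same Depth as its start tag.
                    if (reader.Depth == bookDepth) return Normalize(fullText.ToString());
                    if (currentChild != null)
                    {
                        fields[currentChild] = childText.ToString();
                        currentChild = null;
                    }
                    break;
            }
        }
        return Normalize(fullText.ToString());
    }

    // Collapses line breaks and runs of spaces into single spaces.
    private static string Normalize(string s)
    {
        return Regex.Replace(s, @"\s+", " ").Trim();
    }

    public static void Main()
    {
        // Small in-memory sample; for real files pass a file path to XmlReader.Create.
        string xml = @"<catalog><book id=""bk101"">This is the first book by
<author>Gambardella, Matthew</author> which has the name
<title>XML Developer's Guide</title>.</book></catalog>";

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            // After ReadBook returns, the reader sits on </book>, so this moves to the next book.
            while (reader.ReadToFollowing("book"))
            {
                string id = reader.GetAttribute("id");
                Dictionary<string, string> fields = new Dictionary<string, string>();
                string text = ReadBook(reader, fields);

                Console.WriteLine(id + ": " + text);
                foreach (KeyValuePair<string, string> f in fields)
                    Console.WriteLine("  " + f.Key + " = " + f.Value);
            }
        }
    }
}
```

Since only one book is held in memory at a time, this should scale to large files. Note that the reader adds no whitespace of its own: where the sample has no space between a tag and the neighbouring text (for example "</genre>and" or "in<place>"), the words come out joined in the flattened text unless a separator is inserted.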