Bad performance validating an XML file with XML Schemas.


miriamca
September 30th, 2005, 07:40 AM
Hi all,

I have been using Xerces-C++ 2.6.0 to validate some XML files against XML Schemas. I have noticed that Xerces performance drops sharply when the XML file references a bigger XML Schema.

To keep things simple, I ran some tests with the SAXPrint sample distributed with Xerces, trying different XML files and different XML Schemas. With Schemas of about 100-300 lines there is no problem parsing and validating the XML file. But one of my XML Schemas is about 1100 lines (only somewhat bigger), and with it the SAXPrint sample takes an hour to run. All of my XML files are small.

These are the options I pass to SAXPrint.exe:

SAXPrint.exe -v=always -n -s "myFile.xml"

-v=xxx Validation scheme [always | never | auto*].
-n Enable namespace processing.
-s Enable schema processing.

Does anybody know why I get such poor performance when I use bigger XML Schemas? Any ideas on how to improve it?

Thanks in advance.

Benjay
September 30th, 2005, 10:09 AM
Try using the SCMPrint project and run the test. It's specifically designed to read schemas instead of plain XML files and I suspect (without having tested it myself) that it may have better performance results with complex schemas.

That being said, I have noticed performance issues with the Xerces library and complex schemas when using code very similar to that in the SCMPrint example. It takes up to 12 minutes to load the grammar for the schema I am parsing. However, that schema is very complex (many thousands of lines), so I think some delay is to be expected while it loads into memory.

Does anyone know if it's possible to validate an XML file against a schema (using Xerces) without loading the entire schema into memory all at once?

miriamca
October 5th, 2005, 06:13 AM
Thank you for your reply. I tried what you suggested. I ran some tests with the SCMPrint project, passing it my Schema (the biggest one I have), and these are my results.

If I just specify the Schema file, for example:

SCMPrint.exe "mySchema.xsd"

It prints the Schema immediately. But when I add the -f option (enable full schema constraint checking), it takes an hour to print it. This looks like the same slowdown I get when I parse the XML file and validate it against the Schema. That is strange, because the "full schema checking" feature is not enabled when I parse my XML file with validation. Maybe there is some kind of bug in Xerces.
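
To narrow down where the hour goes, it may help to time the schema load and the document parse separately, once with full checking on and once with it off. Below is a minimal sketch in standard C++ only. `timePhase` is a hypothetical helper name made up for this post, not part of Xerces, and the Xerces calls themselves are left out so that the snippet compiles on its own.

```cpp
#include <chrono>
#include <functional>
#include <iostream>
#include <string>

// Hypothetical helper (not a Xerces API): runs `phase` once, prints the
// elapsed wall-clock time to stderr, and returns it in seconds.
double timePhase(const std::string& label, const std::function<void()>& phase) {
    const auto start = std::chrono::steady_clock::now();
    phase();  // e.g. a lambda wrapping the grammar load or the parse call
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    std::cerr << label << ": " << elapsed.count() << " s\n";
    return elapsed.count();
}
```

In a small test program, one phase would wrap the call that loads the Schema (for example the parser's `loadGrammar`) and another would wrap `parse` on the XML file. If almost all of the time lands in the Schema load, that would suggest the slowdown comes from processing the Schema itself and not from the size of the XML file.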

Any clue or idea will be welcome.