  1. #1
    Join Date
    Sep 2014

    three way merging of xml documents

    I want to use Java to implement a three-way merge tool for slx files. slx is a compressed file format that mainly contains XML documents.
    The XML files are pretty small: several are around 1-2 kB, and I have yet to come across any larger than 200 kB (assume 1 MB to be safe).

    I want the software to show the differences between the files and let the user choose parts from both documents, while automatically merging non-conflicting changes.

    What I can't figure out is what to use. As I understand it, I could use DOM, SAX, XSLT, or XPath + XQuery to work with the documents. From reading up on these I have gotten this far:
    DOM - Creates a tree structure in memory (feels intuitively correct) and lets you manipulate it. The big drawback is memory consumption, since the tree can be a lot bigger than the original XML file. Even though it is intuitive, people at XML conventions seem to agree that DOM is not recommended for anyone.
    SAX - Reads the document once, is fast and fairly simple. Requires a lot of code (?). Is used in other three-way merge tools such as 3dm. I don't find this intuitive at all, since I haven't understood what I actually have to work with afterwards.
    ---- I realize that XSLT and XPath are query languages whereas DOM is a model, though I haven't really figured out what difference that makes to me yet. Most of my work will be designing the algorithms that merge the documents; what I really need is to get a reference to the elements and attributes so I can compare them -----
    XSLT - Was recently recommended to me, and it has its own merge function. This however does not support three-way merge, so that would still have to be written. Other than this I have no insight into XSLT.
    XPath + XQuery - Seems simple enough and fairly intuitive. Sort of like DOM in creating a tree to work with (?), but seems to be better with memory, and there are quite a lot of easy tutorials. However, it seems like a lot of work to use these to produce a three-way merge tool?

    As you can see, I have never worked with any of these, and a month ago I had no idea what an XML document was, nor what the difference between a two-way and a three-way merge was, so it is likely I have misunderstood one or two things along the way. Any recommendations on what I should choose, and why, would be much appreciated.

  2. #2
    Join Date
    Apr 2000
    Belgium (Europe)

    Re: three way merging of xml documents

    DOM: reads the entire XML into memory. You can then query the DOM via XPath. The advantage is that you can go back and forth as many times as you want. The disadvantage is that it needs enough memory for the entire DOM to be loaded.
    SAX: reads the XML linearly and calls back into your handler each time it encounters a processable element.
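
    To make the difference concrete, here is a minimal sketch using only the JDK's built-in parsers. The sample document, the `block` element and the `id` attribute are made up for illustration; real slx contents will look different. Both methods collect the same ids: the DOM version loads the tree and queries it with XPath, the SAX version gets a callback per element during a single pass.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DomSaxDemo {
    // Hypothetical sample input, not an actual slx file.
    static final String XML =
        "<model><block id=\"a\" gain=\"1\"/><block id=\"b\" gain=\"2\"/></model>";

    // DOM: load the whole tree, then query it with XPath as often as needed.
    static List<String> idsViaDom(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        NodeList blocks = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate("//block", doc, XPathConstants.NODESET);
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < blocks.getLength(); i++) {
            ids.add(((Element) blocks.item(i)).getAttribute("id"));
        }
        return ids;
    }

    // SAX: one forward pass; the parser calls startElement for every element.
    static List<String> idsViaSax(String xml) throws Exception {
        List<String> ids = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("block".equals(qName)) {
                    ids.add(attrs.getValue("id"));
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return ids;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(idsViaDom(XML));
        System.out.println(idsViaSax(XML));
    }
}
```

    With DOM you keep `doc` around and can revisit any node later; with SAX whatever you did not store in the handler is gone once the pass is over.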

    XSLT: a technology built on XPath and a DOM-like tree model, which lets you "transform" an XML tree into a document laid out the way you want it.

    XQuery is something similar to XSLT, but it uses a different approach. I'm not really familiar with it.
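
    As a rough illustration of what "transform" means here, the sketch below runs a tiny stylesheet through the JDK's `javax.xml.transform` API. The stylesheet and the `block`/`id`/`gain` names are invented for the example; it only flattens the input into text and does no merging.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltDemo {
    // Hypothetical sample input, not an actual slx file.
    static final String XML =
        "<model><block id=\"a\" gain=\"1\"/><block id=\"b\" gain=\"2\"/></model>";

    // Hypothetical stylesheet: emits "id=gain;" for every block element.
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/\">"
        + "<xsl:for-each select=\"//block\">"
        + "<xsl:value-of select=\"@id\"/>=<xsl:value-of select=\"@gain\"/>;"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Compile the stylesheet, apply it to the input, return the result text.
    static String transform(String xml, String xslt) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(XML, XSLT));
    }
}
```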

    A 3-way merge means you either:
    process the inputs into partial outputs, then merge the partial outputs into the end result
    merge the inputs into an intermediate, then process the intermediate into the end result.

    There are advantages and disadvantages to both. Which is going to be easier depends on the technology path (SAX or DOM based), how different the inputs are, and how the output is structured. Too many variables to give you any kind of clear guidance.
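
    Whichever path you take, the per-value decision usually comes down to the same rule: if only one side changed a value relative to the common ancestor, take that side's value; if both changed it differently, it is a conflict for the user. Below is a minimal sketch of that rule applied to one element's attributes. The names `base`, `ours` and `theirs`, and the choice to leave conflicting attributes out of the result, are assumptions for the example and not something prescribed by any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class ThreeWayAttrMerge {
    // Merges the attributes of one element. A missing key means "attribute absent".
    // Conflicting attribute names are appended to 'conflicts' and left out of the result.
    static Map<String, String> merge(Map<String, String> base,
                                     Map<String, String> ours,
                                     Map<String, String> theirs,
                                     List<String> conflicts) {
        Set<String> names = new TreeSet<>(base.keySet());
        names.addAll(ours.keySet());
        names.addAll(theirs.keySet());

        Map<String, String> merged = new TreeMap<>();
        for (String name : names) {
            String b = base.get(name);
            String o = ours.get(name);
            String t = theirs.get(name);
            String winner;
            if (Objects.equals(o, t)) {
                winner = o;            // both sides agree (or neither changed)
            } else if (Objects.equals(o, b)) {
                winner = t;            // only theirs changed
            } else if (Objects.equals(t, b)) {
                winner = o;            // only ours changed
            } else {
                conflicts.add(name);   // both changed differently: ask the user
                continue;
            }
            if (winner != null) {      // null means the attribute was deleted
                merged.put(name, winner);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> conflicts = new ArrayList<>();
        Map<String, String> merged = merge(
            Map.of("gain", "1", "name", "x", "old", "z", "color", "red"),
            Map.of("gain", "2", "name", "x", "old", "z", "color", "blue"),
            Map.of("gain", "1", "name", "y", "color", "green"),
            conflicts);
        System.out.println(merged + " conflicts=" + conflicts);
    }
}
```

    The same comparison can be applied to element text or to whole child elements once you have decided how elements in the three documents are matched up, which is the harder part of the problem.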

    You don't have to use XSLT or XQuery; instead you can write your own purpose-specific code to extract the data you need for the output. It comes down to whether you want to focus on using existing technologies or want to get something done as fast as possible (just programming it all might be easier if you would otherwise have to learn something like XSLT first).
    You could even forgo DOM, SAX and XPath entirely and write your own XML parser (it's a lot harder than it seems, especially if you need to handle namespaces).
    There are also other lightweight APIs that load XML and offer simple means to query the contents without having to learn XPath (which is the usual way to query a DOM, though you can also walk the tree by hand).

  3. #3
    Join Date
    Sep 2014

    Re: three way merging of xml documents

    I guess you are right. Not the answer I was hoping for, since it all seems so massive to get into at times. I don't want to get into writing my own parser... That seems to be way out of my league. I'm not proficient in programming (yet) and I don't have any experience in handling different file formats, so just reading about XML makes me very happy someone has done the grunt work for me. I found some really good material on XML that discusses almost everything I asked. It is an entire book, but if anyone gets stuck with something similar I would recommend getting into it. Hopefully the next person has more programming knowledge, XML knowledge and maybe even some parsing experience, and then doesn't have to read the whole thing.
    Anyways I figure I'd share this:
    I hope it will get me somewhere. Back to the book.
