  1. #1
    Join Date
    Aug 2007
    Posts
    43

    Newline parsing?

    I'm creating an XML parser (please don't suggest an existing one; the point is that I'm writing it myself). I realized that fseek steps through the exact bytes of a file, but when you fread, 0x0A0D (the newline character; for some reason it's 2 bytes long) gets translated to 0x0A. It's very annoying: the file MIGHT have contained just 0x0A (not 0x0A0D), so I can't simply add 1 to my fseek offset whenever I encounter a 0x0A in the data I read, but I can't ignore it either, because that throws the offsets off. Is there any way around this newline translation problem?

  2. #2
    Join Date
    Jul 2008
    Location
    dalian, China
    Posts
    36

    Re: Newline parsing?

    Are all the newline characters in your data file 0x0A?

    Try treating 0x0A as a token!
    Cigagou,Cogitou!

  3. #3
    Join Date
    Aug 2007
    Posts
    43

    Re: Newline parsing?

    Err, I'm not sure what you meant, but opening the file in binary mode solved it.

    So instead of
    Code:
    fopen_s(&file,"filename","mode")
    do
    Code:
    fopen_s(&file,"filename","modeb");
    (Not a typo: append a 'b' to whatever mode string you use, so "r" becomes "rb", to open the file in binary mode; 't' selects text mode.)

    Oh, and I use fopen_s, but for those of you who use fopen:

    Code:
    file = fopen(filename,mode);
    becomes

    Code:
    fopen_s(&file,filename,mode);
    so yeah..
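
    For anyone finding this later, here is a minimal sketch of the binary-mode approach, using plain fopen with "rb" so it is not tied to fopen_s. The function name read_file_binary is made up for illustration and is not from my parser. It assumes the file fits in memory and that fseek to SEEK_END works on the stream, which holds on common platforms. Because nothing is translated, an index into the returned buffer is the same as the fseek offset in the file.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read a whole file in binary mode ("rb") so that
   no newline translation happens and buffer indexes match file offsets.
   Returns a malloc'd, NUL-terminated buffer (caller frees) or NULL. */
unsigned char *read_file_binary(const char *path, size_t *out_len)
{
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;

    /* Find the size in bytes by seeking to the end. */
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long size = ftell(f);
    if (size < 0) { fclose(f); return NULL; }
    rewind(f);

    unsigned char *buf = malloc((size_t)size + 1);
    if (!buf) { fclose(f); return NULL; }

    /* In binary mode the number of bytes read equals the on-disk size. */
    size_t got = fread(buf, 1, (size_t)size, f);
    fclose(f);
    if (got != (size_t)size) { free(buf); return NULL; }

    buf[got] = '\0';
    *out_len = got;
    return buf;
}
```

    With a file containing the five bytes 'a', 0x0D, 0x0A, 'b', 0x0A, this reports a length of 5 and keeps the 0x0D in place, so the parser has to deal with it itself.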

  4. #4
    Join Date
    Mar 2002
    Location
    St. Petersburg, Florida, USA
    Posts
    12,125

    Re: Newline parsing?

    This is standard C functionality, dating back at least 27 years.

    The text mode processing is important because different systems use different mechanisms to differentiate lines in a text file. There are 3 common patterns:

    1) 0x0D,0x0A // <CR><LF>
    2) 0x0D // <CR>
    3) 0x0A // <LF>
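
    If you read the file in binary mode, a parser can recognize all three patterns itself. A minimal sketch, assuming the data is already in a byte buffer (the name newline_length is hypothetical): it reports how many bytes the line terminator at a given position occupies, so the caller can advance its offset by the right amount.

```c
#include <stddef.h>

/* Hypothetical helper: length in bytes of the line terminator that
   starts at buf[pos]. Returns 2 for CR LF, 1 for a lone CR or a lone
   LF, and 0 if buf[pos] is not a line terminator or pos is out of range. */
size_t newline_length(const unsigned char *buf, size_t len, size_t pos)
{
    if (pos >= len) return 0;
    if (buf[pos] == 0x0D) {
        /* CR: check whether an LF follows to form the two-byte pattern. */
        return (pos + 1 < len && buf[pos + 1] == 0x0A) ? 2 : 1;
    }
    return buf[pos] == 0x0A ? 1 : 0;
}
```

    Checking for the two-byte pattern first matters: otherwise a <CR><LF> pair would be counted as two separate line ends.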

    The use of a two-character sequence dates back to older printers. <CR> moved the print head back to the left, and <LF> rolled the platen one line. The two operations were completely independent and had to be performed in sequence.

    btw: you always sent <0D><0A> and not <0A><0D> because of performance. It typically took longer for the head to move from the right margin back to the left than it took the platen to advance. This order allowed the two mechanical operations to overlap better. Yes, it really mattered in the days of 110 baud (0.000110 MHz).

    However in a computer program, the extra character meant more memory. (Especially when a computer with 16K was considered extremely large, costing well over $100K - and that is before the last 40 years of inflation!)

    Since text was stored in one of these two styles, the "t" (text) mode was designed to convert input to the form best suited to processing (removing a 0x0D only if it was followed by a 0x0A) and to make output best suited to printing (inserting a 0x0D before each 0x0A). Remember, most machines had no CRTs in those days!
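
    The input half of that conversion can be sketched in a few lines of C. This is only an illustration of the rule described above (drop a 0x0D only when a 0x0A follows it), not the runtime library's actual code, and the name crlf_to_lf is hypothetical. It works in place on a buffer that was read in binary mode.

```c
#include <stddef.h>

/* Hypothetical helper: collapse every CR LF pair to a single LF, in
   place. A CR that is not followed by LF is kept. Returns the new
   length of the data in buf. */
size_t crlf_to_lf(unsigned char *buf, size_t len)
{
    size_t out = 0;
    for (size_t in = 0; in < len; in++) {
        /* Skip a CR only when the very next byte is an LF;
           the LF itself is copied on the next iteration. */
        if (buf[in] == 0x0D && in + 1 < len && buf[in + 1] == 0x0A)
            continue;
        buf[out++] = buf[in];
    }
    return out;
}
```

    Note that the result is shorter than the original data whenever a pair is collapsed, which is exactly why offsets into text-mode data stop matching fseek positions in the file.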
    TheCPUWizard is a registered trademark, all rights reserved. (If this post was helpful, please RATE it!)
    2008, 2009,2010
    In theory, there is no difference between theory and practice; in practice there is.

  5. #5
    Join Date
    Aug 2007
    Posts
    43

    Re: Newline parsing?

    ah lol

    oh you're right it's 0x0D0A. ah well, I got it.

    thanks everybody