-
July 11th, 2008, 11:19 PM
#1
Newline parsing?
I'm writing an XML parser (please don't suggest an existing one; the point is that I'm making it myself), and I realized that fseek steps through the exact bytes of a file, but when you fread, 0x0A0D (the newline character; for some reason it's 2 bytes long) gets translated to 0x0A. It's very annoying: the file MIGHT have contained just 0x0A (not 0x0A0D), so I can't simply add 1 to my fseek offset whenever I encounter a 0x0A in the data I read, but I can't ignore it either, because that throws the offset off. Is there any way around this newline translation problem?
-
July 12th, 2008, 12:09 AM
#2
Re: Newline parsing?
Is every newline in your parsed data just 0x0A?
Try treating 0x0A as the token!
Cigagou,Cogitou!
-
July 12th, 2008, 12:24 AM
#3
Re: Newline parsing?
err I'm not sure what you meant, but parsing it as a binary file solved it.
so instead of
Code:
fopen_s(&file,"filename","mode");
do
Code:
fopen_s(&file,"filename","modeb");
(not a typo: append a 'b' to the mode string to make it binary, e.g. "rb" instead of "r"; 't' is for text)..
oh and I use fopen_s.. but for those of you who use fopen:
Code:
file = fopen(filename,mode);
becomes
Code:
fopen_s(&file,filename,mode);
so yeah..
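For anyone landing here later, a minimal sketch of why the 'b' matters. It uses plain fopen so it compiles outside MSVC too; bytes_seen_by_fread and write_raw are hypothetical helper names made up for this illustration, not anything from the original parser.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns how many bytes fread() hands back for the
   whole file when it is opened with the given mode, or -1 on failure. */
long bytes_seen_by_fread(const char *path, const char *mode)
{
    assert(path != NULL && mode != NULL);

    FILE *file = fopen(path, mode);
    if (file == NULL)
        return -1;

    char buf[256];
    long total = 0;
    size_t n;
    /* Keep reading until fread() returns nothing more. */
    while ((n = fread(buf, 1, sizeof buf, file)) > 0)
        total += (long)n;

    fclose(file);
    return total;
}

/* Hypothetical helper: writes raw bytes with "wb" so nothing is translated
   on the way out. Returns 0 on success, -1 on failure. */
int write_raw(const char *path, const char *data, size_t len)
{
    assert(path != NULL && data != NULL);

    FILE *file = fopen(path, "wb");
    if (file == NULL)
        return -1;

    size_t written = fwrite(data, 1, len, file);
    fclose(file);
    return written == len ? 0 : -1;
}
```

If you write the 5 bytes "a\r\nb\n" with write_raw and read them back with mode "rb", the count matches the on-disk size, so fread and fseek agree. With mode "r" on Windows the 0x0D 0x0A pair should collapse to one byte and the count comes up short; on Unix-like systems both modes behave the same.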
-
July 12th, 2008, 12:44 AM
#4
Re: Newline parsing?
This is standard "C" functionality, dating back at least 27 years....
The text mode processing is important because different systems use different mechanisms to differentiate lines in a text file. There are 3 common patterns:
1) 0x0D,0x0A // <CR><LF>
2) 0x0D // <CR>
3) 0x0A // <LF>
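A rough sketch of how the three patterns could be told apart in a buffer that was read in binary mode. The struct and function names (newline_counts, count_newlines) are invented for this example, and it assumes a 0x0D immediately followed by 0x0A always counts as one pattern-1 line ending.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tally of the three line-ending patterns listed above. */
struct newline_counts {
    size_t crlf; /* 0x0D,0x0A */
    size_t cr;   /* lone 0x0D */
    size_t lf;   /* lone 0x0A */
};

/* Scans raw bytes and counts each line-ending style. */
struct newline_counts count_newlines(const unsigned char *data, size_t len)
{
    struct newline_counts c = {0, 0, 0};
    assert(data != NULL || len == 0);

    for (size_t i = 0; i < len; i++) {
        if (data[i] == 0x0D) {
            if (i + 1 < len && data[i + 1] == 0x0A) {
                c.crlf++;
                i++; /* skip the 0x0A that belongs to this pair */
            } else {
                c.cr++;
            }
        } else if (data[i] == 0x0A) {
            c.lf++;
        }
    }
    return c;
}
```

Run over the first few KB of a file, a tally like this is one way to guess which convention the file uses before parsing it.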
The use of a two-character sequence dates back to older printers. <CR> moved the print head back to the left; <LF> rolled the platen one line. The two operations were completely independent and had to be performed in sequence.
btw: you always sent <0D><0A> and not <0A><0D> because of performance. It typically took longer for the head to move from the right margin back to the left than it took the platen to advance. This order allowed the two mechanical operations to overlap better. Yes, it really mattered in the days of 110 baud (0.000110 MHz).
However, in a computer program the extra character meant more memory. (Especially when a computer with 16K was considered extremely large, costing well over $100K - and that is before the last 40 years of inflation!)
Since files came in one of these styles, the "t" (text) mode was designed to convert input to the form best suited to processing (removing a 0x0D only if it was followed by a 0x0A) and to make output best suited to printing (inserting a 0x0D before each 0x0A). Remember, most machines had no CRTs in those days!
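The input half of that conversion can be sketched by hand for a buffer read in binary mode. This is only an illustration of the rule described above (drop a 0x0D only when a 0x0A follows it), not how any particular C runtime implements it; collapse_crlf is a made-up name.

```c
#include <assert.h>
#include <stddef.h>

/* Collapses every 0x0D,0x0A pair to a single 0x0A in place and returns the
   new length. A 0x0D that is not followed by 0x0A is left untouched. */
size_t collapse_crlf(unsigned char *data, size_t len)
{
    assert(data != NULL || len == 0);

    size_t out = 0;
    for (size_t in = 0; in < len; in++) {
        if (data[in] == 0x0D && in + 1 < len && data[in + 1] == 0x0A)
            continue; /* drop the CR; the LF is copied on the next pass */
        data[out++] = data[in];
    }
    return out;
}
```

Doing the conversion yourself like this means you always know how many raw bytes were consumed, which is exactly the information text mode hides from fseek.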
TheCPUWizard is a registered trademark, all rights reserved. (If this post was helpful, please RATE it!)
-
July 12th, 2008, 11:01 AM
#5
Re: Newline parsing?
ah lol
oh you're right it's 0x0D0A. ah well, I got it.
thanks everybody