Read binary file with line delimeter - Page 6

Thread: Read binary file with line delimeter

  1. #76
    Join Date
    Oct 2013
    Posts
    63

    Re: Read binary file with line delimeter

    Hello again 2kaud and Paul,

    After building the project as Paul said, the speed is better, but it still seems very slow compared with the small Ruby code I mentioned in my first post. I'm not sure why: the Ruby code processes a 2 GB file in 7 minutes and a 20 MB file in a few seconds, but I tested the C++ code with a 20 MB file, waited for more than 10 minutes, and only 10% of the file had been processed.

    I have no idea what to modify to improve the processing speed properly.

    Maybe you have an idea.

    Thanks again.

  2. #77
    Join Date
    Apr 1999
    Posts
    27,427

    Re: Read binary file with line delimeter

    Quote Originally Posted by Philidor
    You're correct. Thank you. I only did a build (F7), and when I run that *.exe file it runs faster.
    That doesn't indicate whether you built a release version.

    Go to the configuration setting and change it to "Release". Then rebuild the application.
    Another question: where in the Visual C++ project should the calls that write to the screen be removed?
    You remove them from your code. Whatever you write in the code is what is going to be done. There is no hidden "write to screen" in the C++ language.

    Regards,

    Paul McKenzie

  3. #78
    Join Date
    Apr 1999
    Posts
    27,427

    Re: Read binary file with line delimeter

    Quote Originally Posted by Philidor
    but it still seems very slow compared with the small Ruby code I mentioned in my first post. I'm not sure why: the Ruby code processes a 2 GB file in 7 minutes
    Realize that the people who created the Ruby system are professional programmers; the standard Ruby interpreter is written in C. Compare that with your experience, which is at beginner level. There is very little chance at your level that you're going to create optimal code that matches anything an experienced C++ programmer can produce.

    For example, it seems you're reading one or two bytes at a time and then processing those one or two bytes. Maybe the bottleneck is that you're only reading one or two bytes at a time, when the speedup would be to read, say, 1 megabyte of data and then process that 1 meg of data in memory. But before doing anything, you should profile your code to figure out what is slow or fast, then you optimize that area of the code.
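    A minimal sketch of the block-reading idea, assuming only the FF 77 delimiter from this thread matters: the file is read 1 MB at a time with std::ifstream::read() and each chunk is scanned in memory. The function name countDelimiters is hypothetical, and it only counts fields rather than building them, to keep the example short.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <vector>

// Sketch: read a binary file in large chunks and scan each chunk in memory,
// counting the two-byte delimiter FF 77. One read() call per chunk replaces
// one get() call per byte. A delimiter split across two chunks is still
// found because the loop remembers whether the previous byte was 0xFF.
// Returns -1 if the file cannot be opened.
long long countDelimiters(const char* path, std::size_t chunkSize = 1 << 20)
{
    std::ifstream ifs(path, std::ios::binary);
    if (!ifs)
        return -1;
    if (chunkSize == 0)
        chunkSize = 1;                          // avoid an endless zero-byte read loop

    std::vector<char> buf(chunkSize);
    long long count = 0;
    bool prevWasFF = false;                     // last byte seen was 0xFF

    for (;;) {
        ifs.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        const std::streamsize n = ifs.gcount(); // bytes actually read this time
        for (std::streamsize i = 0; i < n; ++i) {
            const unsigned char b = static_cast<unsigned char>(buf[static_cast<std::size_t>(i)]);
            if (prevWasFF && b == 0x77) {
                ++count;                        // found FF 77
                prevWasFF = false;              // this 77 belongs to the delimiter
            } else {
                prevWasFF = (b == 0xFF);
            }
        }
        if (!ifs)                               // short read: end of file (or an error)
            break;
    }
    return count;
}
```

    In a real version, the inner loop would append each byte's hex digits to the current field and start a new field at every FF 77; the point is that the file is touched once per megabyte rather than once per byte.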

    Regards,

    Paul McKenzie

  4. #79
    Join Date
    Dec 2012
    Location
    England
    Posts
    2,389

    Re: Read binary file with line delimeter

    The sample code I posted was not the most efficient way of doing it (as it reads one byte at a time from the file), but it was the easiest to explain and understand. I would consider using Windows I/O completion ports for reading blocks of data from the file, but this is an advanced Windows concept.

    If you want to stop processing when FF78 is found, a simple change to getField could be:

    Code:
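    bool FileFields::getField(bVec& field, WORD delim)
    {
    char	by;
    bool	cont = true;
    BYTE	next;
    
    	field.clear();
    
    	if (!opened || !ifs.good())
    		return false;
    
    	// Read one byte at a time until the delimiter (default FF77) is found.
    	// When cont becomes false, the loop's final ifs.get() consumes the
    	// delimiter's second byte.
    	for (ifs.get(by); cont && ifs.gcount(); ifs.get(by)) {
    		if ((BYTE)by == HIBYTE(delim)) {
    			next = (BYTE)ifs.peek();	// look at the next byte without consuming it
    			if (next == LOBYTE(delim))
    				cont = false;		// end of this field
    			else if (next == 0x78)
    				return false;		// FF78 found: stop processing
    		}
    
    		if (cont)
    			field.push_back(by);
    	}
    
    	return true;
    }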
    All advice is offered in good faith only. You are ultimately responsible for effects of your programs and the integrity of the machines they run on.

  5. #80
    Join Date
    Dec 2012
    Location
    England
    Posts
    2,389

    Re: Read binary file with line delimeter

    The way the program has evolved means that there is probably a lot of unnecessary converting and copying going on: getField returns a vector of BYTE, which is then converted into a string of hex. This all takes time. One simple performance improvement would be to have getField return the required hex string directly and remove conString. This would give you a class FileFields such as

    Code:
    #include <iostream>
    #include <fstream>
    #include <vector>
    #include <iomanip>
    #include <sstream>
    #include <string>
    using namespace std;
    
    typedef unsigned char BYTE;
    typedef unsigned short int WORD;
    
    const char hconv[16] = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};
    
    #ifndef LOBYTE
    	#define LOBYTE(w)	((BYTE)((WORD)(w) & 0xff))
    #endif
    
    #ifndef HIBYTE
    	#define HIBYTE(w)	((BYTE)((WORD)(w) >> 8))
    #endif
    
    class FileFields
    {
    private:
    	ifstream	ifs;
    	bool		opened;
    
    public:
    	FileFields() : opened(false) {}
    
    	~FileFields() {
    		if (opened)
    			ifs.close();
    	}
    
    	bool open(const char* name);
    
    	bool getField(string& field, WORD delim = 0xFF77);
    };
    
    bool FileFields::open(const char* name) {
    	ifs.open(name, ios::binary);
    	return (opened = ifs.is_open());
    }
    
    bool FileFields::getField(string& field, WORD delim)
    {
    char	by;
    
    bool	cont = true;
    
    BYTE	next;
    
    	field = "";
    
    	if (!opened || !ifs.good())
    		return false;
    
    	for (ifs.get(by); cont && ifs.gcount(); ifs.get(by)) {
    		if ((BYTE)by == HIBYTE(delim))
    			if ((next = (BYTE)ifs.peek()) == LOBYTE(delim))
    				cont = false;
    			else
    				if (next == 0x78) {
    					cont = false;
    					ifs.setstate(ios::failbit);
    				}
    
    		if (cont) {
    			field += hconv[by >> 4];
    			field += hconv[by & 0xf];
    		}
    	}
    
    	return true;
    }
    and a new main() such as

    Code:
    int main()
    {
    FileFields	ff;
    
    	if (!ff.open("C:\\binary.txt")) {
    		cout << "Cannot open file!" << endl;
    		return 1;
    	}
    
    string block;
    
    	for (int blk = 0; ff.getField(block); blk++) {
    		if (blk == 0) continue;
    
    		string outstr;
    		OutString(block, outstr);
    		cout << outstr << endl;
    		//cout << "Block "<< blk << endl << block << endl;
    	}
    	//Only to pause display in Visual Studio
    	//string name;
    	//getline(cin, name);
    	///////////////////////////////////////////////////
    	return 0;
    }
    Last edited by 2kaud; October 22nd, 2013 at 05:01 PM. Reason: Changed terminating condition

  6. #81
    Join Date
    Dec 2012
    Location
    England
    Posts
    2,389

    Re: Read binary file with line delimeter

    Quote Originally Posted by Philidor
    Hello again 2kaud and Paul,

    After building the project as Paul said, the speed is better, but it still seems very slow compared with the small Ruby code I mentioned in my first post. I'm not sure why: the Ruby code processes a 2 GB file in 7 minutes and a 20 MB file in a few seconds, but I tested the C++ code with a 20 MB file, waited for more than 10 minutes, and only 10% of the file had been processed.

    I have no idea what to modify to improve the processing speed properly.

    Maybe you have an idea.

    Thanks again.
    After changing the code as suggested above, try commenting out the two lines OutString() and cout << outstr in main and then see how long it takes to run (you could uncomment the cout << "Block " << blk line). If the speed is reasonable, then the bottleneck is the OutString function and you'll need to look at how to improve its performance. If the speed is still bad, then the problem is the byte-by-byte reading from the file and you'll need to look at block reading, I/O completion ports, etc. to get the data from the file more quickly.
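    As an alternative to commenting lines in and out, each stage can be timed on its own. A small sketch using std::chrono (the helper name timeStage is hypothetical; it is not part of the code posted in this thread):

```cpp
#include <chrono>
#include <functional>

// Sketch: measure how long one stage of the program takes, so reading and
// formatting can be timed separately. Returns elapsed wall-clock seconds.
//
// Hypothetical usage with the FileFields class from this thread:
//   std::string block; long long blocks = 0;
//   double secs = timeStage([&] { while (ff.getField(block)) ++blocks; });
double timeStage(const std::function<void()>& stage)
{
    const auto start = std::chrono::steady_clock::now();
    stage();                                   // run the code being measured
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

    steady_clock is used rather than system_clock because it cannot jump backwards if the system time is adjusted during the run.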

  7. #82
    Join Date
    Oct 2013
    Posts
    63

    Re: Read binary file with line delimeter

    Quote Originally Posted by Paul McKenzie
    Realize that the people who created the Ruby system are professional programmers; the standard Ruby interpreter is written in C. Compare that with your experience, which is at beginner level. There is very little chance at your level that you're going to create optimal code that matches anything an experienced C++ programmer can produce.

    For example, it seems you're reading one or two bytes at a time and then processing those one or two bytes. Maybe the bottleneck is that you're only reading one or two bytes at a time, when the speedup would be to read, say, 1 megabyte of data and then process that 1 meg of data in memory. But before doing anything, you should profile your code to figure out what is slow or fast, then you optimize that area of the code.
    Hello Paul,

    Thanks, I understand and agree with what you say. Thanks also for the suggestion about the release build and how to do it.

    Quote Originally Posted by 2kaud
    After changing the code as suggested above, try commenting out the two lines OutString() and cout << outstr in main and then see how long it takes to run (you could uncomment the cout << "Block " << blk line). If the speed is reasonable, then the bottleneck is the OutString function and you'll need to look at how to improve its performance. If the speed is still bad, then the problem is the byte-by-byte reading from the file and you'll need to look at block reading, I/O completion ports, etc. to get the data from the file more quickly.
    Hello 2kaud,

    Thanks for the help and time! I've tested your last code. Printing each block looks a little bit faster, but it is printing other characters such as "@", spaces or "", even though hconv contains only the digits 0 to F.
    Code:
    0000045320644501@93455@F@14773@0427F F F0015000A4800015A000B4200016000013300013600013700015B00016600016500017700016900006A0000790000 40001 30001220000
    2100010900010A00012600010800012B00002C00012D00012E00015500015600072A00002F0000300000310000 F7900@0193290600000000 0 0A0E5400 106@14725 F F F F F0019
    @00935906000000000000@0193C90600000000 0 0A0E5400 106@14725 F F F F F0019@0194390600000000 0 0A0E5400 106@14725 F F F F F001905 10F01020000000D@1
    47526905 F F F00 310010C0000000D@147526905 F F F0101 60F010C0000000D@1475269594F F F00 40E01020102010001 F F F02010201 50600000000000005@2000A01010000
    06001900000007@02E91800 1@1475269531F F F F00 1@1475269531F F F000103A030808 EB0A00000000000000000000C0101@11B90B00 1@1475269567F F F FA06000000
    000000B0103C0101
    Thanks for all help so far.

  8. #83
    Join Date
    Dec 2012
    Location
    England
    Posts
    2,389

    Re: Read binary file with line delimeter

    It's the old signed/unsigned char problem! My default char is unsigned; yours is signed. That's why it worked for me. Use

    Code:
    			field += hconv[(BYTE)by >> 4];
    			field += hconv[(BYTE)by & 0xf];
    As per my post #81, what's the speed like if you comment out the OutString() and cout << outstr lines and uncomment the cout << blk line in main? This test will indicate whether the speed problem is in getField() or in OutString().
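    To illustrate why the cast matters, here is a self-contained sketch of the hex conversion (toHex is a hypothetical name, not the thread's OutString or getField). Converting each char to unsigned char keeps the table index in the range 0 to 15 whatever the compiler's char signedness:

```cpp
#include <cstddef>
#include <string>

// Sketch: convert raw bytes to upper-case hex. Each byte is converted to
// unsigned char before indexing the table. If plain char is signed (the MSVC
// default unless /J is used), a byte such as 0xF7 holds a negative value, so
// by >> 4 is negative too and hconv[by >> 4] reads outside the array, which
// is the likely source of the stray "@" and space characters.
std::string toHex(const char* data, std::size_t len)
{
    static const char hconv[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(len * 2);                 // two hex digits per byte, one allocation
    for (std::size_t i = 0; i < len; ++i) {
        const unsigned char b = static_cast<unsigned char>(data[i]);
        out += hconv[b >> 4];             // high nibble, always 0..15
        out += hconv[b & 0x0F];           // low nibble, always 0..15
    }
    return out;
}
```

    The reserve() call also addresses part of the copying cost mentioned earlier: the string grows once instead of reallocating as digits are appended.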
    Last edited by 2kaud; October 23rd, 2013 at 09:58 AM.

  9. #84
    Join Date
    Oct 2013
    Posts
    63

    Re: Read binary file with line delimeter

    Hello 2kaud,

    Thank you for the (BYTE) correction. Now all the blocks look correct and it seems things have improved a lot, but please see my tests.

    Test #1:
    I tried without OutString, printing only 2 substrings from each block, and it processes the 20 MB file in less than 2 seconds, giving me an output file of 1.4 MB with 61,055 blocks processed.

    Test #2:
    After that, I wanted to measure how long it takes to do the same with a 2 GB input file, but it gives me an output file of only 209 KB in 0.27 seconds, and only 8,534 blocks were processed.

    So it seems the 2 GB file is not being handled.

    So I changed the type of the variable in the for loop from int to double, but it gives the same output for the 2 GB file.

    The current code I have is:
    Code:
    #include <iostream>
    #include <fstream>
    #include <vector>
    #include <iomanip>
    #include <sstream>
    #include <string>
    #include <stdio.h>
    #include <time.h>
    
    using namespace std;
    
    typedef unsigned char BYTE;
    typedef unsigned short int WORD;
    
    const char hconv[16] = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};
    
    #ifndef LOBYTE
    	#define LOBYTE(w)	((BYTE)((WORD)(w) & 0xff))
    #endif
    
    #ifndef HIBYTE
    	#define HIBYTE(w)	((BYTE)((WORD)(w) >> 8))
    #endif
    
    class FileFields
    {
    private:
    	ifstream	ifs;
    	bool		opened;
    
    public:
    	FileFields() : opened(false) {}
    
    	~FileFields() {
    		if (opened)
    			ifs.close();
    	}
    
    	bool open(const char* name);
    
    	bool getField(string& field, WORD delim = 0xFF77);
    };
    
    bool FileFields::open(const char* name) {
    	ifs.open(name, ios::binary);
    	return (opened = ifs.is_open());
    }
    
    bool FileFields::getField(string& field, WORD delim)
    {
    char	by;
    
    bool	cont = true;
    
    BYTE	next;
    	field = "";
    	if (!opened || !ifs.good())
    		return false;
    
    	for (ifs.get(by); cont && ifs.gcount(); ifs.get(by)) {
    		if ((BYTE)by == HIBYTE(delim))
    			if ((next = (BYTE)ifs.peek()) == LOBYTE(delim))
    				cont = false;
    			else
    				if (next == 0x78) {
    					cont = false;
    					ifs.setstate(ios::failbit);
    				}
    		if (cont) {
    			field += hconv[(BYTE)by >> 4];
    			field += hconv[(BYTE)by & 0xf];
    		}
    	}
    	return true;
    }
    int main(int argc, char* argv[])
    {
    FileFields	ff;
    
    clock_t start = clock();
    
    	if (argc < 2) {
    		cout << "Usage: program <input file>" << endl;
    		return 1;
    	}
    
    	//if (!ff.open("C:\\binary.txt")) {
    	if (!ff.open(argv[1])) {
    		cout << "Cannot open file!" << endl;
    		return 1;
    	}
    string block;
    	for (double blk = 0; ff.getField(block); blk++) {
    		if (blk == 0) continue;
    		cout << block.substr(0,6) << "|" << block.substr(6,16) << endl;
    	}
    	printf("Time elapsed: %f\n", ((double)clock() - start) / CLOCKS_PER_SEC);
    
    	return 0;
    }
    I made another test: thinking that maybe it stops before the end of the file, I commented out the else branch below:
    Code:
    bool FileFields::getField(string& field, WORD delim)
    {
    	char	by;
    
    	bool	cont = true;
    
    	BYTE	next;
    	field = "";
    	if (!opened || !ifs.good())
    		return false;
    
    	for (ifs.get(by); cont && ifs.gcount(); ifs.get(by)) {
    		if ((BYTE)by == HIBYTE(delim))
    		if ((next = (BYTE)ifs.peek()) == LOBYTE(delim))
    			cont = false;
    		/*else
    		if (next == 0x78) {
    			cont = false;
    			ifs.setstate(ios::failbit);
    		}*/
    		if (cont) {
    			field += hconv[(BYTE)by >> 4];
    			field += hconv[(BYTE)by & 0xf];
    		}
    	}
    	return true;
    }
    Doing that with the 2 GB input, this time the code processed 65,329 blocks, but suddenly, after 5 seconds, a window popped up saying "Readblock_program stop working". Still, it processed 65,329 blocks in 5 seconds!

    I think it would be better not to stop when FF78 is found, but it seems that commenting out that part, as I did, causes problems.

    Thanks again for the help.
    Last edited by Philidor; October 24th, 2013 at 12:50 AM. Reason: Result of last test

  10. #85
    Join Date
    Apr 1999
    Posts
    27,427

    Re: Read binary file with line delimeter

    Quote Originally Posted by Philidor
    Doing that with the 2 GB input, this time the code processed 65,329 blocks, but suddenly, after 5 seconds, a window popped up saying "Readblock_program stop working". Still, it processed 65,329 blocks in 5 seconds!
    Then it's time to learn how to use the debugger. You can't really go any further if you don't step through the program and see where it goes wrong. That error is more than likely caused by an illegal array access.

    If you're using Visual C++, run the program in the debugger. The program will run until it "crashes", which will display the debugger with the entire call stack of where the crash occurs. Then you go to that line of code in the debugger and see what the variables are and in what state they're in to cause the crash.

    If you don't do this, then it will be an endless thread of "try this, try that". Instead of that, know what the problem is by using the debugger, and address the problem directly.
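    As one example of what the debugger might show (an assumption; the cause has not been confirmed in this thread): in the main() from post #84, block.substr(6,16) throws std::out_of_range when a block has fewer than 6 hex characters, for instance an empty field between two adjacent FF77 delimiters. An uncaught exception terminates the program, which Windows typically reports with a "stopped working" dialog. A guarded version of that output line might look like this (formatBlock is a hypothetical helper):

```cpp
#include <string>

// Sketch: build the "first 6 hex chars | next 16 hex chars" line that main()
// in post #84 prints, without risking an exception.
// std::string::substr(pos, n) throws std::out_of_range when pos > size(),
// so the second substr() is only called when the block is long enough.
std::string formatBlock(const std::string& block)
{
    const std::string head = block.substr(0, 6);   // pos 0 is always valid
    const std::string tail = block.size() > 6 ? block.substr(6, 16) : std::string();
    return head + "|" + tail;
}
```

    In main() the output line would then become cout << formatBlock(block) << endl; short blocks print as a short or empty line instead of ending the program.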

    Regards,

    Paul McKenzie

  11. #86
    Join Date
    Dec 2012
    Location
    England
    Posts
    2,389

    Re: Read binary file with line delimeter

    What OS and what compiler are you using? According to some sources, the file-size limit with C++ iostreams in older versions of MSVC is 2 GB, which was corrected in VS2012 (apparently a signed 32-bit int was used for the file size/position, which has been changed to an unsigned int). This might explain your 2 GB problem if you're not using VS2012.

    If you are using a version of MSVC older than VS2012 with files of this size, it might be better to use the Windows file-handling routines rather than the C++ ones.
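    Before switching to Windows-specific routines, it may be worth checking what the C++ streams on your compiler report for the 2 GB file. A minimal portable sketch (streamFileSize is a hypothetical helper): if the value it returns matches the size Windows shows for the file, that suggests the stream position type is wide enough for it.

```cpp
#include <fstream>
#include <ios>

// Sketch: report a file's size through the C++ stream position types.
// A negative or wrapped value for a file over 2 GB would point at the
// 2 GB limitation discussed above.
// Returns -1 if the file cannot be opened or the position query fails.
long long streamFileSize(const char* path)
{
    std::ifstream ifs(path, std::ios::binary);
    if (!ifs)
        return -1;
    ifs.seekg(0, std::ios::end);                // move to the end of the file
    const std::streamoff end = ifs.tellg();     // -1 if tellg() fails
    return static_cast<long long>(end);
}
```

    Printing sizeof(std::streamoff) alongside the result also shows whether the library uses a 64-bit offset type at all.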

  12. #87
    Join Date
    Dec 2012
    Location
    England
    Posts
    2,389

    Re: Read binary file with line delimeter

    If you are using MSVC and want to try Windows file management, one possible way is:

    Code:
    #include <iostream>
    #include <iomanip>
    #include <sstream>
    #include <string>
    using namespace std;
    
    #include <windows.h>
    
    typedef unsigned char BYTE;
    typedef unsigned short int WORD;
    
    const char hconv[16] = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};
    
    #ifndef LOBYTE
    	#define LOBYTE(w)	((BYTE)((WORD)(w) & 0xff))
    #endif
    
    #ifndef HIBYTE
    	#define HIBYTE(w)	((BYTE)((WORD)(w) >> 8))
    #endif
    
    class FileFields
    {
    private:
    	HANDLE		ifs;
    	bool		good,
    			opened;
    
    public:
    	FileFields() : good(false), opened(false) {}
    
    	~FileFields() {
    		if (opened)
    			CloseHandle(ifs);
    	}
    
    	bool open(const char* name);
    
    	bool getField(string& field, WORD delim = 0xFF77);
    };
    
    bool FileFields::open(const char* name) {
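    	// CreateFileA takes a narrow (char) file name, so this also compiles when the project uses the Unicode character set
    	return good = opened = ((ifs = CreateFileA(name, FILE_READ_DATA, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL)) != INVALID_HANDLE_VALUE);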
    }
    
    bool FileFields::getField(string& field, WORD delim)
    {
    BYTE	by,
    	next;
    
    bool	cont = true;
    
    DWORD	read = 0;
    
    	field = "";
    
    	if (!opened || !good)
    		return false;
    
    	for (ReadFile(ifs, (LPVOID)&by, 1, &read, NULL); cont && read; ReadFile(ifs, (LPVOID)&by, 1, &read, NULL)) {
    		if (by == HIBYTE(delim)) {
    			// Peek at the following byte, then step back so it is read again on the next pass
    			ReadFile(ifs, (LPVOID)&next, 1, &read, NULL);
    			if (read) {
    				SetFilePointer(ifs, -1, 0, FILE_CURRENT);
    				if (next == LOBYTE(delim))
    					cont = false;
    				else
    					if (next == 0x78)
    						cont = good = false;
    			}
    		}
    
    		if (cont) {
    			field += hconv[by >> 4];
    			field += hconv[by & 0xf];
    		}
    	}
    
    	if (good)
    		good = (read != 0);
    
    	return true;
    }

  13. #88
    Join Date
    Apr 1999
    Posts
    27,427

    Re: Read binary file with line delimeter

    If the problem is the 2 GB file limitation that 2kaud mentioned, the other suggestion is to create a 64-bit program (if it is OK for you to create a 64-bit program).

    The C++ stream classes in Visual C++ will handle files larger than 2 GB, but only if you choose the 64-bit compile option.

    Regards,

    Paul McKenzie

  14. #89
    Join Date
    Oct 2013
    Posts
    63

    Re: Read binary file with line delimeter

    Hello Paul,

    But will a 64-bit program, once compiled, run on a 32-bit machine?

    And yes, for the things I can do and know, I'm investigating on my side, reading here and there, trying to see where to improve the code so that I can finish this thread as soon as possible, since you both have helped me a lot. Thanks so much for that. I really don't want to be a bother.

    Hello 2kaud,

    I'm testing on Windows 7 with MSVS 2013 Express on a 64-bit machine, but I'd like to be able to run the code on a 32-bit machine if possible.

    I've tested your latest code, and it seems to be slower.

    With a 20 MB input, it took less than 2 seconds before; now it takes 189 seconds.
    With the 2 GB input, 8,534 blocks were processed in 0.27 seconds before; now the same 8,534 blocks are processed, but in 15.4 seconds.

    Thanks again for the great help.

    Regards
    Last edited by Philidor; October 24th, 2013 at 02:08 PM.

  15. #90
    Join Date
    Dec 2012
    Location
    England
    Posts
    2,389

    Re: Read binary file with line delimeter

    Yes - but are you still getting the error reported in post #84 or does it now run to completion correctly?
    Last edited by 2kaud; October 24th, 2013 at 03:11 PM.
