Write Matrix in file faster
Hi all, I have a 3050 x 3050 double matrix which is created through some calculations. This is a C application we got as a pilot project, and we need to show that we can make it run faster. We first have to read the matrix from a CSV file and, after some calculation, write it back to a file. Reading and writing the matrix is taking too much time. Can anyone give me a good algorithm for reading and writing this 3050 x 3050 double matrix? This is really a bottleneck for us; can anyone help?
Re: Write Matrix in file faster
Quote:
Originally Posted by
Vinod S
Can anyone give me a good algorithm for reading and writing this 3050 x 3050 double matrix?
There is only one algorithm: you read each value in sequence. If you want suggestions how to optimize your code, you'll have to post it first.
In the meantime, have a look at this thread: http://www.codeguru.com/forum/showthread.php?t=507380
Re: Write Matrix in file faster
Code:
// Open file to read (will fail if file does not exist)
if( NULL == ( fStream = fopen( cpFileName, "r" )))
{
    perror( "Cannot open file to read : " ) ;
    break ;
}
stFileOppend = 1 ;
stpData -> nRow = nRows ;
stpData -> nColumn = nColumns ;
stpData -> dbpMatrix = ( double ** ) malloc ( sizeof ( double * ) * nRows ) ;
nRow = 0 ;
while ( NULL != fgets( cpReadString, 6144, fStream ) && nRow < nRows )
{
    stpData -> dbpMatrix[nRow] = ( double * ) malloc ( sizeof ( double ) * nColumns ) ;
    nColumn = 0 ;
    cpSubString = strtok(cpReadString, "," ) ;
    //fValue = atof( Trim( cpSubString )) ;
    //stpData -> fpMatrix[nRow][nColumn] = fValue ;
    while ( NULL != cpSubString && nColumn < nColumns )
    {
        fValue = atof( Trim( cpSubString )) ;
        stpData -> dbpMatrix[nRow][nColumn] = fValue ;
        ++nColumn ;
        cpSubString = strtok( NULL, "," ) ;
    }
    ++nRow ;
}
free ( cpReadString ) ;
This is the code to read the matrix.
Code:
// Open file to write (creates the file, or truncates it if it already exists)
if( NULL == ( fStream = fopen( cpFileName, "w" )))
{
    perror( "Cannot open file to write : " ) ;
    break ;
}
stFileOppend = 0 ;
for ( nRow = 0 ; nRow < stpMg1g1 -> nRow ; ++nRow )
{
    //sprintf( cpFileContent, "" ) ;
    for ( nColumn = 0 ; nColumn < stpMg1g1 -> nColumn ; ++nColumn )
    {
        if ( stpMg1g1 -> dbpMG1G1_data[nRow][nColumn] == 0 ) sprintf( cpWriteText, "0" ) ;
        else
        {
            sprintf( cpWriteText, "%.8f", stpMg1g1 -> dbpMG1G1_data[nRow][nColumn] ) ;
            TrimTrailingZeorWithDot( cpWriteText ) ;
        }
        //strcat( cpFileContent, cpWriteText ) ;
        fputs( cpWriteText, fStream ) ;
        if ( nColumn < stpMg1g1 -> nColumn - 1 )
        {
            //strcat( cpFileContent, "," ) ;
            fputs( ",", fStream ) ;
        }
    }
    //if ( nRow <= stpMg1g1 -> nRow - 1 )
    {
        fputs( "\n", fStream ) ;
        //strcat( cpFileContent, "\n" ) ;
    }
    //fputs( cpFileContent, fStream ) ;
}
This is the code I am using to write the matrix. Can you suggest a better method?
Re: Write Matrix in file faster
If you're really into speed, I'd suggest migrating from the CSV format to your own binary matrix file format. This would bypass parsing and the conversion of the doubles between text and binary representations, which almost certainly consume the vast majority of the processing time. From your short description of the scenario, I can't tell whether this actually is an option, though.
If you get the original input file from an external source and you can't influence the format in which you get it, it may be worth implementing an extra step that converts the CSV representation into the binary matrix format. Of course that would only pay off if you're going to read in that particular matrix multiple times, but then it would pay off big-time...
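To make the binary idea concrete, here is a minimal sketch. The file layout (two uint32_t dimensions followed by the raw doubles in row-major order) and the names matrix_save_bin and matrix_load_bin are my own assumptions for illustration, not anything from the original application. It also assumes the matrix is stored in one contiguous block and that the file is read on the same kind of machine that wrote it (same endianness and double representation).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical layout: uint32_t rows, uint32_t cols, then rows*cols raw doubles.
   Not portable between machines with different byte order or double format. */

/* Write a contiguous row-major matrix. Returns 0 on success, -1 on error. */
int matrix_save_bin(const char *path, const double *data,
                    uint32_t rows, uint32_t cols)
{
    assert(data != NULL && rows > 0 && cols > 0);
    FILE *f = fopen(path, "wb");          /* "b": no text-mode translation */
    if (f == NULL)
        return -1;
    uint32_t dims[2] = { rows, cols };
    size_t count = (size_t)rows * cols;
    int ok = fwrite(dims, sizeof dims[0], 2, f) == 2
          && fwrite(data, sizeof *data, count, f) == count;
    if (fclose(f) != 0)                   /* fclose flushes; it can fail too */
        ok = 0;
    return ok ? 0 : -1;
}

/* Read the matrix back. Returns a malloc'd block (caller frees) or NULL. */
double *matrix_load_bin(const char *path, uint32_t *rows, uint32_t *cols)
{
    assert(rows != NULL && cols != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;
    uint32_t dims[2];
    double *data = NULL;
    if (fread(dims, sizeof dims[0], 2, f) == 2 && dims[0] > 0 && dims[1] > 0) {
        size_t count = (size_t)dims[0] * dims[1];
        data = malloc(count * sizeof *data);
        /* One fread for the whole matrix: no parsing, no per-value calls. */
        if (data != NULL && fread(data, sizeof *data, count, f) != count) {
            free(data);
            data = NULL;
        }
    }
    fclose(f);
    if (data != NULL) {
        *rows = dims[0];
        *cols = dims[1];
    }
    return data;
}
```

For a 3050 x 3050 matrix this is roughly 74 MB of raw data moved with a single fwrite or fread, and the values round-trip exactly because no text conversion is involved.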
I have no idea whether an approach like this has been suggested in that other thread linked to by D_Drmmr as well - I'm simply too lazy to read all those 77 posts... However, chances are that it actually was suggested there, since it's quite a common approach in scenarios like this.
Re: Write Matrix in file faster
Quote:
Originally Posted by
Vinod S
Code:
while ( NULL != fgets( cpReadString, 6144, fStream ) && nRow < nRows )
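That looks suspicious. With a size argument of 6144, fgets reads at most 6143 characters per call. Are you sure that is enough to represent a full row of 3050 values? Even at one digit plus a comma per value, a row already needs about 6100 characters, and values with several decimals need far more.
I didn't see anything in your code that would surely cause a big performance hit. However, some things that may help: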
- Allocate the data for the entire matrix in one contiguous block, rather than one block for each row. You can allocate and keep an array of pointers to the first element in each row for fast random access using two indices.
- There's no need to trim a string before reading a value from it.
- It may be faster to keep a pointer to the point in the array where you want to store a value and increment the pointer in each iteration of the loop.
- When you write the matrix, you call fputs twice in each iteration. It may be faster to build a string for each line and write that.
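A rough sketch of how these four points could look in code. All function names here are made up for illustration and are not from the original program. Note also that I swapped in "%.8g" for the original "%.8f" plus zero trimming; that gives 8 significant digits rather than 8 decimals, so the output is not identical and you would have to check whether that is acceptable.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* One contiguous block for all values, plus row pointers into it,
   so m[r][c] still works. Returns NULL on allocation failure. */
double **matrix_alloc(size_t rows, size_t cols)
{
    assert(rows > 0 && cols > 0);
    double **row_ptr = malloc(rows * sizeof *row_ptr);
    double *block = malloc(rows * cols * sizeof *block);
    if (row_ptr == NULL || block == NULL) {
        free(row_ptr);
        free(block);
        return NULL;
    }
    for (size_t r = 0; r < rows; ++r)
        row_ptr[r] = block + r * cols;
    return row_ptr;
}

void matrix_free(double **m)
{
    if (m != NULL) {
        free(m[0]);   /* the contiguous value block */
        free(m);      /* the row pointer array */
    }
}

/* Parse up to cols comma-separated values from line into out.
   strtod skips leading whitespace itself, so no trimming is needed,
   and the output pointer is simply incremented. Returns the count parsed. */
size_t parse_csv_row(const char *line, double *out, size_t cols)
{
    size_t n = 0;
    const char *p = line;
    while (n < cols) {
        char *end;
        double v = strtod(p, &end);
        if (end == p)
            break;                        /* no number at this position */
        *out++ = v;
        ++n;
        p = end;
        while (*p == ' ' || *p == '\t')   /* blanks before the separator */
            ++p;
        if (*p != ',')
            break;                        /* end of line or malformed input */
        ++p;
    }
    return n;
}

/* Format one row into buf, ready for a single fwrite per line.
   Returns the length written, or 0 if buf is too small. */
size_t format_csv_row(const double *row, size_t cols, char *buf, size_t cap)
{
    size_t len = 0;
    for (size_t c = 0; c < cols; ++c) {
        int w = snprintf(buf + len, cap - len, "%.8g%s",
                         row[c], c + 1 < cols ? "," : "\n");
        if (w < 0 || (size_t)w >= cap - len)
            return 0;                     /* output would not fit */
        len += (size_t)w;
    }
    return len;
}
```

In the read loop you would call parse_csv_row(cpReadString, m[nRow], nColumns) for each line, and in the write loop format_csv_row followed by one fwrite(buf, 1, len, fStream) per row. Whether this is measurably faster than the original is something only profiling on your data can tell.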
Re: Write Matrix in file faster
Quote:
Originally Posted by
Eri523
I have no idea whether an approach like this has been suggested in that other thread linked to by D_Drmmr as well - I'm simply too lazy to read all those 77 posts...
Post #2 contains all the relevant ideas, the rest are just details. ;)
Re: Write Matrix in file faster
Quote:
Originally Posted by
D_Drmmr
Post #2 contains all the relevant ideas, the rest are just details. ;)
Thanks. :)
The improvement OReubens reports as the result of applying his changes is awesome! :eek: However, I'm not sure whether overlapped I/O and double-buffering would still yield that much of a gain, given the disk performance and caching strategies of a modern system. Yet it's definitely worth a try, in particular if the CSV format is mandatory.