Passing an STL array by reference - help
How can I pass a matrix as a reference parameter?
I am using the following declarations:
Code:
typedef std::vector< std::vector<std::string> > ss_matrix_t;
I declare the matrix with the following statement, where nRows and nCols are integers
Code:
std::vector< std::vector<std::string> > vI2Matrix(nRows, std::vector<std::string>(nCols,""));
The function is declared as:
Code:
int read_files(std::string fname, int nCols, int nRows, ss_matrix_t &ssMat )
but I get a linker error:
error LNK2019: unresolved external symbol "int __cdecl read_splayed_files(class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,int,int,class std::vector<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class std::allocator<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > > >,class std::vector<class std::vector<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class std::allocator<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > > >,class std::allocator<class std::vector<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class std::allocator<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > > > > >)" (?read_files@@YAHV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@HHV?$vector@V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@V?$allocator@V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@@2@@2@V?$vector@V?$vector@V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@V?$allocator@V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@@2@@std@@V?$allocator@V?$vector@V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@V?$allocator@V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@@2@@std@@@2@@2@@Z) referenced in function "int __cdecl readDatafile(void)" (?readDatafile@@YAHXZ)
1>C:\PROJECTS\cppVS2010\Harness\Debug\Harness.exe : fatal error LNK1120: 1 unresolved externals
I suspect the syntax of the declaration, but I am not sure what to do here.
If I change the declaration so that the array (matrix) is passed by value, it links, but the call takes forever:
Code:
int read_files(std::string fname, int nCols, int nRows, ss_matrix_t ssMat )
// this takes ages ( it does compile and link )
How can this be resolved?
Any advice appreciated
Jefe
Re: Passing an STL array by reference - help
Perhaps you should check that the forward declaration of your read_files function matches the definition. If you can find no problem with that then post the smallest and simplest program that demonstrates this error.
By the way, you might as well make the fname parameter a const std::string&.
Re: Passing an STL array by reference - help
Maestro, thanks, the forward reference was the culprit there!
As an aside, I erase that matrix when I have finished with it, and it is taking a really long time: 600 seconds!
Code:
JournalLog("Immediate before matrix.erase(begin, end) ");
std::vector<std::vector<std::string> >::iterator row1 = vI2Matrix.begin();
std::vector<std::vector<std::string> >::iterator row2 = vI2Matrix.end();
vI2Matrix.erase(row1, row2);
JournalLog("Immediate before return from readDatafile() ");
Is there a way to speed up this erase operation?
The declaration of the matrix is, again:
Code:
std::vector< std::vector<std::string> > vI2Matrix(nRows, std::vector<std::string>(nCols,""));
Thanks again
Jefe
Re: Passing an STL array by reference - help
I suppose you could try swapping with an empty vector, but it may or may not make a difference:
Code:
std::vector<std::vector<std::string> >().swap(vI2Matrix);
Re: Passing an STL array by reference - help
Quote:
Originally Posted by
jefe9
As an aside, I erase that matrix when I have finished with it, and it is taking a really long time: 600 seconds!
Did you time a fully optimized build (without iterator checking)?
What are the characteristics of your data? How many rows and columns, what type of strings are stored?
Re: Passing an STL array by reference - help
Quote:
Originally Posted by
jefe9
Maestro, thanks, the forward reference was the culprit there!
As an aside, I erase that matrix when I have finished with it, and it is taking a really long time: 600 seconds!
Unless your compiler's implementation of std::string and std::vector is very poor, in no way should this take 600 seconds if you run an optimized build. The key word there is "optimized".
If you're running an unoptimized, "debug" build, then you cannot use that as an indication of how fast or slow the erase will perform.
Also, exactly what compiler and version are you using?
Quote:
Is there a way to speed up this erase operation?
There is nothing to do except to make sure you're running an optimized build. Again, no decent C++ compiler would have such a slow implementation, unless you're running an iterator-checked, unoptimized build, and not an optimized build.
Regards,
Paul McKenzie
Re: Passing an STL array by reference - help
The compiler is the MS VS-2010 express version;
The number of rows is slightly over 1million = 1,118,135 rows
The number of columns is 5
The release (non-debug, optimised) build does show a significant performance improvement: 13 seconds for my code, but still 225 seconds for the erase() of that matrix.
I wrote this code as a performance improvement to some open source software. The debug version was marginal, but the release version is excellent, except for this nasty sting in the tail: 225 seconds to erase that matrix is still too long.
I have looked at making the strings fixed size, in the hope that this would perform better, but have found no ideas there. C strings, being just a pointer to the first element of a char array, don't seem to lend themselves to fixed-size usage?
Any suggestions very welcome.
Thanks
Re: Passing an STL array by reference - help
Quote:
Originally Posted by
jefe9
The compiler is the MS VS-2010 express version;
The number of rows is slightly over 1million = 1,118,135 rows
The number of columns is 5
The release (non-debug, optimised) build does show a significant performance improvement: 13 seconds for my code, but still 225 seconds for the erase() of that matrix.
Well, erase() can't do magic. If you have that many rows, then what routine(s) do you think exist that can make the deallocation any faster?
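You're accessing the heap, and any massive amount of allocation would cause this issue, STL or no STL. Each row allocates 5 * sizeof(std::string) bytes for its strings, and the outer vector stores one std::vector<std::string> object per row, so the bookkeeping alone comes to roughly 1,118,135 * (sizeof(std::vector<std::string>) + 5 * sizeof(std::string)) bytes.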
So the total number of bytes allocated just to maintain this data is very large and that is only for the matrix alone, without even allocating the data needed for the vector or string's contents. When the vector is destroyed, you have to deallocate the memory used by over 5 million strings. Now do you think anything you can think of can make those deallocations fast?
Simply stated, you need to rethink your design if you have this many strings.
Regards,
Paul McKenzie
Re: Passing an STL array by reference - help
A reasonable point Paul, thank you.
I am surprised by the imbalance, though: the matrix is created and all of its 5 million plus elements are assigned in less than 13 seconds.
Then it takes 225 seconds just to deallocate. Why this large imbalance (13 to 225)?
Thanks
Jefe
Re: Passing an STL array by reference - help
Quote:
Originally Posted by
jefe9
Then it takes 225 seconds just to deallocate. Why this large imbalance (13 to 225)?
Don't the strings contain data? Are you factoring in that when the strings are created, they are empty?
When you add data to the string, unless your std::string class has short-string optimization, an additional allocation must occur for the new data.
Code:
#include <cstddef>
#include <string>
#include <vector>

int main()
{
    std::vector<std::string> s(10); // 10 *empty* strings are created
    for (std::size_t i = 0; i < 10; ++i)
        s[i] = "abcxyz123"; // an additional allocation may now occur
}
So just because allocation takes a short time doesn't translate into how fast the deallocation will occur. If you want proof, declare a vector of empty strings, and then destroy the vector of empty strings immediately afterwards.
Code:
#include <string>
#include <vector>

int main()
{
    typedef std::vector< std::vector<std::string> > ss_matrix_t;
    {
        ss_matrix_t temp(1000000, std::vector<std::string>(5)); // declare a huge matrix
    } // matrix will be destroyed here
} // how long did it take to get here?
Run this in an unoptimized build -- yes, unoptimized. How long does it take to create that local vector, and how long to get from there to the closing curly brace of main()?
Regards,
Paul McKenzie
Re: Passing an STL array by reference - help
You have just re-created how I discovered the deallocation problem.
The strings in my program are initialised as zero-length strings. Data are assigned to each individual element, for each of the 5 million+ items. At this stage:
Column-0 contains 1.1 million strings of length 5 chars
Column-1 contains 1.1 million strings of length 21 chars
Column-2 contains 1.1 million strings of length 7 chars
Column-3 & 4 contains 1.1 million strings each of length 9 chars
this all took just a few seconds.
Every element of the array contains real data
Then, when I had finished using the data, the program took ages before terminating;
The matrix erase(begin, end) code I posted above was written to demonstrate that the deallocation accounts for over 90% of the program's run time.
I see a huge imbalance between creating the matrix and assigning data to 5.5 million strings on the one hand, and the subsequent deallocation, which takes 95% of the time, on the other.
Re: Passing an STL array by reference - help
Quote:
Originally Posted by
jefe9
You have just re-created how I discovered the deallocation problem.
So start with what I wrote. Does this demonstrate the problem? If not, then add code to the code I wrote above that duplicates the problem.
Quote:
The strings in my program are initialised as zero-length strings. Data are assigned to each individual element, for each of the 5 million+ items. At this stage:
Column-0 contains 1.1 million strings of length 5 chars
Column-1 contains 1.1 million strings of length 21 chars
Column-2 contains 1.1 million strings of length 7 chars
Column-3 & 4 contains 1.1 million strings each of length 9 chars
One of the things I do not want to do is to always assume that what is described in words is what is actually being done. Too many times we have posters claiming they have done "this or that", but when we actually see the code, they either didn't do "this or that", or they have done something totally wrong or different or additional to what they've described.
That's why I like to start on a consistent baseline. Take the code I wrote, add to it, and then show us that this code is the one that duplicates the problem. Then everyone will be able to run the exact same code on different versions of the compiler (and even other compilers) to see the results.
Regards,
Paul McKenzie
Re: Passing an STL array by reference - help
OK, I took my original code and ran it against VC++ 2008 with optimizations on and _SECURE_SCL=0 defined. The deallocation does indeed take longer than the allocation, but only because the deallocation code seems to have lost track of the fact that the memory was allocated efficiently.
The cause seems to be that the loop to deallocate each and every vector is done -- more like an excessive/unnecessary looping issue than a true blue memory deallocation issue. I didn't investigate any further, only that I debugged the vector source code and realized that much more work (either excessive or unnecessary) was being done when deallocating (you could have debugged the vector code also to see what the difference was).
The following code seems to go much faster:
Code:
#include <string>
#include <vector>

int main()
{
    typedef std::vector< std::string > ss_matrix_t;
    {
        ss_matrix_t temp(5000000); // one flat vector of 5,000,000 strings
    } // vector is destroyed here
}
Instead of a 2 dimensional array, a 1 dimensional array is allocated that is equal to the number of rows * number of columns. All you need to do is wrap this into a class, and instead of using [][] to access each element, overload operator () to have two arguments denoting the row and column to access.
Code:
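#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Row-major matrix stored in one contiguous vector.
template <typename T>
class Matrix
{
public:
    Matrix(unsigned rows, unsigned cols)
        : rows_(rows), cols_(cols),
          data_(static_cast<std::size_t>(rows) * cols) {}

    // Element (row, col) lives at index cols_ * row + col.
    T& operator() (unsigned row, unsigned col)
    { return data_[static_cast<std::size_t>(cols_) * row + col]; }

    // Return by const reference so that reading an element does not copy it.
    const T& operator() (unsigned row, unsigned col) const
    { return data_[static_cast<std::size_t>(cols_) * row + col]; }

private:
    unsigned rows_, cols_;
    std::vector<T> data_;
};

using namespace std;

int main()
{
    {
        Matrix<string> StringMatrix(1000000, 5);
        // test
        StringMatrix(10, 3) = "abc123";
        cout << StringMatrix(10, 3);
    } // StringMatrix is destroyed here
}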
Note that the deallocation when the end curly brace is encountered is very fast.
The code above is derived from this example from the C++ FAQ:
http://www.parashift.com/c++-faq/mat...script-op.html
Also read as to why operator () has its advantages over [][]:
http://www.parashift.com/c++-faq/mat...subscript.html
Also, your original code that erases -- why didn't you just call clear() instead of erase()?
Code:
// std::vector<std::vector<std::string> >::iterator row1 = vI2Matrix.begin();
// std::vector<std::vector<std::string> >::iterator row2 = vI2Matrix.end();
// vI2Matrix.erase(row1, row2);
//
vI2Matrix.clear(); // why not just this?
I'm not saying this is any faster (it may be), but you don't need all of that previous code that calls erase() to empty a vector.
Finally, the title of your thread: Passing an STL array by reference - help:
There is an STL class called std::array that is different from std::vector. Make sure that you don't confuse the two.
Regards,
Paul McKenzie
Re: Passing an STL array by reference - help
For the deallocation speed, did you try running your program outside of the IDE? Just open a CMD window, go to the release directory and run it from there. The thing is that Visual Studio since 2003 also debugs memory in release builds when the program is run through the IDE. This sometimes causes problems with STL objects, since they tend to rely heavily on new/delete.
http://forums.codeguru.com/showthrea...077#post859077
Re: Passing an STL array by reference - help
Quote:
Originally Posted by
Yves M
For the deallocation speed, did you try running your program outside of the IDE? Just open a CMD window, go to the release directory and run it from there. The thing is that Visual Studio since 2003 also debugs memory in release builds when the program is run through the IDE. This sometimes causes problems with STL objects, since they tend to rely heavily on new/delete.
http://forums.codeguru.com/showthrea...077#post859077
Hi Yves,
I did run it by choosing the "Start Without Debugging", so I will try completely outside the IDE and see if this makes a difference.
It did seem strange that deallocation took much more time than allocation, and only because of an odd looping issue and not that the deallocation was calling the heap manager many times.
Regards,
Paul McKenzie