allow literal string only
is it possible to create a constructor that takes as parameter ONLY a literal string?
I have a class that is designed to be used with a literal string only. This requirement allows us to simplify what the constructor does by assumption that we can 'store' the string only by copying the pointer, without having to copy the contents of the string.
Code:
class Foo
{
public:
Foo(LPCSTR lpsz) m_str(lpsz) {}
LPCSTR m_str;
}
This requirement can't be changed, this is both a performance issue, and security issue in that the string is assumed to be located in readonly memory.
There's currently a big warning plastered on the function prototype saying to use it ony with a literal, but this has been "overlooked" several times in the past resulting in hard to track down issues.
Ideal would be if we can get the compiler to simply reject everything other than a literal.
so in short:
Code:
Foo("this is ok");
static const char sz[] = "this is also ok";
Foo(sz);
char sz2[] = "Modifiable strings are not ok";
Foo(sz2); // should give compiler error
char sz3[255];
strcpy(sz3, "C-style string is not ok");
Foo(sz3); // should give compile error
CString str("MFC string is not ok");
Foo(str); // should give compile error
std::string str2("STL string is not ok");
Foo(str2); // should give compile error
Is this possible ?
Re: allow literal string only
Quote:
Is this possible ?
AFAIK, no. A char pointer is a char pointer... nothing more, nothing less. Doesn't matter where it comes from, it is still a char pointer.
Re: allow literal string only
Human beings are different from machine, this is about programming right ?
it's type mismatches, coders will cast them themselves or just feel pleased to use default settings by MSVC IDE (no more use of "sha" from the start)
Re: allow literal string only
Quote:
Originally Posted by
TheComputer
Human beings are different from machine, this is about programming right ?
it's type mismatches, coders will cast them themselves or just feel pleased to use default settings by MSVC IDE (no more use of "sha" from the start)
What? :confused:
Re: allow literal string only
Quote:
Originally Posted by OReubens
Ideal would be if we can get the compiler to simply reject everything other than a literal.
Is this possible ?
I agree with Skizmo that this sounds impossible. However, you could cut down on possible careless mistakes by overloading for a constructor that takes an LPSTR, and then use an assertion to check that this constructor is never invoked. It won't detect cases where the constructor is invoked with an array of const char that is not a string literal, of course, but it would detect two of those cases that you listed. (The last two cases should already be compile errors, right?)
Quote:
Originally Posted by Skizmo
AFAIK, no. A char pointer is a char pointer... nothing more, nothing less. Doesn't matter where it comes from, it is still a char pointer.
That said, in terms of type a string literal is not a char pointer, but an array, which is converted to a char pointer. But your reasoning seems sound to me otherwise. I considered using this fact with a template constructor that takes an array by reference, but passing a pointer could be the right thing to do if that pointer points to the first element of a string literal, and it still will not do anything about arrays of const char that are not string literals, as you reasoned.
Re: allow literal string only
I played around alittle and at least in VC2008 you can get a bit more safety by adding a Foo(char*) with nothing more than an assert in the body. sz2 & sz3 will be catched but not all variants for CString or std::string since the access operators return a const char*.
However, some additional safety is better than nothing I guess?
Edit: Thanks laserlight for saving me the need of double-checking the standard to back up my post... :)
Re: allow literal string only
as other said it seems impossible with the use of the sole type system ( unless compiler intrisics exist enabling static storage detection ... );
maybe you can use the preprocessor somehow, something like
Code:
#define LITERAL(s) Foo::please_dont_use_me(""s"")
#define LITERAL_TYPE please_dont_use_me
class Foo
{
public:
struct please_dont_use_me
{
explicit please_dont_use_me( const char* str ): value(str) {}
const char* value;
};
Foo( LITERAL_TYPE str ): str_( str.value ) {}
private:
const char* str_;
};
Foo x( LITERAL("ok") );
in this way, if you construct a Foo with something that has not the form LITERAL("...") an error should occur ( more specifically, it seems improbable someone accidentally instances a Foo with a Foo :: please_dont_use_me object ... ).
EDIT: thinking about it, the following might be more expressive ( but roughly equivalent ):
Code:
#define LITERAL(s) Foo::please_dont_use_me(""s"")
class Foo
{
struct literal
{
explicit literal( const char* str ): value(str) {}
const char* value;
};
public:
typedef literal please_dont_use_me;
Foo( literal str ): str_( str.value ) {}
private:
const char* str_;
};
Foo x( LITERAL("ok") );
BTW, note that it wouldn't protect you against expressions like "Foo x( LITERAL("ok"+"oops") );" and similar ...
Re: allow literal string only
As pointed above, there's no way to detect this at a compilation time, but it doesn't mean that you can't do it at a runtime though. As far as I know the latest Operating Systems (since Vista) are employing a different (i.e. more secure) memory management strategy, as well as a different approach to loading executables. And, this means that you can use it for the purpose of your question. So the whole idea is to test whether a string is writable in memory and if so, it will mean that it is not a static string. In other words, anything that is not declared static is allocated either on the stack or from a heap and is writable.
OK, so here's the code:
Add the following in the declaration of one of your globally accessible classes:
Code:
static __forceinline void DebuggerTestConstStringPtr(LPCTSTR pString, int nCodeLine, LPCSTR pCodeFile)
{
// #ifdef _DEBUG
__try
{
BOOL bNonWritableMem = ::IsBadWritePtr((LPVOID)pString, (lstrlen(pString) + 1) * sizeof(TCHAR));
ASSERT(bNonWritableMem == TRUE); //Assertion here means you passed a non-static string
if(bNonWritableMem != TRUE)
{
TRACE(_T("Non-static string passed on line %d in %s\n"), nCodeLine, pCodeFile);
}
}
__except(1)
{
ASSERT(FALSE); //Bad string pointer
TRACE(_T("Bad string passed on line %d in %s\n"), nCodeLine, pCodeFile);
}
// #endif
}
and then you can simply call this method from anywhere in your code:
Code:
static const TCHAR sz[] = _T("this is also ok");
CGlobalClass::DebuggerTestConstStringPtr(sz, __LINE__, __FILE__);
CGlobalClass::DebuggerTestConstStringPtr(_T("This is OK"), __LINE__, __FILE__);
TCHAR sz2[] = _T("this is not ok");
CGlobalClass::DebuggerTestConstStringPtr(sz2, __LINE__, __FILE__); //ASSERTs!
CString str("MFC string is not ok");
CGlobalClass::DebuggerTestConstStringPtr(str, __LINE__, __FILE__); //ASSERTs!
std::string str2("STL string is not ok");
CGlobalClass::DebuggerTestConstStringPtr(str2.c_str(), __LINE__, __FILE__); //ASSERTs!
NOTE that you have to run it on Vista or Windows 7 (and possibly compile with a newer version of Visual Studio) to be able to track writable memory. It is also not advisable to keep the IsBadWritePtr API in your distribution build, since that API is not safe (read MSDN for more into). You may also want to skip using the /NXCOMPAT, and /DYNAMICBASE compiler directives.
Re: allow literal string only
Ahmd:
The runtime check is unacceptable because of performance issues. Checking for a read segment is considerably more of a performance penalty than copying the string contents, and even that is already too much of an issue. This solution also means we need to specifically test each instance (which I'm trying to avoid), and can't suffise with a single unit-test.
Superbonzo:
interesting twist, but it doesn't work. For one it requires the caller to explicitely use the macro. Which means we'd have to change a LOT of existing code, meaning a lot of regression testing as well. For two :p, nothing prevents a programmer from writing:
Foo x( LITERAL(strMfcString)); // or stl string, or buffer or anything that has a conversion to LPCSTR.
Skizmo:
A char pointer is a char pointer. But I'm interested in a way to make the function so it does NOT accept a char pointer. LPCSTR is too generic for what i'm looking for, I need a more specific 'something', that isn't a char pointer.
FWIW as laserlight pointed out, a literal string is not a char pointer, it just gets implicitely converted to a char pointer by the compiler to match the function prototype. I can even pass any object I want as a parameter to Foo, as long as that object has a LPCSTR cast operator.
Code:
class Bar
{
public:
Bar() { }
operator LPCSTR() { return "Bar"; }
};
Bar b;
Foo(b); // Valid with Foo taking a LPCSTR parameter
SMA: Adding a char* variant would again mean extra regression testing and requirement to test each instance. But it does pop an idea into my head... if the char* constructor is private, and there isn't some C++ rule to make the compiler fall back to another constructor if the preferred one is private. Will be doing some investigating in that direction...
Re: allow literal string only
Quote:
Originally Posted by
OReubens
nothing prevents a programmer from writing:
Foo x( LITERAL(strMfcString)); // or stl string, or buffer or anything that has a conversion to LPCSTR.
no, LITERAL(strMfcString) expands to Foo:: please_dont_use_me(""strMfcString"") which is a compiler error ( whereas LITERAL("ok") expands to Foo:: please_dont_use_me("""ok""") that in turn is legally interpreted as simply Foo:: please_dont_use_me("ok") ... )
anyway, regarding your first objection I've nothing to say :)
Re: allow literal string only
I think I solved it :)
Code:
class Foo
{
public:
// This constructor will only accept character references with a known compile-time size.
template <std::size_t size>
Foo(const char (&str)[size])
{
m_cp = str;
}
private:
func (char* lpsz) // this private constructor, blocks the use of c-style character buffers.
{
}
const char* m_cp;
};
with the above:
Code:
Foo f("Literal"); // this works
const char s1[] = "s1";
Foo f1(s1); // This works
static const char s2[] = "s2";
Foo f2(s2); // This works
Everything else i've tried so far fails (as desired). Either because there's no matching constructor, or because the char* constructor is private.
It does mean the above is restricted to member functions where you can make a member private. It won't work on global functions.
Anyone notice anything I missed with the above ?
Re: allow literal string only
Quote:
Originally Posted by OReubens
I think I solved it
(...)
Anyone notice anything I missed with the above ?
Oh, that is precisely what I meant by:
Quote:
Originally Posted by laserlight
I considered using this fact with a template constructor that takes an array by reference, but passing a pointer could be the right thing to do if that pointer points to the first element of a string literal, and it still will not do anything about arrays of const char that are not string literals, as you reasoned.
If you are willing to consider this correct:
Code:
const char s1[] = "s1";
Foo f1(s1); // This works
then yes, there is no problem. But the above is suspect, in my opinion, as the member pointer would then be pointing to the first char of a string that may well be a local variable that goes out of scope, unlike a string literal or static const char array/pointer. Are you sure that this is desired?
Re: allow literal string only
Quote:
Originally Posted by
laserlight
If you are willing to consider this correct:
Code:
const char s1[] = "s1";
Foo f1(s1); // This works
then yes, there is no problem. But the above is suspect, in my opinion, as the member pointer would then be pointing to the first char of a string that may well be a local variable that goes out of scope, unlike a string literal or static const char array/pointer. Are you sure that this is desired?
Ah *bleep*... :(
Didn't consider that being used from inside a function. Not being static causes the compiler to "allocate" this buffer on the stack and copying the contents to it from the readonly memory to the stack. When the function returns, the value is gone.
In this case specifically, it's acceptable. For one, the scope of the 'Foo' object is always the same function that could possibly have that string definition.
And for 2nds, non-static (and non-const) arrays defined at function scope are marked as suspicious by our build tools. This means getting this to pass requires the developer to provide a proof that his solution is acceptable and doesn't cause unneccesary performance issues. In several million lines of code, we only have 2 such cases so far.
Would be nice to catch that case specifically also to have a more generic solution to the issue.
Re: allow literal string only
Hmmm I'm actually confused as to why this difference (having or not having 'static') actually matters on a const at function scope...
You're telling the compiler the char array is const (at function scope). But why does the compiler go through the trouble of making a local copy since you've said it's const anyway. I'll agree that there is the case where you could cast away the const and overwrite the string anyway, but that seems like a very poor reason to cause an unnecessary copy. Or am I missing something?
Re: allow literal string only
Quote:
Originally Posted by OReubens
But why does the compiler go through the trouble of making a local copy since you've said it's const anyway.
It might not, but then you would be depending on a compiler optimisation that is not guaranteed to happen.
Quote:
Originally Posted by OReubens
I'll agree that there is the case where you could cast away the const and overwrite the string anyway
Overwriting the string through the non-const pointer (from the const_cast) would result in undefined behaviour, and I believe that this is precisely because the standard committee had such an optimisation in mind.
Re: allow literal string only
Quote:
Originally Posted by
OReubens
Hmmm I'm actually confused as to why this difference (having or not having 'static') actually matters on a const at function scope...
static keyword on a data storage container (i.e. variable) makes compiler to place it in a different memory segment (i.e. read-only section if 'const' is used or implied) thus I gave you an example above on how to detect it. My code has to be run only in a debugger (or even your special) build. It can point lines of code where your strings are allocated dynamically that will help you change the code. In case of a big development team, you can do such checks once in a while to correct it up to your standards.
Otherwise there's no mechanism that is built in the currently available compiler to tell the difference and all your attempts to circumvent it won't result in a better (or safer) code. I think I heard this one before...
Oh, and const keyword simply instructs your compiler to catch any assignments done to the variable. It does nothing regarding placement in memory. It can be both a local stack/heap or a read-only memory.
Listen, not to be a pill, but why do you need such distinction in strings? And what are your "performance issues"?
Re: allow literal string only
The issue is that this 'Foo' object has many many (several thousand) occurances throughout the code base. It appears so often and appears in vary basic elementary functions (accessors or simple operators) that the constructor literally gets called hundredthousands of times per second. In many cases, the Foo construction/destruction is already a bigger part of the performance than the rest of the work the function is doing. So any small bit of performance change in 'Foo' has a severe impact on overall performance
Changing the class from a const char* member assignment to a CString or std::string member assignment makes the entire program slow to a crawl. A 15 second calculation cycle slows down to a 4 minute calculation cycle. This is unacceptable in a release build (it's even unacceptable in a debug build). Some of the more math heavy stuff using lots of the 'small' operators has an even more dramatic slowdown. A 20second gauss+FFT filter slows down to a 12minute calculation.
The big problem is that allocating a buffer, copying a string, and freeing the buffer are just humongous amounts of code to be executed in comparison to the small functions Foo is located in.
Re: allow literal string only
Quote:
Originally Posted by
OReubens
and security issue in that the string is assumed to be located in readonly memory.
This is assuming that the compiler that is being used guarantees that this is where string literals are placed (read-only-memory). There really is nothing in the standard that states where string-literals are placed, only that writing to them invokes undefined behaviour.
I remember older versions of Visual C++ did not place string literals in read-only memory, which is why a lot of older C++ programs that used string-literals incorrectly (i.e. writing to them) seemed to work.
Regards,
Paul McKenzie
Re: allow literal string only
Quote:
Originally Posted by OReubens
and security issue in that the string is assumed to be located in readonly memory.
It's not a cure-all solution, but you can use /NXCOMPAT, and /DYNAMICBASE compiler directives. Also consider employing the Standard Annotation Language (SAL) to guard against possible buffer overruns that can be exploited as a security breach. I believe all that will require a newer compiler and at least Windows Vista.
Quote:
Originally Posted by OReubens
So any small bit of performance change in 'Foo' has a severe impact on overall performance
First of all, you don't place the DebuggerTestConstStringPtr function into a release build. You don't even have to place it into a debug build.
Again, obviously I'm not aware of the workings of your project but here's how I would do it:
1. Make a define that will control whether that check function is invoked:
Code:
#define DO_CONST_STRING_CHECKS //Comment this line if checks aren't needed
2. Define the check function to be used on a pre-compiler condition:
Code:
#ifdef DO_CONST_STRING_CHECKS
CHECK_STRING(pString, nCodeLine, pCodeFile) CGlobalClass::DebuggerTestConstStringPtr(pString, nCodeLine, pCodeFile)
#else
CHECK_STRING(pString, nCodeLine, pCodeFile)
#endif
3. Then use CHECK_STRING throughout your code. It will obviously require you to put those string checks everywhere throughout the code (there's no way around it at this point). But later on you can instruct your team coders to use that simple line in every new method they write. Better still, you can put it into your class members that can be reused throughout the code and avoid the need to use it if the class methods are called instead.
Code:
CHECK_STRING(pYourStringVariableToCheck, __LINE__, __FILE__);
//You may be even able to put __LINE__ and __FILE__ into the #define statement itself to simplify it even more. I'll let you check that yourself.
4. If DO_CONST_STRING_CHECKS is not defined, you will get your normal build. But, as I said above, once in awhile to check that the coders in your team are doing a good job you define DO_CONST_STRING_CHECKS and make a test build. I understand that it will make it very slow to run, but what's the problem? You won't be shipping it like that anyway. It will simply tell you which lines are not up to your specification -- both in Release and Debug builds. (If you remove ASSERTs from my post with a complete check function, all bad strings will be dumped into a debugger output window. So all you need to do is to check those code lines and correct the problem. Once corrected you don't need to worry about it again.)
PS. The only reason how this method wouldn't work, is if you have your code perform some time-critical operations, i.e. something that ties up your code performance with some outside hardware peripheral. In that case, I believe you're stuck with using a built-in search engine in MS VC and a lot of tedious work.
Re: allow literal string only
As I come to think about it, you can probably speed up the check function by removing the exception handler and by calling IsBadWritePtr only on the first byte of the string. It will not be quite accurate but it will significantly speed it up.
Re: allow literal string only
Quote:
Originally Posted by
Paul McKenzie
This is assuming that the compiler that is being used guarantees that this is where string literals are placed (read-only-memory). There really is nothing in the standard that states where string-literals are placed, only that writing to them invokes undefined behaviour.
I remember older versions of Visual C++ did not place string literals in read-only memory, which is why a lot of older C++ programs that used string-literals incorrectly (i.e. writing to them) seemed to work.
Portability is a non issue in my case. It's Windows only and VC++ only.
You're right about older versions of VC not putting a literal string in the const/readonly segment. This is why the old code still forces it this way by using a
static const char sz[] = "literal";
This particular syntax makes some devs think it's safe to pass other types of strings. We're busy actually phasing out this old construct and going with literals in the constructor only.
The "security" aspect isn't that much of an issue, if nothing else it satisfies the concerns of the tester team that the strings can't be changed in any possible way (i'm not going to tell them you can modify the readonly status via a VirtualAlloc call :p)
Re: allow literal string only
Quote:
Originally Posted by
ahmd
First of all, you don't place the DebuggerTestConstStringPtr function into a release build. You don't even have to place it into a debug build.
It has to either be arranged at compile time (in which case it's part of the interface contract) or it would have to be in a release build. The developers can only do unit tests to test their own interfaces. It's up to the tester team to go through all the instances and regression testing.
Even if we could do this in a debug build, this sort of test would make debug builds so slow that we'd end up wasting too much developer time. If a unit test ends up taking more than a minute, this typically means the developer will be having a 5+ min break at the coffee machine. If a unit test takes more than 15Mins... god know what they'll go doing... it's the end of the world !!!
Like I said, doing a IsBadWritePtr test is slower than copying the contents, which is already really bad.
Quote:
once in awhile to check that the coders in your team are doing a good job you define DO_CONST_STRING_CHECKS and make a test build.
The problem here is... Which of the X devs is going to be assigned to doing this ? How are you going to make sure that whomever is testing goes through every single possible case and instance.
The last case we caught (and which was the main trigger to this forum post)
had code like this:
Foo( <test> ? "literal" : CString("otherliteral")+obj.ToString() );
The <test> was true in 99.99% of the cases, which is why it went through the regression testing, and unit testing, and instance testing. And it is why it very very rarely (and not consistently reproducibly) ended up failing horribly at some customers. Where it actually did go wrong wasn't even anywhere near this constructor call, but passing a "dirty" pointer caused memory to be overwritten elsewhere, causing issues in unrelated and random places.
The tester team then told us that we EITHER had to somehow with 100% certainty prevent such constructs from ever being made again. OR we would have to create a framework for doing elaborate instance tests. To try each and every possible combination of parameters. (=pretty much impossible)
Trying to prevent is is easier. If the compiler can make that happen, so much the better, the alternative would have been to write a code parser that catches anything invalid and add that into our pre-compilation of the testing and release candidate builds.
The solution I have now (see above) is adequate. The one problem case left (non static const at function scope) gets cought by the code evaluation tool we have in the pre-compilation. It would have been nice to have a more generic solution though.
Re: allow literal string only
Quote:
Originally Posted by
OReubens
the alternative would have been to write a code parser
Wow, you went the really hard way! I'm not sure if a code parser would be able to catch all the cases of your specification without actually compiling the code. Wouldn't it be easier to assign some interns (you must have some in summer) to go through a read-only version of the code source and point out inconsistencies?
Quote:
Originally Posted by OReubens
Foo( <test> ? "literal" : CString("otherliteral")+obj.ToString() );
Just curious, why would this cause a problem?
If Foo() uses that string pointer as a read-only string it should not be an issue. If it doesn't, declare it as LPTSTR instead and you'll catch all inconsistencies at a compilation time.