Hi guys, I'm currently writing a Visual Studio extension for a class project. Part of that extension's functionality will be the ability to do a few simple whitespace refactorings. For instance, it should ensure that there is a blank line after every function definition. Well, that's where my problem is.
The regular expression for recognizing a C++ function declaration is obviously VERY complicated. But I know that I can't be the first person to want to write one. I've searched but couldn't find any. Can you point me to some place that might have such a regular expression? Or if you can't do that, at least a full specification of the Visual C++ syntax so I can write my own, and have it be complete.
Despite the fact that regular expressions are powerful, writing them quickly becomes messy and difficult. I would really appreciate any kind of help. And in the future, I'll be needing ones for function and variable definitions as well, so I'd take some info on those too.
Recognizing a function in C++ is not possible using regular expressions.
For example, consider the following two (useless but legitimate) functions:
Code:
void f() { ((((0)))); }
void g() { (((((0))))); }
Obviously you can create an infinite number of functions that way, by wrapping the expression in one more pair of parentheses each time.
However, these functions cannot all be recognized by a single regular expression in the formal sense, since a regular expression cannot "remember" how many '(' it has seen and so cannot check that the same number of ')' follow.
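To make the "remembering" point concrete, here is a minimal sketch of what checking matched parentheses takes: a counter that can grow without bound, which is exactly what a finite automaton doesn't have. The function name parensBalanced is made up for illustration, and the sketch deliberately ignores parentheses inside string literals and comments.

```cpp
#include <string>

// Hypothetical helper: returns true when every '(' in text has a matching ')'.
// The depth counter is the unbounded "memory" that a formal regular
// expression (a finite automaton) lacks.
// Assumption: parentheses inside string literals or comments are not
// treated specially, so this is not a real C++ scanner.
bool parensBalanced(const std::string& text) {
    int depth = 0;
    for (char c : text) {
        if (c == '(') {
            ++depth;
        } else if (c == ')') {
            if (depth == 0) {
                return false;  // closing paren with nothing open
            }
            --depth;
        }
    }
    return depth == 0;  // everything that was opened must be closed
}
```

Calling parensBalanced on a function body with properly nested parentheses returns true no matter how deep the nesting goes, while a body with one ')' missing returns false.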
What you are looking for is a C++ parser (for example, one produced by a parser generator) that, given C++ source code, returns an Abstract Syntax Tree describing the program. By walking this Abstract Syntax Tree, you can easily check whether a blank line follows every function.
I had a class last semester all about computation theory. We studied regular expressions, CFLs, Turing machines, etc. And I know that the language of matched parentheses isn't regular. But what's weird is, I have a program called Expresso for working with .NET regular expressions, and it has a built-in regular expression to recognize matching parentheses. :/ No idea how it's done.
Anyway, I think I found what I was looking for. Luckily, I'm also in a compilers and languages class this semester, so I know how to work with an AST. Unfortunately, Visual Studio doesn't provide an AST or even a token list (I'm told Eclipse offers both). After hours of searching, I finally decided to just integrate my own parser generator.
BUT, then I found out that Visual Studio offers something called a CodeModel.
It's not an AST, but it's sure as hell as close as you can get. It gives me information about all of the code 'elements', including functions, classes, variables, attributes, etc. It even has functions to get the start and end points of each element in the text. So, if anybody is looking for an AST in Visual Studio, see if the CodeModel will be a viable alternative.
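For anyone wondering what the refactoring itself looks like once you know where each function ends, here's a rough sketch. It does not touch the CodeModel API at all: the set of end-line indices is assumed to come from wherever you get your element end points, and the names ensureBlankLineAfter, isBlank and functionEndLines are made up for illustration.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: true if the line contains only whitespace.
bool isBlank(const std::string& line) {
    return line.find_first_not_of(" \t\r") == std::string::npos;
}

// lines: the file's text, already split into lines.
// functionEndLines: 0-based indices of the lines on which a function
// definition ends (assumed input, e.g. taken from element end points).
// Returns a copy of the text with a blank line inserted after each
// function that is directly followed by a non-blank line.
std::vector<std::string> ensureBlankLineAfter(
        const std::vector<std::string>& lines,
        const std::set<std::size_t>& functionEndLines) {
    std::vector<std::string> result;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        result.push_back(lines[i]);
        const bool endsFunction = functionEndLines.count(i) > 0;
        const bool hasNext = i + 1 < lines.size();
        if (endsFunction && hasNext && !isBlank(lines[i + 1])) {
            result.push_back("");  // the missing blank line
        }
    }
    return result;
}
```

A function that ends on the last line of the file is left alone, and a function already followed by a blank line is not given a second one.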
Thanks for the help guys. As much as I love regular expressions, I'm glad I don't have to work with them any more.
EDIT: Apparently, .NET regex is souped up a bit. Here's a regex to recognize matching parentheses.