Regular Expression for an identifier token


amigosplayground
February 27th, 2006, 05:30 PM
I'm having trouble defining a regular expression for an identifier token. An identifier is defined as a letter followed by zero or more letters or digits or underscores. But there cannot be two consecutive underscores in the identifier. That's the part I'm having trouble with. How do I represent that there cannot be two consecutive underscores in the identifier?

Once I have this definition I can include it in the finite state machine that I'm making for my lexical analyzer.

Any help would be appreciated.
Thanks

RoboTact
February 27th, 2006, 06:41 PM
It just means that a word may begin with normal symbols, but once there is an underscore, the next symbol must NOT be an underscore, and you can write that out explicitly. It's actually simpler to draw as an automaton.
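
To make the automaton idea concrete, here is a minimal sketch in Python. It assumes ASCII letters only and that a trailing underscore is allowed, since the stated rule only forbids two underscores in a row. The state names and the function name is_identifier are made up for illustration.

```python
import string

LETTERS = set(string.ascii_letters)   # assumption: ASCII letters only
DIGITS = set(string.digits)


def is_identifier(text):
    """Run a three-state automaton over text and report acceptance."""
    state = "start"
    for ch in text:
        if state == "start":
            # The first symbol must be a letter.
            if ch in LETTERS:
                state = "alnum"
            else:
                return False
        elif state == "alnum":
            # After a letter or digit, anything allowed may follow.
            if ch in LETTERS or ch in DIGITS:
                state = "alnum"
            elif ch == "_":
                state = "underscore"
            else:
                return False
        else:
            # state == "underscore": the next symbol must not be an underscore.
            if ch in LETTERS or ch in DIGITS:
                state = "alnum"
            else:
                return False
    # Both non-start states accept (assumption: trailing underscore is fine).
    return state in ("alnum", "underscore")
```

If identifiers must not end with an underscore, only the "alnum" state should be accepting.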

For example, the language of words over the symbols a and b, where b can't occur twice in a row, is as follows (e is the empty word): L = (ba|a)*(b|e)
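
As a rough check, the same trick can be tried with Python's re module. This sketch reads the expression above as (ba|a)* followed by an optional b, and then applies the same shape to identifiers: every underscore must be followed by a letter or digit, except possibly a final one. The ASCII character classes and the trailing-underscore allowance are assumptions, and the helper name matches is made up for illustration.

```python
import re

# Words over {a, b} with no two consecutive b's: (ba|a)* then an optional b.
NO_DOUBLE_B = re.compile(r"(?:ba|a)*b?")

# Identifier: a letter, then any number of "optional underscore + letter/digit"
# groups, then an optional trailing underscore (assumption: that is allowed).
IDENTIFIER = re.compile(r"[A-Za-z](?:_?[A-Za-z0-9])*_?")


def matches(pattern, text):
    """Return True only if the whole of text matches pattern."""
    return pattern.fullmatch(text) is not None
```

For instance, matches(IDENTIFIER, "a_b") is true, while matches(IDENTIFIER, "a__b") is false because no group can consume the second underscore.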