REGEX vs Stack Tokenizer?

  • Thread starter: xlar54

xlar54

Hey guys, I'm writing a simple language tokenizer and I'm at that point
of "10 ways to do it, which is the best?"...

All I need it to do is, given a string, look inside it for keywords
and replace them with a character code greater than 128 (so outside the
7-bit ASCII range; each keyword will have its own token code). The catch,
obviously, is that if a keyword is between quotes, it must be left alone.

I've gone the route of using a stack, and it's OK, but it seems clumsy and
overkill... Could just be my implementation...
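Since double quotes don't nest, a full stack shouldn't be needed for this; a single "inside quotes" flag can carry all the state. A minimal Python sketch, assuming a hypothetical `KEYWORDS` table (the keyword names and codes are made up for illustration), that keywords only match as whole words, and that strings contain no escaped quotes:

```python
# Hypothetical keyword table: token codes above 128, one per keyword.
KEYWORDS = {"PRINT": 0x99, "GOTO": 0x89, "IF": 0x8B, "THEN": 0xA7}


def tokenize_scan(line: str) -> str:
    """Replace whole-word keywords outside double quotes with chr(code)."""
    out = []
    in_quotes = False  # the only state needed, since quotes don't nest
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == '"':
            in_quotes = not in_quotes  # opening or closing quote
            out.append(ch)
            i += 1
        elif not in_quotes and (ch.isalpha() or ch == "_"):
            # Read a whole word so "PRINTER" is not split into PRINT + ER.
            j = i + 1
            while j < len(line) and (line[j].isalnum() or line[j] == "_"):
                j += 1
            word = line[i:j]
            out.append(chr(KEYWORDS[word]) if word in KEYWORDS else word)
            i = j
        else:
            # Everything else (quoted text, digits, spaces, punctuation)
            # is copied unchanged.
            out.append(ch)
            i += 1
    return "".join(out)
```

On `'PRINT "GOTO" : GOTO 10'`, the leading `PRINT` and the second `GOTO` become token characters, while the quoted `"GOTO"` is copied as-is.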

The other thought I had was regular expressions. I don't know enough
about them to know whether this will work, or whether it will actually give
cleaner code than a stack-based search.
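A regular expression can handle this if the pattern offers quoted strings and keywords as alternatives: a quoted string is then consumed whole before any keyword inside it can match, and the replacement callback returns quoted strings unchanged. A Python sketch using the standard `re` module, under the same assumptions (hypothetical `KEYWORDS` table, whole-word keywords, no escaped quotes):

```python
import re

# Hypothetical keyword table: token codes above 128, one per keyword.
KEYWORDS = {"PRINT": 0x99, "GOTO": 0x89, "IF": 0x8B, "THEN": 0xA7}

# Two alternatives, tried in order at each position:
#   1. a double-quoted string (closing quote optional, so an
#      unterminated string runs to the end of the line), or
#   2. a keyword standing alone as a whole word (captured in group 1).
PATTERN = re.compile(
    r'"[^"]*"?|\b(' + "|".join(map(re.escape, KEYWORDS)) + r')\b'
)


def tokenize_regex(line: str) -> str:
    def replace(m):
        keyword = m.group(1)
        if keyword is None:
            return m.group(0)  # quoted string: leave it untouched
        return chr(KEYWORDS[keyword])

    return PATTERN.sub(replace, line)
```

One caveat of this sketch: `\b` treats digits as word characters, so a keyword glued to a number (e.g. `10PRINT`) is not replaced.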

Any thoughts are appreciated.

Thanks
 
xlar54 said:
Hey guys, Im writing a simple language tokenizer and Im at that point
of "10 ways to do it, which is the best"...

The easiest way to write a tokenizer is with a state machine (e.g. a DFA or
NFA). You can implement a state machine with transition tables and a
driver method (in the style of lex, etc.), but you usually use a tool to
generate the tables.
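To make the table-driven style concrete, here is a small hand-built transition table and a longest-match ("maximal munch") driver in Python. In practice a tool like lex would generate such tables; the states, character classes, and the `KEYWORDS` codes below are illustrative assumptions, not lex output. Keywords are recognized after lexing, by looking up each word token.

```python
# Hypothetical keyword table: token codes above 128, one per keyword.
KEYWORDS = {"PRINT": 0x99, "GOTO": 0x89, "IF": 0x8B, "THEN": 0xA7}


def char_class(ch: str) -> str:
    """Map a character to the class used to index the transition table."""
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch.isdigit():
        return "digit"
    if ch == '"':
        return "quote"
    return "other"


START, WORD, IN_STRING, STRING_DONE, SINGLE = range(5)

# (state, character class) -> next state; a missing entry means "stop here".
TRANSITIONS = {
    (START, "letter"): WORD,
    (START, "quote"): IN_STRING,
    (START, "digit"): SINGLE,
    (START, "other"): SINGLE,
    (WORD, "letter"): WORD,
    (WORD, "digit"): WORD,
    (IN_STRING, "letter"): IN_STRING,
    (IN_STRING, "digit"): IN_STRING,
    (IN_STRING, "other"): IN_STRING,
    (IN_STRING, "quote"): STRING_DONE,
}

# Accepting states and the kind of token each one produces.
# IN_STRING accepts too, so an unterminated string runs to end of input.
ACCEPTING = {WORD: "word", IN_STRING: "string", STRING_DONE: "string", SINGLE: "other"}


def lex_table(line: str) -> list:
    """Table-driven driver: take the longest match from START at each position."""
    tokens = []
    pos = 0
    while pos < len(line):
        state, i, last = START, pos, None
        while i < len(line):
            nxt = TRANSITIONS.get((state, char_class(line[i])))
            if nxt is None:
                break
            state, i = nxt, i + 1
            if state in ACCEPTING:
                last = (i, ACCEPTING[state])  # longest accepted match so far
        # Every class leads from START to an accepting state, so last is set.
        end, kind = last
        tokens.append((kind, line[pos:end]))
        pos = end
    return tokens


def tokenize_table(line: str) -> str:
    return "".join(
        chr(KEYWORDS[text]) if kind == "word" and text in KEYWORDS else text
        for kind, text in lex_table(line)
    )
```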

An alternative is to encode the state machine implicitly in the "program
counter", i.e. the current point of execution within a lexing method. I wrote
a blog article on how to do this in C#:

http://barrkel.blogspot.com/2005/06/lexer-for-lexical-analysis.html
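A rough sketch of the implicit style, in Python rather than the C# of the linked article and not a reproduction of it: there is no state variable, because which branch of `next_token` is currently executing is itself the state. The `KEYWORDS` table and the function names are hypothetical.

```python
# Hypothetical keyword table: token codes above 128, one per keyword.
KEYWORDS = {"PRINT": 0x99, "GOTO": 0x89, "IF": 0x8B, "THEN": 0xA7}


def next_token(text: str, pos: int) -> tuple:
    """Return (kind, lexeme, new_pos) for the token starting at text[pos]."""
    ch = text[pos]
    if ch == '"':
        # Being in this branch is the "inside a string" state: skip straight
        # to the closing quote (or to the end if the string is unterminated).
        close = text.find('"', pos + 1)
        end = len(text) if close == -1 else close + 1
        return "string", text[pos:end], end
    if ch.isalpha() or ch == "_":
        # Likewise, this loop is the "inside a word" state.
        end = pos + 1
        while end < len(text) and (text[end].isalnum() or text[end] == "_"):
            end += 1
        return "word", text[pos:end], end
    return "other", ch, pos + 1


def tokenize_direct(text: str) -> str:
    out = []
    pos = 0
    while pos < len(text):
        kind, lexeme, pos = next_token(text, pos)
        if kind == "word" and lexeme in KEYWORDS:
            out.append(chr(KEYWORDS[lexeme]))
        else:
            out.append(lexeme)
    return "".join(out)
```

Compared with a flag-based scan, the "inside quotes" condition never has to be tested per character; it is implied by where the code is.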

-- Barry
 
