REGEX vs Stack Tokenizer?

xlar54 · Apr 26, 2006

Hey guys, I'm writing a simple language tokenizer and I'm at that point
of "10 ways to do it, which is the best?"...

All I need it to do is take a string, look inside it for keywords,
and replace each one with a single-byte token code greater than 128 (so
outside the 7-bit ASCII range; each keyword gets its own token). The
catch, obviously, is that a keyword between quotes must be left alone.

I've gone the route of using a stack, and it's OK, but it seems clumsy and
overkill... Could just be my implementation...

The other thought I had was regular expressions. I don't know enough
about them to tell whether they would work here, or whether they would
actually give cleaner code than a stack-based search.
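For comparison, here is a minimal sketch of the regular-expression route, written in Python rather than C#. The keywords and token codes are hypothetical, and strings are assumed to be delimited by double quotes with no escape sequences. A single alternation matches either a quoted string or a whole word; because the engine scans left to right, anything inside quotes is consumed by the first branch and never reaches the keyword lookup.

```python
import re

# Hypothetical keyword table: each keyword maps to its own token code above 128.
KEYWORDS = {"PRINT": 129, "GOTO": 130, "IF": 131}

# Either a double-quoted string (closing quote optional, so an unterminated
# string runs to the end of input) or a whole word starting with a letter.
PATTERN = re.compile(r'"[^"]*"?|[A-Za-z][A-Za-z0-9]*')

def crunch_regex(source: str) -> str:
    """Replace keywords outside double quotes with single-character tokens."""
    def replace(m):
        text = m.group(0)
        if text.startswith('"'):
            return text  # quoted string: leave it untouched
        code = KEYWORDS.get(text)
        return chr(code) if code is not None else text  # non-keywords pass through
    return PATTERN.sub(replace, source)
```

Because the pattern matches whole words, `PRINTER` is left alone rather than being split into `PRINT` plus `ER`; whether that is the desired behaviour depends on the language being tokenized.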

Any thoughts are appreciated.

Thanks

Barry Kelly · Apr 26, 2006

xlar54 said:
Hey guys, I'm writing a simple language tokenizer and I'm at that point
of "10 ways to do it, which is the best?"...

The easiest way to write a tokenizer is with a state machine (e.g. a DFA or
an NFA). You can implement the state machine with transition tables and a
driver method (in the style of lex and similar tools), but the tables are
usually generated by a tool rather than written by hand.
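A minimal sketch of the table-driven style in Python, assuming double-quoted strings without escapes and a hypothetical set of token kinds (`IDENT`, `NUMBER`, `STRING`, `CHAR`). The transition table is written by hand here only to keep the example small; a lex-style generator would normally produce it. The driver is generic: it runs the automaton from the current position and keeps the longest match that ended in an accepting state.

```python
def char_class(ch: str) -> str:
    """Map a character to the column used to index the transition table."""
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch.isdigit():
        return "digit"
    if ch == '"':
        return "quote"
    return "other"

# (state, char class) -> next state. A missing entry means "no transition".
TRANSITIONS = {
    ("start", "letter"): "ident",
    ("start", "digit"): "number",
    ("start", "quote"): "in_string",
    ("start", "other"): "single",
    ("ident", "letter"): "ident",
    ("ident", "digit"): "ident",
    ("number", "digit"): "number",
    ("in_string", "letter"): "in_string",
    ("in_string", "digit"): "in_string",
    ("in_string", "other"): "in_string",
    ("in_string", "quote"): "string",  # closing quote reached
}

# Accepting states and the token kind each one produces.
ACCEPTING = {"ident": "IDENT", "number": "NUMBER", "string": "STRING", "single": "CHAR"}

def tokenize_table(text: str) -> list:
    """Generic driver: longest accepted match from each position."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():  # whitespace between tokens is skipped
            pos += 1
            continue
        state, i, last_accept = "start", pos, None
        while i < len(text):
            nxt = TRANSITIONS.get((state, char_class(text[i])))
            if nxt is None:
                break
            state, i = nxt, i + 1
            if state in ACCEPTING:
                last_accept = (i, ACCEPTING[state])  # remember longest match so far
        if last_accept is None:
            raise ValueError(f"invalid or unterminated token at position {pos}")
        end, kind = last_accept
        tokens.append((kind, text[pos:end]))
        pos = end
    return tokens
```

Keyword replacement would then be a lookup on each `IDENT` token, while `STRING` tokens are passed through unchanged.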

An alternative is to encode the state machine implicitly in the "program
counter": the current position of execution within a hand-written lexing
method stands in for the current state. I wrote a blog article on how to
do this in C#:

http://barrkel.blogspot.com/2005/06/lexer-for-lexical-analysis.html
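The same keyword-replacement task coded the second way, as a minimal Python sketch (not the code from the linked article). There is no explicit state variable: which branch of the loop is executing tells the lexer whether it is inside a string, a word, or anything else. Keywords and token codes are again hypothetical, and strings are assumed to be double-quoted without escapes.

```python
# Hypothetical keyword table: each keyword maps to its own token code above 128.
KEYWORDS = {"PRINT": 129, "GOTO": 130, "IF": 131}

def crunch(source: str) -> str:
    """Replace keywords outside double quotes with single-character tokens."""
    out = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch == '"':
            # In a string: copy through the closing quote (or to end of input).
            end = source.find('"', i + 1)
            end = n if end == -1 else end + 1
            out.append(source[i:end])
            i = end
        elif ch.isalpha():
            # In a word: read the whole run of letters/digits, then look it up.
            start = i
            while i < n and source[i].isalnum():
                i += 1
            word = source[start:i]
            code = KEYWORDS.get(word)
            out.append(chr(code) if code is not None else word)
        else:
            # Anything else is copied unchanged.
            out.append(ch)
            i += 1
    return "".join(out)
```

Compared with a stack, nothing needs to be pushed or popped: the only "memory" is where the loop currently is.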

-- Barry
