Parse library


William Stacey [MVP]

Anyone know of a library that will parse files like the following:

options {
directory "/etc";
allow-query { any; }; // This is the default
recursion no;
listen-on { 192.168.0.225; };
forwarders { 4.2.2.2; };
};
 
William Stacey said:
Anyone know of some library that will parse files like following:

options {
directory "/etc";
allow-query { any; }; // This is the default
recursion no;
listen-on { 192.168.0.225; };
forwarders { 4.2.2.2; };
};

libConfuse pretty much does what you want:

http://www.nongnu.org/confuse/

It's pure C, but it comes prebaked with a VS.NET project so it wouldn't be
too hard to convert/wrap.

It might take you less time to write something yourself with Regex.

Erik
 
Thanks Erik. After looking at the lib, it looks like more work than what I
want to get into. Maybe I will just use the XML serializer instead. I guess
most folks may like XML config files these days. Cheers!
 
William Stacey said:
Thanks Erik. After looking at the lib, looks like more work then what I
want to get into. Maybe I will just use xml serializer instead. I guess
most folks may like xml config files these days. Cheers!

Personally, I'm trying to find a way to get away from them ;). After a year
or so of too much XML, I'm starting to see why people fuss when someone uses
XML as a human-readable/writable language.

While I don't know of a library that will parse it, writing a parser shouldn't
be terribly difficult: maybe a day's work using jay, depending on the
flexibility.

Are you interested in a limited set of keywords or an open-ended parser?
 
William said:
Anyone know of some library that will parse files like following:

options {
directory "/etc";
allow-query { any; }; // This is the default
recursion no;
listen-on { 192.168.0.225; };
forwarders { 4.2.2.2; };
};

One way to approach this problem could be to implement a custom
XmlReader that can parse such a file.

There's an article that shows such an approach being used here:

http://msdn.microsoft.com/msdnmag/issues/04/05/XMLFiles/default.aspx
 
That is interesting. I thought about something like that as it is kinda
like xml without the tags. Thanks.

--
William Stacey, MVP

Ed Courtenay said:
One way to approach this problem could be to implement a custom
XmlReader that can parse such a file.

There's an article that shows such an approach being used here:

http://msdn.microsoft.com/msdnmag/issues/04/05/XMLFiles/default.aspx


--

Ed Courtenay
[MCP, MCSD]
http://www.edcourtenay.co.uk
 
Probably 50 or so keywords with values like bool, string[], int, string.
What is "jay"?

--
William Stacey, MVP

Daniel O'Connell said:
While I don't know a library that will parse it, writing a parser shouldn't
be terribly difficult, maybe a days work using jay, depending on the
flexibility.

Are you interested in a limited set of keywords or an open ended parser?
 
William Stacey said:
Probably 50 or so keywords with values like bool, string[], int, string.

Hmm, it wouldn't be terribly hard to write. It'd take a couple of days, for
sure, but if you aren't trying to compile to MSIL or anything, it'd be
doable without much of a headache (it's codegen that makes you wanna tear
your hair out).

The simplest way would be a very simple parser that just returns a
dictionary of name/value pairs. If one was really nuts, you could go the
xsd.exe way and generate a config class that loads and writes out the config
file ;).

Actually, that would be an interesting project: a library of config parsers
and config object generators, or more interestingly a parser generator based
on some kind of grammar. Something to think about at the least; something
like this must exist for .NET somewhere... but I'm going off on a tangent
here.

Anyway, it'd be easier than adapting libConfuse, I think, but probably still
more work than you are looking for.

William Stacey said:
What is "jay"?

It's the parser generator that Mono uses. I've been using it to write the
parser of a compiler these last couple of days. There are a few other
generators out there; jay is just the one I happen to have used. I'm pretty
sure it would apply to this circumstance, but it requires learning a bit of
new syntax and writing your own tokenizer. To avoid that, a direct C# parser
could probably be written, just not as efficient, I would think.
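
As a rough illustration of the "dictionary of name/value pairs" idea, here is
a minimal sketch in Python (chosen only for brevity; the thread itself is
about .NET). The function name parse_flat_options and the regular expression
are made up for this example. It assumes a single, non-nested block like the
one in the original question, with no "//" inside quoted strings.

```python
import re

# Hypothetical pattern for one statement inside the block:
#   name "quoted";   name bare;   name { item; item; };
STATEMENT = re.compile(
    r'(?P<name>[\w-]+)\s+'            # keyword, e.g. listen-on
    r'(?:\{(?P<items>[^{}]*)\}'       # brace list: { a; b; }
    r'|"(?P<quoted>[^"]*)"'           # quoted string
    r'|(?P<bare>[^\s;{}]+))\s*;'      # bare word such as no or 42
)


def parse_flat_options(text):
    """Return a dict of name -> str, or name -> list of str for brace lists."""
    # Drop // comments (assumes "//" never appears inside a quoted string).
    text = re.sub(r'//[^\n]*', '', text)
    # Keep only what sits between the outermost braces.
    body = text[text.index('{') + 1:text.rindex('}')]
    settings = {}
    for match in STATEMENT.finditer(body):
        name = match.group('name')
        if match.group('items') is not None:
            items = [item.strip() for item in match.group('items').split(';')]
            settings[name] = [item for item in items if item]
        elif match.group('quoted') is not None:
            settings[name] = match.group('quoted')
        else:
            settings[name] = match.group('bare')
    return settings


SAMPLE = '''
options {
directory "/etc";
allow-query { any; }; // This is the default
recursion no;
listen-on { 192.168.0.225; };
forwarders { 4.2.2.2; };
};
'''

settings = parse_flat_options(SAMPLE)
```

All values come back as strings or lists of strings; mapping "no" to a bool
or a number to an int would be a separate step driven by the keyword list.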
 
Thanks again Daniel. Not sure I want to go down this path, but as we are
talking about it...
I've never done any tokenizer stuff, but I am curious: in general logic, how
would you go about the passes?
1) First pass - get rid of the CR/LF line breaks to get one long string.
2) Start marching down the string looking for tokens? This part I'm not sure
about. Using regex could be a nightmare, I would think. Maybe when you see
"options {", you replace it with "<options>", and when you find the last "}",
you replace it with "</options>". Do that for everything, and then you have
XML that can be deserialized with the standard .NET stuff. Not sure.

Guess for now I will just leave it as XML to get things working, then think
about it more when I am closer to done. Cheers!

--
William Stacey, MVP

 
I'd recommend books, but you look like you are only partially interested in
the process and more interested in getting your particular scenario to work as
easily as possible. I definitely support you in this latter endeavor, so here
goes with a simple explanation of what you are looking for.

You really have two phases: tokenizing (the lexer) and parsing (the parser).
The tokenizer simply breaks the stream down into portions, which could be
single characters or logical groups of characters. The parser then assigns
meaning to these tokens based on context. That is the 10,000-foot view.

To answer your specific questions: you can either get rid of whitespace or
use it. Some languages give whitespace meaning, and then things like linefeeds
become important. So your tokenizer can either preserve or remove whitespace,
depending on your application.
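
A minimal tokenizer along these lines could look like the following Python
sketch. The token names (WORD, STRING, LBRACE and so on) and the
keep_whitespace flag are assumptions for illustration, based only on the
sample config in this thread; this is not the lexer from any post here.

```python
import re

# Order matters: comments and whitespace are tried before ordinary words.
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*"),
    ("SPACE", r"\s+"),
    ("STRING", r'"[^"]*"'),
    ("LBRACE", r"\{"),
    ("RBRACE", r"\}"),
    ("SEMI", r";"),
    ("BANG", r"!"),
    ("WORD", r'[^\s{};"!]+'),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC)
)


def tokenize(text, keep_whitespace=False):
    """Yield (kind, text) pairs; comments are always dropped."""
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            # For this grammar, only an unterminated quote ends up here.
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        pos = match.end()
        kind = match.lastgroup
        if kind == "COMMENT" or (kind == "SPACE" and not keep_whitespace):
            continue
        yield kind, match.group()


tokens = list(tokenize('recursion no; // a comment\nlisten-on { 192.168.0.225; };'))
```

With keep_whitespace=True the SPACE tokens are passed through, which is the
"preserve" choice for a language where line breaks carry meaning.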

As for your second question, you would march down the string and build a
context tree of sorts. When you find options, you might create an XmlNode
named options and set it as the *context node* on some stack. The top of the
stack is always the *context node*. When you reach {, you know that you are
entering the options *nesting* area. This is a transition: after options, the
two things you are looking for are either "{" or "=". Which one you find
determines the direction your parser takes. If you find an =, you attach a
value and then leave the current node's context by popping it off the stack.
If you find a {, you enter the node's context, and the next item you find will
be another node that needs to be pushed onto the stack.
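
The stack walk described above can be sketched roughly as follows, in Python.
Two assumptions: the sample config has no "=", so here a ";" closes a plain
value statement and a "{" opens a nested one; and plain dicts stand in for
XmlNode. The names parse and TOKEN_RE are hypothetical.

```python
import re

# Comments, quoted strings, punctuation, then bare words.
TOKEN_RE = re.compile(r'//[^\n]*|"[^"]*"|[{};]|[^\s{};"]+')


def new_node(words):
    """Build a node from the words collected before a ';' or '{'."""
    name = words[0] if words else ""
    return {"name": name, "args": words[1:], "children": []}


def parse(text):
    root = new_node(["<root>"])
    stack = [root]   # the top of the stack is always the context node
    words = []       # words seen since the last ';', '{' or '}'
    for token in TOKEN_RE.findall(text):
        if token.startswith("//"):
            continue
        if token == "{":
            # Entering a nesting area: the new node becomes the context node.
            node = new_node(words)
            stack[-1]["children"].append(node)
            stack.append(node)
            words = []
        elif token == "}":
            if len(stack) == 1 or words:
                raise ValueError("unexpected '}'")
            stack.pop()  # leave the current node's context
        elif token == ";":
            if words:    # a bare ';' after '}' only terminates the block
                stack[-1]["children"].append(new_node(words))
                words = []
        else:
            # Quotes are stripped for simplicity; a real parser might keep them.
            words.append(token.strip('"'))
    if len(stack) != 1 or words:
        raise ValueError("unexpected end of input")
    return root


SAMPLE = '''
options {
directory "/etc";
allow-query { any; }; // This is the default
recursion no;
listen-on { 192.168.0.225; };
forwarders { 4.2.2.2; };
};
'''

tree = parse(SAMPLE)
options = tree["children"][0]
```

Error handling is deliberately thin: unbalanced braces raise ValueError, but
an unterminated quote is silently skipped by findall.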

I'm starting to get complicated here, but hopefully you get the gist of what
I'm saying. The code for this operation is probably only 100 or so lines long.
If there is interest in this code, let me know and I'll take the time to
develop it into a small sample, since I could probably use the code myself.


--
Justin Rogers
DigiTec Web Consultants, LLC.
Blog: http://weblogs.asp.net/justin_rogers


 
I threw up a basic lexer that is capable of lexing your language at:

http://weblogs.asp.net/justin_rogers/archive/2004/05/15/132668.aspx

I've also finished the parser/compiler, which wasn't all that difficult, but I
want to comment the parser a bit better and put it up as an article. Note that
this only took me about an hour, so it isn't of the highest quality, but I'll
post it as soon as I get a chance.

As you'll find out, a compiler can go from one format to any other format, so
I've chosen to compile to an XML document. It's strange going from one format
to the other, but if you already have plenty of code for working with XML
documents, then you'll appreciate the conversion or compilation. Since this is
a compiler, it won't allow you to *write* your changes back out in the same
format, but I don't see that as a huge problem, since writing code to output
the format based on an XmlDocument would be fairly trivial.
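
To make the "compile to XML" idea concrete, here is a small Python sketch
using the standard xml.etree.ElementTree module as a stand-in for XmlDocument.
The tree layout (dicts with name, args and children) and the <node>/<arg>
element names are assumptions for this example, not necessarily the format the
compiler mentioned above emits. Names go into an attribute because words such
as 4.2.2.2 are not legal XML element names.

```python
import xml.etree.ElementTree as ET


def to_xml(node):
    """Convert one parsed statement (hypothetical dict shape) to an Element."""
    element = ET.Element("node", name=node["name"])
    for arg in node.get("args", []):
        ET.SubElement(element, "arg").text = arg
    for child in node.get("children", []):
        element.append(to_xml(child))
    return element


# Hand-written stand-in for what a parser might produce.
tree = {"name": "options", "args": [], "children": [
    {"name": "directory", "args": ["/etc"], "children": []},
    {"name": "forwarders", "args": [], "children": [
        {"name": "4.2.2.2", "args": [], "children": []},
    ]},
]}

xml_text = ET.tostring(to_xml(tree), encoding="unicode")
root = ET.fromstring(xml_text)  # round-trip check: the output is well-formed
```

From here, existing XML tooling (XPath queries, serializers) can be pointed at
the result.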

--
Justin Rogers
DigiTec Web Consultants, LLC.
Blog: http://weblogs.asp.net/justin_rogers

 
Very cool, and thanks. I will have to look at it a bit harder and test with
it before I can offer any other questions or comments. I would be interested
in other documentation or revisions of your code as you go down the road with
it. Thanks again Justin! Cheers.

--
William Stacey, MVP

 
Just to add, as you may want to do something like this for your article: it
was the initial driver for the question and is a real-world thing. As you
probably know, this is the config syntax used by BIND's config file to
configure views, zones, etc. for the DNS server. If I could hook up your code
to parse the file below and eventually get it into some object model, that
would be killer. The end game is to import and export the same thing from the
internal object model (the user can change objects via out-of-band means such
as remoting APIs, and we need to write back a config that will be loaded again
the next time the server starts). For me, just getting it into some temp
objects or arrays, etc. would be good enough, as I could run with it, I think.
I look forward to any other work you do on your lexer. Nice work.
Thanks again.

BTW - I'm not sure the ";" after the "}" really needs to be there, but BIND
requires it for some reason.

// config file.
options {
directory "/this/named";
forwarders { 192.168.0.1; 192.168.0.2; };
};

acl internal { ! 192.168.0.2; 192.168.1.2; 192.168/16; }; // Note also the "!" not.

view internal
{
match-clients { internal; };
zone "foo.bar" {
type master;
file "foo.bar.db";
// other options.
};
};

view external
{
match-clients { any; };
zone "foo.example"
{
type master;
file "foo.ex.db";
};
};
// End file
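
For the export half of that round trip, a serializer can be quite small. The
sketch below, in Python, assumes a hypothetical in-memory shape: each
statement is a (words, children) pair, where children is None for a simple
statement and a list for a braced block. Quoted strings keep their quotes in
words so they can be written back unchanged. It always emits "};" after a
block, as BIND expects.

```python
def dump(statements, depth=0):
    """Return the config text for a list of statements, one line per entry."""
    pad = "    " * depth
    lines = []
    for words, children in statements:
        head = pad + " ".join(words)
        if children is None:
            lines.append(head + ";")          # simple statement
        else:
            lines.append(head + " {")         # braced block
            lines.extend(dump(children, depth + 1))
            lines.append(pad + "};")
    return lines


# Hand-built stand-in for part of the object model described above.
config = [
    (["options"], [
        (["directory", '"/this/named"'], None),
        (["forwarders"], [(["192.168.0.1"], None), (["192.168.0.2"], None)]),
    ]),
    (["view", "internal"], [
        (["match-clients"], [(["internal"], None)]),
        (["zone", '"foo.bar"'], [
            (["type", "master"], None),
            (["file", '"foo.bar.db"'], None),
        ]),
    ]),
]

lines = dump(config)
text = "\n".join(lines)
```

Comments are not part of this model, so they would be lost on export unless
the object model stores them too.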

--
William Stacey, MVP

Justin Rogers said:
I threw up a basic lexer that is capable of lexing your language at:

http://weblogs.asp.net/justin_rogers/archive/2004/05/15/132668.aspx

I've also finished the parser/compiler, which wasn't all that difficult, but I
want
to comment the parser a bit better and throw it up as an article. Note this
only took me about an hour, so it isn't of the highest quality, but as soon as I
get a chance to get it up I'll post.

As you'll find out a compiler can go from one format to any other format, so
I've chosen to compile to an XML document. Strange going from one format
to the other, but if you already have plenty of code for working with XML
documents then you'll appreciate the conversion or compilation. Since this is
a compiler, it won't allow you to *write* your changes back out in the same
format, but I don't see that as a huge problem, since writing code to output
the format based on an XmlDocument would be fairly trivial.

--
Justin Rogers
DigiTec Web Consultants, LLC.
Blog: http://weblogs.asp.net/justin_rogers

Justin Rogers said:
I'd recommend books, but you look like you are only partially interested in
the process and more interested in getting your particular scenario to work as
easily as possible. I definitely support you in this later endeavor so here
goes
with a simple explanation of what you are looking for.

You really have two modes. Tokenizing(lexer) and parsing(parser). The
tokenizer simply breaks the stream down into portions, could be characters
or could be logical groups of characters. Then the parser assigns meaning to
these tokens based on context. That is the 10,000 mile look down approach.

To answer your specific questions, you can get rid of whitespace or you can
use it. Some languages use it, and things like linefeeds become
important.
So
your tokenizer can either preserve or remove whitespace depending on your
application.

As for your second question, you would march the string and build a context
tree of sorts. As you find options you might create an XmlNode of name options
and set it as the *context node* on some stack. The top of the stack is always
the *context node*. As you approach {, you know that you are entering the
options *nesting* area. This is a transition, since after options the two
things
you are looking for is either "{" or "=". One of these signifies which
direction
your parser is going to take. If you find an = then you are going to attach a
value
and then leave the current nodes context, pop it off the statck. If you
find
a
{ you
are going to enter the nodes context and the next item you find is going to be
another
node that will need to be popped onto the stack.

I'm starting to get complicated here, but hopefully you get the gist of
what
I'm
saying.
The code for this operation is probably only 100 or so lines long. If
there
is
interest
in this code let me know and I'll take the time to develop it into a small
sample, since
I could probably use the code myself.


--
Justin Rogers
DigiTec Web Consultants, LLC.
Blog: http://weblogs.asp.net/justin_rogers


William Stacey said:
Thanks again Daniel. Not sure I want to go this path, but as we talking
about it....
I never did any tokenizer stuff, but am curious, in general logic, how you
would go about the passes?
1) First pass - get rid of ctrl linefeeds to get one long string.
2) Start marching down the string looking for tokens? This part not sure
about. Using regex could be a nightmare I would think. Maybe when you see
"options {", you replace it with "<options>" and when you find the last "}",
replace it with "</options>". Do that for everything, then you have xml
that can deserialize with the std .net stuff. Not sure.

Guess for now, will just leave as XML to get things working, then think
about it more after am closer to done. Cheers!

--
William Stacey, MVP

Probably 50 or so keywords with values like bool, string[], int, string.

Hrmm, it wouldn't be terribly hard to write. It'd take a couple of days, for
sure, but if you aren't trying to compile to MSIL or anything, it'd be
doable without much of a headache (it's codegen that makes you wanna tear your
hair out).

The simplest way would be a very simple parser that just returns a
dictionary with name/value pairs. If one was really nuts you could go the
xsd.exe way and generate a config class that loads and writes out the config
file, ;).

Actually, that would be an interesting project, a library of config parsers
and config object generators, or more interestingly a parser generator based
on some kind of grammar...something to think about at the least, something
like this must exist for .NET somewhere....but I'm going off on a tangent here.

Anyway, it'd be easier than adapting libConfuse, I think, but probably still
more work than you are looking for.

What is "jay"?

It's the parser generator that Mono uses. I've been using it to write the
parser of a compiler these last couple of days. There are a few other
generators out there, jay is just the one I happen to have used. I'm pretty
sure it would apply to this circumstance, but it requires learning a bit of
new syntax and writing your own tokenizer. To avoid that, a direct C# parser
could probably be written, just not as efficient, I would think.
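
The "dictionary with name/value pairs" idea can be sketched in a few lines. This is only an illustration, written in Python for brevity (the thread itself targets C#), and it is not code from any poster. The names parse_options, TOKEN_RE and PUNCT are made up, and the sketch assumes it is handed the statements found inside the options { ... } block, with at most one level of { ... } lists:

```python
import re

# Comments match with no group and are dropped; whitespace is never matched.
TOKEN_RE = re.compile(r'//[^\n]*|"([^"\n]*)"|([{};])|([^\s{};"]+)')
PUNCT = ("{", "}", ";")

def parse_options(text):
    """Map 'name value;' to a string and 'name { a; b; };' to a list of strings."""
    # Limitation of this sketch: a quoted "{", "}" or ";" is taken for punctuation.
    toks = [m.group(m.lastindex) for m in TOKEN_RE.finditer(text) if m.lastindex]
    result, i = {}, 0
    while i < len(toks):
        name = toks[i]
        if name in PUNCT or i + 1 >= len(toks):
            raise ValueError(f"expected 'name value;' at token {i}")
        if toks[i + 1] == "{":
            end = toks.index("}", i + 2)   # ValueError if the list is never closed
            result[name] = [t for t in toks[i + 2:end] if t != ";"]
            i = end + 1
        elif toks[i + 1] in PUNCT:
            raise ValueError(f"missing value for {name!r}")
        else:
            result[name] = toks[i + 1]
            i += 2
        if i >= len(toks) or toks[i] != ";":
            raise ValueError(f"expected ';' after {name!r}")
        i += 1
    return result

sample = '''
directory "/etc";
allow-query { any; };   // This is the default
recursion no;
forwarders { 4.2.2.2; };
'''
print(parse_options(sample))
```

Every value comes back as a string or a list of strings; turning "no" into a bool or "4.2.2.2" into an address would be a separate step driven by the keyword table.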
 
Justin Rogers said:
I threw up a basic lexer that is capable of lexing your language at:

http://weblogs.asp.net/justin_rogers/archive/2004/05/15/132668.aspx

Phew, that makes my lexer look bulky, ;)(536 lines so far). Granted it
supports several literal formats(string, integers and double right now).
I've also finished the parser/compiler, which wasn't all that difficult, but I
want to comment the parser a bit better and throw it up as an article. Note this
only took me about an hour, so it isn't of the highest quality, but as soon as I
get a chance to get it up I'll post.

Out of curiosity, what are you using for your parser? Are you using a parser
generator or just writing one by hand? I imagine I could wait for you to
post the article, but I am curious.
As you'll find out a compiler can go from one format to any other format, so
I've chosen to compile to an XML document. Strange going from one format
to the other, but if you already have plenty of code for working with XML
documents then you'll appreciate the conversion or compilation. Since this is
a compiler, it won't allow you to *write* your changes back out in the same
format, but I don't see that as a huge problem, since writing code to output
the format based on an XmlDocument would be fairly trivial.
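
To give a feel for how small the "write it back out" step can be, here is a rough sketch in Python (used for brevity; in .NET the same walk would go over XmlDocument nodes). The XML layout is an assumption, since the compiler's actual output is not shown in the thread: each section is taken to be an element, each setting a child element whose text is the value. The output syntax, name = "value"; for settings and name { ... } for sections, is likewise assumed, and dump is a made-up name:

```python
import xml.etree.ElementTree as ET

def dump(node, depth=0):
    """Return the brace-format text for one element and everything below it."""
    pad = "    " * depth
    if len(node) == 0:
        # Leaf element: emit  name = "value";  (quotes inside the value are not escaped).
        return f'{pad}{node.tag} = "{node.text or ""}";\n'
    # Element with children: emit  name { ... }  with the children indented one level.
    body = "".join(dump(child, depth + 1) for child in node)
    return f"{pad}{node.tag} {{\n{body}{pad}}}\n"

root = ET.fromstring(
    "<bind><recursion>true</recursion>"
    "<zone1><name>mydomain.com</name></zone1></bind>")
print(dump(root), end="")
```

The recursion mirrors the nesting of the document, so no state beyond the current depth is needed.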


Justin Rogers said:
I'd recommend books, but you look like you are only partially interested in
the process and more interested in getting your particular scenario to work as
easily as possible. I definitely support you in this latter endeavor, so here goes
with a simple explanation of what you are looking for.

You really have two modes: tokenizing (lexer) and parsing (parser). The
tokenizer simply breaks the stream down into portions, which could be characters
or could be logical groups of characters. Then the parser assigns meaning to
these tokens based on context. That is the view from 10,000 feet.
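
The tokenizing half can be sketched as follows. This is an illustration in Python (chosen for brevity), not the lexer linked elsewhere in the thread; the token kinds STRING, PUNCT and WORD and the function name tokenize are made up. It groups characters into logical tokens and drops whitespace and // comments:

```python
import re

# Hypothetical token kinds: STRING (quoted), PUNCT ({ } ; =), WORD (anything else).
TOKEN_RE = re.compile(r'''
    (?P<SKIP>\s+|//[^\n]*)        # whitespace and // comments are dropped
  | (?P<STRING>"[^"\n]*")         # double-quoted string, no escapes
  | (?P<PUNCT>[{};=])             # structural characters
  | (?P<WORD>[^\s{};="]+)         # names, numbers, IP addresses
''', re.VERBOSE)

def tokenize(text):
    """Yield (kind, value) pairs; raise ValueError on an unexpected character."""
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}: {text[pos]!r}")
        pos = m.end()
        kind = m.lastgroup
        if kind == "SKIP":
            continue
        value = m.group()
        # Strings are returned without their surrounding quotes.
        yield (kind, value[1:-1] if kind == "STRING" else value)

for token in tokenize('listen-on { 192.168.0.225; };  // comment'):
    print(token)
```

The tokenizer knows nothing about what "listen-on" means; assigning meaning is left entirely to the parser.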

To answer your specific questions, you can get rid of whitespace or you can
use it. Some languages use it, and things like linefeeds become important. So
your tokenizer can either preserve or remove whitespace depending on your
application.

As for your second question, you would march the string and build a context
tree of sorts. As you find options you might create an XmlNode named options
and set it as the *context node* on some stack. The top of the stack is always
the *context node*. As you approach {, you know that you are entering the
options *nesting* area. This is a transition, since after options the two things
you are looking for are either "{" or "=". Which one you find decides the direction
your parser is going to take. If you find an = then you are going to attach a value
and then leave the current node's context, popping it off the stack. If you find a {
you are going to enter the node's context, and the next item you find is going to be
another node that will need to be pushed onto the stack.

I'm starting to get complicated here, but hopefully you get the gist of what I'm
saying. The code for this operation is probably only 100 or so lines long. If there is
interest in this code let me know and I'll take the time to develop it into a small
sample, since I could probably use the code myself.
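
The context-stack idea described above might look roughly like this. It is a sketch in Python rather than the poster's C# code: xml.etree.ElementTree stands in for XmlNode, the names tokens and parse are made up, and the grammar is assumed to be only name = value; and name { ... }. A node that receives a value is attached to the context node and never pushed, which has the same effect as pushing it and popping it straight away:

```python
import re
import xml.etree.ElementTree as ET

TOKEN_RE = re.compile(
    r'\s+|//[^\n]*|"(?P<str>[^"\n]*)"|(?P<punct>[{};=])|(?P<word>[^\s{};="]+)')

def tokens(text):
    """Yield (kind, value) pairs; whitespace and // comments match no group."""
    for m in TOKEN_RE.finditer(text):
        if m.lastgroup:
            yield m.lastgroup, m.group(m.lastgroup)

def parse(text, root_name="config"):
    """Build an element tree; the top of `stack` is always the context node."""
    root = ET.Element(root_name)
    stack = [root]
    toks = tokens(text)
    for kind, value in toks:
        if kind == "punct":
            if value == "}" and len(stack) > 1:
                stack.pop()                       # leave the current context
            elif value != ";":                    # a stray ';' after '}' is tolerated
                raise SyntaxError(f"unexpected {value!r}")
            continue
        node = ET.SubElement(stack[-1], value)    # new node under the context node
        follow = next(toks, None)
        if follow == ("punct", "{"):
            stack.append(node)                    # enter the node's context
        elif follow == ("punct", "="):
            vkind, vtext = next(toks, ("punct", ""))
            if vkind == "punct" or next(toks, None) != ("punct", ";"):
                raise SyntaxError(f"bad value for {value!r}")
            node.text = vtext                     # attach the value; context unchanged
        else:
            raise SyntaxError(f"expected '{{' or '=' after {value!r}")
    if len(stack) != 1:
        raise SyntaxError("missing '}'")
    return root

doc = parse('bind { recursion = true; zone1 { name = "mydomain.com"; } }')
print(ET.tostring(doc, encoding="unicode"))
```

ElementTree does not validate tag names, so a real implementation would also have to decide what to do with names that are not legal XML element names.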


 
Phew, that makes my lexer look bulky, ;)(536 lines so far). Granted it
supports several literal formats(string, integers and double right now).

Yes, the lexer could be more advanced. Notice in a second post, I took
out even more code bringing the 39 lines to only 30 (for the lexer), but
added the necessary namespace imports and a test driver.

http://weblogs.asp.net/justin_rogers/archive/2004/05/15/132693.aspx
Out of curiosity, what are you using for your parser? Are you using a parser
generator or just writing one by hand? I imagine I could wait for you to
post the article, but I am curious.

I generally write my parsers by hand. It doesn't take much. I'm not at the
machine with the parser on it right now so I can't tell you the number of lines
to parse the Bind configuration file (thanks to William for pointing out the
source of this configuration format), but it is relatively short, on the order of
only 100-150 with comments. There is some code to build the XmlDocument
object (note this is not a straight parser, but rather a parser + compiler module
that compiles the Bind configuration file into XmlDocument). I'll have the article
up shortly.
 
Justin Rogers said:
Yes, the lexer could be more advanced. Notice in a second post, I took
out even more code bringing the 39 lines to only 30 (for the lexer), but
added the necessary namespace imports and a test driver.

http://weblogs.asp.net/justin_rogers/archive/2004/05/15/132693.aspx

Yeah, it could be, but in this case it probably doesn't need to be. Most of
what I'm doing in my tokenizer(reading string and numeric literals) can be
handled just as easily by your parser\compiler. The format is simple enough
and it doesn't appear to have any particular schema restrictions which the
parser would have to deal with. That is bound to help quite a bit.
I generally write my parsers by hand. It doesn't take much. I'm not at the
machine with the parser on it right now so I can't tell you the number of lines
to parse the Bind configuration file (thanks to William for pointing out the
source of this configuration format), but it is relatively short, on the order of
only 100-150 with comments. There is some code to build the XmlDocument
object (note this is not a straight parser, but rather a parser + compiler module
that compiles the Bind configuration file into XmlDocument). I'll have the article
up shortly.

For this particular project (and others like it) I agree by hand isn't too
hard. But, so far anyway, I think I prefer generated parsers when dealing
with fixed keywords and the like that you run into when designing a language
and compiler, more so than a datafile parser.

At this point, my parser grammar is about 300 lines (generated file is about
1000, but hand coded probably doesn't have all the mess). Now, code gen,
that is spread across 31 files and has proven to be harder than I expected.
I mistakenly thought writing ILGenerator.Emit statements in such a way that
it'll work in a compiler wouldn't be any harder than writing correct IL
code... it's what I get for deciding I wanted to write a compiler though. I
would have been done days ago if I'd have just gone ahead and interpreted
the script.
 
You're the man! That is sweet. You gave me an idea. It might be interesting
to use C# code style over the bind style. The bind style is good and what I was
after, but the C# style would be a very cool twist and I am not tied to
bind except for the common use of it. How could/should I define the same kind
of config using C# style such as below, and will this work?

class BindConfig
{
    bool recursion = true;
    IPAddress[] forwarders = {"192.168.0.1", "192.168.0.2"}; // not sure here

    class View1
    {
        class Zone1
        {
            string name = "mydomain.com.";
            IPAddress[] forwarders = ...
        }
    }
    class View2
    {
        // other zones.
    }
}

Not sure if this is better or not, but could be flexible.
Normally I don't need to be spoon fed, but this stuff is a bit new to me.
Very much appreciate your help and interest.
 
If you are going to go this far, why not make it full blown syntax?

config Bind
{
    bool recursion = true;
    IPAddress[] forwards = {192.168.0.1, 192.168.0.2};

    view basicView
    {
        zone FooZone
        {
            string name = "mydomain.com";
            IPAddress[] forwarders = ...
        }
    }
}

or whatever is specific to your needs. Justin's lexer is probably flexible
enough to do this, although the parser would probably need a bit of work for
type verification and the like. The flexibility is there, it's just a matter
of designing it and writing the parser. When you get into typing, C-style
strings, and the like, the lexer and/or parser (depending on your design)
become a bit more complex, although not prohibitively so.
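
As a rough idea of what "type verification" could mean for declarations like the ones above, here is a sketch in Python (for brevity; it is not part of anyone's parser in this thread). It checks a single-line declaration of the form type name = value; against a small converter table. The names DECL_RE, CONVERTERS and parse_declaration are made up, only bool, int, string and IPAddress are assumed to exist as types, and array values follow the unquoted {a, b} form used in the example:

```python
import ipaddress
import re

DECL_RE = re.compile(
    r'\s*(?P<type>\w+(?:\[\])?)\s+(?P<name>\w+)\s*=\s*(?P<value>.+?)\s*;\s*$')

def parse_bool(text):
    if text not in ("true", "false"):
        raise ValueError(f"not a bool: {text!r}")
    return text == "true"

def parse_string(text):
    if len(text) < 2 or text[0] != '"' or text[-1] != '"':
        raise ValueError(f"not a quoted string: {text!r}")
    return text[1:-1]

# Hypothetical type table: declared type name -> function that validates and converts.
CONVERTERS = {"bool": parse_bool, "string": parse_string, "int": int,
              "IPAddress": ipaddress.ip_address}

def parse_declaration(line):
    """Turn 'type name = value;' into (name, typed_value), or raise ValueError."""
    m = DECL_RE.match(line)
    if m is None:
        raise ValueError(f"not a declaration: {line!r}")
    type_name, value = m.group("type"), m.group("value")
    is_array = type_name.endswith("[]")
    convert = CONVERTERS.get(type_name[:-2] if is_array else type_name)
    if convert is None:
        raise ValueError(f"unknown type: {type_name}")
    if not is_array:
        return m.group("name"), convert(value)
    if not (value.startswith("{") and value.endswith("}")):
        raise ValueError(f"array value must be in braces: {value!r}")
    items = [item.strip() for item in value[1:-1].split(",") if item.strip()]
    return m.group("name"), [convert(item) for item in items]

print(parse_declaration("bool recursion = true;"))
print(parse_declaration("IPAddress[] forwards = {192.168.0.1, 192.168.0.2};"))
```

Because every converter raises ValueError on bad input, a mistyped value such as bool recursion = maybe; is rejected at load time instead of surfacing later.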
 
I've gone the basic route. You can find the article and full code for a parser
that handles a pseudo-bind format at the link below. My format might support
something like:

bind {
    recursion = true;
    forwarders {
        0 = "192.168.0.1"; 1 = "192.168.0.2";
    }
    view1 {
        zone1 {
            name = "mydomain.com";
            forwarders {
                0 = "192.168.0.1"; 1 = "192.168.0.2";
            }
        }
    }
}

Note that I am going to finally implement the full bind configuration format,
not because I have to, but because I think it would be kind of cool. The result
of my compiler is an XML file format, but you could easily evolve the compiler
to spit out another format if you wished. I point out in the article that I do
semantic processing in two places. If you wanted a parser rather than a linked
parser/compiler module, then there are some things that would have to be
changed to be more efficient for that form of program. With a parser module,
you'd expect some abstract output that you would then input to a compiler
module that would create the final view. I've simply linked these two steps
into one because it is common and easy to do for small languages.

http://weblogs.asp.net/justin_rogers/archive/2004/05/16/132744.aspx


--
Justin Rogers
DigiTec Web Consultants, LLC.
Blog: http://weblogs.asp.net/justin_rogers

 
a pseudo-bind format. My format might support something like:
bind {
recursion = true;
forwarders {
0 = "192.168.0.1"; 1 = "192.168.0.2";

Very cool and almost perfect. Would it be hard to remove the need for the "0 =" in
0 = "192..."? The zero and one can be inferred, I think.
Note, that I am going to finally implement the full bind configuration format,
not because I have to, but because I think it would be kind of cool. The result
of my compiler is an Xml file format,

Great and thanks again. I will try to point folks to your site as the topic
comes up. Cheers!
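
On the question of inferring the 0 and 1: one way it could work is to number bare values by their position inside the block. The sketch below is in Python (for brevity) and is not taken from the linked parser; ITEM_RE and parse_list_block are made-up names, and it assumes it receives only the text between the braces of a block such as forwarders { ... }, with every value quoted:

```python
import re

# An item is an optional  key =  prefix, a quoted value, and a closing ';'.
ITEM_RE = re.compile(r'\s*(?:(?P<key>\w+)\s*=\s*)?"(?P<value>[^"]*)"\s*;')

def parse_list_block(body):
    """Parse the inside of a block into {index: value}, inferring missing indexes."""
    items, pos = {}, 0
    while body[pos:].strip():
        m = ITEM_RE.match(body, pos)
        if m is None:
            raise ValueError(f"cannot parse item at offset {pos}")
        key = m.group("key")
        if key is None:
            key = str(len(items))      # bare value: its position becomes the index
        if key in items:
            raise ValueError(f"duplicate index {key!r}")
        items[key] = m.group("value")
        pos = m.end()
    return items

print(parse_list_block('"192.168.0.1"; "192.168.0.2";'))
print(parse_list_block('0 = "192.168.0.1"; 1 = "192.168.0.2";'))
```

Both calls produce the same mapping, so files written with explicit indexes would keep working if the indexes became optional.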
 