C# Regular Expressions

H

hclugano

Hi!
There is a SQL Create Table statement, for example:

CREATE TABLE [dbo].[EM5Auftrag] (
[ID] [int] IDENTITY (1, 1) NOT NULL,
[Nummer] [int] NOT NULL,
[Auftrag] [varchar] ( 15) NOT NULL
.....
) ON [PRIMARY]

The Syntax isn't always the same. But it's always a correct SQL-Syntax.
For example the [ and ] are optionally or the new lines are on other
places, no tabs and so on.
I have to write a parser, which is able to find all attributes and
their properties (id, int, identity (1,1), not null). For each of this
attributes I need an Object (the class exists).

How can I find this Attributes. I tried it with a Regular Expression as

@"create table \[?dbo\]?.\[?" + Tabname + @"\]? \([!!!!!]\)"
!!!!! is the placeholder, I don't know what to write for it. Tabname is
the name of the Table. It's known.

What should I write in, for !!!!!???
If a solution with regex isn't possible, I need to find another way.
Thanks a lot!
 
G

Guest

Regular expressions will solve your problem but you need to be more specific
about what you want and how you use them. Personally I would use multiple
regular expressions here. 1 to pull ou the contents of the Create (i.e.
between ( and ) in the stored proc) and a second to retrieve details for each
line.

Something like this:

1st regex:
@"(?imsn)create table (\[?dbo\]?.\[?EM5Auftrag\]? ?\()(?<Contents>.*)\) ?on"

I know there is a .* in there and that in general .* is inefficient but in
that case it's best as there are not many characters after the final )

Pull out the contents of the group name "Contents" and then run the second
regex on it.

2nd Regex
@"(?imsn)(?<Line>((?!,\s\[?[a-zA-Z]).)*)"

This second regex might be a bit iffy but going on your examplp stored proc
it's the best I could come up with..

it pulls out each line of the stored proc. After that you can easily parse
out the type useing string parsing or another regex.

Hope this helps.


Brian Delahunty
Ireland

http://briandela.com/blog
 
C

Crispie

What does this one mean: <content> or <line>?? I never saw this.
And how could I access one Line? Both RegEx are giving the same result
in my test.

pattern = @"(?imsn)create table (\[?dbo\]?.\[?"+Tabname+@"\]?
?\()(?<Contents>.*)\) ?on";
regEx = new Regex(pattern, RegexOptions.IgnoreCase);
m = regEx.Match(content);
if(m.Success){
//Console.Write(m.ToString());
pattern = @"(?imsn)(?<Line>((?!,\s\[?[a-­zA-Z]).)*)" ;
Regex regExAtt= new Regex(pattern, RegexOptions.IgnoreCase);
Match ma = regExAtt.Match(m.ToString());
if(ma.Success){
Console.Write(ma.Groups[1].Value);
}
}else{
}
 
G

Guest

They are named captures.

e.g.

string storedProc = <your stored proc text would be here>;
string pattern = @"(?imsn)create table (\[?dbo\]?.\[?"+Tabname+@"\]?
?\()(?<Contents>.*)\) ?on";

Match match = Regex.Match(storedProc, pattern);

if(match.Success == true)
{
string content = match.Groups["Contents"].Value;

string secondPattern = @"(?imsn)(?<Line>((?!,\s\[?[a-­zA-Z]).)*)" ;

MatchCollection matches = Regex.Matches(content, secondPattern);

foreach(Match m in matches)
{
string line = m.Groups[Line].Value;
// Do what you want with the line... e.g. additional parsing etc
}
}


You don't need to use an RegexOptions as the (?imsn) at the beginning of the
regex sets those.

Hope this helps.
 
C

Crispie

The problem I have, is that i can't access one line. in the
ma.Groups["Line"].Value is the hole content...

pattern = @"(?imsn)create table (\[?dbo\]?.\[?"+Tabname+@"\]?
?\()(?<Contents>.*)\) ?on";
m = Regex.Match(content, pattern);
contentCreate = m.Groups["Contents"].Value;
if(m.Success){
string secondPattern =
@"(?imsn)(?<Line>((?!,\s\[?[a-­­zA-Z]).)*)" ;
MatchCollection matches = Regex.Matches(contentCreate,
secondPattern);
foreach(Match ma in matches) {
string line = ma.Groups["Line"].Value;
Console.WriteLine("test:" + line);
// Do what you want with the line... e.g. additional parsing
etc
}
}

RESULT:
test:
[ID] [int] IDENTITY (1, 1) NOT NULL,
[Nummer] [int] NOT NULL,
[Auftrag] [varchar] ( 15) NOT NULL,

test:

And not what I want:
test [ID] [int] IDENTITY (1, 1) NOT NULL,
test [Nummer] [int] NOT NULL,
.....
 
G

Guest

Yes. This is probably due to the fact that I based my regex on what was
posted here (which was: "CREATE TABLE [dbo].[EM5Auftrag] ( [ID] [int]
IDENTITY (1, 1) NOT NULL, [Nummer] [int] NOT NULL, [Auftrag] [varchar] ( 15)
NOT NULL ..... ) ON [PRIMARY] ") and which obviously will not match what is
in memory when you pull back the stored proc from whereever you are storing
it. My regex's are only meant to be a starting point... not a solution...
sorry if I didn't make that clear.

Try this regex as the second one:

@"(?imsn)[^\[a-zA-z]+(?<Line>((?!,\n).)*)";

That should work

Hope this helps!

Brian Delahunty
Ireland

http://briandela.com/blog
 
C

Crispie

I tried it with exactly the postet one...
The first Regex works fine, and i get the content from create to ...)
ON. This is what i need. Now I need this content not at one piece, but
on lines.
there are 16 lines with attributes in there... The problem is, that I
can't access this attributes separatly. In SQL, its possible to write
this attributes all 16 on the same line or as this every on a own line.
With this new secondRegex, it doesn't works. The result I get is as
following:
61times:
test:
test:
....
the variable "line" is empty.

if i try ma.value, i get only the numbers in the content.
[ID] [int] IDENTITY (1, 1) NOT NULL,
-->
test:
test:
test: (1, 1)
test:
test:,

I don't understand this result. why are all letters hidden?

I'm very sorry, I'm not experienced in programming with regex... I read
some documents, but didn't understand it well.
Greets and thanks for being patience.
 
C

Crispie

@"(?imsn)[a-zA-z]+(?<Line>(­(?!,\n).)*)";

Deleting ^\[ in your solution gives me the result as follows:
test:[ID]
test:[int]
test:IDENTITY
test:NOT
test:NULL
test:[Nummer]
test:[int]
test:NOT
test:NULL

But now, there aren't any numbers... I need also the IDENTITY (1, 1) or
( 15) as the length of a varchar...
whats the problem?
 
G

Guest

Can you debug through your code and copy out the contents of "content"
variable just before you run the regex on it.. i.e. the stored procedure as
it appears in memory. Paste it here and I'll see if I can do up a regex for
it.
 
C

Crispie

I solved the problem:
I now take this one: @"(?imsn)[a-zA-z0-9(),
-_]+(?<Line>(­(?!,\n).)*)";
(added 0-9(),-_ into your regex)

the output now shows like this (ignore true/false, thats another test):
test:[ID] True False
test:[int] False True
test:IDENTITY (1, 1) False False
test:NOT NULL, False False
test:[Nummer] True False
test:[int] False True
test: False False
test:NOT NULL, False False
test:[Auftrag] True False
test:[varchar] False True
test:( 15) False False
test:NOT NULL, False False

So I have now every single item. Now I can create some
Attribute-Objects.

Thanks for help,
Greets
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top