Splitting a String

  • Thread starter: Materialised

Materialised

Hi all,

Just wondering if someone could help me with this little problem I'm having.

I have a string value (it actually represents a barcode) which looks
like this:

5021378002392

What I wish to do is split this string into four separate string values,
like this:

val1 = 50;
val2 = 21378;
val3 = 00239;
val4 = 2;

The placement of the values is fixed, so the field widths will always be
2, 5, 5, 1. Does anyone know how I can do this?
 
Materialised,

Why not just do this?

public static int[] SplitBarCode(string barcode)
{
    // The return value.
    int[] retVal = new int[4];

    // Split the string.
    retVal[0] = Int32.Parse(barcode.Substring(0, 2));
    retVal[1] = Int32.Parse(barcode.Substring(2, 5));
    retVal[2] = Int32.Parse(barcode.Substring(7, 5));
    retVal[3] = Int32.Parse(barcode.Substring(12, 1));

    // Return the value.
    return retVal;
}

Of course, you might want to put a little more error handling into this,
but this should do it.

Hope this helps.
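
On the "little more error handling" point, here is one possible sketch rather than a definitive version. It assumes the four fields should stay strings, as the question asks, which also keeps the leading zeros of "00239" (Int32.Parse would turn that field into 239). The names BarcodeSplitter and SplitBarCodeAsStrings are made up for this example.

```csharp
using System;

public static class BarcodeSplitter
{
    // Field widths from the question: 2, 5, 5, 1 (13 characters in total).
    private static readonly int[] Widths = { 2, 5, 5, 1 };

    // Hypothetical variant of SplitBarCode that returns strings, so a field
    // such as "00239" keeps its leading zeros.
    public static string[] SplitBarCodeAsStrings(string barcode)
    {
        if (barcode == null)
            throw new ArgumentNullException(nameof(barcode));
        if (barcode.Length != 13)
            throw new ArgumentException("Expected a 13-character barcode.", nameof(barcode));
        foreach (char c in barcode)
        {
            if (c < '0' || c > '9')
                throw new ArgumentException("Expected digits only.", nameof(barcode));
        }

        // Walk the fixed widths and cut one substring per field.
        string[] parts = new string[Widths.Length];
        int start = 0;
        for (int i = 0; i < Widths.Length; i++)
        {
            parts[i] = barcode.Substring(start, Widths[i]);
            start += Widths[i];
        }
        return parts;
    }
}
```

Calling BarcodeSplitter.SplitBarCodeAsStrings("5021378002392") gives "50", "21378", "00239" and "2". Whether a bad barcode should throw or be reported some other way depends on the caller, so treat the exceptions as one option among several.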
 
Use a regular expression (the Regex class) and its Split method!
 
While it is possible to use Regex.Split, the Substring version is definitely
easier to read, and the regex is also less efficient. Here is the regex
version.

public static string[] SplitBarCode(string barcode)
{
    return Regex.Split(barcode,
        @"(?<= ^\d{2} | ^\d{7} | ^\d{12} )",
        RegexOptions.IgnorePatternWhitespace);
}
 
How could you use that in this case...?

Off the top of my head:
string[] parts = Regex.Split("5021378002392",
@"(\d{2})(\d{5})(\d{5})(\d{1})");

That regex only works if you use Regex.Match and iterate over the
groups. That won't yield a one-line solution, though.

Regex.Split uses the regex to detect boundaries, so you would have to
use a lookbehind assertion to see how far from the beginning of the
string you are. I posted the correct split regex in another post in
this thread.
 
Well I just tried it and it does work.... first and last element of
the resulting array are empty though.
 
Heh, my bad. I even tested your code. I just forgot to actually print
out the values.

Anyway, I re-read the entry on Regex.Split. The reason there is an
extra first and last element when using your version with groups is
the following:

MSDN Regex Split:
"
If capturing groups are used in a Regex.Split expression, the
capturing groups are included in the resulting string array. The
following example would yield the array items "one", "-", "two", "-",
"banana".

Regex r = new Regex("(-)"); // Split on hyphens.
string[] s = r.Split("one-two-banana");
"

Your regex treats the whole barcode as the delimiter, so the first and
last elements are the empty strings before and after that delimiter.
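
To make that concrete, here is a small sketch of the behaviour described above. The class and method names are invented for illustration; the pattern is the one from the earlier post. SplitFields shows one way to drop the two empty entries, assuming you only care about the captured fields.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitWithGroupsDemo
{
    // Pattern from the earlier post: the four groups together match the whole barcode.
    private const string Pattern = @"(\d{2})(\d{5})(\d{5})(\d{1})";

    // Returns the raw Regex.Split result, including the empty strings
    // before and after the single "delimiter" match.
    public static string[] SplitRaw(string barcode)
    {
        return Regex.Split(barcode, Pattern);
    }

    // Returns only the captured fields by dropping the empty entries.
    public static string[] SplitFields(string barcode)
    {
        return Array.FindAll(SplitRaw(barcode), s => s.Length > 0);
    }
}
```

For "5021378002392", SplitRaw returns six elements: "", "50", "21378", "00239", "2", "". SplitFields returns the middle four.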
 
Right, thanks, I was wondering why that was :) But if you had to choose
between using Regex and using the Substring method, which would you
prefer, and why?
 
In this case, either one is acceptable. I would probably still go with
Substring, because it is slightly easier to read and debug. I usually
save regular expressions for more complex pattern matching.

I do have one big pet issue when using regular expressions though. I
try to always use RegexOptions.IgnorePatternWhitespace so I can add
formatting and comments to make it more readable.
 
Agree 100%. I find that I almost never use RegEx for string splitting, but
almost always use it for validation.
 
Ludwig's example is good; another way to do it is this.
The string is now split into groups; block1 is an example of that.

Regex RegexObj = new Regex(
    "\\A(?<block1>\\d{2})(?<block2>\\d{5})(?<block3>\\d{5})(?<block4>\\d)");
string block1 = RegexObj.Match(SubjectString).Groups["block1"].Value;

block1 = 50;
block2 = 21378;
block3 = 00239;
block4 = 2;
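
For completeness, a sketch of reading all four named groups from a single match. This is only one way to arrange it: the class name, the GetBlocks method and the null return for a non-match are my own choices, and I have added \z so that trailing characters are rejected, which the pattern above does not do.

```csharp
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Same named groups as above, written as a verbatim string, with an
    // added \z (an assumption) so the whole input must be the 13 digits.
    private static readonly Regex BarcodeRegex = new Regex(
        @"\A(?<block1>\d{2})(?<block2>\d{5})(?<block3>\d{5})(?<block4>\d)\z");

    // Returns the four blocks in order, or null if the input does not match.
    public static string[] GetBlocks(string subject)
    {
        Match m = BarcodeRegex.Match(subject);
        if (!m.Success)
            return null;

        return new[]
        {
            m.Groups["block1"].Value,
            m.Groups["block2"].Value,
            m.Groups["block3"].Value,
            m.Groups["block4"].Value
        };
    }
}
```

Matching once and reusing the Match object avoids running the regex again for each block.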
 
Rene Sørensen said:
Ludwig's example is good; another way to do it is this.
The string is now split into groups; block1 is an example of that.

Regex RegexObj = new Regex(
    "\\A(?<block1>\\d{2})(?<block2>\\d{5})(?<block3>\\d{5})(?<block4>\\d)");
string block1 = RegexObj.Match(SubjectString).Groups["block1"].Value;

block1 = 50;
block2 = 21378;
block3 = 00239;
block4 = 2;

And this is a good example of why you should really think about whether
regular expressions are the right solution before using them. Pretend
you don't know what the bit of code is meant to do, and read the
version above. Then read Nick's version using String.Substring. I know
which I find significantly simpler to understand...
 
I agree! The format is well-known, so RegEx is not the best and most
simple solution. Of course, if you are a regular expressions expert,
this piece of code may be just as simple as using SubString...
 
Ludwig said:
I agree! The format is well-known, so RegEx is not the best and most
simple solution. Of course, if you are a regular expressions expert,
this piece of code may be just as simple as using SubString...

Only if you can assume that *everyone* who is going to read/maintain
the code is as happy with regular expressions. Don't forget that code
spends more time being maintained than being written - and it's often
maintained by different people.
 
Well, Jon, there is some validity to the idea that unfamiliarity with almost
any language makes it look cryptic. I remember when I first started working
with C, way back at the beginning of my programming education. It gave me
headaches! But now it's almost like reading English, which is an excellent
analogy, since English is actually one of the more difficult human languages
to learn.

That said, I am not happy with the fact that regular expression language
does not support splitting an expression across lines. I have always felt
that simply breaking the parts of it across lines would make it much easier
to read.

There are some tools (I'm partial to RegexBuddy) that create a graphic
representation of the syntax, and that is very helpful as well. But I sure
would like to see the ability to break regular expressions across lines!
--
HTH,

Kevin Spencer
Microsoft MVP
Professional Numbskull

The man who questions opinions is wise.
The man who quarrels with facts is a fool.

 
That is what RegexOptions.IgnorePatternWhitespace is for. If you use that
option, whitespace characters in the pattern, including newlines, are
ignored, and the "#" character starts a comment in the same way "//" does
in C#.
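
As a rough illustration, here is the lookbehind split pattern from earlier in the thread spread over several lines with comments. The class and method names are invented for the example; the pattern itself is unchanged apart from the layout.

```csharp
using System.Text.RegularExpressions;

public static class CommentedPatternDemo
{
    // With IgnorePatternWhitespace, the line breaks and indentation inside
    // the pattern are ignored and "#" starts a comment that runs to the
    // end of the line.
    private static readonly Regex SplitPoints = new Regex(@"
        (?<=            # split wherever the text behind this position is
            ^\d{2}      #   the first 2 digits
          | ^\d{7}      #   the first 2 + 5 digits
          | ^\d{12}     #   the first 2 + 5 + 5 digits
        )",
        RegexOptions.IgnorePatternWhitespace);

    // Splits a barcode at the three zero-width positions found above.
    public static string[] Split(string barcode)
    {
        return SplitPoints.Split(barcode);
    }
}
```

CommentedPatternDemo.Split("5021378002392") gives "50", "21378", "00239" and "2", the same as the one-line version of the pattern.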
 
Kevin Spencer said:
Well, Jon, there is some validity to the idea that unfamiliarity with almost
any language makes it look cryptic. I remember when I first started working
with C, way back at the beginning of my programming education. It gave me
headaches! But now it's almost like reading English, which is an excellent
analogy, since English is actually one of the more difficult human languages
to learn.

That said, I am not happy with the fact that regular expression language
does not support splitting an expression across lines. I have always felt
that simply breaking the parts of it across lines would make it much easier
to read.

There are some tools (I'm partial to RegexBuddy) that create a graphic
representation of the syntax, and that is very helpful as well. But I sure
would like to see the ability to break regular expressions across lines!

But why add complexity to the mix when it's unnecessary? You're already
using C#, so it's reasonable that anyone reading the code will be
familiar with C#. It's not unreasonable to suggest that anyone who
knows C# has probably used the String.Substring method. However, it's
perfectly possible to write an awful lot of perfectly good code without
using regular expressions.

Basically, using regular expressions means you need to understand two
languages instead of one. That's fine when there's a significant
*benefit* in using regular expressions - but in this case there isn't.
 
Jon Skeet said:
Basically, using regular expressions means you need to understand two
languages instead of one. That's fine when there's a significant
*benefit* in using regular expressions - but in this case there isn't.

But in other situations there will be a benefit to using regex's. Which
means that it behooves any programmer to understand them. Which means that
there should be no disadvantage to using them.
 