string routines and code libraries

zoro

Hi,
I am new to C#, coming from Delphi. In Delphi, I am using a 3rd party
string handling library that includes some very useful string
functions, in particular I'm interested in BEFORE (return substring
before a pattern), AFTER (return substring after a pattern), and
BETWEEN (return substring between 2 patterns).
My questions are:
1. Can anyone tell me how I can implement such functionality in C#?
2. Is it possible to add/include function libraries to C#, and if so
how?

Thank you very much for your help.

Zoro.
 
Paul E Collins

zoro said:
I'm interested in BEFORE (return substring
before a pattern), AFTER (return substring
after a pattern), and BETWEEN (return substring
between 2 patterns). [...]
1. Can anyone tell me how I can implement such
functionality in C#?

Use String.IndexOf to find the patterns, and String.Substring to
extract the text between them. An alternative would be regular
expressions (available in System.Text.RegularExpressions), but they're
usually slower for simple searches like these, and some people find
them hard to learn and understand.
2. Is it possible to add/include function libraries
to C#, and if so how?

Certainly possible.

Write your functions ("methods") in a separate class, and prefix calls
with the name of the class (or an object of the class, if the method
is non-static) and a dot. If the class is in a different namespace,
you need to import it with a "using" directive, just as you would
when you write e.g. "using System.IO".
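
As a rough sketch of that layout (the namespace MyCompany.Text and the class StringUtil are made-up names for illustration, not part of any real library), a static helper in its own namespace could be imported and called like this:

```csharp
using System;
using MyCompany.Text; // imports the hypothetical namespace declared below

namespace MyCompany.Text
{
    // Hypothetical helper class; the name is an assumption for illustration.
    public static class StringUtil
    {
        // Returns the text before the first occurrence of marker,
        // or the whole input if marker does not occur.
        public static string Before(string input, string marker)
        {
            int i = input.IndexOf(marker, StringComparison.Ordinal);
            return i < 0 ? input : input.Substring(0, i);
        }
    }
}

namespace MyApp
{
    public static class Program
    {
        public static void Main()
        {
            // Static method: prefix the call with the class name and a dot.
            Console.WriteLine(StringUtil.Before("key=value", "=")); // prints "key"
        }
    }
}
```

Without the using directive, the call would have to be written out in full as MyCompany.Text.StringUtil.Before(...).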

P.
 
Kevin Spencer

Hi Zoro,

I'm not familiar with Delphi, so I may be misinterpreting the meaning of the
functions here. Assuming that when you mention "pattern" you are talking
about a Regular Expression-type pattern, I can see how such functions might
indeed be useful. As I may need such functions in the future as well, I've
taken the liberty of writing a few .Net methods for doing what you're
talking about:

/// <summary>
/// Returns all Indices of Regex Matches in a string
/// </summary>
/// <param name="input">string to evaluate</param>
/// <param name="pattern">pattern to match</param>
/// <returns>Array of indices of all matches</returns>
/// <remarks>If no matches are found, zero-length array is returned</remarks>
public static int[] IndicesOf(string input, string pattern)
{
    int[] returnVal;
    Regex rx = new Regex(pattern);
    MatchCollection matches = rx.Matches(input);
    returnVal = new int[matches.Count];
    // Copy the start index of each match into the result array
    for (int i = 0; i < matches.Count; i++)
        returnVal[i] = matches[i].Index;
    return returnVal;
}

/// <summary>
/// Returns first index of a Regex match in a string
/// </summary>
/// <param name="input">string to evaluate</param>
/// <param name="pattern">pattern to match</param>
/// <returns>Index of first match in input string</returns>
/// <remarks>Returns -1 if no match is found</remarks>
public static int IndexOf(string input, string pattern)
{
    // Run the match once and check whether it succeeded
    Match m = Regex.Match(input, pattern);
    return m.Success ? m.Index : -1;
}

/// <summary>
/// Returns the index of the last match of a pattern in an input string
/// </summary>
/// <param name="input">string to evaluate</param>
/// <param name="pattern">pattern to match</param>
/// <returns>Index of the last match of a pattern in the input string</returns>
/// <remarks>Returns -1 if no match is found</remarks>
public static int LastIndexOf(string input, string pattern)
{
    int[] vals = IndicesOf(input, pattern);
    if (vals.Length == 0) return -1;
    return vals[vals.Length - 1];
}

/// <summary>
/// Returns a Substring of a string
/// before or after the first occurrence of a pattern in the string
/// </summary>
/// <param name="input">string to evaluate</param>
/// <param name="pattern">pattern to match</param>
/// <param name="before">Get the Substring before the pattern?</param>
/// <returns>Substring of input string, starting from the beginning
/// of the string and ending before the first character of the match,
/// or, if before is false, starting from the end of the match, and ending
/// at the end of the string.</returns>
/// <remarks>If there is no match, returns the input string.</remarks>
public static string Substring(string input, string pattern, bool before)
{
    if (before)
    {
        int i = IndexOf(input, pattern);
        if (i > -1) return input.Substring(0, i);
    }
    else
    {
        // Use the first match, as documented above
        // (the posted version used the last match here)
        Match m = Regex.Match(input, pattern);
        if (m.Success) return input.Substring(m.Index + m.Length);
    }
    return input;
}

/// <summary>
/// Finds a substring of an input string between 2 pattern matches
/// </summary>
/// <param name="input">string to evaluate</param>
/// <param name="pattern1">first pattern</param>
/// <param name="pattern2">second pattern</param>
/// <returns>Substring of input string between the 2 patterns</returns>
/// <remarks><para>The order of the patterns is only important if one of
/// the patterns is not found.
/// If the patterns are different, and both patterns are found,
/// the substring returned will be the substring between them
/// regardless of the order in which they appear in the input text</para>
/// <para>If both patterns are found, but their matches overlap, there
/// is nothing between them, and a blank string is returned</para>
/// <para>If both patterns are found, and they are the same pattern,
/// the method will look for a second occurrence of the pattern, and
/// attempt to return the substring between the first and second match
/// of the pattern used. If there is not a second match, the patterns
/// overlap, as they occupy the same space,
/// and there is nothing between.</para>
/// <para>If the first pattern is found, but the second pattern
/// is not found, the substring will be either the substring
/// of the input string after the first match of pattern1,
/// or if the first pattern matches the end of the string,
/// the substring of the string before the first match of pattern1</para>
/// <para>If the second pattern is found, but the first pattern is not,
/// the substring will be either the substring of the input string
/// before the beginning of the first match of the second pattern,
/// or if the second pattern is the beginning of the string, the
/// substring of the string after the end of the first match of the
/// second pattern.</para>
/// <para>If neither pattern is found, the entire input string will
/// be returned.</para>
/// </remarks>
public static string SubstringBetween(string input,
    string pattern1, string pattern2)
{
    // Indices of the first match of each pattern (-1 if not found)
    int index1 = -1, index2 = -1;

    // Length of input up to the end of each match
    int len1, len2;

    // Calculate first match
    Match m1 = Regex.Match(input, pattern1);
    if (m1.Success) index1 = m1.Index;

    // Calculate second match
    Match m2 = Regex.Match(input, pattern2);
    if (m2.Success) index2 = m2.Index;

    // If neither is found, return input
    if (index1 == -1 && index2 == -1) return input;

    // Otherwise, at least 1 is found. Return a substring

    // pattern1 not found.
    if (index1 == -1)
    {
        if (index2 > 0)
            return input.Substring(0, index2); // treat as second
        else
            return input.Substring(index2 + m2.Length); // treat as first
    }

    // Used for no pattern2, identical patterns, and overlaps
    len1 = index1 + m1.Length;

    // pattern2 not found.
    if (index2 == -1)
    {
        if (len1 < input.Length)
            return input.Substring(len1); // treat as first
        else
            return input.Substring(0, index1); // treat as second
    }

    len2 = index2 + m2.Length;

    // Test for identical patterns
    if (pattern1 == pattern2)
    {
        int[] indices = IndicesOf(input, pattern1);
        // Only one match: overlap, as both are the same
        if (indices.Length == 1) return "";
        return input.Substring(len1, indices[1] - len1);
    }

    // Not identical patterns. Test for overlap

    // index2 falls inside m1
    if (index2 >= index1 && index2 <= len1) return "";

    // index1 falls inside m2
    if (index1 >= index2 && index1 <= len2) return "";

    // No overlap. See which one is first, and get value between

    // m2 is the first match
    if (index2 < index1)
        return input.Substring(len2, index1 - len2);

    // m1 is the first match
    return input.Substring(len1, index2 - len1);
}

/// <summary>
/// Returns a Substring of a string
/// before the first occurrence of a pattern in the string
/// </summary>
/// <param name="input">string to evaluate</param>
/// <param name="pattern">pattern to match</param>
/// <returns>Substring of input string, starting from the beginning
/// of the string and ending before the first character of the pattern</returns>
/// <remarks>If there is no match, returns the input string.</remarks>
public static string Substring(string input, string pattern)
{
    return Substring(input, pattern, true);
}

A couple of notes: You will need to reference the
System.Text.RegularExpressions namespace to use these. You may want to
change the names of the methods for clarity. I have them in a class for
doing Regular Expression functions, so the class name is sufficient for my
needs. Also, carefully examine the SubstringBetween method in particular. The
rules for it are fairly complex, and may not conform to the same rules in
Delphi. I have commented it quite a bit for clarity. It is not primarily
concerned about the order of the 2 patterns, unless one of them is not found.
It returns the entire string if neither of them is found. If one of the two
patterns is not found, it attempts first to use the order in which they
appear, but the rule changes if, for example, the first pattern matches, but
at the end of the string, or the second pattern matches, but at the beginning
of the string. In essence, it treats a non-match as if it were a blank string.

Of course, these methods could be expanded and extended quite a bit. Some of
them only look for a single match. But they should give you (or anyone) a
good starting point for your own class library.

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.
 
zoro

Thank you very much for all the suggestions. It still looks very
complex for the simple functions I wanted:

AFTER returns the substring AFTER the pattern so:
str := AFTER('@', '(e-mail address removed)');
str = 'microsoft.com'

BEFORE returns the substring BEFORE the pattern so:
str := BEFORE('@', '(e-mail address removed)');
str = 'bill.gates'

BETWEEN returns the substring BETWEEN 2 patterns so:
str := BETWEEN('@', '.', '(e-mail address removed)');
str = 'microsoft'

There must be a simpler way to achieve this in C# - surely?
Also, does anyone know of a third party library that will include such
functions?

Thanks again,

ilZoro.
 
Jon Skeet [C# MVP]

zoro said:
Thank you very much for all the suggestions. It still looks very
complex for the simple functions I wanted:

AFTER returns the substring AFTER the pattern so:
str := AFTER('@', '(e-mail address removed)');
str = 'microsoft.com'

BEFORE returns the substring BEFORE the pattern so:
str := BEFORE('@', '(e-mail address removed)');
str = 'bill.gates'

BETWEEN returns the substring BETWEEN 2 patterns so:
str := BETWEEN('@', '.', '(e-mail address removed)');
str = 'microsoft'

There must be a simpler way to achieve this in C# - surely?

All of those can be done with IndexOf very easily.
Also, does anyone know of a third party library that will include such
functions?

No, but there may be one around. However, it would be only a matter of
about five minutes to write you one for the above. What else would you
want?
 
The Crow

public sealed class StringHelper
{
    // Returns the text before the first occurrence of pattern,
    // or null if pattern is not found.
    public static string Before(string pattern, string strLookup)
    {
        int index = strLookup.IndexOf(pattern);
        if (index > -1)
            return strLookup.Substring(0, index);
        else
            return null;
    }

    // Returns the text after the first occurrence of pattern,
    // or null if pattern is not found.
    public static string After(string pattern, string strLookup)
    {
        int index = strLookup.IndexOf(pattern);
        if (index > -1)
            return strLookup.Substring(index + pattern.Length); // skip the pattern itself
        else
            return null;
    }

    public static string Between(string pattern1, string pattern2, string strLookup)
    {
        int index1 = strLookup.IndexOf(pattern1);
        // the text of interest starts after pattern1, or at 0 if it is missing
        int start = (index1 == -1) ? 0 : index1 + pattern1.Length;
        // look for pattern2 only after the end of pattern1
        int index2 = strLookup.IndexOf(pattern2, start);

        if (index1 == -1 && index2 == -1) // if neither is found, return null
            return null;
        else if (index2 == -1)
            return strLookup.Substring(start); // if only first pattern is found, return after.
        else if (index1 == -1)
            return strLookup.Substring(0, index2); // if only second pattern is found, return before.
        else
            return strLookup.Substring(start, index2 - start); // else return between
    }
}






If you want to use this class as a library, create a new Class Library
project in Visual Studio, add this class to the project, compile it,
and then use the output DLL in your other projects by adding a reference to it.
 
Kevin Spencer

Well, you did say "pattern." I assumed you were talking about a pattern. In
any case, all you have to do is use the functions I wrote for you. Don't
know how I could have made it any easier.

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.
 
zoro

Thank you all for your help. Sorry, I didn't make myself
clear the first time.
Zoro.
 
zoro

Thanks crow - your solutions are exactly what I needed. But instead
of compiling this to a dll, shouldn't it be possible/desirable in C#
to add these functions to the system some how, by expanding the built
in string class?

Thanks,
Zoro.
 
The Crow

You may inherit from string and add these methods, but I think this would be a
very bad idea. And you can't add static method definitions to the String class.
 
Jon Skeet [C# MVP]

You may inherit from string and add these methods, but I think this would be a
very bad idea. And you can't add static method definitions to the String class.

No, you can't derive from string - it's a sealed class.
 
Jon Skeet [C# MVP]

zoro said:
Thanks crow - your solutions are exactly what I needed. But instead
of compiling this to a dll, shouldn't it be possible/desirable in C#
to add these functions to the system some how, by expanding the built
in string class?

You can't change the string class, but by adding a DLL you effectively
are adding them to "the system" as far as the code which uses it is
concerned - you just need to add a reference to your library in the
same way that you add references to system libraries.
 
Michael S

zoro said:
Hi,
I am new to C#, coming from Delphi.

Welcome!

I am also from the Delphi (and Borland C++) corner.
I think you'll find that the 'Hejlsberg phenomenon' is very much present in
.NET. You'll learn C# in a jiffy! =)
But forget all you knew about strings. They are immutable in .NET and not as
cool as in Delphi.

I'm still waiting for .NET (and Java) to have a string class that has the
by-reference-but-with-copy-on-write semantics of Delphi strings. There is
something missing between String and StringBuilder. We sure need strings
like in Delphi for performance....

Does anybody know why we don't get such a class? If I think for 2 seconds I'd
imagine it would screw up the GC as every such string must be pinned to a
memory location. If anyone else could think for like 4 seconds or even a
minute, I would appreciate your input on why and why not.

I sure miss 'em....

Happy Strings
- Michael S
 
Kevin Spencer

Does anybody know why we don't get such a class? If I think for 2 seconds I'd
imagine it would screw up the GC as every such string must be pinned to a
memory location. If anyone else could think for like 4 seconds or even a
minute, I would appreciate your input on why and why not.

Ask Anders Hejlsberg. He led the team that created Delphi AND the Microsoft
.Net platform.

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
A watched clock never boils.
 
Michael S

Kevin Spencer said:
Ask Anders Hejlsberg. He led the team that created Delphi AND the
Microsoft .Net platform.

No shit!
I've been a follower of Hejlsberg since Turbo Pascal..

<joke>But since I showed up in tiger-tanga-lingerie, he sorta stopped
calling me</joke>

But this is not the question. My question is why we don't have a
variant-copy-on-write-string in the .NET-framework..

Happy Coding
- Michael S
 
kevin cline

Are you imagining some sort of reference-counting scheme where a string
will only be copied if there is more than one reference to the string?
That doesn't play well with threading.
 
kevin cline

In a word, no, it's not desirable. You end up with a huge number of
simple functions that are relatively useless, because it's rare that
one of them will do the entire job. And once you need more than one
function call, you might as well use a regular expression, which often
can do the whole job.
 
Michael S

kevin cline said:
Are you imagining some sort of reference-counting scheme where a string
will only be copied if there is more than one reference to the string?
That doesn't play well with threading.

No, it doesn't play well with threading at all.
But I'm not dreaming. It is all there. And I don't take credit for it as it
has been in Delphi since 2.0. =)

Have a look how strings are done in Delphi and you'll see something neat.
Or don't. I'll do it for you...

I'm not saying that System.String should be replaced, but that a sorta
System.StringBuffer would be desirable.
I just picked the name from Java, just to make sure Javaites would get
really really confused...

StringBuffer o1 = "Hello World!"; // o1 points to virtual memory at address
                                  // 1000 and has a refcount of 1.
StringBuffer o2 = o1; // o2 now also points to memory address 1000,
                      // which keeps a refcount of 2. No chars hurt!
o2.CharAt[1] = 'a';   // Now a new string gets copied to the heap at address
                      // 2000 and contains "Hallo World!".
StringBuffer o3 = o2; // o3 is simply a reference to address 2000. No
                      // chars were copied.

But there is more to strings in Delphi. A string in Delphi also keeps its
length.

o1 = "OK"; // o1 still points to the memory at address 1000, which now
           // contains "OKllo World!"
o1 = "Now this is really cool"; // The allocated space of o1 cannot hold the
                                // string. It is copied to address 3000.

There is (somewhat) no magic. This is how the structure works.

[32-bit refcount][32-bit allocated][32-bit length[0 deprecated]][1][2][3][4]...[N] ASCII characters.

o1 = "Get it?"; // o1 does not reallocate. It stays at 3000 and contains
                // "Get it?s is really cool"

Hence the reference of o1 would point to address of 3000:
1, 23, 7 [points here]Get it?s is really cool

This is also why Length(o1) in Delphi is actually nothing more than a single
fetch of the address with a -3 offset. No strlen needed at all.
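
For anyone who wants to play with the idea, here is a rough C# sketch of the copy-on-write part only. CowString, SharedBuffer and Share() are hypothetical names made up for this sketch, not real .NET types or members. Since C# cannot intercept plain assignment, an explicit Share() call stands in for Delphi's "o2 := o1". The refcount is neither thread-safe nor decremented when a handle is dropped, so the worst case is an extra copy.

```csharp
using System;

// Hypothetical sketch of a copy-on-write string; not a real .NET type.
public sealed class CowString
{
    // Character buffer shared between handles, with a manual reference count.
    private sealed class SharedBuffer
    {
        public char[] Chars;
        public int RefCount = 1;
        public SharedBuffer(char[] chars) { Chars = chars; }
    }

    private SharedBuffer _buf;

    public CowString(string s) { _buf = new SharedBuffer(s.ToCharArray()); }

    private CowString(SharedBuffer shared)
    {
        _buf = shared;
        _buf.RefCount++;
    }

    // Stands in for Delphi's "o2 := o1": no characters are copied.
    public CowString Share() { return new CowString(_buf); }

    public char this[int index]
    {
        get { return _buf.Chars[index]; }
        set
        {
            if (_buf.RefCount > 1)
            {
                // Another handle still uses the buffer: copy before writing.
                _buf.RefCount--;
                _buf = new SharedBuffer((char[])_buf.Chars.Clone());
            }
            _buf.Chars[index] = value;
        }
    }

    public override string ToString() { return new string(_buf.Chars); }
}

public static class Program
{
    public static void Main()
    {
        CowString o1 = new CowString("Hello World!");
        CowString o2 = o1.Share();  // shares the buffer, refcount is 2
        o2[1] = 'a';                // triggers the copy; o1 is untouched
        Console.WriteLine(o1);      // prints "Hello World!"
        Console.WriteLine(o2);      // prints "Hallo World!"
    }
}
```

The sketch leaves out the in-place reuse of allocated space and the stored length described above; a char[] already knows its own length.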

Happy Strings
- Michael S
 
Jon Skeet [C# MVP]

kevin cline said:
In a word, no, it's not desirable. You end up with a huge number of
simple functions that are relatively useless, because it's rare that
one of them will do the entire job. And once you need more than one
function call, you might as well use a regular expression, which often
can do the whole job.

Do you use regular expressions every time you need to do more than one
operation on a string then? I certainly don't. I'd rather see a few
simple operations than one regular expression which could take a while
to understand or even to write properly in the first place.

Regular expressions are great when they take the place of *complicated*
string processing, but when you've just got a few operations to
perform, I'll take the simplicity of straight string operations any
day.
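
To make the comparison concrete, here is a small sketch of the same "between" operation written both ways. The class name, method names and sample input are invented for illustration, and returning null when a marker is missing is just one possible convention.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BetweenDemo
{
    // Plain string operations: find the first marker, then the second after it.
    public static string BetweenWithIndexOf(string input, string first, string second)
    {
        int start = input.IndexOf(first, StringComparison.Ordinal);
        if (start < 0) return null;
        start += first.Length;
        int end = input.IndexOf(second, start, StringComparison.Ordinal);
        return end < 0 ? null : input.Substring(start, end - start);
    }

    // Regular expression: escape both markers and lazily capture what lies between.
    public static string BetweenWithRegex(string input, string first, string second)
    {
        string pattern = Regex.Escape(first) + "(.*?)" + Regex.Escape(second);
        Match m = Regex.Match(input, pattern, RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        string sample = "user@example.com"; // made-up sample input
        Console.WriteLine(BetweenWithIndexOf(sample, "@", ".")); // prints "example"
        Console.WriteLine(BetweenWithRegex(sample, "@", "."));   // prints "example"
    }
}
```

Both versions give the same result here. The regular expression version is shorter, but it only stays correct because the markers are escaped and the quantifier is lazy, which is the kind of detail that takes a moment to get right.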
 
