Finding formatting items in a string

Jack

Hi there,

Given a standard .NET string, does anyone know what the regular expression
would be to locate each (optional) formatting item in the string (or more
likely does anyone have a link that will show me this). For instance, given
the following simple string:

"My phone number is {0} and my SSN is {1}"

I want to enumerate (or create a collection of) all formatting items in the
string which would be "{0}" and "{1}" in this (trivial) example. The regular
expression itself should handle all legal cases of course (as described
under "composite formatting" in MSDN - see here:
http://msdn2.microsoft.com/en-us/library/txafckwd.aspx). Any help would be
appreciated. Thanks.
 
Hello Jack,

The following expression will take care of most of what you want:

(?<!([^\{]|^)\{(\{{2})*)\{[0-9]+(,[-]?[0-9]+)?(:[^\}]+)?\}(?!\}(\{{2})*([^\}]|$))

I'll try to explain what it does:

(?<!([^\{]|^)\{(\{{2})*)
This part checks whether we're dealing with an even number of opening {'s.
In that case they are all escaped and should therefore be ignored.
Since there is no easy way to check for odd or even counts, I've done it
as follows:
- First make sure we're either at the beginning of the input or at a
character that is not a {. That way we know where we start counting.
- Now chop off the first {, followed by any number of pairs of {'s.

\{
- If that still leaves us with one {, then we're in business.

[0-9]+
- Now accept the numbered part. I've kept it pretty simple here: any number
will do.

(,[-]?[0-9]+)?
- Now accept the optional alignment. I think you could write the [-] as [+-],
but I'm not sure off the top of my head that a plus is allowed for the
alignment. I guess it is, though.

(:[^\}]+)?
- Accept almost anything as the optional format string. As each type (or
custom formatter) can interpret the format string differently, I guess
there's no use in limiting the possible formats in any way.
- So consume everything that's not a closing }

\}
- Pick off the closing }

(?!\}(\{{2})*([^\}]|$))
- But only if it's followed by no further closing }'s, or by an even number
of them (escaped }} pairs). This uses
the same logic as above.
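To make the step-by-step breakdown concrete, here is a TypeScript sketch that builds the expression from the same pieces. It assumes an ES2018+ JavaScript runtime, whose RegExp, like .NET's, accepts variable-length lookbehind; the names `pieces`, `posted`, `variant` and `items` are made up for illustration. Writing it out brings one detail to light: the trailing lookahead repeats `(\{{2})*`, pairs of *opening* braces, while the description says it applies the same logic to closing ones. `variant` swaps in `(\}{2})*` as a guess at that intent, and the two differ on `{0}}}}`, where the item is followed by an escaped `}}` plus one unpaired `}`.

```typescript
// Each string is one step of the explanation above; String.raw keeps the
// backslashes intact for the RegExp constructor.
const pieces: string[] = [
  String.raw`(?<!([^\{]|^)\{(\{{2})*)`, // fail if an odd-length run of { comes right before our {
  String.raw`\{`,                       // the item's own opening brace
  String.raw`[0-9]+`,                   // the index: any number will do
  String.raw`(,[-]?[0-9]+)?`,           // optional alignment, e.g. ",-10"
  String.raw`(:[^\}]+)?`,               // optional format string: everything up to the next }
  String.raw`\}`,                       // the closing brace
];

// Trailing lookahead exactly as posted (note: pairs of *opening* braces).
const postedTail = String.raw`(?!\}(\{{2})*([^\}]|$))`;
// Hypothetical variant that counts pairs of *closing* braces instead.
const variantTail = String.raw`(?!\}(\}{2})*([^\}]|$))`;

const posted = new RegExp(pieces.join("") + postedTail, "g");
const variant = new RegExp(pieces.join("") + variantTail, "g");

// Returns the text of every item the given pattern finds in the input.
function items(pattern: RegExp, input: string): string[] {
  return Array.from(input.matchAll(pattern), (m) => m[0]);
}

console.log(items(posted, "{0}}}}"));  // ["{0}"]: the unpaired trailing } slips through
console.log(items(variant, "{0}}}}")); // []: the unpaired } is caught
```

Both versions still agree on inputs such as `{{{0}}}` and `{{0}}`; they only diverge when three or more closing braces follow an item.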

You could make the regex more specific, but I guess this should get you started.

Also note that I haven't taken any whitespace into account, as I haven't
had time to experiment with where you are allowed to add whitespace and
where not.

If you still have any questions on how to improve or further limit the expression,
feel free to ask.
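A quick way to try the expression on Jack's example is sketched below in TypeScript. It is a stand-in, not .NET code: it assumes an ES2018+ JavaScript RegExp, which like .NET's supports variable-length lookbehind, and `findFormatItems` is a hypothetical helper name. In .NET the equivalent loop would be a `foreach` over `Regex.Matches(input)`.

```typescript
// The expression posted above, unchanged; String.raw keeps the backslashes intact.
const formatItemPattern = new RegExp(
  String.raw`(?<!([^\{]|^)\{(\{{2})*)\{[0-9]+(,[-]?[0-9]+)?(:[^\}]+)?\}(?!\}(\{{2})*([^\}]|$))`,
  "g", // global, so matchAll can walk every match
);

// Hypothetical helper: collects the text of each formatting item in the input.
function findFormatItems(input: string): string[] {
  return Array.from(input.matchAll(formatItemPattern), (m) => m[0]);
}

console.log(findFormatItems("My phone number is {0} and my SSN is {1}")); // ["{0}", "{1}"]
console.log(findFormatItems("Escaped: {{0}} and mixed {{{1}}}"));          // ["{1}"]
```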
 
Hello Jack,

By the way, a great resource on the standard number formats can be found
here:
http://blog.stevex.net/index.php/string-formatting-in-csharp/

It lists all the valid formats for dates, for the different numeric types
and for enums.

Also, the expression below puts each part of the match in a neatly named
group for easy access, should you need it:
(?<!([^\{]|^)\{(\{{2})*)\{(?<item>[0-9]+)(?<alignment>,[-+]?[0-9]+)?(?<format>:[^\}]+)?\}(?!\}(\{{2})*([^\}]|$))

I used the following set to test against:

{0} - pass
{0,-1} - pass
{0,1} - pass
{0:D} - pass
{0,-1:D} - pass
{{}} - fail
{{{0}}} - pass
{{0}} - fail
{0}} - fail
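The named groups carry over directly to JavaScript, which uses the same `(?<name>...)` syntax, so the test set above can be replayed with a short TypeScript sketch. As before, an ES2018+ JavaScript RegExp is assumed as a stand-in for the .NET engine; `parseItems` and the `FormatItem` shape are hypothetical, and "pass" is read as "at least one item found". In .NET the same values would come from `match.Groups["item"].Value` and so on.

```typescript
// The named-group expression from this post, unchanged.
const namedItemPattern = new RegExp(
  String.raw`(?<!([^\{]|^)\{(\{{2})*)\{(?<item>[0-9]+)(?<alignment>,[-+]?[0-9]+)?(?<format>:[^\}]+)?\}(?!\}(\{{2})*([^\}]|$))`,
  "g",
);

// Hypothetical shape for one parsed item; unmatched optional groups stay undefined.
interface FormatItem {
  text: string;
  item: string;
  alignment?: string; // keeps the leading comma, e.g. ",-1"
  format?: string;    // keeps the leading colon, e.g. ":D"
}

function parseItems(input: string): FormatItem[] {
  return Array.from(input.matchAll(namedItemPattern), (m) => ({
    text: m[0],
    item: m.groups?.item ?? "",
    alignment: m.groups?.alignment,
    format: m.groups?.format,
  }));
}

// The test set from the post, with the expected outcome for each input.
const cases: Array<[string, boolean]> = [
  ["{0}", true], ["{0,-1}", true], ["{0,1}", true], ["{0:D}", true], ["{0,-1:D}", true],
  ["{{}}", false], ["{{{0}}}", true], ["{{0}}", false], ["{0}}", false],
];

for (const [input, expected] of cases) {
  const found = parseItems(input).length > 0;
  console.log(`${input} - ${found ? "pass" : "fail"}${found === expected ? "" : " (unexpected)"}`);
}
```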

--
Jesse Houwing
jesse.houwing at sogeti.nl

 
Hello Jack,

I even found a FAQ on this... I'm going to write a blog post about this
pattern at some point, I guess. It has a lot of interesting regex things in it.

The FAQ:
http://msdn2.microsoft.com/en-us/netframework/aa569608.aspx

And here is a more complete regex, which also covers the fact that you can
use { and } within the custom format pattern if you want to... Just escape
them again by doubling them. That makes the whole escaping of { harder and
harder to understand...

This is the expression I've got so far:
(?<!([^\{]|^){({{)*){(?<item>[0-9]+)(?<alignment>,[-+]?[0-9]+)?(?<format>:([^{}]|{{|}})+)?}(?!}({{)*([^}]|$))

Note also that I removed the \ before most, if not all, of the { and }
characters in the expression. It seems that the .NET regex parser is quite
content with this. Only where the braces would form a quantifier ({n}, {n,}
or {n,m}) do you need to escape the {
easier to read... *sigh*...
 
Hello Jack,

And finally, you could use the following expression to test whether a format
string is correct, or at least has its {'s and }'s in the right place.

\A((?<!([^\{]|^){({{)*){([0-9]+)(,([-+]?[0-9]+))?(:(([^{}]|{{|}})+))?}(?!}({{)*([^}]|$))|{{|}}|[^{}]*)*\z

It iterates over the whole input, accepting at each step either a correct
formatting item, a correctly escaped {{ or }}, or any other character, until
it reaches the end of the input.

That should allow you to prevent FormatExceptions as well.
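Here is a TypeScript sketch of using the expression as a pre-check before formatting. Several assumptions apply. An ES2018+ JavaScript RegExp stands in for .NET's, and because JavaScript has no `\A`/`\z`, `^`/`$` without the `m` flag play that role. The braces are re-escaped. The catch-all `[^{}]*` is narrowed to a single `[^{}]`: a starred group inside the outer `*` lets a backtracking engine try exponentially many ways of splitting a long run of plain text when validation fails, and the outer `*` already repeats single characters. `isValidFormatString` is a hypothetical name.

```typescript
// The whole input must be a sequence of: a formatting item, an escaped {{ or }},
// or any single non-brace character.
const validFormat = new RegExp(
  "^(" +
    String.raw`(?<!([^\{]|^)\{(\{\{)*)\{([0-9]+)(,([-+]?[0-9]+))?(:(([^{}]|\{\{|\}\})+))?\}(?!\}(\{\{)*([^}]|$))` + // a formatting item
    String.raw`|\{\{|\}\}` + // an escaped brace
    String.raw`|[^{}]` +     // any other character (the post uses [^{}]* here)
    ")*$",
);

// Hypothetical helper: true when the braces in `format` are balanced and escaped correctly.
function isValidFormatString(format: string): boolean {
  return validFormat.test(format); // no "g" flag, so test() keeps no state between calls
}

for (const s of ["My phone number is {0} and my SSN is {1}", "{0,-10:D}", "{{0}}", "{0}}", "{0"]) {
  console.log(JSON.stringify(s), isValidFormatString(s)); // true, true, true, false, false
}
```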

This has been fun... now to think of how to put it in a nice article...

Jesse

 
Thanks very much for your assistance (and enthusiasm). I'll take a look at
everything in detail and post back if I find any issues. Thanks again
(greatly appreciated).
 
Just an update that it looks very good so far (thanks). I haven't unravelled
the opening and closing brace (stuff) yet but I think there may be a (rare)
problem with the handling of the ":[formatString]". Any pairs of "{{" or
"}}" are valid in "formatString" if I understand the docs correctly so they
should be ignored. I'm still reviewing the situation however (and your code)
so this is just a heads-up before you start tackling your article :)
 
Ok. It appears that your expression:

(?<!([^\{]|^)\{(\{{2})*)\{(?<item>[0-9]+)(?<alignment>,[-+]?[0-9]+)?(?<format>:[^\}]+)?\}(?!\}(\{{2})*([^\}]|$))

may need to be modified slightly:

(?<!([^\{]|^)\{(\{{2})*)\{(?<item>[0-9]+)(?<alignment>,[-+]?[0-9]+)?(?<format>:([^\}]|\}{2})+)?\}(?!\}(\{{2})*([^\}]|$))

I've simply changed the "format" group so that, in addition to one or more
of any character except a "}" (as per your original expression), it now
also accepts pairs of "}}" (before the final "}" that
terminates it). I'm still digging through it all though as I rarely ever
work with regular expressions.
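Jack's change can be checked against a format string that contains an escaped `}}`, such as `{0:a}}b}` (one item whose format part is `a}}b`). The TypeScript sketch below again assumes an ES2018+ JavaScript RegExp as a stand-in for .NET's; `before`, `after` and `describe` are illustrative names for Jesse's expression, Jack's modification and a small helper.

```typescript
// Jesse's expression: the format part is any run of characters except }.
const before = new RegExp(
  String.raw`(?<!([^\{]|^)\{(\{{2})*)\{(?<item>[0-9]+)(?<alignment>,[-+]?[0-9]+)?(?<format>:[^\}]+)?\}(?!\}(\{{2})*([^\}]|$))`,
  "g",
);
// Jack's modification: the format part may also contain }} pairs.
const after = new RegExp(
  String.raw`(?<!([^\{]|^)\{(\{{2})*)\{(?<item>[0-9]+)(?<alignment>,[-+]?[0-9]+)?(?<format>:([^\}]|\}{2})+)?\}(?!\}(\{{2})*([^\}]|$))`,
  "g",
);

// Lists each match together with its captured format group.
function describe(pattern: RegExp, input: string): string[] {
  return Array.from(input.matchAll(pattern), (m) => `${m[0]} (format ${m.groups?.format})`);
}

console.log(describe(before, "{0:a}}b}")); // []: the first } ends the format and the lookahead rejects the item
console.log(describe(after, "{0:a}}b}"));  // ["{0:a}}b} (format :a}}b)"]
```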
 
Hello Jack,
That would indeed solve the issue. I've been experimenting a bit more and
came to the same conclusion...

To make it even less readable, but shorter, you can remove the escapes from
the \{ and \} to make it the following expression:

(?<!([^{]|^){({{)*){(?<item>[0-9]+)(?<alignment>,[-+]?[0-9]+)?(?<format>:([^{}]|}}|{{)+)?}(?!}({{)*([^}]|$))

Also as far as I can tell the opening { must be escaped in the format pattern
as well. I adjusted the above expression for that.
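The point about an unescaped `{` inside the format can be seen with `{0:a{b}`, whose format part contains a lone `{`. In this TypeScript sketch (same assumption: an ES2018+ JavaScript RegExp standing in for .NET's, with the braces re-escaped because JavaScript only accepts bare ones outside `u` mode), Jack's version still finds an item there, while the adjusted expression finds none and still handles properly doubled braces. The names `jacks`, `adjusted` and `groupsOf` are illustrative.

```typescript
// Jack's version: any character except } is allowed in the format part.
const jacks = new RegExp(
  String.raw`(?<!([^\{]|^)\{(\{{2})*)\{(?<item>[0-9]+)(?<alignment>,[-+]?[0-9]+)?(?<format>:([^\}]|\}{2})+)?\}(?!\}(\{{2})*([^\}]|$))`,
  "g",
);

// The adjusted expression above with its braces re-escaped:
// inside the format part, { and } must both come in pairs.
const adjusted = new RegExp(
  String.raw`(?<!([^{]|^)\{(\{\{)*)\{(?<item>[0-9]+)(?<alignment>,[-+]?[0-9]+)?(?<format>:([^{}]|\}\}|\{\{)+)?\}(?!\}(\{\{)*([^}]|$))`,
  "g",
);

// Returns the item, alignment and format groups of every match.
function groupsOf(pattern: RegExp, input: string) {
  return Array.from(input.matchAll(pattern), (m) => ({
    item: m.groups?.item,
    alignment: m.groups?.alignment,
    format: m.groups?.format,
  }));
}

console.log(groupsOf(jacks, "{0:a{b}"));         // one match: the lone { is accepted
console.log(groupsOf(adjusted, "{0:a{b}"));      // []
console.log(groupsOf(adjusted, "{0,-5:{{0}}}")); // item "0", alignment ",-5", format ":{{0}}"
```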
 