Convert.ToString( double ) + xsd:pattern + RegEx == ?

  • Thread starter aevans1108

aevans1108

(Expanding this message to microsoft.public.dotnet.xml.)

Greetings

Please direct me to the right group if this is an inappropriate place
to post this question. Thanks.

I want to format a numeric value according to an arbitrary regular
expression.

Background:

I have an XML schema that I have no control over. It is filled with
simpleTypes with restrictions that include xsd:pattern elements, e.g.:

<xsd:restriction base="xsd:float">
<xsd:minInclusive value="0.000" />
<xsd:maxInclusive value="10.000" />
<xsd:pattern value="\d{1,2}\.\d{3}" />
</xsd:restriction>

These patterns and BaseTypes vary widely.
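One detail matters when checking a formatted value against one of these patterns in code: an xsd:pattern must match the entire lexical value (XSD patterns are implicitly anchored), while .NET's Regex.IsMatch succeeds on any matching substring. A minimal sketch, assuming the pattern only uses syntax that .NET and XSD regular expressions share (true of simple patterns like the one above, but not of XSD-only escapes such as \i or \c); the class and method names are hypothetical:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: does a lexical value satisfy an xsd:pattern facet?
public static class XsdPatternCheck
{
    public static bool Satisfies(string value, string xsdPattern)
    {
        // XSD patterns must match the whole value; .NET regexes match
        // anywhere, so add explicit anchors. \A and \z (rather than ^ and $)
        // also reject a trailing newline.
        return Regex.IsMatch(value, @"\A(?:" + xsdPattern + @")\z");
    }
}
```

Without the anchors, Regex.IsMatch("123.000", @"\d{1,2}\.\d{3}") returns true because the substring "23.000" matches, even though the pattern facet rejects 123.000.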

I am dynamically creating an XmlDocument at runtime and validating it
against the schema. Each XmlElement.InnerText property is
populated from a data structure that contains an object reference that
points to a value of the appropriate type. That is, the object
references may point to a float or to a string or whatever is called
for, depending on the BaseType of the corresponding simpleType.
(Using strings for everything is not an option.)

The problem is that in order to create the XmlDocument, I have to
convert an object that might actually be a float or an int into a
formatted string.

I can't just go

myElement.InnerText = Convert.ToString( Convert.ToDouble( myObject ) );

because myElement.InnerText will then contain strings that look like
this

1.000 ==> "1"
1.050 ==> "1.05"

which cannot be guaranteed to satisfy the schema. It certainly doesn't
satisfy the one above. In the case of that schema, I would obviously
want to have conversions like this:

1.000 ==> "1.000"
1.050 ==> "1.050"
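The difference between the two sets of conversions is easy to reproduce. A minimal sketch (class and method names hypothetical) comparing the default conversion with the fixed-point format string "F3". InvariantCulture is passed explicitly so the decimal separator is always '.', which the schema's lexical space requires regardless of the machine's regional settings:

```csharp
using System;
using System.Globalization;

public static class ConversionDemo
{
    // Default conversion: shortest representation, trailing zeros dropped.
    public static string Default(double value) =>
        Convert.ToString(value, CultureInfo.InvariantCulture);

    // Fixed-point conversion with exactly three decimal places.
    public static string ThreePlaces(double value) =>
        value.ToString("F3", CultureInfo.InvariantCulture);
}
```

The hard part, as the post goes on to say, is knowing that "3" without hard-coding it.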

Yes, I know I can easily write code to do this for a single regular
expression, but the problem is that there is no guarantee what the
xsd:pattern is going to be. Hard coding something that uses three
decimal places just won't do.

Here's what I would like to do:

myElement.InnerText = FormatAccordingToRegEx( Convert.ToDouble( myObject ), myXsdPattern );

Remember, myXsdPattern could be anything, so I can't do something like

Regex.Replace( Convert.ToString( Convert.ToDouble( myObject ) ),
myXsdPattern, "${1},${2}.${3}" ); // not sure about syntax here

That won't work if myXsdPattern = "\d{1,}.\d{3}" or "\d{1,}.\d{1}" or
"-{0,1}\d{1,2}.\d{1}" (to list a few examples); and even if it did, I
think I would need to have access to the schema so I could insert
parentheses into the regular expressions/xsd:patterns. As I said, I
have no control over the schema. (Although, the ultimate solution may
indeed require me to read the xsd:pattern in, insert parentheses
according to some algorithm and then do something like the code
fragment above. I don't know.)
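One way to act on that idea without a general regex "inverter" is to read only the number of fractional digits out of the pattern text, let a fixed-point format string produce the value, and then confirm the result with an anchored match. This is a heuristic sketch, assuming the pattern ends in a dot (escaped or not) followed by \d{n}, as all the examples above do. FormatAccordingToRegEx reuses the name proposed above, but its body, the class, and FractionDigits are hypothetical; patterns of any other shape yield null so the caller can fall back to something else:

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class PatternFormatter
{
    // Matches pattern text ending in "\.\d{n}" or ".\d{n}" and captures n.
    static readonly Regex TrailingFraction =
        new Regex(@"(?:\\\.|\.)\\d\{(\d+)\}$");

    // Number of fractional digits the pattern demands, or null if unknown.
    public static int? FractionDigits(string xsdPattern)
    {
        Match m = TrailingFraction.Match(xsdPattern);
        if (!m.Success) return null;
        return int.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture);
    }

    // Formats value with that many decimals. Returns null if the pattern
    // shape isn't recognised or the result still fails the pattern.
    public static string FormatAccordingToRegEx(double value, string xsdPattern)
    {
        int? digits = FractionDigits(xsdPattern);
        if (digits == null) return null;

        string text = value.ToString("F" + digits.Value, CultureInfo.InvariantCulture);

        // xsd:pattern is implicitly anchored, so anchor the check as well.
        bool ok = Regex.IsMatch(text, @"\A(?:" + xsdPattern + @")\z");
        return ok ? text : null;
    }
}
```

The final check catches values the heuristic cannot fix, for example 123 against \d{1,2}\.\d{3}, which has too many integer digits.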

Now, I'm pretty sure that xsd:pattern will only be used with numeric
values like xsd:float and xsd:int in the schema. I don't think it
would be possible to do this with an arbitrary string anyway because in
many cases, several formatted outputs are possible from one input... so
I would think this simplifies the problem a bit. (Several outputs are
possible from one input with numbers as well, but they all mean the
same thing -- leading zeros and commas are pretty irrelevant. A number
is still a number.)

I hope I'm being clear. I'm grammatically challenged today.

I know this is a bit backward, but I thought I'd check and see if
anyone has already written code to do this. I will be grateful for any
and all suggestions -- even guesses about what to try.
Thanks in advance.

Tony

Tony,

Interesting problem. Regexes are usually used to see if a given piece of
data matches a given format. I'm not aware of any way to directly use a
regex to *make* the data match the format.

FWIW, .NET has numeric format strings that supply this functionality.

http://msdn.microsoft.com/library/d...us/cpguide/html/cpconnumericformatstrings.asp

or

http://tinyurl.com/4tus6
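For the schema in the original post, either a standard fixed-point specifier or an equivalent custom format string yields the required three decimal places. A small sketch (class and method names hypothetical), again passing InvariantCulture so the output always uses '.' as the decimal separator:

```csharp
using System.Globalization;

public static class FormatStringDemo
{
    // Standard format: "F" followed by the number of decimal places.
    public static string Standard(double value, int decimals) =>
        value.ToString("F" + decimals, CultureInfo.InvariantCulture);

    // Custom format: '0' forces a digit, '#' is optional, so "#0.000"
    // always writes at least one integer digit and exactly three decimals.
    public static string Custom(double value) =>
        value.ToString("#0.000", CultureInfo.InvariantCulture);
}
```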

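I know you stated you have no direct control over the XSD, but if it's
feasible, you should find a way to get the author of the XSD to record
the intended display format with each restriction. Since xsd:restriction
itself only accepts the standard facets, the hint would have to go in an
xsd:annotation (as its first child). Something like:

<xsd:annotation>
<xsd:appinfo>
<format value="#0.000"/>
</xsd:appinfo>
</xsd:annotation>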

If that's not possible, there may be a way to hack something together...

Correct me if I'm wrong, but it seems the algorithm you're trying to
produce is:

int input = 4;
string regex = @"\d{1,2}\.\d{3}"; // No optional clauses.
string result = BlackBox(input, regex);

// result would be "4.000".

input = -4;
regex = @"-{0,1}\d{1,2}.\d{1}"; // Optional minus sign.
result = BlackBox(input, regex);

// result would be "-4.0".

I'm assuming that illegal input values that are out of range of the
supplied regular expression will be caught using xsd:minInclusive
and xsd:maxInclusive.

I'm also assuming the format of the regex roughly matches the format
of the xsd:minInclusive and xsd:maxInclusive values. If that's the
case, then you may be able to use that knowledge to use the supplied
regex to format the input value.

It appears from the regex examples you give that the number of digits
to the right of the decimal point is usually fixed, whereas the
number of digits to the left of the decimal point is variable.

Using that observation, it's possible to pad the input value with
zeros on the right side of the decimal point, and use the regex to
extract the formatted number:


// Compile with "csc /t:exe example.cs"

using System;
using System.Text.RegularExpressions;

namespace Example
{
    public class TestClass
    {
        public static int Main(string[] args)
        {
            Console.WriteLine(BlackBox("4", @"\d{1,2}\.\d{3}", "10.000"));
            Console.WriteLine(BlackBox("4", @"\d{1,}.\d{3}", "1000.000"));
            Console.WriteLine(BlackBox("4", @"\d{1,}.\d{1}", "1000.0"));
            Console.WriteLine(BlackBox("-4", @"-{0,1}\d{1,2}.\d{3}", "10.000"));

            return 0;
        }

        public static string BlackBox(string input, string regex, string maxInput)
        {
            const char decimalPoint = '.';

            if (maxInput.IndexOf(decimalPoint) >= 0)
            {
                if (input.IndexOf(decimalPoint) < 0)
                    input += decimalPoint;

                // Pad input with zeros to the right of the decimal
                // point so it matches the number of decimal places in
                // maxInput (we don't care about numbers on the left side of
                // the decimal point).
                // For example, if maxInput = 12.3456, and
                // input = 9.8, pad input so it equals 9.8000.
                // Math.Max stops new string() from throwing when input
                // already has more decimal places than maxInput.
                int padding = GetNumberOfDecimalPlaces(maxInput) -
                              GetNumberOfDecimalPlaces(input);
                input += new string('0', Math.Max(0, padding));
            }

            // Regex.Match is not anchored (unlike xsd:pattern), so this
            // returns the first matching substring, or "" if none matches.
            return Regex.Match(input, regex).Value;
        }

        public static int GetNumberOfDecimalPlaces(string input)
        {
            return Regex.Match(input, @"\.(?<digits>\d*)",
                RegexOptions.ExplicitCapture).Groups["digits"].Value.Length;
        }
    }
}

Thanks for the reply, Chris. Your solution certainly covers a wide
range of possibilities. I think relying on something like maxInclusive
is about as good as it's going to get. I'll throw an exception if they
don't have a [min|max][Inclusive|Exclusive] restriction along with the
pattern.

One more question, though... Wouldn't this

e.InnerText = Convert.ToDouble(objectValue).ToString(
    "F" + (m_maxExclusiveValue.Length - m_maxExclusiveValue.IndexOf('.') - 1).ToString());

do the same thing? If not, what are the disadvantages of doing it this
way?

Thanks
Tony

Tony,

If both solutions work for your data, then apply Occam's Razor. I
was probably too focused on the regular expression, and honestly
didn't think of using the maxInclusive value to build a format
string.
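A sketch of the format-string approach as a standalone helper (class and method names hypothetical). It counts the digits after the '.' in a sample value taken from the schema, such as the maxInclusive or default value. A sample with no '.' is treated as zero decimals; without that check, IndexOf returns -1 and the count comes out as the whole string length. Formatting uses InvariantCulture so the separator is always '.':

```csharp
using System;
using System.Globalization;

public static class SampleFormatter
{
    // Digits after the decimal point in a sample such as "10.000".
    public static int DecimalsIn(string sample)
    {
        int dot = sample.IndexOf('.');
        return dot < 0 ? 0 : sample.Length - dot - 1;
    }

    // Formats value with the same number of decimals as sample.
    public static string FormatLike(object value, string sample)
    {
        double d = Convert.ToDouble(value, CultureInfo.InvariantCulture);
        return d.ToString("F" + DecimalsIn(sample), CultureInfo.InvariantCulture);
    }
}
```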

Chris.

It appears that both solutions do indeed work. I just wanted to be
sure you hadn't used a regular expression for a particular reason,
and I see now that you didn't.

In looking at the schema, I see that sometimes maxInclusive isn't in
the format specified by the pattern! However, the DefaultValue for the
element always is, so I decided to go with that. So...

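string defaultValue = m_schemaElement.DefaultValue;
int dot = defaultValue.IndexOf('.'); // -1 when there is no fractional part
int decimals = dot < 0 ? 0 : defaultValue.Length - dot - 1;
m_element.InnerText = Convert.ToDouble(objectValue).ToString("F" + decimals, CultureInfo.InvariantCulture);

Thanks again.
Tony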