Array values split

byoxemeek

I have an array created from an undelimited text file I receive with a
format like:

60190012003010203040506070809101112
60190012004010203040506070809101112
6019001200501020304
60190021998040506070809101112
......

I need to split these values into array items or dataset columns in a
format like:

6019001 (code) 2003 (year) 01 02 03 04 (month #)

The end goal is to produce an XML file from the dataset:

<code id="6019001">
<year yrid="2002">
<m id="1" link="6019001_012002.xml">Jan</m>
........
<m id="12" link="6019001_122002.xml">Dec</m>
</year>
<year yrid="2003">...

I am at a loss as to how to proceed without any delimiters, and I have
no control over the format of the data I receive.

Any help gratefully accepted

regards

John
 
Guest

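Hi,
you can use something like:
StreamReader reader = new StreamReader(path); // path: your input file
string codeId, year;
string line = reader.ReadLine();
codeId = line.Substring(0, 7); // first 7 characters
year = line.Substring(7, 4); // next 4 characters
.....
or:
FileStream fs = File.OpenRead(path);
byte[] codeId = new byte[7];
// the second argument is the offset into the buffer, not into the file;
// the stream position advances by itself after each read
int bytesRead = fs.Read(codeId, 0, 7);
......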
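
The Substring idea above can be extended to the whole line. The following is only a sketch: it assumes, based on the sample lines, that the code is always 7 digits and the year always 4 digits, and the names RecordSplitter and TrySplit are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class RecordSplitter
{
    // Assumed layout (inferred from the sample lines, not confirmed):
    // 7-digit code, 4-digit year, then zero or more 2-digit month numbers.
    public const int CodeLength = 7;
    public const int YearLength = 4;

    public static bool TrySplit(string line, out string code, out string year,
                                out List<string> months)
    {
        code = null;
        year = null;
        months = new List<string>();

        int header = CodeLength + YearLength;

        // Reject lines that are too short or whose month part has an odd length.
        if (line == null || line.Length < header || (line.Length - header) % 2 != 0)
            return false;

        // Reject anything that is not a plain digit.
        foreach (char c in line)
            if (c < '0' || c > '9')
                return false;

        code = line.Substring(0, CodeLength);
        year = line.Substring(CodeLength, YearLength);

        // Walk the rest of the line two characters at a time.
        for (int i = header; i < line.Length; i += 2)
            months.Add(line.Substring(i, 2));

        return true;
    }

    public static void Main()
    {
        string code, year;
        List<string> months;
        if (TrySplit("6019001200501020304", out code, out year, out months))
            Console.WriteLine(code + " " + year + " " + string.Join(",", months));
        // prints: 6019001 2005 01,02,03,04
    }
}
```

Returning false instead of throwing lets the caller log and skip malformed lines, which seems sensible when the input format is outside your control.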
 
Ignacio Machin \( .NET/ C# MVP \)

Hi,

Are there always going to be 4 months at the end?

If so, it's a piece of cake:
line.Substring( line.Length - 8 ) // the last 8 digits (four 2-digit months)
line.Substring( line.Length - 12, 4 ) // year
line.Substring( 0, line.Length - 12 ) // code

Of course, you should first check that line.Length is at least 12, the
largest offset used.

A cleaner solution is possible, but either way the idea is to work backwards
from the end of the line. All of this relies on knowing FOR SURE that there
are 4 months; otherwise it becomes more difficult.


cheers,
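
If the number of months is not fixed but the code and year widths are, the month count can be derived from the line length instead. This is a small sketch under that assumption (7-digit code and 4-digit year, as in the sample lines); MonthCounter and CountMonths are hypothetical names.

```csharp
using System;

public static class MonthCounter
{
    // Assumption: every line starts with a 7-digit code and a 4-digit year.
    private const int HeaderLength = 7 + 4;

    // Returns the number of 2-digit months on the line, or -1 when the
    // length does not fit the assumed layout.
    public static int CountMonths(string line)
    {
        if (line == null || line.Length < HeaderLength)
            return -1;

        int rest = line.Length - HeaderLength;
        return rest % 2 == 0 ? rest / 2 : -1;
    }

    public static void Main()
    {
        Console.WriteLine(CountMonths("60190012003010203040506070809101112")); // 12
        Console.WriteLine(CountMonths("6019001200501020304"));                 // 4
        Console.WriteLine(CountMonths("60190021998040506070809101112"));       // 9
    }
}
```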
 
Guest

Hi John,

First you need to get the lines from the text file (I assume there will be
no issues in doing this).

After getting the line there are two approaches you can try:
1. Using Regex
2. Using basic string operations (I'm not sure, but I guess this will be more
efficient)

With the first approach, you first extract the code and year and remove them
from the string. The string then contains only months, so you can use a
regular expression to get them.

<CODE>
// requires: using System; using System.Text.RegularExpressions;
string str = "60190012003010203040506070809101112";
string code = str.Substring(0, 7);   // first 7 digits
string year = str.Substring(7, 4);   // next 4 digits
str = str.Substring(11);             // only the months remain
Regex reg = new Regex(@"\d{2}");     // each month is two digits
foreach (Match m in reg.Matches(str))
{
    Console.WriteLine("Match Found: " + m.Value);
    // Write your logic to consume month
}
</CODE>

Using string operations, this can be coded as follows:

<CODE>
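string str = "60190012003010203040506070809101112";
string code = str.Substring(0, 7);
string year = str.Substring(7, 4);
// Months start at index 11; step two characters at a time and stop
// before a trailing odd character, so j + 1 is always a valid index.
for (int j = 11; j + 1 < str.Length; j += 2)
{
    Console.WriteLine("Match Found: " + str.Substring(j, 2));
}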
</CODE>

Hope it will help.
 
Greg Bacon

Describe your input with a regular expression:

static void Main(string[] args)
{
    string[] lines = new string[]
    {
        "60190012003010203040506070809101112",
        "60190012004010203040506070809101112",
        "6019001200501020304",
        "60190021998040506070809101112",
    };

    Regex record = new Regex(
        @"^" +
        @"(?<code>\d{7})" +
        @"(?<year>\d{4})" +
        @"(?<months>\d\d)+" +
        @"$");

    XmlDocument doc = new XmlDocument();
    doc.AppendChild(doc.CreateElement("codes"));

    foreach (string line in lines)
    {
        Match m = record.Match(line);

        if (!m.Success)
        {
            Console.Error.WriteLine("no match (" + line + ")");
            continue;
        }

        string code = m.Groups["code"].ToString();
        string codexpath = "/codes/code[@id = '" + code + "']";
        XmlNode codeelt = doc.SelectSingleNode(codexpath);
        if (codeelt == null)
        {
            XmlElement elt = doc.CreateElement("code");
            elt.SetAttribute("id", code);
            doc.DocumentElement.AppendChild(elt);
            codeelt = elt;
        }

        string year = m.Groups["year"].ToString();
        XmlElement yearelt = doc.CreateElement("year");
        yearelt.SetAttribute("yrid", year);
        codeelt.AppendChild(yearelt);

        foreach (Capture mm in m.Groups["months"].Captures)
        {
            string mmm;
            int month = int.Parse(mm.ToString());

            if (month >= 1 && month <= 12)
                mmm = DateTimeFormatInfo.InvariantInfo.
                    MonthNames[month-1].Substring(0,3);
            else
                mmm = "???";

            XmlElement melt = doc.CreateElement("m");
            melt.SetAttribute("id", month.ToString());
            melt.SetAttribute("link", code + "_" + mm + year + ".xml");
            melt.InnerText = mmm;
            yearelt.AppendChild(melt);
        }
    }

    XmlTextWriter w = new XmlTextWriter(Console.Out);
    w.Formatting = Formatting.Indented;
    w.Indentation = 2;
    doc.WriteTo(w);
}

Hope this helps,
Greg
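
For readers on a newer framework, roughly the same tree can be built with LINQ to XML (System.Xml.Linq), which is a later API than the XmlDocument code above and not what that post uses. This is a sketch only: it swaps the regex for fixed-width Substring calls, assumes the 7-digit code and 4-digit year layout, silently skips lines and month values that do not fit, and LinqToXmlSketch and Build are made-up names.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

public static class LinqToXmlSketch
{
    public static XElement Build(string[] lines)
    {
        var root = new XElement("codes");

        foreach (string line in lines)
        {
            // Assumed layout: 7-digit code, 4-digit year, 2-digit months.
            if (line.Length < 11 || (line.Length - 11) % 2 != 0)
                continue;

            string code = line.Substring(0, 7);
            string year = line.Substring(7, 4);

            // Reuse the <code> element if this code was seen before.
            XElement codeElt = root.Elements("code")
                .FirstOrDefault(e => (string)e.Attribute("id") == code);
            if (codeElt == null)
            {
                codeElt = new XElement("code", new XAttribute("id", code));
                root.Add(codeElt);
            }

            var yearElt = new XElement("year", new XAttribute("yrid", year));
            codeElt.Add(yearElt);

            for (int i = 11; i < line.Length; i += 2)
            {
                string mm = line.Substring(i, 2);
                int month;
                // Skip anything that is not a month number from 1 to 12.
                if (!int.TryParse(mm, NumberStyles.None,
                        CultureInfo.InvariantCulture, out month)
                    || month < 1 || month > 12)
                    continue;

                string name = CultureInfo.InvariantCulture
                    .DateTimeFormat.AbbreviatedMonthNames[month - 1];

                yearElt.Add(new XElement("m",
                    new XAttribute("id", month),
                    new XAttribute("link", code + "_" + mm + year + ".xml"),
                    name));
            }
        }

        return root;
    }

    public static void Main()
    {
        string[] lines = { "6019001200501020304", "60190021998040506070809101112" };
        Console.WriteLine(Build(lines));
    }
}
```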
 
Jay B. Harlow [MVP - Outlook]

byoxemeek,
As the others suggest, you can use either (or both) String.Substring & Regex
to parse the file.

Seeing as you are going to XML, I would consider creating a custom XmlReader
object similar to the GedcomReader in the last link below. Then use XSLT to
transform it into the expected format. This allows easy changing of the
target format by simply replacing the XSLT used...

The following articles discuss how to create a custom XmlReader object:
http://msdn.microsoft.com/msdnmag/issues/01/09/xml/default.aspx
http://msdn.microsoft.com/msdnmag/issues/04/05/XMLFiles/

Hope this helps
Jay
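
To illustrate the XSLT step described above, here is a rough sketch using XslCompiledTransform, which accepts any XmlReader as input. The stylesheet is invented for the example (it just lists the link attributes as text), and a standard reader over an in-memory string stands in for the custom reader; XsltSketch and Transform are hypothetical names.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XsltSketch
{
    // Hypothetical stylesheet: writes each link attribute on its own line.
    private const string Stylesheet =
@"<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='text'/>
  <xsl:template match='/'>
    <xsl:for-each select='//m'>
      <xsl:value-of select='@link'/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>";

    public static string Transform(XmlReader input)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader sheet = XmlReader.Create(new StringReader(Stylesheet)))
            xslt.Load(sheet);

        var output = new StringWriter();
        // Any XmlReader works as input here, including a custom subclass.
        xslt.Transform(input, null, output);
        return output.ToString();
    }

    public static void Main()
    {
        // Stand-in input; a custom reader would be passed in the same way.
        string xml = "<codes><code id='6019001'><year yrid='2005'>" +
                     "<m id='1' link='6019001_012005.xml'>Jan</m>" +
                     "<m id='2' link='6019001_022005.xml'>Feb</m>" +
                     "</year></code></codes>";

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
            Console.Write(Transform(reader));
    }
}
```

Changing the target format then means editing only the stylesheet text, which is the benefit the post describes.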

 
Greg Bacon

: As the others suggest, you can use either (or both) String.Substring & Regex
: to parse the file.
:
: Seeing as you are going to XML, I would consider creating a custom XmlReader
: object similar to the GedcomReader in the last link below. Then use XSLT to
: transform it into the expected format. This allows easy changing of the
: target format by simply replacing the XSLT used...
:
: The following articles discuss how to create a custom XmlReader object:
: http://msdn.microsoft.com/msdnmag/issues/01/09/xml/default.aspx
: http://msdn.microsoft.com/msdnmag/issues/04/05/XMLFiles/

I might have use for this technique for $work, so as an exercise, I
wrote an XmlReader for the input format the OP described. It's
certainly not a complete (or even halfway polished) implementation, but
I hope someone will find value in it.

I welcome any comments.

using System;
using System.Collections;
using System.Globalization;
using System.IO;
using System.Text.RegularExpressions;
using System.Xml;

namespace FunkyReader
{
public class FunkyReader : XmlReader
{
private NameTable nametable = new NameTable();
private ReadState state;
private Codes codes;
private ArrayList dfs;
private int node;

public FunkyReader(string[] lines)
{
codes = new Codes();
state = ReadState.Initial;

ParseLines(lines);
}

public ArrayList Linearization
{
get { return dfs; }
}

#region XmlReader methods

public override int AttributeCount
{
get { return -1; }
}

public override string BaseURI
{
get { return ""; }
}

public override void Close() {}

public override int Depth
{
get { return -1; }
}

public override bool EOF
{
get { return false; }
}

public override string GetAttribute(int i) { return null; }
public override string GetAttribute(string name) { return null; }
public override string GetAttribute(string name, string namespaceURI)
{
return null;
}

public override bool HasValue
{
get { return false; }
}

public override bool IsDefault
{
get { return false; }
}

public override bool IsEmptyElement
{
get
{
Node n = (Node) dfs[node];
Node next = (Node) dfs[node+1];

return n.NodeType == Node.Type.Start && next.NodeType == Node.Type.End;
}
}

public override string LocalName
{
get
{
Node n = (Node) dfs[node];

switch (n.NodeType)
{
case Node.Type.Start:
case Node.Type.End:
case Node.Type.Attribute:
return n.Name;
default:
return "";
}
}
}

public override string LookupNamespace(string prefix) { return null; }
public override void MoveToAttribute(int i) {}
public override bool MoveToAttribute(string name) { return false; }
public override bool MoveToAttribute(string name, string ns) { return false; }
public override bool MoveToElement() { return false; }
public override bool MoveToFirstAttribute() { return false; }

public override bool MoveToNextAttribute()
{
Node next = (Node) dfs[node+1];

if (next.NodeType == Node.Type.Attribute)
{
++node;
return true;
}
else
return false;
}

public override string Name
{
get { return LocalName; }
}

public override string NamespaceURI
{
get { return ""; }
}

public override XmlNameTable NameTable
{
get { return nametable; }
}

public override XmlNodeType NodeType
{
get
{
if (node >= dfs.Count)
return XmlNodeType.None;

Node n = (Node) dfs[node];
switch (n.NodeType)
{
case Node.Type.Attribute:
return XmlNodeType.Attribute;
case Node.Type.Start:
return XmlNodeType.Element;
case Node.Type.End:
return XmlNodeType.EndElement;
case Node.Type.Text:
return XmlNodeType.Text;
default:
return XmlNodeType.None;
}
}
}

public override string Prefix
{
get { return null; }
}

public override char QuoteChar
{
get { return '"'; }
}

public override bool Read()
{
if (state == ReadState.Initial)
{
state = ReadState.Interactive;
node = 0;
}
else
node++;

return node < dfs.Count;
}

public override bool ReadAttributeValue()
{
Node n = (Node) dfs[node];

if (n.NodeType == Node.Type.Attribute)
{
++node;
return true;
}
else
return false;
}

public override ReadState ReadState
{
get { return ReadState.EndOfFile; }
}

public override void ResolveEntity() {}

public override string this[int i]
{
get { return null; }
}

public override string this[string name, string namespaceURI]
{
get { return null; }
}

public override string this[string name]
{
get { return null; }
}

public override string Value
{
get
{
return ((Node) dfs[node]).Value;
}
}

public override string XmlLang
{
get { return null; }
}

public override XmlSpace XmlSpace
{
get { return XmlSpace.None; }
}

#endregion

#region parse input

private void ParseLines(string[] lines)
{
Regex record = new Regex(
@"^" +
@"(?<code>\d{7})" +
@"(?<year>\d{4})" +
@"(?<months>\d\d)+" +
@"$");

foreach (string line in lines)
{
Match m = record.Match(line);

string code = m.Groups["code"].ToString();
string year = m.Groups["year"].ToString();
foreach (Capture mm in m.Groups["months"].Captures)
AddMonth(code, year, mm.ToString());
}

dfs = new ArrayList();
dfs.Add(new Node("codes", Node.Type.Start));
foreach (Code c in codes)
c.Linearize(dfs);
dfs.Add(new Node("codes", Node.Type.End));
}

private void AddMonth(string code, string year, string mm)
{
codes[code][year].Add(new Month(code, year, mm));
}

#endregion

#region Node class

public class Node
{
public enum Type { Start, Attribute, Text, End };

private string name;
private Type type;

public Node(string name, Type type)
{
this.name = name;
this.type = type;
}

public string Name
{
get { return name; }
}

public string Value
{
get { return name; }
}

public Type NodeType
{
get { return type; }
}
}

#endregion

#region various element representations

class Codes
{
private ArrayList codes = new ArrayList();

public Code this[string code]
{
get
{
Code c = null;
for (int i = 0; i < codes.Count; i++)
{
if (((Code) codes[i]).ID == code)
{
c = (Code) codes[i];
break;
}
}

if (c != null)
return c;
else
{
codes.Add(c = new Code(code));
return c;
}
}
}

public IEnumerator GetEnumerator()
{
return codes.GetEnumerator();
}
}

class Code
{
string id;
private ArrayList years = new ArrayList();

public Code(string name)
{
id = name;
}

public Year this[string year]
{
get
{
Year y = null;
for (int i = 0; i < years.Count; i++)
{
if (((Year) years[i]).ID == year)
{
y = (Year) years[i];
break;
}
}

if (y != null)
return y;
else
{
years.Add(y = new Year(year));
return y;
}
}
}

public string ID
{
get { return id; }
}

public void Linearize(ArrayList record)
{
record.Add(new Node("code", Node.Type.Start));
record.Add(new Node("id", Node.Type.Attribute));
record.Add(new Node(ID, Node.Type.Text));
foreach (Year y in years)
y.Linearize(record);
record.Add(new Node("code", Node.Type.End));
}
}

class Year
{
string yrid;
ArrayList months = new ArrayList();

public Year(string year)
{
yrid = year;
}

public string ID
{
get { return yrid; }
}

public void Add(Month m)
{
months.Add(m);
}

public void Linearize(ArrayList record)
{
record.Add(new Node("year", Node.Type.Start));
record.Add(new Node("yrid", Node.Type.Attribute));
record.Add(new Node(yrid, Node.Type.Text));
foreach (Month m in months)
m.Linearize(record);
record.Add(new Node("year", Node.Type.End));
}
}

class Month
{
private int month;  // i.e., 1-12
private string link;

public Month(string code, string year, string mm)
{
this.month = int.Parse(mm);
this.link  = code + "_" + mm + year + ".xml";
}

public int ID
{
get { return month; }
}

public string Link
{
get { return link; }
}

public string MonthShortName
{
get
{
return DateTimeFormatInfo.InvariantInfo.MonthNames[month-1].Substring(0,3);
}
}

public void Linearize(ArrayList record)
{
record.Add(new Node("m", Node.Type.Start));
record.Add(new Node("id", Node.Type.Attribute));
record.Add(new Node(month.ToString(), Node.Type.Text));
record.Add(new Node("link", Node.Type.Attribute));
record.Add(new Node(link, Node.Type.Text));
record.Add(new Node(MonthShortName, Node.Type.Text));
record.Add(new Node("m", Node.Type.End));
}
}

#endregion
}
}

Enjoy,
Greg
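
If all that is needed is an XmlReader over the parsed data, a lighter alternative to subclassing XmlReader may be to build an XmlDocument and wrap it in the framework's XmlNodeReader. This is a different technique from the hand-written reader above, sketched here for a single code/year record; NodeReaderSketch and ReaderFor are made-up names.

```csharp
using System;
using System.Xml;

public static class NodeReaderSketch
{
    // Builds a small document for one code/year record and returns a
    // reader over it. months holds 2-digit strings such as "01".
    public static XmlReader ReaderFor(string code, string year, string[] months)
    {
        var doc = new XmlDocument();
        XmlElement root = doc.CreateElement("codes");
        doc.AppendChild(root);

        XmlElement codeElt = doc.CreateElement("code");
        codeElt.SetAttribute("id", code);
        root.AppendChild(codeElt);

        XmlElement yearElt = doc.CreateElement("year");
        yearElt.SetAttribute("yrid", year);
        codeElt.AppendChild(yearElt);

        foreach (string mm in months)
        {
            XmlElement m = doc.CreateElement("m");
            m.SetAttribute("id", int.Parse(mm).ToString());
            m.SetAttribute("link", code + "_" + mm + year + ".xml");
            yearElt.AppendChild(m);
        }

        // XmlNodeReader exposes an in-memory node tree through the
        // XmlReader API, so no abstract members need implementing.
        return new XmlNodeReader(doc);
    }

    public static void Main()
    {
        using (XmlReader r = ReaderFor("6019001", "2005", new[] { "01", "02" }))
        {
            while (r.Read())
            {
                if (r.NodeType == XmlNodeType.Element && r.Name == "m")
                    Console.WriteLine(r.GetAttribute("link"));
            }
        }
        // prints:
        // 6019001_012005.xml
        // 6019001_022005.xml
    }
}
```

The trade-off is that the whole document is held in memory, whereas a custom reader could in principle stream the input line by line.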
 
Greg Bacon

: I might have use for this technique for $work, so as an exercise, I
: wrote an XmlReader for the input format the OP described. It's
: certainly not a complete (or even halfway polished) implementation, but
: I hope someone will find value in it.

Well, not complete, but I *can* load the OP's input lines into an
XML document and get the expected output:

string[] lines =
{
"60190012003010203040506070809101112",
"60190012004010203040506070809101112",
"6019001200501020304",
"60190021998040506070809101112",
};

FunkyReader r = new FunkyReader(lines);

XmlDocument xml = new XmlDocument();
xml.Load(r);

XmlTextWriter w = new XmlTextWriter(Console.Out);
w.Formatting = Formatting.Indented;
w.Indentation = 2;

xml.WriteTo(w);

Greg
 
