Is there an escape sequence for Space char? for space delimited tx

  • Thread starter: Rich

Rich

I need to read a large space-delimited text file. I can do this using a
StreamReader, except it takes twice as long as an OleDbDataAdapter (using the
following delimiters: tab, comma, | pipe). My problem is in using a
space as the delimiter for reading a space-delimited text file using OleDb.
Here are some sample connection strings for a tab-delimited or pipe-delimited
text file (both of which work fine):

string s1 = Application.StartupPath;

connOle.ConnectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source="
+ s1 + ";Extended Properties=\"text;HDR=Yes;FMT=TabDelimited\"";

connOle.ConnectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source="
+ s1 + ";Extended Properties=\"text;HDR=Yes;FMT=Delimited(|)\"";

Note: OleDb also requires a schema.ini file to be placed in the same folder
as the text file to be read.

Save as schema.ini:

[fileName.extension]
ColNameHeader=true
CharacterSet=ANSI
Format=Delimited(|)

I have tried variations for space delimiting without success:

connOle.ConnectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source="
+ s1 + ";Extended Properties=\"text;HDR=Yes;FMT=Delimited(' ')\"";

// here I try a hex escape sequence, which works with Console.WriteLine() but not with OleDb

connOle.ConnectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source="
+ s1 + ";Extended Properties=\"text;HDR=Yes;FMT=Delimited(\x20)\"";

Any suggestions for an escape sequence for a space delimiter would be greatly
appreciated.

Thanks,
Rich
 
Thanks. This does look promising. And you say this has performance like the
Jet method? How about the delimiter? Can I say

parser.ColumnDelimiter = " ".ToCharArray();

for a space delimiter?
 
Rich said:
Thanks. This does look promising. And you say this has performance like the
Jet method? How about the delimiter? Can I say

parser.ColumnDelimiter = " ".ToCharArray();

for a space delimiter?

I can't answer the parser-specific aspect (though, seems like that's
something you could just try), but if you can do the above, wouldn't you
prefer this instead:

parser.ColumnDelimiter = new char[] { ' ' };

?

Why create a whole string instance only to just turn around and then
create a new array based on it, when you can just create the array directly?

Pete
 
Rich said:
Thanks. This does look promising. And you say this has performance like
the Jet method?

The performance is spectacular. I've never done any side-by-side
comparisons, but I know this thing is fast.
 
I started experimenting with this sample project. I noticed that a
StreamReader is being used here, along with the creation of a schema.ini file
like the Jet technique. It looks to me like the Jet technique wraps up all of
its coding into a one-liner, where the underlying Jet code is probably similar
to the code being used in this sample. And I guess the benefit of the code in
this sample is that it can be modified, where the Jet code can't.

The downside of this sample, for me, is the learning curve. I will have
to study this a bit. And then once I compile the class I would have to
reference it, adding a dependency to my project.

It looks like, for the time being, I will resign myself to my elementary usage
of StreamReader. The Jet technique would be nice because it is a one-liner,
but alas, it does not seem to support a space as a delimiter.
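
One thing that may be worth trying before giving up on Jet: as far as I recall, Microsoft's Schema.ini documentation for the text driver says a custom delimiter can be any character except the double quotation mark, "including blank", and gives Format=Delimited( ) as the form. The fragment below is an untested sketch along those lines. The file name is a placeholder, as in the pipe example earlier, and whether Jet 4.0 honors it in practice is an assumption.

```ini
[fileName.extension]
ColNameHeader=True
CharacterSet=ANSI
Format=Delimited( )
```

The delimiter goes in schema.ini rather than in the connection string, so the FMT value in Extended Properties could presumably stay as plain Delimited.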
 
....learning curve? It should be about as simple as

TextParserAdapter parser = new
TextParserAdapter(@"<path>\SpaceDelimFile.txt");
parser.ColumnDelimiter = new char[] { ' ' };

DataTable dt = parser.GetDataTable();

And then you just work with the data in the DataTable like you would with
data from any other data source. Now I've made a lot of modifications to
that library over time, but I think the code I have right there should work
out-of-the-box.
 
Here is what I mean by learning curve:
Jeff Johnson said:
....learning curve? It should be about as simple as

TextParserAdapter parser = new
TextParserAdapter(@"<path>\SpaceDelimFile.txt");
parser.ColumnDelimiter = new char[] { ' ' };

DataTable dt = parser.GetDataTable();

I compiled GenericParsing, added a reference to the GenericParsing library
in my project, and also added a using directive (using GenericParsing;).

But TextParserAdapter does not show up in IntelliSense, and if I just type
it, VS (2008) complains. Is it because GenericParsing is from VS2003? How do
I implement this in my (VS2008 C#) project?

Thanks
 
Well, crap. It appears I didn't like "GenericParser" and renamed it
"TextParser". (I must have thought it was too "generic"....)

So use GenericParserAdapter instead and see if that works.
 
Well, I tried the following, but VS complained at the DataTable part. In the
demo project I did not see any GetDataTable() methods.

GenericParsing.GenericParser parser = new GenericParsing.GenericParser(s1);

DataTable dt = parser.GetDataTable();



 
OK. I am kind of a lamo, but I finally tried this which sort of worked up to
---

GenericParsing.GenericParserAdapter parser = new
GenericParsing.GenericParserAdapter(s1);
parser.ColumnDelimiter = new char[] { ' ' };
DataTable dt = parser.GetDataTable();

Console.WriteLine(dt.Rows.Count.ToString());

dgrv1.DataSource = dt;

-- the DataGridView (dgrv1) complained that the column fill width could not
exceed 65535 (or some number like that).

I am sure that the parser read the text file (this text file only had 52,000
rows) because it took about 30 seconds to load, which is way quicker than
my StreamReader routine. But the Console.WriteLine (above) only wrote 1
for dt.Rows.Count.ToString().

I think I like the performance - just trying to get it to work correctly is
a little bit challenging (for me).

 
Yay! I got it to work -- turns out that my text file with the 52,000 rows was
this type:

"abc" "def" "ghi" "jkl"
"abc" "def" "ghi" "jkl"
"abc" "def" "ghi" "jkl"
....

with double quotes surrounding the text. It must have been generated with
VBA. Anyway, the StreamReader in my original routine reads the double
quotes OK; I was just doing a .Replace(..."\"","") for each piece of data.
Once I understand the workings of GenericParser I could probably add a
.Replace to it (somewhere).
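
For comparison, the plain StreamReader approach described above could look roughly like the sketch below. It is only an illustration, not the original routine: the class and method names are made up, it assumes the first line holds the column names, and it assumes no field contains an embedded space or an escaped quote (a quoted value such as "New York" would be split in two).

```csharp
using System;
using System.Data;
using System.IO;

static class SpaceDelimitedReader
{
    // Splits one line on single spaces and strips the surrounding double quotes.
    public static string[] SplitLine(string line)
    {
        string[] fields = line.Split(' ');
        for (int i = 0; i < fields.Length; i++)
            fields[i] = fields[i].Trim('"');
        return fields;
    }

    // Reads everything from the reader into a DataTable.
    // The first line supplies the column names; blank lines are skipped.
    // A row with more fields than columns makes Rows.Add throw.
    public static DataTable Load(TextReader reader)
    {
        DataTable table = new DataTable();
        string line = reader.ReadLine();
        if (line == null)
            return table;

        foreach (string name in SplitLine(line))
            table.Columns.Add(name);

        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0)
                continue;
            table.Rows.Add(SplitLine(line));
        }
        return table;
    }

    static void Main()
    {
        // In-memory sample; for a real file, pass new StreamReader(path) instead.
        string sample = "\"c1\" \"c2\" \"c3\"\n\"abc\" \"def\" \"ghi\"\n";
        DataTable dt = Load(new StringReader(sample));
        Console.WriteLine(dt.Rows.Count);
    }
}
```

Trim('"') only removes quotes at the ends of each field, so a quote in the middle of a value is left alone, unlike a blanket .Replace.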

 
parser.TextQualifier = '"'; // <-- apostrophe, quotation mark, apostrophe
 