Read Text File with Binary Header - C#

dm3281

Hello, I have a text report from a mainframe that I need to parse.

The report has a header of about 2580 bytes that contains binary information
(garbage for the most part), although a couple of areas in it hold ASCII text
that I need to extract. After those 2580 bytes, I can read the report like a
standard text file. Each line should end with CR/LF.

What is the best way for me to read this report using C#? It is almost as if
I need to position within the file using Seek() and then read the rest using
ReadLine().

I have a sample file here. The extension is .BIN to cause your browser to
prompt for the file download.

http://members.verizon.net/dm3281/misc/TEST.BIN

Any assistance or sample code would be appreciated.
 
dm3281

This is what I have so far, and it more or less works for ReportID and CustID.
But when I then call ReadLine on a StreamReader, it re-reads the entire file
and prints the garbage at the beginning.

using System;
using System.IO;

namespace sample
{
    public class test
    {
        static void Main()
        {
            FileStream fs = new FileStream(@"C:\TEMP\TEST.BIN", FileMode.Open);

            // get report name
            Console.Write("ReportID: ");
            fs.Seek(877, SeekOrigin.Begin);
            for (int i = 0; i < 43 && i < fs.Length; i++)
            {
                Console.Write((char) fs.ReadByte());
            }

            // get customer ID
            Console.WriteLine();
            Console.Write("CustID: ");
            fs.Seek(1178, SeekOrigin.Begin);
            for (int i = 0; i < 3 && i < fs.Length; i++)
            {
                Console.Write((char) fs.ReadByte());
            }

            StreamReader sr = new StreamReader(fs);

            // jump to start of report
            Console.WriteLine();
            sr.BaseStream.Seek(1171, SeekOrigin.Begin);
            //s.Seek(1171, SeekOrigin.Begin);

            string str = sr.ReadLine();
            while (str != null)
            {
                Console.WriteLine(str);
                str = sr.ReadLine();
            }
            sr.Close();
            fs.Close();
        }
    }
}
 
Mach58

sr.BaseStream.Seek(1171,SeekOrigin.Begin);

Shouldn't this be 2581? You might also want a call to sr.DiscardBufferedData()
here. Going back to 1171 gets you garbage.
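
The seek-then-discard idea can be sketched as a small self-contained program. This is only an illustration under assumptions: it writes its own stand-in file with a 16-byte header (HeaderLength is a made-up value for the demo, not the real report offset) so it can run without TEST.BIN. BOM detection is switched off in the StreamReader constructor because a binary header could otherwise be mistaken for a byte order mark.

```csharp
using System;
using System.IO;
using System.Text;

class SeekThenReadLines
{
    // Assumed layout for this demo only: a fixed-size binary header, then CR/LF text.
    const int HeaderLength = 16;

    static void Main()
    {
        // Build a small stand-in file: 16 non-text bytes followed by two text lines.
        string path = Path.GetTempFileName();
        byte[] header = new byte[HeaderLength];
        for (int i = 0; i < header.Length; i++) header[i] = (byte)(i + 1);
        byte[] body = Encoding.ASCII.GetBytes("LINE ONE\r\nLINE TWO\r\n");
        using (FileStream output = new FileStream(path, FileMode.Create))
        {
            output.Write(header, 0, header.Length);
            output.Write(body, 0, body.Length);
        }

        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (StreamReader sr = new StreamReader(fs, Encoding.ASCII, false))
        {
            // Make the reader fill its buffer from offset 0, as happens
            // whenever any text is read before the seek.
            sr.Peek();

            // Move the underlying stream, then drop what the reader has buffered
            // so the next read starts at the new position.
            fs.Seek(HeaderLength, SeekOrigin.Begin);
            sr.DiscardBufferedData();

            string line;
            while ((line = sr.ReadLine()) != null)
            {
                Console.WriteLine(line);
            }
        }

        File.Delete(path);
    }
}
```

Run as written, this prints LINE ONE and LINE TWO and none of the header bytes. Without the DiscardBufferedData call, the reader would keep serving the data it buffered from offset 0.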
 
Morten Wennevik [C# MVP]

Hi,

As Mach58 pointed out, your report position is wrong, and since StreamReader
reads the data as text, some of the binary data is probably making the
StreamReader return a null line prematurely. If you change the position you
should get the text. Alternatively, you could treat everything as a byte array
and extract the necessary text using Encoding.ASCII.

Below is another way to do the same as your method. It uses a StringBuilder
to assemble the string, mainly because I was using a Windows application and
assembled everything into a single string object before displaying it. It
copies all the binary data into a byte array and reads from that array instead
of a stream. Reading from a byte array isn't necessarily better or worse than
reading from a stream, but with a stream I would probably use fs.Read and
store the data in byte arrays instead of using a StreamReader.


// Needs: using System; using System.IO; using System.Text;
StringBuilder sb = new StringBuilder();

int reportIdPosition = 877;
int custIdPosition = 1178;
int reportPosition = 2581;

byte[] data = File.ReadAllBytes(@"C:\TEST.BIN");
byte[] reportId = new byte[43];
byte[] custId = new byte[3];
byte[] report = new byte[data.Length - reportPosition];

// get report name
Array.Copy(data, reportIdPosition, reportId, 0, reportId.Length);
sb.AppendLine("ReportID: " + Encoding.ASCII.GetString(reportId));

// get customer ID
Array.Copy(data, custIdPosition, custId, 0, custId.Length);
sb.AppendLine("CustID: " + Encoding.ASCII.GetString(custId));

// get report
Array.Copy(data, reportPosition, report, 0, report.Length);
sb.AppendLine(Encoding.ASCII.GetString(report));

Console.WriteLine(sb.ToString());
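
For comparison, the stream-based variant mentioned above (fs.Read into a byte array rather than a StreamReader) might look roughly like this. It is a sketch, not code from the thread: ReadAsciiField is a hypothetical helper name, and a MemoryStream with invented bytes stands in for the real file so the example runs on its own. With the real report you would pass a FileStream and the offsets 877 and 1178.

```csharp
using System;
using System.IO;
using System.Text;

class ReadFieldWithStream
{
    // Reads exactly 'count' bytes starting at 'offset' and decodes them as ASCII.
    // ReadAsciiField is a hypothetical helper name used for illustration.
    static string ReadAsciiField(Stream stream, long offset, int count)
    {
        byte[] buffer = new byte[count];
        stream.Seek(offset, SeekOrigin.Begin);

        int total = 0;
        while (total < count)
        {
            // Read may return fewer bytes than requested, so loop until done.
            int read = stream.Read(buffer, total, count - total);
            if (read == 0)
            {
                throw new EndOfStreamException("File is shorter than expected.");
            }
            total += read;
        }
        return Encoding.ASCII.GetString(buffer);
    }

    static void Main()
    {
        // Stand-in data: 8 non-text bytes, the 5-byte field "RPT01", then more junk.
        byte[] data = { 0, 1, 2, 3, 4, 5, 6, 7, 0x52, 0x50, 0x54, 0x30, 0x31, 9, 9 };
        using (MemoryStream ms = new MemoryStream(data))
        {
            Console.WriteLine("ReportID: " + ReadAsciiField(ms, 8, 5));
        }
    }
}
```

This prints "ReportID: RPT01". The loop matters because Stream.Read is allowed to return fewer bytes than requested, so a single call is not guaranteed to fill the buffer.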
 
DavidM

Thanks, everyone, for the replies.

Morten, regarding your way or the approach I was taking...

How difficult would it be to then parse the report for various columns and
totals? Basically, I will need to scan the report looking for the BLOCKS USED
section and then pull out the amounts for the various block numbers.








 
Morten Wennevik [C# MVP]

Hi David,

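If you are looking for the fewest lines of code, it could be done like this:

string reportString = Encoding.ASCII.GetString(report);
// The report uses CR/LF line endings, so split on that explicitly.
string[] reportLines = reportString.Split(new string[] { "\r\n" },
    StringSplitOptions.None);

// Must match the heading exactly as it appears in the report.
string searchPhrase = "* * * * * * * * * * B L O C K S U S E D * * * * * * * * * *";
int startIndex = Array.FindIndex<string>(reportLines, 0,
    delegate(string s) { return s.Trim() == searchPhrase; });
if (startIndex < 0)
    throw new InvalidDataException("BLOCKS USED heading not found.");

// The section ends at the first blank line after the heading.
int endIndex = Array.FindIndex<string>(reportLines, startIndex,
    delegate(string s) { return s.Trim() == ""; });
if (endIndex < 0)
    throw new InvalidDataException("End of BLOCKS USED section not found.");

// The totals are on the last line of the section.
string totalsLine = reportLines[endIndex - 1];
string[] totals = totalsLine.Split(new string[] { " " },
    StringSplitOptions.RemoveEmptyEntries);

string totalDebits = totals[1].Trim();
string totalCredits = totals[2].Trim();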


You could manage with even less if there is always a SUSPECT DUPLICATE BLOCKS
section right after the BLOCKS USED section:

string searchPhrase = "**** SUSPECT DUPLICATE BLOCKS ****";
int startIndex = Array.FindIndex<string>(reportLines, 0,
    delegate(string s) { return s.Trim() == searchPhrase; });

string totalsLine = reportLines[startIndex - 2];


In the end it all depends on the reliability of the report file. Identify
markers that will always be there and use those to find the sections you need.

--
Happy Coding!
Morten Wennevik [C# MVP]
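
To pull out the amounts for each block number rather than only the totals line, the same marker-based approach can be extended to walk every row of the section. The sketch below relies on loud assumptions: the sample report text, the shortened heading, and the three-column layout (block, debits, credits) are all invented for illustration, since the real report layout is not shown in the thread. FindLine is a hypothetical helper that returns -1 when a marker is missing.

```csharp
using System;
using System.Globalization;

class BlocksUsedParser
{
    // Returns the index of the first line at or after 'start' whose trimmed
    // text equals 'marker', or -1 when there is none.
    static int FindLine(string[] lines, int start, string marker)
    {
        for (int i = start; i < lines.Length; i++)
        {
            if (lines[i].Trim() == marker) return i;
        }
        return -1;
    }

    static void Main()
    {
        // Invented sample text; the real column layout is an assumption.
        string report =
            "SOME OTHER SECTION\r\n" +
            "\r\n" +
            "*** BLOCKS USED ***\r\n" +
            "  101     1,250.00       300.00\r\n" +
            "  102        75.50         0.00\r\n" +
            "TOTAL     1,325.50       300.00\r\n" +
            "\r\n" +
            "**** SUSPECT DUPLICATE BLOCKS ****\r\n";

        string[] lines = report.Split(new string[] { "\r\n" }, StringSplitOptions.None);

        int heading = FindLine(lines, 0, "*** BLOCKS USED ***");
        if (heading < 0)
        {
            Console.WriteLine("BLOCKS USED section not found.");
            return;
        }

        // The section is assumed to end at the first blank line after the heading.
        int end = FindLine(lines, heading + 1, "");
        if (end < 0) end = lines.Length;

        for (int i = heading + 1; i < end; i++)
        {
            string[] cols = lines[i].Split(new char[] { ' ' },
                StringSplitOptions.RemoveEmptyEntries);
            if (cols.Length < 3) continue; // not a data row

            // NumberStyles.Number accepts thousands separators such as "1,250.00".
            decimal debits = decimal.Parse(cols[1], NumberStyles.Number,
                CultureInfo.InvariantCulture);
            decimal credits = decimal.Parse(cols[2], NumberStyles.Number,
                CultureInfo.InvariantCulture);
            Console.WriteLine(cols[0] + ": debits=" + debits + ", credits=" + credits);
        }
    }
}
```

Parsing the amounts into decimal rather than keeping them as strings makes it possible to check the per-block rows against the totals line. If the real report uses fixed-width columns with embedded spaces, Substring on known column positions would be a safer choice than splitting on spaces.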

