Read Text File with Binary Header - C#

  • Thread starter: dm3281

dm3281

Hello, I have a text report from a mainframe that I need to parse.

The report has a header of about 2580 bytes that contains binary information
(garbage for the most part), although there are a couple of areas in it with
ASCII text that I need to extract. After those 2580 bytes, I can read
the report like a standard text file. It should have CR/LF at the end of
each line.

What is the best way for me to read this report using C#? It seems like
I need to position myself in the file using Seek() or something similar and
then read the rest using ReadLine().

I have a sample file here. The extension is .BIN to cause your browser to
prompt for the file download.

http://members.verizon.net/dm3281/misc/TEST.BIN

Any assistance or sample code would be appreciated.
 
This is what I have so far, and it more or less works for ReportID and CustID.
But when I then try to do a ReadLine() using a StreamReader, it seems to re-read
the entire file and prints the garbage from the beginning.

using System;
using System.IO;

namespace sample
{
    public class test
    {
        static void Main()
        {
            FileStream fs = new FileStream(@"C:\TEMP\TEST.BIN", FileMode.Open);

            // get report name
            Console.Write("ReportID: ");
            fs.Seek(877, SeekOrigin.Begin);
            for (int i = 0; i < 43 && i < fs.Length; i++)
            {
                Console.Write((char) fs.ReadByte());
            }

            // get customer ID
            Console.WriteLine();
            Console.Write("CustID: ");
            fs.Seek(1178, SeekOrigin.Begin);
            for (int i = 0; i < 3 && i < fs.Length; i++)
            {
                Console.Write((char) fs.ReadByte());
            }

            StreamReader sr = new StreamReader(fs);

            // jump to start of report
            Console.WriteLine();
            sr.BaseStream.Seek(1171, SeekOrigin.Begin);
            //s.Seek(1171, SeekOrigin.Begin);

            string str = sr.ReadLine();
            while (str != null)
            {
                Console.WriteLine(str);
                str = sr.ReadLine();
            }
            sr.Close();
            fs.Close();
        }
    }
}
 
sr.BaseStream.Seek(1171,SeekOrigin.Begin);

Shouldn't this be 2581? You might also want to call sr.DiscardBufferedData()
here as well. Seeking back to 1171 lands inside the binary header, so you get
garbage.
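
To illustrate the suggestion above, here is a minimal sketch that seeks past the header and then reads the remainder line by line. The class and method names are hypothetical, and the header length is passed in as a parameter because the exact value (2580 or 2581) depends on the actual file. As far as I know, DiscardBufferedData() only matters when the underlying stream is repositioned after the StreamReader has already read from it; in this sketch the seek happens before the reader is created, so it is not called.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class ReportReader
{
    // Hypothetical helper: returns the text lines that follow a binary header.
    // headerLength is the number of bytes to skip (assumed to be known).
    public static List<string> ReadLinesAfterHeader(Stream stream, long headerLength)
    {
        var lines = new List<string>();

        // Position the stream before any reader has buffered data from it.
        stream.Seek(headerLength, SeekOrigin.Begin);

        // leaveOpen: true, so the caller keeps ownership of the stream.
        using (var reader = new StreamReader(stream, Encoding.ASCII, false, 1024, true))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }
}
```

With the file from the question, this could be called as ReportReader.ReadLinesAfterHeader(fs, 2581), where fs is the already opened FileStream.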
 
Hi,

As Mach58 pointed out, your report position is wrong, and since StreamReader
is reading the data as text, you probably have some binary data making the
StreamReader return a null line prematurely. If you change the position you
should get the text. Alternatively, you could treat everything as a byte array
and extract the necessary text using Encoding.ASCII.

Below is another way to do the same thing as your method. It uses a
StringBuilder to assemble the output, mainly because I was using a Windows
application and wanted everything in a single string object before displaying
it. It copies all the binary data into a byte array and reads from that
instead of from a stream. Reading from a byte array isn't necessarily better
or worse than reading from a stream, but with a stream I would probably use
fs.Read and store the data in byte arrays instead of using a StreamReader.


using System;
using System.IO;
using System.Text;

StringBuilder sb = new StringBuilder();

int reportIdPosition = 877;
int custIdPosition = 1178;
int reportPosition = 2581;

byte[] data = File.ReadAllBytes(@"C:\TEST.BIN");
byte[] reportId = new byte[43];
byte[] custId = new byte[3];
byte[] report = new byte[data.Length - reportPosition];

// get report name
Array.Copy(data, reportIdPosition, reportId, 0, reportId.Length);
sb.AppendLine("ReportID: " + Encoding.ASCII.GetString(reportId));

// get customer ID
Array.Copy(data, custIdPosition, custId, 0, custId.Length);
sb.AppendLine("CustID: " + Encoding.ASCII.GetString(custId));

// get report
Array.Copy(data, reportPosition, report, 0, report.Length);
sb.AppendLine(Encoding.ASCII.GetString(report));

Console.WriteLine(sb.ToString());
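
For comparison, the stream-based variant mentioned in the reply above (fs.Read into byte arrays rather than a StreamReader) might look roughly like the sketch below. The class and method names are made up for illustration. The loop is there because Stream.Read is allowed to return fewer bytes than requested.

```csharp
using System;
using System.IO;
using System.Text;

public static class HeaderFields
{
    // Hypothetical helper: reads exactly `length` bytes starting at `offset`
    // and decodes them as ASCII text.
    public static string ReadAsciiField(Stream stream, long offset, int length)
    {
        var buffer = new byte[length];
        stream.Seek(offset, SeekOrigin.Begin);

        int total = 0;
        while (total < length)
        {
            // Read may deliver only part of the request, so keep going.
            int read = stream.Read(buffer, total, length - total);
            if (read == 0)
            {
                throw new EndOfStreamException("File ended before the field was complete.");
            }
            total += read;
        }
        return Encoding.ASCII.GetString(buffer);
    }
}
```

Using the offsets from the thread, the report ID would be HeaderFields.ReadAsciiField(fs, 877, 43) and the customer ID HeaderFields.ReadAsciiField(fs, 1178, 3).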
 
Thanks, everyone, for the replies.

Morten, regarding your way or the approach I was taking...

How difficult would it be to then parse the report for various columns and
totals? Basically, I will need to scan the report looking for the BLOCKS USED
section and then pull out the amounts for the various block numbers.








 
Hi David,

If you are looking for the fewest lines of code, it could be done with

string reportString = Encoding.ASCII.GetString(report);
string[] reportLines = reportString.Split(
    new string[] { Environment.NewLine }, StringSplitOptions.None);

string searchPhrase = "* * * * * * * * * * B L O C K S U S E D * * * * * * * * * *";
int startIndex = Array.FindIndex<string>(reportLines, 0,
    delegate(string s) { return s.Trim() == searchPhrase; });
int endIndex = Array.FindIndex<string>(reportLines, startIndex,
    delegate(string s) { return s.Trim() == ""; });

string totalsLine = reportLines[endIndex - 1];
string[] totals = totalsLine.Split(
    new string[] { " " }, StringSplitOptions.RemoveEmptyEntries);

string totalDebits = totals[1].Trim();
string totalCredits = totals[2].Trim();


You could manage with even less if there is always a SUSPECT DUPLICATE BLOCKS
section right after the BLOCKS USED section

string searchPhrase = "**** SUSPECT DUPLICATE BLOCKS ****";
int startIndex = Array.FindIndex<string>(reportLines, 0,
delegate(string s) { return s.Trim() == searchPhrase; });

string totalsLine = reportLines[startIndex - 2];


In the end it all depends on the reliability of the report file. Identify
markers that will always be there and use those to find the sections you need.

--
Happy Coding!
Morten Wennevik [C# MVP]
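
As a rough sketch of the marker-based approach described above, the helper below returns the lines between a marker line and the next blank line, and guards against Array.FindIndex returning -1 when the marker is missing. The names are hypothetical, and it assumes the data lines follow the marker directly with no blank line in between, which may not hold for the real report layout.

```csharp
using System;
using System.Collections.Generic;

public static class ReportSections
{
    // Hypothetical helper: lines after `marker` up to the next blank line.
    // Returns an empty list when the marker is not found.
    public static List<string> GetSection(string[] lines, string marker)
    {
        var section = new List<string>();
        int start = Array.FindIndex(lines, s => s.Trim() == marker);
        if (start < 0)
        {
            return section; // FindIndex returns -1 when nothing matches
        }
        for (int i = start + 1; i < lines.Length; i++)
        {
            if (lines[i].Trim().Length == 0)
            {
                break; // a blank line ends the section (assumption)
            }
            section.Add(lines[i]);
        }
        return section;
    }

    // Splits one report line into whitespace-separated columns.
    public static string[] SplitColumns(string line)
    {
        return line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

Each line of the returned section can then be passed to SplitColumns to pick out the block number and amounts by column position, in the same way totals[1] and totals[2] are used above.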

