Testing File Format

Tom

Hi all,

I am looking for a smart way to ensure, from within a C# method, that
a file is indeed a text file and not binary.

For example: Will "thisMysteryFile.dat" be legible if opened in a
RichTextBox ... or is it a binary file?

I have searched various methods in the string class and am having no
luck.

Under consideration >>

Open the file in a binary reader and then test either the first 1000
chars or until end of file; if any char is less than 32 or greater
than 127 ... then flag it as binary.

If not binary >> open in a RichTextBox
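
A minimal sketch of the byte-range test described above. The class and method names are hypothetical, and one assumption is added that the post does not state: tab, line feed and carriage return are allowed, since otherwise nearly every text file would be flagged as binary. Note that a file containing non-ASCII text (UTF-8 accented letters, UTF-16, and so on) would also be flagged by this rule.

```csharp
using System;
using System.IO;

public static class AsciiRangeCheck
{
    // Returns true if any of the first maxBytes bytes is less than 32 or
    // greater than 127, as proposed in the post.
    // Assumption (not in the post): tab, LF and CR are treated as text.
    public static bool LooksBinary(string path, int maxBytes)
    {
        using (FileStream stream = File.OpenRead(path))
        {
            for (int i = 0; i < maxBytes; i++)
            {
                int b = stream.ReadByte();   // -1 signals end of file
                if (b < 0)
                {
                    break;
                }

                bool isAllowedControl = b == 9 || b == 10 || b == 13;
                if (!isAllowedControl && (b < 32 || b > 127))
                {
                    return true;
                }
            }
        }

        return false;
    }
}
```

A call such as `AsciiRangeCheck.LooksBinary("thisMysteryFile.dat", 1000)` would then decide whether the file is worth handing to the RichTextBox.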

Can anyone tell me a more efficient way to accomplish this task?

Thanks !!
 
Tom

Peter -- Thanks. Your comments have me thinking outside the match box
in which I was stuck. I'm now digging into the RichTextBoxStreamType
enumeration >> UnicodePlainText.

I'll experiment with this enumeration and see if loading a binary data
file throws an exception. All this RichTextBox stuff is new for me ...
so I have a lot to learn for sure.

Perhaps a restricted load of a tiny size for a preview and then have
control buttons with "Load Full File" or "Clear RichTextBox" options?

Avoiding the accidental loading of a huge binary data file is part of
my objective. The other part of the objective is read only viewing the
small parameter data file as part of a data run initialization.

I am always amazed at how another's input can cause me to refocus.
Darn trees ruining my view of the forest!! LOL

Have a great day. Thanks again!

-- Tom
 
Peter Duniho

> Peter -- Thanks. Your comments have me thinking outside the match box
> in which I was stuck. I'm now digging into the RichTextBoxStreamType
> enumeration >> UnicodePlainText.

If you do that, won't you limit your input to Unicode files?

I think that one approach would be to use a StreamReader to
automatically detect the encoding of the file for you, and then read
the first 1K or so, counting how many characters return true for the
Char.IsLetterOrDigit method and comparing that to the total number of
characters.

It still won't be perfect, but you should be able to come up with a
reasonably good heuristic for the ratio of alphanumeric characters to
other characters that you would expect to see in a text file.
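
A minimal sketch of that heuristic, under a few assumptions: the class and method names are made up for illustration, and the 0.5 threshold is a placeholder that would need tuning against real files (numeric parameter files with lots of punctuation may score lower than prose). As far as I know, StreamReader's automatic detection relies on a byte order mark and falls back to UTF-8 when there is none, so "detect the encoding" is only as good as that.

```csharp
using System;
using System.IO;

public static class TextHeuristic
{
    // Returns the fraction of the first sampleSize characters for which
    // Char.IsLetterOrDigit is true. An empty file yields 0.
    public static double AlphanumericRatio(string path, int sampleSize)
    {
        char[] buffer = new char[sampleSize];
        int total = 0;

        // 'true' asks StreamReader to look for a byte order mark.
        using (StreamReader reader = new StreamReader(path, true))
        {
            int read;
            while (total < sampleSize &&
                   (read = reader.Read(buffer, total, sampleSize - total)) > 0)
            {
                total += read;
            }
        }

        if (total == 0)
        {
            return 0.0;
        }

        int alphanumeric = 0;
        for (int i = 0; i < total; i++)
        {
            if (char.IsLetterOrDigit(buffer[i]))
            {
                alphanumeric++;
            }
        }

        return (double)alphanumeric / total;
    }

    // Hypothetical threshold: treat the file as text when at least half
    // of the sampled characters are letters or digits.
    public static bool LooksLikeText(string path)
    {
        return AlphanumericRatio(path, 1024) >= 0.5;
    }
}
```

Returning the ratio separately from the yes/no answer makes it easy to log a few known-good and known-bad files and pick a threshold from what you actually see.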

Of course, you can still include the user in the determination. For
example, run the above test and if the file passes go ahead and use it,
but if it fails provide the user with a chance to override your
analysis. You could even do this just as you suggest: provide a brief
preview of the initial part of the file to the user so that they can
visually decide whether it's a file they want treated as text.

Caveat: I have basically no experience with non-alphabetic languages,
and I don't know if in a non-alphabetic language a word character would
be considered a "letter" for the purpose of the above test. If that's
important to you, you'll want to verify that and/or find a form of
classification that will correctly detect those characters as text.
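
One quick way to check the caveat is to print the Unicode category that .NET assigns to a few sample characters. This is only a sketch with a hypothetical helper name; to my knowledge, ideographs and kana in the Basic Multilingual Plane fall in the OtherLetter category, which Char.IsLetterOrDigit accepts, while CJK punctuation does not.

```csharp
using System;

public static class LetterCheck
{
    // Describes how .NET classifies a single UTF-16 character.
    public static string Describe(char c)
    {
        return string.Format(
            "U+{0:X4} {1} IsLetterOrDigit={2}",
            (int)c,
            char.GetUnicodeCategory(c),
            char.IsLetterOrDigit(c));
    }
}
```

For example, `LetterCheck.Describe('\u4E2D')` (a common CJK ideograph) and `LetterCheck.Describe('\u3002')` (the ideographic full stop) can be compared side by side.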

Pete
 
Mihai N.

> Unicode (which can certainly
> be the contents of a "text file") supports 65536 characters.

Unicode goes up to 10FFFF, which is a bit more than one million.
Other than that, very good warning :)
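
For anyone counting characters in C#, the practical consequence is that a `char` is a 16-bit UTF-16 code unit, so code points above U+FFFF take two of them. A small sketch (the class and member names are hypothetical):

```csharp
using System;

public static class CodePointDemo
{
    // Highest Unicode code point.
    public const int MaxCodePoint = 0x10FFFF;

    // Number of C# chars (UTF-16 code units) needed for one code point.
    public static int Utf16Length(int codePoint)
    {
        return char.ConvertFromUtf32(codePoint).Length;
    }
}
```

`CodePointDemo.Utf16Length(0x41)` gives 1, while `CodePointDemo.Utf16Length(0x10FFFF)` gives 2, and `MaxCodePoint + 1` is 1,114,112.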
 
Tom

Pete --

Thank you! I am new to C# and I am exploring StreamReader a.s.a.p.

I work only in the English language and am not developing programs for
global distribution. Your methodology seems solid to this newb. Usage
of Char.IsLetterOrDigit would effectively provide some language
independence. That independence makes for a MUCH better tool than what
I had been focused upon.

Very, very thought provoking!

Again, thanks. -- Tom
 
Tom

Hey folks --

I've been rethinking my usage of RichTextBox long and hard. At first
it seemed the do-all new magic class. For some tasks it is just that!
But accidentally opening a huge file from a ListView selection is
painfully slow and consumes resources like no tomorrow. Ouch.

What I really crave is a Text Viewer class without editing capability.
One that only loads a screen worth of text at a time. Where the thumb
is sized to reflect the file size and placement of the thumb loads
just that section of the data file. Like Petzold's painting with text
example from Programming Windows 95 ... only in .Net 2.0 C# and
integrated with a simpler TextBox? Or another text viewing control
that is more appropriate.

I'm still searching for such a Text Viewer. A search on "Thumb Size
.Net 2.0" led me to some graphics intensive TrackBarRenderer,
trackRectangle, thumbRectangle, etc. usage that goes way beyond the
WinForms book and C# Instructional Texts that I have. Certainly
steepening my learning curve!

My guess is someone has already duplicated that Petzold example in C#
2.0 and that I would learn more and faster from studying a guru's
coding than creating my own.

If anyone can point me towards such a useful, compact, and also
complex tool ... I would be without doubt grateful.

Thanks. -- Tom
 
Peter Duniho

[...]
> I'm still searching for such a Text Viewer. A search on "Thumb Size
> .Net 2.0" led me to some graphics intensive TrackBarRenderer,
> trackRectangle, thumbRectangle, etc. usage that goes way beyond the
> WinForms book and C# Instructional Texts that I have. Certainly
> steepening my learning curve!
>
> My guess is someone has already duplicated that Petzold example in C#
> 2.0 and that I would learn more and faster from studying a guru's
> coding than creating my own.
>
> If anyone can point me towards such a useful, compact, and also
> complex tool ... I would be without doubt grateful.

I'm not familiar with Petzold's examples, so I can't comment on that.
As far as what you're asking about, I'm not aware of a specific
text-box implementation that does what you're talking about. It
wouldn't be that hard to do, at least for the basic implementation
(duplicating the full functionality of the TextBoxBase classes would be
harder, but it sounds like you only need a minimal subset).

Interestingly, taking a suggestion from a different thread -- in which
someone suggested using a ListBox to implement a console-output-like
control -- you could use the DataGridView in a similar way, taking
advantage of its "VirtualMode" mechanism. Using that, the control
handles all of the display and you provide the code that virtualizes
the data rather than having it all in memory at once.
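
The "code that virtualizes the data" could be as small as an index of line start offsets, so that any single line can be fetched on demand. The sketch below is only that data side; `LineIndex` is a hypothetical helper, and it assumes an ASCII or UTF-8 file without a byte order mark (it splits on the LF byte, which would not work for UTF-16).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper: remembers where each line starts so single lines
// can be read on demand instead of keeping the whole file in memory.
public sealed class LineIndex
{
    private readonly string path;
    private readonly List<long> starts = new List<long>();
    private readonly long length;

    public LineIndex(string path)
    {
        this.path = path;
        using (FileStream stream = File.OpenRead(path))
        {
            length = stream.Length;
            if (length > 0)
            {
                starts.Add(0);
            }

            long position = 0;
            int b;
            while ((b = stream.ReadByte()) >= 0)
            {
                position++;
                // A new line starts after each LF, unless the LF ends the file.
                if (b == '\n' && position < length)
                {
                    starts.Add(position);
                }
            }
        }
    }

    // Number of lines found in the file.
    public int Count
    {
        get { return starts.Count; }
    }

    // Reads one line from disk, without its trailing CR/LF.
    public string GetLine(int row)
    {
        long start = starts[row];
        long end = row + 1 < starts.Count ? starts[row + 1] : length;
        byte[] bytes = new byte[(int)(end - start)];

        using (FileStream stream = File.OpenRead(path))
        {
            stream.Seek(start, SeekOrigin.Begin);
            int offset = 0;
            while (offset < bytes.Length)
            {
                int n = stream.Read(bytes, offset, bytes.Length - offset);
                if (n == 0)
                {
                    break;
                }
                offset += n;
            }
        }

        return Encoding.UTF8.GetString(bytes).TrimEnd('\r', '\n');
    }
}
```

With a single-column DataGridView, the hookup would presumably be: set `VirtualMode = true`, set `RowCount` to `index.Count`, and in the `CellValueNeeded` handler assign `e.Value = index.GetLine(e.RowIndex)`. Reopening the file for every row is wasteful; keeping one stream open or caching recently read lines would be the obvious refinement.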

It could be overkill -- the DataGridView control has lots of stuff in
it that would be of no value for this purpose -- and you might have
trouble getting it to look just right, since the DataGridView does have
a specific look and I don't know if you could get rid of the elements
that would be distracting in this use.

But hey, when you're hacking stuff, you can't be picky. :)

Pete
 
Tom

Pete --

Using a DataGridView is very thought provoking. Still beyond my
beginner status capabilities. The curiosity seed is now planted, however,
and when the right combination of skill accumulation and need occurs I
shall try to baby step my way into the Grid.

My data file picker/viewer is now working. :)

I went back to the RichTextBox usage. My attempts at "painting" text
just caused me headaches and errors. Mixing EventArgs from my ListView
object with PaintEventArgs to control the graphics is beyond my
understanding ... although I gave it many hours of effort.

What ended up being VERY helpful is >>

1) Using StreamReader. ( ** Thank You ** for the suggestion !! )
2) Setting >> rtb.WordWrap = false; (rtb = RichTextBox object)
3) Implementing FontDialog for selecting the text attributes.
4) Using a line reading counter to avoid loading files that are too large.
5) Read only is set. Background is gray and I am ok with that.
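
Items 1 and 4 could look roughly like the sketch below. It is not the code from the post; `FilePreview` and `ReadFirstLines` are hypothetical names, and the 40-line limit mentioned later in the post would simply be passed in as `maxLines`.

```csharp
using System;
using System.IO;
using System.Text;

public static class FilePreview
{
    // Reads at most maxLines lines from the file and returns them as one
    // string, so a huge file never gets loaded in full.
    public static string ReadFirstLines(string path, int maxLines)
    {
        StringBuilder preview = new StringBuilder();

        // 'true' asks StreamReader to look for a byte order mark.
        using (StreamReader reader = new StreamReader(path, true))
        {
            string line;
            int count = 0;
            while (count < maxLines && (line = reader.ReadLine()) != null)
            {
                preview.AppendLine(line);
                count++;
            }
        }

        return preview.ToString();
    }
}
```

On the form this would be used along the lines of `rtb.Text = FilePreview.ReadFirstLines(path, 40);` together with `rtb.WordWrap = false;` and `rtb.ReadOnly = true;`. One caveat: a binary file can contain very long stretches without a line break, so a cap on total characters as well as on lines may be worth adding.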

The top left of my FilePicker is a Directory TreeView and the top
right is a File ListView. This top half acts like a weak version of
FileExplorer. The bottom splitter panel is a RichTextBox that is now
behaving very well!!

When I click on a huge binary data file >> the RichTextBox quickly
shows 40 lines of gibberish. No big deal, and NOT having the
entire file load is NICE! That rtb.LoadFile() can be a pain.

Usually viewing 40 lines is plenty for me to validate that the correct
parameter file is indeed selected ... so it's a winner. I have some
future enhancement ideas too. A two thumb sliding bar controller to
select where to start and end file input is one such idea. I'd like
better list control too. Currently my File ListView does not sort by
variable columns. I'd like to be able to sort by time in addition to
the current ascending FileName ordering.

Amazing how difficult these simple tasks are for beginners. And how
IMPOSSIBLY difficult these tasks used to be. Wow!! I can only shake my
head at the effort needed to duplicate my WinForm 2.0 functionality
using older methods. Ouch!!

Single screen file selection and quick viewing without concern for
accidentally editing it is sweet. A very functional tool I shall use
aplenty.

For emphasis >> All the comments I received were helpful and kept me
digging when I felt like waving the white flag. Now I am all smiles.
Dare I look at my long list of other projects? Yikes !!!

A great day to all.

-- Tom





