Importing Text files w/o losing special characters

NickHK · Oct 13, 2006

Tony,
Probably, but what use are 6 x null characters ?
Do they actually mean anything to you ?
Or are they some garbage that your Fortran app outputs ?

NickHK

Guest · Oct 13, 2006

Nick,

The presence of the null characters is important because I'm importing data
from thousands of files like these using an Excel VBA routine. The data
within these files is not well-organized by today's standards, but it can be
extracted using string functions such as MID(,,) as long as it is known where
particular fields begin and end. A series of FORTRAN programs has been used
for 30-40 years to manipulate these files, and now the company wants to
modernize.

Up until now, my Excel VBA routine has successfully extracted the data from
thousands of these files so that it can be placed into Excel format. Except
for certain files; the ones having this problem. Apparently a FORTRAN
programmer inadvertantly placed the null characters into certain of these
files while developing certain aspects of the programs.

If the null characters were spaces instead of null characters there would be
little problem since the MID(,,) function would still collect the following
fields in the string correctly. But the null characters being stripped away
means that everything past that point gets offset by 6 characters, causing
things like string conversion to doubles, for example, to cause the program
to crash.

If I can't find a way to keep Excel from stripping the nulls, I suppose I'll
work out an error trapping routine that tells the user to manually fix that
file. I just hope it doesn't amount to hundreds of files!

-Tony

NickHK · Oct 13, 2006

Tony,
To continue using your current, working solution, why not pre-process these
problem files by replacing the nulls with spaces.
If it is only those 6 bytes you need to change, using something with:
Open Yourfile For Binary as #freefile
Replace the "00"s with "20" '&H20=decimal 32, [SPACE]
Close #freefile

NickHK

Art H · Oct 13, 2006

Tony,

After investigating the contents of the sample data file, I conclude
that the file contains single byte characters--Unicode is not involved.
Further, the file does contain nulls as suggested by others. The first
few characters of the first line of the data file are:
$$158<s><s><nl><nl><nl><nl><nl><nl><s><s>1<s><s>8
where <s> represents the space character and <nl> represents the null
character. Of course, the decimal equivalent of the space character is
32 and the decimal equivalent of the null character is 0.

The decimal equivalent of a y with double dots above it (this letter is
called a "Latin small letter y with diaeresis" per Character Map) is
255 (hex FF). I don't understand why Word uses this character to
display nulls, but I guess, that's not relevant, but did lead to wrong
conclusions.

I agree with NickHK. Since you need the character positions maintained,
modify your macro to replace the null characters with spaces then place
the resulting string into your Excel worksheet.

Have fun!

Art

T_o_n_y said:
Tony,
To continue using your current, working solution, why not pre-process these
problem files by replacing the nulls with spaces.
If it is only those 6 bytes you need to change, using something with:
Open Yourfile For Binary as #freefile
Replace the "00"s with "20" '&H20=decimal 32, [SPACE]
Close #freefile

NickHK

T_o_n_y said:

Nick,

The presence of the null characters is important because I'm importing data
from thousands of files like these using an Excel VBA routine. The data
within these files is not well-organized by today's standards, but it can be
extracted using string functions such as MID(,,) as long as it is known where
particular fields begin and end. A series of FORTRAN programs has been used
for 30-40 years to manipulate these files, and now the company wants to
modernize.

Up until now, my Excel VBA routine has successfully extracted the data from
thousands of these files so that it can be placed into Excel format. Except
for certain files; the ones having this problem. Apparently a FORTRAN
programmer inadvertantly placed the null characters into certain of these
files while developing certain aspects of the programs.

If the null characters were spaces instead of null characters there would be
little problem since the MID(,,) function would still collect the following
fields in the string correctly. But the null characters being stripped away
means that everything past that point gets offset by 6 characters, causing
things like string conversion to doubles, for example, to cause the program
to crash.

If I can't find a way to keep Excel from stripping the nulls, I suppose I'll
work out an error trapping routine that tells the user to manually fix that
file. I just hope it doesn't amount to hundreds of files!

-Tony

Click to expand...

Art H · Oct 13, 2006

Just for grins put the following Word macro into a blank document and
see what Word does with all characters whose value is from 0 to 255. I
used line continuation hopefully to prevent word wrap issues.

Sub ASCIICharSet()
Selection.WholeStory
Selection.TypeText _
Text:= _
"The data presented below represents all single " & _
"byte characters from 0 to 255 where the format " & _
"presented below is the hexadecimal equivalent of " & _
"the character, followed by the decimal equivalent " & _
"followed by the ASCII character. It might be of " & _
"interest to toggle Show All to see how Word's " & _
"display of certain characters changes. " & _
"Also note that certain characters perform Word " & _
"formatting functions (e.g., 0x0C, which is called " & _
"formfeed, causes a page break)." & vbCr & vbCr & vbCr

For i = 0 To 255
Selection.TypeText _
Text:= _
"(0x" & Format(Hex(i), "00") & _
", " & Format(i, "000") & ")" & Chr(i)
Selection.MoveEnd
Next
End Sub

Art

Art said:
Tony,

After investigating the contents of the sample data file, I conclude
that the file contains single byte characters--Unicode is not involved.
Further, the file does contain nulls as suggested by others. The first
few characters of the first line of the data file are:
$$158<s><s><nl><nl><nl><nl><nl><nl><s><s>1<s><s>8
where <s> represents the space character and <nl> represents the null
character. Of course, the decimal equivalent of the space character is
32 and the decimal equivalent of the null character is 0.

The decimal equivalent of a y with double dots above it (this letter is
called a "Latin small letter y with diaeresis" per Character Map) is
255 (hex FF). I don't understand why Word uses this character to
display nulls, but I guess, that's not relevant, but did lead to wrong
conclusions.

I agree with NickHK. Since you need the character positions maintained,
modify your macro to replace the null characters with spaces then place
the resulting string into your Excel worksheet.

Have fun!

Art

Tony,
To continue using your current, working solution, why not pre-process these
problem files by replacing the nulls with spaces.
If it is only those 6 bytes you need to change, using something with:
Open Yourfile For Binary as #freefile
Replace the "00"s with "20" '&H20=decimal 32, [SPACE]
Close #freefile

NickHK

T_o_n_y said:

Nick,

The presence of the null characters is important because I'm importing data
from thousands of files like these using an Excel VBA routine. The data
within these files is not well-organized by today's standards, but it can be
extracted using string functions such as MID(,,) as long as it is known where
particular fields begin and end. A series of FORTRAN programs has been used
for 30-40 years to manipulate these files, and now the company wants to
modernize.

Up until now, my Excel VBA routine has successfully extracted the data from
thousands of these files so that it can be placed into Excel format. Except
for certain files; the ones having this problem. Apparently a FORTRAN
programmer inadvertantly placed the null characters into certain of these
files while developing certain aspects of the programs.

If the null characters were spaces instead of null characters there would be
little problem since the MID(,,) function would still collect the following
fields in the string correctly. But the null characters being stripped away
means that everything past that point gets offset by 6 characters, causing
things like string conversion to doubles, for example, to cause the program
to crash.

If I can't find a way to keep Excel from stripping the nulls, I suppose I'll
work out an error trapping routine that tells the user to manually fix that
file. I just hope it doesn't amount to hundreds of files!

-Tony

:

Tony,
Probably, but what use are 6 x null characters ?
Do they actually mean anything to you ?
Or are they some garbage that your Fortran app outputs ?

NickHK

Nick,

What you've written makes sense to me, and that is why I've had trouble
concluding that it is a unicode file. As I wrote in an earlier post, I'm
leaning toward it being a text file that has 6 Ascii(0) characters
inserted
into it.

The question remains: is there a way to have Excel import these characters
rather than completely ignoring them?

-Tony

:

Tony,
The "special characters" I see in your uploaded file are Asc(0). It is
not a
Unicode file.

As for the "0" for every other cell, that is expected if you have all
ANSI
text stored in a UNICODE format. The lower byte will always be 0 as no
values exceed decimal 255 or FF hex.

So do you have a Unicode file or not ?

NickHK

Tom,

I'm perplexed at your response because, I'm about as far from ignoring
your
posts as possible. Indeed, I generally skip directly to your posts
when
on
this newsgroup since they are more helpful than anyone's, containing
actual
sample code that can be used.

It's just that the output from the macro you sent me led me to the
conclusion that you were mistaken this time. The output shows "0" for
every
other cell which is not what I would expect from UNICODE with 2 bytes
per
character. Furthermore, rather than revealing the presence of the
special
characters, your macro also had them stripped away.

Here's what I mean. The file I've uploaded contains the following in
the
first line,

$$158++yyyyyy++1++8++4.50 etc...

I've substituting + for spaces and y for the special characters above.
As
you can see there are 2 spaces followed by 6 special characters
followed
by 2
spaces. The output from your macro completely omits the 6 special
characters, if I'm reading it correctly.

As I wrote, I spent "anoter few hours" researching into UNICODE in
order
to
investigate the possibility you raised...but nothing I found seemed to
confirm it. In addition, Excel has two different UNICODE types (UTF-8
and
UTF-7) which one can select in the text import wizard. I tried both
of
them
and neither gave me success in importing the special characters, as
judged
by
using c pearson's CellView add-in, which allows character by character
visualization of cell contents.

Thank you again for your help,
-Tony

:

Guess it was a waste of time trying to explain it to you. Did you
bother
to
read it?

--
Regards,
Tom Ogilvy

I tried your macro, but unfortunately Excel still did not import
the
special
characters. Recall that there are 6 special characters between
the
$$158
and
the 1 8 in the first line of the file:
$$158 1 8 4.50 1.0000 0.8000 3.0010
1.5740

For that section, the output from your macro looked like this:
36 = $
0 =
36 = $
0 =
49 = 1
0 =
53 = 5
0 =
56 = 8
0 =
32 =
0 =
32 =
0 =
32 =
0 =
32 =
0 =
49 = 1
0 =
32 =
0 =
32 =
0 =
56 = 8

In other words, the 6 characters got stripped away again so that
all
you
see
are the 2 spaces which appear on either side of the 6 special
characters.

The only way I've found for Excel to even recognize that those
characters
exist is to use the "Delimited" option during text import and
specify
"spaces" as the delimiting character with the "Treat consecutive
delimiters
as one" feature unchecked. Unfortunately, that method of
importing
would
mean a huge rework of my existing code.

I spent another few hours trying to research the UNICODE
possibilty
you
mentioned, but still was unable to come up with anything.

At a loss...
-Tony

:

put this in a workbook. Change the path to point to your file:

Sub ReadStraightTextFile()
Dim strTest As String
Dim bytArray() As Byte
Dim intcount As Integer
Dim col As Long
Open "E:\Data1\W158.DAT" For Input As #1
col = 0
Do While Not EOF(1)
Line Input #1, strTest
col = col + 1
bytArray = strTest
i = 0
For intcount = LBound(bytArray) To UBound(bytArray)
i = i + 1
Cells(i, col) = bytArray(intcount) & " = " &
Chr(bytArray(intcount))
Next

Loop 'Close the file

Close #1
End Sub

Have blank sheet as the activesheet. Run the macro.

It appears to me that the file is UNICODE. unlike an ascii file
that
has
one byte per character, a unicode file has two bytes per
character.

there are 8 bits to a byte, so an ascii file can have 8^2 = 256
different/unique character codes. In a unicode file, 2 bytes is
16
bits,
so 2^16 = 65536 possible unique characters.

I didn't see any actual characters that couldn't be represented
by
Ascii,
so
you could read every Odd character .

It appears that opening it in Excel automatically converts it to
Ascii,
so
you haven't lost any information, but if you want to edit it and
write it
back out, you would need to save it as Unicode Text. I know that
is
an
option in at least xl2000 and I assume later.

--
Regards,
Tom Ogilvy

Tom,

Thank you for your reply. I followed your procedure but only
got
four
"32"
s in that blank section; that is, there are only 4 spaces
there.
This
confirms what I've suspected, namely, that Excel is simply not
importing
those characters. I've also tried using C. Pearson's Cell View
Add-in
with
the same result (http://www.cpearson.com/excel/CellView.htm).

As you point out, the characters also get stripped when I cut
and
paste
into
this forum. Therefore, I've emailed you separately the file I
referred
to
as
an attachment (it's a text document called W158.DAT) sent from
myother_acct.
If I knew how to post it to this forum, I would.

I appreaciate your help...this is a frustrating problem for me.
Is
there
a
way to import the text file character by character?

-Tony

:

put your string in cell A1. Then in B1 or another cell in the
first
row
put
in this formula

=CODE(MID($A$1,ROW(),1))
Assume the above formula is in B1
in C1:
=CHAR(B1)

now select B1:C1 and drag fill down until the formula starts
returning
#Value errors.

The only thing between the characters in your post are ascii
code
32
which
is a space.

Possibly they didn't get carried forward in the email.

--

Click to expand...

Click to expand...

Tom Ogilvy · Oct 14, 2006

I will eat crow on this one Tony. I was incorrect and misinterpreted the
results and I apologize for suggesting an incorrect solution and sticking
with it.

This code will show you the content of your file:

Sub Test1()
Dim b As Byte
Dim rw As Long
Open "E:\Data1\W158.DAT" For Binary Access Read As #1
Do While Not EOF(1)
Get #1, , b
rw = rw + 1
Cells(rw, 1) = b
Loop
Close #1
End Sub

My problem was forgetting that Visual Basic/VBA internally converts strings
to Unicode for processing.

Guest · Oct 14, 2006

Tom,

Thank you very much. I was able to use your routine to import all the
characters of the file, including the null characters.

Once again, I appeciate that you post real code that works to solve real
problems.

Thanks,
-Tony

Guest · Oct 14, 2006

Art,

Although I agree with you and Nick that preprocessing the files to replace
null characters would solve the problem, Tom Ogilvy has now provided a
solution that will work nicely with my current code.

Thank you for all the help.

Art H said:
Tony,

After investigating the contents of the sample data file, I conclude
that the file contains single byte characters--Unicode is not involved.
Further, the file does contain nulls as suggested by others. The first
few characters of the first line of the data file are:
$$158<s><s><nl><nl><nl><nl><nl><nl><s><s>1<s><s>8
where <s> represents the space character and <nl> represents the null
character. Of course, the decimal equivalent of the space character is
32 and the decimal equivalent of the null character is 0.

The decimal equivalent of a y with double dots above it (this letter is
called a "Latin small letter y with diaeresis" per Character Map) is
255 (hex FF). I don't understand why Word uses this character to
display nulls, but I guess, that's not relevant, but did lead to wrong
conclusions.

I agree with NickHK. Since you need the character positions maintained,
modify your macro to replace the null characters with spaces then place
the resulting string into your Excel worksheet.

Have fun!

Art

Tony,
To continue using your current, working solution, why not pre-process these
problem files by replacing the nulls with spaces.
If it is only those 6 bytes you need to change, using something with:
Open Yourfile For Binary as #freefile
Replace the "00"s with "20" '&H20=decimal 32, [SPACE]
Close #freefile

NickHK

T_o_n_y said:

Nick,

The presence of the null characters is important because I'm importing data
from thousands of files like these using an Excel VBA routine. The data
within these files is not well-organized by today's standards, but it can be
extracted using string functions such as MID(,,) as long as it is known where
particular fields begin and end. A series of FORTRAN programs has been used
for 30-40 years to manipulate these files, and now the company wants to
modernize.

Up until now, my Excel VBA routine has successfully extracted the data from
thousands of these files so that it can be placed into Excel format. Except
for certain files; the ones having this problem. Apparently a FORTRAN
programmer inadvertantly placed the null characters into certain of these
files while developing certain aspects of the programs.

If the null characters were spaces instead of null characters there would be
little problem since the MID(,,) function would still collect the following
fields in the string correctly. But the null characters being stripped away
means that everything past that point gets offset by 6 characters, causing
things like string conversion to doubles, for example, to cause the program
to crash.

If I can't find a way to keep Excel from stripping the nulls, I suppose I'll
work out an error trapping routine that tells the user to manually fix that
file. I just hope it doesn't amount to hundreds of files!

-Tony

:

Tony,
Probably, but what use are 6 x null characters ?
Do they actually mean anything to you ?
Or are they some garbage that your Fortran app outputs ?

NickHK

Nick,

What you've written makes sense to me, and that is why I've had trouble
concluding that it is a unicode file. As I wrote in an earlier post, I'm
leaning toward it being a text file that has 6 Ascii(0) characters
inserted
into it.

The question remains: is there a way to have Excel import these characters
rather than completely ignoring them?

-Tony

:

Tony,
The "special characters" I see in your uploaded file are Asc(0). It is
not a
Unicode file.

As for the "0" for every other cell, that is expected if you have all
ANSI
text stored in a UNICODE format. The lower byte will always be 0 as no
values exceed decimal 255 or FF hex.

So do you have a Unicode file or not ?

NickHK

Tom,

I'm perplexed at your response because, I'm about as far from ignoring
your
posts as possible. Indeed, I generally skip directly to your posts
when
on
this newsgroup since they are more helpful than anyone's, containing
actual
sample code that can be used.

It's just that the output from the macro you sent me led me to the
conclusion that you were mistaken this time. The output shows "0" for
every
other cell which is not what I would expect from UNICODE with 2 bytes
per
character. Furthermore, rather than revealing the presence of the
special
characters, your macro also had them stripped away.

Here's what I mean. The file I've uploaded contains the following in
the
first line,

$$158++yyyyyy++1++8++4.50 etc...

I've substituting + for spaces and y for the special characters above.
As
you can see there are 2 spaces followed by 6 special characters
followed
by 2
spaces. The output from your macro completely omits the 6 special
characters, if I'm reading it correctly.

As I wrote, I spent "anoter few hours" researching into UNICODE in
order
to
investigate the possibility you raised...but nothing I found seemed to
confirm it. In addition, Excel has two different UNICODE types (UTF-8
and
UTF-7) which one can select in the text import wizard. I tried both
of
them
and neither gave me success in importing the special characters, as
judged
by
using c pearson's CellView add-in, which allows character by character
visualization of cell contents.

Thank you again for your help,
-Tony

:

Guess it was a waste of time trying to explain it to you. Did you
bother
to
read it?

--
Regards,
Tom Ogilvy

I tried your macro, but unfortunately Excel still did not import
the
special
characters. Recall that there are 6 special characters between
the
$$158
and
the 1 8 in the first line of the file:
$$158 1 8 4.50 1.0000 0.8000 3.0010
1.5740

For that section, the output from your macro looked like this:
36 = $
0 =
36 = $
0 =
49 = 1
0 =
53 = 5
0 =
56 = 8
0 =
32 =
0 =
32 =
0 =
32 =
0 =
32 =
0 =
49 = 1
0 =
32 =
0 =
32 =
0 =
56 = 8

In other words, the 6 characters got stripped away again so that
all
you
see
are the 2 spaces which appear on either side of the 6 special
characters.

The only way I've found for Excel to even recognize that those
characters
exist is to use the "Delimited" option during text import and
specify
"spaces" as the delimiting character with the "Treat consecutive
delimiters
as one" feature unchecked. Unfortunately, that method of
importing
would
mean a huge rework of my existing code.

I spent another few hours trying to research the UNICODE
possibilty
you
mentioned, but still was unable to come up with anything.

At a loss...
-Tony

:

put this in a workbook. Change the path to point to your file:

Sub ReadStraightTextFile()
Dim strTest As String
Dim bytArray() As Byte
Dim intcount As Integer
Dim col As Long
Open "E:\Data1\W158.DAT" For Input As #1
col = 0
Do While Not EOF(1)
Line Input #1, strTest
col = col + 1
bytArray = strTest
i = 0
For intcount = LBound(bytArray) To UBound(bytArray)
i = i + 1
Cells(i, col) = bytArray(intcount) & " = " &
Chr(bytArray(intcount))
Next

Loop 'Close the file

Click to expand...

Click to expand...

Guest · Oct 14, 2006

Nick,

Please see my note to Art. I appeciate all the help and I've definitely
learned a thing or two on this one.

-Tony

NickHK said:
Tony,
To continue using your current, working solution, why not pre-process these
problem files by replacing the nulls with spaces.
If it is only those 6 bytes you need to change, using something with:
Open Yourfile For Binary as #freefile
Replace the "00"s with "20" '&H20=decimal 32, [SPACE]
Close #freefile

NickHK

T_o_n_y said:

Nick,

The presence of the null characters is important because I'm importing data
from thousands of files like these using an Excel VBA routine. The data
within these files is not well-organized by today's standards, but it can be
extracted using string functions such as MID(,,) as long as it is known where
particular fields begin and end. A series of FORTRAN programs has been used
for 30-40 years to manipulate these files, and now the company wants to
modernize.

Up until now, my Excel VBA routine has successfully extracted the data from
thousands of these files so that it can be placed into Excel format. Except
for certain files; the ones having this problem. Apparently a FORTRAN
programmer inadvertantly placed the null characters into certain of these
files while developing certain aspects of the programs.

If the null characters were spaces instead of null characters there would be
little problem since the MID(,,) function would still collect the following
fields in the string correctly. But the null characters being stripped away
means that everything past that point gets offset by 6 characters, causing
things like string conversion to doubles, for example, to cause the program
to crash.

If I can't find a way to keep Excel from stripping the nulls, I suppose I'll
work out an error trapping routine that tells the user to manually fix that
file. I just hope it doesn't amount to hundreds of files!

-Tony

Click to expand...

NickHK · Oct 16, 2006

The Unicode and "Latin small letter y with diaeresis" were red herring in
this.
Seems that The OP has his solution now anyway.

NickHK

Art H said:
Tony,

After investigating the contents of the sample data file, I conclude
that the file contains single byte characters--Unicode is not involved.
Further, the file does contain nulls as suggested by others. The first
few characters of the first line of the data file are:
$$158<s><s><nl><nl><nl><nl><nl><nl><s><s>1<s><s>8
where <s> represents the space character and <nl> represents the null
character. Of course, the decimal equivalent of the space character is
32 and the decimal equivalent of the null character is 0.

The decimal equivalent of a y with double dots above it (this letter is
called a "Latin small letter y with diaeresis" per Character Map) is
255 (hex FF). I don't understand why Word uses this character to
display nulls, but I guess, that's not relevant, but did lead to wrong
conclusions.

I agree with NickHK. Since you need the character positions maintained,
modify your macro to replace the null characters with spaces then place
the resulting string into your Excel worksheet.

Have fun!

Art

Tony,
To continue using your current, working solution, why not pre-process these
problem files by replacing the nulls with spaces.
If it is only those 6 bytes you need to change, using something with:
Open Yourfile For Binary as #freefile
Replace the "00"s with "20" '&H20=decimal 32, [SPACE]
Close #freefile

NickHK

T_o_n_y said:

Nick,

The presence of the null characters is important because I'm importing data
from thousands of files like these using an Excel VBA routine. The data
within these files is not well-organized by today's standards, but it

Click to expand...

can
be

extracted using string functions such as MID(,,) as long as it is

Click to expand...

known
where

particular fields begin and end. A series of FORTRAN programs has

Click to expand...

been
used

for 30-40 years to manipulate these files, and now the company wants to
modernize.

Up until now, my Excel VBA routine has successfully extracted the data from
thousands of these files so that it can be placed into Excel format. Except
for certain files; the ones having this problem. Apparently a FORTRAN
programmer inadvertantly placed the null characters into certain of these
files while developing certain aspects of the programs.

If the null characters were spaces instead of null characters there

Click to expand...

would
be

little problem since the MID(,,) function would still collect the following
fields in the string correctly. But the null characters being

Click to expand...

stripped
away

means that everything past that point gets offset by 6 characters, causing
things like string conversion to doubles, for example, to cause the program
to crash.

If I can't find a way to keep Excel from stripping the nulls, I

Click to expand...

suppose
I'll

work out an error trapping routine that tells the user to manually fix that
file. I just hope it doesn't amount to hundreds of files!

-Tony

:

Tony,
Probably, but what use are 6 x null characters ?
Do they actually mean anything to you ?
Or are they some garbage that your Fortran app outputs ?

NickHK

Nick,

What you've written makes sense to me, and that is why I've had trouble
concluding that it is a unicode file. As I wrote in an earlier

Click to expand...

post,
I'm

leaning toward it being a text file that has 6 Ascii(0) characters
inserted
into it.

The question remains: is there a way to have Excel import these characters
rather than completely ignoring them?

-Tony

:

Tony,
The "special characters" I see in your uploaded file are Asc(0).

Click to expand...

It
is

not a
Unicode file.

As for the "0" for every other cell, that is expected if you

Click to expand...

have
all

ANSI
text stored in a UNICODE format. The lower byte will always be 0

Click to expand...

as
no

values exceed decimal 255 or FF hex.

So do you have a Unicode file or not ?

NickHK

Tom,

I'm perplexed at your response because, I'm about as far from ignoring
your
posts as possible. Indeed, I generally skip directly to your posts
when
on
this newsgroup since they are more helpful than anyone's, containing
actual
sample code that can be used.

It's just that the output from the macro you sent me led me to the
conclusion that you were mistaken this time. The output shows

Click to expand...

"0"
for

every
other cell which is not what I would expect from UNICODE with

Click to expand...

2
bytes

per
character. Furthermore, rather than revealing the presence of the
special
characters, your macro also had them stripped away.

Here's what I mean. The file I've uploaded contains the

Click to expand...

following
in

the
first line,

$$158++yyyyyy++1++8++4.50 etc...

I've substituting + for spaces and y for the special

Click to expand...

characters
above.

As
you can see there are 2 spaces followed by 6 special characters
followed
by 2
spaces. The output from your macro completely omits the 6 special
characters, if I'm reading it correctly.

As I wrote, I spent "anoter few hours" researching into UNICODE in
order
to
investigate the possibility you raised...but nothing I found seemed to
confirm it. In addition, Excel has two different UNICODE

Click to expand...

types
(UTF-8

and
UTF-7) which one can select in the text import wizard. I

Click to expand...

tried
both

of
them
and neither gave me success in importing the special

Click to expand...

characters,
as

judged
by
using c pearson's CellView add-in, which allows character by character
visualization of cell contents.

Thank you again for your help,
-Tony

:

Guess it was a waste of time trying to explain it to you.

Click to expand...

Did
you

bother
to
read it?

--
Regards,
Tom Ogilvy

I tried your macro, but unfortunately Excel still did not import
the
special
characters. Recall that there are 6 special characters between
the
$$158
and
the 1 8 in the first line of the file:
$$158 1 8 4.50 1.0000 0.8000 3.0010
1.5740

For that section, the output from your macro looked like this:
36 = $
0 =
36 = $
0 =
49 = 1
0 =
53 = 5
0 =
56 = 8
0 =
32 =
0 =
32 =
0 =
32 =
0 =
32 =
0 =
49 = 1
0 =
32 =
0 =
32 =
0 =
56 = 8

In other words, the 6 characters got stripped away again

Click to expand...

so
that

all
you
see
are the 2 spaces which appear on either side of the 6 special
characters.

The only way I've found for Excel to even recognize that those
characters
exist is to use the "Delimited" option during text import and
specify
"spaces" as the delimiting character with the "Treat consecutive
delimiters
as one" feature unchecked. Unfortunately, that method of
importing
would
mean a huge rework of my existing code.

I spent another few hours trying to research the UNICODE
possibilty
you
mentioned, but still was unable to come up with anything.

At a loss...
-Tony

:

put this in a workbook. Change the path to point to your file:

Sub ReadStraightTextFile()
Dim strTest As String
Dim bytArray() As Byte
Dim intcount As Integer
Dim col As Long
Open "E:\Data1\W158.DAT" For Input As #1
col = 0
Do While Not EOF(1)
Line Input #1, strTest
col = col + 1
bytArray = strTest
i = 0
For intcount = LBound(bytArray) To UBound(bytArray)
i = i + 1
Cells(i, col) = bytArray(intcount) & " = " &
Chr(bytArray(intcount))
Next

Loop 'Close the file

Close #1
End Sub

Have blank sheet as the activesheet. Run the macro.

It appears to me that the file is UNICODE. unlike an

Click to expand...

ascii
file

that
has
one byte per character, a unicode file has two bytes per
character.

there are 8 bits to a byte, so an ascii file can have 8^2

Click to expand...

=
256

different/unique character codes. In a unicode file, 2 bytes is
16
bits,
so 2^16 = 65536 possible unique characters.

I didn't see any actual characters that couldn't be represented
by
Ascii,
so
you could read every Odd character .

It appears that opening it in Excel automatically

Click to expand...

converts it
to

Ascii,
so
you haven't lost any information, but if you want to edit

Click to expand...

it
and

write it
back out, you would need to save it as Unicode Text. I

Click to expand...

know
that

is
an
option in at least xl2000 and I assume later.

--
Regards,
Tom Ogilvy

Tom,

Thank you for your reply. I followed your procedure

Click to expand...

but
only

got
four
"32"
s in that blank section; that is, there are only 4 spaces
there.
This
confirms what I've suspected, namely, that Excel is

Click to expand...

simply
not

importing
those characters. I've also tried using C. Pearson's

Click to expand...

Cell
View

Add-in
with
the same result (http://www.cpearson.com/excel/CellView.htm).

As you point out, the characters also get stripped when

Click to expand...

I
cut

and
paste
into
this forum. Therefore, I've emailed you separately the file I
referred
to
as
an attachment (it's a text document called W158.DAT)

Click to expand...

sent
from

myother_acct.
If I knew how to post it to this forum, I would.

I appreaciate your help...this is a frustrating problem

Click to expand...

for
me.

Is
there
a
way to import the text file character by character?

-Tony

:

put your string in cell A1. Then in B1 or another

Click to expand...

cell in
the

first
row
put
in this formula

=CODE(MID($A$1,ROW(),1))
Assume the above formula is in B1
in C1:
=CHAR(B1)

now select B1:C1 and drag fill down until the formula starts
returning
#Value errors.

The only thing between the characters in your post are ascii
code
32
which
is a space.

Possibly they didn't get carried forward in the email.

--

Click to expand...

Click to expand...

CPearsons import text code	3	Mar 2, 2008
Import & Parsing a small 400 line CSV file	3	Mar 11, 2006
Run-time error '1004': The PivotTable field name is not valid....	1	Jul 2, 2008
Change delimiter in import function from comma to tab	1	Jun 13, 2006
Problems with Importing a File	1	Jul 27, 2006
How Do I Modify Pearson's Code to Add "Treat Consecutive Delimiters As One"?	4	Jun 12, 2006
Skipping files which exist as named cell values - 1004 application error	2	Mar 28, 2007
importing several text files into different excel worksheet	1	Sep 2, 2004

Importing Text files w/o losing special characters

NickHK

Guest

NickHK

Art H

Art H

Tom Ogilvy

Guest

Guest

Guest

NickHK

Ask a Question

Similar Threads