Unicode text and carriage returns

G

Greg Lovern

If I enter carriage returns in a cell (Alt + Enter), then save the
sheet from Excel as "Unicode Text" file type, then open the file, the
carriage returns are interpreted as delimiters.

In the Text Import Wizard, I've made sure that the only delimter is
Tab, which is the default when opening that file.

If I look at the file in Notepad, the Alt + Enter characters do not
look like tabs at all. I have other tabs in the file and they look
like tabs, but the Alt + Enter characters look different.

In Step 1 of Excel's Text Import Wizard, I've tried all the variations
I can think off -- UTF-7, UTF-8, 1252, US-ASCII.

In Step 2 of the Wizard, the text qualifier is """ (double-quote, the
default), and in Notepad I see that the whole cell contents, including
Alt+Enter carriage returns, are enclosed in doublequotes.

Any ideas? How can I save carriage returns in cells, save the sheet as
Unicode text, and import the text file without the carriage returns
being interpreted as delimiters?


Thanks,

Greg
 
G

Gary''s Student

You can cheat.

Before saving the file as Unicode Text, change all the embedded hard returns
( CHAR(10) ) into another character that you are not using (perhaps tilda).

When you re-import the file, the tildas will not be considered cell
delimiters. You can change them back into hard returns after the import.
 
G

Greg Lovern

If I enter carriage returns in a cell (Alt + Enter), then save the
sheet from Excel as "Unicode Text" file type, then open the file, the
carriage returns are interpreted as delimiters.

Correction -- they are interpreted as carriage returns, not
delimiters. So the next character goes to column A of the next row
down, not to the next cell to the right.

Still seems like an Excel bug, since the whole text in the cell is
enclosed in doublequotes when saving as Unicode Text.


For example, this is how it looks in the cell:

========================
First text row

Next text row
========================


This is how that part looks in Notepad after saving from Excel as
Unicode Text:

========================
"First text rowNext text row"
========================


Then when it is imported to Excel, the first part, "First text row",
is imported to the correct cell, then the next row of cells is blank,
then the "Next text row" part is in column A of the next row down.

Then, everything else in that row is in the wrong column, due to
starting the row over in column A.


Greg
 
G

Greg Lovern

You can cheat.

Before saving the file as Unicode Text, change all the embedded hard returns
( CHAR(10) ) into another character that you are not using (perhaps tilda).

When you re-import the file, the tildas will not be considered cell
delimiters.  You can change them back into hard returns after the import.


Thanks, but that would break other apps that read the files, and would
require other apps that write the files to make the same change.

For the time being, I've just removed the carriage returns from the
one cell that had them. But I expect that the users will want more of
them in the future, so I don't see that as a long-term solution.

Since the files have an expected number of columns for all rows, I
think what I might do is, after importing the file, check the last
column of data for any blanks, and assume the blank was caused by a
carriage return. Then move the data back to the correct positions
piece by piece, inserting the Chr(10) carriage return characters as
needed, until I get to a row where the expected last column is
occupied. (This would all be in VBA, which is already doing the
import.)

Another option would be to do the import with the Input Statement.
That would be safer but possibly slow with these very large files.


It would certainly be less painful to find a way to get it to import
correctly. Any ideas?



By the way, it turns out to be weirder than I thought. Excel's help on
the Text Import Wizard says that if a delimeter (Tab) is found within
two text qualifier (doublequote) characters, then the the text
qualifier characters are removed. But if no delimiter is found between
two text qualifier characters, then the text qualifier characters are
preserved:

===============================================
Text qualifier
Select the character that encloses values in your text file. When
Excel encounters the text qualifier character, all of the text that
follows that character and precedes the next occurrence of that
character is imported as one value, even if the text contains a
delimiter character. For example, if the delimiter is a comma (,) and
the text qualifier is a quotation mark ("), "Dallas, Texas" would be
imported into one cell as Dallas, Texas. If no qualifier or if the
apostrophe (') qualifier were specified, "Dallas, Texas" would be
imported into two adjacent cells as "Dallas and Texas".

If the delimiter character occurs between text qualifiers, Excel omits
the qualifiers in the imported value. If no delimiter character occurs
between text qualifiers, Excel includes the qualifier character in the
imported value. Hence, "Dallas Texas" (using the quotation mark text
qualifier) would be imported into one cell as "Dallas Texas".
===============================================

When Excel saves it out as Unicode Text, it encloses the cell contents
containing carriage returns in doublequote characters:

========================
"First text rowNext text row"
========================

But when Excel imports that text file, look what it does:

========================
(in the correct cell)
First text row

(in column A of the next row down)
Next text row"
========================

The doublequote is removed from the first part, as if it had found a
delimiter enclosed by doublequotes, while the doublequote is preserved
on the last part, as if it did NOT find a delimiter enclosed by
doublequotes. Weird.



Thanks,

Greg
 
G

Greg Lovern

The doublequote is removed from the first part, as if it had found a
delimiter enclosed by doublequotes, while the doublequote is preserved
on the last part, as if it did NOT find a delimiter enclosed by
doublequotes. Weird.

Sorry about so many posts, but I just had to add:

It's even more weird saving as CSV. When the CSV file is imported, all
text in each cell after the cell's first in-cell carriage returns is
just deleted.

For example, if I save this cell out as CSV:

========================
First text row


Next text row
========================


It imports like this:

========================
First text row
========================

If I look at the CSV in Notepad, it looks the same as when saved out
as Unicode Text:

========================
"First text rowNext text row"
========================

BTW, if I arrow across the characters in Notepad, it pauses between
the first "w" and the "N", apparently indicating the presence of a
character that has a width of zero -- the carriage return of course.


Also BTW - same problem with plain text (tab delim) as with Unicode
text).


Greg
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top