Unix text files

Peter Hucker · Jul 19, 2004

Trying to Make a unix text file for my webserver, it has to be in unix format (ie. only a linefeed and no carriage return). Am I right in thinking I should save using Notepad in "unicode"? How can I check the file is correct?

--
*****TWO BABY CONURES***** 15 parrots and increasing http://www.petersparrots.com
93 silly video clips http://www.insanevideoclips.com
1259 digital photos http://www.petersphotos.com
Served from a pentawatercooled dual silent Athlon 2.8 with terrabyte raid

Wedding rings: the world's smallest handcuffs.

Tim Slattery · Jul 20, 2004

Peter Hucker said:
Trying to Make a unix text file for my webserver, it has to
be in unix format (ie. only a linefeed and no carriage return).
Am I right in thinking I should save using Notepad in "unicode"?
How can I check the file is correct?

There are probably other ways to do this, but here's an easy one: go
to http://www.lancs.ac.uk/people/cpaap/pfe/ and download Programmer's
File Editor (PFE). The author has, unfortunately, stopped development
on it, but it will do exactly what you need.

Open it up, type your file (or create it whatever way you create it).
Look at the status bar at the bottom of the PFE window. One of the
little boxes says "DOS". That's telling you that the file is in
DOS/Windows format, which means that lines are terminated by a CR/LF
pair. To switch to Unix format, just double-click that box. It will
now say "Unix". Just save the file and you're done.
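
To check the result independently of any editor, the raw bytes can be counted directly. This is only a minimal sketch; the function names are made up for illustration and are not part of PFE or any other tool mentioned here.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, bare LFs, and bare CRs in raw file bytes."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        "lf": data.count(b"\n") - crlf,  # LFs that are not part of a CRLF pair
        "cr": data.count(b"\r") - crlf,  # CRs that are not part of a CRLF pair
    }


def is_unix_format(data: bytes) -> bool:
    """True when the data contains no carriage returns at all."""
    return b"\r" not in data
```

Read the file in binary mode (open(path, "rb")) before passing its contents in, otherwise Python may translate the line endings on the way and hide exactly what you are trying to see.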

Peter Hucker · Jul 20, 2004

Thanks - opening the text file I made in notepad and saved as unicode appears as "unix", but has weird characters. Likewise making the file again in PFE and opening in notepad produces weird characters! You'd think text would be simple! Why on earth does windows need TWO things to indicate a new line?

Rob Schneider · Jul 20, 2004

My understanding is that the program on the Unix box which presents the
file system to the XP box has settings which control how
linefeeds/carriage returns are handled. Look at this software. On
Linux boxes this is typically Samba. For my setup of access to shares
on a Linux box via Samba ... I just use Notepad/Wordpad from XP, and
Pico/Emacs/jEdit on the Linux side. Works without having to do anything
special at all.

Hope this is useful to you. Let us know.

rms

Tim Slattery · Jul 20, 2004

Peter Hucker said:
Thanks - opening the text file I made in notepad and saved as unicode
appears as "unix", but has weird characters. Likewise making the file
again in PFE and opening in notepad produces weird characters! You'd
think text would be simple! Why on earth does windows need TWO things
to indicate a new line?

Forget Unicode! "Unix file format" means an ASCII file that uses a
single byte as the line delimiter: LF (linefeed, 0x0A), not CR.
Opening a Notepad "Unicode" file in PFE would indeed generate garbage
characters, since that format (UTF-16) uses two bytes per character,
but PFE (and Unix) expects one byte per character. A file created in
PFE and saved in DOS format should be readable in Notepad. If you save
it in Unix format, Notepad will have trouble with the line
terminations and will show some trash characters.
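
The byte-level difference can be made visible with a short sketch. It assumes Notepad's "Unicode" option writes UTF-16 little-endian with a byte-order mark; describe_encoding is a hypothetical helper that looks only at that mark and is not a real encoding detector.

```python
import codecs


def describe_encoding(data: bytes) -> str:
    """Rough guess from the byte-order mark only."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 with BOM"
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"
    return "no BOM (plain ASCII/ANSI or BOM-less UTF-8)"


text = "hi\r\n"
ascii_bytes = text.encode("ascii")  # one byte per character
# Assumed Notepad "Unicode" layout: BOM first, then two bytes per character.
utf16_bytes = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

print(ascii_bytes)
print(utf16_bytes)
print(describe_encoding(utf16_bytes))
```

The extra zero bytes and the two-byte mark at the start are what a one-byte-per-character editor shows as weird characters.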

Why the difference in line termination codes? I don't know. I suppose
some early PC printers needed to be told to return the carriage to the
left edge and to scroll the paper up.

Peter Hucker · Jul 20, 2004

Tim Slattery said:
Forget Unicode!

Well, the "uni" bit, you see.

Tim Slattery said:
Why the difference in line termination codes? I don't know. I suppose
some early PC printers needed to be told to return the carriage to the
left edge and to scroll the paper up.

Yes, I'd heard that - wasn't Unix around then?

Juan I. Cahis · Jul 20, 2004

Dear Peter & friends:

I have two very small programs from the good old MS-DOS days, D2U.EXE
and U2D.EXE, less than 10k each, that will do the task happily
and without any problem.

I will try to attach them here, and to send them to you by email.

Peter Hucker said:
Trying to Make a unix text file for my webserver, it has to be in unix format (ie. only a linefeed and no carriage return). Am I right in thinking I should save using Notepad in "unicode"? How can I check the file is correct?

Thanks
Juan I. Cahis
Santiago de Chile (South America)
Note: Please forgive me for my bad English, I am trying to improve it!
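
For anyone without those programs, the conversion itself is small enough to sketch. This is not the source of D2U.EXE or U2D.EXE, just an illustration of what such converters are assumed to do. The function names are invented for the example.

```python
def dos_to_unix(data: bytes) -> bytes:
    """Replace every CR/LF pair with a single LF."""
    return data.replace(b"\r\n", b"\n")


def unix_to_dos(data: bytes) -> bytes:
    """Replace every line ending with CR/LF."""
    # Normalise first so existing CR/LF pairs are not turned into CR CR LF.
    return dos_to_unix(data).replace(b"\n", b"\r\n")


def convert_file(path: str, to_unix: bool = True) -> None:
    """Rewrite the file in place; binary mode avoids newline translation."""
    with open(path, "rb") as f:
        data = f.read()
    converted = dos_to_unix(data) if to_unix else unix_to_dos(data)
    with open(path, "wb") as f:
        f.write(converted)
```

This only makes sense for one-byte-per-character text. A file saved as "Unicode" in Notepad needs re-saving as ANSI first.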

Peter Hucker · Jul 20, 2004

Can't see it here, but I got the email thanks.

Tim Slattery · Jul 21, 2004

Peter Hucker said:
Yes I'd heard that - wasn't unix around then?

I think so. I think Unix appeared in the late sixties, and the first
8-bit PCs were in the mid to late 70s.

Peter Hucker · Jul 21, 2004

Tim Slattery said:
I think so. I think Unix appeared in the late sixties, and the first
8-bit PCs were in the mid to late 70s.

So how come Unix printers were ok?

David Candy · Jul 21, 2004

Unix really uses CR/LF too, in effect. To save a byte, only the LF is stored in the file and the CR is implied; there are terminal settings (such as stty's onlcr) that put it back on output, so the printer would still receive CR/LF. This is based on teletype machines, far superior to those newfangled fax things. Fax forced business communication from the printed word to the scribbled, washed out, and unreadable word that was also evidence of a legal contract (that would fade in two years).

Alexander Grigoriev · Jul 22, 2004

FTP it to the (Unix) server in ASCII mode. It will be converted to UNIX
format (LF-separated lines). FTP it back in BINARY mode - you'll get it
still as Unix format.

Peter Hucker · Jul 22, 2004

Actually it is a file served by my windows web server (this machine) - I needed it to be in unix format.

David Candy · Jul 22, 2004

Save it as Plain Text in Word. You'll be asked what line endings you want (recent versions). Or use Search (^p = paragraph mark) / Replace (^l = new line, i.e. LF). A paragraph mark is CR internally but is also an index into a database in Word, so it is different from a plain CR, similar to LF in Unix really being CR/LF.

Or write a script. Here's a generic one (though more complex than just a CR stripper - that would be easy to write)

Script this.

ReplaceRegExp Filename SearchString [Replace String]
Enclose stuff with quotes if it contains a space.

The file is attached too.

If searching, it pops up a dialog on each match. If replacing, it also pops up a dialog saying which file was changed. It supports RegExp (see end of post for syntax). I wrote this (without the replace part) to search for Unicode or ANSI text in binary files.

To do many files use the for command

for %A in (c:\*.*) do start /w ReplaceRegExp "%A" dog "cat lover"

' Search a file for a regular expression, or replace every match.
' Usage: ReplaceRegExp Filename SearchString [ReplaceString]
On Error Resume Next
Set ShellApp = CreateObject("Shell.Application")
ReportErrors "Creating Shell.App"
Set WshShell = WScript.CreateObject("WScript.Shell")
ReportErrors "Creating Wscript.Shell"
Set objArgs = WScript.Arguments
ReportErrors "Creating Wscript.Arg"
Set regEx = New RegExp
ReportErrors "Creating RegEx"
Set fso = CreateObject("Scripting.FileSystemObject")
ReportErrors "Creating FSO"

' Register the script under App Paths so it can be started by name alone.
WshShell.RegWrite "HKLM\Software\Microsoft\Windows\CurrentVersion\App Paths\" & Wscript.ScriptName & "\", Chr(34) & Wscript.ScriptFullName & Chr(34)
WshShell.RegWrite "HKLM\Software\Microsoft\Windows\CurrentVersion\App Paths\" & Left(Wscript.ScriptName, Len(Wscript.ScriptName)-3) & "exe" & "\", Chr(34) & Wscript.ScriptFullName & Chr(34)
ReportErrors "Updating App Paths"

If objArgs.Count = 0 Then
    MsgBox "No parameters", 16, "Serenity's Run Shortcut"
    ReportErrors "Help"
ElseIf objArgs.Count = 1 Then
    MsgBox "Only one parameter", 16, "Serenity's Run Shortcut"
    ReportErrors "Help"
ElseIf objArgs.Count = 2 Then
    ' Two arguments: search only.
    Set srcFile = fso.GetFile(objArgs(0))
    ReportErrors "srcFile"
    ' 1 = ForReading, 0 = open as ASCII
    If Err.Number = 0 Then Set TS = srcFile.OpenAsTextStream(1, 0)
    If Err.Number <> 0 Then
        MsgBox Err.Description & " " & srcFile.Path, 48, "Serenity's Search"
        Err.Clear
    Else
        ReportErrors "TS" & " " & srcFile.Path
        Src = TS.ReadAll
        If Err.Number = 62 Then
            ' Error 62 is "Input past end of file" (an empty file); skip it.
            Err.Clear
        Else
            ReportErrors "ReadTS" & " " & srcFile.Path
            regEx.Pattern = objArgs(1)
            regEx.IgnoreCase = True
            regEx.Global = True
            If regEx.Test(Src) = True Then
                MsgBox "Found in " & srcFile.Path, 64, "Serenity's Search"
            End If
        End If
    End If
    ReportErrors "Check OK" & " " & srcFile.Path

ElseIf objArgs.Count = 3 Then
    ' Three arguments: search and replace.
    Set srcFile = fso.GetFile(objArgs(0))
    ReportErrors "srcFile"
    If Err.Number = 0 Then Set TS = srcFile.OpenAsTextStream(1, 0)
    If Err.Number <> 0 Then
        MsgBox Err.Description & " " & srcFile.Path, 48, "Serenity's Search"
        Err.Clear
    Else
        ReportErrors "TS" & " " & srcFile.Path
        Src = TS.ReadAll
        If Err.Number = 62 Then
            Err.Clear
        Else
            ReportErrors "ReadTS" & " " & srcFile.Path
            regEx.Pattern = objArgs(1)
            regEx.IgnoreCase = True
            regEx.Global = True
            NewSrc = regEx.Replace(Src, objArgs(2))
            If NewSrc <> Src Then
                MsgBox "Replacement made in " & srcFile.Path, 64, "Serenity's Search"
                TS.Close
                ' 2 = ForWriting: reopen the same file and overwrite it.
                Set TS = srcFile.OpenAsTextStream(2, 0)
                TS.Write NewSrc
                ReportErrors "Writing file"
                TS.Close
            End If
        End If
    End If
    ReportErrors "Check OK" & " " & srcFile.Path

Else
    MsgBox "Too many parameters", 16, "Serenity's Run Shortcut"
    ReportErrors "Help"

    ReportErrors "All Others"
End If

' Show a dialog if the previous statement raised an error, then reset Err.
Sub ReportErrors(strModuleName)
    If Err.Number <> 0 Then MsgBox "An unexpected error occurred. This dialog provides details on the error." & vbCRLF & vbCRLF & "Error Details " & vbCRLF & vbCRLF & "Script Name" & vbTab & Wscript.ScriptFullName & vbCRLF & "Module" & vbTab & vbTab & strModuleName & vbCRLF & "Error Number" & vbTab & Err.Number & vbCRLF & "Description" & vbTab & Err.Description, vbCritical + vbOKOnly, "Something unexpected"
    Err.Clear
End Sub

=============================
Settings
Special characters and sequences are used in writing patterns for regular expressions. The following table describes and gives an example of the characters and sequences that can be used.

Character Description
\ Marks the next character as either a special character or a literal. For example, "n" matches the character "n". "\n" matches a newline character. The sequence "\\" matches "\" and "\(" matches "(".
^ Matches the beginning of input.
$ Matches the end of input.
* Matches the preceding character zero or more times. For example, "zo*" matches either "z" or "zoo".
+ Matches the preceding character one or more times. For example, "zo+" matches "zoo" but not "z".
? Matches the preceding character zero or one time. For example, "a?ve?" matches the "ve" in "never".
. Matches any single character except a newline character.
(pattern) Matches pattern and remembers the match. The matched substring can be retrieved from the resulting Matches collection, using Item [0]...[n]. To match parentheses characters ( ), use "\(" or "\)".
x|y Matches either x or y. For example, "z|wood" matches "z" or "wood". "(z|w)oo" matches "zoo" or the "woo" in "wood".
{n} n is a nonnegative integer. Matches exactly n times. For example, "o{2}" does not match the "o" in "Bob," but matches the first two o's in "foooood".
{n,} n is a nonnegative integer. Matches at least n times. For example, "o{2,}" does not match the "o" in "Bob" and matches all the o's in "foooood." "o{1,}" is equivalent to "o+". "o{0,}" is equivalent to "o*".
{n,m} m and n are nonnegative integers. Matches at least n and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood." "o{0,1}" is equivalent to "o?".
[xyz] A character set. Matches any one of the enclosed characters. For example, "[abc]" matches the "a" in "plain".
[^xyz] A negative character set. Matches any character not enclosed. For example, "[^abc]" matches the "p" in "plain".
[a-z] A range of characters. Matches any character in the specified range. For example, "[a-z]" matches any lowercase alphabetic character in the range "a" through "z".
[^m-z] A negative range of characters. Matches any character not in the specified range. For example, "[^m-z]" matches any character not in the range "m" through "z".
\b Matches a word boundary, that is, the position between a word and a space. For example, "er\b" matches the "er" in "never" but not the "er" in "verb".
\B Matches a non-word boundary. "ea*r\B" matches the "ear" in "never early".
\d Matches a digit character. Equivalent to [0-9].
\D Matches a non-digit character. Equivalent to [^0-9].
\f Matches a form-feed character.
\n Matches a newline character.
\r Matches a carriage return character.
\s Matches any white space including space, tab, form-feed, etc. Equivalent to "[ \f\n\r\t\v]".
\S Matches any nonwhite space character. Equivalent to "[^ \f\n\r\t\v]".
\t Matches a tab character.
\v Matches a vertical tab character.
\w Matches any word character including underscore. Equivalent to "[A-Za-z0-9_]".
\W Matches any non-word character. Equivalent to "[^A-Za-z0-9_]".
\num Matches num, where num is a positive integer. A reference back to remembered matches. For example, "(.)\1" matches two consecutive identical characters.
\n Matches n, where n is an octal escape value. Octal escape values must be 1, 2, or 3 digits long. For example, "\11" and "\011" both match a tab character. "\0011" is the equivalent of "\001" & "1". Octal escape values must not exceed 256. If they do, only the first two digits comprise the expression. Allows ASCII codes to be used in regular expressions.
\xn Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, "\x41" matches "A". "\x041" is equivalent to "\x04" & "1". Allows ASCII codes to be used in regular expressions.
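
A few of these escapes can be tried out quickly. The sketch below uses Python's re module, which is a different engine from the VBScript RegExp object the table describes; the three patterns shown are ones that behave the same way in both, and the sample strings are taken from the table or made up for illustration.

```python
import re

# "\r\n" -> "\n": drop the carriage return from every CR/LF pair,
# which is the conversion this thread is about.
unix_text = re.sub(r"\r\n", "\n", "one\r\ntwo\r\n")

# "er\b" matches "er" only where a word ends.
word_end_matches = re.findall(r"er\b", "never verb")

# "(.)\1" matches two consecutive identical characters.
doubled = re.search(r"(.)\1", "foooood").group(0)

print(repr(unix_text), word_end_matches, doubled)
```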

--
----------------------------------------------------------
'Not happy John! Defending our democracy',
http://www.smh.com.au/articles/2004/06/29/1088392635123.html

Peter Hucker said:
Actually it is a file served by my windows web server (this machine) - I needed it to be in unix format.
