Duplicate Text Line Remover?

Bill Hinds

Anybody know of a utility that will remove duplicate lines in a sorted text
file? For instance, if I have...

abcd
abcd
abcd

...it would remove two lines.

Thanks.
 
rir3760

It was a dark and stormy night when "Bill Hinds"
Anybody know of a utility that will remove duplicate lines in a
sorted text file? For instance, if I have...

abcd
abcd
abcd

...it would remove two lines.

If you don't mind using command-line tools, you can use uniq:

<Quote>
Usage: uniq [OPTION]... [INPUT [OUTPUT]]

Discard all but one of successive identical lines from INPUT (or
standard input), writing to OUTPUT (or standard output).

Options:

-c -or- --count
Prefix lines by the number of occurrences.

-d -or- --repeated
Only print duplicate lines.

-D -or- --all-repeated[=delimit-method]
Print all duplicate lines.
delimit-method={none(default),prepend,separate}
Delimiting is done with blank lines.

-f -or- --skip-fields=N
Avoid comparing the first N fields.

-i -or- --ignore-case
Ignore differences in case when comparing.

-s -or- --skip-chars=N
Avoid comparing the first N characters.

-u -or- --unique
Only print unique lines.

-w -or- --check-chars=N
Compare no more than N characters in lines.

--help
Display this help and exit

--version
Output version information and exit

A field is a run of whitespace, then non-whitespace characters.
Fields are skipped before chars.

Report bugs to <[email protected]>.
</Quote>

Uniq is part of the 'UnxUtils' package:
<http://unxutils.sourceforge.net/>
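For the example in the question, a minimal sketch of how uniq could be tried out is shown here. The sample data and the variable names are made up for illustration; only the -c and -d options listed in the help text above are used.

```shell
# Hypothetical sample data: the three lines from the question plus one extra line.
input=$(printf 'abcd\nabcd\nabcd\nefgh\n')

# Keep one copy of each run of identical adjacent lines.
deduped=$(printf '%s\n' "$input" | uniq)
printf '%s\n' "$deduped"

# -c prefixes each line with the number of times it occurred.
printf '%s\n' "$input" | uniq -c

# -d prints only the lines that were repeated.
repeated=$(printf '%s\n' "$input" | uniq -d)
printf '%s\n' "$repeated"
```

Since uniq only compares adjacent lines, this relies on the file already being sorted, as it is in the question.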

Hope this helps
 
Bjorn Simonsen

Bill Hinds wrote in
Anybody know of a utility that will remove duplicate lines in a sorted text
file?

Another command-line option is sed; see Eric Pement's site:
<http://www.student.northpark.edu/pemente/sed/index.htm>

including SED FAQ:
<http://www.student.northpark.edu/pemente/sed/sedfaq.html>
<http://www.student.northpark.edu/pemente/sed/sedfaq.txt>

and "100 handy one-liners for sed"
<http://www.student.northpark.edu/pemente/sed/sed1line.txt>

From his very handy "handy one-liners" doc:
<quote>

# delete duplicate, consecutive lines from a file (emulates "uniq").
# First line in a set of duplicate lines is kept, rest are deleted.
sed '$!N; /^\(.*\)\n\1$/!P; D'

# delete duplicate, nonconsecutive lines from a file. Beware not to
# overflow the buffer size of the hold space, or else use GNU sed.
sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P'

</quote>
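A quick way to try both one-liners is sketched here. It assumes GNU sed, and the sample data and variable names are invented for the illustration; the sed commands themselves are copied unchanged from the quote above.

```shell
# Made-up sample data; GNU sed is assumed.
sorted=$(printf 'abcd\nabcd\nabcd\nefgh\n')
unsorted=$(printf 'b\na\nb\na\nc\n')

# Consecutive duplicates: keeps the first line of each run of identical lines.
consecutive=$(printf '%s\n' "$sorted" | sed '$!N; /^\(.*\)\n\1$/!P; D')
printf '%s\n' "$consecutive"

# Nonconsecutive duplicates: keeps the first occurrence of each line, in input order.
nonconsecutive=$(printf '%s\n' "$unsorted" | sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P')
printf '%s\n' "$nonconsecutive"
```

For a file that is already sorted, as in the original question, the first one-liner is enough; the second is only needed when the duplicates are scattered through the file.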

You might also want to investigate some of the alternative programs
he suggested under "related batch editing files" on the main page
<http://www.student.northpark.edu/pemente/sed/index.htm>


All the best,
Bjorn Simonsen
 
Gabriele Neukam

On that special day, Bill Hinds, ([email protected]) said...
Anybody know of a utility that will remove duplicate lines in a sorted text
file?

The free version of NoteTab should do that, according to the help file. Quote:

"Sort
Sorts the selected lines or the entire document in ascending or
descending alphanumerical order. Use the settings "Case Sensitive
Sorting" and "Sort Removes Duplicates" on the Tools tab in the Options
dialog box to control the result. Depending on the number of lines to
sort and the amount of available RAM, this procedure may take quite
long!"

The "Options" item is the last entry in the "View" menu.

www.notetab.com


Gabriele Neukam
