[ANN] -- Program to remove duplicate lines from large files

Dewey Edwards

Hi,

About two months ago JohnF asked if anyone knew of a program to remove
duplicate lines from 100K+ line text files. I shot my mouth off and said
it would be a piece of cake to write.

Maybe it WASN'T that easy, but it's now done.

home.nycap.rr.com/dewed/rm_dup_lines_v0.1.zip

about 24K

Program is freeware, actually greenware too: no install, and no
registry entries are added by the program. The current version has to
be run in a DOS box; however, there is no 640K limitation. The
program has run successfully on a 20+ MB text file. Run time was
under 5 seconds.

The program neither sorts the input file, nor stores it internally.
It does, however, save line "differences" rather than the lines
themselves in a TRIE data structure. See the included README file for
more information.
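
For anyone curious about the general idea, here is a rough Python sketch of duplicate removal with a trie. It is NOT the program's actual code (the README is the authority on that), and the name remove_duplicate_lines is made up for illustration. It assumes a plain character trie: lines that share a prefix share nodes, so only the part where a line differs from earlier lines adds new storage. Nothing is sorted, and first-seen order is kept.

```python
# Hypothetical sketch, not the rm_dup_lines implementation.
_END = None  # marker key; cannot clash with a character, since characters are str


def remove_duplicate_lines(lines):
    """Yield each distinct line once, in the order first seen."""
    root = {}  # trie node: maps a character to a child node
    for line in lines:
        # Compare without the line terminator, so a final line that
        # lacks a newline still matches an earlier copy of itself.
        key = line.rstrip("\r\n")
        node = root
        for ch in key:
            # Existing prefixes are reused; only new suffixes add nodes.
            node = node.setdefault(ch, {})
        if _END not in node:
            node[_END] = True  # mark that a full line ends here
            yield line
```

Since an open text file iterates line by line, something like sys.stdout.writelines(remove_duplicate_lines(open(path))) should filter a file, with path being whatever file you want to clean. A dict-per-node trie in Python is memory hungry, so treat this as a way to see the idea rather than something tuned for 20+ MB inputs.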

Hope it helps somebody,

dewey
 
