find and remove duplicate files

destatnj2

Hi All,

I would like to do a case-insensitive search in a directory and all its
subdirectories for duplicate files, and if any are found, delete them. I
have written this:

@echo off
setlocal

for /f "tokens=*" %%A in ('dir /b /s /a:-d') do call :checkForDups "%%A" "%%~nA" "%%~pA"
goto :eof

:::::::::::::::::::::::::::::::
:checkForDups

echo.
echo Search string: %2, file is: %1

for /f "tokens=*" %%B in ('dir /b /s /a:-d') do (
if /i "%%~nB" EQU %2 (
if "%%~pB" NEQ %3 (
echo Match found: %%B
del /q "%%B"
)
)
)

As you can see, this does NOT work as desired... The current file will
eventually be deleted because the DIR command in the original FOR loop isn't
updated after duplicates are deleted. Can someone clue me in?

Thanks,
Pete
 
destatnj2 said:
Hi All,

I would like to do a case-insensitive search in a directory and all its
subdirectories for duplicate files, and if any are found, delete them. I
have written this:

I see several disadvantages with your approach:
- It is unnecessarily time-consuming to iterate over the whole tree for every
file. If there were no better approach, checkForDups could at least pass
the name _and_ extension to dir /s, restricting the search to name dupes.
- %%~nA compares only the name part. What about file content and date/time?
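
To illustrate the first point: a single pass over the tree can group files by lower-cased name, so nothing has to be re-scanned per file. The following is a minimal Python sketch, not the batch from this thread; `name_duplicates` and its `include_extension` parameter are hypothetical names made up for the illustration. It only reports candidates and deletes nothing.

```python
import os
from collections import defaultdict


def name_duplicates(root, include_extension=True):
    """Group files under root by case-insensitive name in a single pass."""
    by_name = defaultdict(list)
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            # Compare "name.ext" by default, or only the name part on request.
            key = name if include_extension else os.path.splitext(name)[0]
            by_name[key.lower()].append(os.path.join(folder, name))
    # Keep only names that occur more than once.
    return {key: sorted(paths) for key, paths in by_name.items() if len(paths) > 1}
```

Because the whole listing is collected before anything is decided, the "stale DIR output" problem from the original script does not arise here.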

Today I stumbled over a batch file from Herbert Kleebauer, dupli.bat, which
builds a file of md5 keys and sorts it, thereby finding content dupes.
The idea of a list file could be extended to include an ISO date and time
and the name part, sorted by different fields to decide which files are the
dupes, and eventually to create links instead of simply deleting the dupes.

http://groups.google.com/[email protected]
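
The md5-key idea can be sketched in a few lines of Python. This is only an illustration of the technique (hash every file, group by digest), not a port of dupli.bat; `file_md5` and `content_duplicates` are hypothetical helper names, and the sketch assumes every file under the root is readable.

```python
import hashlib
import os
from collections import defaultdict


def file_md5(path, chunk_size=65536):
    """Return the hex md5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def content_duplicates(root):
    """Map each md5 digest to the files sharing it (only if more than one)."""
    by_digest = defaultdict(list)
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            by_digest[file_md5(path)].append(path)
    return {d: sorted(paths) for d, paths in by_digest.items() if len(paths) > 1}
```

The result is a list of candidates per digest; deciding which copy to keep (by date, name or location) is a separate step.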

HTH
 
destatnj2 said:
I would like to do a case-insensitive search in a directory and all its
subdirectories for duplicate files, and if any are found, delete them.

This really isn't a good idea. In an NT-based system, some files are
meant to be duplicates and deleting them will eventually cause problems.
As a trivial example, there are 40 files named readme.txt on my system.
They're most likely all different, and while deleting all but one of them
probably* wouldn't cause a real problem, it would be a loss of possibly
valuable information. Once you've acquired a list of possible duplicates,
you really need to evaluate each case to determine what should be deleted.


* I did once encounter a program that would not run if its readme file was
missing.
 
Matthias said:
I see several disadvantages with your approach:
- It is unnecessarily time-consuming to iterate over the whole tree for every
file. If there were no better approach, checkForDups could at least pass
the name _and_ extension to dir /s, restricting the search to name dupes.
- %%~nA compares only the name part. What about file content and date/time?

Today I stumbled over a batch file from Herbert Kleebauer, dupli.bat, which
builds a file of md5 keys and sorts it, thereby finding content dupes.
The idea of a list file could be extended to include an ISO date and time
and the name part, sorted by different fields to decide which files are the
dupes, and eventually to create links instead of simply deleting the dupes.

http://groups.google.com/[email protected]

This is an old version with a bug in md5.com (it will not find all
duplicate files). Here is the new version (for Win9x and W2k):

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

@echo off
:: usage: dupli.bat searchdir
:: e.g. : dupli.bat c:\*.jpg
:: search for duplicate files in searchdir and all subdirs
:: check _5.bat for duplicate files
:: execute _5.bat to delete all duplicate files

echo Bj@jzh`0X-`/PPPPPPa(DE(DM(DO(Dh(Ls(Lu(LX(LeZRR]EEEUYRX2Dx=>ech.com
echo 0DxFP,0Xx.t0P,=XtGsB4o@$?PIyU WwX0GwUY Wv;ovBX2Gv0ExGIuht6>>ech.com
echo ?@xAyJHmH@=a?}VjuN?_LEkS?`w`s_{OCIvJDGEHtc{OCIKGMgELCI?GGg>>ech.com
echo EL?s?WL`LRBcx=k_K?AxVD?fCo?Cd?BLDs0x>>ech.com

echo Bj@jzh`0X-`/PPPPPPa(DE(DM(DO(Dh(Ls(Lu(LX(LeZRR]EEEUYRX2Dx=>md5.com
echo 0DxFP,0Xx.t0P,=XtGsB4o@$?PIyU WwX0GwUY Wv;ovBX2Gv0ExGIuht6>>md5.com
echo lpepy~saxCEbp??@`LZN=fqDt??PBeHcg?Dvj?gpWecfD@epd@e=E}BwE@>>md5.com
echo Co?CEbCEEL}@?wqDDOw=uBIq?C?BgF}?pkjCDrHeffC}mK?sgBBa}ACE~M>>md5.com
echo ?OxA?CIqgCgrI?V_cECCNgNKma?{y?sexCEbr?__`L@KoCBBux{OfCA?FD>>md5.com
echo GQ`eBfeBaOCubIe`eHbeBceCyQBi`A}beh`eCceBmOCAbMa}eH??B?Zpep>>md5.com
echo Nz?MfYvPEJeXHeCcefAVGUbCOve=befDeKEPjOH{z?FxYv1CJefdeKEPfA>>md5.com
echo H]UbePveCsefOVGEbCOHep?gjbvnzRefPSCEJErefDfGEPErePQjKKJefd>>md5.com
echo eKEPfOv1EJePefCSGErCBefDefESzuNg?Bu?ef?S=CB_?CEBef?S\CB_?B>>md5.com
echo uAIf?cBuJEe??Of?R1EBePefCSfEHQEHeXBeOceCmRU}@M??B?eCrPOEBM>>md5.com
echo ?B}nN?zF~HAuu`eh@eCCBaE?E@e@eBqOCy`IE@e@eBeOUE@?`eBceC}QiE>>md5.com
echo @?@eBCBYE?A`ex@eCCBmE?E@e@BB]oLjHy[}n~uWyYOqS@Db@OjHfh}nL~>>md5.com
echo Wy[jqSuDb@O??fDDBA@?EDC?HGF?KJI?NML?JE@?ID??HCN?GBM?FAL?GD>>md5.com
echo K?@MJ?IFC?B?L?KHE??AN?DMF?IBK?NG@?CLE?HAJ?PKF?KFU?FUP?UPK?>>md5.com
echo PKF?HDU?DSM?SMH?MHD?HDS?CSM?VOJ?OJC?JCV?CVO?VOJ?NIE?IET?ET>>md5.com
echo N?TNI?NIE?cwTcUVi\gFv}_oZFMmc{n@=mt{NsFEikERFP@goG=ETu?WWj>>md5.com
echo vnhxpJCh~~Z=[V}]PaHARjOe=WpxxBMUG`eAaHsUu]do?r?XYP?Vie]`hu>>md5.com
echo FznO\@SRUF@AC`W`ezRzG~Lef~U``rBvFoTLFqSlsNDDYDhbhnnbwzAX{N>>md5.com
echo IfndLiK`yxApu@~zaFpHl\`XdwKoiC=vhc}iJ]N1zJ_d{oufE}~ygZ}H`f>>md5.com
echo yboDiJDSnNCG\GSOx{XdXnweZ~1a{HkUddaCCFVshkBi~RSbfa_xjaB{R}>>md5.com
echo dZXTKKQMs=NuP~n~DC\hg}Nde_n=S}kKb@B`GP`A}AMXtvRL=yqbVQz}RP>>md5.com
echo iwRjE}0x>>md5.com

echo Bj@jzh`0X-`/PPPPPPa(DE(DM(DO(Dh(Ls(Lu(LX(LeZRR]EEEUYRX2Dx=>edl.com
echo 0DxFP,0Xx.t0P,=XtGsB4o@$?PIyU WwX0GwUY Wv;ovBX2Gv0ExGIuht6>>edl.com
echo ?@}I{uNWEF~NPCkaEFAKLCmaIj@KguHaEFCKYCmavh@{HM?cCiuGGwHmYz>>edl.com
echo CgisCGH`LbuuGNO@hRgco{W?dOGg@N?]gBgoG}G?X_SgONks?GN`LBgDu}>>edl.com
echo G?I_DgGNoG?w@jgLiuuroD@?FHoGpBBDcB?1?pIoCRaICSbICn}ExvHmE?>>edl.com
echo coF?DO~yanxCqap?@?lpZrH~sa`LyNHKqDGwQVTNG`CiECICtdL{D?{esL>>edl.com
echo ysICu_{OuD@sCREGHt~F@lgNHYq`EE{S~{Hq_gC{Lr@CE{HQ}@ExuCNQmB>>edl.com
echo BwjFCs?osqs?}n`LKLj?o{}HwJvClpCSEGt~~1}HGGHCSaCU}GiuJaxLCS>>edl.com
echo c}BWuNC_FE{sCkEGFAPqCmEGNAcQNJwLECuQsa{Oe~CK~CkqCmeGmEFbCN>>edl.com
echo C?kEFbBaCGH1jnjBrz?JAcqo~O~?lJgvxs~CspajF{oFEBHijnjBrz?JAc>>edl.com
echo vx~O~?QJLqos~CspFjN{xFEByijnj@ComJcIpCSAijZNUmJaujC{U]JaJB>>edl.com
echo CcClmCJ\jbCS]GFrj~CkEGjBSookVBA_@NJBHmClnEj1JYjxCoIBrh{BFC>>edl.com
echo HtdCWECaBsCC@ZgB@WgB}fj~BsMV@NgB~chvsb{Os{AR{msDUsycsk{SK{>>edl.com
echo VQ{ZsH\sQdsq{Sj{cAICNWl{~B1CNW_K~BxVkSfCA?Cb@N}W@{=sIfjBH}>>edl.com
echo G}N}NK}NNguM@[umCqBJqD@mzDCCClmCJFuhClmC{@jJSN?`CWEG{Cs@Pt>>edl.com
echo cc?AyAFZp{CkEGjBEpEFDNCCkq=jBktx{S[zDgsjCKtl{S]zDgjjCKtc{S>>edl.com
echo _zDgajCKtG{SazDgXjCKtL{SczDgOjCKtR{SezDgFjCKtX{SYzMgFICG?K>>edl.com
echo gF@FIE?EgF}ZhziEuRN~CK}~DqgLoqo?t_ogIKEh?{JU=fCguGiuz_FrCC>>edl.com
echo sCyOjEEsjwr~EvPK~CSqCt~FS}Ha}HCGxCUqERNG]CRQa_BfsCoaoy?h@x>>edl.com
echo CGJH?w``LRaDBBobc?q?a_q?C_0x>>edl.com

ech >_2.bat if (%%a%%)==(%%1) if not (%%b%%)==() echo.$3e$3e_5.bat$0d$0a
ech >>_2.bat if (%%a%%)==(%%1) if not (%%b%%)==() echo :: %%b%% $3e$3e_5.bat$0d$0a
echo >>_2.bat set b=
ech >>_2.bat if (%%a%%)==(%%1) echo del %%2 $3e$3e_5.bat$0d$0a
echo >>_2.bat if not (%%a%%)==(%%1) set b=%%2
echo >>_2.bat if not (%%a%%)==(%%1) set a=%%1

if exist _.2 del _.2
echo generating file list
dir /l/b/s/a-d %1 >_.1
edl "" "ech $240d$22@#0$22 $0d$0amd5 <$22@#0$22$3e$3e_.2$0d$0aecho $22@#0$22$3e$3e_.2"<_.1>_3.bat
echo generating checksum
call _3.bat
echo.
echo sorting
sort <_.2 | edl "" "call _2.bat @#0">_4.bat
set a=
set b=
if exist _5.bat del _5.bat
echo find duplicate files
call _4.bat
echo check _5.bat for duplicate files in %1
for %%i in (ech edl md5) do del %%i.com
for %%i in (1 2) do if exist _.%%i del _.%%i
for %%i in (1 2 3 4) do if exist _%%i.bat del _%%i.bat

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:: usage: edl "string1" "string2" <infile >outfile
:: replaces any non-empty line in infile by string2
:: (a line is non-empty if it contains at least one
:: character greater than 0x20) and writes it to outfile.
::
:: Any character in string1 separates words
::
:: string2 can contain:
::   $00-$ff       : hex bytes
::   $:abcd        : input line [ab:cd] ab,cd hex values
::   $#0           : complete input line
::   $#n (n=1..8)  : nth word in input line
::   $#9           : 9th word (or last word if there are
::                   more than 9 words) in input line
::   $l            : line till first separator char
::   $L            : line till last separator char
::   $r            : line after first separator char
::   $R            : line after last separator char
::   $+            : increment number before $+
::   $-            : decrement number before $-
::   $tY           : year (upper 2 digits)
::   $ty           : year (lower 2 digits)
::   $tm           : month
::   $td           : day
::   $tH           : hour
::   $tM           : minute
::   $tS           : second
::
:: instead of $ you can also use @, then any % is doubled
 
Herbert,

This is very nice. Could you please explain what is going on in the file?
What are those strings of characters? Are they random or intentional? What
are ech.com, md5.com and edl.com?

Thanks,
Pete

 
destatnj2 said:
This is very nice. Could you please explain what is going on in the file?
What are those strings of characters? Are they random or intentional? What
are ech.com, md5.com and edl.com?

ech.com, md5.com and edl.com are DOS .com programs which are included
in the batch file and written to disk at run time using
echo commands. ech.com is similar to the echo command, but doesn't
append a <CR><LF> (in Win2k you can use set /p instead) and allows
binary characters to be included by their hex value (e.g. $0d$0a for
<CR><LF>). md5.com calculates the md5 checksum of the data read
from stdin and writes it to stdout. edl.com is a small
list processing utility which provides some of the "for /f"
functionality in Win9x as well.
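
To make the hex notation concrete, here is a small Python sketch of how such `$hh` escapes could be expanded. It is an assumption-laden illustration, not a description of ech.com's internals: it handles only the two-hex-digit form and leaves every other character untouched; `expand_hex_escapes` is a hypothetical name.

```python
import re

# A dollar sign followed by exactly two hex digits, e.g. $0d or $3e.
_HEX_ESCAPE = re.compile(rb"\$([0-9A-Fa-f]{2})")


def expand_hex_escapes(text):
    """Replace each $hh escape with the byte it names; other text is kept."""
    raw = text.encode("latin-1")
    return _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)


print(expand_hex_escapes("echo.$3e$3e_5.bat$0d$0a"))
print(expand_hex_escapes("$240d"))
```

Under these assumptions the first call yields the bytes of `echo.>>_5.bat` followed by <CR><LF>, and `$240d` yields a literal `$0d` (because `$24` is the dollar sign itself), which is consistent with the `ech $0d"..."` lines that show up in _3.bat.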


The ech and echo lines that write to _2.bat generate that file with the following content:

if (%a%)==(%1) if not (%b%)==() echo.>>_5.bat
if (%a%)==(%1) if not (%b%)==() echo :: %b% >>_5.bat
set b=
if (%a%)==(%1) echo del %2 >>_5.bat
if not (%a%)==(%1) set b=%2
if not (%a%)==(%1) set a=%1


_2.bat is later called with parameters like this:

call _2.bat 795a332ab6bdb91ad275f2aa8f675b8b "c:\klee\tmp1\_3.bat"
call _2.bat 933222b19ff3e7ea5f65517ea1f7d57e "c:\klee\tmp1\a\a"
call _2.bat 933222b19ff3e7ea5f65517ea1f7d57e "c:\klee\tmp1\a\b"
call _2.bat 933222b19ff3e7ea5f65517ea1f7d57e "c:\klee\tmp1\x"
call _2.bat a7908db6b3ea4c54028fe077c6e3388a "c:\klee\tmp1\edl.com"

If the checksum (the first parameter %1) of successive calls
is the same, it appends the file name (the second parameter %2)
to the file _5.bat. The first file name of a group of identical files
(same md5 checksum) is prefixed with a ":: ", the others are prefixed
with a "del ".

_5.bat:
:: "c:\klee\tmp1\a\a"
del "c:\klee\tmp1\a\b"
del "c:\klee\tmp1\x"
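
The same "compare with the previous checksum" logic, written out as a Python sketch for readers who find the batch hard to follow. It mirrors the behaviour described above on the example data from this post; `plan_deletions` is a hypothetical name, and the sketch returns the lines instead of writing _5.bat.

```python
def plan_deletions(entries):
    """Turn (checksum, path) pairs into ':: keep' / 'del dupe' lines."""
    lines = []
    previous_sum = None
    pending_first = None  # first file of the current run, not yet announced
    for checksum, path in sorted(entries):
        if checksum == previous_sum:
            if pending_first is not None:
                # First duplicate of this run: emit a blank line and the kept file.
                lines.append("")
                lines.append(f":: {pending_first}")
                pending_first = None
            lines.append(f"del {path}")
        else:
            previous_sum = checksum
            pending_first = path
    return lines


example = [
    ("933222b19ff3e7ea5f65517ea1f7d57e", r'"c:\klee\tmp1\x"'),
    ("795a332ab6bdb91ad275f2aa8f675b8b", r'"c:\klee\tmp1\_3.bat"'),
    ("a7908db6b3ea4c54028fe077c6e3388a", r'"c:\klee\tmp1\edl.com"'),
    ("933222b19ff3e7ea5f65517ea1f7d57e", r'"c:\klee\tmp1\a\a"'),
    ("933222b19ff3e7ea5f65517ea1f7d57e", r'"c:\klee\tmp1\a\b"'),
]
print("\n".join(plan_deletions(example)))
```

Which copy ends up as the ":: " line depends purely on sort order, so the generated list should still be reviewed before anything is deleted.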



The dir line (dir /l/b/s/a-d %1 >_.1) generates the list of files which
are tested for the same contents.

_.1:
c:\klee\tmp1\x
c:\klee\tmp1\_3.bat
c:\klee\tmp1\edl.com
c:\klee\tmp1\a\a
c:\klee\tmp1\a\b


The edl command that writes _3.bat (one long line, watch for line wraps in
your news reader) generates three lines in _3.bat for each file name in _.1.
The first line echoes the file name to the screen, the second calculates the
md5 checksum of the file content and appends it to _.2 (without a <CR><LF>),
and the third line appends the file name after the checksum.

_3.bat:
ech $0d"c:\klee\tmp1\x"
md5 <"c:\klee\tmp1\x">>_.2
echo "c:\klee\tmp1\x">>_.2
ech $0d"c:\klee\tmp1\_3.bat"
md5 <"c:\klee\tmp1\_3.bat">>_.2
echo "c:\klee\tmp1\_3.bat">>_.2
ech $0d"c:\klee\tmp1\edl.com"
md5 <"c:\klee\tmp1\edl.com">>_.2
echo "c:\klee\tmp1\edl.com">>_.2
ech $0d"c:\klee\tmp1\a\a"
md5 <"c:\klee\tmp1\a\a">>_.2
echo "c:\klee\tmp1\a\a">>_.2
ech $0d"c:\klee\tmp1\a\b"
md5 <"c:\klee\tmp1\a\b">>_.2
echo "c:\klee\tmp1\a\b">>_.2

When _3.bat is executed, it generates _.2:

_.2:
933222b19ff3e7ea5f65517ea1f7d57e "c:\klee\tmp1\x"
795a332ab6bdb91ad275f2aa8f675b8b "c:\klee\tmp1\_3.bat"
a7908db6b3ea4c54028fe077c6e3388a "c:\klee\tmp1\edl.com"
933222b19ff3e7ea5f65517ea1f7d57e "c:\klee\tmp1\a\a"
933222b19ff3e7ea5f65517ea1f7d57e "c:\klee\tmp1\a\b"


_.2 is sorted (so identical files are listed consecutively) and each
line is prefixed with "call _2.bat ".

_4.bat:
call _2.bat 795a332ab6bdb91ad275f2aa8f675b8b "c:\klee\tmp1\_3.bat"
call _2.bat 933222b19ff3e7ea5f65517ea1f7d57e "c:\klee\tmp1\a\a"
call _2.bat 933222b19ff3e7ea5f65517ea1f7d57e "c:\klee\tmp1\a\b"
call _2.bat 933222b19ff3e7ea5f65517ea1f7d57e "c:\klee\tmp1\x"
call _2.bat a7908db6b3ea4c54028fe077c6e3388a "c:\klee\tmp1\edl.com"


The call _4.bat line executes _2.bat for each file. All duplicate files are
written to _5.bat, the first prefixed with a ":: ", the others
prefixed with a "del " (explained above).

The for loops at the end clean up the temporary files.


 
Check out the free disk cataloger "Cathy" from a
programmer in Slovakia. All he asks for in payment is a
postcard to his daughter for her collection. It is a
very simple, single-executable GUI program that catalogs
floppies, hard drives and CDs. It lets you see all of
the files on a disk even if the disk is no longer
connected to the system. I use its very good optional
ability to display duplicates to find them and compare the
size/date differences before deleting.
http://rvas.webzdarma.cz or search for the words homepage Vasicek.
-----Original Message-----
Hi All,

I would like to do a case-insensitive search in a directory and all its
subdirectories for duplicate files, and if any are found, delete them. I
have written this:
[snip]

There is a very nice GUI program for this: DoubleKiller by Jan Schlüter
of "Big Bang enterprises" (sic)
<http://www.bigbangenterprises.de/en/doublekiller/>. Works for me.
 
You may also want to try "Duplicate Files Deleter". You can manage any duplicate files without worries and it's safe to use. It's the best way to manage unnecessary duplicate files.
 