finding duplicated files

  • Thread starter: Jean Pierre Daviau


It's a minefield to find something command line driven and free - I was
looking recently and I ended up writing a batch script that relies
on sed and Fsum. (See below)

OTOH this tool is free and does the job swiftly but is GUI driven:

http://www.EasyDuplicateFinder.com


Here's my script - it writes a temporary batch file that contains all the
duplicates and rems out the delete command for the first duplicate in each set.
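
The "keep the first file, delete the rest" idea can be sketched outside batch as well. The Python below is only an illustration and not part of the script; the function name build_delete_script and the sample file names are made up for the example. It mirrors the layout of the generated batch file: a blank line before each set, the first entry prefixed with ": " so that it is not executed, and the remaining entries as plain del commands.

```python
def build_delete_script(duplicate_sets):
    """Build batch text that keeps the first file of each set and deletes the rest."""
    lines = ["@echo off"]
    for group in duplicate_sets:
        lines.append("")  # blank line between sets, as in the generated file
        for index, path in enumerate(group):
            command = f'del "{path}"'
            # The first file of a set is disabled with a leading ": " so it survives.
            lines.append(": " + command if index == 0 else command)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical duplicate set, for illustration only.
    print(build_delete_script([["a.mp3", "b.mp3", "c.mp3"]]))
```

Reviewing the generated text before running it is the point of the exercise: nothing is deleted until the batch file is executed by hand.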

It might be clumsy, and probably doesn't need to test files using 6 hashing
algorithms - and it's slow - but for a small number of files it works fine.

It only compares files with the same filesize so in that respect it is
efficient, and can handle subdirectories.
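
The size-first approach described above can be sketched roughly as follows. This is Python rather than batch, purely as an illustration under assumptions: the names file_digest and find_duplicates are invented here, and a single SHA-1 digest stands in for the six algorithms the script passes to Fsum. Files are grouped by size first, and only groups with more than one member are hashed.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, algorithm="sha1", chunk_size=1 << 20):
    """Hash a file in chunks so large files are not read into memory at once."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root, recurse=True):
    """Return sorted lists of paths that share both size and digest."""
    by_size = defaultdict(list)
    for folder, subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            if os.path.isfile(path):
                by_size[os.path.getsize(path)].append(path)
        if not recurse:
            subdirs.clear()  # stops os.walk from descending into subfolders

    duplicate_sets = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_digest = defaultdict(list)
        for path in paths:
            by_digest[file_digest(path)].append(path)
        duplicate_sets.extend(sorted(g) for g in by_digest.values() if len(g) > 1)
    return duplicate_sets
```

Skipping the hash for files with a unique size is where the time is saved; hashing with several algorithms, as the script does, adds cost without changing the grouping logic.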

Change this line to remove excess hashing algorithms "-crc32 -rmd -md5 -sha1 -sha512 -tiger"

If you examine the set of temp files it might be clearer what the script does: %temp%.\delsametemp.txt? (the ? stands for the digit appended to each temp file name).



@echo off

if "%~1"=="" (
echo Purpose: Deletes identical files using multiple checksums from FSUM
echo. Builds !delsame.bat for perusal...
echo. The first file in each set of identical files is marked REMed with a :
echo.
echo Syntax: %0 [filespec.ext] [/s]
echo.
echo. If /s is specified it will recurse through the subdirectory branch.
echo.
pause
goto :EOF
)

:: faster with delayed expansion

echo.Gathering file information...

set "file=%temp%.\delsametemp.txt"
:: goto :next

del "%file%*" 2>nul

chcp 1252
dir /a:-d %1 %2 |sed -e "s/!/|/g" -e "/^ .*/d" -e "/ Volume.*/d" -e "s/Directory of/Directory of : /" >"%file%0"

echo.Creating file list...

setlocal enabledelayedexpansion
for /f "tokens=1,2,3,*" %%a in ('type "%file%0"') do (
if "%%a"=="Directory" (
set "folder=%%d"
) else (
set num= %%c
set num=!num:~-14!
for /f "delims=" %%z in ("!folder!\%%d") do >>"%file%1" echo !num! ?%%z
)
)
endlocal
sed -e "s/|/!/g" -e "/System Volume Information/Id" -e "/recycler/Id" "%file%1"|sort>"%file%2"

echo.Finding duplicate filesizes and creating checksum list...
set num=1
set t1=
set t2=
set preva=
set prevb=
set same=

for /f "tokens=1,2 delims=?" %%a in ('type "%file%2"') do call :sub "%%a" "%%b"
if defined same call :sub2 "%preva%" "%prevb%"
goto :continue

:: start subroutine
:sub
set "t1=%~1"

:: if previous file is NOT the same as the current file,
:: but was the same as the one before (last one in a set) then write details to the file

if not "%t1%"=="%t2%" if defined same call :sub2 "%preva%" "%prevb%" & set same=

::

if "%t1%"=="%t2%" set same=1& call :sub2 "%preva%" "%prevb%"

set t2=%t1%
set "preva=%~1"
set "prevb=%~2"
goto :eof
:: end subroutine
:: start second routine
:sub2
pushd "%~dp2"
echo "%~2"
set fs=%~1
set fs=%fs:,=%
set fs=%fs: =%
for /f %%x in ('fsum -jnc -crc32 -rmd -md5 -sha1 -sha512 -tiger "%~2" 2^>nul') do >>"%file%4" set /p ="-%%x"<nul
popd)
goto :eof
:: end second routine
:continue

if not exist "%file%4" echo no duplicates found&pause&del "%file%?"&goto :EOF

echo.Parsing checksum list...
sort<"%file%4" >"%file%5"

set t1=
set t2=
set preva=
set prevb=
set same=

set num=1
for /f "tokens=1,* delims= " %%a in ('type "%file%5"') do call :sub3 "%%a" %%b
if defined same >>"%file%6" echo. %preva% "%prevb%"
goto :continue2

:: start subroutine3
:sub3
set "t1=%~1"

:: if previous file is NOT the same as the current file,
:: but was the same as the one before (last one in a set) then write details to the file

if not "%t1%"=="%t2%" if defined same >>"%file%6" echo. %preva% "%prevb%"& set same=

::

if "%t1%"=="%t2%" set same=1& >>"%file%6" echo. %preva% "%prevb%"

set t2=%t1%
set "preva=%~1"
set "prevb=%~2"
goto :eof
:: end subroutine
:continue2

echo.Writing duplicates to a batch file "!delsame.bat"
echo.@echo off>"%file%7"
:: echo.chcp 850 >>"%file%7"
echo.chcp 1252 >>"%file%7"
set prev=0
for /f "tokens=1,*" %%a in ('type "%file%6"') do call :sub4 %%a %%b
echo.>>"%file%7"
echo.echo Done!>>"%file%7"
echo.pause>>"%file%7"

move /y "%file%7" !delsame.bat
: del "%file%?"

echo Done!
pause
goto :EOF

:sub4
if %1 EQU %prev% (>>"%file%7" echo del %2) else (>>"%file%7" echo.&>>"%file%7" echo : del %2)
set prev=%1
goto :EOF
 
Something does not work.

duplicated.cmd is at the root.

I have changed this line:
set "file=M:\tmp\delsametemp.txt"
I must note that it was not working on the C drive either, with the line unchanged.


----------------------
Does not work
-----------!delsame.bat-----------
@echo off
chcp 1252

: del "\Grand Classiques D'Edgard Encore Plu1"
del "\Grand Classiques D'Edgard Encore Plu2"
del "\Grand Classiques D'Edgard Encore Plu3"
del "\Grand Classiques D'Edgard Encore Plu4"
del "\Grand Classiques D'Edgard Encore Plu5"
del "\Grand Classiques D'Edgard Encore Plu6"

: del 148
del 148

: del 866
del 866
......
---------------------------------------------
Impossible de trouver M:\leloup2009
Impossible de trouver M:\Liebert
(In English: "Can't find the file".)
Done!
===============

JP
 
Let's talk about the Chinese language:

for /f %%x in ('fsum -jnc -crc32 -rmd -md5 -sha1 -sha512 -tiger "%~2" 2^>nul') do >>"%file%4" set /p ="-%%x"<nul

What is fsum? Another Linux program?

I have to download crc32, md5, sha1 and sha512?
 
foxidrive said:
OK


set fs=%fs:,=% replace , with nothing
set fs=%fs: =% replace space with nothing

for /f %%x in ('fsum -jnc -crc32 -rmd -md5 -sha1 -sha512 -tiger "%~2" 2^>nul') do >>"%file%4" set /p ="-%%x"<nul

fsum all these and echo the second variable to nul? "%~2" 2^>nul'

do >>"%file%4" set /p ="-%%x"<nul

set /p= - variable x ----------------why is the nul thrown in x?
 
set fs=%fs:,=% replace , with nothing
set fs=%fs: =% replace space with nothing

Try it and see what it does.

Check the filesize.

for /f %%x in ('fsum -jnc -crc32 -rmd -md5 -sha1 -sha512 -tiger "%~2" 2^>nul') do >>"%file%4" set /p ="-%%x"<nul


-crc32 -rmd -md5 -sha1 -sha512 -tiger <--- those are all different hashing
algorithms. You probably don't need all of them and it'll speed up the
process if you remove some. Better get it working first though.
fsum all these and echo the second variable to nul? "%~2" 2^>nul'

2>nul redirects the STDERR stream to nul, not the %2
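
To make the distinction concrete, here is a small sketch in Python (not batch) of a command whose STDOUT is captured while its STDERR is thrown away, which is roughly what 2>nul does for the fsum call. The child command is invented for the example and simply prints one line to each stream.

```python
import subprocess
import sys

# Hypothetical child process: one line on STDOUT, one line on STDERR.
child = [
    sys.executable,
    "-c",
    "import sys; print('checksum'); print('warning', file=sys.stderr)",
]

result = subprocess.run(
    child,
    stdout=subprocess.PIPE,     # captured, like the output that FOR /F reads
    stderr=subprocess.DEVNULL,  # discarded, like 2>nul
    text=True,
)

captured = result.stdout.strip()
print(captured)
```

Only the STDOUT line reaches the caller; the file name passed as an argument is untouched by the redirection.
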
do >>"%file%4" set /p ="-%%x"<nul

set /p= - variable x ----------------why is the nul thrown in x?

Try it and see what happens.

Then execute this and see what happens.
 
=============================
Try it and see what it does.


Check the filesize.

408 bytes???
Should it not be 3 bytes?
=================================
Try it and see what happens.


Then execute this and see what happens.

abc-123-456
============================
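
The abc-123-456 result is consistent with set /p ="text" <nul printing its prompt text without a trailing newline, so that successive writes end up on one line. A rough Python analogue follows; the three pieces are assumed demo values, and an in-memory buffer stands in for the output file.

```python
import io

# Stand-in for the output file; the pieces written are assumed demo values.
out = io.StringIO()
for piece in ("abc", "-123", "-456"):
    out.write(piece)  # write() adds no newline, so the pieces join up

joined = out.getvalue()
print(joined)  # prints abc-123-456
```

This is how the script can append several checksums for one file to a single line of its temp file.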
 
Gathering file information...
Creating file list...
Finding duplicate filesizes and creating checksum list...
"\a.mp3"
"\Saukare.mp3"
"\zazou.mp3"
Parsing checksum list...
Writing duplicates to a batch file "!delsame.bat"
1 file(s) moved(s).
Done!

=========file 0==========

Répertoire de M:\MediaPlayer\tmp\a

2008-05-27 19:19 3 429 027 Saukare.mp3
2008-05-27 19:19 3 429 027 zazou.mp3
2008-05-27 19:19 3 429 027 a.mp3
------------------file 1=================

?dummy
dans ?\le lecteur M s'appelle MUSIQUE
de ?\série du volume est 4836-7181
iaPlayer\tmp\a ?\
3 429 027 ?\Saukare.mp3
3 429 027 ?\zazou.mp3
3 429 027 ?\a.mp3
---------------------------
Does it not handle the blank space somewhere in these lines?

setlocal enabledelayedexpansion
for /f "tokens=1,2,3,*" %%a in ('type "%file%0"') do (
if "%%a"=="Directory" (
set "folder=%%d"
) else (
set num= %%c
set num=!num:~-14!
for /f "delims=" %%z in ("!folder!\%%d") do >>"%file%1" echo !num! ?%%z
)
)
endlocal
 
Gathering file information...
Creating file list...
Finding duplicate filesizes and creating checksum list...
"\a.mp3"
"\Saukare.mp3"
"\zazou.mp3"
Parsing checksum list...
Writing duplicates to a batch file "!delsame.bat"
1 file(s) moved(s).
Done!

=========file 0==========

Répertoire de M:\MediaPlayer\tmp\a

Try it on your English Vista.
 
I changed these lines and it works.

I added /S to the del command, and echo.>>out.txt for a list of deleted files, in:

if %1 EQU %prev% (>>"%file%7" echo del %2 /S & echo.>>out.txt) else (>>"%file%7" echo.&>>"%file%7" echo : del %2)

and I added /-c to dir, in:

dir /-c /a:-d %1 %2 |sed -e "s/!/|/g" -e "/^ .*/d" -e "/ Volume.*/d" -e "s/Directory of/Directory of : /" >"%file%0"

The code pages are only cosmetic, because I write all my files in English on my
French Vista. An old habit.
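
The dir /-c switch removes the thousands separator from the size column, which matters here because the French listing separates digit groups with spaces and the script splits lines on spaces. The same normalisation can be sketched in Python; the helper name parse_size is hypothetical, and it simply keeps the ASCII digits, much as set fs=%fs:,=% and set fs=%fs: =% strip commas and spaces.

```python
def parse_size(text):
    """Convert a DIR-style size such as '3,429,027' or '3 429 027' to an int."""
    digits = "".join(ch for ch in text if ch in "0123456789")
    if not digits:
        raise ValueError(f"no digits found in {text!r}")
    return int(digits)


if __name__ == "__main__":
    print(parse_size("3 429 027"))
    print(parse_size("3,429,027"))
```

Both spellings of the size give the same number, so files compare equal whatever separator the locale uses.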

---------
set "t1=%~1"
why not set "t1=%1" ?
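
On that question: %~1 expands the argument with any surrounding double quotes removed, whereas %1 keeps them. Because the subroutine is called with quoted arguments, set "t1=%1" would store the quotes as part of the value. A rough Python analogue, using a hypothetical helper name and a made-up argument:

```python
def strip_outer_quotes(arg):
    """Remove one pair of surrounding double quotes, roughly what %~1 does to %1."""
    if len(arg) >= 2 and arg.startswith('"') and arg.endswith('"'):
        return arg[1:-1]
    return arg


if __name__ == "__main__":
    raw = '"3429027 "'               # the argument as passed, quotes included (like %1)
    print(strip_outer_quotes(raw))   # the value without its quotes (like %~1)
```

With the quotes stripped, the later comparisons can add their own quotes around the value without doubling them.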

Was it my last question? ;-)

Thanks to you.
 