PC Review



Avoiding dupes when merging files

 
 
google_groups3@hotmail.com
      24th Nov 2004
Hi all.

I currently have 2 text files which contain lists of file names. These
text files are updated by my code. What I want to do is be able to
merge these text files discarding the duplicates.

And to make it harder (or not???!!) my criteria for defining the
duplicate is the left 15 (or so) characters of the file path.
Help, as always, is greatly appreciated!

Thanks

 
Lucas Tam
      24th Nov 2004
(E-Mail Removed) wrote in news:1101328833.131813.52400
@c13g2000cwb.googlegroups.com:

> Hi all.
>
> I currently have 2 text files which contain lists of file names. These
> text files are updated by my code. What I want to do is be able to
> merge these text files discarding the duplicates.
>
> And to make it harder (or not???!!) my criteria for defining the
> duplicate is the left 15 (or so) characters of the file path.
> Help, as always, is greatly appreciated!



Take a look at the Microsoft Text Driver - you can run SQL queries on the
text file. Perhaps you can just query each file checking for dupes?

Or you could load the data into a datatable (or hash table type object?),
with the PK set as the filename... if a duplicate shows up, the datatable
should throw a duplicate PK exception which you would catch and ignore.

Or lastly... perhaps you should think of a different method of storing the
data? Maybe a database is a better idea than text files?

--
Lucas Tam ((E-Mail Removed))
Please delete "REMOVE" from the e-mail address when replying.
http://members.ebay.com/aboutme/coolspot18/
 
Bob Hollness
      24th Nov 2004
>
> Take a look at the Microsoft Text Driver - you can run SQL queries on the
> text file. Perhaps you can just query each file checking for dupes?
>
> Or you could load the data into a datatable (or hash table type object?),
> with the PK set as the filename... if a duplicate shows up, the datatable
> should throw a duplicate PK exception which you would catch and ignore.
>
> Or lastly... perhaps you should think of a different method of storing the
> data? Maybe a database is a better idea than text files?
>
> --
> Lucas Tam ((E-Mail Removed))
> Please delete "REMOVE" from the e-mail address when replying.
> http://members.ebay.com/aboutme/coolspot18/


Thanks for the fast reply. I have to use text files so that really is not
an option. Any pointers or some sample code on how to use the datatable? I
like the idea of being able to trap a duplicate PK error.

Bob


 
Bob Hollness
      25th Nov 2004
> Take a look at the Microsoft Text Driver - you can run SQL queries on the
> text file. Perhaps you can just query each file checking for dupes?
>
> Or you could load the data into a datatable (or hash table type object?),
> with the PK set as the filename... if a duplicate shows up, the datatable
> should throw a duplicate PK exception which you would catch and ignore.
>
> Or lastly... perhaps you should think of a different method of storing the
> data? Maybe a database is a better idea than text files?
>


I like the idea of the PK exception, as it will give an error that I can
trap. I am being forced to use text files, though, for simplicity. Do you
have any sample code for implementing a datatable/PK exception? This is
new to me!

Bob


 
Lucas Tam
      25th Nov 2004
"Bob Hollness" <(E-Mail Removed)> wrote in
news:(E-Mail Removed):

> I like the idea of the PK exception as it will give an error that i
> can trap. I am being forced to use text files though for simplicity.
> Do you have any sample code for implementing a datatable/PK exception
> as this is new to me!
>


Here's the example from MSDN:

http://msdn.microsoft.com/library/de...l=/library/en-
us/cpref/html/frlrfsystemdatadatatableclassprimarykeytopic.asp

I've used it a couple of times and it works fine.

Here is what you do in short:

1. Add your columns to a datatable.
2. Add the same column from step 1 into a primary key array.
3. Add the primary key array to the DataTable.PrimaryKey property.
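
The three steps above can be sketched roughly as follows. This is only a minimal illustration, not the MSDN sample: the module name, the KeyOf and BuildTable helpers, the column names "Key" and "Path", and the sample paths are all made up for the example, and the 15-character key length comes from the original question. It relies on DataTable raising a ConstraintException when a row repeats a primary key value.

```vbnet
Imports System
Imports System.Data

Module PrimaryKeyDemo

    ' Hypothetical helper: the left keyLength characters of a line,
    ' or the whole line when it is shorter than that.
    Function KeyOf(ByVal line As String, ByVal keyLength As Integer) As String
        If line.Length <= keyLength Then Return line
        Return line.Substring(0, keyLength)
    End Function

    ' Hypothetical helper: loads lines into a DataTable keyed on KeyOf(line).
    Function BuildTable(ByVal lines As String(), ByVal keyLength As Integer) As DataTable
        Dim table As New DataTable("Files")

        ' Step 1: add the columns.
        Dim keyColumn As DataColumn = table.Columns.Add("Key", GetType(String))
        table.Columns.Add("Path", GetType(String))

        ' Steps 2 and 3: put the key column in an array and assign it to PrimaryKey.
        table.PrimaryKey = New DataColumn() {keyColumn}

        For Each line As String In lines
            Try
                table.Rows.Add(New Object() {KeyOf(line, keyLength), line})
            Catch ex As ConstraintException
                ' Duplicate key: keep the first entry and skip this one.
            End Try
        Next

        Return table
    End Function

    Sub Main()
        ' Made-up sample data; the first two paths share their first 15 characters.
        Dim lines As String() = New String() { _
            "C:\data\reports\january.txt", _
            "C:\data\reports\february.txt", _
            "C:\data\images\logo.png"}

        Dim table As DataTable = BuildTable(lines, 15)
        For Each row As DataRow In table.Rows
            Console.WriteLine(row("Path"))
        Next
    End Sub

End Module
```

With this sample data the second path is skipped, because its first 15 characters match the first path. Using exceptions to filter out every duplicate can be slow when duplicates are common, so checking table.Rows.Contains(key) before adding may be a better fit in that case.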

--
Lucas Tam ((E-Mail Removed))
Please delete "REMOVE" from the e-mail address when replying.
http://members.ebay.com/aboutme/coolspot18/
 
Bob Hollness
      25th Nov 2004
Thanks for this. But I guess I need something a little more basic. I also
need to work out whether to do it in memory or straight to disk. I guess
I'll keep playing with the loops.

--
Bob Hollness

-------------------------------------
I'll have a B please Bob
"Lucas Tam" <(E-Mail Removed)> wrote in message
news:Xns95AC7AF869ED8nntprogerscom@140.99.99.130...
> "Bob Hollness" <(E-Mail Removed)> wrote in
> news:(E-Mail Removed):
>
>> I like the idea of the PK exception as it will give an error that i
>> can trap. I am being forced to use text files though for simplicity.
>> Do you have any sample code for implementing a datatable/PK exception
>> as this is new to me!
>>

>
> Here's the example from MSDN:
>
> http://msdn.microsoft.com/library/de...l=/library/en-
> us/cpref/html/frlrfsystemdatadatatableclassprimarykeytopic.asp
>
> I've used it a couple of times and it works fine.
>
> Here is what you do in short:
>
> 1. Add your columns to a datatable.
> 2. Add the same column from step 1 into a primary key array.
> 3. Add the primary key array to the DataTable.PrimaryKey property.
>
> --
> Lucas Tam ((E-Mail Removed))
> Please delete "REMOVE" from the e-mail address when replying.
> http://members.ebay.com/aboutme/coolspot18/



 
Lucas Tam
      25th Nov 2004
"Bob Hollness" <(E-Mail Removed)> wrote in
news:#(E-Mail Removed):

>>
>> Take a look at the Microsoft Text Driver - you can run SQL queries on
>> the text file. Perhaps you can just query each file checking for
>> dupes?
>>
>> Or you could load the data into a datatable (or hash table type
>> object?), with the PK set as the filename... if a duplicate shows up,
>> the datatable should throw a duplicate PK exception which you would
>> catch and ignore.
>>
>> Or lastly... perhaps you should think of a different method of
>> storing the data? Maybe a database is a better idea than text files?
>>
>> --
>> Lucas Tam ((E-Mail Removed))
>> Please delete "REMOVE" from the e-mail address when replying.
>> http://members.ebay.com/aboutme/coolspot18/

>
> Thanks for the fast reply. I have to use text files so that really is
> not an option. Any pointers or some sample code on how to use the
> datatable? I like the idea of being able to trap a duplicate PK error.


I replied to your message a bit earlier in the day, but I'm not sure if
you received it:

Here's the example from MSDN (particularly the SetPrimaryKeys Sub):

http://msdn.microsoft.com/library/de...l=/library/en-
us/cpref/html/frlrfsystemdatadatatableclassprimarykeytopic.asp

I've used it a couple of times and it works fine.

Here is what you do in short:

1. Add your columns to a datatable.
2. Add the same column from step 1 into a primary key array.
3. Add the primary key array to the DataTable.PrimaryKey property.


--
Lucas Tam ((E-Mail Removed))
Please delete "REMOVE" from the e-mail address when replying.
http://members.ebay.com/aboutme/coolspot18/
 
Bob Hollness
      25th Nov 2004
> Hi all.
>
> I currently have 2 text files which contain lists of file names. These
> text files are updated by my code. What I want to do is be able to
> merge these text files discarding the duplicates.
>
> And to make it harder (or not???!!) my criteria for defining the
> duplicate is the left 15 (or so) characters of the file path.
> Help, as always, is greatly appreciated!
>
> Thanks
>


OK. This is the solution I came up with. Not as elegant as one would have
hoped, but then again, only I get to see how it functions under the bonnet
(hood for the Americans)! And of course, this is still to be tidied up
and made pretty. Feel free to pull it apart and embarrass me...


' Requires "Imports System.IO" at the top of the file.
' Writes to OutputFile every line of File2Compare that does not
' appear (as an exact match) in OriginalFile.
Sub FindDupes(ByVal File2Compare As String, ByVal OriginalFile As String, _
              ByVal OutputFile As String)

    Dim File1Reader As New StreamReader(File2Compare)
    Dim File3Writer As New StreamWriter(OutputFile)
    Dim Line1 As String
    Dim Line2 As String
    Dim Found As Boolean

    Line1 = File1Reader.ReadLine()
    Do While Not Line1 Is Nothing
        Found = False

        ' Re-read the original file from the top for each candidate line.
        Dim File2Reader As New StreamReader(OriginalFile)
        Line2 = File2Reader.ReadLine()
        Do While Not Line2 Is Nothing
            If Line1 = Line2 Then
                Found = True
                Exit Do
            End If
            Line2 = File2Reader.ReadLine()
        Loop
        File2Reader.Close()

        If Not Found Then
            File3Writer.WriteLine(Line1)
        End If

        Line1 = File1Reader.ReadLine()
    Loop

    File1Reader.Close()
    File3Writer.Close()

End Sub



--
Bob Hollness

-------------------------------------
I'll have a B please Bob


 
Anon-E-Moose
      26th Nov 2004
"Bob Hollness" <(E-Mail Removed)> wrote in news:uUD3YV00EHA.1392
@TK2MSFTNGP14.phx.gbl:

> Feel free to pull it apart and embarrass me.......




Very inefficient when compared to Cor's elegant example of a hash table!

 
Larry Serflaten
      26th Nov 2004

"Bob Hollness" <(E-Mail Removed)> wrote
> >
> > I currently have 2 text files which contain lists of file names. These
> > text files are updated by my code. What I want to do is be able to
> > merge these text files discarding the duplicates.
> > And to make it harder (or not???!!) my criteria for defining the
> > duplicate is the left 15 (or so) characters of the file path.
> > Help, as always, is greatly appreciated!

>
> OK. This is the solution I came up with. Not as elegant as one would have
> hoped. but then again, only I get to see how it functions under the bonnet
> (hood for the Americans) !!! And of course, this is still to be tidied up
> and made pretty. Feel free to pull it apart and embarrass me.......


As Cor suggested, use a Hashtable (or you might call it a Dictionary); it will
be much more efficient and easier to code.

Paste the following into a routine to see it in action:

HTH
LFS


Dim item As String
Dim hash As New System.Collections.Hashtable
Dim file1 As String() = New String() { _
    "Pretend this is text from a file.", _
    "It is contained in an array only for", _
    "demo purposes."}
Dim file2 As String() = New String() { _
    "This is the text from a second file.", _
    "The next line is a duplicate line and", _
    "will overwrite the original entry:", _
    "It is contained (DUPLICATE)", _
    "Only the first 10 characters", _
    "were used toward duplicate testing."}

For Each item In file1
    hash.Item(item.Substring(0, 10)) = item
Next

For Each item In file2
    hash.Item(item.Substring(0, 10)) = item
Next

Dim entry As System.Collections.DictionaryEntry
For Each entry In hash
    Debug.WriteLine(entry.Value)
Next

Debug.WriteLine("")
Debug.WriteLine("Note that the order is not maintained, and")
Debug.WriteLine("the duplicate line's original value was")
Debug.WriteLine("overwritten by the later (duplicate) entry.")
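
If the original order matters, or the first occurrence should win rather than the last, one possible variation on the Hashtable approach is sketched below. It is only an illustration under a few assumptions: the module name and the KeyOf, ReadLines, MergeLines, and MergeFiles helpers are invented for the example, and lines shorter than the key length are compared whole so that Substring cannot throw. The Hashtable only records which keys have been seen; an ArrayList keeps the surviving lines in the order they were read.

```vbnet
Imports System
Imports System.Collections
Imports System.IO

Module MergeDemo

    ' Hypothetical helper: the left keyLength characters of a line,
    ' or the whole line when it is shorter than that.
    Function KeyOf(ByVal line As String, ByVal keyLength As Integer) As String
        If line.Length <= keyLength Then Return line
        Return line.Substring(0, keyLength)
    End Function

    ' Merges two lists of lines, keeping the first line seen for each key
    ' and preserving the order in which lines were encountered.
    Function MergeLines(ByVal first As String(), ByVal second As String(), _
                        ByVal keyLength As Integer) As ArrayList
        Dim seen As New Hashtable
        Dim result As New ArrayList
        Dim sources As String()() = New String()() {first, second}

        For Each source As String() In sources
            For Each line As String In source
                Dim key As String = KeyOf(line, keyLength)
                If Not seen.ContainsKey(key) Then
                    seen.Add(key, Nothing)
                    result.Add(line)
                End If
            Next
        Next

        Return result
    End Function

    ' Reads every line of a text file into an array.
    Function ReadLines(ByVal path As String) As String()
        Dim lines As New ArrayList
        Dim reader As New StreamReader(path)
        Try
            Dim line As String = reader.ReadLine()
            Do While Not line Is Nothing
                lines.Add(line)
                line = reader.ReadLine()
            Loop
        Finally
            reader.Close()
        End Try
        Return DirectCast(lines.ToArray(GetType(String)), String())
    End Function

    ' Merges two text files into outputPath, one pass over each input.
    Sub MergeFiles(ByVal pathA As String, ByVal pathB As String, _
                   ByVal outputPath As String, ByVal keyLength As Integer)
        Dim merged As ArrayList = MergeLines(ReadLines(pathA), ReadLines(pathB), keyLength)
        Dim writer As New StreamWriter(outputPath)
        Try
            For Each line As String In merged
                writer.WriteLine(line)
            Next
        Finally
            writer.Close()
        End Try
    End Sub

End Module
```

A call such as MergeFiles("list1.txt", "list2.txt", "merged.txt", 15) (file names made up here) reads each input once, instead of re-reading one file for every line of the other. Hashtable keys are case-sensitive by default, so for Windows paths it may be worth normalising the key's case or passing a case-insensitive comparer to the Hashtable constructor.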

 