Maddening Dilemma - Compare each cell within column a to each cell in column b and find unique match

P

pogster

Balan,

Your macro does in fact work, but you are correct, it is very time
consuming (and resource consuming) due to its recursive nature.

Thank you again for all of your time and effort, you have done very
well! Especially as some just getting into VBA programming!

Well Done, Bravo!

You do not need to worry about this particular problem anymore, do not
let it distract you.
I will work with what Pete and Max have provided and figure it out
from there.

Thank you again Balan, you have been invaluable!

-Pogster
 
P

pogster

Pete,

The changes you suggested worked like a charm!

Your macro is fast, efficient, and does exactly what it is supposed
to.

It is amazing, the number of shortfalls Excel 2003 has compared to the
new 2007 version (which allows me to conditionally format by searching
for duplicates)
But your macro has brought 2003 up to speed, at least in this one
case.

Thanks so much for your help, this macro should definitley be posted
somewhere where it can be easily found, it is very useful, and
modifiable to suit other specific needs.

Thank you again for the time you spent in helping me with this issue,
and coding a great solution.
This forum is made great by people like yourself and the others who
posted with replies to my issue

You are all great! Many Thanks!

-pogster
 
P

Pete_UK

Well, thanks for such effusive praise - I'm glad the macro does what
you want it to do.

As regards storing it somewhere, well Google maintains archives of
these newsgroups, so it should be available for anyone who searches
for it.

If after thorough testing you are convinced that it does the job, then
you might like to apply it directly to your other files, as you said
in one post that you extract this data from a file and that the order
was very important so that you can paste it back to the same file - so
this could save you a bit of time. Add two new lines near the
beginning of the macro to insert two new columns, so that it would
read:

Dim my_top As Long
Dim my_bottom As Long
Application.ScreenUpdating = False
Columns("B:C").Select 'new
Selection.Insert Shift:=xlToRight 'new
Range("B1").Select

and so on ...

Then near the end of the macro change this line:

Columns("B:B").Select

to this:

Columns("B:C").Select

You could now safely apply the macro to your other files, as the
contents of the other columns get shifted out of the way - well, only
if the numbers are in column A of the other files.

Perhaps you could explain what it is you are working with sometime ...

Pete
 
P

pogster

Actually, the numbers are always in column O.

Thanks for the suggestion though.

What i am working with are balance sheets for clients of an insurance
company. I am weeding out all of the unnessecary entries (which
cancel) to track down the entries which actually contribute to the
final account balance. There are hundreds of balance sheets, spanning
an average of 2000 rows apiece. Its a big job to do manually, but this
macro really helps.

The only other variable to consider (one which is probably impossible
to code for), is that some of the numbers which DO NOT contribute to
the balance (and cancel out) are not just single numbres, but made up
of 2 or more other individual entries. A simple example: -100 cancel
with SUM(50+10+40) Except in reality the numbers are not that simple
or easy to locate.

To accomplish this, you would need a smart macro that could detect
numbers that could add together to create other numbers in this 2000
row sheet. That would take alot of code and thought, i think. No need
to bother with it. Its easier to try to do that part by hand. But
cancelling identical cancelling numbers is 90% of the task.

Like i said, hours of tedius head-ache inducing searching, simplified
to two clicks by your great macro. Cant thank you enough.

Hope this answers your question, which is a nice change from you
answering mine =)

Cheers.

-Pogster
 
P

Pete_UK

Yeah, thanks for taking the trouble to post back - it's always
interesting to find out what the real situation is, as often we just
get a snippet.

The other problem you refer to crops up quite often in the groups,
where people want to reconcile payments against invoices. I think
Harlan Grove had a macro to do it, but it is a very number-intensive
exercise (not one that I'm going to attempt).

I dabbled a bit with the macro last night in making the colour cycle
through a range of values, rather than just one colour (green), i.e.
the colour changes each time a pairing is found - would you be
interested in a multi-colour version?

You can't actually do very much with coloured cells - would you like
the macro to put something in column B (eg "Y") to indicate that the
cell has been paired, so that it can help you to eliminate them and
thus concentrate on your other problem?

Pete
 
P

pogster

Hey, sorry for the late post. Been busy with work lately.

As for the color changing, its really not necssary but thanks for the
suggestion.

As for the marker in column B, the trouble with moving the numbers out
of the order i have them in, or extracting unique values is that each
number also has about 11 columns of other information such as acct
numbers and journal ID's and dates and descriptions etc. that go along
with that particular value. So moving it out of the order i get it in
has the possibility of mucking everything up for me.

Its easier just to point out, visually, the values to ignore (or focus
on), and in the process retain the order.

If the marker just marked off the Unique values in column B, i could
of course insert a column next to col O and use it that way somehow,
to sort it, which would actually be quite helpful i think. Maybe if
the macro could look directly at column O, create a column to use for
its data, which it will delete, but also insert a column to the right
of Col O and place the markers there.

Would this be difficult to accomplish?

If its easier, yes, just placing a marker in column B would work just
as well, i can shift columns manually without a problem.

Thanks for the good idea though, as this would allow me to sort by
Unique or cancelled value (since you cannot sort by formatting in
2003)

-Pogster
 
P

Pete_UK

I was thinking that you would do this in your extracted file - in
addition to ending up with numbers in column A which are highlighted
if paired, you would also have a column B which would have some marker
in to indicate these pairings (like "x"). You can still copy column A
back to column O of your original sheet, and copy column B to the next
empty column in your original sheet (which might be column P, but I
didn't know the layout of your data in the original file).

I thought you could then apply autofilter to this new column, along
the lines of:

Custom, Not Equal To, x ...

so that only the rows which have not been paired would then be
visible, so that you can then concentrate on doing what you need to do
with those. Of course, you could sort the data by this new column, to
bunch up the unpaired amounts and then concentrate on them - it's up
to you how you use it.

The thing is - would you find it useful?

OK, I'll ditch the multi-coloured idea.

Pete
 
P

pogster

Hey Pete,

Yes, absolutely, that would probably be useful. Thing is, management
is going to look at what ive done and decide what format they want it
in anyway. So i could group them by paired or unpaired and then they
could just go and undo it with some stupid autofilter, but i believe
it would look better and be easier to digest in such a sorted format.

In short: yes this would be useful !!!

As for the multicolor idea which i shot down, i can see how it would
be useful if say a seperate color was applied to ammount under 50k,
50k - 150k, 150k - 250k, 250 - 500k, 500k - 1M, etc. etc.

This way the colors could represent ranges of values.

Again, i do not need this functionality, but it would be interesting
if you wanted to implement it for others use.

Thanks again for keeping up with this, i hope your code can be put to
good use outside of my particular dilemma, because i think its such
useful functionality.

-Pogster
 
P

Pete_UK

Okay, here's an amended macro in full, along the lines you have
suggested:

Sub Mark_duplicates_a()
'
' 04/10/2007, Pete Ashurst
' amended 17/10/2007
'
Dim my_top As Long
Dim my_bottom As Long
Dim colour As Integer
Application.ScreenUpdating = False
Columns("B:D").Select
Selection.Insert Shift:=xlToRight
Range("C1").Select
ActiveCell.Value = "1"
Range(Selection, Selection.End(xlDown)).Select
Selection.DataSeries Rowcol:=xlColumns, _
Type:=xlLinear, Date:=xlDay, _
Step:=1, Trend:=False
Columns("A:C").Select
Selection.Sort Key1:=Range("A1"), Order1:=xlDescending, _
Header:=xlNo, OrderCustom:=1, _
MatchCase:=False, Orientation:=xlTopToBottom
Range("A1").Select
my_top = 1
my_bottom = Cells(Rows.Count, "A").End(xlUp).Row
Do Until my_top >= my_bottom
If Int(Cells(my_top, 1).Value * 10) / 10 =
Int(Abs(Cells(my_bottom, 1).Value) * 10) / 10 Then
Select Case Cells(my_top, 1).Value
Case Is < 50000
colour = 4 'Bright Green
Case Is < 150000
colour = 6 'Yellow
Case Is < 250000
colour = 8 'Turquoise
Case Is < 500000
colour = 39 'Lavendar
Case Else
colour = 15 'Grey
End Select
Range("A" & my_top).Interior.ColorIndex = colour
Cells(my_top, 2).Value = "Y"
Range("A" & my_bottom).Interior.ColorIndex = colour
Cells(my_bottom, 2).Value = "Y"
my_top = my_top + 1
my_bottom = my_bottom - 1
ElseIf Cells(my_top, 1).Value > Abs(Cells(my_bottom, 1).Value)
Then
my_top = my_top + 1
Else
my_bottom = my_bottom - 1
End If
Loop
Columns("A:C").Select
Selection.Sort Key1:=Range("C1"), Order1:=xlAscending, _
Header:=xlNo, OrderCustom:=1, _
MatchCase:=False, Orientation:=xlTopToBottom
Columns("C:D").Select
Selection.Delete Shift:=xlToLeft
Range("C1").Select
Application.ScreenUpdating = True
End Sub

This inserts a new column B, so any other data on the sheet will be
moved to the right. Column B will contain "Y" wherever there is a
pairing - you could change this to "paired" or some-such by making the
obvious two changes mid-way in the macro.

The macro also applies colour banding for the ranges you suggested. It
should be fairly obvious how to introduce other ranges in the CASE
part of the macro (just keep the numbers in sequence), and you can
easily change colours if you don't like mine - here's some other
numbers you might like to play about with:

Red - 3, Aqua - 42, Orange - 46, Pink - 7, Tan - 40

Maybe marginally slower, but still less than 3 seconds on my test data
of nearly 2200 values.

Hope this helps.

Pete
 
P

pogster

Pete,

Great mod! The color coding works great, though in my version of the
macro i removed your Select case as i did not need the color coding.
But the column B identifier is also very useful. Thank you for this
valuable addition to your already superb macro!

The speed decrease is completely trivial, as it is barely noticeable.

I will toy around with this macro and possibly add more comments to
the file itself as i figure out its full functionality.

You are so the man.

Thanks pete.

-Pogster
 
P

Pete_UK

Hi Pogster,

another variation - instead of just a "Y" marker in column B, I could
introduce a count that gets incremented each time a pairing is found.
That way the pairs can easily be identified later if there was any
query, and you could still use a Filter on column B (looking for
blanks) to hide the pairs and concentrate on accounting for the
unpaired values.

By the way, here's two links that might help in your second task:

http://www.tushar-mehta.com/excel/templates/match_values/index.html

http://groups.google.com/group/microsoft.public.excel.misc/browse_thr...

Hope this helps.

Pete
 
P

pogster

Hi pete,

The numerical count is a good idea, would it assign a single value to
both numbers in each pair? Or a value to each number with the status
"paired"? Does this distinction make sense to you?

Either way, it is a good idea, i would like to see that code
modification, see if i can apply its functionality.

As for the links to aid in my second maddening dilemma, the first link
provides good explanations and some good solutions, though the second
link you posted is either incomplete or somehow wrong, the page does
not seem to exist.

Thanks for the info and yet another good code suggestion!

-pogster
 
P

Pete_UK

Nearly forgot to post the revised code before going to bed - here it
is in full:

Sub Mark_duplicates_b()
'
' 04/10/2007, Pete Ashurst
' amended 17/10/07
' amended 22/10/07
'
Dim my_top As Long
Dim my_bottom As Long
Dim colour As Integer
Dim my_pair As Integer
Application.ScreenUpdating = False
Columns("B:D").Select
Selection.Insert Shift:=xlToRight
Selection.NumberFormat = "General"
Range("C1").Select
ActiveCell.Value = "1"
Range(Selection, Selection.End(xlDown)).Select
Selection.DataSeries Rowcol:=xlColumns, _
Type:=xlLinear, Date:=xlDay, _
Step:=1, Trend:=False
Columns("A:C").Select
Selection.Sort Key1:=Range("A1"), Order1:=xlDescending, _
Header:=xlNo, OrderCustom:=1, _
MatchCase:=False, Orientation:=xlTopToBottom
Range("A1").Select
my_pair = 1
my_top = 1
my_bottom = Cells(Rows.Count, "A").End(xlUp).Row
Do Until my_top >= my_bottom
If Int(Cells(my_top, 1).Value * 10) / 10 =
Int(Abs(Cells(my_bottom, 1).Value) * 10) / 10 Then
Select Case Cells(my_top, 1).Value
Case Is < 50000
colour = 4 'Bright Green
Case Is < 150000
colour = 6 'Yellow
Case Is < 250000
colour = 8 'Turquoise
Case Is < 500000
colour = 39 'Lavendar
Case Else
colour = 15 'Grey
End Select
Range("A" & my_top).Interior.ColorIndex = colour
Cells(my_top, 2).Value = my_pair
Range("A" & my_bottom).Interior.ColorIndex = colour
Cells(my_bottom, 2).Value = my_pair
my_top = my_top + 1
my_bottom = my_bottom - 1
my_pair = my_pair + 1
ElseIf Cells(my_top, 1).Value > Abs(Cells(my_bottom, 1).Value)
Then
my_top = my_top + 1
Else
my_bottom = my_bottom - 1
End If
Loop
Columns("A:C").Select
Selection.Sort Key1:=Range("C1"), Order1:=xlAscending, _
Header:=xlNo, OrderCustom:=1, _
MatchCase:=False, Orientation:=xlTopToBottom
Columns("C:D").Select
Selection.Delete Shift:=xlToLeft
Range("C1").Select
Application.ScreenUpdating = True
End Sub

I've left the colour banding in, as per the previous version. One
advantage of this approach is that you can see exactly how many pairs
have been found (523 in my test data of approx 2200 numbers). You
might want to change the line:

Cells(my_bottom, 2).Value = my_pair
to:
Cells(my_bottom, 2).Value = - my_pair

to show these as negative numbers.

By the way, the revised second link works okay for me, even though it
doesn't appear in full in the post and seems exactly the same as I had
posted previously - strange !!

Hope this helps.

Pete
 
P

pogster

Pete,

I believe, after autofiltering the numbers you generated, i realized
the macro finds the largest values first. The color banding gave that
one away, and i really did not notice that before. Very nice touch
with marking negative pairs with a negative numbered value. This
modification could be mighty useful for data analysis later on, thanks
so much for the code!

As for the link, which does in fact work now, i read into it and it
seems that the process is mighty complicated, and rather impractical
given the size of my data. With nearly 2200 entries being my average
data size, a FindSum program like Harlons would take a billion years
to complete on my computer...literally. And even if it could complete
in 2 seconds, it would generate billions of possible results, which i
could never sift through to find my desired answer. Unfortunatley, i
do not think i will be investigating those possibilities any
further.

But i thank you for referring me to those great sources!

Again, you have thought of a positive addition to an already great
macro. Thank you again for all of your effort, and for sticking with
this thread and with me throughout this learning process!!

You have taught me much ;)

I really cant think of any ways in which this macro could get any
better, at least for my application. I think you are officially off
the hook pete. But if you think of anything else, feel free to post
here, ill check back if i see updates!

Thanks pete.

-Pogster
 
P

Pete_UK

I think one of the main differences with your application, though, is
that you have both positive and negative numbers, and that you want to
match positive with negative. Presumably you could have one positive
number which matched with two, three or more negatives (or is it the
other way round?). Obviously after the first scan in the current macro
to get a one-to-one pairing, there are fewer numbers left, and a
similar scan could then try to get a one-to-two matching, then a one-
to-three matching etc. Once a number has been matched, then it
wouldn't need to be considered in later scans, thus speeding up the
process further.

You have the experience of doing this manually - do you have many one-
to-ten or one-to-twenty pairings? Do you have to account for every
number in the list?

Perhaps if you could put up another set of test data which shows how
you have matched the numbers, I might have a go at revising the macro
further (sometime).

Thanks for your good wishes.

Pete
 
P

pogster

Wow Pete,

Inredibly sorry for taking the better part of a month in getting back
to you here.

To answer your first question, the numbers do not have muliple
matches, i.e. -12 matches with +12 once, and they are counted, end of
story. That is how your macro works and that is the correct method.
But there can be multiple matches in the sense that there are more
than one -12 cancelling with more than one +12, and obv. if there is
an odd number of either, one will be uncancelled in the end, or a
numerous ammount of equal value and sign, which also remain
uncancelled.

Every number in the list is accounted for, but only in the sense of a
1-to-1 pairing, or lack thereof.

In continuing to use your very successful macro for most of this past
month, i have noticed that, the order my data is in (by date)
conflicts with the way the macro searches through the data.

For example, one entry of -100 should cancel with a +100 about three
entires down. But there is a +100 at the very bottom of the list. So
the first, and last -/+ 100 cancel out, are highlighted and forgotten
by the macro...whereas that second +100 a few down from the first
value, was the one which actually cancelled with that first value,
while the one at the bottom is simply a new outstanding balance.

Your macro processes one by one top value, compared to bottom, and
advances each variable position closer until they essentially meet in
the middle, am i correct?

Given this pattern, i sometimes have problems with the wrong pairs
being created. Usually not too many so going through by hand and
correcting doesnt take too long. The pairs are not wrong when
speaking strictly numerically, but in regards to real-world scenarios,
the wrong ammounts are being paired.

I have tried resorting my data in a way that would better suit the
macro's processes, but theres really no way i can think of to do it
properly.

To explain the reason for this (and i will attach an expanded dataset
as well) is that some of these journal entires are marked "N" for
manual, "B" for Will reverse at the beginning of next period, and "R"
for this is a reversal entry. Any reversal entries will obviously be
cancelling something out prior to the date at which that reversal is
posted and should be highlighted. The N is simply a manual entry
which could be unique (uncancelled) or a manual reversal as well.
Whereas the B marks an entry which will be automatically reversed at
the beginning of the next month/period. If the B is in this period, it
is highly probable that it remains uncancelled, unless some idiot went
in and reversed it manually...which would only create a problem later
since the system will still autoreverse.

I guess this will be easiest to understand with a dataset so i will
attach one for you to look at.
A Few tips:
The journal ID's usually coincide with B's and their corresponding
R's...unless it is a manual N entry, in which case the Journal ID's
could differ. I thought about this when i was trying to resort my data
to better cater to the macro's process.

I hope this answers your questions and doesnt cause too many
problems. If you do not feel the need to work on this project, that
is totally fine, as you have done so much for me already. I will
simply continue forward manually(highly likely) until i become a VBA
guru (highly unlikely).

If you are still interested in taking a further crack at this problem,
your help would be greatly appreciated!!
I will post a link to a secure website containing the dataset of which
I speak tonight (as i cant upload anything here at work). It will
expire in 7 days, let me know if the link does not work and i will
repost.

Thanks again pete.

-Pogster

P.S. - i swear not to let this forum get too dusty, i will check back
regularly from now on!
 
P

pogster

Pete,

Also, part of it may be that i requested the macro just check till the
TENTHS place (first after the decimal), when in fact, it may be wiser
to check to the hundreth's place, just to be safe.

Can you modify the code to do that? Or how can I?

Link to sample will be up tonight.

Thanks Pete.
 
P

Pete_UK

Thanks for resurrecting the thread and giving some feedback on using
the macro.

As I understand it, you extract just a single column of numbers from a
multi-column file, and the macro works on that column, ensuring it
remains in the same order at the end so that you can paste it back
into the original file.

But, if you also copied the dates associated with each number into
your extracted file, then it would be possible to sort the data not
only by number, but also by date, such that you have the largest
positive number going down to the largest negative number, but also
ensuring that equal (positive) values are listed in increasing date
order and equal negative values are listed in decreasing date order.
That way the macro would find the earliest +100 and match it with the
earliest -100.

However, this might not be exactly what you want, though. Imagine you
have -100 on 1st Nov, +100 on 3rd Nov and -100 on 4th Nov. YOU might
have reasons for pairing the 3rd and 4th November numbers and leaving
the 1st Nov value unpaired, but if these were the only values of 100
then the (date-enhanced) macro would pair the 1st and 3rd Nov values
together.

This being the case, would you prefer to match equal numbers on the
minimum days spread? I'd have to think a bit about an appropriate
algorithm for that !! Or could it be that you always have a +ve number
first, and the -ve match to this always occurs after the +ve? (or vice-
versa). Is it possible to incorporate unpaired values from an earlier
month (or whatever period your file covers), and then begin matching
those?

Of course, another approach would be to have the macro applied
directly to your original file, as I mentioned in an earlier post.
There might be other items of data in there that would help to
identify the appropriate pairings, and it would cut down on you having
to extract the data as you do now. You can easily use File | Save As
to save the file with a different name, so that the original is not
changed, and even this can be automated quite easily within the macro.

If you feel you can trust me, you might like to send a file directly
to me:

pashurst <at> auditel.net

Change the obvious.

Though I prefer to keep discussions going in the newsgroups, so that
everyone can benefit, there are times when sensitive data needs to be
kept more secure.

Pete
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top