Our users are allowed to split up a bill and assign percents to other
customers.
Our users can specify the start and end dates (inclusive) that a
percentage split is active. On any given day, the percents must add up
to 100%.
Here is an example of a valid percentage split, assuming it's
February, 2009:
1st to the 2nd - 50%
1st to the 15th - 50%
3rd to the 15th - 50%
16th to the 30th - 100%
Optionally, our users can leave the end date blank, which means never
ending.
I tried to solve this problem by breaking it into date ranges
(windows): 1 - 2, 3 - 15, 16 - 30. However, I have not been having
much luck in finding a correct algorithm.
If you only have to deal with a single month at a time, and you only ever
have a relatively small number of ranges to deal with, I think that
"rossum"'s suggestion may well be the best approach. Even though it's
essentially a brute-force algorithm, IMHO brute-force can in fact
sometimes be the most elegant approach, because of its simplicity.
An alternative would in fact be to analyze the date ranges themselves, as
you seem to have tried. You can enumerate the ranges, creating new ranges
as you go based on the existing ones and any non-identity overlap. As you
find overlap, maintain a list of all the percentages that apply to that
overlap range. As you do this maintenance, check the total for a given
range and see if it exceeds 100%.
In very vague pseudo-code:
for each rangeIn in the input
{
for each rangeCur in the output
{
if rangeIn overlaps rangeOut non-identically
{
replace rangeOut in the output with
up to three new ranges representing the
overlap and the parts of the originals not
included in the overlap, preserving the original
ranges percentage total for the non-overlap part(s)
and adding the two ranges percentage total for the
overlap part
}
else if rangeIn is identical to rangeOut
{
add the rangeIn percentage to the rangeOut's percentage,
storing it back into rangeOut
}
else
{
add rangeIn to the output
}
}
}
At each point where you sum the percentages, you can check the result to
make sure it's <= 100 and report an error immediately. Alternatively, you
could just go do all the processing, and then report all the ranges that
exceed 100 at the end. If you maintain the output range collection in
sorted order (which is not expensive, assuming the input is sorted too),
then you can minimize the cost of enumerating the output range collection,
by terminating your enumeration once the "rangeCur" start date is after
the "rangeIn" end date. If the collection of ranges could be especially
large (thousands or tens of thousands, for example), it could even be
beneficial to maintain it as an indexed list you can binary search.
This latter approach is, as you can see, much more complicated than just
doing a brute-force approach. I would think that the input data should be
small enough that the brute-force approach should work fine. The only
slightly complicating factor is dealing with the open-ended "blank date"
range, but even there I think that should be just a matter of procesing
all the non-open-ended ranges (including the values for any open-ended
ranges that would be included), and then checking any open-ended ranges
together at the end.
But if you are dealing with a very large number of ranges, the algorithm I
describe may allow you to find the same results with fewer computations.
Note that either algorithm is essentially O(m * n), but the actual "m" and
"n" can be very different depending on your input. I suspect that in your
case, the values of "m" and "n" are small enough that a complex
optimization isn't warranted, and doing the brute-force approach is fast
enough (in fact, given that there is additional maintenance overhead for
the theoretically more efficient approach, the brute-force should in fact
outperform the theoretically more efficient approach for input that's
small enough).
I feel like this should be an easy task, but it is turning into a
nightmare. Any help or suggestions would be appreciated.
I feel your pain. I had a similar range-based process I had to deal with,
and while conceptualizing the basic task was easy, there were a lot of
little details to deal with. For what it's worth, I did wind up with a
basic "range management" class to do the heavy lifting for me.
Unfortunately, my scenario wasn't quite the same as yours; I needed to
perform different operations on the ranges than you are, and my input data
was plain integers, not dates. But if you are having trouble and you
think that code example would help more than it would hurt (after all,
looking at a similar but still fundamentally different example could just
make things more confusing), I'm happy to dig it up and post it.
Pete