mind helping me find a good approach to this algorithm problem?

not_a_commie · Aug 13, 2011

This sounds like a textbook problem, but I promise it's not for a
class. Rather, it's part of an audio filter I'm working on.

Problem statement: Given a collection of 1D points and a positive
intensity value at each point in the collection, find a subcollection
of the points such that they are all equally spaced (+/-E) and the
intensity sum is maximized.

Some of the points are clumped together, but I would only want one of
a clump in the result set. My initial approach to the problem was to
find the subsets that are equally spaced, but I'm not sure how to deal
with the clumps; they expand my list of subsets exponentially.

Thanks for any ideas.

not_a_commie · Aug 14, 2011

• What does it mean for points to be "equally spaced"? Are you

saying that the magnitude of the difference between consecutive points
is the same?

Yes, that's exactly what I'm saying. As there is only one coordinate per point, the magnitude is a simple difference.

• If either of the first two guesses above are correct, what rule do
you have to deal with a sequence of points where the difference between
consecutive pairs of points is within your epsilon of each other, but
pairs of points at either end of the sequence have differences greater
than your epsilon? That is, do you consider only the consecutive
differences, or is it important to compare each difference to the first
difference found in the sequence of points for the set?

It's important that the epsilon is accumulated. The sum of the epsilons over the list of points may make the distance between the first and last outside of the distance multiplier (aka, Pn - P0 is outside n*D +/- E).

• What does it mean for points to be "clumped together"? What does
it mean to have "one of a clump in the result set"?

By "clump" I mean that it's possible for several points to be within the epsilon distance.

Marcel Müller · Aug 14, 2011

This sounds like a textbook problem, but I promise it's not for a
class. Rather, it's part of an audio filter I'm working on.

Problem statement: Given a collection of 1D points and a positive
intensity value at each point in the collection, find a subcollection
of the points such that they are all equally spaced (+/-E) and the
intensity sum is maximized.

Sounds like a non-linear fit problem.

Your model function is periodic in E. Maybe simply a sine wave:
y = A cos((x - P)/(2*Pi*E)) + B
You have to adjust the model parameters A (amplitude), B (level), P
(phase) and E (periodicity) so that the points are reproduced best (x =
point location, y = intensity).

I would recommend the Levenberg-Marquardt algorithm. Unfortunately the
periodicity in E makes the algorithm a bit unstable, because it could
lock at a harmonics or sub-harmonics of the periodicity E. So a good
starting value for E is helpful. If the periodicity E is a constant
rather than something you want to get out of the data, it gets quite
stable. The phase has to be normalized by a modulo E operation after
completion.

Seek for a implementation of the Levenberg-Marquardt algorithm in .NET.
You probably don't want to write this yourself. Sometimes it is called LMA.

Some of the points are clumped together,

This makes no difference if you use a non-linear fit.

Marcel

not_a_commie · Aug 15, 2011

So put another way, there are two ways for a consecutive point to be

included in the set thus far identified: the difference between it and
the previous point is within the epsilon of the difference between the
previous two points; _or_ the difference between it and the previous
point is itself less than the epsilon.

Yes to the former, no to the latter.

For example, if the input is something like this:

1, 2, 4, 6, 6, 8, 9
then you would identify the sequence of values 2, 4, 6, 6, 8 as a
qualifying set?

No, there would only be one 6 -- the one with the highest magnitude.

Then, the "intensity sum" of that sequence would be 20: 2 + 4 + 6 + 8
(i.e. the value of 6 is counted only once).

The intensity is actually a separate number not equal to the position.

I infer from Marcel's reply that he's actually familiar with the problem
domain, and while I haven't looked up the algorithm he suggests, assume
that his suggestion may be applicable. If so, then I suppose there's no
need to further explain the problem.

I'm looking into his suggestions. To give an example on the problem,
suppose the following pairs of (position, intensity) with an epsilon
of 0.11:

(3, 5), (5, 2), (5.9, 5.2), (6, 5), (7.1, 5), (9, 2), (10,3)

Possible equal spacings and their scores are:
A: 3, 5, 7.1, 9 = 14
B: 3, 5.9, 9 = 12.2
C: 3, 6, 9 = 12
D: 5, 5.9 = 7.2 // not 7.1 because 1.2 is outside distance of 1 plus
epsilon (and outside 0.9 + epsilon if that was the distance being
tested)
E: 5, 6, 7.1 = 12
F: 9, 10 = 5
G: 7, 10 = 8
.... other pair combinations

mind helping me find a good approach to this algorithm problem?

not_a_commie

not_a_commie

Marcel Müller

not_a_commie