List Contains Words

Shapper

Hello,

I have a List<Post> Posts where Post object is:

public class Post {
    public Int32 Id { get; set; }
    public String Description { get; set; }
}

I have a String[] that holds 2 to 4 words. I would like to find the posts
whose Description contains at least one of these words.

I think I should use LINQ's Contains on Description, but I am not sure
how to handle the fact that I need to check 2, 3 or 4 words.

I would also like the comparison to use the invariant culture and to
ignore case and accents, so that:

Camões = camões = camoes = CAMOES

How can I do this?

Thank You,
Miguel
 
Arne Vajhøj

Maybe something like:

postlist.FindAll(p => arr.Contains(p.Description));

I have not tested, but ...

Arne
 
Arne Vajhøj

Maybe something like:

postlist.FindAll(p => arr.Contains(p.Description));

I read his question to mean he wants the search in the other direction.
That is, he wants to know whether p.Description contains any of the
strings in "arr".

For that, a LINQ syntax might be:

var result = from post in postlist
             where arr.Any(str => post.Description.Contains(str))
             select post;

I have also not tested :). But I think it's more likely to address the
original request. Note that the search is O(m*n), which is not very
efficient. But for small lists of words to check and/or small lists of
Post objects, it's probably okay.
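
For reference, here is a self-contained sketch of that substring search. It is only an illustration under some assumptions: the class name PostSearch and the method name ContainingAny are made up for the example, and IndexOf with StringComparison.InvariantCultureIgnoreCase is swapped in for Contains so that case is ignored. Accents are still significant here, and "cat" would match inside "concatenate".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Post
{
    public int Id { get; set; }
    public string Description { get; set; }
}

// Hypothetical helper class, not from the thread.
public static class PostSearch
{
    // Returns the posts whose Description contains at least one of the
    // given words as a substring, ignoring case (but not accents).
    public static List<Post> ContainingAny(IEnumerable<Post> posts, string[] words)
    {
        return posts
            .Where(p => p.Description != null &&
                        // Any stops at the first word that matches.
                        words.Any(w => p.Description.IndexOf(
                            w, StringComparison.InvariantCultureIgnoreCase) >= 0))
            .ToList();
    }
}
```

Any stops checking words as soon as one matches, but every post is still scanned once per word in the worst case, which is the O(m*n) cost mentioned above.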

My idea was definitely not good.

Description can obviously contain more than one word.

Your LINQ handles multi word descriptions.

But it is actually more difficult, because we need
to match whole words instead of substrings, and the
comparison needs to be invariant-culture and case-insensitive.

After some testing I ended up with this:

lst.Where(p => arr.Intersect(p.Description.Split(' '),
    new InvariantCaseInsenstiveStringComparion()).Count() > 0)

where:

public class InvariantCaseInsenstiveStringComparion : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {
        return string.Compare(x, y,
            StringComparison.InvariantCultureIgnoreCase) == 0;
    }

    public int GetHashCode(string obj)
    {
        return obj.ToLowerInvariant().GetHashCode();
    }
}
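
One caveat: InvariantCultureIgnoreCase ignores case but not accents, so the comparer above still treats "Camões" and "camoes" as different words. Below is a rough sketch of a comparer that folds accents as well. The class name AccentAndCaseInsensitiveComparer and the helper Fold are hypothetical names for this example, and the approach (decompose, drop combining marks, upper-case) is one common technique rather than anything proposed in the thread.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical comparer: treats strings as equal when they differ only
// by case or by combining accents.
public class AccentAndCaseInsensitiveComparer : IEqualityComparer<string>
{
    // Builds a comparison key: decompose accented letters, drop the
    // combining marks, then upper-case with the invariant culture.
    public static string Fold(string s)
    {
        if (s == null) return null;
        string decomposed = s.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }
        return sb.ToString().Normalize(NormalizationForm.FormC).ToUpperInvariant();
    }

    public bool Equals(string x, string y)
    {
        return string.Equals(Fold(x), Fold(y), StringComparison.Ordinal);
    }

    // Hashes the same folded key that Equals compares, so equal strings
    // always get equal hash codes.
    public int GetHashCode(string obj)
    {
        return obj == null ? 0 : Fold(obj).GetHashCode();
    }
}
```

It can be passed to Intersect in place of the comparer above. Because Equals and GetHashCode both work on the same folded key, they cannot disagree with each other.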

Arne
 
Arne Vajhøj

[...]
Your LINQ handles multi word descriptions.

But it is actually more difficult, because we need
to match whole words instead of substrings, and the
comparison needs to be invariant-culture and case-insensitive.

Yes, my example was strictly intended to demonstrate the LINQ aspect
(which Miguel is always seeking).

After some testing I ended up with this:

lst.Where(p => arr.Intersect(p.Description.Split(' '),
    new InvariantCaseInsenstiveStringComparion()).Count() > 0)

Unfortunately, we don't have a clear definition of "word" or the format
of the Description property. The above will fail for descriptions where
words may be adjacent to punctuation, for example. There may be other
word delimiters that are valid too.

True.

But my code is easily extensible in that regard.

Just replace:

p.Description.Split(' ')

with:

p.Description.Split(" ,.:;!?", StringSplitOptions.RemoveEmptyEntries)

or however many word delimiters there may be.

And of course, while it's reasonably readable, it's not very efficient
(even less so than the version I posted, which itself wasn't all that
efficient either), because it doesn't take advantage of the fact that we
only need to find the first match, not all of them.

If performance is an issue then other data structures are needed.
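
As a rough sketch of what that could look like: put the search words in a HashSet once, then stop at the first word of each description that is found in the set. The names WordSearch and WithAnyWord are hypothetical, the delimiter list is just the one from this thread, and StringComparer.InvariantCultureIgnoreCase ignores case only, not accents.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Post
{
    public int Id { get; set; }
    public string Description { get; set; }
}

// Hypothetical helper class, not from the thread.
public static class WordSearch
{
    // Delimiter set taken from the thread; extend it as needed.
    private static readonly char[] Delimiters = " ,.:;!?".ToCharArray();

    public static List<Post> WithAnyWord(IEnumerable<Post> posts, string[] words)
    {
        // Build the lookup set once instead of once per post.
        var wanted = new HashSet<string>(words, StringComparer.InvariantCultureIgnoreCase);

        return posts
            .Where(p => p.Description != null &&
                        p.Description
                         .Split(Delimiters, StringSplitOptions.RemoveEmptyEntries)
                         // Any stops at the first word found in the set.
                         .Any(w => wanted.Contains(w)))
            .ToList();
    }
}
```

Each description is still split in full, so this only avoids the repeated set building and the search for further matches after the first one.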

Arne
 
Arne Vajhøj

[...]

Just replace:

p.Description.Split(' ')

with:

p.Description.Split(" ,.:;!?", StringSplitOptions.RemoveEmptyEntries)

Correction: insert

.ToCharArray()

after the string literal, so that each character in it is treated as a
separate delimiter. Add however many word delimiters there may be.
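
Put together, the corrected split might look like the sketch below. The names Tokenizer and SplitWords are made up for the example.

```csharp
using System;

// Hypothetical helper class, not from the thread.
public static class Tokenizer
{
    // Each character of the literal becomes one delimiter.
    private static readonly char[] Delimiters = " ,.:;!?".ToCharArray();

    // Splits a description into words, dropping the empty entries that
    // appear when two delimiters are adjacent (for example ", ").
    public static string[] SplitWords(string description)
    {
        return description.Split(Delimiters, StringSplitOptions.RemoveEmptyEntries);
    }
}
```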


If performance is an issue then other data structures are needed.

Arne
 
