Finding most common word in a list

G

General

Hi,

I have data organized in the following way:

Thousands of rows, each row has from 10-100 columns. In each cell is
text, either one or two words.

Within each row, is there an easy way to find which cells are repeated
the most?

So for example:
A1 = "America"
B1 = "Ecuador"
C1 = "Spain"
D1 = "America"

For row 1, I would want the macro to return "America".

Note that each row is totally independent of each other row.

Also, I need to run this on literally 250,000 rows with an avg of 50
entries per row...so I have to make the code as efficient as possible.
Please help!!!

Thanks,

Phil
 
R

R.VENKATARAMAN

in an empty cell type

=COUNTIF(1:1,"America")

try this and see whether you get what you want.

explanation is to find out number of times "america" comes in row 1.
you can do a loop in vba for each of the rows with the variable text to find
in different rows.
by the by can you get 250,000 rows in a sheet?
I though maximum number of rows is 65,536.
 
G

General

Thanks for the suggestion, but I was hoping for a more elegant way to
do it. This method would require a "countif" for each cell and then
additional processing after that. Any simpler ideas?

Also, you are correct that it's 65,536 rows max...my data spans 4
sheets. You can understand my need for simplicty....

Thanks,

Phil
 
R

R.VENKATARAMAN

I am giving you below a sub which you copy in vb editor and run
I have placed a few comments before the sub
the sub may not be elegant but the solution is quick and what you want.
CONFIRM you got what you want and there are no bugs
my email address is
(e-mail address removed)

regards.
conditions-there are only these sheets and
'no other sheets in the activeworkbook(I wonder whether this condition is
necessary???)
'when you type in inputbox names for e.g. america
'type correct spelling otherwise you wil get zero occasions

Public Sub test()
Dim sh As Worksheet
Dim c
Dim i As Integer
Dim n As Integer
n = 0
Dim name As String
name = InputBox("type your name")
line3:
For Each sh In ActiveWorkbook.Sheets
sh.Activate
Range("a1").Activate
On Error Resume Next
Cells.Find(what:=name, after:=ActiveCell).Activate
If Err.Number = 91 Then GoTo line1
c = ActiveCell.Address
n = n + 1
line2:
Cells.FindNext(after:=ActiveCell).Activate
If ActiveCell.Address = c Then GoTo line1
n = n + 1
GoTo line2
line1:
Next sh
line4:
MsgBox "number of occasions " & name & " occuring in all the sheets is " & n
End Sub
 
G

General

Thanks for your help thus far, but let me clear up my problem. Data
will be of the following form:
A1 = "America"
B1 = "Ecuador"
C1 = "Spain"
D1 = "America"

A2 = "Gerbil"
B2 = "Hamster"
C3 = "Goat"
D3 = "Gerbil"

My point is that each row will have different words, and I do not know
what the words will be in advance. So the program, for each row should
find whatever word appears the most. It has to do this automatically
for about 250,000 rows so this needs to be fast and not require any
user inputs.

Thanks in advance,

Phil
 
G

General

Actually, my problem can be simplified as follows:
Given a random amount of words (Anywhere from 1 to 50), find the word
that appears the most.

You don't know what the words will be ahead of time and it has to be
quick and automated and involved no user interaction.

Thanks,

Phil
 
R

R.VENKATARAMAN

I have started on your problem and have come to a stage.

please clairfy to me. do you mean you want ONLY the name which has maximum
occurences in a partiuclar sheet or in all the four sheets. . (only one name
in each sheet or only one name taken all thes sheets together) and natually
the no. of occasions.
 
R

R.VENKATARAMAN

your requirment appears, to a non expert like me, highly customised. I have
practically COMPLETED the task.
you reqire the the name which occurs maximum no. of occasions in each of the
sheets.
let me see whether I understood your requiremnts correctly
you have names in 65536 rows in each sheet(four sheets)
you dont know what the names are

you need the particular name which occurs maximum no. of times It may also
be that more than one name may occur same maximum number of times. this has
been taken intoaccount in my sub.
all your rows(65536) are filled.

1. But do you have any blank columns beyond the data. will you please tell
me the the column name where the blank columns start. I need these columns
to park some data temporarily and programatically.
i have done it for the only one sheet (sheet1) for the present
this can be easily extended to other sheets also.
this can be done only after I get confirmation from you that the programme
gives what your want.

2. what is your data base address e.g. A1 to BB65536. As i dont know your
database I have designed a small database with few rows and columns of names
(they are merely alphabets randomly selectd).


Of course it would be convenient if we exchange messages through email to
our perosnal email addresses. this is a home computer and not any
organisational or business computer. I am NOT a professional but purely
doing work on excel for my brother and friends and purely to keep myself
cerebrally active..

however if you do not feel comfortable with email messages then we shall
continue our messages through the newsgroup

regards
my eamil address
(e-mail address removed)
 
G

General

Hi, I would be happy to correspond via email, but Google cut out a
portion of your email address. my email is pakman36 and the provider is
aol.com.

To answer your question...
I have 4 sheets filled with data. Each sheet has entries in every row
from 1 to 65536. Each row has a varying number of entries (from 1 to
maybe 50).

For each ROW I need to find the most common word that appears in that
ROW. So approximately 260,000 times I need to scan a row and find out
the one word which appears the most. Some rows will have no
duplicates, and some rows will be all the same word. Howver most rows
will have a variety of different words.

Thanks in advance for your help. I hope this clears up the problem,
but if it doesn't, then please let me know.

Thanks,

Phil
 
R

R.VENKATARAMAN

thank you. this needs a change to my programme. i shall try my best. give
me couple of day. I thinkit can be solved. there is two more email addresses
of mine. you need the result i.e the name (most occurences) in EACH ROW.
That means you need 260,000 results.
am I right. you also must be needing the name and also how many occurences
or do you not want this inforamtion.

kind regarads.



(e-mail address removed)
(e-mail address removed)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top