extracting numbers from variable text

J

JulianActon

hi

I am trying to extract a numeric value from a column of text that does
not always have any relevant data. Typical column entries are like
this:

Employees: 46 | Employee Growth: -4.17%
Employees: 3336 | Employee Growth: -5.66%
Employees: 1700 | Employee Growth: %

but in this mixed bag of data there are many odd entries of free text
e.g.
Two areas are using xx. Production studio's . . . . .

The data I want to extract is the number of employees, which can be
anything between a 1 and 6 digit number. I can do a basic extraction
using =MID(cellref, 12,6) to get the raw information, but this will not
always return a number. I need to remove the junk and just be left with
numbers in a new column.

If someone could give me some pointers here. Will I need to use VBA?

thanks
 
D

Dave O

I don't think you'll need to use VBA- depending on the way cells are
set up you can write an IF statement that only returns a numeric value.
In your example
Employees: 46 | Employee Growth: -4.17%
is the | meant to be a column border?

If yes, then try modifying your MID formula to
=VALUE(MID(A1,FIND(":",A1,1)+2,LEN(A1)))
.... where A1 is the Employees: cell. This will extract just the
numeric portion of that cell.

However, since you indicate that this formula is applied to cells that
do not contain a number, you can modify it to
=IF(ISNUMBER(VALUE(MID(A1,FIND(":",A1,1)+2,LEN(A1)))),VALUE(MID(A1,FIND(":",A1,1)+2,LEN(A1))),"")

You might also try an IF that applies the formula only to those cells
that contain the word "Employee":
=IF(ISNUMBER(FIND("Employee",A1,1)),VALUE(MID(A1,FIND(":",A1,1)+2,LEN(A1))),"")
 
R

Ron Rosenfeld

hi

I am trying to extract a numeric value from a column of text that does
not always have any relevant data. Typical column entries are like
this:

Employees: 46 | Employee Growth: -4.17%
Employees: 3336 | Employee Growth: -5.66%
Employees: 1700 | Employee Growth: %

but in this mixed bag of data there are many odd entries of free text
e.g.
Two areas are using xx. Production studio's . . . . .

The data I want to extract is the number of employees, which can be
anything between a 1 and 6 digit number. I can do a basic extraction
using =MID(cellref, 12,6) to get the raw information, but this will not
always return a number. I need to remove the junk and just be left with
numbers in a new column.

If someone could give me some pointers here. Will I need to use VBA?

thanks

If your format with regard to the Employees is always the same, then it appears
as if your number of employees will always be located between the 1st and 2nd
spaces in your string.

That being the case, the following formula will extract that number for any
number of employees:

=TRIM(MID(TRIM(A1),FIND(" ",TRIM(A1)),FIND(
" ",TRIM(MID(A1,FIND(" ",TRIM(A1)),1024)))))

This extracts the number as TEXT. If you require that the number be numeric,
then prepend a double unary.

=--TRIM(MID(TRIM(A1),FIND(" ",TRIM(A1)),FIND(
" ",TRIM(MID(A1,FIND(" ",TRIM(A1)),1024)))))

If your entries have greater variation, then post back with more data.


--ron
 
J

JulianActon

Gents, many thanks for your suggestions.

Have tried both solutions, both give #Value! when the data is not
“Employees 12345 . . . . etc”

The | is not a column border, it’s just part of the text.

Am currently trying to diagnose why am still getting #value!
 
R

Ron Rosenfeld

Gents, many thanks for your suggestions.

Have tried both solutions, both give #Value! when the data is not
“Employees 12345 . . . . etc”

The | is not a column border, it’s just part of the text.

Am currently trying to diagnose why am still getting #value!

As I posted in my response with regard to your data format, "If your entries
have greater variation, then post back with more data".

You are getting the VALUE error because the data is not in the format that you
posted.

It's hard for us to guess at what your various formats might be. Unless you
share that with us, coming up with a solution will be very difficult and time
consuming.

So give examples of the various formats of these entries.


--ron
 
J

JulianActon

Apologies.

This is the contents of a standard cell:
Employees: 1516 | Employee Growth: 9.38%

- these cells are pretty consistent, the main variant here is
Employees: 20 | Employee Growth: %
ie no figure shown for the % growth, but I'm not interested in that
data.

The other data in this column is of varying length, from between 25 and
600 chars, and is made up of free text. Here's an example:
David xxxxxxxx is no longer CIO, but Corporate Responsibility. However
he will pass the mail onto the CTO Kevin and his secretary Lynn. 020
1111 4444.

Or, the cell is completely empty.
 
R

Ron Rosenfeld

Apologies.

This is the contents of a standard cell:
Employees: 1516 | Employee Growth: 9.38%

- these cells are pretty consistent, the main variant here is
Employees: 20 | Employee Growth: %
ie no figure shown for the % growth, but I'm not interested in that
data.

The other data in this column is of varying length, from between 25 and
600 chars, and is made up of free text. Here's an example:
David xxxxxxxx is no longer CIO, but Corporate Responsibility. However
he will pass the mail onto the CTO Kevin and his secretary Lynn. 020
1111 4444.

Or, the cell is completely empty.

If it is TRUE that cells from which you do NOT want to extract data do NOT have
"Employees: " in them, then:

=IF(ISERROR(FIND("Employees: ",A1)),0,
--TRIM(MID(TRIM(A1),FIND(" ",TRIM(A1)),
FIND(" ",TRIM(MID(A1,FIND(" ",TRIM(A1)),1024))))))

will return zero for lines that do not contain that string.

If we have to do more sophisticated pattern matching, that is possible also,
but would be best done by using the morefunc.xll addin and regular expressions.


--ron
 
J

JulianActon

Thank you very much Ron. That works.

I have learnt a lot from this.

regards

Julian
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top