String parsing with variable lenght strings

  • Thread starter Thread starter Guest
  • Start date Start date
G

Guest

I am trying to split up a cell into numbers and charachters and place them in
separate columns, but the lenght of the number part varies as does the
content of the character part.

For example, one cell could be 5#, 10Tins, 4 lb, 100Pcs, etc.
I would like to be able to pull out the #, Tins, lb, and Pcs in to their
own column.

I have found this formula:
=LEFT(A1,FIND("-",A1,1)-1)

but it assumes some level of consistency, the "-" in the cell.

Any ideas appreciated.

Robert
 
Perhaps something like this:

For a value in A1

The numeric (left) part:
B1:
=--LEFT(A1,MATCH(FALSE,ISNUMBER(--MID(A1,ROW($A$1:INDEX(A:A,LEN(A1))),1)),0)-1)

The unit of measure (right) part:
C1:
=TRIM(MID(A1,MATCH(FALSE,ISNUMBER(--MID(A1,ROW($A$1:INDEX(A:A,LEN(A1))),1)),0),255))

Note: Commit those array formulas by holding down the [Ctrl][Shift] keys
when you press [Enter].

Copy those formulas down as far as you need.

Does that help?

***********
Regards,
Ron

XL2002, WinXP-Pro
 
It's a bit ugly, but it seems to work. Assuming your values start in
A1 and go down, the following formula in cell B1 will break out the
number.

=IF(ISERR(VALUE(LEFT(A1,2))),VALUE(LEFT(A1,1)),IF(ISERR(VALUE(LEFT(A1,3))),VALUE(LEFT(A1,2)),IF(ISERR(VALUE(LEFT(A1,4))),VALUE(LEFT(A1,3)),IF(ISERR(VALUE(LEFT(A1,5))),VALUE(LEFT(A1,4)),"Check
num length"))))

This will test the first two characters, then the first 3, then the
first 4 etc. to see if it is a number, and when it hits a length that
includes an alpha, it will drop down a character and return the value.
I stopped at 4 digits, but obviously you can extend it as much as you
need.

Then you can use the following formula in cell C1 to give you the text.

=RIGHT(A1,LEN(A1)-LEN(B1))

Note: The second formula will include a space if there is one. You
may want to remove all spaces before you begin to avoid that.

Hope that helps...
 
I am trying to split up a cell into numbers and charachters and place them in
separate columns, but the lenght of the number part varies as does the
content of the character part.

For example, one cell could be 5#, 10Tins, 4 lb, 100Pcs, etc.
I would like to be able to pull out the #, Tins, lb, and Pcs in to their
own column.

I have found this formula:
=LEFT(A1,FIND("-",A1,1)-1)

but it assumes some level of consistency, the "-" in the cell.

Any ideas appreciated.

Robert

Ron C.'s formulas should work.

For a different approach, and to gain access to a number of other useful
functions, you could download and install Longre's free morefunc.xll add-in
from http://xcell05.free.fr/

Then use the regular expression formulas:

For the number:
=REGEX.MID(A1,"[0-9]+")

(means return the first series of characters that are in the class [0-9].

For the units:
=REGEX.MID(A1,"[^0-9 ]+")

(means return the first series of characters that are not in the class [0-9 ]
-- note that there is a <space> included in the expression, so that any space
in between the number and the units will not be returned).


--ron
 
frosterrj wrote...
I am trying to split up a cell into numbers and charachters and place them in
separate columns, but the lenght of the number part varies as does the
content of the character part.

For example, one cell could be 5#, 10Tins, 4 lb, 100Pcs, etc.
I would like to be able to pull out the #, Tins, lb, and Pcs in to their
own column.
....

Does each such record contain only 4 fields? And does every field
contain both initial numbers followed by text? And are all fields
separated by commas and possibly spaces?

If so, then define a name like seq referring to

=ROW(INDEX(Sheet1!$A:$A,1):INDEX(Sheet1!$A:$A,256))

Then if your record were in cell A1, either copy it into cell A2 then
select A2:A3 and run Edit > Replace to replace all spaces with nothing,
thus deleting all spaces, or enter the following formula in A2.

A2:
=SUBSTITUTE(A1," ","")

Then try these formulas.

A3:
=--LEFT(TRIM(A2),LOOKUP(2,1/ISNUMBER(-MID(A2,1,seq)),seq))

B3:
=MID($A2,SUMPRODUCT(LEN($A3:A3))+COUNT($A3:A3),FIND(",",$A2&",",
SUMPRODUCT(LEN($A3:A3))+COUNT($A3:A3))-SUMPRODUCT(LEN($A3:A3))
-COUNT($A3:A3))

C3:
=--MID($A2,SUMPRODUCT(LEN($A3:B3))+COUNT($A3:B3,1),
LOOKUP(2,1/ISNUMBER(-MID($A2,SUMPRODUCT(LEN($A3:B3))
+COUNT($A3:B3,1),seq)),seq))

Copy B3 and paste into D3, then copy C3:D3 and paste into E3:F3, G3:H3,
etc.

Alternatively, download and install Laurent Longre's MOREFUNC.XLL
add-in, freely available from http://xcell05.free.fr/english/ , and use
its REGEX.MID and REGEX.SUBSTITUTE functions like so.

A3:
=--REGEX.MID($A1,"\d*\.?\d+",(COLUMNS($A3:A3)+1)/2)

B3:
=REGEX.SUBSTITUTE($A1,"([^,]+, *){"&(COLUMNS($A4:B4)/2-1)&
"}\d*\.?\d+ *([^,]+).*","[2]")

Copy A3:B3 and paste into C3:D3, E3:F3, G3:H3, etc.
 
Ron Rosenfeld wrote...
....
Then use the regular expression formulas:

For the number:
=REGEX.MID(A1,"[0-9]+")

It works, but \d+ is more compact.
(means return the first series of characters that are in the class [0-9].

For the units:
=REGEX.MID(A1,"[^0-9 ]+")

It picks up the commas as well. Those could be excluded with a positive
lookahead assertion and a comma appended to the string.

B3:
=REGEX.MID($A1&",","[^0-9 ]+(?=,)",COLUMNS($A3:B3)/2)
 
Actually, I am having the most trouble because the numbers can contain
decimals:
7.63 lb, 66.5 oz etc.
Seems to be stopping at the '.' .

Harlan Grove said:
frosterrj wrote...
I am trying to split up a cell into numbers and charachters and place them in
separate columns, but the lenght of the number part varies as does the
content of the character part.

For example, one cell could be 5#, 10Tins, 4 lb, 100Pcs, etc.
I would like to be able to pull out the #, Tins, lb, and Pcs in to their
own column.
....

Does each such record contain only 4 fields? And does every field
contain both initial numbers followed by text? And are all fields
separated by commas and possibly spaces?

If so, then define a name like seq referring to

=ROW(INDEX(Sheet1!$A:$A,1):INDEX(Sheet1!$A:$A,256))

Then if your record were in cell A1, either copy it into cell A2 then
select A2:A3 and run Edit > Replace to replace all spaces with nothing,
thus deleting all spaces, or enter the following formula in A2.

A2:
=SUBSTITUTE(A1," ","")

Then try these formulas.

A3:
=--LEFT(TRIM(A2),LOOKUP(2,1/ISNUMBER(-MID(A2,1,seq)),seq))

B3:
=MID($A2,SUMPRODUCT(LEN($A3:A3))+COUNT($A3:A3),FIND(",",$A2&",",
SUMPRODUCT(LEN($A3:A3))+COUNT($A3:A3))-SUMPRODUCT(LEN($A3:A3))
-COUNT($A3:A3))

C3:
=--MID($A2,SUMPRODUCT(LEN($A3:B3))+COUNT($A3:B3,1),
LOOKUP(2,1/ISNUMBER(-MID($A2,SUMPRODUCT(LEN($A3:B3))
+COUNT($A3:B3,1),seq)),seq))

Copy B3 and paste into D3, then copy C3:D3 and paste into E3:F3, G3:H3,
etc.

Alternatively, download and install Laurent Longre's MOREFUNC.XLL
add-in, freely available from http://xcell05.free.fr/english/ , and use
its REGEX.MID and REGEX.SUBSTITUTE functions like so.

A3:
=--REGEX.MID($A1,"\d*\.?\d+",(COLUMNS($A3:A3)+1)/2)

B3:
=REGEX.SUBSTITUTE($A1,"([^,]+, *){"&(COLUMNS($A4:B4)/2-1)&
"}\d*\.?\d+ *([^,]+).*","[2]")

Copy A3:B3 and paste into C3:D3, E3:F3, G3:H3, etc.
 
frosterrj wrote...
Actually, I am having the most trouble because the numbers can contain
decimals:
7.63 lb, 66.5 oz etc.
Seems to be stopping at the '.' .

What is? Rather, which formulas are? The one I proposed below allow for
decimal fractions.
.....

And with the following modified string in cell A1

5.5#, 10Tins, 4 lb, 100.23Pcs

and the following defined name
....

And the following intermediate formula

The following formulas
....

produce the following results (A3:H3).

5.5 # 10 Tins 4 lb 100.23 Pcs

And the following formulas
A3:
=--REGEX.MID($A1,"\d*\.?\d+",(COLUMNS($A3:A3)+1)/2)

B3:
=REGEX.SUBSTITUTE($A1,"([^,]+, *){"&(COLUMNS($A4:B4)/2-1)&
"}\d*\.?\d+ *([^,]+).*","[2]")

Copy A3:B3 and paste into C3:D3, E3:F3, G3:H3, etc.

produce the same,

5.5 # 10 Tins 4 lb 100.23 Pcs

Doesn't appear to stop at decimal fractions. Or are you referring to
some of the other responses that took your one sample record as
exhaustively indicative of your data?
 
Ron Rosenfeld wrote...
...
Then use the regular expression formulas:

For the number:
=REGEX.MID(A1,"[0-9]+")

It works, but \d+ is more compact.

I phrased it this way for clarity, especially when comparing with the second
formula.
(means return the first series of characters that are in the class [0-9].

For the units:
=REGEX.MID(A1,"[^0-9 ]+")

It picks up the commas as well. Those could be excluded with a positive
lookahead assertion and a comma appended to the string.

For some reason, I thought each combination of value/units was in a separate
cell. If so, commas would not be an issue.
--ron
 
I am trying to split up a cell into numbers and charachters and place them in
separate columns, but the lenght of the number part varies as does the
content of the character part.

For example, one cell could be 5#, 10Tins, 4 lb, 100Pcs, etc.
I would like to be able to pull out the #, Tins, lb, and Pcs in to their
own column.

I have found this formula:
=LEFT(A1,FIND("-",A1,1)-1)

but it assumes some level of consistency, the "-" in the cell.

Any ideas appreciated.

Robert

Ron C.'s formulas should work.

For a different approach, and to gain access to a number of other useful
functions, you could download and install Longre's free morefunc.xll add-in
from http://xcell05.free.fr/

Then use the regular expression formulas:

For the number:
=REGEX.MID(A1,"[0-9]+")

(means return the first series of characters that are in the class [0-9].

For the units:
=REGEX.MID(A1,"[^0-9 ]+")

(means return the first series of characters that are not in the class [0-9 ]
-- note that there is a <space> included in the expression, so that any space
in between the number and the units will not be returned).


--ron

Having read that some of your data may optionally include decimal numbers, I
would change my previous recommendation to:

Number:

=REGEX.MID(A1,"[0-9]+(\.[0-9]+)?")

This assumes that all decimal digits have a number before the decimal point.
If that is not the case, it can be easily modified.

Units:

=REGEX.MID(A1,"[^0-9. ]+")

This assumes there are not "dots" in the units, as in your examples. Again, if
there are, the expression can be modified to include them.


--ron
 
Ron Rosenfeld wrote...
....
Having read that some of your data may optionally include decimal numbers, I
would change my previous recommendation to:

Number:

=REGEX.MID(A1,"[0-9]+(\.[0-9]+)?")

This assumes that all decimal digits have a number before the decimal point.
If that is not the case, it can be easily modified.

Units:

=REGEX.MID(A1,"[^0-9. ]+")

This assumes there are not "dots" in the units, as in your examples. Again, if
there are, the expression can be modified to include them.

It gets more difficult if the numbers could include 0. and .0 and if
the units could include periods and numerals, e.g., 20 reams A4, 5
boxes .22 shells.

With just built-in formulas referencing the original record in A1 plus
the name seq as I defined it earlier,

A3:
=LEFT(A1,LOOKUP(2,1/ISNUMBER(-MID(A1,1,seq)),seq))

B3:
=MID($A1,SUMPRODUCT(LEN($A3:A3))+COLUMNS($A3:B3)/2,
FIND(",",$A1&",",SUMPRODUCT(LEN($A3:A3))+COLUMNS($A3:B3)/2)
-SUMPRODUCT(LEN($A3:A3))-COLUMNS($A3:B3)/2)

C3:
=MID($A1,SUMPRODUCT(LEN($A3:B3))+1+COLUMNS($A3:B3)/2,
LOOKUP(2,1/ISNUMBER(-MID($A1,SUMPRODUCT(LEN($A3:B3))+1
+COLUMNS($A3:B3)/2,seq)),seq))

Copy B3 and paste into D3, then copy C3:D3 and paste into E3:F3, G3:H3,
etc. Note that the number portions are text in the formulas above.

With MOREFUNC.XLL, divide and conquer.

A3:
=--REGEX.MID(REGEX.MID($A1,"[^,]+",(COLUMNS($A3:A3)+1)/2),"\d+(\.\d*)?|\.\d+")

B3:
=REGEX.SUBSTITUTE(REGEX.MID($A1,"[^,]+",COLUMNS($A3:B3)/2),
" *"&REPLACE(A3,2,0,IF(LEFT(A3,1)="0","?",""))&" *","")

Copy A3:B3 and paste into C3:D3, E3:F3, etc.

Note: I'm assuming the OP's records are in one cell given the OP's
statement: "one cell could be 5#, 10Tins, 4 lb, 100Pcs, etc." I'm
interpretting the 'etc.' to mean there could be more fields in the
record, not that each of these be in a separate column.
 
Ron Rosenfeld wrote...
...
Having read that some of your data may optionally include decimal numbers, I
would change my previous recommendation to:

Number:

=REGEX.MID(A1,"[0-9]+(\.[0-9]+)?")

This assumes that all decimal digits have a number before the decimal point.
If that is not the case, it can be easily modified.

Units:

=REGEX.MID(A1,"[^0-9. ]+")

This assumes there are not "dots" in the units, as in your examples. Again, if
there are, the expression can be modified to include them.

It gets more difficult if the numbers could include 0. and .0 and if
the units could include periods and numerals, e.g., 20 reams A4, 5
boxes .22 shells.

With just built-in formulas referencing the original record in A1 plus
the name seq as I defined it earlier,

A3:
=LEFT(A1,LOOKUP(2,1/ISNUMBER(-MID(A1,1,seq)),seq))

B3:
=MID($A1,SUMPRODUCT(LEN($A3:A3))+COLUMNS($A3:B3)/2,
FIND(",",$A1&",",SUMPRODUCT(LEN($A3:A3))+COLUMNS($A3:B3)/2)
-SUMPRODUCT(LEN($A3:A3))-COLUMNS($A3:B3)/2)

C3:
=MID($A1,SUMPRODUCT(LEN($A3:B3))+1+COLUMNS($A3:B3)/2,
LOOKUP(2,1/ISNUMBER(-MID($A1,SUMPRODUCT(LEN($A3:B3))+1
+COLUMNS($A3:B3)/2,seq)),seq))

Copy B3 and paste into D3, then copy C3:D3 and paste into E3:F3, G3:H3,
etc. Note that the number portions are text in the formulas above.

With MOREFUNC.XLL, divide and conquer.

A3:
=--REGEX.MID(REGEX.MID($A1,"[^,]+",(COLUMNS($A3:A3)+1)/2),"\d+(\.\d*)?|\.\d+")

B3:
=REGEX.SUBSTITUTE(REGEX.MID($A1,"[^,]+",COLUMNS($A3:B3)/2),
" *"&REPLACE(A3,2,0,IF(LEFT(A3,1)="0","?",""))&" *","")

Copy A3:B3 and paste into C3:D3, E3:F3, etc.

Note: I'm assuming the OP's records are in one cell given the OP's
statement: "one cell could be 5#, 10Tins, 4 lb, 100Pcs, etc." I'm
interpretting the 'etc.' to mean there could be more fields in the
record, not that each of these be in a separate column.

Yes, the "best" solution often depends critically on the nature of the data
being analyzed. You've got a lot more experience than I on generalizing from
incomplete data specifications. That's very useful in these NG's.


--ron
 
I was using this suggestion from above
=IF(ISERR(VALUE(LEFT(A1,2))),VALUE(LEFT(A1,1)),IF(ISERR(VALUE(LEFT(A1,3))),VALUE(LEFT(A1,2)),IF(ISERR(VALUE(LEFT(A1,4))),VALUE(LEFT(A1,3)),IF(ISERR(VALUE(LEFT(A1,5))),VALUE(LEFT(A1,4)),"Check
num length"))))

I think I misunderstood what you were telling me below (too late in the day
for really complex formulas!).

Robert

Harlan Grove said:
frosterrj wrote...
Actually, I am having the most trouble because the numbers can contain
decimals:
7.63 lb, 66.5 oz etc.
Seems to be stopping at the '.' .

What is? Rather, which formulas are? The one I proposed below allow for
decimal fractions.
.....

And with the following modified string in cell A1

5.5#, 10Tins, 4 lb, 100.23Pcs

and the following defined name
....

And the following intermediate formula

The following formulas
....

produce the following results (A3:H3).

5.5 # 10 Tins 4 lb 100.23 Pcs

And the following formulas
A3:
=--REGEX.MID($A1,"\d*\.?\d+",(COLUMNS($A3:A3)+1)/2)

B3:
=REGEX.SUBSTITUTE($A1,"([^,]+, *){"&(COLUMNS($A4:B4)/2-1)&
"}\d*\.?\d+ *([^,]+).*","[2]")

Copy A3:B3 and paste into C3:D3, E3:F3, G3:H3, etc.

produce the same,

5.5 # 10 Tins 4 lb 100.23 Pcs

Doesn't appear to stop at decimal fractions. Or are you referring to
some of the other responses that took your one sample record as
exhaustively indicative of your data?
 
Nope, my data is just one type per cell:
9.5 OZ
30#
#10 TIN
1 GAL
38#
the basic regex.mid works.

Thanks for the pointers to the morefunc.xls. Got it an am using now.

Robert

Harlan Grove said:
Ron Rosenfeld wrote...
....
Having read that some of your data may optionally include decimal numbers, I
would change my previous recommendation to:

Number:

=REGEX.MID(A1,"[0-9]+(\.[0-9]+)?")

This assumes that all decimal digits have a number before the decimal point.
If that is not the case, it can be easily modified.

Units:

=REGEX.MID(A1,"[^0-9. ]+")

This assumes there are not "dots" in the units, as in your examples. Again, if
there are, the expression can be modified to include them.

It gets more difficult if the numbers could include 0. and .0 and if
the units could include periods and numerals, e.g., 20 reams A4, 5
boxes .22 shells.

With just built-in formulas referencing the original record in A1 plus
the name seq as I defined it earlier,

A3:
=LEFT(A1,LOOKUP(2,1/ISNUMBER(-MID(A1,1,seq)),seq))

B3:
=MID($A1,SUMPRODUCT(LEN($A3:A3))+COLUMNS($A3:B3)/2,
FIND(",",$A1&",",SUMPRODUCT(LEN($A3:A3))+COLUMNS($A3:B3)/2)
-SUMPRODUCT(LEN($A3:A3))-COLUMNS($A3:B3)/2)

C3:
=MID($A1,SUMPRODUCT(LEN($A3:B3))+1+COLUMNS($A3:B3)/2,
LOOKUP(2,1/ISNUMBER(-MID($A1,SUMPRODUCT(LEN($A3:B3))+1
+COLUMNS($A3:B3)/2,seq)),seq))

Copy B3 and paste into D3, then copy C3:D3 and paste into E3:F3, G3:H3,
etc. Note that the number portions are text in the formulas above.

With MOREFUNC.XLL, divide and conquer.

A3:
=--REGEX.MID(REGEX.MID($A1,"[^,]+",(COLUMNS($A3:A3)+1)/2),"\d+(\.\d*)?|\.\d+")

B3:
=REGEX.SUBSTITUTE(REGEX.MID($A1,"[^,]+",COLUMNS($A3:B3)/2),
" *"&REPLACE(A3,2,0,IF(LEFT(A3,1)="0","?",""))&" *","")

Copy A3:B3 and paste into C3:D3, E3:F3, etc.

Note: I'm assuming the OP's records are in one cell given the OP's
statement: "one cell could be 5#, 10Tins, 4 lb, 100Pcs, etc." I'm
interpretting the 'etc.' to mean there could be more fields in the
record, not that each of these be in a separate column.
 
Nope, my data is just one type per cell:
9.5 OZ
30#
#10 TIN
1 GAL
38#
the basic regex.mid works.

Thanks for the pointers to the morefunc.xls. Got it an am using now.

Robert

Glad you've got it working.

By the way, if you need to distribute your workbook, and if you installed
morefunc using the suggested defaults, there is an option on the Tools Menu to
save morefunc with the workbook.


--ron
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Back
Top