Extract text from large Text

  • Thread starter Thread starter Guest
  • Start date Start date
G

Guest

I have large text like this: and I want just :

A-X000-1XS-1XSX1-OUK-5507 I want 1XS in a column and 1XSX1 in another
A-Y00-2XXA-2XXAX-OUK-5507 I want 2XXA in a column and 2XXAX in another
A-X000-MS-MSX-OUK-5502 I want MS in a column and MSX in another
A-Y000-LR-LRR-OUK-5503 I want LR in a column and LRR in another


please help,
thanks in advance
 
These formulas should do the trick. Assuming all of your data follows the
same format, and you want to extract the text between the 2nd and 3rd
hyphens, and between the 3rd and 4th hyphens.

=MID(A1,FIND("~",SUBSTITUTE(A1,"-","~",2))+1,FIND("~",SUBSTITUTE(A1,"-","~",3))-FIND("~",SUBSTITUTE(A1,"-","~",2))-1)

=MID(A1,FIND("~",SUBSTITUTE(A1,"-","~",3))+1,FIND("~",SUBSTITUTE(A1,"-","~",4))-FIND("~",SUBSTITUTE(A1,"-","~",3))-1)

HTH,
Elkar
 
You can use this array formula, although for some reason I thought it
would be more elegant than the hard-coded alternative. I'm assuming
your large text is in column A, then with B1:E1 selected, paste this
into the formula bar and press Ctrl+Shift+Enter (instead of just
Enter):

=TRANSPOSE(MID(A1,FIND(CHAR(200),SUBSTITUTE(A1,"-",CHAR(200),ROW($1:$4)))
+1,FIND(CHAR(200),SUBSTITUTE(A1,"-",CHAR(200),ROW($2:$5)))-
FIND(CHAR(200),SUBSTITUTE(A1,"-",CHAR(200),ROW($1:$4)))-1))

You will get this result:

A-X000-1XS-1XSX1-OUK-5507 X000 1XS 1XSX1 OUK
A-Y00-2XXA-2XXAX-OUK-5507 Y00 2XXA 2XXAX OUK
A-X000-MS-MSX-OUK-5502 X000 MS MSX OUK
A-Y000-LR-LRR-OUK-5503 Y000 LR LRR OUK

The inner four - separated elements are placed in individual cells.
This technique can be applied to any number of dashes, simply by
modifying the ROW($1:$4) reference. For example, if you use this
starting large text:

A-X000-1XS-1XSX1-OUK-5507-BY65K-1SX2M-00952

You use this formula:

=TRANSPOSE(MID(A1,FIND(CHAR(200),SUBSTITUTE(A1,"-",CHAR(200),ROW($1:$7)))
+1,FIND(CHAR(200),SUBSTITUTE(A1,"-",CHAR(200),ROW($2:$8)))-
FIND(CHAR(200),SUBSTITUTE(A1,"-",CHAR(200),ROW($1:$7)))-1))

You will get these elements in separate cells:

X000 1XS 1XSX1 OUK 5507 BY65K 1SX2M


One problem with this approach is potential errors. Instead of coding
ISERROR() type formulas, you can instead count the number of dashes
before you begin, like this:

=MID(A1,FIND(CHAR(200),SUBSTITUTE(A1,"-",CHAR(200),ROW(INDIRECT("1:"&(LEN(A1)-
LEN(SUBSTITUTE(A1,"-","")))-1))))
+1,FIND(CHAR(200),SUBSTITUTE(A1,"-",CHAR(200),ROW(INDIRECT("2:"&(LEN(A1)-
LEN(SUBSTITUTE(A1,"-","")))))))-
FIND(CHAR(200),SUBSTITUTE(A1,"-",CHAR(200),ROW(INDIRECT("1:"&(LEN(A1)-
LEN(SUBSTITUTE(A1,"-",""))-1)))))-1)

This will only results in rows (you have to array-enter it in cells
J1:J7, for example). The column equivalent formula will only work in
Excel 2007, due to the number of nested functions:

=TRANSPOSE(MID(A1,FIND(CHAR(200),SUBSTITUTE(A1,"-",CHAR(200),ROW(INDIRECT("1:"&(LEN(A1)-
LEN(SUBSTITUTE(A1,"-","")))-1))))
+1,FIND(CHAR(200),SUBSTITUTE(A1,"-",CHAR(200),ROW(INDIRECT("2:"&(LEN(A1)-
LEN(SUBSTITUTE(A1,"-","")))))))-
FIND(CHAR(200),SUBSTITUTE(A1,"-",CHAR(200),ROW(INDIRECT("1:"&(LEN(A1)-
LEN(SUBSTITUTE(A1,"-",""))-1)))))-1))

Anyway, sounds like you already got your answer, but I thought you
might find this useful.
 
Back
Top