Extracting text

  • Thread starter Thread starter Guest
  • Start date Start date
G

Guest

I am copying a number of columns from a PDF file and pasting in Excel. It
pastes with spaces between words and between columns. The columns look like
this:
# TEXT (between 2 and 6 words) # # # # # (where each # means there is a
number in that column.)

i.e.
103 Deer Park CDP 23 476 9,435 334 234
103 Harbor on the Hill Village 34 543 12,345 332 555

Converting text to columns is a problem since there are differences in the
number of words. What I would like to know is:

Is there a way to first extract all the text from each line and put it into
another column and then I can do convert text to columns on the rest of it.
 
Try this:

A1 = 103 Deer Park CDP 23 476 9,435 334 234

B1 =

=LEFT(A1,FIND("~",SUBSTITUTE(A1," ","~",LEN(A1)-LEN(SUBSTITUTE(A1,"
",""))-4))-1)

C1 =

=SUBSTITUTE(A1,B1&" ","")

You can convert the formula in C1 to a constant by Edit>Copy then Edit>Paste
Special>Values then do Text to Columns>Delimited>Space

Biff
 
The formula for B1 may have been affected by line wrap depending on what
application you use to view these forums. Here it is again in chunks so that
line wrap won't mess it up:

=LEFT(A1,FIND("~",SUBSTITUTE(A1," ","~",LEN(A1)
-LEN(SUBSTITUTE(A1," ",""))-4))-1)

Line wrap always breaks at a space character and there are intentional
spaces in the formula.

Biff
 
The function (although I have no clue what it's doing) works insofar as it
pulls out the number in the first column together with the text.

Two problems:
1) Sometimes I get a #VALUE error message.

2) The major problem is that I want the function to eliminate that text from
the first cell.

Ideas?
 
Instead of eliminating the text from the first cell, is there a way to put
the remainder of the cell's contents (without the text) into a separate cell?
 
One way - replace each of 0-9 followed/preceded by space with 0-9
followed/preceded by semicolon...

103;Deer Park CDP;23;476;9,435;334;234;
103;Harbor on the Hill Village;34;543;12,345;332;555;

Then do Text-To-Columns with semicolon.

To automate the first part try this script (Alt+F11), paste into
immediate window and select last line then press Enter to execute..

for i = 0 to 9: _
selection.replace i & " ",i & ";": _
selection.replace " " & i,";" & i: _
next i
 
Sometimes I get a #VALUE error message.

The formulas work on the *2* samples you posted. If they doesn't work on
some entries then those entries must have a different format than the *2*
posted samples.

Biff
 
103 Deer Park CDP 23 476 9,435 334 234
103 Harbor on the Hill Village 34 543 12,345 332 555

To do this with formulas, you can download and install Longre's free and easily
distributable morefunc.xll add-in from http://xcell05.free.fr/

Then, with your data in A1:

Text:
B1: =TRIM(REGEX.MID($A1,"\D+"))

Numbers
C1: =REGEX.MID($A1,"\b[^a-z\s]+\b",COLUMNS($A:A),FALSE)

and copy/drag across to H1.


--ron
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Back
Top