Splitting up bad data


J

Joannie Jae

I've been given the task of creating a new database from an old one
where multiple things were typed into one field (a very large file,
and a huge mess). Let's assume column A is the data, which looks
something like this:

A
1 Radio grid A4 Fire zone Alpha Network 4C
2 fire bravo radio C9 network 4D
3 radio A2 fire delta net 4C
4 network 4F Fire zone Charlie radio A9
5 Network 4P radio A2 fire zone delta

I need to split this data into three columns, Radio Grid, Fire Zone,
and Network (B, C, and D respectively). Knowing that the items have
unique zones (only the Fire Zone will contain alpha, bravo, charlie,
delta, etcetera), how can I create a formula that looks for incidents
of each word and assigns the correct term to the matching location in
C?

It should wind up looking like this, after all the data is split:

A B C D
1 Radio grid A4 Fire zone Alpha Network 4C A4 Alpha 4C
2 fire bravo radio C9 network 4D C9 Bravo 4D
3 radio A2 fire Delta net 4C A2 Delta 4C
4 network 4F Fire zone Charlie radio A9 A9 Charlie 4F
5 Network 4P radio A2 fire zone delta A2 Delta 4P

If the word is not found at all, I would like for it to leave the cell
blank, so I can go back and enter it by hand. (Some of the data
doesn't have all three items.) Is there a formula that can pull a
trick like this? I've been toying with it for several days now, and
come up with nothing so far. I've been so frustrated, I've been
tempted to drop the whole thing into QBasic and pull some massive
programming tricks out of my repertoire. ;-)

Any help would be most appreciated!

Joannie
 
Ad

Advertisements

A

Andy B

Hi

One problem is that there doesn't seem to be any consistency in your data.
One line says Fire zone, but the next only mentions fire. One says Radio
grid and another just radio. Is that just the way it is? Do all network
parameters begin with a 4? All radios begin with A?

Andy.
 
T

Ture Magnusson

Joannie,

Yes, this CAN be done with worksheet formulas, but I would prefer
to handle this with a short VBA procedure. I have (successfully) tested
the below procedure on your sample data and I hope that it will work
well enough on the large, messy file.

If you need more information on how to enter and run VBA code, please
ask.

Sub HelpJoannie()
'Declare variables
Dim c As Range
Dim t As String
Dim p_radio As Long
Dim p_net As Long
Dim p_fire As Long
Dim p_fire2 As Long

'Loop through all cells in the list surrounding A1
For Each c In Range("A1").CurrentRegion.Cells

'Read text into string, make it UPPER case and fix differences
t = UCase(Trim(c.Value) & " ")
t = Replace(t, "RADIO GRID", "RADIO")
t = Replace(t, "NETWORK", "NET")
t = Replace(t, "FIRE ZONE", "FIRE")

'Find positions in string
p_radio = InStr(1, t, "RADIO")
p_net = InStr(1, t, "NET")
p_fire = InStr(1, t, "FIRE")
If p_fire > 0 Then p_fire2 = InStr(p_fire + 6, t, " ")

'Extract substrings and write them to worksheet
If p_radio > 0 Then c.Offset(0, 1).Value = Mid(t, p_radio + 6, 2)
If p_fire > 0 Then c.Offset(0, 2).Value = Mid(t, p_fire + 5, (p_fire2 -
p_fire) - 5)
If p_net > 0 Then c.Offset(0, 3).Value = Mid(t, p_net + 4, 2)

Next c
End Sub

Ture Magnusson
Karlstad, Sweden
 
J

Joannie Jae

Might should have been a little more clear with what I'm doing here.
The data I'm working with is every bit as bad as what I gave as a
sample. No telling how many different people entered this stuff, and
it's a jumbled mess.

I'm not wanting to keep the terms listed in the data (such as "radio",
"Radio Zone", etcetera), just the relevant entries. What I'm looking
to fill the data with is the presence of certain unique terms. Let me
explain the rules of the data, using the fire zone example:

• Fire zone will contain "alpha", "bravo", "charlie", etcetera.

• For each cell, there will only be one fire zone listed.

• Some cells may not have a fire zone listed, in which case I want the
target cell to remain blank so I can enter the zone by hand later.

• The zone name (alpha, bravo, charlie, etcetera) may appear with
different case (Alpha, alpha, ALPHA, etcetera), and may not
necessarily have the term "fire zone" in front of it. So searching by
the name of the zone itself is critical.

• For the target cell, I want it to just say "Alpha", "Bravo",
"Charlie", etcetera. The whole idea is to split the data so that
column B contains only the radio grid, column C the fire zone, and
column D the network. Then the data will be workable and can be
merged with other applications.

Just to repeat, it looks like this as it is:

A
1 Radio grid A4 Fire zone Alpha Network 4C
2 fire bravo radio C9 network 4D
3 radio A2 fire Delta net 4C
4 network 4F Fire zone Charlie radio A9
5 Network 4P radio A2 fire zone delta

And when I'm done, I want it to appear in the appropriate columns as:

B C D
1 A4 Alpha 4C
2 C9 Bravo 4D
3 A2 Delta 4C
4 A9 Charlie 4F
5 A2 Delta 4P

Going to try Ture's suggestion of a VBA routine (which I should be
able to handle, since I do indeed RTFM). Thanks for naming a
subroutine after me. :)

I would still like to see a formula solution for this, if anyone can
suggest one, for the sake of my own knowledge in future situations.
To answer your question, Andy, there is no consistency to the data.
It may say "Radio grid A4 Fire zone Alpha Network 4C", or something as
cryptic as "a4 alpha 4c". But there are some unique rules to play
with, such as network zones always being a number followed by a
letter. If you can come up with a formula, I'm sure I can adjust the
wildcards necessary to cover any contingency. I come from a (don't
laugh) dBase/Lotus background, and Excel is a new world for me that I
have yet to conquer. ;-)

Thanks again for all the help.

Joannie
 
T

Ture Magnusson

Joannie,

I wouldn't tackle this with worksheet formulas. This is certainly a
task that is better handled by a VBA procedure.

The procedure below works in another way than my previous one.

It assumes that the list is in column A, beginning in cell A1 and that
there is nothing to the right of the list. The list is also assumed to be
contiguous, with no blank cells within the list.

First it fixes the string by
- Converting it to upper case
- Adding a space at the beginning and end of the string
- Replacing some punctuation characters with spaces

Then, to find the strings, it is assumed that
- Radio Grid is always a letter followed by a digit
- Network is alway a digit followed by a letter
- Radio Grid and Network are always surrounded by spaces
- Fire zone matches one of a number of search patterns,
I've tried to make the matches work also when the fire
zone is misspelled

Sub HelpJoannie()
'Declare variables
Dim c As Range
Dim t As String
Dim i As Long
Dim RadioGrid As String
Dim Network As String
Dim FireZone As String

'Loop through all cells in the list surrounding A1
For Each c In Range("A1").CurrentRegion.Cells

'Clear strings
RadioGrid = ""
Network = ""
FireZone = ""

'Read cell value into string, make it UPPER case
t = UCase(" " & Trim(c.Value) & " ")

'Replace some punctuation with spaces
t = Replace(t, ",", " ")
t = Replace(t, ".", " ")
t = Replace(t, "-", " ")
t = Replace(t, "/", " ")

'Find Radio Grid and Network
For i = 1 To Len(t)
If Mid(t, i, 4) Like " [A-Z]# " Then RadioGrid = Mid(t, i + 1, 2)
If Mid(t, i, 4) Like " #[A-Z] " Then Network = Mid(t, i + 1, 2)
Next i

'Determine fire zone
If t Like "*ALPHA*" Then FireZone = "ALPHA"
If t Like "*ALFA*" Then FireZone = "ALPHA"
If t Like "*BRAVO*" Then FireZone = "BRAVO"
If t Like "*CHARL*" Then FireZone = "CHARLIE"
If t Like "*DELTA*" Then FireZone = "DELTA"
If t Like "*ECHO*" Then FireZone = "ECHO"
If t Like "*FOX*" Then FireZone = "FOXTROT"
If t Like "*GOLF*" Then FireZone = "GOLF"
If t Like "*HOTEL*" Then FireZone = "HOTEL"
If t Like "*INDIA*" Then FireZone = "INDIA"
If t Like "*JUL*" Then FireZone = "JULIET"
If t Like "*KILO*" Then FireZone = "KILO"
If t Like "*LIMA*" Then FireZone = "LIMA"
If t Like "*MIKE*" Then FireZone = "MIKE"
If t Like "*NOV*" Then FireZone = "NOVEMBER"
If t Like "*OSC*" Then FireZone = "OSCAR"
If t Like "*PAPA*" Then FireZone = "PAPA"
If t Like "*QUE*" Then FireZone = "QUEBEC"
If t Like "*ROM*" Then FireZone = "ROMEO"
If t Like "*SIE*" Then FireZone = "SIERRA"
If t Like "*TANGO*" Then FireZone = "TANGO"
If t Like "*UNIFORM*" Then FireZone = "UNIFORM"
If t Like "*VICTOR*" Then FireZone = "VICTOR"
If t Like "*WHIS*" Then FireZone = "WHISKEY"
If t Like "*WIS*" Then FireZone = "WHISKEY"
If t Like "*RAY*" Then FireZone = "X-RAY"
If t Like "*YAN*" Then FireZone = "YANKEE"
If t Like "*ZULU*" Then FireZone = "ZULU"

'Write strings to worksheet
c.Offset(0, 1).Value = RadioGrid
c.Offset(0, 2).Value = FireZone
c.Offset(0, 3).Value = Network

Next c
End Sub
 
Ad

Advertisements

I

Immanuel

Given the following values in cells A1:A6

ORIGINAL
Radio grid A4 Fire zone Alpha Network 4C
fire bravo radio C9 network 4D
radio A2 fire Delta net 4C
network 4F Fire zone Charlie radio A9
Network 4P radio A2 fire zone delta

And the following values in J1:J27

PHONETICA
Alpha
Bravo
Charlie
Delta
Echo
...
...
Yankee
Zulu

Set:
E1 = "1st NUMERIC"
F1 = "1st BEGIN"
G1 = "2nd NUMERIC"
H1 = "2nd BEGIN"

In E2:
=SMALL(IF(ISNUMBER(MID(A2,ROW(INDIRECT("1:"&LEN(A2))),1)+0),ROW(INDIRECT("1:
"&LEN(A2))),9999),1)

In F2:
=LARGE(IF((MID(A2,ROW(INDIRECT("1:"&LEN(A2))),1)="
")*(ROW(INDIRECT("1:"&LEN(A2)))<E2),ROW(INDIRECT("1:"&LEN(A2))),1),1)+1

In G2:
=SMALL(IF(ISNUMBER(MID(A2,ROW(INDIRECT("1:"&LEN(A2))),1)+0),ROW(INDIRECT("1:
"&LEN(A2))),9999),2)

In H2:
=MAX(IF((MID(A2,ROW(INDIRECT("1:"&LEN(A2))),1)="
")*(ROW(INDIRECT("1:"&LEN(A2)))<G2),ROW(INDIRECT("1:"&LEN(A2))),1))+1

(All of the formulae in E2:H2 are array-entered -- Press Ctrl-Shift-Enter
instead of just Enter after typing/pasting in the formula.)

Now, we're ready to extract the data in B:D

B1: "RADIO"
C1: "PHONETICA"
D1: "FIRE"

And:

B2:
=IF(E2=F2+1,MID(A2,F2,2),IF(G2=H2+1,MID(A2,H2,2),"N/A"))
C2 (array-entered):
=IF(MIN(IF(ISERROR(SEARCH($J$2:$J$27,A2,1)),99,ROW($J$2:$J$27)))<99,INDIRECT
("J"&MIN(IF(ISERROR(SEARCH($J$2:$J$27,A2,1)),99,ROW($J$2:$J$27)))),"")
D2:
=IF(E2=F2,MID(A2,F2,2),IF(H2=G2,MID(A2,H2,2),"N/A"))

Drag all of the formulae down to the end of the list.

/i.
 
Ad

Advertisements


Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top