splitting text

  • Thread starter Thread starter Jason
  • Start date Start date
J

Jason

Hi,

I've this description field which comes from a comma-separated file.
This file contains moneytransactions which can be imported into an
application.

The problem is the text in the description field. It contains the
accountnumber, transfercode and some remarks. However i've noticed that
there is a fixed patern which can be splitted. The first word of the
description field can be an accountnumber or transfercode. The accountnr
has at most 11 characters (numerical with dots) and the transfercode has
a variable length.

What i would like, is this:

- Seperate the accountnr/transfercode and place it in a new field. The
splitting should check if the accountnr has at most 11 characters or
if the descriptions begins with a word, get the whole word until a
space occur.
- everthing else may stay in the description field

Can someone help me on this?
 
Hi Jason,

I like to use regular expressions for parsing text, so I'd probably do
it with code like the following. Some people think that means I'm crazy:
they'd prefer to write a long sequence of VBA code to examine the field
character by character, using functions like InStr(), Mid() and Asc().


Public Sub XXX()
Dim strDescription As String

Dim oRE As VBScript_RegExp_55.RegExp
Dim oMatches As VBScript_RegExp_55.MatchCollection

strDescription = "12345.92.1 That was the acct number"

Set oRE = CreateObject("VBScript.RegExp")

oRE.Pattern = "^(?:(?:([0-9.]{1,11})|([a-z][^ ]+)) )?(.*)$"
' Comments on pattern
' ^ anchors to start of string
' (?: ... ) groups terms in the expression
' ([0-9.]{0,11}) matches and captures up to 11 digits
' and dots, i.e. account number
' | OR
' ([a-z][^ ]+) matches and captures a substring
' starting with a letter and continuing
' with any characters except a space,
' i.e. transfercode
' ? match preceding term 0 or 1 times
' (i.e. cover the case where there's
' no acct number or code)
' (.*)$ match the remainder of the line, i.e.
' the description.

oRE.IgnoreCase = True
oRE.Global = False
Set oMatches = oRE.Execute(strDescription)
If oMatches.Count > 0 Then
On Error Resume Next
Debug.Print "Acct Number: " & oMatches(0).SubMatches(0)
Debug.Print "Transfercode: " & oMatches(0).SubMatches(1)
Debug.Print "Description: " & oMatches(0).SubMatches(2)
Else
Debug.Print "Description: " & strDescription
End If

Set oMatches = Nothing
Set oRE = Nothing

End Sub
 
Hi John,

Very interesting code you have, but i'm not so good with scripting. I
have set a reference in my vbproject, but when i compile it, i'm getting
a lot of errors.

Also can you explain me your code? I can figure out your prefixes like
what is oRE, &quot, etc.



John said:
Hi Jason,

I like to use regular expressions for parsing text, so I'd probably do
it with code like the following. Some people think that means I'm crazy:
they'd prefer to write a long sequence of VBA code to examine the field
character by character, using functions like InStr(), Mid() and Asc().


Public Sub XXX()
Dim strDescription As String

Dim oRE As VBScript_RegExp_55.RegExp
Dim oMatches As VBScript_RegExp_55.MatchCollection

strDescription = "12345.92.1 That was the acct number"

Set oRE = CreateObject("VBScript.RegExp")

oRE.Pattern = "^(?:(?:([0-9.]{1,11})|([a-z][^ ]+)) )?(.*)$"
' Comments on pattern
' ^ anchors to start of string
' (?: ... ) groups terms in the expression
' ([0-9.]{0,11}) matches and captures up to 11 digits
' and dots, i.e. account number
' | OR
' ([a-z][^ ]+) matches and captures a substring
' starting with a letter and continuing
' with any characters except a space,
' i.e. transfercode
' ? match preceding term 0 or 1 times
' (i.e. cover the case where there's
' no acct number or code)
' (.*)$ match the remainder of the line, i.e.
' the description.

oRE.IgnoreCase = True
oRE.Global = False
Set oMatches = oRE.Execute(strDescription)
If oMatches.Count > 0 Then
On Error Resume Next
Debug.Print "Acct Number: " & oMatches(0).SubMatches(0)
Debug.Print "Transfercode: " & oMatches(0).SubMatches(1)
Debug.Print "Description: " & oMatches(0).SubMatches(2)
Else
Debug.Print "Description: " & strDescription
End If

Set oMatches = Nothing
Set oRE = Nothing

End Sub



Hi,

I've this description field which comes from a comma-separated file.
This file contains moneytransactions which can be imported into an
application.

The problem is the text in the description field. It contains the
accountnumber, transfercode and some remarks. However i've noticed that
there is a fixed patern which can be splitted. The first word of the
description field can be an accountnumber or transfercode. The accountnr
has at most 11 characters (numerical with dots) and the transfercode has
a variable length.

What i would like, is this:

- Seperate the accountnr/transfercode and place it in a new field. The
splitting should check if the accountnr has at most 11 characters or
if the descriptions begins with a word, get the whole word until a
space occur.
- everthing else may stay in the description field

Can someone help me on this?
 
Hi Jason,

If you can't find the Microsoft VBScript Regular Expression 5.5 library
in the list in Tools|References, you need to register it by running this
command in the Immediate pane:

Application.References.AddFromFile "VBScript.dll\3"

If necessary, include the full path, e.g.

"C:\Windows\System32.VBScript.dll\3"

oRE is just the name I used for the regular expression object (o for
object).

" is nothing to do with me. It's one of the ways of specifying a
quote mark in HTML, and must have been added by some of the newsgroup
software between me and you. There's nothing out of the way about the
code except for the regex object and the regular expression (Pattern)
itself, which I have tried to explain.


Hi John,

Very interesting code you have, but i'm not so good with scripting. I
have set a reference in my vbproject, but when i compile it, i'm getting
a lot of errors.

Also can you explain me your code? I can figure out your prefixes like
what is oRE, &quot, etc.



John said:
Hi Jason,

I like to use regular expressions for parsing text, so I'd probably do
it with code like the following. Some people think that means I'm crazy:
they'd prefer to write a long sequence of VBA code to examine the field
character by character, using functions like InStr(), Mid() and Asc().


Public Sub XXX()
Dim strDescription As String

Dim oRE As VBScript_RegExp_55.RegExp
Dim oMatches As VBScript_RegExp_55.MatchCollection

strDescription = "12345.92.1 That was the acct number"

Set oRE = CreateObject("VBScript.RegExp")

oRE.Pattern = "^(?:(?:([0-9.]{1,11})|([a-z][^ ]+)) )?(.*)$"
' Comments on pattern
' ^ anchors to start of string
' (?: ... ) groups terms in the expression
' ([0-9.]{0,11}) matches and captures up to 11 digits
' and dots, i.e. account number
' | OR
' ([a-z][^ ]+) matches and captures a substring
' starting with a letter and continuing
' with any characters except a space,
' i.e. transfercode
' ? match preceding term 0 or 1 times
' (i.e. cover the case where there's
' no acct number or code)
' (.*)$ match the remainder of the line, i.e.
' the description.

oRE.IgnoreCase = True
oRE.Global = False
Set oMatches = oRE.Execute(strDescription)
If oMatches.Count > 0 Then
On Error Resume Next
Debug.Print "Acct Number: " & oMatches(0).SubMatches(0)
Debug.Print "Transfercode: " & oMatches(0).SubMatches(1)
Debug.Print "Description: " & oMatches(0).SubMatches(2)
Else
Debug.Print "Description: " & strDescription
End If

Set oMatches = Nothing
Set oRE = Nothing

End Sub



Hi,

I've this description field which comes from a comma-separated file.
This file contains moneytransactions which can be imported into an
application.

The problem is the text in the description field. It contains the
accountnumber, transfercode and some remarks. However i've noticed that
there is a fixed patern which can be splitted. The first word of the
description field can be an accountnumber or transfercode. The accountnr
has at most 11 characters (numerical with dots) and the transfercode has
a variable length.

What i would like, is this:

- Seperate the accountnr/transfercode and place it in a new field. The
splitting should check if the accountnr has at most 11 characters or
if the descriptions begins with a word, get the whole word until a
space occur.
- everthing else may stay in the description field

Can someone help me on this?
 
Hi John,

I've got your code working. However i do not understand some parts of
code. Could you explain me what or where this line of code looks at:

oMatches(0).SubMatches(0)

Because if i try the following string into the code, i get this:

?XXX("33.22.83.436 test")
Acct Number:
Transfercode:
Description: 33.22.83.436 test

Should be:

Acct Number: 33.22.83.436
Transfercode:
Description: test

And in case if there is some text before the acct number:

Acct Number: 33.22.83.436
Transfercode: ATM
Description: test

John said:
Hi Jason,

If you can't find the Microsoft VBScript Regular Expression 5.5 library
in the list in Tools|References, you need to register it by running this
command in the Immediate pane:

Application.References.AddFromFile "VBScript.dll\3"

If necessary, include the full path, e.g.

"C:\Windows\System32.VBScript.dll\3"

oRE is just the name I used for the regular expression object (o for
object).

" is nothing to do with me. It's one of the ways of specifying a
quote mark in HTML, and must have been added by some of the newsgroup
software between me and you. There's nothing out of the way about the
code except for the regex object and the regular expression (Pattern)
itself, which I have tried to explain.


Hi John,

Very interesting code you have, but i'm not so good with scripting. I
have set a reference in my vbproject, but when i compile it, i'm getting
a lot of errors.

Also can you explain me your code? I can figure out your prefixes like
what is oRE, &quot, etc.



John said:
Hi Jason,

I like to use regular expressions for parsing text, so I'd probably do
it with code like the following. Some people think that means I'm crazy:
they'd prefer to write a long sequence of VBA code to examine the field
character by character, using functions like InStr(), Mid() and Asc().


Public Sub XXX()
Dim strDescription As String

Dim oRE As VBScript_RegExp_55.RegExp
Dim oMatches As VBScript_RegExp_55.MatchCollection

strDescription = "12345.92.1 That was the acct number"

Set oRE = CreateObject("VBScript.RegExp")

oRE.Pattern = "^(?:(?:([0-9.]{1,11})|([a-z][^ ]+)) )?(.*)$"
' Comments on pattern
' ^ anchors to start of string
' (?: ... ) groups terms in the expression
' ([0-9.]{0,11}) matches and captures up to 11 digits
' and dots, i.e. account number
' | OR
' ([a-z][^ ]+) matches and captures a substring
' starting with a letter and continuing
' with any characters except a space,
' i.e. transfercode
' ? match preceding term 0 or 1 times
' (i.e. cover the case where there's
' no acct number or code)
' (.*)$ match the remainder of the line, i.e.
' the description.

oRE.IgnoreCase = True
oRE.Global = False
Set oMatches = oRE.Execute(strDescription)
If oMatches.Count > 0 Then
On Error Resume Next
Debug.Print "Acct Number: " & oMatches(0).SubMatches(0)
Debug.Print "Transfercode: " & oMatches(0).SubMatches(1)
Debug.Print "Description: " & oMatches(0).SubMatches(2)
Else
Debug.Print "Description: " & strDescription
End If

Set oMatches = Nothing
Set oRE = Nothing

End Sub



Hi,

I've this description field which comes from a comma-separated file.
This file contains moneytransactions which can be imported into an
application.

The problem is the text in the description field. It contains the
accountnumber, transfercode and some remarks. However i've noticed that
there is a fixed patern which can be splitted. The first word of the
description field can be an accountnumber or transfercode. The accountnr
has at most 11 characters (numerical with dots) and the transfercode has
a variable length.

What i would like, is this:

- Seperate the accountnr/transfercode and place it in a new field. The
splitting should check if the accountnr has at most 11 characters or
if the descriptions begins with a word, get the whole word until a
space occur.
- everthing else may stay in the description field

Can someone help me on this?
 
Back
Top