Problems With Text Import

D

Dave

This one's driving me crazy -- Our payroll department uses
a Datamatics Time keeping system. I regularly import a
text file into Access. Recently I discovered that a small
part of the imported data was corrupted. We found that a
time clock had been mis-programmed with a single character
rather than two characters (a 3 rather than 03). The
Datamatics system put in a "place holder" that Microsoft
(Excel and Access) seem to think is a carriage return.
The "place holder" is a small rectangle. The clock has
been reprogrammed, but all of the existing data contains
this bogus clock number. I have tried to bring this text
file into both Access and Excel to try and replace the bad
clock number, but each time I try the string breaks at
this rectangle throwing all of the fields out of whack.
Due to the volume of records (80K strings at 200+
characters per string), I need a way to find and replace
this corrupt data. When I try to do a "find" while the
data is still in Notepad, I cannot hightlight the odd
character.
 
J

John Nurick

Hi Dave,

Make a copy of the textfile and open this in Word. With luck you will be
able to highlight the suspect character there and copy and paste it into
the Replace dialog (use Ctrl+V to paste it into the dialog) and replace
it with whatever you need.

After doing the replacement, be sure to save the document as a textfile
and not as a Word document. Then import into Access as usual.

If this doesn't work, open the document again in Word. Highlight the
suspect character. Switch to the VBA editor (Alt+F11) and open the
Immediate pane (Ctrl+G). Then type
?Asc(Selection.Text)
and press Enter, followed by
?AscW(Selection.Text)

Post the results back here. They'll show exactly what the suspect
character is.



This one's driving me crazy -- Our payroll department uses
a Datamatics Time keeping system. I regularly import a
text file into Access. Recently I discovered that a small
part of the imported data was corrupted. We found that a
time clock had been mis-programmed with a single character
rather than two characters (a 3 rather than 03). The
Datamatics system put in a "place holder" that Microsoft
(Excel and Access) seem to think is a carriage return.
The "place holder" is a small rectangle. The clock has
been reprogrammed, but all of the existing data contains
this bogus clock number. I have tried to bring this text
file into both Access and Excel to try and replace the bad
clock number, but each time I try the string breaks at
this rectangle throwing all of the fields out of whack.
Due to the volume of records (80K strings at 200+
characters per string), I need a way to find and replace
this corrupt data. When I try to do a "find" while the
data is still in Notepad, I cannot hightlight the odd
character.

John Nurick [Microsoft Access MVP]

Please respond in the newgroup and not by email.
 
D

Dave

-----Original Message-----
Hi Dave,

Make a copy of the textfile and open this in Word. With luck you will be
able to highlight the suspect character there and copy and paste it into
the Replace dialog (use Ctrl+V to paste it into the dialog) and replace
it with whatever you need.

After doing the replacement, be sure to save the document as a textfile
and not as a Word document. Then import into Access as usual.

If this doesn't work, open the document again in Word. Highlight the
suspect character. Switch to the VBA editor (Alt+F11) and open the
Immediate pane (Ctrl+G). Then type
?Asc(Selection.Text)
and press Enter, followed by
?AscW(Selection.Text)

Post the results back here. They'll show exactly what the suspect
character is.



This one's driving me crazy -- Our payroll department uses
a Datamatics Time keeping system. I regularly import a
text file into Access. Recently I discovered that a small
part of the imported data was corrupted. We found that a
time clock had been mis-programmed with a single character
rather than two characters (a 3 rather than 03). The
Datamatics system put in a "place holder" that Microsoft
(Excel and Access) seem to think is a carriage return.
The "place holder" is a small rectangle. The clock has
been reprogrammed, but all of the existing data contains
this bogus clock number. I have tried to bring this text
file into both Access and Excel to try and replace the bad
clock number, but each time I try the string breaks at
this rectangle throwing all of the fields out of whack.
Due to the volume of records (80K strings at 200+
characters per string), I need a way to find and replace
this corrupt data. When I try to do a "find" while the
data is still in Notepad, I cannot hightlight the odd
character.

John Nurick [Microsoft Access MVP]

Please respond in the newgroup and not by email.
.
When I brought the text file into Word the string broke
where that character appeared in the text file (the text
file showed the number 3 then the character (3*). The
character (a small rectangle) did not appear in Word. I
did select the space to the right of the 3 and ran your
test -- I got a "13".
 
J

John Nurick

Here's a little VBScript that will replace the Chr(13) with the 0 (zero)
that it seems you need.

Launch Notepad and paste the script into a new file, then save it in the
same folder as your problem file, giving it a name like "Fixit.vbs" (the
".vbs" is essential) and close Notepad.

Then open that folder in My Computer or Windows Explorer. Make sure you
have a backup copy of your problem file; then just drag the file onto
the script file's icon. You may get a warning from your antivirus
system; if so, tell it to let the script execute. The script should fix
the file and leave the original with ".bak" appended to its name.

'---------------Start of script-----------------
Option Explicit
Dim fso 'FileSystemObject
Dim fF 'File
Dim fIn, fOut 'Textstreams
Dim S 'String

If WScript.Arguments.Count = 1 Then
Set fso = CreateObject("Scripting.FileSystemObject")
Set fF = fso.GetFile(WScript.Arguments(0))

fF.Name = fF.Name & ".bak"
Set fIn = fF.OpenAsTextStream(1) 'ForReading
Set fOut = fso.CreateTextFile(WScript.Arguments(0))

Do 'Read line by line
S = fIn.ReadLine

'Process the line
S = Replace(S, Chr(13), "0")

'Write the processed line
fOut.Write S & Chr(13) & Chr(10)
Loop Until fIn.AtEndOfStream

'Tidy up
fIn.Close
fOut.Close
Else
MsgBox "Drag a file onto the icon to process it. " _
& vbcrlf & "Original file will be renamed with " _
& ".bak extension"
End If
'-----------End of script-----------------------


-----Original Message-----
Hi Dave,

Make a copy of the textfile and open this in Word. With luck you will be
able to highlight the suspect character there and copy and paste it into
the Replace dialog (use Ctrl+V to paste it into the dialog) and replace
it with whatever you need.

After doing the replacement, be sure to save the document as a textfile
and not as a Word document. Then import into Access as usual.

If this doesn't work, open the document again in Word. Highlight the
suspect character. Switch to the VBA editor (Alt+F11) and open the
Immediate pane (Ctrl+G). Then type
?Asc(Selection.Text)
and press Enter, followed by
?AscW(Selection.Text)

Post the results back here. They'll show exactly what the suspect
character is.



This one's driving me crazy -- Our payroll department uses
a Datamatics Time keeping system. I regularly import a
text file into Access. Recently I discovered that a small
part of the imported data was corrupted. We found that a
time clock had been mis-programmed with a single character
rather than two characters (a 3 rather than 03). The
Datamatics system put in a "place holder" that Microsoft
(Excel and Access) seem to think is a carriage return.
The "place holder" is a small rectangle. The clock has
been reprogrammed, but all of the existing data contains
this bogus clock number. I have tried to bring this text
file into both Access and Excel to try and replace the bad
clock number, but each time I try the string breaks at
this rectangle throwing all of the fields out of whack.
Due to the volume of records (80K strings at 200+
characters per string), I need a way to find and replace
this corrupt data. When I try to do a "find" while the
data is still in Notepad, I cannot hightlight the odd
character.

John Nurick [Microsoft Access MVP]

Please respond in the newgroup and not by email.
.
When I brought the text file into Word the string broke
where that character appeared in the text file (the text
file showed the number 3 then the character (3*). The
character (a small rectangle) did not appear in Word. I
did select the space to the right of the 3 and ran your
test -- I got a "13".

John Nurick [Microsoft Access MVP]

Please respond in the newgroup and not by email.
 
D

Dave

-----Original Message-----
Here's a little VBScript that will replace the Chr(13) with the 0 (zero)
that it seems you need.

Launch Notepad and paste the script into a new file, then save it in the
same folder as your problem file, giving it a name like "Fixit.vbs" (the
".vbs" is essential) and close Notepad.

Then open that folder in My Computer or Windows Explorer. Make sure you
have a backup copy of your problem file; then just drag the file onto
the script file's icon. You may get a warning from your antivirus
system; if so, tell it to let the script execute. The script should fix
the file and leave the original with ".bak" appended to its name.

'---------------Start of script-----------------
Option Explicit
Dim fso 'FileSystemObject
Dim fF 'File
Dim fIn, fOut 'Textstreams
Dim S 'String

If WScript.Arguments.Count = 1 Then
Set fso = CreateObject("Scripting.FileSystemObject")
Set fF = fso.GetFile(WScript.Arguments(0))

fF.Name = fF.Name & ".bak"
Set fIn = fF.OpenAsTextStream(1) 'ForReading
Set fOut = fso.CreateTextFile(WScript.Arguments(0))

Do 'Read line by line
S = fIn.ReadLine

'Process the line
S = Replace(S, Chr(13), "0")

'Write the processed line
fOut.Write S & Chr(13) & Chr(10)
Loop Until fIn.AtEndOfStream

'Tidy up
fIn.Close
fOut.Close
Else
MsgBox "Drag a file onto the icon to process it. " _
& vbcrlf & "Original file will be renamed with " _
& ".bak extension"
End If
'-----------End of script-----------------------


-----Original Message-----
Hi Dave,

Make a copy of the textfile and open this in Word. With luck you will be
able to highlight the suspect character there and copy and paste it into
the Replace dialog (use Ctrl+V to paste it into the dialog) and replace
it with whatever you need.

After doing the replacement, be sure to save the
document
as a textfile
and not as a Word document. Then import into Access as usual.

If this doesn't work, open the document again in Word. Highlight the
suspect character. Switch to the VBA editor (Alt+F11)
and
open the
Immediate pane (Ctrl+G). Then type
?Asc(Selection.Text)
and press Enter, followed by
?AscW(Selection.Text)

Post the results back here. They'll show exactly what
the
suspect
character is.



This one's driving me crazy -- Our payroll department uses
a Datamatics Time keeping system. I regularly import a
text file into Access. Recently I discovered that a small
part of the imported data was corrupted. We found that a
time clock had been mis-programmed with a single character
rather than two characters (a 3 rather than 03). The
Datamatics system put in a "place holder" that Microsoft
(Excel and Access) seem to think is a carriage return.
The "place holder" is a small rectangle. The clock has
been reprogrammed, but all of the existing data contains
this bogus clock number. I have tried to bring this text
file into both Access and Excel to try and replace the bad
clock number, but each time I try the string breaks at
this rectangle throwing all of the fields out of whack.
Due to the volume of records (80K strings at 200+
characters per string), I need a way to find and replace
this corrupt data. When I try to do a "find" while the
data is still in Notepad, I cannot hightlight the odd
character.

John Nurick [Microsoft Access MVP]

Please respond in the newgroup and not by email.
.
When I brought the text file into Word the string broke
where that character appeared in the text file (the text
file showed the number 3 then the character (3*). The
character (a small rectangle) did not appear in Word. I
did select the space to the right of the 3 and ran your
test -- I got a "13".

John Nurick [Microsoft Access MVP]

Please respond in the newgroup and not by email.
.
John, thanks so much for your help thus far. I haven'y
tried the fix yet because I think there's still an issue.
The the probem character 'follows' the time clock number
(3*). The two charcters will need to be replaced together,
not just the '13' character (or it would create a '30'
rather than the '03' that I need. Can the replacement
funtion be revised to look only for instances of the 3*
and replace with '03'?
 
D

Dave

-----Original Message-----
Here's a little VBScript that will replace the Chr(13) with the 0 (zero)
that it seems you need.

Launch Notepad and paste the script into a new file, then save it in the
same folder as your problem file, giving it a name like "Fixit.vbs" (the
".vbs" is essential) and close Notepad.

Then open that folder in My Computer or Windows Explorer. Make sure you
have a backup copy of your problem file; then just drag the file onto
the script file's icon. You may get a warning from your antivirus
system; if so, tell it to let the script execute. The script should fix
the file and leave the original with ".bak" appended to its name.

'---------------Start of script-----------------
Option Explicit
Dim fso 'FileSystemObject
Dim fF 'File
Dim fIn, fOut 'Textstreams
Dim S 'String

If WScript.Arguments.Count = 1 Then
Set fso = CreateObject("Scripting.FileSystemObject")
Set fF = fso.GetFile(WScript.Arguments(0))

fF.Name = fF.Name & ".bak"
Set fIn = fF.OpenAsTextStream(1) 'ForReading
Set fOut = fso.CreateTextFile(WScript.Arguments(0))

Do 'Read line by line
S = fIn.ReadLine

'Process the line
S = Replace(S, Chr(13), "0")

'Write the processed line
fOut.Write S & Chr(13) & Chr(10)
Loop Until fIn.AtEndOfStream

'Tidy up
fIn.Close
fOut.Close
Else
MsgBox "Drag a file onto the icon to process it. " _
& vbcrlf & "Original file will be renamed with " _
& ".bak extension"
End If
'-----------End of script-----------------------


-----Original Message-----
Hi Dave,

Make a copy of the textfile and open this in Word. With luck you will be
able to highlight the suspect character there and copy and paste it into
the Replace dialog (use Ctrl+V to paste it into the dialog) and replace
it with whatever you need.

After doing the replacement, be sure to save the
document
as a textfile
and not as a Word document. Then import into Access as usual.

If this doesn't work, open the document again in Word. Highlight the
suspect character. Switch to the VBA editor (Alt+F11)
and
open the
Immediate pane (Ctrl+G). Then type
?Asc(Selection.Text)
and press Enter, followed by
?AscW(Selection.Text)

Post the results back here. They'll show exactly what
the
suspect
character is.



This one's driving me crazy -- Our payroll department uses
a Datamatics Time keeping system. I regularly import a
text file into Access. Recently I discovered that a small
part of the imported data was corrupted. We found that a
time clock had been mis-programmed with a single character
rather than two characters (a 3 rather than 03). The
Datamatics system put in a "place holder" that Microsoft
(Excel and Access) seem to think is a carriage return.
The "place holder" is a small rectangle. The clock has
been reprogrammed, but all of the existing data contains
this bogus clock number. I have tried to bring this text
file into both Access and Excel to try and replace the bad
clock number, but each time I try the string breaks at
this rectangle throwing all of the fields out of whack.
Due to the volume of records (80K strings at 200+
characters per string), I need a way to find and replace
this corrupt data. When I try to do a "find" while the
data is still in Notepad, I cannot hightlight the odd
character.

John Nurick [Microsoft Access MVP]

Please respond in the newgroup and not by email.
.
When I brought the text file into Word the string broke
where that character appeared in the text file (the text
file showed the number 3 then the character (3*). The
character (a small rectangle) did not appear in Word. I
did select the space to the right of the 3 and ran your
test -- I got a "13".

John Nurick [Microsoft Access MVP]

Please respond in the newgroup and not by email.
.
John, I'd already responded o your last e-mail, but I
don't see it -- here goes again. The 'bad' character
follows the number 3. If we replace the character with a
zero, then it will read '30' rather than the
required '03'. Can your fix be adjusted to search for '3*'
rather than just the '*'. Then we would replace both the 3
and the odd character with an '03'. Thanks so much for
your assistance -- I'm begining to see a ray of hope for
this data.

Dave
 
J

John Nurick

John, thanks so much for your help thus far. I haven'y
tried the fix yet because I think there's still an issue.
The the probem character 'follows' the time clock number
(3*). The two charcters will need to be replaced together,
not just the '13' character (or it would create a '30'
rather than the '03' that I need. Can the replacement
funtion be revised to look only for instances of the 3*
and replace with '03'?

You didn't mention that before. The standard Replace() function can't do
this. One approach is to write code using mostly the Instr() and Mid()
functions to work through the line character by character.

Another is to use a regular expression as in this script (which is based
on a template I've been working on):

'Start of code-------------------------------
'VBScript template for munging textfiles
Option Explicit
Const BAK_EXT = ".bak"
Dim fso 'FileSystemObject
Dim fF 'File
Dim fIn, fOut 'Textstreams
Dim strL 'String
Dim rgx 'Regular expression object
Dim strPattern 'Regular expression



'This function is called for each line of the
'input file. Modify as required.
Function ProcessLine(ByVal Line)
ProcessLine = rgx.Replace(Line, "0$1")
End Function


'Pattern for regular expression object.
'Modify as required.
strPattern = "(\d)\x0D" 'Digit followed by Chr(13)




'Main body of script. This usually needn't change.
If WScript.Arguments.Count = 1 Then
Set fso = CreateObject("Scripting.FileSystemObject")
Set fF = fso.GetFile(WScript.Arguments(0))

'If backup file aready exists, delete it
If fso.FileExists(fF.Path & BAK_EXT) Then
fso.DeleteFile fF.Path & BAK_EXT
End If
fF.Name = fF.Name & BAK_EXT
Set fIn = fF.OpenAsTextStream(1) 'ForReading
Set fOut = fso.CreateTextFile(WScript.Arguments(0))

'Set up regular expression object
Set rgx = New Regexp
rgx.Pattern = strPattern
rgx.Global = True

Do 'Read line by line
strL = fIn.ReadLine
strL = ProcessLine(strL)
fOut.Write strL & vbCRLF
Loop Until fIn.AtEndOfStream

'Tidy up
fIn.Close
fOut.Close

Else
MsgBox "Drag a file onto the icon to process it. " _
& vbcrlf & "Alternatively, call from commandline " _
& "with " & vbcrlf & " [cscript] script.vbs " _
& "filespec" & vbcrlf & "where script.vbs is the " _
& "script file and filespec is the input file. " _
& vbcrlf & "Original file will be renamed with " _
& ".bak extension"
End If
'End of code---------------------------------
John Nurick [Microsoft Access MVP]

Please respond in the newgroup and not by email.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top