Word Table to Access

R

Ruby Tuesday

Hi, I was wondering if expert can give me some lite to convert my word table
into access database.

Note: within each cell of my word table(s), some has multi-line data in it.
In addition, there is one row containing picture(s) as well.

So far, what I did is doing it manually for each word docs I have.

Select Table
Convert Table to Text(I use ^ character for delimiter)
Save it to a text file(with char substitution, and CR/LF)
Then, Load them to Access DB.

Is there an easier way to do this?
 
H

Hunne E. Balsiche

I'd love to see it done easily in windows. What you can do is to save your
word doc to text with delimiter and let perl handle the parsing.

Opensource to the rescue!
 
B

Beth Melton

Hi Ruby,

The easiest way to do this is to copy the table to Excel, then either
save the Excel workbook and import it into Access or copy/paste from
Excel to Access.

--
Please post all follow-up questions to the newsgroup. Requests for
assistance by email can not be acknowledged.

~~~~~~~~~~~~~~~
Beth Melton
Microsoft Office MVP

Word FAQ: http://mvps.org/word
TechTrax eZine: http://mousetrax.com/techtrax/
MVP FAQ site: http://mvps.org/
 
R

Ruby Tuesday

I tried that venue but no go. The problem is with the multi-line data. Excel
treat each line as a new row. I figure that Excel does not understant CR/LF
character. Perhap you can cheat that but how? Thanks
 
F

Fletcher Arnold

Ruby Tuesday said:
Hi, I was wondering if expert can give me some lite to convert my word table
into access database.

Note: within each cell of my word table(s), some has multi-line data in it.
In addition, there is one row containing picture(s) as well.

So far, what I did is doing it manually for each word docs I have.

Select Table
Convert Table to Text(I use ^ character for delimiter)
Save it to a text file(with char substitution, and CR/LF)
Then, Load them to Access DB.

Is there an easier way to do this?


If you want to write code to do it, here is an example that just creates a
message box for each cell in the table showing the value. This could be
converted to create a record in Access and you could adapt it to do this for
a number of documents, each with 1 or more tables.
Is this a route you might pursue?


Private Sub cmdExtract_Click()

On Error GoTo Err_Handler

Dim strPath As String
Dim strValue As String
Dim wdApp As Object 'Word.Application
Dim wdDoc As Object 'Word.Document
Dim wdTbl As Object 'Word.Table
Dim wdRow As Object 'Word.Row
Dim wdCol As Object 'Word.Column
Dim wdCell As Object 'Word.Cell

strPath = "C:\Example.doc"

Set wdApp = CreateObject("Word.Application")

Set wdDoc = wdApp.Documents.Open(strPath)

If wdDoc.Tables.Count > 0 Then

Set wdTbl = wdDoc.Tables(1)

For Each wdRow In wdTbl.Rows

For Each wdCol In wdTbl.Columns

strValue = wdTbl.Cell(wdRow.Index, wdCol.Index).Range.Text

If Len(strValue) > 2 Then
strValue = Left$(strValue, Len(strValue) - 2)
Else
strValue = ""
End If

MsgBox strValue

Next wdCol

Next wdRow

End If

MsgBox "Done", vbInformation

Exit_Handler:

On Error Resume Next

If Not wdDoc Is Nothing Then
wdDoc.Close
Set wdDoc = Nothing
End If

If Not wdApp Is Nothing Then
wdApp.Quit
Set wdApp = Nothing
End If

Exit Sub

Err_Handler:
MsgBox Err.Description, vbExclamation, "Error No: " & Err.Number
Resume Exit_Handler

End Sub
 
B

Beth Melton

You could use Find/Replace in Word table:

Find: ^p
Replace: @@@ (or a character you are not using)

In Excel use Find/Replace:

Find: @@@
Replace: press <Alt 010> (Use the 10-key pad for this. Note that you
will not see anything)

Note that you may need to restart Excel to clear the search string in
the Replace text box.

--
Please post all follow-up questions to the newsgroup. Requests for
assistance by email can not be acknowledged.

~~~~~~~~~~~~~~~
Beth Melton
Microsoft Office MVP

Word FAQ: http://mvps.org/word
TechTrax eZine: http://mousetrax.com/techtrax/
MVP FAQ site: http://mvps.org/
 
P

Pieter Linden

Ruby Tuesday said:
Hi, I was wondering if expert can give me some lite to convert my word table
into access database.

Note: within each cell of my word table(s), some has multi-line data in it.
In addition, there is one row containing picture(s) as well.

So far, what I did is doing it manually for each word docs I have.

Select Table
Convert Table to Text(I use ^ character for delimiter)
Save it to a text file(with char substitution, and CR/LF)
Then, Load them to Access DB.

Is there an easier way to do this?

Don't think so, unless you use code to do the conversions and then
imports for you. If they're all in the same directory, you could use
the Dir function to loop through the contents of the directory,
process the Word file to create the importable file, and then after
those are done, import them into Access... but that's all I can think
of.
 
R

Ruby Tuesday

Can I do it programatically? I have about 100 word files with at least 3
tables inside them.

I don't know if the following is possible. Since we can save those word docs
into an html documents, can we parse the html docs and extract the data?
They are pretty structured <table>... <tr>...<td> ... </td>..</tr>...<table>
and some other formatting tags such as <b></b> etc...
Wish there are tools to do that.

Thanks
 
R

Ruby Tuesday

Thanks Fletcher.

I might be able to follow your VBscript(is it? or is it a VB program that
need to be compiled?) program but will this program work for multi-line
cells? As I mentioned in my earlier message, I can't directly cut-and-paste
the table into the excel sheet cause excel treat each line of the multi-line
cell as another row of data. Somehow excel do not know how to handle cell
data with CR/LF character in it. Can we escape them?

Thanks again
 
F

Fletcher Arnold

Ruby Tuesday said:
Thanks Fletcher.

I might be able to follow your VBscript(is it? or is it a VB program that
need to be compiled?) program but will this program work for multi-line
cells? As I mentioned in my earlier message, I can't directly cut-and-paste
the table into the excel sheet cause excel treat each line of the multi-line
cell as another row of data. Somehow excel do not know how to handle cell
data with CR/LF character in it. Can we escape them?

Thanks again


The code was actually written in Access with a form and a button named
"cmdExtract" and you could just cut and paste the code. With very minor
modifications you could cut and paste this code into a normal text file and
save it as Whatever.vbs (vbs = visual basic script) and it would run.
Perhaps better still would be to re-write the file as an hta (html
application) which gives you a nice browser-based interface to work with but
is still simply plain text files.

I don't know if newgroup participants would want to write a complete html
application for nothing, but it would not take an experienced programmer
long. You could post some more specific details about the transformation
you need, eg:

Cell A1: "Name"
Cell B1: "Address"
Cell A2: "Peter Smith"
Cell B2: "85 Station Road"
"Newbury"
"Berkshire"
"United Kingdom"


How would this translate into your Access table. What tables and field
names would you have? I see elsewhere in the thread you will have up to 3
tables per doc.


Fletcher
 
R

Ruby Tuesday

Thanks Fletcher.

I have at most 3 tables per word documents. And for sure, I'd love to
understand the process instead of someone else writing the whole thing.

As I mentiond I have the following data in a word table(s)

Field1: Name text(40)
Field2: <image> jpeg
Field3: Desc memo - contain CR/LF
Field4: Note memo - contain CR/LF

e.g:

Field1 Field 2 Field3 Field4
Exa One <image1> A candy in a glass On top of
refrigerator
jar.
next to the cabinet
Exa Tw0 <image2> Car with broken In the garage by
windshield and flat John
John's garden
tire.
in Sarasota

Visually, in the word table, I have 3 rows, and 4 coloumns(12 cells). The
first row is the label, and the 2nd & 3rd row are the data. If you notice,
in field3 and field 4, the cell contain a multi-line(it has CR/LF at the end
of each line-- it is not the wrap around!).

Therefore, when I cut and paste them in the excel table, I end up having
6 ROWS, and 4COLS(that is 24 cells). It treat each line of field3&field4 as
another record.

My question is, how do you write a vba in word(or access or excel) to
extract those data so I will represent the correct one. That is it should
result that I have 2 rows, 4colums of data. Do we have to somewhat escaped
CR/LF on each cells?

Thanks again.
 
N

Norman Scheinin

Fletcher,

Your code to extract data from Word tables works great. Do you also
have code to extract data from a table in the Header of a Word document?

Thanks,
Norman Scheinin
(e-mail address removed)
 
N

Norman Scheinin

Fletcher,

Your code to extract data from Word tables works great. Do you also
have code to extract data from a table in the Header of a Word document?

Thanks,
Norman Scheinin
(e-mail address removed)
 
R

Ruby Tuesday

Norman, would you share how you extract the word table data? Perhaps you can
give me some vba samples on how to do it. Thank much.
 
R

Ruby Tuesday

Fletcher, thanks for the code. I test it and it works, but there are a few
things I'd love to know how.

Instead of displaying it on the msg box, how would you insert it to the
database, say, the access or mysql database? Do I have to use ODBC? How?

Also, if one of the colums contain images, how one extract the image? Thanks
 
F

Fletcher Arnold

Ruby Tuesday said:
Fletcher, thanks for the code. I test it and it works, but there are a few
things I'd love to know how.

Instead of displaying it on the msg box, how would you insert it to the
database, say, the access or mysql database? Do I have to use ODBC? How?


If you can give me a while, I will post a more complete example.

Fletcher
 
P

Pieter Linden

I tweaked Fletcher's code so you can write the Word table info to your
database...

Option Compare Database

Private Sub cmdExtract_Click()

On Error GoTo Err_Handler

Dim strPath As String
Dim strValue As String
Dim wdApp As Object 'Word.Application
Dim wdDoc As Object 'Word.Document
Dim wdTbl As Object 'Word.Table
Dim wdRow As Object 'Word.Row
Dim wdCol As Object 'Word.Column
Dim wdCell As Object 'Word.Cell

'--NEW STUFF
Dim rs As DAO.Recordset
Set rs = CurrentDb.OpenRecordset("mytable", dbOpenDynamic)
'"mytable" is the name of the table in your DB you'll be writing to
strPath = "C:\Example.doc"

Set wdApp = CreateObject("Word.Application")

Set wdDoc = wdApp.Documents.Open(strPath)

If wdDoc.Tables.Count > 0 Then

Set wdTbl = wdDoc.Tables(1)

'--Row is analogous to Access Record
For Each wdRow In wdTbl.Rows
'--Column is analogous to Access Table Field
rs.AddNew
For Each wdCol In wdTbl.Columns

strValue = wdTbl.Cell(wdRow.Index,
wdCol.Index).Range.Text

If Len(strValue) > 2 Then
strValue = Left$(strValue, Len(strValue) - 2)
rs.Fields(wdCol.Index) = strValue
Else
strValue = ""
End If

MsgBox strValue

Next wdCol

'end of the column/record, so save it
rs.Update
Next wdRow

End If

rs.Close
Set rs = Nothing
MsgBox "Done", vbInformation

Exit_Handler:

On Error Resume Next

If Not wdDoc Is Nothing Then
wdDoc.Close
Set wdDoc = Nothing
End If

If Not wdApp Is Nothing Then
wdApp.Quit
Set wdApp = Nothing
End If

If Not rs Is Nothing Then
rs.Close
Set rs = Nothing
End If

Exit Sub

Err_Handler:
MsgBox Err.Description, vbExclamation, "Error No: " & Err.Number
Resume Exit_Handler

End Sub


(Apologies for bollixing up your code, Fletcher)

Pieter
 
F

Fletcher Arnold

Ruby Tuesday said:
Thank you, Fletcher.


On the basis that the OP wanted to know how to get the code to run, I
thought I would offer a solution which only requires Notepad to create a
working application, provided certain system components are in place. These
components should be present if you have a reasonably up-to-date Windows
installation - so it shouldn't need any fiddling around with.

To get the data into Access, open up any normal text editor (eg NotePad) and
copy and paste the code into a new file and save the file as "Xtractor.hta"
The .hta extension is for a html application.

The code was really just for a bit of fun - to try out these hta files. One
plus side is that I can post plain text to the newsgroup, but there are a
number of downsides, including error handling. If posting attachments were
allowed, I am sure an Access/VBA/DAO solution would be better than the
VBS/ADO code posted here.

Anyway, feel free to try it out and let me know how you get on.

Fletcher



Copy everything below the stars:
' ************************************************

<html>
<head>
<title>Table Extractor</title>
<script language="vbscript">
<!--

Sub DoMain()

Dim lngMaxCols
Dim strFolder
Dim strDbName
Dim strDbPath
Dim strMsg

strFolder = document.all.txtFolder.value

If Right(strFolder,1) <> "\" Then
strFolder = strFolder & "\"
End If

If Not FolderExists(strFolder) Then
Msgbox "Non-existant Folder"
Exit Sub
End If

If CountWordDocs(strFolder) < 1 Then
Msgbox "No Word Docs"
Exit Sub
End If

strDbName = document.all.txtDbName.value

strDbPath = strFolder & strDbName

If FileExists(strDbPath) = True Then

strMsg = "The following file already exists:"
strMsg = strMsg & vbCrLf
strMsg = strMsg & "Do you want to overwrite it?"

If Msgbox(strMsg, vbExclamation OR vbYesNoCancel) <> vbYes Then
Exit Sub
End If

If Not DeleteFile(strDbPath) Then
strMsg = "Error deleting file"
strMsg = strMsg & vbCrLf
strMsg = strMsg & "Check the file is not in use."
MsgBox strMsg, vbCritical
Exit Sub
End If
End If

If IsNumeric(document.all.txtMaxColumns.value) Then
lngMaxCols = Clng(document.all.txtMaxColumns.value)
Else
Msgbox "Columns"
Exit Sub
End If

If (lngMaxCols < 1) OR (lngMaxCols > 200) Then
Msgbox "Columns"
Exit sub
End If

If CreateDb(strDbPath, lngMaxCols) = False Then
Msgbox "Error Creating Database", vbCritical
Exit Sub
End If

ImportDocs strFolder, strDbPath, lngMaxCols

strMsg = "Word tables successfully imported" & vbCrLf
strMsg = strMsg & "Do you want to open the database?"

If Msgbox(strMsg, vbInformation OR vbYesNoCancel) = vbYes Then
StartDb(strDbPath)
End If

End Sub




Function CleanString(strDirty)

Dim strClean
Dim lng

strClean = Trim(strDirty)

If Len(strClean) > 0 Then

strClean = Replace(strClean, Chr(13), vbCrLf)

For lng = Len(strClean) To 1 Step -1
If Asc(Mid(strClean, lng, 1)) > 32 Then
Exit For
End If
Next

strClean = Left(strClean, lng)
End If

If Len(strClean) > 255 Then
strClean = Left(strClean, 250) & "..."
End If

CleanString = strClean

End Function




Function FolderExists(strFolder)

On Error Resume Next

Dim fso
Dim fld

FolderExists = False

Set fso = CreateObject("Scripting.FileSystemObject")

Set fld = fso.GetFolder(strFolder)

If Err.Number = 0 Then FolderExists = True

Set fld = Nothing

Set fso = Nothing

End Function




Function FileExists(strPath)

On Error Resume Next

Dim fso
Dim fil

FileExists = False

Set fso = CreateObject("Scripting.FileSystemObject")

Set fil = fso.GetFile(strPath)

If Err.Number = 0 Then FileExists = True

Set fil = Nothing

Set fso = Nothing

End Function




Function DeleteFile(strPath)

On Error Resume Next

Dim fso

DeleteFile = False

Set fso = CreateObject("Scripting.FileSystemObject")

fso.DeleteFile strPath, True

If Err.Number = 0 Then DeleteFile = True

Set fso = Nothing

End Function




Sub StartDb(strDbPath)

Dim wshShell
Dim lng

Set wshShell = CreateObject("WScript.Shell")

lng = wshShell.Run(strDbPath, 1)

Set wshShell = Nothing

End Sub




Function CountWordDocs(strFolder)

On Error Resume Next

Dim fil
Dim lng

lng = 0

Set fso = CreateObject("Scripting.FileSystemObject")

Set fld = fso.GetFolder(strFolder)

If Err.Number = 0 Then

For each fil in fld.Files

If Right(fil.Name, 4) = ".doc" Then
lng = lng + 1
End If
Next

End If

Set fil = Nothing

Set fld = Nothing

Set fso = Nothing

CountWordDocs = lng

End Function




Sub SetFolder()

Dim strFolder

strFolder = BrowseFolder("Choose a folder", &h0007, "c:\")

If Len(strFolder)>0 Then
document.all.txtFolder.value = strFolder
End If

End Sub




Function BrowseFolder(sPrompt, BrowseInfo, root)

On Error Resume Next

Dim oShell
Dim oFolder
Dim iColonPos
Dim oWshShell

Set oShell = CreateObject("Shell.Application")

Set oWshShell = CreateObject("WScript.Shell")

Set oFolder = oShell.BrowseForFolder(&h0&, sPrompt, BrowseInfo, root)

BrowseFolder = oFolder.ParentFolder.ParseName(oFolder.Title).Path

If Err.Number <> 0 Then

BrowseFolder = Null

If oFolder.Title = "Desktop" Then
BrowseFolder = oWshShell.SpecialFolders("Desktop")
End If

iColonPos = InStr(oFolder.Title, ":")

If iColonPos > 0 Then
BrowseFolder = Mid(oFolder.Title, iColonPos - 1, 2) & "\"
End If
End If

End Function




Function GetFolder()

Dim objShell
Dim objFolder

set objShell = CreateObject("Shell.Application")

set objFolder = objShell.BrowseForFolder(0, "Example", 0, "" )

If (not objFolder is nothing) then
GetFolder = "X" 'objFolder.Path
Else
GetFolder = ""
End if

set objFolder = Nothing
set objShell = Nothing

End function




Sub ImportTables(wdDoc, rst, lngMaxCols)

Dim wdTbl
Dim lngTblNo
Dim wdRow
Dim wdCol
Dim strValue

lngTblNo = 0

For Each wdTbl In wdDoc.Tables

lngTblNo = lngTblNo + 1

For Each wdRow In wdTbl.Rows

rst.AddNew
rst(1) = wdDoc.path & "\" & wdDoc.Name
rst(2) = lngTblNo
rst(3) = wdRow.Index

For Each wdCol In wdTbl.Columns

strValue = wdTbl.Cell(wdRow.Index, wdCol.Index).Range.Text

strValue = CleanString(strValue)

If Len(strValue) > 0 And (wdCol.Index < (lngMaxCols + 1)) Then
rst(3 + wdCol.Index) = strValue
End If

Next

rst.Update

Next

Next

Set wdTbl = Nothing

End Sub




Sub ImportDocs(strFolder, strDbPath, lngMaxCols)

Dim strSQL
Dim strCnn
Dim cnn
Dim rst
Dim wdApp
Dim wdDoc
Dim fso
Dim fld
Dim fil

strCnn = "Provider=Microsoft.Jet.OLEDB.4.0;"
strCnn = strCnn & "Data Source=" & strDbPath

Set cnn = CreateObject("ADODB.Connection")

cnn.ConnectionString = strCnn

cnn.Open

strSQL = "SELECT * FROM tblWordTables"

Set rst = CreateObject("ADODB.Recordset")

rst.Open strSQL, cnn, 2, 3

Set wdApp = CreateObject("Word.Application")

Set fso = CreateObject("Scripting.FileSystemObject")

Set fld = fso.GetFolder(strFolder)

For Each fil In fld.Files

If Right(fil.Name, 4) = ".doc" Then

Set wdDoc = wdApp.Documents.Open(fil.Path)

ImportTables wdDoc, rst, lngMaxCols

wdDoc.Close

Set wdDoc = Nothing

End If
Next

Set fil = Nothing
Set fld = Nothing
Set fso = Nothing
wdApp.Quit
Set wdApp = Nothing
rst.Close
Set rst = Nothing
cnn.Close
Set cnn = Nothing
End Sub




Function CreateTextColumn(catCatalog, tblTable, strColumnName)

On Error Resume Next

Const adVarWChar = 202
Dim col

Set col = CreateObject("ADOX.Column")
col.ParentCatalog = catCatalog
col.Name = strColumnName
col.Type = adVarWChar
col.properties("Nullable").Value = True
col.Properties("Jet OLEDB:Allow Zero Length").Value = False

tblTable.Columns.Append col

If Err.Number = 0 Then CreateTextColumn = True

Set col = Nothing

End Function




Function CreateLongColumn(catCatalog, tblTable, strColumnName)

On Error Resume Next

Const adInteger = 3
Dim col
Dim idx

Set col = CreateObject("ADOX.Column")
col.ParentCatalog = catCatalog
col.Name = strColumnName
col.Type = adInteger

tblTable.Columns.Append col

If Err.Number = 0 Then

Set col = Nothing

Set idx = CreateObject("ADOX.Index")

idx.Name = strColumnName

idx.Unique = False

Set col = CreateObject("ADOX.Column")

col.Name = strColumnName

idx.Columns.Append col

tblTable.Indexes.Append idx

If Err.Number = 0 Then

CreateLongColumn = True

End If

End If

Set idx = Nothing

Set col = Nothing

End Function


Function CreatePrimaryKey(catCatalog, tblTable, strColumnName)

On Error Resume Next

Const adInteger = 3
Dim col

Set col = CreateObject("ADOX.Column")
col.ParentCatalog = catCatalog
col.Name = strColumnName
col.Type = adInteger
col.Properties("AutoIncrement").Value = True

tblTable.Columns.Append col

If Err.Number = 0 Then

tblTable.Keys.Append "PrimaryKey", 1, strColumnName

If Err.Number = 0 Then

CreatePrimaryKey = True

End If

End If

Set col = Nothing

End Function


Function CreateDb(strPath, lngMaxCols)

On Error Resume Next
Dim cat
Dim tbl
Dim col
Dim str
Dim strColName
Dim lngColCount

CreateDb = True

Set cat = CreateObject("ADOX.Catalog")

str = "Provider=Microsoft.Jet.OLEDB.4.0;"
str = str & "Jet OLEDB:Engine Type=5;"
str = str & "Data Source=" & strPath

cat.Create str

If Err.Number <> 0 Then Exit Function

Set tbl = CreateObject("ADOX.Table")

tbl.ParentCatalog = cat

tbl.Name = "tblWordTables"

If Not CreatePrimaryKey(cat, tbl, "ID") Then Exit Function

If Not CreateTextColumn(cat, tbl, "DocPath") Then Exit Function

If Not CreateLongColumn(cat, tbl, "TableNo") Then Exit Function

If Not CreateLongColumn(cat, tbl, "RowNo") Then Exit Function

For lngColCount = 1 to lngMaxCols

strColName = Cstr(1000 + lngColCount)

strColName = "Column" & Mid(strColName, 2)

If Not CreateTextColumn(cat, tbl, strColName) Then
Exit Function
End If
Next

cat.Tables.Append tbl

If Err.Number = 0 Then
CreateDb = True
Else
Msgbox Err.Description
End If

Set tbl = Nothing

Set cat = Nothing

End Function

-->
</script>

<body bgcolor="#CCCCFF">

<table>
<th colspan="2" align="center">Extract Tables From Microsoft Word</th>

<tr><td>&nbsp;</td><td>&nbsp;</td></tr>

<tr>
<td>Document Folder</td>
<td><input type="text" id="txtFolder" value="C:\"</td>
</tr>

<tr>
<td>Database Name</td>
<td><input type="text" id="txtDbName" value="WordTables.mdb"</td>
</tr>

<tr>
<td>Maximum Columns</td>
<td><input type="text" id="txtMaxColumns" value="10"</td>
</tr>

<tr><td>&nbsp;</td><td>&nbsp;</td></tr>

<tr>
<td><input type="button" name="cmdFolder" value="Change Folder"
onclick="SetFolder()"></td>
<td><input type="button" name="cmdImport" value="Import Tables"
onclick="DoMain()"></td>
</tr>

</table>

</body>
</html>
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top