importing Data from a Word file

J

Jayjay

**Crossposted for lack of knowing which group is best. I'll read
followups in all groups.**

I'm looking for some brainstorming assistance on how best to convert
data into a format that I can import into an existing database.

Existing Database is "Contacts" and has fields like firstname,
lastname, companyname, address1, address2, city, state, zip, etc.
Its basically a "Contact" address book.

I have 5 people who have data that needs to be brought into this
database. But here's the tricky part. Their data is in Word, and
not in a dataset that is easily imported. Their layout is just like
an envelope address layout:

FirstName LastName
CompanyName
Address1
Address2 (if exists)
City, state, zip
phonenumber
email


Repeat for about 200+ pages worth of data.

And if a company has multiple contats the list would be:

Firstname LastName
Firstname LastName
Firstname LastName
Firstname LastName
CompanyName
Address1
Address2
City, state, zip
phonenumber

Sometimes the email preceeds the phonenumber, sometimes it falls
after. No particular order

Short of manual dataentry into a dataset of some sort - can you think
of any way to manipulate this kind of data.

I have converted all text to table - which then puts all data into 1
column. I've considered playing with Excel's pivot tables. But if
someone can brainstorm and come up with another idea, I'd love to hear
them.

Thanks.
 
J

John Nurick

Hi Jayjay,

This is not fun. The general idea is to read the textfile a line at a
time, putting each line into a variable and inspecting it as you go.

When you find a line that looks like City State Zip followed by a line
that looks like a phone number or an email address, you know where you
are. Work backwards and try to match the previous lines to likely
patterns for Address1 or Address2 or Company. If you hit a blank line
when working backwards, you've probably just passed a name. If you hit a
blank line when going forwards, the next non-blank line will almost
certainly be a name. And so on. When you've worked out which line is
which in a record, try and parse out the city, state and zip, the first
and last names, and so on.

The tool you need to make it possible to recognise these patterns
without extraordinary programming effort is a regular expression engine
(like the Like operator in VBA or Word's pattern-matching engine only
far more powerful). There's a good one included in VBScript,
but there's an even better one in Perl, which also has the advantage of
being a language with especially good facilities for handling strings
and text files.

Probably it depends on your knowledge of and taste for programming. If
you're fluent in VBA and/or VBScript, that's probably the easier way to
go. If you're familiar with C/C++/Java syntax, go for Perl. If the
thought of either scares you rigid, look for a good cheap typist<g>.



**Crossposted for lack of knowing which group is best. I'll read
followups in all groups.**

I'm looking for some brainstorming assistance on how best to convert
data into a format that I can import into an existing database.

Existing Database is "Contacts" and has fields like firstname,
lastname, companyname, address1, address2, city, state, zip, etc.
Its basically a "Contact" address book.

I have 5 people who have data that needs to be brought into this
database. But here's the tricky part. Their data is in Word, and
not in a dataset that is easily imported. Their layout is just like
an envelope address layout:

FirstName LastName
CompanyName
Address1
Address2 (if exists)
City, state, zip
phonenumber
email


Repeat for about 200+ pages worth of data.

And if a company has multiple contats the list would be:

Firstname LastName
Firstname LastName
Firstname LastName
Firstname LastName
CompanyName
Address1
Address2
City, state, zip
phonenumber

Sometimes the email preceeds the phonenumber, sometimes it falls
after. No particular order

Short of manual dataentry into a dataset of some sort - can you think
of any way to manipulate this kind of data.

I have converted all text to table - which then puts all data into 1
column. I've considered playing with Excel's pivot tables. But if
someone can brainstorm and come up with another idea, I'd love to hear
them.

Thanks.

John Nurick [Microsoft Access MVP]

Please respond in the newgroup and not by email.
 
C

chriske911

on a previous similar question I had this advise:

are you sure there always the same amount of fields for every record ?

I did this once myself by converting the table into a text file and with the
edit/replace function:

1. remove all blank lines
2. remove all line feeds
3. remove all extra spaces
4. insert tabs between all "fields" by replacing the "fieldname" with
^t"fieldname" (without the quotes offcourse)
5. insert a line feed between all first fields of a record by adding ^p^p
before that firs field (yes 2 linefeeds)
6. remove all fieldnames since you know what field is in what column
7. import that text file by using delimited by tabs option

you will need a lot of patience for this one

p.s.: there are more editors around then MS notepad, much faster and with
more features

bon chance
John Nurick said:
Hi Jayjay,

This is not fun. The general idea is to read the textfile a line at a
time, putting each line into a variable and inspecting it as you go.

When you find a line that looks like City State Zip followed by a line
that looks like a phone number or an email address, you know where you
are. Work backwards and try to match the previous lines to likely
patterns for Address1 or Address2 or Company. If you hit a blank line
when working backwards, you've probably just passed a name. If you hit a
blank line when going forwards, the next non-blank line will almost
certainly be a name. And so on. When you've worked out which line is
which in a record, try and parse out the city, state and zip, the first
and last names, and so on.

The tool you need to make it possible to recognise these patterns
without extraordinary programming effort is a regular expression engine
(like the Like operator in VBA or Word's pattern-matching engine only
far more powerful). There's a good one included in VBScript,
but there's an even better one in Perl, which also has the advantage of
being a language with especially good facilities for handling strings
and text files.

Probably it depends on your knowledge of and taste for programming. If
you're fluent in VBA and/or VBScript, that's probably the easier way to
go. If you're familiar with C/C++/Java syntax, go for Perl. If the
thought of either scares you rigid, look for a good cheap typist<g>.



**Crossposted for lack of knowing which group is best. I'll read
followups in all groups.**

I'm looking for some brainstorming assistance on how best to convert
data into a format that I can import into an existing database.

Existing Database is "Contacts" and has fields like firstname,
lastname, companyname, address1, address2, city, state, zip, etc.
Its basically a "Contact" address book.

I have 5 people who have data that needs to be brought into this
database. But here's the tricky part. Their data is in Word, and
not in a dataset that is easily imported. Their layout is just like
an envelope address layout:

FirstName LastName
CompanyName
Address1
Address2 (if exists)
City, state, zip
phonenumber
email


Repeat for about 200+ pages worth of data.

And if a company has multiple contats the list would be:

Firstname LastName
Firstname LastName
Firstname LastName
Firstname LastName
CompanyName
Address1
Address2
City, state, zip
phonenumber

Sometimes the email preceeds the phonenumber, sometimes it falls
after. No particular order

Short of manual dataentry into a dataset of some sort - can you think
of any way to manipulate this kind of data.

I have converted all text to table - which then puts all data into 1
column. I've considered playing with Excel's pivot tables. But if
someone can brainstorm and come up with another idea, I'd love to hear
them.

Thanks.

John Nurick [Microsoft Access MVP]

Please respond in the newgroup and not by email.
 
J

John Nurick

This won't work in the present situation, because the number of fields
in a record is not constant and the fields are not named.

on a previous similar question I had this advise:

are you sure there always the same amount of fields for every record ?

I did this once myself by converting the table into a text file and with the
edit/replace function:

1. remove all blank lines
2. remove all line feeds
3. remove all extra spaces
4. insert tabs between all "fields" by replacing the "fieldname" with
^t"fieldname" (without the quotes offcourse)
5. insert a line feed between all first fields of a record by adding ^p^p
before that firs field (yes 2 linefeeds)
6. remove all fieldnames since you know what field is in what column
7. import that text file by using delimited by tabs option

you will need a lot of patience for this one

p.s.: there are more editors around then MS notepad, much faster and with
more features

bon chance
John Nurick said:
Hi Jayjay,

This is not fun. The general idea is to read the textfile a line at a
time, putting each line into a variable and inspecting it as you go.

When you find a line that looks like City State Zip followed by a line
that looks like a phone number or an email address, you know where you
are. Work backwards and try to match the previous lines to likely
patterns for Address1 or Address2 or Company. If you hit a blank line
when working backwards, you've probably just passed a name. If you hit a
blank line when going forwards, the next non-blank line will almost
certainly be a name. And so on. When you've worked out which line is
which in a record, try and parse out the city, state and zip, the first
and last names, and so on.

The tool you need to make it possible to recognise these patterns
without extraordinary programming effort is a regular expression engine
(like the Like operator in VBA or Word's pattern-matching engine only
far more powerful). There's a good one included in VBScript,
but there's an even better one in Perl, which also has the advantage of
being a language with especially good facilities for handling strings
and text files.

Probably it depends on your knowledge of and taste for programming. If
you're fluent in VBA and/or VBScript, that's probably the easier way to
go. If you're familiar with C/C++/Java syntax, go for Perl. If the
thought of either scares you rigid, look for a good cheap typist<g>.



**Crossposted for lack of knowing which group is best. I'll read
followups in all groups.**

I'm looking for some brainstorming assistance on how best to convert
data into a format that I can import into an existing database.

Existing Database is "Contacts" and has fields like firstname,
lastname, companyname, address1, address2, city, state, zip, etc.
Its basically a "Contact" address book.

I have 5 people who have data that needs to be brought into this
database. But here's the tricky part. Their data is in Word, and
not in a dataset that is easily imported. Their layout is just like
an envelope address layout:

FirstName LastName
CompanyName
Address1
Address2 (if exists)
City, state, zip
phonenumber
email


Repeat for about 200+ pages worth of data.

And if a company has multiple contats the list would be:

Firstname LastName
Firstname LastName
Firstname LastName
Firstname LastName
CompanyName
Address1
Address2
City, state, zip
phonenumber

Sometimes the email preceeds the phonenumber, sometimes it falls
after. No particular order

Short of manual dataentry into a dataset of some sort - can you think
of any way to manipulate this kind of data.

I have converted all text to table - which then puts all data into 1
column. I've considered playing with Excel's pivot tables. But if
someone can brainstorm and come up with another idea, I'd love to hear
them.

Thanks.

John Nurick [Microsoft Access MVP]

Please respond in the newgroup and not by email.

John Nurick [Microsoft Access MVP]

Please respond in the newgroup and not by email.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top