MSWord -> Html

Guest · Oct 15, 2005

Hi,

I'm writing a Word-to-HTML converter, and I'm having trouble with the
algorithm that handles bold, italic, and underline formatting.

Suppose a Word document contains a paragraph with a mix of bold, italic, and
underlined text, for instance:

IM BOLD, IM UNDERLINED, IM ITALIC

I would like to convert it to:

IM BOLD, IM UNDERLINED,IM ITALIC

It is easy for me to produce:
IMBOLDIMUNDERLINED.....
but that way I end up with a lot of tags, and I would like to minimize
them.

If anybody knows how I could organize this algorithm (taking into account that
when a word is about to be parsed, I have to check which tags are already
open, etc.), I would be grateful.

Nicholas Paldino [.NET/C# MVP] · Oct 15, 2005

Josema,

You could reduce the bold tags, but you cannot reduce the underline
tags. Having this:

I'm underlined

Is not the same as:

I'm underlined

That being said, you could look for bold words that are separated by
nothing but whitespace, and then wrap a single pair of bold tags around
the whole group.

I think that since Office XP you can save Word documents as HTML. Since you
are already accessing the Word object model to do this, why not just use
that facility instead?
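
To make the suggestion above concrete, here is a minimal sketch in Python. It is not code from this thread: the `(text, is_bold)` tuples are a hypothetical stand-in for whatever runs you read from the Word object model, and `merge_bold_runs` and `to_html` are made-up names. It only handles bold, since the underline case cannot be merged this way.

```python
import html


def merge_bold_runs(runs):
    """Merge bold runs that are separated only by whitespace.

    runs: list of (text, is_bold) tuples (hypothetical stand-in for Word runs).
    """
    # Pass 1: join neighbours that already share the same bold flag,
    # so that the remaining runs strictly alternate bold / not bold.
    merged = []
    for text, bold in runs:
        if merged and merged[-1][1] == bold:
            merged[-1][0] += text
        else:
            merged.append([text, bold])

    # Pass 2: a whitespace-only plain run that has a neighbour on both sides
    # must sit between two bold runs, so it can be absorbed into them.
    result = []
    for i, (text, bold) in enumerate(merged):
        has_both_neighbours = 0 < i < len(merged) - 1
        if not bold and text.strip() == "" and has_both_neighbours:
            bold = True
        if result and result[-1][1] == bold:
            result[-1][0] += text
        else:
            result.append([text, bold])
    return [tuple(run) for run in result]


def to_html(runs):
    """Emit one <b>...</b> pair per merged bold run; escape the text."""
    parts = []
    for text, bold in merge_bold_runs(runs):
        escaped = html.escape(text)
        parts.append(f"<b>{escaped}</b>" if bold else escaped)
    return "".join(parts)
```

For example, `to_html([("IM", True), (" ", False), ("BOLD", True), (", plain", False)])` gives `<b>IM BOLD</b>, plain` instead of two separate bold pairs.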

Hope this helps.

Guest · Oct 15, 2005

Hi Nicholas, first of all, thanks for your fast and useful response.

I'm using the Office object model directly because the Save as HTML
option in Office 2000 generates a lot of junk markup around the content.

--
Thanks again.
Regards.
Josema

rossum · Oct 15, 2005

Crude fix: replace "</b> <b>" with " ".

If you want to do it the proper way, then you are going to have a lot
of on/off switches. Some sort of bit array might be useful.
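
One way the on/off switches could be sketched, in Python, is with a bit-flag enum: each run carries a set of flags, and a tag is emitted only when a flag actually changes between runs. Everything here is an assumption for illustration: the `Fmt` flags, the `(text, Fmt)` pairs standing in for Word runs, and the `runs_to_html` name are all hypothetical.

```python
import html
from enum import IntFlag


class Fmt(IntFlag):
    NONE = 0
    BOLD = 1
    ITALIC = 2
    UNDERLINE = 4


# Tag name for each single flag (insertion order decides the opening order).
TAGS = {Fmt.BOLD: "b", Fmt.ITALIC: "i", Fmt.UNDERLINE: "u"}


def runs_to_html(runs):
    """runs: iterable of (text, Fmt) pairs (hypothetical stand-in for Word runs)."""
    out = []
    open_stack = []  # flags whose tags are currently open, outermost first
    for text, fmt in runs:
        # Close from the innermost tag outward until every tag that is
        # still open is one that this run wants. Keeps the nesting valid.
        while any(not (flag & fmt) for flag in open_stack):
            out.append(f"</{TAGS[open_stack.pop()]}>")
        # Open whatever this run needs that is not open already.
        for flag in TAGS:
            if flag & fmt and flag not in open_stack:
                open_stack.append(flag)
                out.append(f"<{TAGS[flag]}>")
        out.append(html.escape(text))
    # Close anything left open at the end of the paragraph.
    while open_stack:
        out.append(f"</{TAGS[open_stack.pop()]}>")
    return "".join(out)
```

Because tags stay open across runs that share a flag, consecutive bold words end up inside one pair of bold tags. The price is that a tag nested inside one that has to close is closed and reopened, which keeps the output well-formed but is not guaranteed to be the absolute minimum number of tags.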

rossum

The ultimate truth is that there is no ultimate truth
