Convert HTML -> PlainText (-> HTML)

  • Thread starter: Joerg Battermann

Joerg Battermann

Hey there,

Mmm, does anyone know of a library or anything else I can use to convert
plain text to HTML quickly, and vice versa? Regex would surely do the
trick to a certain extent, but maybe there's already a useful library out
there that I could have a look at.

Thanks,
-J
 
I can see doing HTML -> plain text. But how do you convert plain text to
HTML? I mean, at that point all you have is the text, with none of the
formatting and none of the elements. How would you figure out what to put
where?

You can use MSHTML to strip out all HTML formatting.
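MSHTML is a COM component and needs a project reference, so as a lighter alternative here is a minimal sketch of the regex approach mentioned in the original question. This is a swapped-in technique, not MSHTML, and it assumes simple, reasonably well-formed markup: it does not understand scripts, comments, or broken HTML. The function name StripHtml is hypothetical; WebUtility.HtmlDecode lives in System.Net (.NET 4.0 and later).

```vbnet
Imports System
Imports System.Net
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module HtmlToText
    ' Hypothetical helper: rough HTML -> plain text using regular expressions
    ' (not MSHTML). Only suitable for simple markup.
    Public Function StripHtml(ByVal html As String) As String
        ' Turn <br>, <br/>, <br /> and closing </p> tags into line breaks first.
        Dim text As String = Regex.Replace(html, "<br\s*/?>|</p\s*>", vbCrLf, RegexOptions.IgnoreCase)
        ' Drop every remaining tag.
        text = Regex.Replace(text, "<[^>]+>", String.Empty)
        ' Decode entities such as &lt; &gt; &amp; back into characters.
        Return WebUtility.HtmlDecode(text).Trim()
    End Function

    Sub Main()
        Console.WriteLine(StripHtml("<html><body><b>Bold</b> &amp; plain<br>next line</body></html>"))
    End Sub
End Module
```

Because tags are removed before entities are decoded, an encoded "&lt;b&gt;" in the source comes out as the literal text "<b>" rather than being stripped.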
 
Joerg Battermann said:
mmmm does anyone know a library or anything I can use to convert plaintext
to html real quick and vice-versa?

I am not sure what exactly you want to achieve, but maybe you want to
encode characters like "<", ">", etc. using the corresponding named character
entities:

Text -> HTML

'HttpUtility.HtmlEncode' (in the 'System.Web' namespace).

XML encoding:

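\\\
Imports System.IO
Imports System.Text
Imports System.Xml
..
..
..
' Returns the text with characters that are special in XML text
' content (such as "<", ">" and "&") replaced by entities.
Public Function XmlEncode(ByVal Text As String) As String
    Dim sb As New StringBuilder(Text.Length)
    ' StringWriter (System.IO) writes into the StringBuilder.
    Dim tw As New XmlTextWriter(New StringWriter(sb))
    ' WriteString escapes the text instead of writing it as markup.
    tw.WriteString(Text)
    tw.Flush()
    tw.Close()
    Return sb.ToString()
End Function
///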
 
Joerg,

For HTML to plain text, see the advice from Marina. The other way around can
be as simple as putting

<html><body>

before your text and

</body></html>

after it.

However, you probably do not mean plain text.

Cor
 
Cor,

I actually did mean sort of plain text... or, to be more exact, text I
extract from an Excel cell.

I am currently writing a synchronizer that allows users to sync data
that is stored partially in an Excel file and partially in a CaliberRM
database (which holds its data partially in a more or less valid
HTML format).

The tool should be able to sync the data, but before writing back to
Caliber I have to convert it back to HTML. I was thinking about simply
putting html and body tags around it and converting line breaks etc. to
<br>s, but I thought maybe there's something more sophisticated already
out there.
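The approach described above (escape the text, turn line breaks into <br> tags, wrap the result in html and body tags) can be sketched as follows. This is a minimal sketch under stated assumptions: the name ToSimpleHtml is hypothetical, it uses WebUtility.HtmlEncode (System.Net, .NET 4.0 and later) in place of HttpUtility.HtmlEncode so no System.Web reference is needed, and it carries over no cell formatting such as bold or italic.

```vbnet
Imports System
Imports System.Net
Imports Microsoft.VisualBasic

Module TextToHtml
    ' Hypothetical helper: wraps plain (e.g. Excel cell) text in a minimal HTML document.
    Public Function ToSimpleHtml(ByVal text As String) As String
        ' Escape <, >, & and quotes so the text cannot be mistaken for markup.
        Dim encoded As String = WebUtility.HtmlEncode(text)
        ' Normalise CRLF and lone CR line endings to LF.
        encoded = encoded.Replace(vbCrLf, vbLf).Replace(vbCr, vbLf)
        ' Emit a <br /> at every line break, keeping the newline for readability.
        encoded = encoded.Replace(vbLf, "<br />" & vbLf)
        Return "<html><body>" & encoded & "</body></html>"
    End Function

    Sub Main()
        Console.WriteLine(ToSimpleHtml("Line 1 <draft>" & vbLf & "Line 2 & more"))
    End Sub
End Module
```

Encoding before inserting the <br /> tags matters: doing it the other way round would escape the tags you just added.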


Ideally I would be looking for something that gives me valid HTML
from an Excel cell's content (including bold, italic, etc. formatting), but I
think that's pretty close to utopia.
 
Joerg,

I got that idea. You can then have a look at Office interop, although you
would then need Office installed to do that: create a document, which you
then save as HTML (or, better said, ugly HTML).

Be aware that those font descriptions are the hardest thing to produce in HTML
by hand (i.e., programmatically). They need a lot of tags, and doing it yourself
is mostly close to utopia.

:-)

Cor
 
Cor,

thanks for the reply and everything; I'll see what I'll do. Converting via
Word is more or less impossible, because I am currently handling a couple
of thousand requirements, and doing a temporary save and import to Caliber
might take a little longer than forever. Maybe I'll just tell the user that
importing to Caliber is a no-go.

Thanks and have a great day,
-Joerg :)
 
