UTF8 encoding

shapper

Hello,

I am saving different types (image, plain text, html text, sound and
pdf) of content into a database in Byte[] format.
Is UTF8 a correct encoding for all these content types?

Thanks,
Miguel
 
Patrice

Hello,
I am saving different types (image, plain text, html text, sound and
pdf) of content into a database in Byte[] format.
Is UTF8 a correct encoding for all these content types?

UTF-8 is a way to encode Unicode characters as bytes, so it only applies to
text. You should just store the bytes in your db without any further
encoding (if encoding was needed, it should have been done earlier; strictly
speaking, all you have to do to store the content is get it and save it
unchanged in the db). If you tried something that doesn't work, please be
explicit about the problem you get...
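
To illustrate the point above, here is a minimal sketch (the class and
method names are made up for this example, and UTF-8 is only an assumed
choice for the text content). Text gets encoded once when it becomes bytes,
binary content is stored as-is, and pushing binary data through a text
encoding damages it:

```csharp
using System;
using System.Text;

public static class ContentBytes
{
    // Text is encoded exactly once, when it is turned into bytes for storage.
    public static byte[] FromText(string text) => Encoding.UTF8.GetBytes(text);

    // Binary content (image, PDF, MP3) is already bytes: store it unchanged.
    public static byte[] FromBinary(byte[] raw) => raw;

    // What NOT to do: decode binary data as UTF-8 text and encode it again.
    // Byte sequences that are not valid UTF-8 are replaced by U+FFFD,
    // so the original bytes cannot be recovered.
    public static byte[] CorruptViaUtf8(byte[] raw) =>
        Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(raw));
}
```

For instance, passing the first bytes of a JPEG file (0xFF 0xD8 ...) to
CorruptViaUtf8 should return a different byte array, while FromBinary
returns them untouched.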

Could it be that you save the content in a varchar or text column? With
SQL Server 2005 or later, varbinary(max) is likely the preferred datatype
for blob data.

Another well-known option is to store the data outside of the db in the
filesystem and store its location inside the db. SQL Server 2008 also has
some support for doing this transparently...

Some details could perhaps help to better understand the exact point on
which you need help.
 
shapper

Hello,

In a C# web application I have a few global resources that I need to
save in the database.

Examples:
- Welcome Text (Plain Text)
- Contact Text (Html Text)
- Logo Image (Image JPEG)
- About the company file (PDF file)
- Catalog Ambient Sound (MP3 File)

These are isolated elements that are used around the web application.
So I would like to have a way to save them all in the same SQL table
or, for small projects, in an XML file.

For Plain Text and Html Text I can read the bytes from the text.
For the other files, I think I can easily read the bytes in my C# code.

When saving to an XML file I think I need to save the byte[]
representation as a Base64 string.

In both cases, SQL and XML, I will have a column with the Mime Type of
the content.

In the XML case I will never have more than 20 content elements.
In the SQL case I can have around 1000 elements at most.

I do know about file stream. And I do use it for saving files in a
Documents table.

But in this case the content itself can be a file, or plain text or
plain html, etc.
Can I save to file stream a plain text or plain html text that was
converted to byte[]?

Does this make sense?

Thanks,
Miguel
 
Patrice

But in this case the content itself can be a file, or plain text or
plain html, etc.
Can I save to file stream a plain text or plain html text that was
converted to byte[]?

Sure. The key point is that encoding is not something you deal with when
you store this data. It has been dealt with earlier, i.e. if you save a
sound, a Word document, or a UTF-8 encoded HTML file, you'll just get this
content as bytes and save those bytes unchanged to the db...
 
shapper

Sure. The key point is that encoding is not something you deal with when
you store this data. It has been dealt with earlier, i.e. if you save a
sound, a Word document, or a UTF-8 encoded HTML file, you'll just get this
content as bytes and save those bytes unchanged to the db...

True, but if I need to get the content from an XML file where it was saved
before by converting the Byte[] to a Base64 string, don't I need to use:

Byte[] Content = Encoding.UTF8.GetBytes(MyContent)

If the content is a file, I use the Byte[] as it is in my C# application.
If the content is plain text or HTML text, can I use MyContent
directly?

Thanks,
Miguel
 
Patrice

True, but if I need to get the content from an XML file where it was saved
before by converting the Byte[] to a Base64 string, don't I need to use:
Byte[] Content = Encoding.UTF8.GetBytes(MyContent)

No. Once the Base64 data is decoded you have the same content that was
stored (i.e. it is already encoded the same way if you stored encoded data).
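
As a rough sketch of that round trip (the element name "content", the
attribute name "mimeType" and the class name are assumptions for the
example, not something from your project), Convert.ToBase64String and
Convert.FromBase64String are all that is needed; no Encoding call is
involved:

```csharp
using System;
using System.Xml.Linq;

public static class XmlContentStore
{
    // Builds <content mimeType="...">BASE64</content> from the raw bytes.
    public static XElement ToElement(string mimeType, byte[] data) =>
        new XElement("content",
            new XAttribute("mimeType", mimeType),
            Convert.ToBase64String(data));

    // Decoding the Base64 text returns exactly the bytes that were stored.
    public static byte[] FromElement(XElement element) =>
        Convert.FromBase64String(element.Value);
}
```

Whatever went in (a PDF, a JPEG, or text that was already turned into
UTF-8 bytes) comes out byte for byte the same from FromElement.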

For example, when you store a file on disk, the disk doesn't care what it
is. It just takes the bytes and saves them. An "encoding" is just a
convention for representing Unicode characters as bytes, and it only matters
when you change this convention (i.e. the HTML document you stored is
encoded using one method and you want to display it using another encoding
convention).
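
A small sketch of what changing the convention looks like, assuming for
the sake of the example that the stored text was ISO-8859-1 and the page
needs UTF-8 (the class and method names are made up). Encoding.Convert
re-encodes the bytes from one encoding to the other:

```csharp
using System;
using System.Text;

public static class EncodingConvention
{
    // Re-encodes text bytes from ISO-8859-1 to UTF-8.
    // The characters stay the same; only their byte representation changes.
    public static byte[] Latin1ToUtf8(byte[] latin1Bytes)
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        return Encoding.Convert(latin1, Encoding.UTF8, latin1Bytes);
    }
}
```

The letter "é" is the single byte 0xE9 in ISO-8859-1 and the two bytes
0xC3 0xA9 in UTF-8, which is why bytes read with the wrong convention
show up as garbage characters.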

Do you have problems if you just read back your data?
 
Patrice

I was doing something else and I think I may have suddenly understood
your issue. Do you mean that the problem is when you convert the byte
array back to a string?

Also, .NET strings use UTF-16. Is this a web app? Usually the conversion
happens when data is written to the Response output stream, depending on the
encoding defined in the web.config file...

If it is still not that, putting together a small sample so that we can
understand the issue you currently have is likely best (I assume you do have
some problem currently? If not, try the simplest option first and see if you
have an issue so that we can start from there).
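
If the issue is indeed converting the stored bytes back to a string, a
minimal sketch could look like the following. It assumes the text content
was stored as UTF-8 and that the MIME type column starts with "text/" for
plain text and HTML; the class and method names are made up:

```csharp
using System;
using System.Text;

public static class ContentReader
{
    // True for MIME types such as text/plain and text/html.
    public static bool IsText(string mimeType) =>
        mimeType.StartsWith("text/", StringComparison.OrdinalIgnoreCase);

    // Decodes the stored bytes to a string only for text content.
    // Binary content (image, PDF, MP3) is left alone and false is returned.
    public static bool TryDecodeText(string mimeType, byte[] data, out string text)
    {
        if (!IsText(mimeType))
        {
            text = string.Empty;
            return false;
        }

        // Must be the same encoding that was used when the text was saved.
        text = Encoding.UTF8.GetString(data);
        return true;
    }
}
```

So Encoding.UTF8.GetString is used when reading text back into a string,
and Encoding.UTF8.GetBytes only when the text is first turned into bytes.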
 
