ASCII and beyond, backup binary data to text file

Ryan Liu

Hi,

I want to back up a database to a text file. There are binary columns.

I read a byte[] from the column, use the ASCII encoding to convert it to a
string, and write that to a text file.

But there are bytes larger than 7F, and I lose the 80 hex bit, e.g. 91 hex
becomes 11 hex.

Which single-byte encoding covers 0-255?

Thanks,
Ryan
 
Jon Skeet [C# MVP]

Ryan Liu said:
I want to back up a database to a text file. There are binary columns.

I read a byte[] from the column, use the ASCII encoding to convert it to a
string, and write that to a text file.

But there are bytes larger than 7F, and I lose the 80 hex bit, e.g. 91 hex
becomes 11 hex.

Which single-byte encoding covers 0-255?

If you need to write arbitrary binary data to a text file, you should
use something like Base64. See Convert.ToBase64String and
Convert.FromBase64String.

You should never treat arbitrary binary data as if it were encoded text
data.
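
A minimal sketch of that approach, assuming each binary column value is
written as one line of Base64 text. The Base64Backup class and its method
names are made up for illustration; only Convert.ToBase64String and
Convert.FromBase64String come from the framework.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class Base64Backup
{
    // Writes each binary value as one line of Base64 text.
    public static void Write(string path, IEnumerable<byte[]> values)
    {
        using (StreamWriter writer = new StreamWriter(path))
        {
            foreach (byte[] value in values)
                writer.WriteLine(Convert.ToBase64String(value));
        }
    }

    // Reads the lines back and decodes each one to the original bytes.
    public static List<byte[]> Read(string path)
    {
        List<byte[]> values = new List<byte[]>();
        foreach (string line in File.ReadAllLines(path))
            values.Add(Convert.FromBase64String(line));
        return values;
    }

    static void Main()
    {
        // Hypothetical sample value, including bytes above 7F.
        byte[] sample = { 0x00, 0x7F, 0x80, 0x91, 0xFF };
        string path = Path.GetTempFileName();

        Write(path, new List<byte[]> { sample });
        byte[] restored = Read(path)[0];

        Console.WriteLine(BitConverter.ToString(restored)); // 00-7F-80-91-FF
        File.Delete(path);
    }
}
```

Base64 output uses only letters, digits, '+', '/' and '=', so it survives
any text encoding the file is written in.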
 
Ryan Liu

Thanks for the reply, Jon!

But, BTW, what is the potential risk of treating arbitrary binary data as if
it were encoded text?

Thanks,
Ryan
 
Barry Kelly

Ryan Liu said:
Thanks for the reply, Jon!

But, BTW, what is the potential risk of treating arbitrary binary data as if
it were encoded text?

Because some things don't round-trip. Try this on some binary data and
you'll see what happens.

---8<---
using System;
using System.Text;
using System.IO;

static class App
{
    static void Main(string[] args)
    {
        byte[] buffer = new byte[4096];
        int offset = 0;
        using (Stream input = File.OpenRead(args[0]))
        {
            for (;;)
            {
                int read = input.Read(buffer, 0, buffer.Length);
                if (read == 0)
                    break;
                // Decode the chunk as UTF-8 text, then encode it back to bytes.
                byte[] roundTrip = Encoding.UTF8.GetBytes(
                    Encoding.UTF8.GetString(buffer, 0, read));
                // Compare byte by byte; any difference means data was lost.
                for (int i = 0; i < read; ++i)
                    if (buffer[i] != roundTrip[i])
                    {
                        Console.WriteLine("Round-trip error at {0}.",
                            offset + i);
                        return;
                    }
                offset += read;
            }
        }
    }
}
--->8---

Here's what I get when I run it on an executable:

---8<---
../Test Test.exe
Round-trip error at 2.
--->8---

-- Barry
 
Barry Kelly

Ryan Liu said:
Thanks for the reply, Jon!

But, BTW, what is the potential risk of treating arbitrary binary data as if
it were encoded text?

A less buggy version showing the problem:

---8<---
using System;
using System.Text;
using System.IO;

static class App
{
    static void Main(string[] args)
    {
        byte[] buffer = new byte[4096];
        int offset = 0;
        using (Stream input = File.OpenRead(args[0]))
        {
            for (;;)
            {
                int read = input.Read(buffer, 0, buffer.Length);
                if (read == 0)
                    break;
                // Decode the chunk as UTF-8 text, then encode it back to bytes.
                // Note: a multi-byte sequence split across two chunks is also
                // reported as an error here.
                byte[] roundTrip = Encoding.UTF8.GetBytes(
                    Encoding.UTF8.GetString(buffer, 0, read));
                // Invalid input is replaced on decoding, which changes the length.
                if (roundTrip.Length != read)
                {
                    Console.WriteLine("Length mismatch ({0} != {1}).",
                        roundTrip.Length, read);
                    return;
                }
                // Same length: compare byte by byte.
                for (int i = 0; i < read; ++i)
                    if (buffer[i] != roundTrip[i])
                    {
                        Console.WriteLine("Round-trip error at {0}.",
                            offset + i);
                        return;
                    }
                offset += read;
            }
        }
    }
}
--->8---

Basically, many byte sequences aren't valid character encodings, so
decoding from bytes to strings is a lossy operation.
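
To make that concrete with the 91 hex byte from the original question, here
is a small sketch. The LossyDecodeDemo class and its method names are
invented for illustration. It relies on the default Encoding.UTF8
substituting U+FFFD for invalid input, and on the UTF8Encoding constructor
flag that makes the decoder throw instead.

```csharp
using System;
using System.Text;

static class LossyDecodeDemo
{
    // Decodes bytes as UTF-8 with the default (replacing) behavior, then re-encodes.
    public static byte[] RoundTrip(byte[] data)
    {
        return Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(data));
    }

    // Returns true only if a strict decoder accepts the bytes as UTF-8.
    public static bool IsValidUtf8(byte[] data)
    {
        // Second argument: throw on invalid bytes instead of substituting U+FFFD.
        UTF8Encoding strict = new UTF8Encoding(false, true);
        try
        {
            strict.GetString(data);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    static void Main()
    {
        // 0x91 is a UTF-8 continuation byte with no lead byte before it.
        byte[] original = { 0x91 };
        Console.WriteLine(BitConverter.ToString(RoundTrip(original))); // EF-BF-BD
        Console.WriteLine(IsValidUtf8(original));                      // False
    }
}
```

One byte went in and three different bytes came out, and nothing in the
decoded string says what the original byte was.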

-- Barry
 
Kevin Spencer

This is not the same as using Base64 encoding. There is no data loss risk
associated with using Base64 encoding. The following Wikipedia article is a
good starting point to learn about this standard, and includes links to the
RFCs that define the rules for it:

http://en.wikipedia.org/wiki/Base64

--
HTH,

Kevin Spencer
Microsoft MVP
Professional Numbskull

This is, by definition, not that.
 
Barry Kelly

Kevin Spencer said:
This is not the same as using Base64 encoding. There is no data loss risk
associated with using Base64 encoding. The following Wikipedia article is a
good starting point to learn about this standard, and includes links to the
RFCs that define the rules for it:

http://en.wikipedia.org/wiki/Base64

Of course, and I'm perfectly aware of that. The OP was talking about
text encodings.

-- Barry
 
Kevin Spencer

Base64 encodes binary data as text. The OP was asking how to encode binary
data as text. AFAIK, Base64 is the best way to do this. It is certainly the
most widely used, and presents no problems.
 