Read text-segment from binary file

Klaus Jensen

Hi

I have some binary files (jpeg), which contain a lot of image-data - and
some embedded XML (XMP actually).

If I view the file in a hex-editor, there is a lot of binary data - and then
in the middle of everything:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
[etc etc]

I need to load this file in some kind of reader, search for the <?xml as
start and </plist> as the end of the xml, and return this string for further
processing.

How do I do that?

I tried just loading it into a StreamReader to see what happened, but
I only got garbage from the reader...

Any help will be greatly appreciated :)

Regards

- Klaus
 
Göran Andersson

You have to read the file as binary data. Then you can extract the part
of it that is the xml file and decode that to a string.
 
Klaus Jensen

Göran Andersson said:
You have to read the file as binary data. Then you can extract the part of
it that is the xml file and decode that to a string.

Could you give an example, please? Working with the binary-objects is new to
me.
 
Michel Posseth [MCP]

You can just use a StreamReader and loop until you reach the XML. From
that point, read all the data into a string until you reach the end of the
XML.

Dim sr As New System.IO.StreamReader("C:\afile.bin")

Do Until sr.EndOfStream
    Debug.WriteLine(sr.ReadLine())
Loop

sr.Close()

Note that ReadLine returns a String.

hth

Michel Posseth [MCP]
 
GhostInAK

Hello Klaus,

I strongly recommend you read the document at: http://partners.adobe.com/public/developer/en/xmp/sdk/xmpspecification.pdf

Small excerpt from page 93:

JPEG
In JPEG files, an APP1 marker designates the location of the XMP Packet. The following table shows the entry format.

Byte offset : Field value : Field name : Length (bytes) : Comments
0 : 0xFFE1 : APP1 : 2 : APP1 marker.
2 : 2 + length of namespace (29) + length of XMP Packet : Lp : 2 : Size in bytes of this count plus the following two portions.
4 : Null-terminated ASCII string without quotation marks. : namespace : 29 : XMP namespace URI, used as unique ID: http://ns.adobe.com/xap/1.0/
33 : < XMP Packet > : : : Must be encoded as UTF-8.

The header plus the following data must be less than 64 KB. The XMP Packet cannot be split across multiple markers, so the size of the XMP Packet must be at most 65502 bytes.
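
The layout in that table can be turned into a small segment walker. What follows is only a sketch in VB.NET under a few assumptions: the file starts with the JPEG SOI marker (FF D8), every segment before the image data has a two-byte big-endian length, and scanning can stop at the SOS marker (FF DA). The names XmpLocator, ReadXmpPacket and HasNamespace are made up for this example; they are not part of the .NET Framework or the XMP SDK.

```vbnet
Imports System
Imports System.Text

Module XmpLocator

    ' Namespace identifier from the excerpt: 28 ASCII characters,
    ' followed in the file by a null terminator (29 bytes in total).
    Private Const XmpNamespace As String = "http://ns.adobe.com/xap/1.0/"

    ' Returns the XMP packet as a string, or Nothing if no APP1 segment
    ' carrying the XMP namespace is found before the image data.
    Function ReadXmpPacket(jpeg As Byte()) As String
        Dim ns As Byte() = Encoding.ASCII.GetBytes(XmpNamespace)

        ' Assumption: a JPEG file starts with the SOI marker FF D8.
        If jpeg.Length < 4 OrElse jpeg(0) <> &HFF OrElse jpeg(1) <> &HD8 Then Return Nothing

        Dim pos As Integer = 2
        Do While pos + 4 <= jpeg.Length AndAlso jpeg(pos) = &HFF
            Dim marker As Byte = jpeg(pos + 1)
            If marker = &HDA Then Exit Do ' SOS: compressed image data follows

            ' Lp is big-endian and counts itself plus the rest of the segment.
            Dim lp As Integer = (CInt(jpeg(pos + 2)) << 8) Or jpeg(pos + 3)
            If lp < 2 OrElse pos + 2 + lp > jpeg.Length Then Exit Do

            If marker = &HE1 AndAlso lp >= 2 + ns.Length + 1 AndAlso HasNamespace(jpeg, pos + 4, ns) Then
                ' The packet starts at byte offset 33 of the segment (2 + 2 + 29).
                Dim packetStart As Integer = pos + 4 + ns.Length + 1
                Dim packetLength As Integer = lp - 2 - (ns.Length + 1)
                Return Encoding.UTF8.GetString(jpeg, packetStart, packetLength)
            End If

            pos += 2 + lp ' skip the marker (2 bytes) and the segment body
        Loop

        Return Nothing
    End Function

    ' True if the bytes at offset equal ns followed by a null terminator.
    ' The caller guarantees that offset + ns.Length is inside the array.
    Private Function HasNamespace(data As Byte(), offset As Integer, ns As Byte()) As Boolean
        For i As Integer = 0 To ns.Length - 1
            If data(offset + i) <> ns(i) Then Return False
        Next
        Return data(offset + ns.Length) = 0
    End Function

End Module
```

The namespace check matters because Exif data also lives in an APP1 segment, so the marker alone does not identify the XMP packet. A call would look like ReadXmpPacket(System.IO.File.ReadAllBytes("C:\afile.bin")), with the path being a placeholder.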



-Boo
 
Göran Andersson

Klaus said:
Could you give an example, please? Working with the binary-objects is new to
me.

If the file is not too large, you could read it all into a byte array.
Then locate the data in the array and create a MemoryStream from that
section of the array, and use a StreamReader to read it.
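
A minimal sketch of that approach in VB.NET, assuming the whole file fits in memory and that the markers "<?xml" and "</plist>" from the original post each appear in the file. The names XmlExtractor, IndexOfBytes and ExtractXml are made up for this example and are not framework APIs.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module XmlExtractor

    ' Returns the index of the first occurrence of pattern in data at or
    ' after start, or -1 if it does not occur.
    Function IndexOfBytes(data As Byte(), pattern As Byte(), start As Integer) As Integer
        For i As Integer = start To data.Length - pattern.Length
            Dim found As Boolean = True
            For j As Integer = 0 To pattern.Length - 1
                If data(i + j) <> pattern(j) Then
                    found = False
                    Exit For
                End If
            Next
            If found Then Return i
        Next
        Return -1
    End Function

    ' Returns the text from startTag through endTag (both included),
    ' or Nothing if either tag is missing.
    Function ExtractXml(data As Byte(), startTag As String, endTag As String) As String
        Dim startBytes As Byte() = Encoding.ASCII.GetBytes(startTag)
        Dim endBytes As Byte() = Encoding.ASCII.GetBytes(endTag)

        Dim first As Integer = IndexOfBytes(data, startBytes, 0)
        If first < 0 Then Return Nothing

        Dim last As Integer = IndexOfBytes(data, endBytes, first)
        If last < 0 Then Return Nothing

        ' Wrap only the XML section of the array and decode it as UTF-8.
        Dim length As Integer = last + endBytes.Length - first
        Using ms As New MemoryStream(data, first, length)
            Using reader As New StreamReader(ms, Encoding.UTF8)
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function

End Module
```

It could be called as ExtractXml(File.ReadAllBytes("C:\afile.bin"), "<?xml", "</plist>"), where the path is a placeholder. The search runs on bytes rather than on a decoded string, so the binary image data around the XML is never interpreted as text.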
 
Göran Andersson

Michel Posseth [MCP] said:
you can just use a stream reader and loop until you reach the XML from
that point you read all the data in a string until you read the end of the
XML

You can't read a binary file using a StreamReader. It decodes the bytes as
text, so the binary parts of the file get mangled.
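
To illustrate why a text reader is a poor fit for binary data, here is a small sketch in VB.NET that decodes bytes as UTF-8 text and encodes the text back to bytes. It assumes UTF-8, the encoding a StreamReader uses when none is specified. The names DecodeDemo, RoundTripThroughText and SurvivesTextDecoding are made up for this example.

```vbnet
Imports System
Imports System.Text

Module DecodeDemo

    ' Decodes raw bytes as UTF-8 text and encodes the text back to bytes,
    ' which is roughly what pushing binary data through a text reader does.
    Function RoundTripThroughText(raw As Byte()) As Byte()
        Dim text As String = Encoding.UTF8.GetString(raw)
        Return Encoding.UTF8.GetBytes(text)
    End Function

    ' True if the bytes come back unchanged after the round trip.
    Function SurvivesTextDecoding(raw As Byte()) As Boolean
        Dim back As Byte() = RoundTripThroughText(raw)
        If back.Length <> raw.Length Then Return False
        For i As Integer = 0 To raw.Length - 1
            If back(i) <> raw(i) Then Return False
        Next
        Return True
    End Function

End Module
```

Plain ASCII such as "<?xml" survives the round trip, but the JPEG header bytes FF D8 FF E1 do not, because they are not valid UTF-8 and are replaced during decoding. That is why the XML section has to be located in the raw bytes first and decoded afterwards.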
 
