Serialize/marshal/reverse engineer unknown structure

PJC

Is there a way to deserialize or marshal or somehow parse a byte array
back into a structure when you don't know what that structure was in
the first place? The structure probably came from C++.

Some background: I have a flight simulator for R/C planes and I'm
trying to figure out if I can automate it. There is no API. I know how
to automate the input. I'm trying to get at the output of the program
(flight dynamics of the plane, etc.).

The simulator has a multi-player function so I know that it has to
pass the exact info I'm looking for over the network. It's built on
DirectX 9 and uses DirectPlay (deprecated gaming network protocol) for
multi-player communication. My *guess* is the simulator itself is
written in C++.

So, I can actually connect to the program and have gotten a message
with 13 bytes. Great. Now what?

In general, how would one reverse-engineer something like this?
 
Arne Vajhøj

If it does not contain meta data (which it sounds as if it does not),
then NO.

Arne
 
Heandel

It is very hard to do; you have to guess how it works. People who develop
emulators or private servers do the same.
 
Peter Duniho

PJC said:
Is there a way to deserialize or marshal or somehow parse a byte array
back into a structure when you don't know what that structure was in
the first place? The structure probably came from C++.
[...]
So, I can actually connect to the program and have gotten a message
with 13 bytes. Great. Now what.
In general, how would one reverse-engineer something like this?

Arne said:
If it does not contain meta data (which it sounds as if it does not),
then NO.

Well, that's not strictly true. People reverse-engineer undocumented,
unadorned data and code all the time.

But it definitely is a LOT more work (it's basically a lot of trial and
error), and certainly isn't on-topic for this newsgroup in any case.
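
As a rough illustration of that trial and error, one common first step is to print every plausible reading of the captured bytes and look for values that make physical sense. The sketch below is in Java and assumes little-endian data (typical for a C++ program on x86). The 13 bytes shown are made up for the example, and the class name OffsetScan is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Trial-and-error helper: show each 4-byte window of a message as int and float.
class OffsetScan {
    // Interpret 4 bytes at the given offset as a little-endian 32-bit int.
    static int intAt(byte[] msg, int offset) {
        return ByteBuffer.wrap(msg).order(ByteOrder.LITTLE_ENDIAN).getInt(offset);
    }

    // Interpret the same 4 bytes as a little-endian IEEE 754 float.
    static float floatAt(byte[] msg, int offset) {
        return ByteBuffer.wrap(msg).order(ByteOrder.LITTLE_ENDIAN).getFloat(offset);
    }

    public static void main(String[] args) {
        // Hypothetical 13-byte capture; replace with bytes from the real program.
        byte[] msg = {
            0x02,
            0x00, 0x00, (byte) 0x80, 0x3F,
            0x00, 0x00, 0x20, 0x41,
            0x00, 0x00, (byte) 0xC8, 0x42
        };
        // Slide over every offset, because the fields need not be 4-byte aligned.
        for (int off = 0; off + 4 <= msg.length; off++) {
            System.out.printf("offset %2d: int=%d float=%g%n",
                    off, intAt(msg, off), floatAt(msg, off));
        }
    }
}
```

Capturing several messages while changing one thing in the simulator at a time, then comparing the dumps, usually narrows down which offsets hold which quantity.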

Pete
 
Arne Vajhøj

Peter said:
Well, that's not strictly true. People reverse-engineer undocumented,
unadorned data and code all the time.

But it definitely is a LOT more work (it's basically a lot of trial and
error),

True.

So let me correct the "NO" to "There is nothing in C#/.NET (or any other
language/platform for that matter) to help you".

If experimentation can reveal the structure used, then it can obviously
be implemented in C#/.NET (or any other language/platform for that
matter).

Arne
 
Mike Schilling

Arne said:
So let me correct the "NO" to "There is nothing in C#/.NET (or any
other language/platform for that matter) to help you".

If experimentation can reveal the structure used, then it can
obviously be implemented in C#/.NET (or any other language/platform
for that matter).


Let me ask this tangential question (which I should probably know
the answer to, but don't).

In C or C++, I can fill a structure with a single I/O call, e.g.

    struct point
    {
        int x;
        int y;
    } p;

    read(fd, &p, sizeof(p));

I can almost do it portably, though in more complex examples padding
becomes an issue. So once the problem of "What are the fields in
this message?" is solved, all that's required is to define a struct
that reflects it.

In Java nothing this simple is possible. The layout of fields in an
object can't be specified or inspected; even their order is undefined.
The corresponding code looks like:

    void read(DataInputStream strm) throws IOException
    {
        x = strm.readInt();
        y = strm.readInt();
    }

After determining what the fields are, I need both to add them to the
class and to write the read method. (If there's padding I need to
code that in explicitly too.)
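
To make the padding remark concrete, here is a minimal sketch. It assumes a hypothetical C++ struct with a one-byte field followed by two ints, compiled with default alignment on a little-endian machine, so three padding bytes follow the first field. It uses ByteBuffer rather than DataInputStream because DataInputStream always reads big-endian. The names PaddedPoint and kind are invented for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Assumed C++ source layout (hypothetical, 12 bytes with default alignment):
//   struct { char kind; /* 3 padding bytes */ int x; int y; };
class PaddedPoint {
    byte kind;
    int x;
    int y;

    static PaddedPoint read(byte[] raw) {
        ByteBuffer buf = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
        PaddedPoint p = new PaddedPoint();
        p.kind = buf.get();
        buf.position(buf.position() + 3); // skip the compiler-inserted padding
        p.x = buf.getInt();
        p.y = buf.getInt();
        return p;
    }

    public static void main(String[] args) {
        byte[] raw = { 7, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0 };
        PaddedPoint p = read(raw);
        System.out.println(p.kind + " " + p.x + " " + p.y); // prints "7 1 2"
    }
}
```

If the sender packed the struct instead (no padding), the skip line would simply be removed, which is exactly the kind of detail that has to be guessed and tested.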

I know that I can write Java-like code in C# using BinaryReader. Can
I also write something C-like?
 
Peter Duniho

Mike said:
I know that I can write Java-like code in C# using BinaryReader. Can
I also write something C-like?

Sort of, but it requires unsafe code as far as I know. You can specify
the exact structure layout for C# structs, and then use unsafe code to
copy a byte array into a struct instance.

But it begs the question as to why one would bother. There are a few
issues that I see, all of which are somewhat moot in the context of a
managed code environment like Java or C#:

-- Ease of writing the i/o formatting code. Okay, sure...you can get
away with just a blt. But really, is it that hard to enumerate the
serialized fields and write/read them explicitly? Besides, if you wind up
changing the data structure, there's a good chance the maintenance
headache will be just as much a hassle, since you still have to deal with
versioning. And in any case, if what you want is easy, then just make
sure you're serializing public properties and use the built-in .NET stuff.

-- Performance. But, i/o is already slow and hasn't gotten faster at
nearly the same pace as processing power. So the idea that manually
extracting fields or properties is a cost that needs to be avoided isn't
really true. You can easily afford to write/read the data stream one
field or property at a time, without having any real effect on
throughput. (Note that you can still get the data with a single i/o
call...it's the parsing of the bytes that requires extra overhead).

-- Byte ordering. Okay, so...AFAIK the .NET implementation is
little-endian everywhere. But I don't think it _has_ to be. And once you
start having to deal with endian issues, you're back to dealing with
individual fields or properties anyway.

-- The whole point of managed code is to abstract away these
implementation-specific details. So why saddle yourself with
implementation-specific details for some small subset of your program?

-- Storage/transmission costs. Just as processing power has advanced
so much that manipulating the data once it's at the CPU really isn't a big
deal, so too has storage and transmission bandwidth. Not as rapidly as
CPU power, granted (see point #2 :) ), but still by quite a lot. So
priorities have changed somewhat, I feel. Lots of data protocols
(transmission or storage) are no longer binary, but instead some kind of
human-readable format (e.g. XML). Heck, even in the olden days, text
files weren't uncommon. Binary files still have their use, but I'd guess
in a managed code environment, it's a much lower priority or even
counter-productive in many cases.
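
On the byte-ordering point above, a small Java sketch shows why the order has to be settled per field: the same four bytes decode to very different values depending on the assumed order. The class name EndianDemo is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class EndianDemo {
    // Decode the first four bytes of raw as a 32-bit int in the given byte order.
    static int decode(byte[] raw, ByteOrder order) {
        return ByteBuffer.wrap(raw).order(order).getInt();
    }

    public static void main(String[] args) {
        byte[] raw = { 1, 0, 0, 0 };
        System.out.println(decode(raw, ByteOrder.LITTLE_ENDIAN)); // prints 1
        System.out.println(decode(raw, ByteOrder.BIG_ENDIAN));    // prints 16777216
    }
}
```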

In other words, even in Java, I'm not convinced the language and framework
would benefit much from supporting a C-style binary-copy approach to
struct i/o. In C#, you can do it if you really want to, but we're not
likely to get anything fancier than what we've already got, given how it's
counter to the general philosophies of the environment, and how relatively
infrequently it would likely be used.

Pete
 
Mike Schilling

Peter said:
Sort of, but it requires unsafe code as far as I know. You can
specify the exact structure layout for C# structs, and then use
unsafe code to copy a byte array into a struct instance.

But it begs the question as to why one would bother. There are a few
issues that I see, all of which are somewhat moot in the context of a
managed code environment like Java or C#:

-- Ease of writing the i/o formatting code. Okay, sure...you can
get away with just a blt. But really, is it that hard to enumerate
the serialized fields and write/read them explicitly? Besides, if
you wind up changing the data structure, there's a good chance the
maintenance headache will be just as much a hassle, since you still
have to deal with versioning. And in any case, if what you want is
easy, then just make sure you're serializing public properties and
use the built-in .NET stuff.

Explicitly for a case like this. There's an existing format (in memory or
on disk), and I want a simple and easily modifiable way to code my current
guess at it.
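
One possible way to keep such a guess easy to modify, sketched in Java, is to hold the layout in a small table of name, offset, and type, so that revising the guess means editing one array rather than the parsing code. Everything here is an assumption for illustration: the field names, the offsets, the little-endian order, and the sample bytes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

class LayoutGuess {
    enum Kind { U8, I32, F32 }

    // One guessed field: a label, a byte offset, and a type.
    static final class Field {
        final String name;
        final int offset;
        final Kind kind;

        Field(String name, int offset, Kind kind) {
            this.name = name;
            this.offset = offset;
            this.kind = kind;
        }
    }

    // The current guess lives in one place; edit this table and rerun.
    static final Field[] GUESS = {
        new Field("type", 0, Kind.U8),
        new Field("x", 1, Kind.F32),
        new Field("y", 5, Kind.F32),
        new Field("z", 9, Kind.F32),
    };

    // Decode each field at its guessed offset and render it as "name=value".
    static List<String> decode(byte[] msg, Field[] layout) {
        ByteBuffer buf = ByteBuffer.wrap(msg).order(ByteOrder.LITTLE_ENDIAN);
        List<String> out = new ArrayList<>();
        for (Field f : layout) {
            Object value;
            switch (f.kind) {
                case U8:  value = buf.get(f.offset) & 0xFF; break;
                case I32: value = buf.getInt(f.offset); break;
                default:  value = buf.getFloat(f.offset); break;
            }
            out.add(f.name + "=" + value);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical 13-byte message, not real simulator output.
        byte[] msg = {
            0x02,
            0x00, 0x00, (byte) 0x80, 0x3F,
            0x00, 0x00, 0x20, 0x41,
            0x00, 0x00, (byte) 0xC8, 0x42
        };
        System.out.println(decode(msg, GUESS));
    }
}
```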
 
Arne Vajhøj

Mike said:
I know that I can write Java-like code in C# using BinaryReader. Can
I also write something C-like?

.NET has some options that Java does not have for stuff like this.

Demo code:

using System;
using System.Runtime.InteropServices;

namespace E
{
    // Sequential layout keeps the fields in declaration order, as in a C struct.
    [StructLayout(LayoutKind.Sequential)]
    public struct Point
    {
        public int X;
        public int Y;
    }

    public class Program
    {
        public static Point ReadPoint(byte[] b)
        {
            // Pin the array so the GC cannot move it while it is read via a pointer.
            GCHandle h = GCHandle.Alloc(b, GCHandleType.Pinned);
            try
            {
                return (Point)Marshal.PtrToStructure(h.AddrOfPinnedObject(), typeof(Point));
            }
            finally
            {
                // Always release the pin, even if the marshalling call throws.
                h.Free();
            }
        }

        public static void Main(string[] args)
        {
            byte[] b = { 1, 0, 0, 0, 2, 0, 0, 0 };
            Point p = ReadPoint(b);
            Console.WriteLine(p.X + " " + p.Y);
            Console.ReadKey();
        }
    }
}

Arne
 
PJC

Thank you all for the pointers. I tried something similar to Arne's
demo code above but still can't make heads or tails of the data. I
suppose I've learned the hard way the benefits of XML...
 
I've actually reverse-engineered the network protocol of Microsoft Combat Flight Simulator. I could send you the code if you want.

With my code you can get all the information about the planes. I've never used it for anything; it was thesis work.
 
