Very large arrays and .NET classes


Peter Duniho

This is kind of a question about C# and kind of one about the framework.
Hopefully, there's an answer in there somewhere. :)

I'm curious about the status of 32-bit vs 64-bit in C# and the framework
classes. The specific example I'm running into is with respect to byte
arrays and the BitConverter class. In C# you can create arrays larger than
2^32, using the overloaded methods that take 64-bit parameters. But as near
as I can tell, the BitConverter class can only address up to a 32-bit offset
within the array.
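To make the mismatch concrete, here is a minimal C# sketch (the array here is deliberately small; the point is the parameter types, not the size):

```csharp
using System;

class BitConverterLimits
{
    static void Main()
    {
        byte[] data = new byte[1000];
        data[4] = 0x2A; // the value 42, little-endian, at offset 4

        // The Array class itself exposes 64-bit-friendly members:
        long length = data.LongLength;     // long, not int
        object last = data.GetValue(999L); // GetValue(long) overload

        // ...but BitConverter's startIndex parameter is an int:
        int value = BitConverter.ToInt32(data, 4);
        // There is no ToInt32(byte[], long) overload, so an offset past
        // int.MaxValue cannot be expressed directly.

        Console.WriteLine(length + " " + value); // "1000 42" on little-endian systems
    }
}
```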

I see similar issues in other areas. The actual framework classes (of which
the Array class itself is one, if I understand things correctly, and thus an
exception to this generality) don't all seem to provide full 64-bit support,
even though the C# language does (through specific overloads to .NET classes
that form built-in language elements).

I suppose one workaround in this example would be to copy the interesting
parts of the array to a smaller one that can be indexed by BitConverter with
its 32-bit parameters. But not every situation is resolvable with such a
simple workaround. For example, if one is displaying an array in a
scrollable control and wants to set the scrollbar to something in the same
order of magnitude as the array length itself, this is not possible because
the scrollbar controls use only 32-bit values.
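For the BitConverter case, the copy workaround might look like this hypothetical helper (`Array.Copy` does have overloads that take `long` indices, which is what makes the local fix possible):

```csharp
using System;

class CopyWorkaround
{
    // Hypothetical helper: read an Int32 at a 64-bit offset by first
    // copying the four interesting bytes into a small scratch buffer.
    public static int ToInt32At(byte[] data, long offset)
    {
        byte[] scratch = new byte[4];
        Array.Copy(data, offset, scratch, 0L, 4L); // long-index overload
        return BitConverter.ToInt32(scratch, 0);
    }

    static void Main()
    {
        byte[] data = new byte[64];
        data[10] = 0x2A;
        Console.WriteLine(ToInt32At(data, 10L)); // 42 on little-endian systems
    }
}
```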

Am I missing something? Is there a general paradigm that addresses these
sorts of gaps between things that can be 64-bit and things that cannot? Or
is this just par for the course with respect to being in a transition period
between the "old" 32-bit world and the "new" 64-bit world?

Having already made it through the transitions from 8-bit to 16-bit, and
from 16-bit to 32-bit, I guess I was sort of hoping we'd have learned our
lesson and gotten a little better at this. But I'm worried that's not the
case. I'm hopeful someone can reassure me. :)

Thanks,
Pete
 

Marcin Grzębski

Hi Peter,

Peter Duniho wrote:
> This is kind of a question about C# and kind of one about the framework.
> Hopefully, there's an answer in there somewhere. :)
>
> I'm curious about the status of 32-bit vs 64-bit in C# and the framework
> classes. The specific example I'm running into is with respect to byte
> arrays and the BitConverter class. In C# you can create arrays larger than
> 2^32, using the overloaded methods that take 64-bit parameters. But as near
> as I can tell, the BitConverter class can only address up to a 32-bit offset
> within the array.

A single BitConverter call can address a 32-bit offset.
Use two levels of indexing to address 64 bits' worth:
byte[][];
something like: new byte[2^32][2^32]; (pseudo-code)
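In valid C# (`^` is the XOR operator, so `new byte[2^32][2^32]` above is pseudo-code), the chunked idea might be sketched like this, with an arbitrary chunk size chosen purely for illustration:

```csharp
using System;

class JaggedIndexing
{
    public const int ChunkSize = 1 << 20; // arbitrary chunk size for illustration

    // Split a 64-bit logical index into two 32-bit indices:
    // a chunk number and an offset within that chunk.
    public static byte GetByte(byte[][] chunks, long index)
    {
        int chunk  = (int)(index / ChunkSize);
        int offset = (int)(index % ChunkSize);
        return chunks[chunk][offset];
    }

    static void Main()
    {
        byte[][] chunks = { new byte[ChunkSize], new byte[ChunkSize] };
        chunks[1][5] = 42;
        Console.WriteLine(GetByte(chunks, (long)ChunkSize + 5)); // prints 42
    }
}
```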
> I see similar issues in other areas. The actual framework classes (of which
> the Array class itself is one, if I understand things correctly, and thus an
> exception to this generality) don't all seem to provide full 64-bit support,
> even though the C# language does (through specific overloads to .NET classes
> that form built-in language elements).
>
> I suppose one workaround in this example would be to copy the interesting
> parts of the array to a smaller one that can be indexed by BitConverter with
> its 32-bit parameters. But not every situation is resolvable with such a
> simple workaround. For example, if one is displaying an array in a
> scrollable control and wants to set the scrollbar to something in the same
> order of magnitude as the array length itself, this is not possible because
> the scrollbar controls use only 32-bit values.

OK.
But how can you imagine a UI that works with more than a million
lines? I cannot imagine such a thing.
A reasonable solution is to divide the data into:
1. MOST SIGNIFICANT
2. LEAST SIGNIFICANT
Then you can provide a useful UI.

with regards
Marcin
 

Willy Denoyette [MVP]

The current versions of the CLR (on all platforms, 32- and 64-bit) limit the size
of any single object to ~2 GB anyway. That means you won't be able to
create an array larger than ~2 GB, whatever the language.
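A quick way to probe that limit (the exact outcome depends on the runtime version, bitness, and available memory, so treat this as an experiment rather than a guarantee):

```csharp
using System;

class ObjectSizeLimit
{
    static void Main()
    {
        try
        {
            // On the CLR versions discussed here, a single object cannot
            // exceed roughly 2 GB, even on 64-bit Windows with ample RAM:
            byte[] huge = new byte[int.MaxValue]; // ~2 GB request
            Console.WriteLine("Allocated " + huge.LongLength + " bytes");
        }
        catch (OutOfMemoryException)
        {
            Console.WriteLine("Refused: per-object limit or insufficient memory");
        }
    }
}
```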

Willy.

 

Peter Duniho

Marcin Grzębski said:
> A single BitConverter call can address a 32-bit offset.
> Use two levels of indexing to address 64 bits' worth:
> byte[][];
> something like: new byte[2^32][2^32]; (pseudo-code)

Thanks. I'm not entirely convinced that's better than just temporarily
copying the interesting bytes when needed, since it requires a global change
to the data structure, rather than a local workaround.
> OK.
> But how can you imagine a UI that works with more than a million
> lines? I cannot imagine such a thing.

Seems to me it works the same as the UI that works with less than a million
lines. Except there's more lines.
> A reasonable solution is to divide the data into:
> 1. MOST SIGNIFICANT
> 2. LEAST SIGNIFICANT
> Then you can provide a useful UI.

The data doesn't usefully break down that way. It's true that I could
create two scrollable controls, one to allow the user to navigate 32-bit
"pages" and the other to allow the user to navigate within those "pages".
But that seems to me to be at least as sloppy a workaround as changing an
entire data structure globally just to address some one- or
two-lines-of-code problem. The user interface ought to reflect to the
*user* the abstract view of the data that exists within the data, not some
arbitrary view of the data dictated by limitations of the underlying
architecture. The user should not have to concern himself with the
underlying architecture at all.

I admit that it does appear in this case that there may be no way to
insulate the user from those issues, but I don't agree that that's a
desirable solution.

I guess what I'm hearing is that, no... .NET does not successfully address
the transition from 32-bit to 64-bit code, and that for the time being I
need to consider it 32-bit-only, even though some portions of it imply
64-bit capability.

Thank you very much for the answers.

Pete
 

Göran Andersson

I think that what Marcin meant is that if you are even close to reaching
the limit of the scrollbar, your UI badly needs a change.

If you present a list with a million items to the user, it's like saying
"I don't want you to use this program".
 

Peter Duniho

Göran Andersson said:
> I think that what Marcin meant is that if you are even close to reaching
> the limit of the scrollbar, your UI badly needs a change.

That really depends on what data your UI is presenting.
> If you present a list with a million items to the user, it's like saying
> "I don't want you to use this program".

If your user has what is essentially a single list of a million items and
you don't present that list that way to the user, it's like saying "I want
you, the user, to conform your idea of your data to something I can code".

For example, suppose the user interface is for a view onto a stream of bytes
(a very large file, for example). Why should the user be expected to
mentally imagine his data as separate pages of portions of the file, when in
fact the entire file is one long stream of bytes? How do you avoid having
the user interface impose artificial, arbitrary boundaries on the data?
Suppose you have broken the user's data into 2GB units, and the user wants
to view a portion of that data that straddles the arbitrarily assigned
boundary at 2GB? One solution to that is to have a sliding 2GB window onto
the data, but then you're no longer able to present 64-bits worth of address
space to the user (or you have to add yet another layer of paging). In
either case, these are not user-friendly answers to the question.

As another example, consider a video file that has, say, 4 million fields
(about 18 hours of 30fps interlaced video). You can store that on modern
hardware. Why should a user expect to have trouble viewing such data on
modern hardware?

I am not suggesting the user will wind up reviewing each and every byte of a
10GB file or every frame of an 18 hour stream of video. But if one is to
create an application to allow the user to do anything with that data and
wants to present a unit-oriented view onto that data, the user interface
will inherently need to support a "document" of the same order of magnitude
as the units in the data. IMHO, it's a bit arrogant for a person to assume
that there is absolutely no reason a user would ever want a UI that can deal
with a large number of units (whatever those units may be).

I do appreciate the feedback, but I frankly think people are spending too
many cycles second-guessing my needs, and not enough actually answering the
question I asked.

That said, I do believe I've gotten enough feedback to understand that .NET
is still basically a 32-bit API, and that it's not a good idea to expect
64-bit support in the near future. Any application that involves itself
with 64-bit data will have to superimpose its own solution on top of the
32-bit environment .NET presents, just as has always been necessary in
32-bit Windows.

And to those who have offered feedback along those lines, I thank you.

Pete
 

GS

Are you working on large matrix transformations? An ordinary business
application would not need large arrays in memory; it would use database
manipulation instead.
If you do need large arrays, you will need the 64-bit CLR. You will
also need a 64-bit OS and a PC with lots of real RAM.

It is also likely you will be among the few early adopters of 64-bit computing
on the PC. Translation: work, research, and possible bugs/unexpected features.
 

Göran Andersson

Peter said:
> That really depends on what data your UI is presenting.
>
> If your user has what is essentially a single list of a million items and
> you don't present that list that way to the user, it's like saying "I want
> you, the user, to conform your idea of your data to something I can code".

If you present the data as a horrifically long list, it's like saying
"This is what you get, as I am too lazy to create a user interface that
is usable". ;)
> For example, suppose the user interface is for a view onto a stream of bytes
> (a very large file, for example). Why should the user be expected to
> mentally imagine his data as separate pages of portions of the file, when in
> fact the entire file is one long stream of bytes?

Just because you don't display all the data at once, it doesn't need to
be separated into pages.
> How do you avoid having
> the user interface impose artificial, arbitrary boundaries on the data?

Eh.... just don't?
> Suppose you have broken the user's data into 2GB units, and the user wants
> to view a portion of that data that straddles the arbitrarily assigned
> boundary at 2GB? One solution to that is to have a sliding 2GB window onto
> the data, but then you're no longer able to present 64-bits worth of address
> space to the user (or you have to add yet another layer of paging). In
> either case, these are not user-friendly answers to the question.

Of course there is. There is no reason to display more data than fits on
the screen at once, as the user can't see it anyway. That doesn't mean
that you have to use large sliding windows or layered paging.
> As another example, consider a video file that has, say, 4 million fields
> (about 18 hours of 30fps interlaced video). You can store that on modern
> hardware. Why should a user expect to have trouble viewing such data on
> modern hardware?

And you think that displaying it all at once is not troublesome?
> I am not suggesting the user will wind up reviewing each and every byte of a
> 10GB file or every frame of an 18 hour stream of video. But if one is to
> create an application to allow the user to do anything with that data and
> wants to present a unit-oriented view onto that data, the user interface
> will inherently need to support a "document" of the same order of magnitude
> as the units in the data. IMHO, it's a bit arrogant for a person to assume
> that there is absolutely no reason a user would ever want a UI that can deal
> with a large number of units (whatever those units may be).

Of course the user interface should be able to handle a large number of
units, but there is no reason that a single list should handle it all,
as the user can't handle it all anyway.
> I do appreciate the feedback, but I frankly think people are spending too
> many cycles second-guessing my needs, and not enough actually answering the
> question I asked.

Of course I have to second guess your needs, as you haven't specified them.

It's quite common in message boards to make assumptions about what the
OP is really needing, or what the OP really should have asked, as many
people don't know what to ask for, what information to provide, or
sometimes even to ask a question...
 

Peter Duniho

Göran Andersson said:
> If you present the data as a horrifically long list, it's like saying "This
> is what you get, as I am too lazy to create a user interface that is
> usable". ;)

It's not like saying that at all.
[...]
> How do you avoid having the user interface impose artificial, arbitrary
> boundaries on the data?

> Eh.... just don't?

You are saying that the user should not see the data as the single
contiguous collection of units that it is, but that one should also not
break up the data into smaller collections of units.

Surely as someone accustomed to writing software, you understand the logical
contradiction here. Right?
[...]
> Of course there is. There is no reason to display more data than fits on
> the screen at once, as the user can't see it anyway. That doesn't mean
> that you have to use large sliding windows or layered paging.

What does it mean then? What other UI do you propose for the purpose of
presenting to the user data which is inherently a single contiguous
collection of millions of units? Assuming that the user is to have complete
and immediate access to any portion of his data, and that this access should
conform closely to the user's own mental idea of the data, what user
interface that doesn't involve "large sliding windows or layered paging" do
you suggest?
> And you think that displaying it all at once is not troublesome?

You need to define what you mean by "displaying it all at once". I am not
suggesting that every unit of data should be present on the screen
simultaneously.

As far as the question of allowing the user direct access "all at once" to
the entire data stream, no...it is not at all troublesome. It is in fact
what the user typically expects. Every single one of the commonly used
video editing programs does exactly this, and no one in the industry seems
to think it's a problem.
> Of course the user interface should be able to handle a large number of
> units, but there is no reason that a single list should handle it all, as
> the user can't handle it all anyway.

First, you underestimate the user as well as the nature of a single-list
paradigm for a user interface. Second, how many lists do you suggest? How
do you suggest that the user be forced to navigate amongst these lists? And
how do you suggest implementing a multiple list scenario in which artificial
boundaries are not imposed on the data?

You keep making statements about what should NOT happen, but you have yet to
suggest what SHOULD happen, and your claims of what should not happen
contradict each other.
> Of course I have to second guess your needs, as you haven't specified
> them.

Baloney. You have no need to second-guess my needs, as I'm not asking a
question about those needs. The question I asked was quite specific, and
it's arrogant and insulting of you to make your own assumptions about what
help I need.

And frankly, it seems to me that you are more interested in your own ego
than in actually helping. A person who wants to help would suggest an
alternative, rather than invest all of their time denigrating the other
person's ideas. Maybe that makes you feel better about yourself, but it's
not helpful to anyone else and least of all to me.
> It's quite common in message boards to make assumptions about what the OP
> is really needing, or what the OP really should have asked, as many people
> don't know what to ask for, what information to provide, or sometimes even
> to ask a question...

Thankfully, it is NOT "quite common", and especially not when the original
question was very clear and to the point. Only people who cannot understand
that they don't have the big picture, nor are they invited to have the big
picture, insist on making assumptions, and as a result offer completely
inadequate, misleading, and insulting advice.

Pete
 

Marc Gravell

On assumptions:
IMO it is very natural and reasonable to start trying to make (and
state) assumptions about what you are trying to do, as your proposed
solution is not, itself, "normal" practice. Therefore a better solution
may be in order, hence the assumptions about what you are trying to do.

On data size:
I agree that it is entirely natural to let the UI provide access to
this data, but that doesn't mean you have to load it all at once. I
would be upset if my player loaded an entire DVD into RAM before it
started playing... Perhaps you need to look at virtual mode for the
lists? And keep the data on disk, loading a chunk into a buffer as
needed, and handling the logical offset in the wrapper.
The object model (API) just has to provide access to the data; it
doesn't *necessarily* have to have it all "on hand". Lazy loading and
seamless chunk switching could be your friend here.
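A hypothetical sketch of that chunk-buffering idea (every name here is invented for illustration): an indexer accepts a 64-bit logical offset and reloads a small buffer from disk only when the requested offset falls outside the chunk currently held in memory.

```csharp
using System;
using System.IO;

class ChunkedReader : IDisposable
{
    private readonly FileStream _stream;
    private readonly byte[] _buffer;
    private long _bufferStart = -1; // file offset of the buffered chunk
    private int _bufferLength;      // bytes actually read into the buffer

    public ChunkedReader(string path, int chunkSize)
    {
        _stream = File.OpenRead(path);
        _buffer = new byte[chunkSize];
    }

    public long Length { get { return _stream.Length; } }

    public byte this[long index]
    {
        get
        {
            if (_bufferStart < 0 || index < _bufferStart ||
                index >= _bufferStart + _bufferLength)
            {
                // Lazy-load the chunk containing 'index'.
                _bufferStart = (index / _buffer.Length) * _buffer.Length;
                _stream.Seek(_bufferStart, SeekOrigin.Begin);
                _bufferLength = _stream.Read(_buffer, 0, _buffer.Length);
            }
            return _buffer[index - _bufferStart];
        }
    }

    public void Dispose() { _stream.Dispose(); }
}
```

A virtual-mode list control would then ask this wrapper only for the bytes it is about to draw, so the UI can address the full 64-bit range without ever holding more than one chunk.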

Marc
 

Peter Duniho

Marc Gravell said:
> On assumptions:
> IMO it is very natural and reasonable to start trying to make (and
> state) assumptions about what you are trying to do, as your proposed
> solution is not, itself, "normal" practice. Therefore a better solution
> may be in order, hence the assumptions about what you are trying to do.

Be that as it may, I wasn't asking about solutions to what I'm trying to do,
nor about solutions to the examples I presented in my post. I was asking
specifically about the state of 64-bit support in .NET.
> On data size:
> I agree that it is entirely natural to let the UI provide access to
> this data, but that doesn't mean you have to load it all at once.

I do agree with that, generally speaking. But in the end, it really depends
on the nature of the computer system in use and the source of the data. If
the data is not currently present on disk, and the computer system in
question can easily hold the data (in memory, in the swap file, or some
combination thereof), then a byte array containing the data may well be
appropriate.

As near as I can tell, .NET does not offer support for memory-mapped files,
which is yet another reason one might find it more practical to maintain an
in-memory array rather than windowing the data from a file. As much data as
needs to be cached on the disk will be anyway (through the virtual memory
manager), and it allows the application to access the data in a more
convenient method (as well as allows the OS to optimize access to the data
as best it can).

If one could categorically state that one should never have more than X
elements in an array (say, 2 billion), then there should be no reason for
the Array class to even have the 64-bit LongLength property, nor should
there be any way to create an array larger than X (say, 2 billion) elements.
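For reference, the 64-bit surface that does exist on Array today (a small sketch; nothing here requires a large allocation):

```csharp
using System;

class ArraySixtyFourBitSurface
{
    static void Main()
    {
        byte[] data = new byte[100];

        int  len32 = data.Length;     // 32-bit view
        long len64 = data.LongLength; // 64-bit view of the same array

        // Array.CreateInstance also accepts long lengths:
        Array a = Array.CreateInstance(typeof(byte), 100L);

        Console.WriteLine(len32 + " " + len64 + " " + a.LongLength); // prints "100 100 100"
    }
}
```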

And of course, as with the other aspects of these diversions, the array
issue is a red herring. Even if we assume a different solution in which the
data is not resident in virtual memory all at once, there is still the
question of presenting the data to the user, which is what the scroll bar
issue is relevant to.

But in either case, those are not what I was asking about specifically.
They were just examples to help frame my question (those sentences in my
post that were followed by question marks).
> [...] Perhaps you need to look at virtual mode for the
> lists? And keep the data on disk, loading a chunk into a buffer as
> needed, and handling the logical offset in the wrapper.
> The object model (API) just has to provide access to the data; it
> doesn't *necessarily* have to have it all "on hand". Lazy loading and
> seamless chunk switching could be your friend here.

Again, these are not helpful assumptions. I do not have a specific issue in
mind that I don't have a solution for, and offering these tangential
"suggestions" just distracts from the original question I asked.
Ironically, these extra presumptuous suggestions don't even address the
question of 32-bit versus 64-bit, but instead treat any large collection as
potentially "bad for the user", even if it would be supported under Win32.

I realize that there's no reason for anyone here to assume I'm knowledgeable
and experienced enough to work around whatever issues I run into, but I'll
remind folks that there's also no reason for anyone to assume that I'm
*not*. IMHO, it would be better if respondents would give the benefit of
the doubt first, rather than assuming the worst.

Thanks,
Pete
 
