Error: "ran out of resources" running 5,000 row array

A

agbiggs

I'm using an array formula to calculate the Gini coefficient of
columns of data. I'm getting the 'Excel ran out of resources' message
on any array more than 5,000 or so rows, which seems very low to me.
I'm using Excel 2007 and have 3 GB of RAM, so it's not clear where the
problem is.

Does anyone know of a fix for this? Either a way to free more
resources, or to calculate Gini coefficients without using an array
formula?

Thanks!
 
B

Bob I

Only 2 of the GB are available for the User, the rest is for the system.
So you may have to reduce your data set.
 
A

agbiggs

Thanks, Bob. I'm still a little surprised, since 5,000 rows of data
isn't very much -- my guess is by itself it doesn't top 100k.

Andrew
 
B

Bob I

I have no idea what the process you are using required in the way of
temporary storage of partial answers. What did you do before you
attempted to do this in Excel 2007?
 
A

agbiggs

Thanks, Bob.

I'm not totally sure of the answer to your question (i.e., I've used
the gini array to calculate gini's for smaller data sets; alternately,
I can use statistical software like SPSS or Stata, but I'd like to
keep things in Excel if possible). But tere's the array formula I
used:


=AVERAGE(ABS(X-TRANSPOSE(X)))/AVERAGE(X)/2

where X is the array. If I keep to 4,000 or so numbers the formula
works, but anything beyond that and I get the out of resources error.
 
B

Bob I

Thanks, Bob.

I'm not totally sure of the answer to your question (i.e., I've used
the gini array to calculate gini's for smaller data sets; alternately,
I can use statistical software like SPSS or Stata, but I'd like to
keep things in Excel if possible). But tere's the array formula I
used:


=AVERAGE(ABS(X-TRANSPOSE(X)))/AVERAGE(X)/2

where X is the array. If I keep to 4,000 or so numbers the formula
works, but anything beyond that and I get the out of resources error.


Then you have the answer. Everything has a limit, and you have
discovered that somewhere between 4000 and 5000 rows is where it is for
that particular formula.
 
C

Charles Williams

Your formula looks extremely strange to me: I suspect its not actually doing
what you think its doing.
Try using Evaluate formula to debug it on a very small dataset.

The resource problem is almost certainly because your formula is trying to
create a 25 million cell array (5000 rows x the 5000 columns created by the
transpose)


regards
Charles
___________________________________
London Excel Users Conference April 1-2
The Excel Calculation Site
http://www.decisionmodels.com
 
A

agbiggs

Thanks -- I'll try evaluate formula and see what it tells me. It may
be it's simply too much for the program to handle, or that there's a
more efficient way to calculate it (at least resources-wise).

Andrew
 
C

Charles Williams

Andrew,

This is my translation of an array formula for the Gini coefficient from
Wikipedia.

{=(ROWS(Incomes)+1)/(ROWS(Incomes)-1)-2/(ROWS(Incomes)*(ROWS(Incomes)-1)*AVERAGE(Incomes))*SUM(RANK(Incomes,Incomes)*Incomes)}

I don't know if it gives the correct answer since I don't have any test
data. The answer seems to be slightly different but similar to your formula.

I tested it on 30000 rows and there was no resource problem although its
pretty slow on that amount of data because of the RANK function.


--
Charles
___________________________________
London Excel Users Conference April 1-2
The Excel Calculation Site
http://www.decisionmodels.com
 
D

Dana DeLouis

"...on any array more than 5,000 or so..."


Hi. This probably doesn't apply here, but when one mentions "about
5,000" the following kb article comes to mind.

They say this limit doesn't apply with current versions, but it has come
up sometimes where this limit is still around.

XL7: Array Formulas Return #NUM! Error Value
http://support.microsoft.com/kb/132221

"...when you use an array greater than 5458 elements in a function, the
function may return the #NUM! error value."

= = =
HTH
Dana DeLouis
 
D

Dana DeLouis

Thanks -- I'll try evaluate formula and see what it tells me. It may
be it's simply too much for the program to handle, or that there's a
more efficient way to calculate it (at least resources-wise).

Andrew

Hi. I believe a 5,000 by 5,000 array is too large for Excel.
Perhaps a custom function? This is the same as your equation.
However, we ignore the zero main diagonal, and only calculate half the
matrix. We then just multiply by 2. However, this cancels the 2 in the
denominator. This assumes a vertical array of data, such as:
=Gini(A1:A5000)

Function Gini(v)
Dim r As Long
Dim c As Long
Dim n As Long
Dim t, M
With WorksheetFunction
M = .Transpose(v.Cells.Value)
n = v.Cells.Count

For r = 2 To n
For c = 1 To (r - 1)
t = t + Abs(M(r) - M(c))
Next c
Next r
Gini = t / (n * n * .Average(M))
End With
End Function

= = =
HTH :>)
Dana DeLouis
 
D

Dana DeLouis

Just to mention. You can do a quick test via the following.
Enter 1,2,3...n
The solution should equal (n-1) / (3*n)

For example, if A1:A10 have 1,2,...10 (ie n=10)
Then Gini(A1:A10) should equal

=(n - 1)/(3*n)

0.3

= = =
HTH
Dana DeLouis
 
C

Charles Williams

Looking at this again the formula I used (Angus Deaton, 1997) includes a
built-in bias correction factor of N/(N-1) that accounts for the different
results. Your original formula does indeed work for small amounts of data,
but excludes the bias correction factor.

For large datasets you really need a more efficient method: try this UDF
which calculates very fast on substantial data volumes.

Function GiniCoef(InputRange As Range, _
Optional Sorted As Long = 0, Optional BiasCorrect As
Boolean = True) As Variant
'
' Fast Calculation of Gini Coefficent
'
' Using formula by Angus Deaton (Princeton 1997):
' Gini=(N+1)/(N+2)-2/(N*(N-1)*Average(X))*SUM(P(i)*X(i))
' where:
' N= number of rows in the InputRange
' X= the set of values in the inputrange (must be >=0)
' X(i) is the i'th item in InputRange
' P(i) is the rank of X(i) (1=largest value)
'
' Data can be sorted Ascending (-1), Unsorted (0), or sorted Descending (1)
' the Default is unsorted
'
' The Deaton formula contains a Bias correction factor,
' which can be undone if BiasCorrect=False
'
' Charles Williams 2 Jan 2009
'
Dim InputValues As Variant
Dim j As Long
Dim jRank As Long
Dim dObs As Double
Dim dGini As Double

On Error GoTo Fail
InputValues = InputRange.Value2

dObs = UBound(InputValues)

If Sorted = 0 Then QSort InputValues, 1, CLng(dObs)

For j = 1 To dObs
If Sorted = 1 Then
jRank = j
Else
jRank = dObs - j + 1
End If
dGini = dGini + InputValues(j, 1) * jRank
Next j

GiniCoef = (dObs + 1) / (dObs - 1) - dGini * 2# / (dObs * (dObs - 1) * _
Application.WorksheetFunction.Average(InputRange))

If Not BiasCorrect Then GiniCoef = GiniCoef * (dObs - 1) / dObs
Exit Function
Fail:
GiniCoef = CVErr(xlErrValue)
End Function

Sub QSort(InputValues As Variant, jStart As Long, jEnd As Long)
Dim jStart2 As Long
Dim jEnd2 As Long
Dim v1 As Variant
Dim v2 As Variant

jStart2 = jStart
jEnd2 = jEnd
v1 = InputValues((jStart + jEnd) \ 2, 1)
While jStart2 < jEnd2
While InputValues(jStart2, 1) < v1 And jStart2 < jEnd
jStart2 = jStart2 + 1
Wend
While InputValues(jEnd2, 1) > v1 And jEnd2 > jStart
jEnd2 = jEnd2 - 1
Wend
If jStart2 < jEnd2 Then
v2 = InputValues(jStart2, 1)
InputValues(jStart2, 1) = InputValues(jEnd2, 1)
InputValues(jEnd2, 1) = v2
End If
If jStart2 <= jEnd2 Then
jStart2 = jStart2 + 1
jEnd2 = jEnd2 - 1
End If
Wend
If jEnd2 > jStart Then QSort InputValues, jStart, jEnd2
If jStart2 < jEnd Then QSort InputValues, jStart2, jEnd

End Sub
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top