Function for missing data in a series?

G

Guest

I am collecting a series of data labelled numerically from 1-1000. I need to
determine which in the series are missing. ie I know I have 750 data elements
out of 1000, so I know there are 250 missing. But I need to detmine which
specific data elements are missing.

Is there a function, or routine to automate this?

Thx!
 
G

Guest

I'm sure there is a more ellegant way to go about this, but I think this will
work for you:

Let's say your data is in A1:A750

In a seperate column, enter the formula:

=IF(ISNA(RANK(ROW(),$A$1:$A$1000)),ROW(),"")

Copy down through row 1000

The displayed results should be your missing data.

HTH,
Elkar
 
G

Guest

Thanks that is what I was looking for.
Not sure how that works, but it works, thanks!
 
E

Epinn

=IF(ISNA(RANK(ROW(),$A$1:$A$1000)),ROW(),"")

Since we don't really care about the rank that RANK( ) returns, I am very curious if MATCH( ) can be used for this purpose as well. If yes, is there a difference between the two in terms of speed if the file is huge? It looks like both RANK and MATCH (FALSE) don't require the data to be sorted and take care of duplicate numbers.

Can someone shed some light on this please? I am trying to learn here.

Epinn

I am collecting a series of data labelled numerically from 1-1000. I need to
determine which in the series are missing. ie I know I have 750 data elements
out of 1000, so I know there are 250 missing. But I need to detmine which
specific data elements are missing.

Is there a function, or routine to automate this?

Thx!
 
B

Biff

Here's another way that returns the missing values in a contiguous block.

Entered as an array using the key combination of CTRL,SHIFT,ENTER (not just
ENTER):

=INDEX(ROW($1:$1000),SMALL(IF(ISNA(MATCH(ROW($1:$1000),A$1:A$750,0)),ROW($1:$1000)),ROW(A1)))

Copy down umtil you get #NUM! errors meaning all the missing values have
been returned. This method works but can be slow on large data sets. Also
note that due to the use of the ROW function the largest number in the
sequence can be 65,536 (in current versions of Excel).

Biff
 
E

Epinn

Biff,

Thanks for the formula. I like the contiguous block. I like RANK too.

I played with the MATCH formula and I took out the INDEX function. It seemed to give identical results. Do you see any problem not having INDEX? Is INDEX a case of good programming technique (i.e. index/match usually go together) that Bob taught me? I am trying to save some time by not doing INDEX, seeing that this is an array formula. ;)

I am curious about what you think.

Appreciate your advice.

Epinn

Here's another way that returns the missing values in a contiguous block.

Entered as an array using the key combination of CTRL,SHIFT,ENTER (not just
ENTER):

=INDEX(ROW($1:$1000),SMALL(IF(ISNA(MATCH(ROW($1:$1000),A$1:A$750,0)),ROW($1:$1000)),ROW(A1)))

Copy down umtil you get #NUM! errors meaning all the missing values have
been returned. This method works but can be slow on large data sets. Also
note that due to the use of the ROW function the largest number in the
sequence can be 65,536 (in current versions of Excel).

Biff
 
E

Epinn

Forgot to paste the array formula that I experimented with. A smaller range for testing ......

=SMALL(IF(ISNA(MATCH(ROW($1:$20),A$1:A$10,0)),ROW($1:$20)),ROW(A1))

Please understand that my only purpose is to learn and sometimes I have a unique way of seeing things.

Epinn

Biff,

Thanks for the formula. I like the contiguous block. I like RANK too.

I played with the MATCH formula and I took out the INDEX function. It seemed to give identical results. Do you see any problem not having INDEX? Is INDEX a case of good programming technique (i.e. index/match usually go together) that Bob taught me? I am trying to save some time by not doing INDEX, seeing that this is an array formula. ;)

I am curious about what you think.

Appreciate your advice.

Epinn

Here's another way that returns the missing values in a contiguous block.

Entered as an array using the key combination of CTRL,SHIFT,ENTER (not just
ENTER):

=INDEX(ROW($1:$1000),SMALL(IF(ISNA(MATCH(ROW($1:$1000),A$1:A$750,0)),ROW($1:$1000)),ROW(A1)))

Copy down umtil you get #NUM! errors meaning all the missing values have
been returned. This method works but can be slow on large data sets. Also
note that due to the use of the ROW function the largest number in the
sequence can be 65,536 (in current versions of Excel).

Biff
 
B

Biff

Is INDEX a case of good programming technique (i.e. index/match usually go
together)

Hmmm.....

It may be just a force of habit (in my case). Removing Index does seem to
work (though not exhaustively tested). Removing Index didn't improve
performance, though. Actually, I used to use this:

=INDEX(ROW($1:$1000),SMALL(IF(COUNTIF(A$1:A$1000,ROW(A$1:A$1000))=0,ROW($1:$1000)),ROW(A1)))

But replacing Countif with Match improves performance significantly on large
sequences.

Biff

Biff,

Thanks for the formula. I like the contiguous block. I like RANK too.

I played with the MATCH formula and I took out the INDEX function. It
seemed to give identical results. Do you see any problem not having INDEX?
Is INDEX a case of good programming technique (i.e. index/match usually go
together) that Bob taught me? I am trying to save some time by not doing
INDEX, seeing that this is an array formula. ;)

I am curious about what you think.

Appreciate your advice.

Epinn

Here's another way that returns the missing values in a contiguous block.

Entered as an array using the key combination of CTRL,SHIFT,ENTER (not just
ENTER):

=INDEX(ROW($1:$1000),SMALL(IF(ISNA(MATCH(ROW($1:$1000),A$1:A$750,0)),ROW($1:$1000)),ROW(A1)))

Copy down umtil you get #NUM! errors meaning all the missing values have
been returned. This method works but can be slow on large data sets. Also
note that due to the use of the ROW function the largest number in the
sequence can be 65,536 (in current versions of Excel).

Biff
 
E

Epinn

Thanks Biff for sharing.

This is why I want to learn from you. Your practical experience is precious. I prefer MATCH to COUNTIF as I don't want to split up the couple (INDEX/MATCH). ;)

In case you are interested, when I first suggested using MATCH, I was thinking of just using MATCH (similar to RANK) without using CSE or SMALL. But contiguous block is good unless someone is really concerned about speed.

Do you want to know what inspired me to drop INDEX? I used "evaluate formula" and I saw it not doing much. I do believe that dropping it won't speed up ......

Bob, I forgot to start a new thread. You know what, if only a couple of posts I may just squeeze in because sometimes I think efficiency is important too.

Bman342, thank you for the question and please feel free to come back with any follow-up questions. I am sure the experts won't mind me saying this as they are quite friendly on top of being knowledgeable.

Epinn

Biff said:
Is INDEX a case of good programming technique (i.e. index/match usually go
together)

Hmmm.....

It may be just a force of habit (in my case). Removing Index does seem to
work (though not exhaustively tested). Removing Index didn't improve
performance, though. Actually, I used to use this:

=INDEX(ROW($1:$1000),SMALL(IF(COUNTIF(A$1:A$1000,ROW(A$1:A$1000))=0,ROW($1:$1000)),ROW(A1)))

But replacing Countif with Match improves performance significantly on large
sequences.

Biff

Biff,

Thanks for the formula. I like the contiguous block. I like RANK too.

I played with the MATCH formula and I took out the INDEX function. It
seemed to give identical results. Do you see any problem not having INDEX?
Is INDEX a case of good programming technique (i.e. index/match usually go
together) that Bob taught me? I am trying to save some time by not doing
INDEX, seeing that this is an array formula. ;)

I am curious about what you think.

Appreciate your advice.

Epinn

Here's another way that returns the missing values in a contiguous block.

Entered as an array using the key combination of CTRL,SHIFT,ENTER (not just
ENTER):

=INDEX(ROW($1:$1000),SMALL(IF(ISNA(MATCH(ROW($1:$1000),A$1:A$750,0)),ROW($1:$1000)),ROW(A1)))

Copy down umtil you get #NUM! errors meaning all the missing values have
been returned. This method works but can be slow on large data sets. Also
note that due to the use of the ROW function the largest number in the
sequence can be 65,536 (in current versions of Excel).

Biff
 
G

Guest

Even better, thanks!

Biff said:
Here's another way that returns the missing values in a contiguous block.

Entered as an array using the key combination of CTRL,SHIFT,ENTER (not just
ENTER):

=INDEX(ROW($1:$1000),SMALL(IF(ISNA(MATCH(ROW($1:$1000),A$1:A$750,0)),ROW($1:$1000)),ROW(A1)))

Copy down umtil you get #NUM! errors meaning all the missing values have
been returned. This method works but can be slow on large data sets. Also
note that due to the use of the ROW function the largest number in the
sequence can be 65,536 (in current versions of Excel).

Biff
 
G

Guest

Thanks, Epinn, your solution worked as well. Now I need to spend some time
understanding HOW these work (after nn nested functions my brain begins to
cramp).
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top