Sampling

G

Guest

I have multiple columns of data which consist of numeric and text. There are
15000 rows of which I wish to extract a sample of 2500 rows. When I use the
sampling feature it only gives me an extract of the first column of data
which is numeric. How can I get Excel to bring the rest of the data in the
other columns across with it.

If that is not possible, how do I then go about matching up the extracted
numerics with the numerics in the original range.

Regards
Kharrison
 
D

David McRitchie

Hi Kharrison,

Don't know what the sampling feature is, so I don't know what you did.
Perhaps you simply had to select all of the cells.

You could create a new column
K1:K6 on number to each cell 1, 2, 3, 4, 5 ,6
Select K1:K6 and fill down

Use automatic filter and select value of 1
Select all cells (ctrl + A)
Data, Filter, Auto Filter
click on the arrow at the top of the added column,
and select 1 -- which would include the header row if you have a
one row hader.

You can paste the filtered data to another worksheet.-
 
M

Max

Perhaps one alternative approach to play with ..

Assume a small data set

In Sheet1
-------------
in A1:D11, data from row2 down

Field1 Field2 Field3 Field4
Text1 Num1 Text11 Num11
Text2 Num2 Text12 Num12
Text3 Num3 Text13 Num13
Text4 Num4 Text14 Num14
Text5 Num5 Text15 Num15
Text6 Num6 Text16 Num16
Text7 Num7 Text17 Num17
Text8 Num8 Text18 Num18
Text9 Num9 Text19 Num19
Text10 Num10 Text20 Num20

Put in E2: =RAND()
Copy down to E11

In Sheet2
------------
Copy > paste the same col headers into A1:D1, viz.:
Field1 Field2 Field3 Field4

Put in A2:
=INDEX(Sheet1!A:A,MATCH(SMALL(Sheet1!$E:$E,ROWS($A$1:A1)),Sheet1!$E:$E,0))

Copy across to D1, fill down to D11

In A2:D11 will be a randomized, non-repeating shuffle
of the data from Sheet1

Each press of F9 key will recalc a fresh shuffle

Just select the first x rows of data as your sample,
and freeze the results elsewhere
via a copy > paste special > values > ok

Adapt / extend to suit

Note:
--------
For large data-sets, since recalc will be intensive, it's
better to switch the book's calc mode to "Manual" first,
via: Tools > Options > Calc tab > Check "Manual" > OK

If there are 15,000 rows in Sheet1, col E
(with the =RAND() formula) need to be filled right down,
but for the formulas in Sheet2, you could stop filling down
when you've reached the desired sample size, viz.
at row 2500 thereabouts
(no need to fill down all 15K rows)

Then just press F9 to recalc, as before

(but you'll need to wait a while for calcs to complete <g>
- see the status bar indication at the bottom left corner)
 
M

Max

Think one advantage of the suggested approach is that it'll
generate non-repeating samples (usually required),
unlike the Tools > Data Analysis > Sampling route
which might throw up repeats, besides the issues posted ..
 
M

Max

Put in A2:
=INDEX(Sheet1!A:A,MATCH(SMALL(Sheet1!$E:$E,ROWS($A$1:A1)),Sheet1!$E:$E,0))

Copy across to D1, fill down to D11

Typo correction: Last line above should be
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top