File size and data loss compare

  • Thread starter Thread starter Del
  • Start date Start date
D

Del

I have a file that I download off my companies webserver.
When I pull it off the webserver to my harddrive it comes
down as a 12 meg CSV file. I open the file in Excel 2000
and then save it as an Excel file and the file size
changes to around 3 meg. When the file is in Excel I have
6000 rows and the columns go out to AK. Is there an easy
way to see what I am loosing when I go from the CSV to
Excel? It worrys me that there is such a huge file
difference
 
Sounds like your file is comma-delimited...I assume you
are using Excel's built-in conversion tool to seperate the
lines of text into columns? If not, you may be cutting
off some data...or possibly not. That is a large drop in
size...not sure what else could be causing that.

I googled this: http://www.imf.org/external/help/csv.htm
 
Can you open the .csv file in your favorite text editor and try to eyeball the
difference?

It sure sounds like something big is being lost.
 
I tried that but there is TONS of data and I was not able
to pick anything out.
[Dave Peterson wrote] -----Original Message-----
Can you open the .csv file in your favorite text editor and try to eyeball the
difference?

It sure sounds like something big is being lost.

Not necessarily; I'd take a look to see whether the .csv data is
padded with leading (or trailing, but probably leading) spaces. All
that extra invisible but pernicious text would contribute to the
..csv's size, but would be ignored by Excel. I tried an experiment in
which I created a csv of the same dimensions described in the original
message; 37 columns by 6000 rows, all random numeric entries, but
formatted as text with a few leading spaces. The file was about 3.5
meg. (Much smaller than the OP's, but I used only 3 digit values.
Indeed, this variance is instructive. Even when I created a CSV with
values a random 11 digits long (but without any space padding) the csv
file size reached only 2.5 Meg. Of course there were no labels in
there, and we don't know whether there were in the OP's file.)

Loaded into Excel and saved as an Excel file, though, the original
(padded) 3.5 Meg csv shrank to a mere 1.5 meg .xls. If you've done
random samplings of the data and can't find anything missing, my bet
is that this would be the explanation.

(If you're still worried about it and you have a copy of Microsoft
Access, you may want to try loading it into that as well and see
whether you get the same type of data as you get in Excel. I've often
found "Find Unmatched" queries to be useful in tracking down (seeming)
data anomalies, though if my guess about the reason is correct they
won't help much here.)
 
Back
Top