Need a simple tab file editor for large files

B

Bill

Hi All,

I have been given two data files exported from another system, that purport
to be a CSV files but are, I suspect, tab delimited files. The file
extensions are CSV. The files are large i.e. 2GB and contain too many
records for Excel - when trying to open in Excel one gives an error saying
no data, the other truncates the data.

Trying to import into Access its says file is not in a csv format for one of
the files, the other seems ok but has a top row that does not match the rest
of the file and screws the wizard.

I suspect that if I could load the files into something that would allow me
to remove that offending top row, and resave them to another file name in a
conventional CSV format, I could remedy the situation but have not had a lot
of success.

I have tried CSVed and Killink CSV without success - any ideas?

Regards.
Bill.
 
G

Guest

Hi Bill,

If the files are 2GB in size, I doubt if they will fit into an Access
database anyway as a .mdb or .accdb file is limited to 2GB in size.

You may need to import them into something like SQL Server.

How about good old Notepad for editing the top row of the problem file?
 
B

Bill

Jerry,

I understand there is a lot of white space in the files and the actual mdb
files will be much smaller. For example they zip up to about 10MB each.

Have tried notepad but maybe I wasn't patient enough!! Will try again.

Cheers.
Bill.
 
J

John Nurick

Hi Bill,

I use TextPad as a general-purpose text editor.

IIRC, if the files are larger than 2GB, Access's text file import
routine won't import them even if they 'boil down' to something small
enough to fit in an mdb file. (But it's unusual for delimited files to
contain much white space: could these be fixed-width files?)

You may need to use text file tools to munge the files into a form
that Access can import, e.g. by
- converting them from fixed width to delimited (or removing surplus
white space some other way)
- filtering out fields or records you're not interested in
- normalising them.

Post back with more informatoin about their contents and we'll be able
to help.
 
B

Bill

Hi John,

Textpad gives me the out of disk space error that it is known for.
Notepad complains that the file is too large.

Have found a product that allows me to *view* both files and there is masses
of white space (this is an export from an old financial system.) and you are
right it may well be fixed width rather than tab delimited. Happy to use any
text file tools to knock the files around a bit as you suggest.

Given the fact that Textpad and Notepad can't do it, can you suggest
anything?

Thanks for your help.
Bill
 
J

John Nurick

Hi Bill,

I generally write little Perl scripts or one-liners to do this sort of
thing, and also Windows versions of Unix utilities, available from
http://unxutils.sourceforge.net. Perl is a free download from
www.activestate.com.

One thing I'd do is make a cut-down file to play with. You can do this
by using the Unix 'head' utility:
head -100 bigfile.txt > smallfile.txt
will create a new file containing the first 100 lines from
bigfile.txt.

The Perl equivalent is
perl -pe"last if $. >= 100" bigfile.txt > smallfile.txt

To convert a fixed-width file such as this

1Smith Peter 1
2Henderson Helen 22
3Stewart Jackie 31
4Andrews Anna 97
5Adonis Andrew 8

to tab-delimited, I'd use a Perl script like this:

#FixedToTab.pl
use strict;

#Template for fields
# a1 means a 1-character field
# a10 means 10 characters and so on
my $template = 'a1 a15 a10 a2' ;


while(<>) { #iterate through file line by line
chomp; #strip newline from end of line

#unpack line into an array of fields using template
my (@fields) = unpack $template, $_;

#trim leading & trailing spaces from each field
map s/^ +// , @fields;
map s/ +$// , @fields;

#output fields with tabs between
print join "\t", @fields;
#output newline
print "\n";
}

called like this:

Perl FixedToTab.pl bigfile.txt > tabbedfile.txt
 
J

John Nurick

Forgot to say: it doesn't have to be Perl, lots of other languages can
do the job, though most of them need many more lines of code.
 
B

Bill

Thanks John that is very helpfull.

Unix was never one of my skills but there are a few in the office that I'm
sure will have no problem with what you suggest.

Thanks again.
Bill.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top