gary said:
When I try to open a 2 GB file, I get a "Disk Full" error.
How can I split the original file into smaller files without first
opening the original file?
Some suggestions here, for alternatives to TextPad:
http://kb.mozillazine.org/Edit_large_mbox_files
( No file size limit is listed here...
http://www.textpad.com/products/textpad/screenshots/index.html )
*******
There are plenty of other tools around, such as scripting languages
like "awk" or "perl", that can also handle a job like this. But with
some luck, the tools in the Mozillazine article will give you
a GUI while you work.
In Unix or Linux environments, "head" or "tail" can be used for
chopping up files, such as
head -n 1000 bigfile.txt > smaller_1.txt
head -n 2000 bigfile.txt | tail -n 1000 > smaller_2.txt
The advantage of commands like that is they're "line oriented",
so the file gets split on line boundaries rather than mid-line.
The first example command grabs the first 1000 lines.
The second example command extracts the second 1000 lines, using two
commands piped together. The ">" redirects output to the named new
file.
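If you need more than a couple of pieces, the same idea can go in a
loop. This is just a sketch, assuming a POSIX shell; I've turned the
pipe around into "tail -n +K | head" so the last piece doesn't pick up
stray extra lines, and the demo input and file names are my own:

```shell
# Sketch: split a big file into 1000-line pieces, automating the
# head/tail idea above. Demo input and names are illustrative.
seq 1 2500 > bigfile.txt               # demo input; use your real file

lines_per_chunk=1000
total_lines=$(wc -l < bigfile.txt)
start=0
chunk=1
while [ "$start" -lt "$total_lines" ]; do
    # tail -n +K starts printing at line K; head keeps the next 1000
    tail -n +$((start + 1)) bigfile.txt | head -n "$lines_per_chunk" \
        > "smaller_${chunk}.txt"
    start=$((start + lines_per_chunk))
    chunk=$((chunk + 1))
done
```

Each pass still reads the file from the front, so it's slow on a 2 GB
file, but it gets there.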
If the file, once imported into the environment, appears to be
"one long line", then those programs won't work quite right.
You need to do line-termination character conversion in that case
( <cr> to <cr><lf>, or vice versa ). I'll ignore that for now,
as there is always "dd" to finish the job.
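For the conversion itself, "tr" is one option. A minimal sketch (file
names are illustrative), turning old-Mac-style <cr>-only endings into
Unix <lf> endings so head and tail can see the lines:

```shell
# Demo file with <cr>-only endings (looks like one long line to head)
printf 'one\rtwo\rthree\r' > macfile.txt
# Translate every <cr> into <lf>, giving normal Unix line endings
tr '\r' '\n' < macfile.txt > unixfile.txt
```

Going the other way, for DOS-style <cr><lf> input on Unix, deleting
the <cr> with "tr -d '\r'" does the trick.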
http://linux.die.net/man/1/head
http://linux.die.net/man/1/tail
You can even get ports of programs like these, so they'll run under Windows.
In my software collection, I can see them in the "coreutils" package. An example
of that is in an older post I made. Some of these ported packages have
"installers", but when I'm in a rush, I just copy the necessary files
into the current working directory and get the job done.
http://groups.google.ca/group/microsoft.public.windowsxp.general/msg/d63ee15fbd9ff9fc?dmode=source
These two links give the component parts, and they still seem to be available,
even if the original main page for coreutils is gone. You should be
able to find "head" and "tail" in there, plus whatever DLLs they need.
You would use them in a Command Prompt window, where you'd "change directory"
(cd) to the appropriate working directory to do your work. Put a copy
of head, tail, and the DLLs into the current working directory with the
"big file", then issue the necessary commands, then wait...
http://iweb.dl.sourceforge.net/sourceforge/gnuwin32/coreutils-5.3.0-bin.zip
http://iweb.dl.sourceforge.net/sourceforge/gnuwin32/coreutils-5.3.0-dep.zip
The command syntax in Windows should be exactly the same for those ports.
And as long as the line-termination characters allow the tools to
recognize the end of each line, it'll work. Using a series of commands
like this, it'll take a dog's age to chop up the file, but you'll eventually
get it done. For example:
head -n 2000 bigfile.txt | tail -n 1000 > smaller_2.txt
*******
Now, if you're the kind of person "more comfortable with a chain saw", this
tool gets the same job done more rapidly. It will not respect line structure,
so the last line in each file could be split in half, for example.
http://www.chrysocome.net/dd
You work in a command prompt. Drop a copy of dd.exe into your working
directory.
dd if=bigfile.txt of=smaller_1.txt bs=1048576 count=1024
dd if=bigfile.txt of=smaller_2.txt bs=1048576 count=1024 skip=1024
That copies the first (1024 x 1048576) bytes, i.e. 1 GiB, into smaller_1.txt .
The second command copies the second 1 GiB into smaller_2.txt, and so on.
The "skip" option skips you X blocks along the input, before starting the
copy. The "bs" or "block size" field is in bytes; 1048576 (1 MiB) is a
multiple of 512 bytes, which was a sector on older hard drives. The command
really likes binary (power of two) numbers, at least for the block size field.
For the last command, something like this should work, where the command
copies until it hits end of file on the input. This would copy everything
from 2 GiB up to the end of bigfile.txt, making a third file.
dd if=bigfile.txt of=smaller_3.txt bs=1048576 skip=2048
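Scaled down so it's quick to try, here is the same three-command dd
recipe on a small demo file (bs shrunk from 1 MiB to 1 KiB, counts
shrunk to match; file names are my own). Concatenating the pieces
should reproduce the original byte for byte:

```shell
seq 1 2000 > demo.txt                       # demo input, a few KiB
dd if=demo.txt of=part1.bin bs=1024 count=2 2>/dev/null          # first 2 KiB
dd if=demo.txt of=part2.bin bs=1024 count=2 skip=2 2>/dev/null   # next 2 KiB
dd if=demo.txt of=part3.bin bs=1024 skip=4 2>/dev/null           # the rest
cat part1.bin part2.bin part3.bin > rejoined.txt
cmp demo.txt rejoined.txt                   # no output means identical
```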
Commands like that run anywhere from 13MB/sec to 60MB/sec,
depending on conditions.
HTH,
Paul