Delete a line from a text file

tomtown.net

Hello

I'm trying to remove a single line from a text file using a search
pattern (pretty simple: if the line contains "NODE1", remove the line).
To achieve this I'd like to operate on only the original file, with no
temp file, since the file is 1. on a network resource, 2. pretty big,
and 3. accessed by plenty of clients. Copying the file, or reading
everything into an array and then writing it back using a StreamWriter,
would lock the file for an unacceptable period of time.
Using a database is not an option :(

The file is meant for logging a program's execution and contains the
following:

NODE USER TIMESTAP VERSION LOCALREG
NODE1 USERNAME1 23.06.2006 16:28:28 1.1 LL=3;
NODE1 USERNAME1 23.06.2006 16:28:28 1.1 LL=3;
NODE2 USERNAME2 23.06.2006 16:29:48 1.1 LL=3;

As an example, I'd like to remove all lines containing "NODE1", giving
me:

NODE USER TIMESTAP VERSION LOCALREG
NODE2 USERNAME2 23.06.2006 16:29:48 1.1 LL=3;

Any help would be greatly appreciated....

Tom
 
Nicholas Paldino [.NET/C# MVP]

Tom,

Doing it in-file is not possible. You can't shorten a file (not by any
means I know).

Rather, what you have to do is resort to creating a new file, and then
overwriting the original file with that.

So, that being said, you would read through your file, and as you find
the records you want to keep, you would write them to the new file.

Then, when you have the new, shorter file, delete the old file, and
rename the new file.
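
The steps above could be sketched in C# roughly as follows. This is only a minimal illustration, not a definitive implementation: the class name `LineFilter`, the method name and the `.tmp` suffix are made up for the example, and it assumes the file can be opened exclusively for the duration of the copy.

```csharp
using System;
using System.IO;

public static class LineFilter
{
    // Copies every line that does NOT contain 'keyword' into a temporary
    // file next to the original, then swaps the temporary file into place.
    public static void RemoveLinesContaining(string path, string keyword)
    {
        string tempPath = path + ".tmp"; // hypothetical temp name, same folder

        using (StreamReader reader = new StreamReader(path))
        using (StreamWriter writer = new StreamWriter(tempPath, false))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (!line.Contains(keyword))
                    writer.WriteLine(line); // keep this record
            }
        }

        File.Delete(path);         // delete the old file
        File.Move(tempPath, path); // rename the new, shorter file
    }
}
```

Calling `LineFilter.RemoveLinesContaining(path, "NODE1")` on the sample log would keep the header line and the NODE2 line, because "NODE" alone does not contain "NODE1".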

Hope this helps.
 
GhostInAK

Hello tomtown.net,

Well, first off, you don't *remove* anything from a file... any file. You
can *add* (append) to a file, or delete ALL the contents of a file and
re-write it. If you can modify the consuming application (provided people
aren't opening the file directly in Notepad or some crazy shit like that)
then you may want to consider an index file. Using an index you could easily
mark records as deleted or to be ignored.

Unfortunately you can't treat the file with an ODBC connection/command because
the ODBC text file driver does not support the DELETE command.
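
One way the index-file idea might look in C# is sketched below. Everything here is an assumption made for illustration: the class `DeletedIndex`, its method names, and the choice to store zero-based line numbers in a separate, append-only index file. It relies on the log itself only ever being appended to, so line numbers stay stable.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class DeletedIndex
{
    // Appends the zero-based number of every line containing 'keyword'
    // to the index file. The data file itself is never modified.
    public static void MarkDeleted(string dataPath, string indexPath, string keyword)
    {
        using (StreamReader reader = new StreamReader(dataPath))
        using (StreamWriter index = new StreamWriter(indexPath, true)) // append mode
        {
            string line;
            int lineNumber = 0;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Contains(keyword))
                    index.WriteLine(lineNumber);
                lineNumber++;
            }
        }
    }

    // Returns the lines of the data file that are not listed in the index.
    public static List<string> ReadLive(string dataPath, string indexPath)
    {
        HashSet<int> deleted = new HashSet<int>();
        if (File.Exists(indexPath))
        {
            foreach (string entry in File.ReadAllLines(indexPath))
                deleted.Add(int.Parse(entry));
        }

        List<string> live = new List<string>();
        int lineNumber = 0;
        foreach (string line in File.ReadAllLines(dataPath))
        {
            if (!deleted.Contains(lineNumber))
                live.Add(line);
            lineNumber++;
        }
        return live;
    }
}
```

The consuming application would then call `ReadLive` instead of reading the log directly, which is why this only works if that application can be modified.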

-Boo
 
Ben Voigt

Nicholas Paldino said:
Doing it in-file is not possible. You can't shorten a file (not by any
means I know).

SetEndOfFile can: it truncates the file at the current file position.

Nicholas Paldino said:
Rather, what you have to do is resort to creating a new file, and then
overwriting the original file with that.

So, that being said, you would read through your file, and as you find
the records you want to keep, you would write them to the new file.

Correct. But you can do that in-place, as far as the file is concerned.

Think of the way memmove works.

How about the following algorithm? Note that ptrin and ptrout can be
pointers, indexes, or whatever your language prefers.

char[] buffer1 = allocate 2k;   // input window: 1k being scanned + 1k lookahead
char[] buffer2 = allocate 1k;   // output block
read from file into [buffer1 + 0, buffer1 + 2k)

ptrin = buffer1
ptrout = buffer2

flag removeit = false
until (reached end of input file) do
    if ptrin reached buffer1 + 1k then
        // slide the lookahead half down and refill it
        move [buffer1 + 1k, buffer1 + 2k) to buffer1 + 0
        ptrin -= 1k
        read from file into [buffer1 + 1k, buffer1 + 2k)
    endif
    if ptrout reached buffer2 + 1k then
        // output block is full: write it back at the file's write offset
        write to file [buffer2 + 0, buffer2 + 1k)
        ptrout = buffer2
    endif
    if removeit then
        // skipping: drop characters up to and including the newline
        if value at ptrin is newline then
            removeit = false
        endif
    else
        if text at ptrin starts with "NODE1" then
            removeit = true
        else
            move value at ptrin to ptrout
            advance ptrout
        endif
    endif

    advance ptrin
loop
write to file [buffer2, ptrout)
set end of file
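
The memmove-plus-truncate idea can be shown on its own with a small C# sketch that removes one known byte range from a file. The class and method names (`FileShrinker`, `RemoveRange`) are hypothetical, and `FileStream.SetLength` is used here as the managed way to set the end of the file; the sketch assumes nobody else writes to the file while it runs.

```csharp
using System;
using System.IO;

public static class FileShrinker
{
    // Removes 'count' bytes starting at offset 'start' by sliding the tail
    // of the file down over them and then cutting the file short.
    public static void RemoveRange(string path, long start, long count)
    {
        byte[] buffer = new byte[4096];
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
        {
            if (start < 0 || count < 0 || start + count > fs.Length)
                throw new ArgumentOutOfRangeException("count");

            long readPos = start + count; // where the data to keep begins
            long writePos = start;        // where it should end up

            while (true)
            {
                fs.Position = readPos;
                int n = fs.Read(buffer, 0, buffer.Length);
                if (n == 0) break;        // reached end of file

                fs.Position = writePos;   // write offset always trails read offset
                fs.Write(buffer, 0, n);

                readPos += n;
                writePos += n;
            }

            fs.SetLength(writePos);       // "set end of file"
        }
    }
}
```

Because the write offset never passes the read offset, nothing that still has to be read is overwritten, which is the same property that makes a forward memmove safe when the destination lies below the source.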
 
Ben Voigt

Ben Voigt said:
Correct. But, you can do that in-place, as far as the file is concerned.

Think of the way memmove works.

How about the following algorithm? Note that ptrin and ptrout can be
pointers, indexes, or whatever your language prefers.

More notes:
* Of course you must keep the read and write offsets in the file separate.
* You will need to lock the file while checking the file length, but you can
release the lock immediately.
* Run this on the file server if at all possible, to avoid the roundtrips
and minimize the length of time locked.
* For the last block (partial block), keep the file locked, so that you can
truncate the file before some client extends it more.
* You can pause and resume anytime, just by saving your pointers, because
the clients are writing to a different area of the file. This means that if
you know the start of the file is already processed, you can skip to the new
data.
* This algorithm will actually only remove from the word "NODE1" to the end
of the line. If the keyword does not appear at the start of the line, some
of the line will remain. To remove the entire line, you can save the
pointer/index each time you hit a newline, and move back to writing at that
location. Note that this affects your in-memory pointer and potentially
also your file write pointer.
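
Putting the notes together, a whole-line variant might look like the following C# sketch. It is a simplification under stated assumptions, not the exact algorithm above: it buffers one line at a time in memory instead of tracking the last-newline pointer, it assumes an ASCII keyword and "\n"-terminated lines, and it holds the file exclusively (`FileShare.None`) for the whole pass rather than locking only around the final block. The names `InPlaceFilter`, `FlushLine` and `Contains` are made up for the example.

```csharp
using System;
using System.IO;
using System.Text;

public static class InPlaceFilter
{
    // Rewrites the file in place, dropping every line that contains 'keyword'.
    public static void RemoveLinesContaining(string path, string keyword)
    {
        byte[] needle = Encoding.ASCII.GetBytes(keyword);
        byte[] chunk = new byte[4096];

        using (FileStream fs = new FileStream(path, FileMode.Open,
                                              FileAccess.ReadWrite, FileShare.None))
        using (MemoryStream line = new MemoryStream())
        {
            long readPos = 0;   // separate read offset
            long writePos = 0;  // separate write offset, never ahead of readPos

            while (true)
            {
                fs.Position = readPos;
                int n = fs.Read(chunk, 0, chunk.Length);
                if (n == 0) break;
                readPos += n;

                for (int i = 0; i < n; i++)
                {
                    line.WriteByte(chunk[i]);
                    if (chunk[i] == (byte)'\n')
                        writePos = FlushLine(fs, line, needle, writePos);
                }
            }

            writePos = FlushLine(fs, line, needle, writePos); // last line, no newline
            fs.SetLength(writePos);                           // truncate the leftover tail
        }
    }

    // Writes the buffered line at writePos unless it contains the keyword.
    private static long FlushLine(FileStream fs, MemoryStream line, byte[] needle, long writePos)
    {
        byte[] data = line.ToArray();
        line.SetLength(0);
        if (data.Length > 0 && !Contains(data, needle))
        {
            fs.Position = writePos;
            fs.Write(data, 0, data.Length);
            writePos += data.Length;
        }
        return writePos;
    }

    // Naive byte-wise substring search.
    private static bool Contains(byte[] data, byte[] needle)
    {
        for (int i = 0; i + needle.Length <= data.Length; i++)
        {
            int j = 0;
            while (j < needle.Length && data[i + j] == needle[j]) j++;
            if (j == needle.Length) return true;
        }
        return false;
    }
}
```

Since a line is only written after it has been read completely, the write offset can never overtake the read offset, so the file is compacted without any temp file.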
 
tomtown.net

Hello all

Thanks all! This was my first post and so many people are helping!
Thank you very much!
I once wrote a class that replaces certain strings (code below), but,
as most of you suggested, it also needs a temp file to be created,
which is then moved to the original location.
I just got the idea of using XML instead of plain text files, since
manipulations seem to be much easier using the SelectSingleNode,
InnerXml and ReplaceChild members of XmlElement. I found a pretty good
article here (might help someone else too):
http://www.codeproject.com/soap/myXPath.asp

Code to replace strings in a text file is below (I used this for a
command-line search-and-replace tool downloadable at
http://www.tomtown.net/?com=tech&scom=scorner&action=browse&cid=002 ):

// =================================================================

// Requires: using System.IO; using System.Text;

public static void ReplaceString(string textFileName, string searchStr,
                                 string replaceStr, int backup)
{
    string tempFileName = Path.GetTempFileName();
    Encoding encoding = Encoding.GetEncoding("windows-1252");

    // Copy the file line by line into the temp file, replacing as we go.
    // The using blocks close both files even if an exception is thrown.
    using (StreamReader sr = new StreamReader(textFileName, encoding))
    using (StreamWriter sw = new StreamWriter(tempFileName, false, encoding))
    {
        string line;
        while ((line = sr.ReadLine()) != null)
        {
            sw.WriteLine(line.Replace(searchStr, replaceStr));
        }
    }

    // Optionally keep the original as <name>_bak.
    if (backup == 1)
    {
        if (File.Exists(textFileName + "_bak"))
            File.Delete(textFileName + "_bak");
        File.Move(textFileName, textFileName + "_bak");
    }

    // File.Delete does nothing if the original was already moved away.
    File.Delete(textFileName);
    File.Move(tempFileName, textFileName);
}

// =================================================================

Thanks again for all your efforts!

Tom
 
Ben Voigt

tomtown.net said:
Hello all

Thanks all! This was my first post and so many people are helping!
Thank you very much!
I've once made a class that replaces certain characters (code below),
but as most of you suggested it also needs a temp file to be created
and then copies the file to the original location.

If all write operations are append-only, and the resulting text is always no
longer than the original, you can do the operation in-place.

If you have multiple processes writing the file, presumably each locks the
file for write to prevent corruption, and then closes the file immediately
afterwards? This is important, because when you reach the end of the file
you'll do almost exactly the same thing, except you'll call SetEndOfFile
instead of appending data.
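
As a rough sketch of that last step in C#: the helper below tries to open the file exclusively, retries if another process currently has it open, and then truncates. The names `ExclusiveFile`, `OpenWithRetry` and `TruncateTo`, as well as the retry count and delay, are assumptions made for illustration; it also assumes the writers really do open the file without sharing and close it right after appending.

```csharp
using System;
using System.IO;
using System.Threading;

public static class ExclusiveFile
{
    // Opens the file with no sharing, retrying while another process holds it.
    public static FileStream OpenWithRetry(string path, int maxAttempts, int delayMs)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return new FileStream(path, FileMode.Open,
                                      FileAccess.ReadWrite, FileShare.None);
            }
            catch (IOException) when (attempt < maxAttempts)
            {
                Thread.Sleep(delayMs); // someone else has the file open; wait
            }
        }
    }

    // Cuts the file down to 'newLength' bytes while holding it exclusively.
    public static void TruncateTo(string path, long newLength)
    {
        using (FileStream fs = OpenWithRetry(path, 10, 100))
        {
            if (newLength < fs.Length)
                fs.SetLength(newLength);
        }
    }
}
```

In a real pass, the final partial block would be moved and the file truncated inside the same exclusive open, so that no client can extend the file between the two steps.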
 
