How to add a string to a big file in C#

Guest

I want to add a string to a file whose lines are sorted alphabetically. For
example, the following is a big file:
//////////////////////
abort
black
cabbage
dog
egg
fly
..
..
////////////////////
Now I want to add "dad" to it, just after "cabbage" and before "dog". Because
the file holds so many words, I need to use binary search to find the
location.

using System;
using System.IO;
using System.Text;

/// <summary>
/// Looks for the given word in the sorted file.
/// </summary>
/// <param name="word">The word to find.</param>
/// <returns>true if the word is in the file.</returns>
private bool Find(string word)
{
    if (word == null)
    {
        throw new ArgumentNullException("word");
    }

    // file is a FileInfo object
    using (FileStream fs = File.OpenRead(file.FullName))
    {
        // binary search over byte offsets starts
        long lower = 0;
        long upper = fs.Length;
        while (lower < upper)
        {
            long index = (lower + upper) / 2;

            // read off an incomplete line (step back one byte first, so
            // that an index right at the start of a line keeps that line)
            fs.Seek(Math.Max(index - 1, 0), SeekOrigin.Begin);
            if (index > 0)
            {
                ReadLine(fs);
            }

            // no line starts between index and upper
            if (fs.Position >= upper)
            {
                upper = index;
                continue;
            }

            // I do not know how to make Read() return a whole line,
            // so ReadLine() below collects bytes up to the next '\n'
            string str = ReadLine(fs);

            int t = string.Compare(word, str.Trim());
            // found it
            if (t == 0)
            {
                return true;
            }
            if (t > 0)
            {
                lower = fs.Position; // start of the next line
            }
            else
            {
                upper = index;
            }
        }
    }
    return false;
}

// Reads single-byte characters up to the next '\n' or the end of the file.
private static string ReadLine(Stream stream)
{
    StringBuilder sb = new StringBuilder();
    int b;
    while ((b = stream.ReadByte()) != -1 && b != '\n')
    {
        sb.Append((char)b);
    }
    return sb.ToString();
}

That is what the method does, and my questions are:
1: Is FileStream suitable for this?
2: Is string.Compare suitable for this?
3: Is there a better way to do it?

Thanks to all!
 
Zach

zjut said:
I want to add a string to a file whose lines are sorted alphabetically. For
example, the following is a big file:
//////////////////////
abort
black
cabbage
dog
egg
fly
.
.
////////////////////
Now I want to add "dad" to it.

Given:
* Your file ("old_file") is in alphabetical order
* old_file is immensely big

Required:
* Adding word ("new_word") to old_file at the right place.

Solution:
* Sequentially read old_file ("word_read") and write word_read to new_file
* If new_word sorts after word_read and before the next word read, write new_word to new_file between them
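
A minimal sketch of the sequential approach above, in C#. The names SequentialInsert and InsertWord are made up for illustration, and it assumes one word per line and a file sorted in ordinal (character code) order; swap the comparison if the file was sorted some other way.

```csharp
using System;
using System.IO;

public static class SequentialInsert
{
    // Copies oldPath to newPath line by line and writes newWord just before
    // the first line that sorts after it. Ordinal comparison is an assumption.
    public static void InsertWord(string oldPath, string newPath, string newWord)
    {
        bool inserted = false;
        using (StreamReader reader = new StreamReader(oldPath))
        using (StreamWriter writer = new StreamWriter(newPath))
        {
            string wordRead;
            while ((wordRead = reader.ReadLine()) != null)
            {
                if (!inserted && string.CompareOrdinal(newWord, wordRead) < 0)
                {
                    writer.WriteLine(newWord);
                    inserted = true;
                }
                writer.WriteLine(wordRead);
            }
            // newWord sorts after every existing line, or the file was empty
            if (!inserted)
            {
                writer.WriteLine(newWord);
            }
        }
    }
}
```

Only one line is held in memory at a time, so the size of old_file does not matter. If new_word is already in the file this sketch writes it a second time, right after the existing copy. Once new_file is written it can take the place of old_file, for example with File.Delete and File.Move.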
 
Zach

zjut said:
I want to add a string to a file whose lines are sorted alphabetically. For

If you are having problems with the algorithm let me know and I will post an
example. The example sorts a short alphabetically ordered file into a very
big alphabetically ordered file. By the way, WordPerfect can deal with
immensely big files (100,000+ words). Microsoft Word can't.
 
Justin Rogers

I see another NG member has already given you a possible
solution, but I don't feel it would be an optimal solution... You
really have a couple of different options that all revolve around the
same set of principles... First, you know the new size of the word
you are sorting into place, so you'll want to open and then grow
the file by that amount. This is to make sure you can copy the
rest of the file around while you are doing your searching. For a
sanity check, go ahead and check the first and last element to make
sure this isn't a trivial case.

Okay, the binary search is going to involve cutting the file in half
(you can do this based on length) and then seeking to that location.
Once you've done that, you are going to walk backwards and
forwards until you encounter newlines on either side. This'll be
your *word*, and you'll compare it and continue the process of
cutting the file in half (aka a binary search)... Once you've found
your insertion location, you are going to do large buffer copies (4K
is probably best) of bytes moving all of the end elements into that
space you allocated in the beginning. With that done, write your
word into place. You've just managed an in place insertion.
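
Here is a rough sketch of those steps in C#, not a tuned implementation. It makes several assumptions: the text is single-byte (ASCII), every line including the last ends in '\n', and the file is sorted in ordinal order. The names InPlaceInsert, FindInsertOffset, InsertAt and Insert are invented for the example. Rather than walking backwards to find the newline, it steps back one byte and reads forward to the next line start, and it moves the tail first and grows the file just before that.

```csharp
using System;
using System.IO;
using System.Text;

public static class InPlaceInsert
{
    // Reads bytes up to and including the next '\n' (or to end of file) and
    // returns them without the line ending. Assumes single-byte text.
    static string ReadLine(Stream s)
    {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = s.ReadByte()) != -1 && b != '\n')
        {
            if (b != '\r') sb.Append((char)b);
        }
        return sb.ToString();
    }

    // Binary search over byte offsets. Returns the offset of the first line
    // that sorts after word, or the file length if no line does.
    public static long FindInsertOffset(FileStream fs, string word)
    {
        long lower = 0;          // always the start of a line
        long upper = fs.Length;  // lines starting at or after upper sort after word
        while (lower < upper)
        {
            long mid = lower + (upper - lower) / 2;
            // Step back one byte so a '\n' just before mid is noticed,
            // then skip forward to the first line start at or after mid.
            fs.Seek(Math.Max(mid - 1, 0), SeekOrigin.Begin);
            if (mid > 0) ReadLine(fs);
            if (fs.Position >= upper)
            {
                upper = mid;     // no line starts in [mid, upper)
                continue;
            }
            string line = ReadLine(fs);
            if (string.CompareOrdinal(line, word) <= 0)
                lower = fs.Position;  // insertion point is after this line
            else
                upper = mid;
        }
        return lower;
    }

    // Grows the file, shifts everything from offset onward towards the end
    // in 4K blocks (last block first, so nothing is overwritten), then
    // writes the word into the gap.
    public static void InsertAt(FileStream fs, long offset, string word)
    {
        byte[] entry = Encoding.ASCII.GetBytes(word + "\n");
        long oldLength = fs.Length;
        fs.SetLength(oldLength + entry.Length);
        byte[] buffer = new byte[4096];
        long end = oldLength;    // end of the region still to be moved
        while (end > offset)
        {
            int count = (int)Math.Min(buffer.Length, end - offset);
            long start = end - count;
            fs.Seek(start, SeekOrigin.Begin);
            int read = 0;
            while (read < count)
            {
                int n = fs.Read(buffer, read, count - read);
                if (n == 0) throw new EndOfStreamException();
                read += n;
            }
            fs.Seek(start + entry.Length, SeekOrigin.Begin);
            fs.Write(buffer, 0, count);
            end = start;
        }
        fs.Seek(offset, SeekOrigin.Begin);
        fs.Write(entry, 0, entry.Length);
    }

    public static void Insert(string path, string word)
    {
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
        {
            InsertAt(fs, FindInsertOffset(fs, word), word);
        }
    }
}
```

With the OP's sample file, Insert(path, "dad") should place the word between "cabbage" and "dog". The search only reads a handful of lines, but the copy still rewrites everything after the insertion point, so the cost depends on how close to the end the word lands.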

If you have multiple words to merge, then merge sorting and other
heuristics come into play. Get your basic algorithm and then think
about refactoring.
 
Jon Skeet [C# MVP]

Zach said:
If you are having problems with the algorithm let me know and I will post an
example. The example sorts a short alphabetically ordered file into a very
big alphabetically ordered file. By the way, WordPerfect can deal with
immensely big files (100,000+ words). Microsoft Word can't.

While I wouldn't be surprised if Word had some limits somewhere, Word
can certainly cope with 100,000+ words easily. I just created a
document with over 300,000 words, and Word didn't have any problems
with it.
 
Zach

Jon Skeet said:
While I wouldn't be surprised if Word had some limits somewhere, Word
can certainly cope with 100,000+ words easily. I just created a
document with over 300,000 words, and Word didn't have any problems
with it.

I had to process > 100,000 words - sort them and so on to create a
spellcheck vocabulary, and Word wouldn't do it. Message saying it couldn't
handle the volume. WordPerfect had no problems sorting >100,000 words etc.
(So I wrote some software for the job.)
 
Zach

NB: the OP's words are in a file to start with and have to be read
and rewritten at least once!

Sequentially reading through the parent file, slipping in the words from
a sorted array at their respective places in the parent file, whilst
checking for duplicates, is fast, simple to write and has no capacity
constraints. IMO doing binary searches in this situation is silly, even
more so if the new words are in random order.
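
Roughly what that looks like in C#, as a sketch only. SortedMerge and Merge are names picked for the example, the new words are assumed to be sorted already (Array.Sort with StringComparer.Ordinal would do it), and ordinal order is assumed for the parent file as well.

```csharp
using System;
using System.IO;

public static class SortedMerge
{
    // Merges newWords (already sorted, ordinal order assumed) into the sorted
    // file at oldPath and writes the result to newPath. A word that shows up
    // more than once, in either source, is written only once.
    public static void Merge(string oldPath, string newPath, string[] newWords)
    {
        using (StreamReader reader = new StreamReader(oldPath))
        using (StreamWriter writer = new StreamWriter(newPath))
        {
            int i = 0;
            string last = null;                   // last word written
            string fromFile = reader.ReadLine();
            while (fromFile != null || i < newWords.Length)
            {
                string next;
                if (i >= newWords.Length ||
                    (fromFile != null && string.CompareOrdinal(fromFile, newWords[i]) <= 0))
                {
                    next = fromFile;              // parent file supplies the next word
                    fromFile = reader.ReadLine();
                }
                else
                {
                    next = newWords[i++];         // sorted array supplies the next word
                }
                if (next != last)                 // skip doubles
                {
                    writer.WriteLine(next);
                    last = next;
                }
            }
        }
    }
}
```

The parent file is read once and written once no matter how many words are added, and only the array of new words has to fit in memory.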
 
Jon Skeet [C# MVP]

Zach said:
I had to process > 100,000 words - sort them and so on to create a
spellcheck vocabulary, and Word wouldn't do it. Message saying it couldn't
handle the volume. WordPerfect had no problems sorting >100,000 words etc.
(So I wrote some software for the job.)

I would argue that a word processor isn't the right tool for sorting a
vocabulary file anyway. While Word may not be able to sort a document
with over 100,000 words, it's fine when it comes to normal word
processing tasks with the same size of document.
 
Zach

Jon Skeet said:
I would argue that a word processor isn't the right tool for sorting a
vocabulary file anyway. While Word may not be able to sort a document
with over 100,000 words, it's fine when it comes to normal word
processing tasks with the same size of document.

Yes, and I wanted to throw out the words that WP didn't know,
because they wouldn't be everyday vocabulary.
 
Sushant Bhatia

I have a few suggestions:

1) Can you just split the names in the files into 26 separate files
such that 1st file has all A's, 2nd file has all B's and so on. I
think that will reduce the amount of text you need to process.

2) You can also try using a B+ tree. Very good for frequent finds and
few updates.


3) Alternatively, why don't you use a hash table to store the hash of
each of the words. That way, when you want to find where a word goes,
compute its hash and you should be able to see which hash should come
before the word you want to add. That way, when you're searching for the
word, it will be much, much faster because you can search for a
specific word rather than comparing each and every word (i.e. you can
ignore chunks of data using hashes).
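
For suggestion 1, a small sketch of how the split could look in C#. The file naming scheme (words_a.txt to words_z.txt, plus words_other.txt) and the names LetterBuckets, BucketFor and Split are purely hypothetical, chosen for the example.

```csharp
using System;
using System.IO;

public static class LetterBuckets
{
    // Hypothetical naming scheme: words_a.txt ... words_z.txt, and
    // words_other.txt for anything that does not start with a-z.
    public static string BucketFor(string directory, string word)
    {
        char first = word.Length > 0 ? char.ToLowerInvariant(word[0]) : '_';
        string suffix = (first >= 'a' && first <= 'z') ? first.ToString() : "other";
        return Path.Combine(directory, "words_" + suffix + ".txt");
    }

    // Splits one big sorted file into the per-letter files. Lines keep their
    // relative order, so each bucket stays sorted the way the input was.
    public static void Split(string bigFile, string directory)
    {
        string currentPath = null;
        StreamWriter writer = null;
        try
        {
            using (StreamReader reader = new StreamReader(bigFile))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    string path = BucketFor(directory, line);
                    if (path != currentPath)
                    {
                        // Switch output file; append in case we come back to it.
                        if (writer != null) writer.Dispose();
                        writer = new StreamWriter(path, true);
                        currentPath = path;
                    }
                    writer.WriteLine(line);
                }
            }
        }
        finally
        {
            if (writer != null) writer.Dispose();
        }
    }
}
```

To add "dad" you would then only have to rewrite the file that BucketFor(directory, "dad") points at, which is a fraction of the original size.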


Let me know what you decide to do.
 
