Random String

S

shapper

Hello,

I am creating a random string, 4 characters long, as follows:

Int16 length = 4;
Random random = new Random();
String characters = "abcdefghijklmnopqrstuvwxyz1234567890";
StringBuilder alias = new StringBuilder();
while (length-- > 0)
alias.Append(characters[(int)(random.NextDouble() *
characters.Length)]);

I suppose this is a good option but any suggestion is welcome.

I am inserting the random string into an SQL table. I need to be sure
that is unique ...
.... if it isn't then I should recreate a new one.

What is the best way to do this?

Thanks,
Miguel
 
S

shapper

I am creating a random string, 4 characters long, as follows:
      Int16 length = 4;
      Random random = new Random();
      String characters = "abcdefghijklmnopqrstuvwxyz1234567890";
      StringBuilder alias = new StringBuilder();
      while (length-- > 0)
        alias.Append(characters[(int)(random.NextDouble() *
characters.Length)]);
I suppose this is a good option but any suggestion is welcome.

Some minor cleanup:

       Int16 length = 4;
       Random random = new Random();
       String characters = "abcdefghijklmnopqrstuvwxyz1234567890";
       StringBuilder alias = new StringBuilder(length);
       while (length-- > 0)
         alias.Append(characters[random.Next(characters.Length)]);

I like to initialize StringBuilders explicitly if I know the exact length 
ahead of time, even if that length is the same or smaller than the default  
initial capacity.  And of course, the Random class already has a perfectly  
fine way to get an integer in the range [0, upper-bound).

You could also make the "characters" variable a constant instead of  
initializing it each time through the code.
I am inserting the random string into an SQL table. I need to be sure
that is unique ...
... if it isn't then I should recreate a new one.
What is the best way to do this?

To do what?  Make sure the string is unique?  That depends on what sources  
the string could have, and over what period of time you need for the  
instances to be unique.  But, an obvious possibility is to simply inspect  
the table before inserting it to see if it already exists.  Or (I'm no SQL  
expert, so bear with me if this makes no sense) use the string as the (a?)  
key for the table, and simply check for an error trying to insert it (my  
understanding is that SQL has an idea of a column in a table that has to  
have a unique value for each record...I'm calling that column the "key").

If the string only needs to be unique for a given execution of your  
program, you could maintain a dictionary (Hashtable, Dictionary<TKey,  
TValue>, etc.) and check the dictionary for the presence of the key before  
using it in your table.  For that matter, if the table exists outside of a  
given execution of your program (not uncommon for databases :) ), you  
could initialize the dictionary when you start using the table, using the 
current records in the table, before you start trying to add things to it..

Note that your 4-character key has "only" about a million and a half  
possibilities.  You may find the Guid class more useful, if you happen to  
be dealing with really large numbers of data records.  For that matter, 
even a standard int (even if converted to a string) would give you about 4  
billion possibilities instead.

Finally, note that it might make more sense just to assign these keys in  
sequence, rather than picking them randomly.  Unless you have some  
specific need to obfuscate the selection pattern for the keys, the code  
would be simpler and could deal with collisions in a much more efficient  
way (i.e. just pick the next available one, rather than repeatedly trying 
a random one until you find a non-duplicate).

Pete

Pete,

I am trying to replicate TinyUrl system on my ASP.NET MVC Application.

The application creates Twitter entries where:
http://mywebsite.com/Article/Edit/568aad33-c406-46c3-a760-a486f9b35370

Would become:
http://mywebsite.com/ab12

Every time a twitter entry is created a new alias is added to a table:
AliasId = ab12;
SourceUrl = Article/Edit/568aad33-c406-46c3-a760-a486f9b35370

With a 4 characters length I get 36^4 = 1 679 616
I don't believe I will get more than 10 000 alias (this is even a very
high limit) so it means that I get at worse a 0.6% probability of
create a repeated alias.
And the probability of making it twice is almost null (0.0035%) ... So
I think I would be safe to use the random method.

I create a new Alias and check if it exists. If it does I create it
again.

Your idea of creating it in sequence it is interesting but I would
always need to check what was the last alias created so I would need
to add a new column to the SQL table (CreateAt), order it by
CreatedAt, get the last one and determine what would be the next one.

I think in terms of performance it will be the same.

Note: if I use an alias with 36^5 = 60 566 176 then all this numbers
get even better so ...

Any suggestion is welcome.

Thanks,
Miguel
 
P

Pavel Minaev

To do what?  Make sure the string is unique?  That depends on what sources  
the string could have, and over what period of time you need for the  
instances to be unique.  But, an obvious possibility is to simply inspect  
the table before inserting it to see if it already exists.

Which, of course, presents some interesting race conditions, given the
concurrent nature of your typical DBMS :)

 Or (I'm no SQL  
expert, so bear with me if this makes no sense) use the string as the (a?)  
key for the table, and simply check for an error trying to insert it (my  
understanding is that SQL has an idea of a column in a table that has to  
have a unique value for each record...I'm calling that column the "key").

In SQL it's not called a key, but you can create a UNIQUE constraint
on any column (or combination thereof) with the same effect.
 
P

Pavel Minaev

Your idea of creating it in sequence it is interesting but I would
always need to check what was the last alias created so I would need
to add a new column to the SQL table (CreateAt), order it by
CreatedAt, get the last one and determine what would be the next one.

Er, no, you wouldn't. Pretty much any SQL DBMS out there has some ways
of auto-generating a number for a column. For MSSQL, you'd use
IDENTITY (http://msdn.microsoft.com/en-us/library/aa933196(SQL.
80).aspx). For Oracle, CREATE SEQUENCE and associated functions.

Then, of course, you can always use GUIDs.
 
S

shapper

Er, no, you wouldn't. Pretty much any SQL DBMS out there has some ways
of auto-generating a number for a column. For MSSQL, you'd use
IDENTITY (http://msdn.microsoft.com/en-us/library/aa933196(SQL.
80).aspx). For Oracle, CREATE SEQUENCE and associated functions.

Then, of course, you can always use GUIDs.

Pavel, yes I can use Guid's and yes I can use Int.
(I usually use Guid's even if the performance is a little bit
worse ... but there are some advantages)

But that is not the point here because I need a 4/5 character long
id ...
 
H

Hans Kesting

shapper presented the following explanation :
Hello,

I am creating a random string, 4 characters long, as follows:

Int16 length = 4;
Random random = new Random();
[snip]

One thing to note is that if you do this very often, you might end up
with the same "random" sequence.
When you create an instance of the Random class, it initializes itself
based on the current clock. If you create a second instance within the
same clock-tick, it will start at and generate the same values!

The solution is simple: instead of a new instance every time, use a
single static instance
private static readonly Random random = new Random();

Hans Kesting
 
P

proxyuser

shapper said:
I am inserting the random string into an SQL table. I need to be sure
that is unique ...
... if it isn't then I should recreate a new one.

What is the best way to do this?

You have around 1,700,000 possibilities. Do you plan on using a lot of
them? Is this something that's going to go on for a long time over months
or years of a large application? If so, you could generate a table of all
the strings, and pick one at random to remove. Then you don't have to worry
about checking for duplicates. For example, install the
(file/table/whatever) initially with 1,700,000 entries. Generate a random
number between 1 and 1,700,000 and remove that entry. Next time check the
size of the (f/t/w) - it will be 1,699,999. Generate a random number
between 1 and 1,699,999 and remove that entry. etc.

This is probably not what you want though.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top