W
William Stacey [MVP]
Doing a project that makes heavy use of domain names such as
"www.yahoo.com."
Domain names preserve case but are concidered equal if names are same but
case is different.
I know I can store these names as keys in a hashtable with case insens
comparer and CaseInsensitiveHashCodeProvider. That works fine. The benifit
is I can store the domain name and not worry about case and return the user
supplied case without storing an state, etc. However, this comes at a cost
because all string compare operations now must be case sensitive such as
endswith, etc. If case was all "lower" for example, string compares is very
fast and if interned, then really fast. I could store domain name as all
lower case and then store a bitArray that tells me what chars where upper
case. However that seems like a pain and still requires at least 32 bytes
for a 255 char domain name, or 3 bytes for a 20 char name. I could also
store both the original case as a string and the lower case version that is
used for all compare, endswith, hash, etc operations. However this doubles
the storage needed. This can be leveraged with string interning for
duplicates. That is my most attractive option in terms of performance I
think, but was wonder what others think? Cheers!
"www.yahoo.com."
Domain names preserve case but are concidered equal if names are same but
case is different.
I know I can store these names as keys in a hashtable with case insens
comparer and CaseInsensitiveHashCodeProvider. That works fine. The benifit
is I can store the domain name and not worry about case and return the user
supplied case without storing an state, etc. However, this comes at a cost
because all string compare operations now must be case sensitive such as
endswith, etc. If case was all "lower" for example, string compares is very
fast and if interned, then really fast. I could store domain name as all
lower case and then store a bitArray that tells me what chars where upper
case. However that seems like a pain and still requires at least 32 bytes
for a 255 char domain name, or 3 bytes for a 20 char name. I could also
store both the original case as a string and the lower case version that is
used for all compare, endswith, hash, etc operations. However this doubles
the storage needed. This can be leveraged with string interning for
duplicates. That is my most attractive option in terms of performance I
think, but was wonder what others think? Cheers!