OPs often don't provide comprehensive examples, as you know. If the
urls always have protocol specifiers, and there's always 2 slashes
just after the protocol specifier and colon, then the domain name will
appear between :// and the subsequent /, but such urls *can* also
contain port number specifiers. For example,
http://www.foo.com:80/bar/
which your approach chokes on but mine parses as foo.com. Then there
are mailto: and protocol specifiers that aren't followed by two
slashes, but they're perhaps a digression.
The domain name will be the last 2 or 3 period-separated tokens
between the first colon, possibly followed by 2 slashes, and the first
subsequent colon or slash. The only characters you need to check for
as delimiters are colons and slashes. The domain name will contain 1
or 2 periods separating any other characters.
Actually, my VBA regex approach parses out port specifiers OK. But I think
there is confusion, for me and others, about what constitutes a "domain name"
(I'm not particularly knowledgeable here).
But I see definitions for URL; domain name; registered domain name; hostname;
as well as various types of Top Level Domains (generic, country specific);
second level domains; and various levels of subdomains.
And the specifications are changing, including allowing the use of non-ASCII
characters both in country-level TLDs and in legitimate domain names.
In any event, the OP said he had a list of URL's; wanted to extract the domain
name; and remove the www. if present.
So I have simplified my original regex and VBA routine to do that. I start
matching at the first ":", with an optional "//"; capture the (www.) into a
group which I will ignore, and return the subsequent string that includes
letters, digits, underscores, hyphens and dots.
re.Pattern = ":(//)?(www\.)?([-\w.]+)"
This returns the domains and all the subdomains, with the exception of the
"www."
There are some differences in what we return in some of the URL's you listed.
I'm not sure what the OP would want. For some of them, he might want the
leftmost subdomain, and for others not.
URL                                                 Ron                     Harlan
http://www.firstmonday.dk/issues/issue3_3/raymond/  firstmonday.dk          www.firstmonday.dk
http://www.insurance.ca.gov/docs/index.html         insurance.ca.gov        ca.gov
http://www.tdi.state.tx.us/wc/indexwc.html          tdi.state.tx.us         state.tx.us
http://en-US.www.mozilla.com/en-US/firefox/help/    en-US.www.mozilla.com   mozilla.com
http://xxx.lanl.gov/                                xxx.lanl.gov            lanl.gov
http://www.stats.ox.ac.uk/pub/MASS4/                stats.ox.ac.uk          ox.ac.uk
http://gd.tuwien.ac.at/opsys/linux/RPM/             gd.tuwien.ac.at         tuwien.ac.at
I can "correct" the entry with mozilla.com by making a small change in my
regex:
":(//)?([-\w.]*www\.)?([-\w.]+)"
and that works on the samples you provided. But I don't know if it would work
in all cases.
In addition, as you know, the JavaScript-style regex flavor that VBScript uses
does not treat non-ASCII (Unicode) letters as \w characters, so that causes
another set of problems :-(
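One possible way around the \w limitation, offered only as a sketch: instead of
listing the characters a host name may contain, match everything up to the next
delimiter. The function name ExtrHostAny and the choice of delimiters (slash,
colon, ?, # and whitespace) are my assumptions, not something from the OP's
requirements, and it makes no attempt to handle things like user@host forms.

```vba
' Hypothetical helper, not part of the routines posted in this thread.
Function ExtrHostAny(str As String) As String
    Dim re As Object
    Set re = CreateObject("vbscript.regexp")
    re.IgnoreCase = True
    re.Global = False
    ' Start at the first colon, skip an optional "//" and an optional "www.",
    ' then take everything up to the next slash, colon, ?, # or whitespace.
    ' A negated class does not depend on \w, so non-ASCII letters are kept.
    re.Pattern = ":(//)?(www\.)?([^/:?#\s]+)"
    If re.Test(str) Then
        ' Third capture group is the host without the www. prefix
        ExtrHostAny = re.Execute(str)(0).SubMatches(2)
    End If
End Function
```

For the port example mentioned earlier, ExtrHostAny("http://www.foo.com:80/bar/")
returns foo.com, because the colon before the port number ends the match.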
Enough for now -- I've got some errands to do. Below is the VBA code I used:
Ron:
====================================
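Function ExtrURL(str As String) As String
    Dim re As Object, mc As Object
    Set re = CreateObject("vbscript.regexp")
    re.IgnoreCase = True
    re.Global = False
    ' Simplified pattern (keeps any subdomains to the left of www.):
    're.Pattern = ":(//)?(www\.)?([-\w.]+)"
    ' Variant that also discards anything to the left of www.:
    re.Pattern = ":(//)?([-\w.]*www\.)?([-\w.]+)"
    If re.test(str) = True Then
        Set mc = re.Execute(str)
        ' Third capture group holds the name without the www. prefix
        ExtrURL = mc(mc.Count - 1).submatches(2)
    End If
End Function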
'Harlan--------------------------------------------------------
Function ExtrURLH(str As String) As String
    Dim re As Object, mc As Object
    Set re = CreateObject("vbscript.regexp")
    re.IgnoreCase = True
    re.Global = True
    re.Pattern = "[^:]*:(//)?[^/:]*?([^./:]+\.[^./:]+(\.[a-z]{2})?)[:/].*"
    ' Replace the whole URL with the second capture group (the domain)
    ExtrURLH = re.Replace(str, "$2")
End Function
=======================================
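If anyone wants to rerun the comparison on their own list, something like the
following driver might do. It is only a sketch: the Sub name
CompareDomainPatterns is made up, and the two patterns are written out as I
read them from the descriptions above (colon, then optional "//"), with my
second, www-stripping variant used for the Ron column. It prints each URL
followed by both results to the Immediate window.

```vba
' Hypothetical test driver; patterns are inlined so it runs on its own.
Sub CompareDomainPatterns()
    Dim urls As Variant, u As Variant
    Dim reRon As Object, reHarlan As Object
    Dim ronResult As String

    urls = Array( _
        "http://www.firstmonday.dk/issues/issue3_3/raymond/", _
        "http://www.insurance.ca.gov/docs/index.html", _
        "http://www.tdi.state.tx.us/wc/indexwc.html", _
        "http://en-US.www.mozilla.com/en-US/firefox/help/", _
        "http://xxx.lanl.gov/", _
        "http://www.stats.ox.ac.uk/pub/MASS4/", _
        "http://gd.tuwien.ac.at/opsys/linux/RPM/")

    ' Ron's variant: drop everything up to and including "www."
    Set reRon = CreateObject("vbscript.regexp")
    reRon.IgnoreCase = True
    reRon.Global = False
    reRon.Pattern = ":(//)?([-\w.]*www\.)?([-\w.]+)"

    ' Harlan's variant: keep the last 2 or 3 dot-separated tokens
    Set reHarlan = CreateObject("vbscript.regexp")
    reHarlan.IgnoreCase = True
    reHarlan.Global = True
    reHarlan.Pattern = "[^:]*:(//)?[^/:]*?([^./:]+\.[^./:]+(\.[a-z]{2})?)[:/].*"

    For Each u In urls
        ronResult = ""
        If reRon.Test(u) Then ronResult = reRon.Execute(u)(0).SubMatches(2)
        ' URL, Ron's result, Harlan's result
        Debug.Print u, ronResult, reHarlan.Replace(u, "$2")
    Next u
End Sub
```

Adding more URLs to the Array (ports, mailto:, and so on) is probably the
easiest way to see where the two approaches disagree.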
Best,
--ron