Have you ever found yourself looking for a solution to parse or validate a domain name? Probably, you spent several hours trying to find the most efficient and comprehensive regular expression, but the more examples you found, the more you realized that the final solution doesn’t seem to exist.
And you are right. There is no algorithmic method for finding the highest level at which a domain may be registered under a particular top-level domain, because the policies differ with each registry. The only method is to maintain a list of all top-level domains and the levels at which domains can be registered under them. This is the aim of the effective TLD list.
Here comes the Public Suffix List.
What is the Public Suffix List?
The Public Suffix List is a cross-vendor initiative to provide an accurate list of domain name suffixes.
The Public Suffix List is an initiative of the Mozilla Project, but is maintained as a community resource. It is available for use in any software, but was originally created to meet the needs of browser manufacturers.
A “public suffix” is one under which Internet users can directly register names. Some examples of public suffixes are “.com”, “.co.uk” and “pvt.k12.wy.us”. The Public Suffix List is a list of all known public suffixes.
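To make the definition concrete, here is a minimal sketch of a list-based lookup in plain Ruby. The three-entry `SUFFIXES` set and the `split_domain` helper are made up for illustration and are not part of any library. The real list is far larger and also contains wildcard and exception rules, which this sketch ignores.

```ruby
require "set"

# Hypothetical, hand-picked subset of the Public Suffix List.
SUFFIXES = Set["com", "co.uk", "pvt.k12.wy.us"]

# Returns [subdomain part, registered label, public suffix],
# or nil when no listed suffix matches.
def split_domain(name)
  labels = name.downcase.split(".")
  # Try the longest candidate suffix first, always leaving
  # at least one label in front of it.
  (1...labels.size).each do |i|
    suffix = labels[i..-1].join(".")
    next unless SUFFIXES.include?(suffix)

    rest = labels[0...i]
    sld = rest.pop
    trd = rest.empty? ? nil : rest.join(".")
    return [trd, sld, suffix]
  end
  nil
end

split_domain("www.google.co.uk") # => ["www", "google", "co.uk"]
split_domain("google.com")       # => [nil, "google", "com"]
split_domain("google.xx")        # => nil
```

Note that nothing in the string "co.uk" itself tells you it is a suffix. The answer comes entirely from the list.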
Does it work with Ruby?
Yeah! Public Suffix Service is a Ruby domain name parser based on the Public Suffix List. To use it you don’t need to download the list or learn how it works. Just install the gem and you’re done.
$ gem install public_suffix_service
Here are a few examples:
# parse a very standard domain name
domain = PublicSuffixService.parse("google.com")
domain.tld # => "com"
domain.sld # => "google"
domain.trd # => nil

# parse a less standard domain name
domain = PublicSuffixService.parse("google.co.uk")
domain.tld # => "co.uk"
domain.sld # => "google"
domain.trd # => nil

# it works with subdomains too
domain = PublicSuffixService.parse("www.google.co.uk")
domain.tld # => "co.uk"
domain.sld # => "google"
domain.trd # => "www"
Domain validation
The Public Suffix Service library offers a quick way to validate a domain.
PublicSuffixService.valid?("google.com")
# => true
PublicSuffixService.valid?("www.google.com")
# => true
The main difference compared with regular-expression-based solutions is that this library actually validates the domain against a white/black list instead of running a soft check on the length of the TLD.
PublicSuffixService.valid?("google.xx")
# => false
PublicSuffixService.valid?("google.zip")
# => false
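To illustrate the difference, here is a small self-contained comparison. The `SOFT_CHECK` pattern is only my guess at what a typical regular expression of this kind looks like, and `KNOWN_SUFFIXES` and `listed?` are hypothetical stand-ins for the real list and library, so read it as a sketch of the idea and not as the library's implementation.

```ruby
require "set"

# Assumed "soft check": any ending of 2 to 4 letters is accepted as a TLD.
SOFT_CHECK = /\A(?:[a-z0-9-]+\.)+[a-z]{2,4}\z/i

# Hypothetical whitelist standing in for the real Public Suffix List.
KNOWN_SUFFIXES = Set["com", "co.uk"]

# True when some trailing part of the name is a listed suffix
# and at least one label is left in front of it.
def listed?(name)
  labels = name.downcase.split(".")
  (1...labels.size).any? do |i|
    KNOWN_SUFFIXES.include?(labels[i..-1].join("."))
  end
end

SOFT_CHECK.match?("google.xx") # => true, the pattern only looks at the shape
listed?("google.xx")           # => false, "xx" is not on the list
listed?("www.google.co.uk")    # => true
```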
Domain transformation
The PublicSuffixService::Domain class provides a bunch of methods to validate and transform a domain name.
domain = PublicSuffixService.parse("www.google.com")
domain.domain?
# => true
domain.is_a_domain?
# => false
domain.is_a_subdomain?
# => true
domain.subdomain
# => "www.google.com"
domain.domain
# => "google.com"
Who uses the Public Suffix List?
The list is used by well-known browsers such as Google Chrome, Mozilla Firefox and Opera.
The Public Suffix Service Ruby library was created for RoboDomain and it has been used in production since November 2009.
Cool, now I can finally ditch my own hackish implementation.
One thing I am not clear on is how your implementation works with punycode names? The effective TLD file is in UTF-8, but DNS is ASCII (punycode) most of the time.
Cheers,
David
Punycode is just an encoding syntax intended to provide an ASCII representation of the domain. The Public Suffix List doesn’t handle Punycode-encoded domains. Instead, it understands the original Unicode representation.
Let me show you an example, using Ruby 1.9.
The following is a list of 3 domains. The first and the second lines contain valid domains. The third has an invalid extension.
محمد-بن-راشد.امارات
السعودية.com
محمد-بن-راشد.امارا
Here’s how PublicSuffixService works
=> ["محمد-بن-راشد.امارات", "السعودية.com", "محمد-بن-راشد.امارا"]
ruby-1.9.1-p378 > PublicSuffixService.valid?(c.first)
=> true
ruby-1.9.1-p378 > PublicSuffixService.valid?(c.last)
=> false
ruby-1.9.1-p378 > d = PublicSuffixService.parse(c.first)
=> #<PublicSuffixService::Domain:0x00000101318d60 @tld="امارات", @sld="محمد-بن-راشد", @trd=nil>
ruby-1.9.1-p378 > d.tld
=> "امارات"
However, if you need to work with ASCII/Punycode strings, there might be another option.
The PublicSuffixService library parses and stores the Public Suffix List in a RuleList object. Because you can add/remove rules on the fly, what about adding your own ASCII/Punycode rules?
rule_list = PublicSuffixService::RuleList.default
# add the Saudi Arabia Punycode TLD
rule_list << PublicSuffixService::Rule.factory("xn--mgberp4a5d4ar")
rule_list = PublicSuffixService::RuleList.default
# add the UAE Punycode TLD
rule_list << PublicSuffixService::Rule.factory("xn--mgbaam7a8h")
Now you should be able to do
=> true
d = PublicSuffixService.parse("xn-----dtdcxmeo2a6nbp.xn--mgbaam7a8h")
=> #<PublicSuffixService::Domain:0x00000100930248 @tld="xn--mgbaam7a8h", @sld="xn-----dtdcxmeo2a6nbp", @trd=nil>
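If you want to see the idea in isolation, here is a toy model of a rule list that can be extended at runtime. `TinyRuleList` is a hypothetical class written for this reply. It is not the library's RuleList and only mimics the "append a rule, then validate" behaviour described above.

```ruby
require "set"

# Hypothetical stand-in for a mutable rule list.
class TinyRuleList
  def initialize(rules = [])
    @rules = Set.new(rules.map(&:downcase))
  end

  # Add a rule at runtime; returns self so calls can be chained.
  def <<(rule)
    @rules << rule.downcase
    self
  end

  # True when the name has at least one label in front of a listed suffix.
  def valid?(name)
    labels = name.downcase.split(".")
    (1...labels.size).any? { |i| @rules.include?(labels[i..-1].join(".")) }
  end
end

rules = TinyRuleList.new(["com"])
name = "xn-----dtdcxmeo2a6nbp.xn--mgbaam7a8h"

before = rules.valid?(name) # => false, the ASCII form is unknown
rules << "xn--mgbaam7a8h"   # add the UAE Punycode TLD
after = rules.valid?(name)  # => true
```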
Hope this answers your question. :)
Thanks for the explanation. Of course with Ruby 1.9 the Unicode will work fine.
As a DNS person myself I do however think in Punycode. My database stores domain names in ASCII. All the regulations are in Punycode (e.g. only 63 ASCII characters per label, not 63 Unicode characters). Displaying fancy characters is just a representation, not the canonical version of a domain name.
I think it would be awesome if your library could handle both cases. If it detects that you are using an xn-- domain name it can convert it to Unicode representation and use that representation. Storing every entry on the list twice (once as Unicode, once as ASCII) would be a lot of overhead.
This is something I was investigating a few days ago.
But before going down this road, I have to find a reliable Unicode/Punycode translation library, or create one if it doesn’t exist.
I’m planning to add IDN support to RoboDomain. This task will probably force me to take a deeper look at the current state of Punycode libraries for Ruby. ;)
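For what it’s worth, the decoding half of such a translation is compact enough to sketch by hand. The following is a minimal, unoptimized take on the Punycode decoder described in RFC 3492. The method names `punycode_decode` and `to_unicode` are made up for this sketch, overflow checks and the Unicode-to-ASCII direction are left out, and no IDNA normalization is applied, so treat it as an illustration rather than a replacement for a proper library.

```ruby
# Punycode parameters from RFC 3492, section 5.
BASE = 36
TMIN = 1
TMAX = 26
SKEW = 38
DAMP = 700
INITIAL_BIAS = 72
INITIAL_N = 128

# Bias adaptation function (RFC 3492, section 6.1).
def adapt(delta, numpoints, first_time)
  delta = first_time ? delta / DAMP : delta / 2
  delta += delta / numpoints
  k = 0
  while delta > ((BASE - TMIN) * TMAX) / 2
    delta /= BASE - TMIN
    k += BASE
  end
  k + ((BASE - TMIN + 1) * delta) / (delta + SKEW)
end

# Map a basic code point to its digit value: a-z => 0..25, 0-9 => 26..35.
def digit_value(char)
  case char
  when /\A[a-z]\z/i then char.downcase.ord - 97
  when /\A[0-9]\z/ then char.ord - 48 + 26
  else raise ArgumentError, "invalid Punycode digit: #{char.inspect}"
  end
end

# Decode one label body (the part after "xn--") into a UTF-8 string.
def punycode_decode(input)
  n = INITIAL_N
  i = 0
  bias = INITIAL_BIAS
  # Everything before the last "-" is copied through as basic code points.
  split = input.rindex("-")
  output = split ? input[0...split].codepoints : []
  digits = (split ? input[(split + 1)..-1] : input).chars

  until digits.empty?
    old_i = i
    w = 1
    k = BASE
    # Read one variable-length integer, least significant digit first.
    loop do
      char = digits.shift or raise ArgumentError, "truncated Punycode input"
      digit = digit_value(char)
      i += digit * w
      t = if k <= bias then TMIN elsif k >= bias + TMAX then TMAX else k - bias end
      break if digit < t
      w *= BASE - t
      k += BASE
    end
    bias = adapt(i - old_i, output.size + 1, old_i.zero?)
    n += i / (output.size + 1)
    i %= output.size + 1
    output.insert(i, n) # insert code point n at position i
    i += 1
  end
  output.pack("U*")
end

# Convert only the labels that carry the "xn--" prefix; leave the rest untouched.
def to_unicode(name)
  name.split(".").map { |label|
    label.downcase.start_with?("xn--") ? punycode_decode(label[4..-1]) : label
  }.join(".")
end
```

With something like this in front of the parser, a name could be converted with `to_unicode` first and then matched against the Unicode rules, which would avoid storing every entry of the list twice.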
Is your library any different from http://github.com/pauldix/domainatrix ? Maybe you could join efforts.
Hello Emmanuel,
thanks for asking this question. I already answered it a few days ago on Reddit; I’m quoting the answer here.