If you are a web developer, chances are you have used whois before. WHOIS allows you to retrieve basic information about a domain such as when it was registered, when it will expire and the contact information of the owner. There are lots of websites and command line tools that allow you to query this information, but they all use the same protocol in the background.
WHOIS is a simple, plaintext protocol, and WHOIS servers listen on TCP port 43. The protocol is defined in RFC 3912, but the RFC says little about how a lookup works in practice, such as how to find the right server for a domain. I got all the information about this protocol by running the whois command and inspecting the data using Wireshark.
One problematic aspect of the WHOIS protocol is that the responses are designed to be human-readable rather than machine-readable. Thankfully, the information we need to extract usually follows a Header name: Header data format. When you are looking for a specific header, split each line at the first : and convert the header name to lowercase.
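As a small illustration of that parsing rule, here is a minimal sketch. The helper name parse_header and the sample lines are made up for this example; only the whois line matches the format shown later in this article.

```python
def parse_header(line):
    """Split 'Header name: Header data' at the first colon.

    Returns (lowercased name, value), or None if the line has no colon.
    """
    name, sep, value = line.partition(":")
    if not sep:
        return None
    return name.strip().lower(), value.strip()


# Illustrative sample lines, not a real server response.
sample = [
    "% a comment line without a header",
    "whois:        whois.verisign-grs.com",
    "Registrar URL: http://www.example.com",
]
for line in sample:
    print(parse_header(line))
```

Splitting at the first colon only keeps values that contain a colon themselves, such as URLs, in one piece.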
The protocol
WHOIS requests need to be terminated with a carriage return + line feed (\r\n).
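To make the framing concrete, here is a minimal sketch of building the bytes for one request. The function name build_query is hypothetical, and the sketch assumes the query is plain ASCII.

```python
def build_query(name):
    """Return the bytes to send for one WHOIS request: the name plus CRLF."""
    name = name.strip()
    # A request is a single line, so reject embedded line breaks.
    if "\r" in name or "\n" in name:
        raise ValueError("a WHOIS query must be a single line")
    return name.encode("ascii") + b"\r\n"


print(build_query("example.com"))  # prints b'example.com\r\n'
```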
- Connect to whois.iana.org. Send the TLD, followed by a newline. (e.g. Send “com” + “\r\n”)
- A bunch of data about that TLD will be sent. The WHOIS server responsible for that TLD will be sent in a header called whois.
- Connect to that server and send the full domain name followed by a newline. (e.g. Send “example.com” + “\r\n”)
- The response data you get from this server is the WHOIS data, but there’s usually more data you can get from another server.
- This server’s address is sent to you in a header called whois server. Send the request and get the response in the same way as the first server.
If you are going to be doing a large number of queries, you should probably cache the responses you get from whois.iana.org in order to keep the load on their servers low.
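One way to do that caching is to remember the answer for each TLD in a dictionary. The sketch below is only an illustration: cached and fake_lookup are hypothetical names, and fake_lookup stands in for a real network lookup so the example runs offline.

```python
def cached(lookup):
    """Wrap a TLD lookup function so each TLD is only looked up once."""
    cache = {}

    def wrapper(tld):
        key = tld.lower()
        if key not in cache:
            cache[key] = lookup(key)
        return cache[key]

    return wrapper


# Stand-in for a real network lookup; it records how often it is called.
calls = []


def fake_lookup(tld):
    calls.append(tld)
    return "whois.nic." + tld  # made-up server name, for illustration only


lookup = cached(fake_lookup)
lookup("com")
lookup("COM")
print(len(calls))  # prints 1
```

This cache only lives as long as the process does. A long-running service would probably also want to expire entries, since the server for a TLD can change.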
Implementation in Python
Let’s make a function in Python to get the WHOIS server responsible for a top-level domain.
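import socket

def get_tld_server(tld="com"):
    # Ask the IANA WHOIS server which server is responsible for this TLD.
    with socket.create_connection(("whois.iana.org", 43)) as sock:
        # Requests are terminated with CRLF.
        sock.sendall("{}\r\n".format(tld).encode("utf-8"))
        for line in sock.makefile():
            # Split at the first colon only, so values containing ":" stay whole.
            parts = line.split(":", 1)
            if len(parts) > 1:
                header_name = parts[0].strip()
                header_value = parts[1].strip()
                if header_name.lower() == "whois":
                    return header_value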
The function above sends the TLD to the central WHOIS server and parses the response to find a line that looks like whois: whois.verisign-grs.com, then returns the server name from that line.
Now that we have a server, we can get the data like this.
def get_whois_data(domain):
    # Find the WHOIS server for the domain's TLD, then query it.
    tld = domain.split(".")[-1]
    server = get_tld_server(tld)
    with socket.create_connection((server, 43)) as sock:
        sock.sendall("{}\r\n".format(domain).encode("utf-8"))
        for line in sock.makefile():
            parts = line.split(":", 1)
            if len(parts) > 1:
                header_name = parts[0].strip()
                header_value = parts[1].strip()
                if header_name.lower() == "whois server":
                    # A second server has more data; just print its address for now.
                    print(header_value)
            yield line.replace("\n", "")
This function queries the actual server and yields the lines that it receives. If there is a second server to get more data from, it prints the address to the console. Using this information, we can modify the function a little to make it query the second server automatically.
def get_whois_data(domain, server=None):
    if not server:
        tld = domain.split(".")[-1]
        server = get_tld_server(tld)
    nextserver = None
    with socket.create_connection((server, 43)) as sock:
        sock.sendall("{}\r\n".format(domain).encode("utf-8"))
        for line in sock.makefile():
            parts = line.split(":", 1)
            if len(parts) > 1:
                header_name = parts[0].strip()
                header_value = parts[1].strip()
                if header_name.lower() == "whois server":
                    # Remember the referral so we can query it afterwards.
                    nextserver = header_value
            yield line.replace("\n", "")
    # Follow the referral, unless it points back at the server we just asked.
    if nextserver and nextserver.lower() != server.lower():
        yield from get_whois_data(domain, nextserver)
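A recursive lookup follows whatever server each response names, so a chain of referrals that loops (server A names B, and B names A again) would never finish. Below is a minimal sketch of an iterative variant that remembers the servers it has visited and caps the number of hops. The names follow_referrals, query and max_hops are hypothetical; query is any function that takes a server and a domain and returns the response lines, and a fake one is used here so the example runs offline.

```python
def follow_referrals(domain, first_server, query, max_hops=5):
    """Yield response lines from each server in the referral chain.

    Stops when there is no further referral, when a server repeats,
    or after max_hops servers have been queried.
    """
    seen = set()
    server = first_server
    while server and server.lower() not in seen and len(seen) < max_hops:
        seen.add(server.lower())
        next_server = None
        for line in query(server, domain):
            yield line
            name, sep, value = line.partition(":")
            if sep and name.strip().lower() == "whois server":
                next_server = value.strip()
        server = next_server


# Made-up responses for illustration; the second server refers to itself.
responses = {
    "whois.registry.example": [
        "Domain Name: EXAMPLE.COM",
        "Whois Server: whois.registrar.example",
    ],
    "whois.registrar.example": [
        "Registrar: Example Registrar",
        "Whois Server: whois.registrar.example",
    ],
}


def fake_query(server, domain):
    return responses[server]


lines = list(follow_referrals("example.com", "whois.registry.example", fake_query))
print(len(lines))  # prints 4
```

Passing the query function in as a parameter also makes the referral logic easy to test without touching the network.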
We can now use our new get_whois_data() function like this. The output should closely match what other WHOIS utilities print for the same domain.
for line in get_whois_data("gkbrk.com"):
    print(line)
Thanks for reading this article about the WHOIS protocol. I hope you enjoyed it. You can find related information in these sources.