Numerical Domains of China

Discovering the Secret Internet of Chinese People

I recently noticed that numbers are used a lot in China for email addresses and user names. I also found out that a number of popular websites, such as Alibaba and Baidu, have official domain names that consist entirely of numbers. It seemed that people preferred numbers to Latin letters, and even big websites wanted to accommodate this.

My girlfriend later confirmed that there are indeed lots of websites using just numbers as their domains. She told me this is sometimes used as a way to hide websites, mostly gambling and porn, and in some cases even to sell access to them by taking payment and giving out the secret domain name in exchange.

Wait a second! This sounds like security through obscurity, hiding things in plain sight. It’s a very creative way to restrict and sell access to websites, and it clearly works well enough for their purpose. But that’s not enough to stop us from finding them with a simple script. Numbers are very easy to generate and the fact that we’re looking for domains with all numbers increases our chances of coming across one.

Scanning random domains

My strategy in trying to find these websites is checking random domains until we find one. And the first step in anything involving randomness is to import random. Now we can start our script by writing a generator for random domains.

def domains():
    while True:
        yield "{}.com".format(random.randint(1000, 1000000))
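
Since the generator never stops on its own, it helps to peek at just a few values while experimenting. Here is a minimal sketch using itertools.islice from the standard library. The variable name sample is only for illustration, and the output will differ on every run because the domains are random.

```python
import itertools
import random

def domains():
    # Same generator as above, repeated so this snippet runs on its own.
    while True:
        yield "{}.com".format(random.randint(1000, 1000000))

# islice stops the otherwise endless stream after five items.
sample = list(itertools.islice(domains(), 5))
print(sample)
```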

This will give us an endless stream of random domains. Next, we want to check whether each of these domains actually has a DNS record, which is basically checking whether the domain exists. To do that, we can use the socket library, mainly the socket.gethostbyname function.

def ips(domain):
    try:
        yield socket.gethostbyname(domain)
    except socket.error:
        pass

All this code does is try to get the IP address for the domain and yield it if we succeed. If the way we’re writing these functions looks weird, don’t worry. They actually fit together quite nicely.
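
One way to see why this shape is handy: a generator that produces either one value or nothing can be looped over, so a failed lookup simply means the loop body never runs. The sketch below swaps the real DNS call for a made-up lookup table so it can run offline. The resolve parameter and the fake_lookup function are my own additions for illustration, not part of the scanner.

```python
import socket

def ips(domain, resolve=socket.gethostbyname):
    # resolve is an illustrative extra parameter; it defaults to the real lookup.
    try:
        yield resolve(domain)
    except socket.error:
        pass

def fake_lookup(domain):
    # Hypothetical stand-in for DNS; 192.0.2.1 is a documentation-only address.
    known = {"1234.com": "192.0.2.1"}
    if domain not in known:
        raise socket.gaierror("no such host")
    return known[domain]

found = list(ips("1234.com", fake_lookup))    # one item
missing = list(ips("5678.com", fake_lookup))  # empty, no exception escapes
print(found, missing)
```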

These two functions should be enough to do random scans to see if anything turns up. We can use them together like this.

for domain in domains():
    for ip in ips(domain):
        print(domain, ip)

This will start scanning random domains and probably print lots of domains and their IP addresses. Here’s the full code of the scanner.

import socket
import random

def domains():
    while True:
        yield "{}.com".format(random.randint(1000, 1000000))

def ips(domain):
    try:
        yield socket.gethostbyname(domain)
    except socket.error:
        pass

for domain in domains():
    for ip in ips(domain):
        print(domain, ip)

Future work and improvements

We’re getting domains, but you will notice some of them are just "Domains for sale!" pages. To help us find interesting domains faster, we can write another function that grabs the title of each website, using the third-party requests library together with re.

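import re
import requests

def titles(domain):
    try:
        # Fetch the front page, giving up after 3 seconds.
        html = requests.get("http://{}".format(domain), timeout=3).text
        title = re.search("<title>(.*?)</title>", html)
        if title:
            yield title.group(1)
    except requests.RequestException:
        # Connection errors and timeouts just mean "no title".
        # Unlike a bare except, this still lets Ctrl+C stop the scan.
        pass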
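
The title matching can be tried out without touching the network. This is a rough sketch with a hypothetical extract_title helper. It assumes we also want to match uppercase tags and titles that span several lines, which is a bit more lenient than the pattern used in titles.

```python
import re

# IGNORECASE matches <TITLE> too; DOTALL lets "." cross line breaks.
TITLE_RE = re.compile(r"<title\b[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)

def extract_title(html):
    """Return the stripped page title, or None if there is no <title> tag."""
    match = TITLE_RE.search(html)
    return match.group(1).strip() if match else None

page = "<html><head><TITLE>\n  Domains for sale!\n</TITLE></head></html>"
print(extract_title(page))
```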

We can combine this with the two other functions in order to print valid domains with their title, and print just the domains if there isn’t any title.

for domain in domains():
    for ip in ips(domain):
        for title in titles(domain):
            print(domain, ip, title)
            break
        else:
            print(domain, ip)

There are some easy ways this script can be improved in the future. Adding multithreading or an asynchronous DNS implementation might improve performance. Also, highlighting certain keywords and characters in the title should help us find interesting websites more efficiently.
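
As a starting point for the multithreading idea, here is a rough sketch using ThreadPoolExecutor from the standard library. The names resolve and scan, the lookup parameter, and the worker count of 32 are all my own assumptions, and I haven’t measured how much faster this is. One thing to keep in mind is that pool.map reads its whole input up front, so it needs a finite batch of domains rather than the endless generator.

```python
import socket
from concurrent.futures import ThreadPoolExecutor

def resolve(domain, lookup=socket.gethostbyname):
    """Return (domain, ip), or (domain, None) if the lookup fails."""
    try:
        return domain, lookup(domain)
    except OSError:
        return domain, None

def scan(candidates, lookup=socket.gethostbyname, workers=32):
    """Resolve a finite batch of domains concurrently, yielding the hits."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map keeps the input order and runs the lookups in worker threads.
        for domain, ip in pool.map(lambda d: resolve(d, lookup), candidates):
            if ip is not None:
                yield domain, ip

# Possible usage with the generator from earlier, one batch at a time:
#   for domain, ip in scan(itertools.islice(domains(), 1000)):
#       print(domain, ip)
```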

Citation

If you find this work useful, please cite it as:
@article{yaltirakli201702numericaldomainsofchina,
  title   = "Numerical Domains of China",
  author  = "Yaltirakli, Gokberk",
  journal = "gkbrk.com",
  year    = "2017",
  url     = "https://www.gkbrk.com/2017/02/numerical-domains-of-china/"
}
IEEE Citation
Gokberk Yaltirakli, "Numerical Domains of China", February, 2017. [Online]. Available: https://www.gkbrk.com/2017/02/numerical-domains-of-china/. [Accessed Dec. 17, 2024].
APA Style
Yaltirakli, G. (2017, February 08). Numerical Domains of China. https://www.gkbrk.com/2017/02/numerical-domains-of-china/
Bluebook Style
Gokberk Yaltirakli, Numerical Domains of China, GKBRK.COM (Feb. 08, 2017), https://www.gkbrk.com/2017/02/numerical-domains-of-china/

Comments

Comment by Geertje123
2017-02-08 at 18:15

Certain numbers have certain meanings in China. Some are lucky and occur more often, and some are unlucky, which you won't see much. This is the main reason for their use in domain names. More info here: https://en.wikipedia.org/wiki/Numbers_in_Chinese_culture

Comment by teknologist
2017-02-08 at 09:38

Did you try running your script on the .cn and .com.cn TLDs? I'm sure you'd find more interesting results.

© 2024 Gokberk Yaltirakli