As I mentioned in the last Status Update, I was working on a small project that used Twitch data. The data I needed was not fancy or complex. I only needed the live/online status of the channel, the stream title, and the stream start time. Since all of this appears on the channel page, I thought I could simply parse the page HTML. To my disappointment, Twitch had turned its website into a slow single-page application.
This threw a wrench in the plans, but it wasn't too bad. I looked at the network tab, hoping to find some endpoints that provided the data I wanted. Instead of something REST-like, I saw that almost all the requests were going to a single endpoint. The endpoint was https://gql.twitch.tv/gql, which hinted at this being a GraphQL API.
I was not familiar with GraphQL, as I had never used it before. From what I understood, it is a query language, somewhat like SQL, that lets you request exactly the data you need in a structured format. I looked at some online examples and came up with a query to test. Unfortunately, when I ran the query, I got an error: the GraphQL endpoint requires a token called Client-ID, sent as an HTTP header. If you send a request without this header, you get the error below.
```json
{
  "error": "Bad Request",
  "status": 400,
  "message": "The \"Client-ID\" header is missing from the request."
}
```
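For readers who, like me, are new to GraphQL, it may help to see what such a request looks like on the wire. The sketch below only builds the request and does not send it. It assumes the common GraphQL-over-HTTP convention of a JSON object with a `query` key, reuses the `user`/`followers` query from the follower-count script in this post, and uses `YOUR-CLIENT-ID` as a placeholder; `build_gql_request` is a hypothetical helper name.

```python
import json

# Endpoint seen in the browser's network tab
GQL_URL = "https://gql.twitch.tv/gql"


def build_gql_request(query, client_id):
    """Return (url, headers, body) for one GraphQL POST, without sending it."""
    headers = {
        # Omitting this header produces the 400 error shown above
        "Client-ID": client_id,
        "Content-Type": "application/json",
    }
    # The GraphQL text travels as a plain string inside a JSON object
    body = json.dumps({"query": query})
    return GQL_URL, headers, body


query = 'query { user(login: "tsoding") { followers { totalCount } } }'
url, headers, body = build_gql_request(query, "YOUR-CLIENT-ID")  # placeholder ID
print(body)
```

Printing `body` shows the GraphQL text as an ordinary JSON string, with the quotes around the login name escaped as `\"tsoding\"`.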
A website or app that talks to an API must either hard-code its tokens or fetch them from another endpoint. Twitch embeds the Client-ID in a JavaScript snippet on its pages, so you can grab it with a small regular expression. The token has stayed the same for a long time, but it is still a good idea to fetch it every time in case Twitch decides to change it. Below is a code snippet that grabs this token. While it is in Python, you can port it to other languages without difficulty; in fact, the original version I wrote was a shell script.
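```python
import re
import requests
# Download the Twitch homepage, which embeds the Client-ID in inline JavaScript
homepage = requests.get("https://www.twitch.tv").text
# Capture the quoted value that follows "Client-ID":
client_id = re.search(r'"Client-ID" ?: ?"(.*?)"', homepage).group(1)
```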
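To see what the regular expression actually matches, the sketch below runs the same pattern against a made-up HTML fragment, so it works offline. The fragment and the ID `abc123xyz` are invented for illustration; the real page's markup will differ. Unlike the one-liner above, the hypothetical `extract_client_id` helper returns `None` instead of raising an `AttributeError` when the pattern is not found, which makes a markup change on Twitch's side easier to notice.

```python
import re

# Same pattern as above: "Client-ID", optional spaces around the colon,
# then a lazily matched quoted value captured in group 1.
CLIENT_ID_RE = re.compile(r'"Client-ID" ?: ?"(.*?)"')


def extract_client_id(html):
    """Return the Client-ID embedded in the page, or None if it is absent."""
    match = CLIENT_ID_RE.search(html)
    return match.group(1) if match else None


# Hypothetical excerpt; the real page's markup and ID will differ.
sample = '<script>window.__config = {"Client-ID": "abc123xyz", "foo": 1};</script>'
print(extract_client_id(sample))
print(extract_client_id("<html></html>"))
```

The lazy `(.*?)` stops at the first closing quote, so later keys on the same line (such as `"foo"` here) are not swallowed into the captured value.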
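Here is an example script that prints the number of followers for a user. Note that a GraphQL endpoint lets you send many queries in a single POST, which saves a round trip per query and improves performance. The script already wraps its single query in a JSON array, the same form used for sending several at once.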
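```python
#!/usr/bin/env python3
import json
import re

import requests

# Fetch the Client-ID from the Twitch homepage
homepage = requests.get("https://www.twitch.tv").text
client_id = re.search(r'"Client-ID" ?: ?"(.*?)"', homepage).group(1)


def get_followers(username):
    # Ask for the follower count of the user with the given login name
    query = 'query { user(login: "%s") { followers { totalCount } } }' % username
    resp = requests.post(
        "https://gql.twitch.tv/gql",
        # The body is a JSON array of operations; here it holds a single query
        data=json.dumps([{"query": query}]),
        # Without this header the endpoint answers with the 400 error shown earlier
        headers={"Client-ID": client_id},
    )
    resp.raise_for_status()
    # The response is an array with one result per operation sent
    return resp.json()[0]["data"]["user"]["followers"]["totalCount"]


print(get_followers("tsoding"))
```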
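Since a single POST can carry several queries, the follower lookup extends naturally to a list of users. The sketch below runs offline and rests on a few assumptions: that the endpoint accepts standard GraphQL `variables` (which avoids pasting the username into the query text with `%s`), that the `login` argument accepts a `String`, and that the response array lists results in the same order as the request. `build_batch`, `parse_batch`, the usernames, and the response are all hypothetical and hand-written for illustration.

```python
import json

# Variables keep the username out of the query text, so a login
# containing quotes cannot break the query (assumes variables are supported).
FOLLOWERS_QUERY = """
query($login: String!) {
  user(login: $login) { followers { totalCount } }
}
"""


def build_batch(usernames):
    """One operation per username, all serialized into a single POST body."""
    return json.dumps(
        [{"query": FOLLOWERS_QUERY, "variables": {"login": name}} for name in usernames]
    )


def parse_batch(usernames, results):
    """Pair each username with its count; assumes results keep request order."""
    counts = {}
    for name, result in zip(usernames, results):
        user = (result.get("data") or {}).get("user")
        # An unknown login comes back as a null user, reported here as None
        counts[name] = user["followers"]["totalCount"] if user else None
    return counts


# Offline demo with a hand-written response in the assumed shape.
names = ["alice", "no_such_user"]
fake_results = [
    {"data": {"user": {"followers": {"totalCount": 42}}}},
    {"data": {"user": None}},
]
print(parse_batch(names, fake_results))
```

With network access, the string from `build_batch` would go in the `data=` argument of `requests.post`, with the same `Client-ID` header that `get_followers` uses, and `resp.json()` would be passed to `parse_batch`.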