Budget App Scraper

I bought a daily budget application last November, and I’ve been entering my expenses into it every day. But when I wanted to process this data with some scripts, I noticed that the application didn’t have any export functionality. In hindsight, I should have been more careful when choosing an app. While this was a mistake on my part, it also gave me an opportunity to play around with Python and write about it.

While the app doesn’t have exports, it does have a History page with infinite scroll. Because my phone isn’t rooted, I am not able to copy the app’s files directly. That leaves the History page as my only source of data. My plan is to take screenshots of the whole page, stitch them together, and run OCR on the result to read the text.

Taking the screenshots

The first part is the easiest. It can be accomplished with adb and some shell scripting. Here’s a script that takes a screenshot of the phone, scrolls it down a little and repeats this process until you stop it.

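#!/bin/sh
# Take a screenshot, scroll down a little, repeat. Stop it with Ctrl-C.

TARGET='/home/leo/screen-images'

num=0
while true; do
    # Zero-padded names (00000.png, 00001.png, ...) keep the files in order.
    formatted=$(printf '%05d' "$num")
    adb shell screencap -p > "$TARGET/$formatted.png"
    # Swipe up from (500, 600) to (500, 400) to scroll the page down.
    adb shell input swipe 500 600 500 400
    # POSIX arithmetic, so this works in any sh.
    num=$((num + 1))
done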

This produced exactly 200 images in my target folder before it reached the end of the History page. The number will vary with the app, the screen size and how many entries you have saved.
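
The loop above has to be stopped by hand. One possible refinement, sketched here in Python and not something the original script does, is to stop automatically: once the page can no longer scroll, two consecutive screenshots should come out identical. The names `adb_capture`, `adb_swipe` and `capture_until_static` are made up for this illustration. The sketch assumes `adb` is on the PATH and that nothing else on screen changes between frames (a status bar clock ticking over would defeat the comparison).

```python
import hashlib
import subprocess
from pathlib import Path


def adb_capture():
    """Return one screenshot as PNG bytes (assumes adb is on the PATH)."""
    result = subprocess.run(
        ['adb', 'exec-out', 'screencap', '-p'],
        check=True, capture_output=True,
    )
    return result.stdout


def adb_swipe():
    """Scroll the page with the same swipe as the shell script."""
    subprocess.run(
        ['adb', 'shell', 'input', 'swipe', '500', '600', '500', '400'],
        check=True,
    )


def capture_until_static(capture, swipe, target, limit=10000):
    """Save screenshots into `target` until two in a row are identical.

    `capture` and `swipe` are plain callables, so they can be replaced
    with fakes when no phone is attached. Returns the number of files written.
    """
    target = Path(target)
    target.mkdir(parents=True, exist_ok=True)
    previous = None
    for num in range(limit):
        data = capture()
        digest = hashlib.sha256(data).digest()
        if digest == previous:
            # The page did not move, so we have reached the end.
            return num
        (target / f'{num:05d}.png').write_bytes(data)
        previous = digest
        swipe()
    return limit
```

Calling `capture_until_static(adb_capture, adb_swipe, '/home/leo/screen-images')` would then stand in for the shell loop. The `limit` argument is only a safety net in case the screen never settles.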

Stitching the images

The next task is to stitch these images together using the parts they have in common. Normally, image stitching is a complicated task that requires fancy algorithms. But in our case it is straightforward: these are screenshots, the pixels only move along one axis, and everything is aligned perfectly.

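#!/usr/bin/env python3
from PIL import Image
from PIL import ImageChops
import glob

pattern = '/home/leo/screen-images/*.png'
images = sorted(glob.glob(pattern))

def load(path):
    # Convert to RGB so the comparison looks at the color channels only;
    # screenshots may carry an alpha channel.
    return Image.open(path).convert('RGB')

# Screenshots are 720x1440. The canvas height is an estimate that assumes
# each screenshot after the first adds about 260 rows of new content.
final_height = 1440 + (len(images) - 1) * 260
main_image = Image.new('RGB', (720, final_height))

main_image.paste(load(images[0]))

def size(im1, im2):
    """Area of the bounding box around the differing pixels (0 if identical)."""
    box = ImageChops.difference(im1, im2).getbbox()
    if box is None:
        return 0

    return (box[2] - box[0]) * (box[3] - box[1])

def find_overlap_y(img1, img2):
    """Return the y coordinate in img2 where content not seen in img1 begins."""
    # Bottom 440 rows of the previous screenshot.
    crop1 = img1.crop((0, 1000, 720, 1440))
    # Slide a 440-row window over img2 and keep the best-matching position.
    min_y = min(range(500, 1000), key=lambda y: size(crop1, img2.crop((0, y, 720, y + 440))))
    return min_y + 440

prev_img = load(images[0])
main_y = 1440
for path in images[1:]:
    print(path)
    screen = load(path)
    y = find_overlap_y(prev_img, screen)
    # Paste only the rows below the overlap.
    cropped = screen.crop((0, y, 720, 1440))
    main_image.paste(cropped, (0, main_y))
    main_y += 1440 - y
    prev_img = screen

main_image.save('/home/leo/test.png', 'PNG')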
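
To see why the overlap search works, here is the same idea on a toy example with no image library. Each "screenshot" is just a list of row strings, and the score is the number of mismatching rows rather than the bounding-box area from `ImageChops.difference`. The names `find_new_rows_start` and `stitch` and the sample data are made up for this illustration, and unlike the script above the toy version tries every offset instead of a fixed range. Treat it as a sketch of the technique, not a replacement for the real script.

```python
def find_new_rows_start(prev_rows, next_rows, strip_height):
    """Return the index in next_rows where unseen content begins.

    Takes the last `strip_height` rows of prev_rows, slides that strip
    over next_rows, and picks the offset with the fewest mismatching rows.
    """
    strip = prev_rows[-strip_height:]

    def mismatches(offset):
        window = next_rows[offset:offset + strip_height]
        return sum(a != b for a, b in zip(strip, window))

    last_offset = len(next_rows) - strip_height
    best = min(range(last_offset + 1), key=mismatches)
    return best + strip_height


def stitch(screens, strip_height):
    """Join the row lists, dropping rows each screen shares with the previous one."""
    result = list(screens[0])
    for prev_rows, next_rows in zip(screens, screens[1:]):
        start = find_new_rows_start(prev_rows, next_rows, strip_height)
        result.extend(next_rows[start:])
    return result


# A 10-row "page" seen through a 6-row window, scrolled 2 rows at a time.
page = [f'row {i}' for i in range(10)]
screens = [page[top:top + 6] for top in (0, 2, 4)]
stitched = stitch(screens, strip_height=3)
print(stitched == page)  # prints True
```

The toy rows are all distinct, so the best offset is unambiguous. Real screenshots can contain repeated or blank regions, which is one reason to compare a fairly tall strip, as the script does with 440 rows.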

Page built on Sun Apr 7 22:36:42 BST 2019