
AO3 Bookmark Counter Script

5mon 2d ago by reddthat.com/u/PumpkinDrama in PumpkinDrama@reddthat.com from chatgpt.com

I wrote a Python script that takes a list of AO3 work IDs and builds a local database tracking how often other works appear in the bookmarks of users who bookmarked those works.

Key properties:

  • Input: plaintext file with AO3 work IDs (one per line)
  • Storage: local SQLite database
  • Data source: AO3 HTML pages (no API)
  • Login: dummy AO3 account via requests.Session
  • Incremental: safe to re-run with additional work IDs; already-seen user–work pairs are not double-counted

It is intended for small to medium batches (around 40 work IDs). Requests are rate-limited, and results persist between runs.


What the script does

  1. Logs into AO3 using a session (credentials via environment variables).

  2. For each input work ID:

    • Scrapes /works/<id>/bookmarks to collect all users who bookmarked it.

  3. For each of those users:

    • Scrapes their bookmarks pages to collect all bookmarked work IDs.

  4. Stores data in SQLite:

    • A relationships table of (user, work) pairs to prevent duplicate counting.
    • A work_counts table that increments only for new pairs.

  5. On subsequent runs:

    • Previously processed (user, work) pairs are skipped.
    • Counts increase only for genuinely new relationships.
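
The incremental counting in steps 4 and 5 can be sketched on its own, without any scraping. This is a minimal illustration against an in-memory database, not a copy of the script's code: record_pair is a hypothetical helper, the user names and work IDs are made up, and the upsert syntax assumes SQLite 3.24 or newer. It detects new pairs with INSERT OR IGNORE and the cursor's rowcount.

```python
import sqlite3

def record_pair(db, user, work):
    """Store a (user, work) pair; bump the work's count only if the pair is new.

    Hypothetical helper for illustration. Returns True when the pair was new.
    """
    cur = db.execute(
        "INSERT OR IGNORE INTO relationships(user, work) VALUES(?, ?)",
        (user, work),
    )
    if cur.rowcount == 0:
        return False  # pair already stored, so the count is left alone
    db.execute(
        "INSERT INTO work_counts(work, count) VALUES(?, 1) "
        "ON CONFLICT(work) DO UPDATE SET count = count + 1",
        (work,),
    )
    return True

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE relationships(user TEXT, work TEXT, PRIMARY KEY(user, work))")
db.execute("CREATE TABLE work_counts(work TEXT PRIMARY KEY, count INTEGER)")

# First run: two users share work 111.
for pair in [("alice", "111"), ("bob", "111"), ("bob", "222")]:
    record_pair(db, *pair)

# Second run: one repeated pair (ignored) and one new pair (counted).
record_pair(db, "alice", "111")
record_pair(db, "carol", "111")

counts = dict(db.execute("SELECT work, count FROM work_counts"))
print(counts)
```

Work 111 ends with a count of 3 and work 222 with a count of 1, because the repeated (alice, 111) pair is not counted twice.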

Database schema

relationships(
    user TEXT,
    work TEXT,
    PRIMARY KEY(user, work)
)

work_counts(
    work TEXT PRIMARY KEY,
    count INTEGER
)
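
Once the database is populated, the most frequently co-bookmarked works can be read straight out of work_counts. The following is a sketch under the schema above: top_works is a hypothetical helper, and the sample rows are made up for the demonstration. The exclude parameter is there because the input works themselves tend to rank high, since every collected user bookmarked at least one of them.

```python
import sqlite3

def top_works(db, limit=10, exclude=()):
    """Return (work, count) rows, highest count first, skipping excluded IDs."""
    rows = db.execute(
        "SELECT work, count FROM work_counts ORDER BY count DESC, work ASC"
    ).fetchall()
    skipped = set(exclude)
    return [(work, count) for (work, count) in rows if work not in skipped][:limit]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE work_counts(work TEXT PRIMARY KEY, count INTEGER)")
db.executemany(
    "INSERT INTO work_counts VALUES(?, ?)",
    [("111", 5), ("222", 3), ("333", 3), ("444", 1)],  # made-up sample data
)

ranking = top_works(db, limit=2, exclude=["111"])
print(ranking)
```

Against the real database, the same function would be called with sqlite3.connect("ao3_bookmarks.db") and the IDs from work_ids.txt as exclude.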

Requirements

  • Python 3.9+
  • requests
  • beautifulsoup4

Install dependencies:

pip install requests beautifulsoup4

Set environment variables:

export AO3_USERNAME="your_dummy_username"
export AO3_PASSWORD="your_dummy_password"

Usage

python ao3_bookmark_counter.py work_ids.txt

work_ids.txt should contain one AO3 work ID per line.
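
The script strips whitespace and skips blank lines but does not otherwise validate the file. As a sketch of a stricter reader, the hypothetical parse_work_ids helper below accepts only purely numeric lines, removes duplicates, and preserves input order. Skipping lines that start with # is an added convenience, not something the script supports.

```python
def parse_work_ids(lines):
    """Return unique numeric work IDs in first-seen order."""
    seen = set()
    ids = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if not (line.isascii() and line.isdigit()):
            raise ValueError(f"not a numeric work ID: {line!r}")
        if line not in seen:
            seen.add(line)
            ids.append(line)
    return ids

sample = ["12345\n", "\n", "# favourites\n", "67890\n", "12345\n"]
parsed = parse_work_ids(sample)
print(parsed)
```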


Full script

#!/usr/bin/env python3
"""
AO3 Bookmark Counter

Reads AO3 work IDs from a text file, logs into AO3, collects bookmark data,
and stores incremental counts in a SQLite database.

Environment variables required:
- AO3_USERNAME
- AO3_PASSWORD
"""

import os
import sys
import time
import re
import logging
import sqlite3
import requests
from bs4 import BeautifulSoup

AO3_BASE = "https://archiveofourown.org"  # no trailing slash: paths below start with "/"

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s: %(message)s"
)

def get_authenticity_token(html):
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("input", {"name": "authenticity_token"})
    return tag["value"] if tag else None

def login_to_ao3(username, password):
    session = requests.Session()

    login_page = session.get(f"{AO3_BASE}/users/login")
    login_page.raise_for_status()

    token = get_authenticity_token(login_page.text)
    if not token:
        raise RuntimeError("Could not find authenticity_token")

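    # AO3's login form posts to /users/login with user[...] field names.
    response = session.post(
        f"{AO3_BASE}/users/login",
        data={
            "user[login]": username,
            "user[password]": password,
            "authenticity_token": token,
        }
    )
    response.raise_for_status()

    # A rejected login can still return 200, so raise_for_status() alone is
    # not proof of success. Heuristic: logged-in pages link to /users/logout.
    if "/users/logout" not in response.text:
        logging.warning("Login may have failed; continuing as a guest")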
    return session

def fetch_bookmark_users(session, work_id):
    users = set()
    page = 1

    while True:
        url = f"{AO3_BASE}/works/{work_id}/bookmarks?page={page}"
        logging.info("Fetching bookmarkers for work %s (page %d)", work_id, page)

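        r = session.get(url)
        if r.status_code != 200:
            logging.warning("Stopping at %s: HTTP %d", url, r.status_code)
            break

        soup = BeautifulSoup(r.text, "html.parser")
        # Read the account name from the href (/users/<name>/pseuds/<pseud>).
        # The link text is the pseud, which can differ from the account name,
        # and other /users/ links on the page are site navigation.
        for a in soup.find_all("a", href=re.compile(r"^/users/[^/]+/pseuds/")):
            users.add(a["href"].split("/")[2])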

        if soup.find("a", string=re.compile("Next")):
            page += 1
            time.sleep(1)
        else:
            break

    return users

def fetch_user_bookmarks(session, username):
    works = set()
    page = 1
    # /users/<name>/bookmarks does not assume a pseud named after the account.
    base = f"{AO3_BASE}/users/{username}/bookmarks"

    while True:
        url = f"{base}?page={page}"
        logging.info("Fetching bookmarks for user %s (page %d)", username, page)

        r = session.get(url)
        if r.status_code != 200:
            logging.warning("Stopping at %s: HTTP %d", url, r.status_code)
            break

        soup = BeautifulSoup(r.text, "html.parser")
        for a in soup.find_all("a", href=re.compile(r"^/works/")):
            m = re.match(r"/works/(\d+)", a["href"])
            if m:
                works.add(m.group(1))

        if soup.find("a", string=re.compile("Next")):
            page += 1
            time.sleep(1)
        else:
            break

    return works

def main(work_file):
    user = os.environ.get("AO3_USERNAME")
    password = os.environ.get("AO3_PASSWORD")

    if not user or not password:
        sys.exit("AO3_USERNAME and AO3_PASSWORD must be set")

    with open(work_file) as f:
        work_ids = {line.strip() for line in f if line.strip()}

    db = sqlite3.connect("ao3_bookmarks.db")
    cur = db.cursor()

    cur.execute("""
        CREATE TABLE IF NOT EXISTS relationships(
            user TEXT,
            work TEXT,
            PRIMARY KEY(user, work)
        )
    """)

    cur.execute("""
        CREATE TABLE IF NOT EXISTS work_counts(
            work TEXT PRIMARY KEY,
            count INTEGER
        )
    """)

    db.commit()

    session = login_to_ao3(user, password)

    for work_id in work_ids:
        users = fetch_bookmark_users(session, work_id)

        for u in users:
            bookmarked_works = fetch_user_bookmarks(session, u)

            for w in bookmarked_works:
                cur.execute(
                    "SELECT 1 FROM relationships WHERE user=? AND work=?",
                    (u, w)
                )
                if cur.fetchone():
                    continue

                cur.execute(
                    "INSERT INTO relationships(user, work) VALUES(?, ?)",
                    (u, w)
                )

                cur.execute(
                    "INSERT INTO work_counts(work, count) VALUES(?, 1) "
                    "ON CONFLICT(work) DO UPDATE SET count = count + 1",
                    (w,)
                )

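            # Commit once per user rather than once per row, then pause
            # before requesting the next user's pages.
            db.commit()
            time.sleep(1)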

        time.sleep(2)

    db.close()
    logging.info("Done.")

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python ao3_bookmark_counter.py work_ids.txt")
        sys.exit(1)

    main(sys.argv[1])

Notes / Caveats

  • AO3 has no public API; this relies on HTML scraping.

  • Please be polite:

    • Keep request rates low.
    • Use a throwaway account.
    • Do not hammer the site.
  • Bookmark pages can be large; this is intentionally conservative and slow.
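
One way to keep request rates low is to route every request through a single helper that pauses before each call and backs off when the server answers 429 (Too Many Requests). This is a sketch, not part of the script: polite_get and its parameters are hypothetical, and the delay values are arbitrary starting points rather than limits published by AO3. The fetch function is passed in as an argument, so session.get can be used in practice and a stand-in can be used for the demonstration.

```python
import time

def polite_get(get, url, delay=2.0, max_retries=3, sleep=time.sleep):
    """Call get(url), pausing before each attempt and backing off on HTTP 429.

    `get` is any callable returning an object with a `status_code` attribute
    (for example `session.get`). Returns the last response received.
    """
    wait = delay
    for _ in range(max_retries + 1):
        sleep(wait)
        response = get(url)
        if response.status_code != 429:
            return response
        wait *= 2  # double the pause after each rate-limited attempt
    return response

# Demonstration with a stand-in for session.get: two 429s, then success.
class FakeResponse:
    def __init__(self, status_code):
        self.status_code = status_code

replies = iter([429, 429, 200])
pauses = []  # records the requested pauses instead of actually sleeping
result = polite_get(
    lambda url: FakeResponse(next(replies)),
    "https://example.invalid/page",
    delay=1.0,
    sleep=pauses.append,
)
print(result.status_code, pauses)
```

In the script, calls such as session.get(url) could then become polite_get(session.get, url), which would also make the separate time.sleep calls between pages unnecessary.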