How to Generate a Complete, Deduplicated Index of Your Steemit Posts (Python Script)

gungasnake (51) in #steemit • 2 months ago

One thing Steemit does not provide is a reliable way for a user to generate a complete, clean list of their own blog posts.

At first glance this sounds trivial: “just list my posts.” In practice, it turns out to be surprisingly difficult.

The reason is that Steemit’s backend and UI counters mix together several different things:

Original top-level posts
Edits to posts (each edit is stored as a separate blockchain operation)
Reblog operations
Historical artifacts that still exist on the chain

As a result, the post count shown on a Steemit profile is not the same thing as “how many distinct blog posts currently exist.”

To solve this properly, you have to walk the blockchain account history directly, handle API edge cases, and deduplicate posts by their stable identifier (author + permlink), keeping only the latest version of each post.
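
The deduplication step can be sketched on its own. The snippet below is a minimal illustration rather than the full script: it assumes the history has already been fetched and reduced to (timestamp, author, permlink, title) tuples, a simplified shape chosen for this example, and it keeps the most recent entry for each (author, permlink) key.

```python
def dedupe_latest(ops):
    """Keep the newest entry for each (author, permlink) pair.

    `ops` is a list of (timestamp, author, permlink, title) tuples.
    This shape is an assumption made for the example, not the raw API format.
    """
    latest = {}
    for timestamp, author, permlink, title in ops:
        key = (author, permlink)
        # ISO-8601 timestamps compare correctly as plain strings.
        if key not in latest or timestamp > latest[key][0]:
            latest[key] = (timestamp, title)
    return latest


# Made-up sample data: one post, one edit of that post, and a second post.
sample_ops = [
    ("2024-01-01T10:00:00", "exampleuser", "my-first-post", "My frist post"),
    ("2024-01-01T10:05:00", "exampleuser", "my-first-post", "My first post"),
    ("2024-02-01T09:00:00", "exampleuser", "second-post", "Second post"),
]

posts = dedupe_latest(sample_ops)
print(len(posts))  # 2
```

Three operations collapse into two posts, and the edited title wins over the original one.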

The Python script below does exactly that.

What this script does

Walks your full Steem account history via public RPC nodes
Extracts only top-level posts (not comments or replies)
Respects the Steem API batch limit and picks a responsive RPC node at startup
Automatically removes duplicates caused by post edits
Outputs a clean HTML index, newest post first
Produces results that include both your first and most recent posts
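
The "top-level posts only" rule in the list above comes down to three checks on each history entry. Here is a minimal sketch, assuming each operation arrives as a two-element [type, data] pair, which is what the script expects. The sample operations are made up for illustration and their fields are simplified.

```python
def is_top_level_post(op, user):
    """Return True if `op` is a top-level post written by `user`."""
    op_type, data = op
    return (
        op_type == "comment"                 # posts and replies share this type
        and data.get("author") == user       # written by this account
        and not data.get("parent_author")    # empty parent means not a reply
    )


# Illustrative operations; real ones carry more fields.
sample = [
    ["comment", {"author": "exampleuser", "parent_author": "", "permlink": "my-first-post"}],
    ["comment", {"author": "exampleuser", "parent_author": "someone", "permlink": "re-hello"}],
    ["comment", {"author": "someone", "parent_author": "exampleuser", "permlink": "re-my-first-post"}],
    ["custom_json", {"id": "follow"}],  # e.g. a reblog, shape simplified
]

kept = [op[1]["permlink"] for op in sample if is_top_level_post(op, "exampleuser")]
print(kept)  # ['my-first-post']
```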

The result is a list that reflects the number of real, current blog posts, not blockchain bookkeeping artifacts.
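
Public RPC nodes can drop out partway through a long history walk. One possible way to cope with that is a small retry-and-fallback wrapper. This is a sketch only: call_with_fallback and fake_call are hypothetical helpers written for this example, and the node URLs are placeholders, not real endpoints.

```python
import time


def call_with_fallback(nodes, call, retries=2, delay=0.0):
    """Try `call(node)` on each node in turn, retrying each a few times.

    `call` is any function that takes a node URL and either returns a
    result or raises an exception.
    """
    last_error = None
    for node in nodes:
        for _attempt in range(retries):
            try:
                return node, call(node)
            except Exception as exc:  # broad on purpose: any failure moves on
                last_error = exc
                time.sleep(delay)
    raise RuntimeError(f"All nodes failed; last error: {last_error}")


# Demonstration with a fake call instead of a real network request.
def fake_call(node):
    if node != "https://node-b.example":
        raise ConnectionError(f"{node} is down")
    return {"head_block_number": 123}


node, result = call_with_fallback(
    ["https://node-a.example", "https://node-b.example"], fake_call
)
print(node)  # https://node-b.example
```

In a real run, `call` would wrap the actual RPC request and `delay` would be set to a second or so.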

IMPORTANT: what you must change

Before running the script, you must replace:

YOUR_STEEMIT_NAME

with your own Steemit username (without the @ symbol).

For example, if your profile is @exampleuser, then use:

--user exampleuser
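
If you would rather not worry about the @ symbol, the argument can be cleaned up as it is parsed. The sketch below is an optional variation, not part of the script as posted: normalize_username is a hypothetical helper that strips whitespace and a leading @.

```python
import argparse


def normalize_username(raw):
    """Strip surrounding whitespace and any leading '@' from a username."""
    return raw.strip().lstrip("@")


parser = argparse.ArgumentParser()
# argparse passes the raw string through `type` before storing it.
parser.add_argument("--user", type=normalize_username, required=True)

# Passing the arguments as a list lets this run without a real command line.
args = parser.parse_args(["--user", "@exampleuser"])
print(args.user)  # exampleuser
```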

The script

import argparse
import html
import json
import sys
import time

import requests

BASE_URL = "https://steemit.com"

# Public RPC nodes, tried in order until one responds.
NODES = [
    "https://api.steemit.com",
    "https://api.steemitdev.com",
    "https://api.steemyy.com",
    "https://steem.justyy.com",
    "https://api.justyy.com",
    "https://api2.justyy.com",
]

BATCH = 100  # Required by api.steemit.com


def rpc_call(node_url, method, params, req_id=1, timeout=60):
    """Send one JSON-RPC request and return its "result" field."""
    payload = {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}
    r = requests.post(node_url, json=payload, timeout=timeout)
    r.raise_for_status()
    data = r.json()
    if "error" in data:
        raise RuntimeError(data["error"])
    return data["result"]


def pick_working_node():
    """Return the first node that answers a cheap test call."""
    for node in NODES:
        try:
            rpc_call(node, "condenser_api.get_dynamic_global_properties", [], 1, 15)
            return node
        except Exception:
            continue
    raise RuntimeError("No working Steem RPC node found.")


def first_tag_from_json_metadata(jm):
    """Return the post's first tag, or "blog" if none can be read."""
    try:
        md = json.loads(jm) if jm and jm.strip() else {}
        tags = md.get("tags") or []
        if isinstance(tags, list) and tags:
            return str(tags[0])
    except Exception:
        pass
    return "blog"


def main():
    ap = argparse.ArgumentParser(description="Generate a deduplicated Steemit post index.")
    ap.add_argument("--user", default="YOUR_STEEMIT_NAME",
                    help="Replace YOUR_STEEMIT_NAME with your own username")
    args = ap.parse_args()
    user = args.user

    node = pick_working_node()
    print(f"Using RPC node: {node}", file=sys.stderr)

    posts = {}  # (author, permlink) -> [created, title, url]
    start = -1  # -1 asks for the newest operations
    req_id = 1000
    prev_oldest = None

    while True:
        # Keep the limit no larger than the start index for the final batches.
        limit = BATCH if start < 0 else min(BATCH, start)
        batch = rpc_call(
            node, "condenser_api.get_account_history", [user, start, limit], req_id, 60
        )
        req_id += 1
        if not batch:
            break

        # The first entry of a batch is treated as the oldest one in the window.
        oldest_idx = batch[0][0]
        # Stop if the node makes no progress toward the start of the history.
        if prev_oldest is not None and oldest_idx >= prev_oldest:
            break
        prev_oldest = oldest_idx

        # Walk the batch newest to oldest.
        for _idx, item in reversed(batch):
            op = item.get("op")
            if not op or len(op) != 2:
                continue
            op_type, data = op
            if op_type != "comment":
                continue
            if data.get("author") != user:
                continue
            if data.get("parent_author"):
                continue  # a reply, not a top-level post

            permlink = data.get("permlink")
            key = (user, permlink)
            timestamp = item.get("timestamp", "")
            if key in posts:
                # An older operation for a post already recorded: keep the
                # newest title and URL, but take this earlier timestamp so
                # that "created" ends up as the original posting time
                # rather than the time of the last edit.
                posts[key][0] = timestamp
                continue

            title = html.escape(data.get("title", ""), quote=True)
            category = first_tag_from_json_metadata(data.get("json_metadata", ""))
            url = html.escape(f"{BASE_URL}/{category}/@{user}/{permlink}", quote=True)
            posts[key] = [timestamp, title, url]

        if oldest_idx == 0:
            break
        start = oldest_idx - 1
        time.sleep(0.1)  # short pause between requests to the public node

    # ISO timestamps sort chronologically as strings; newest first.
    index = sorted(posts.values(), key=lambda p: p[0], reverse=True)

    print("<html><body>")
    for created, title, url in index:
        print(f'<a href="{url}">{title} ({created})</a><br>')
    print("</body></html>")


if __name__ == "__main__":
    main()
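
The paging logic is the least obvious part of the script, so here it is in isolation, run against an in-memory list instead of a real node. Note that fake_get_account_history is a stand-in written for this sketch. It assumes the call returns the entries with index in [start - limit, start], oldest first, with start = -1 meaning "newest". That matches how the script uses the results, but the assumption should be checked against the node you actually use.

```python
BATCH = 100


def fake_get_account_history(history, start, limit):
    """Stand-in for the RPC call, for illustration only.

    Assumed semantics: return (index, entry) pairs for indexes in
    [start - limit, start], oldest first; start == -1 means the newest entry.
    """
    if start < 0:
        start = len(history) - 1
    lo = max(0, start - limit)
    return [(i, history[i]) for i in range(lo, start + 1)]


def walk(history):
    """Visit every index from newest to oldest, one batch at a time."""
    visited = []
    start = -1
    while True:
        # Keep the limit no larger than the start index near the beginning.
        limit = BATCH if start < 0 else min(BATCH, start)
        batch = fake_get_account_history(history, start, limit)
        if not batch:
            break
        oldest_idx = batch[0][0]
        visited.extend(idx for idx, _ in reversed(batch))
        if oldest_idx == 0:
            break
        start = oldest_idx - 1  # next window ends just before this one
    return visited


visited = walk([f"op-{i}" for i in range(250)])
print(visited[0], visited[-1], len(visited))  # 249 0 250
```

Under these assumptions every index is visited exactly once, including the short final window at the start of the history.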

How to run it

 python steemit_index.py --user YOUR_STEEMIT_NAME > steemit_index.html

Open steemit_index.html in your browser and you will have a complete, deduplicated index of your Steemit blog posts, newest first.
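
To confirm that the index really is deduplicated, you can count the links in the generated file. The sketch below uses only the standard library. It runs on a small made-up sample so that it works on its own; LinkCounter and count_links are names invented for this example.

```python
from html.parser import HTMLParser


class LinkCounter(HTMLParser):
    """Collect the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def count_links(html_text):
    """Return (total links, distinct links) found in the HTML text."""
    parser = LinkCounter()
    parser.feed(html_text)
    return len(parser.hrefs), len(set(parser.hrefs))


# Made-up sample in the same shape as the script's output.
sample = (
    "<html><body>"
    '<a href="https://steemit.com/blog/@exampleuser/second-post">Second post</a><br>'
    '<a href="https://steemit.com/blog/@exampleuser/my-first-post">My first post</a><br>'
    "</body></html>"
)

total, unique = count_links(sample)
print(total, unique)  # 2 2
```

For a real check, read steemit_index.html into a string and pass it to count_links. If the two numbers match, no post is listed twice.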

Final note

Steemit’s profile counter may report a larger number. That number includes edits and other non-post operations. The output of this script reflects the number of distinct, current blog posts, which is what most users actually care about.

Feel free to use, modify, or share this script.
