Module 09 — Real Document Workflows

🎯 Goal: Apply everything to your real day-to-day work by processing the documents and exports you pull out of your company's web tools.

⏱️ Time: 2–4 hours, spread across real tasks.

Almost every company runs on SaaS tools — web apps like Salesforce, Jira, SharePoint, Google Drive, a CRM, an HR portal, or a ticketing system. They're great for storing things and terrible at the boring bulk work: renaming 200 files, pulling fields out of 50 PDFs, cleaning up an ugly export. That bulk work is exactly what you'll automate here.

The approach that works on day one, with the lowest risk: export/download from the SaaS tool by hand, then let Claude Code do the bulk processing locally — no direct SaaS connections. Even so, follow your company's AI and data policy; some workplaces require approval before any AI-assisted handling of company data.

⚠️ Before you point this at real files

This is the module where real company documents enter the picture. Run this checklist before your first real task:

☐ Your company's policy allows this (or you've asked IT — see Module 01).

☐ You're working on copies, never the only original.

☐ You started on non-sensitive or sample files to prove the workflow.

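☐ You will not paste document contents into a web chat box. Claude Code works on files in place on your machine, which is different from copy-pasting text into a browser, but whatever it reads is still sent to the AI model for processing, so the policy check above still applies.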

☐ Stop and check first if the data includes regulated personal, health, or financial information. When unsure, treat it as sensitive.

Working with files you've already downloaded and processing them locally is the low-risk pattern this course is built around. Full detail in Module 10.

The universal pattern

Almost every document task you have is one of these four shapes. Learn to recognize them and you can describe any task to Claude Code:

Extract — pull specific information out of many documents into a table.
Transform — convert/clean/reshape files (PDF→text, messy Excel→tidy Excel).
Combine — merge many files into one, or split one into many.
Organize/Rename — apply naming and filing rules at scale.

Whichever shape you're facing, the workflow stays the same: download the files, process them locally, done.

Each recipe below is a starting prompt. Adapt the specifics to your documents — you know what they contain; tell Claude Code.

Always start safe

For every workflow below:

Work on copies. Put downloaded files in a dedicated folder like ~/work-automation/inbox/ and never point a tool at your only copy.
Set up a project once: uv init document-automation && cd document-automation, then run claude inside it. Reuse it for all these tasks.
Dry-run / preview first, then execute (you learned this in Module 08).
Spot-check the output against a couple of source documents by hand.
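
To make "work on copies" concrete, here is a minimal sketch of the copy step in Python. The function name copy_to_inbox and the folder names in the usage note are illustrative assumptions, not something the course prescribes; a tool Claude Code builds for you may look different.

```python
import shutil
from pathlib import Path


def copy_to_inbox(source_dir: Path, inbox_dir: Path) -> list[Path]:
    """Copy every file from source_dir into inbox_dir, leaving originals untouched."""
    inbox_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(source_dir.iterdir()):
        if not src.is_file():
            continue
        dest = inbox_dir / src.name
        if dest.exists():
            # Never overwrite silently: an existing working copy may hold edits.
            continue
        shutil.copy2(src, dest)  # copy2 also preserves file timestamps
        copied.append(dest)
    return copied
```

You would call it with something like copy_to_inbox(Path.home() / "Downloads" / "export", Path.home() / "work-automation" / "inbox"). Because it only ever copies, the downloaded originals stay exactly as they were.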

Recipe 1 — Extract data from many PDFs into a spreadsheet

The classic. You have 50 benefit summary PDFs; you need a spreadsheet of key fields.

I have a folder of PDF benefit documents at ~/work-automation/inbox. For each PDF, I need to extract: the member/client name, the plan name, and the effective date. Start by opening just ONE pdf and showing me all the text you can read from it, so we can find exactly where those fields live. Don't build the full thing yet.

Once you've identified where the fields are, continue:

Great. The member name always follows "Member Name:", the plan follows "Plan:", and the effective date follows "Effective:". Build a program that does this for every PDF and writes the results to an Excel file called summary.xlsx with columns Member, Plan, Effective Date, and Source File. Run it and show me the first few rows.

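If your PDFs are scanned images (you can't select the text in a PDF viewer such as Preview), tell Claude Code, which can use OCR (optical character recognition) to read them. Just say "these are scanned PDFs, the text isn't selectable." OCR misreads characters more often than direct text extraction does, so spot-check a few extra rows.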
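
For text-based PDFs, the field-finding step of this recipe could look roughly like the sketch below. It assumes the text has already been pulled out of each PDF (Claude Code would typically pick a PDF library for that) and that the three labels from the example prompt each sit on their own line. The names FIELD_PATTERNS, parse_fields, and rows_to_csv are hypothetical, and the sketch writes CSV from the standard library rather than summary.xlsx, which would need a spreadsheet library.

```python
import csv
import io
import re

# Labels come from the example prompt; your documents may use different ones.
FIELD_PATTERNS = {
    "Member": re.compile(r"^[ \t]*Member Name:[ \t]*(.+?)[ \t]*$", re.MULTILINE),
    "Plan": re.compile(r"^[ \t]*Plan:[ \t]*(.+?)[ \t]*$", re.MULTILINE),
    "Effective Date": re.compile(r"^[ \t]*Effective:[ \t]*(.+?)[ \t]*$", re.MULTILINE),
}


def parse_fields(text: str, source_file: str) -> dict[str, str]:
    """Return one spreadsheet row; a field that is not found becomes an empty string."""
    row = {}
    for column, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        row[column] = match.group(1) if match else ""
    row["Source File"] = source_file
    return row


def rows_to_csv(rows: list[dict[str, str]]) -> str:
    """Render the rows as CSV text with the column order used in the recipe."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[*FIELD_PATTERNS, "Source File"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Leaving a missing field blank instead of guessing makes the spot-check easier: any empty cell points you straight to a source file worth opening by hand.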

Recipe 2 — Clean up a messy spreadsheet

Spreadsheet exports from SaaS tools are often ugly — extra system columns, weird internal header names, inconsistent dates.

I have an Excel export at ~/work-automation/inbox/export.xlsx. Open it and describe what columns it has and what the data looks like. Then I'll tell you how I want it cleaned.

Then, based on what you see:

Please: remove the columns "sys_id" and "internal_ref", rename "u_member_name" to "Member", make all the dates use MM/DD/YYYY format, and sort by Member name. Save the result as export-clean.xlsx and keep the original untouched.
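
The cleanup rules in that prompt can be sketched in plain Python as follows. This works on rows that have already been read into dictionaries (reading and writing the .xlsx file itself needs a spreadsheet library, which is left out here). The list of incoming date formats is an assumption; check what your export really contains. All names are illustrative.

```python
from datetime import datetime

DROP_COLUMNS = {"sys_id", "internal_ref"}
RENAME_COLUMNS = {"u_member_name": "Member"}
# Candidate input formats are an assumption, tried in this order.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d-%b-%Y")


def normalize_date(value: str) -> str:
    """Convert a recognized date to MM/DD/YYYY; return anything else unchanged."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).strftime("%m/%d/%Y")
        except ValueError:
            continue
    return value  # unrecognized values are left alone so you can review them by hand


def clean_rows(rows: list[dict[str, str]], date_columns: set[str]) -> list[dict[str, str]]:
    """Drop system columns, rename headers, normalize dates, and sort by Member."""
    cleaned = []
    for row in rows:
        new_row = {}
        for column, value in row.items():
            if column in DROP_COLUMNS:
                continue
            column = RENAME_COLUMNS.get(column, column)
            new_row[column] = normalize_date(value) if column in date_columns else value
        cleaned.append(new_row)
    return sorted(cleaned, key=lambda r: r["Member"].lower())
```

The function builds new rows instead of editing the input, which mirrors the "keep the original untouched" instruction. Ambiguous dates such as 03/04/2025 are a judgment call no script can settle, so tell Claude Code which order your export uses.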

Recipe 3 — Batch rename files to a standard

You downloaded 200 files from a SaaS tool with inconsistent names; your team needs them named LASTNAME_FIRSTNAME_PLANYEAR.pdf.

The PDFs in ~/work-automation/inbox have inconsistent names. Each file's first page contains the member's name and the plan year. I want them renamed to the format LASTNAME_FIRSTNAME_PLANYEAR.pdf. First, do a DRY RUN: show me a table of each current filename and what you'd rename it to, but don't actually rename anything yet. I'll review before you proceed.

Review the proposed renames carefully, then:

The proposed names look right. Go ahead and rename them. Keep a log file recording the old and new name of each, in case I need to reverse it.

That log file is your undo button — always ask for one when renaming in bulk.
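
A dry run followed by a logged rename might be structured like this sketch. It assumes the name and plan year for each file have already been read from the first page and collected into a dictionary; that dictionary, and the names target_name, plan_renames, and apply_renames, are hypothetical.

```python
import csv
from pathlib import Path


def target_name(last: str, first: str, plan_year: str) -> str:
    """Build a name in the LASTNAME_FIRSTNAME_PLANYEAR.pdf format."""
    return f"{last.strip().upper()}_{first.strip().upper()}_{plan_year.strip()}.pdf"


def plan_renames(folder: Path, details: dict[str, tuple[str, str, str]]) -> list[tuple[Path, Path]]:
    """Dry run: return (old, new) path pairs without touching any file."""
    plan = [
        (folder / old_name, folder / target_name(last, first, year))
        for old_name, (last, first, year) in sorted(details.items())
    ]
    targets = [new for _, new in plan]
    if len(set(targets)) != len(targets):
        raise ValueError("two files would get the same new name")
    return plan


def apply_renames(plan: list[tuple[Path, Path]], log_path: Path) -> None:
    """Rename the files and record every old/new pair in a CSV log."""
    with log_path.open("w", newline="") as handle:
        log = csv.writer(handle)
        log.writerow(["old_name", "new_name"])
        for old, new in plan:
            if old == new:
                continue  # already named correctly
            if new.exists():
                raise FileExistsError(f"refusing to overwrite {new.name}")
            old.rename(new)
            log.writerow([old.name, new.name])
```

Printing the output of plan_renames gives you the review table; apply_renames only runs once you are happy with it. Reversing a bad batch is then a matter of reading the log and renaming each new_name back to its old_name.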

Recipe 4 — Merge or split PDFs

Combine all the PDFs in ~/work-automation/inbox into a single file called packet.pdf, in alphabetical order by filename, and add a divider page before each one showing its filename.

Or the reverse:

Split big-report.pdf into separate files, one per page, named page-01.pdf, page-02.pdf, and so on.
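
Both directions can be sketched with the third-party pypdf library, which is one possible choice and an assumption here; Claude Code may pick a different one. The helper names are illustrative. The divider pages from the merge prompt are left out, because generating new pages needs an additional PDF-drawing library.

```python
from pathlib import Path


def merge_order(folder: Path) -> list[Path]:
    """PDFs in alphabetical order by filename, ignoring upper/lower case."""
    return sorted(folder.glob("*.pdf"), key=lambda p: p.name.lower())


def page_filename(page_number: int, total_pages: int) -> str:
    """Names like page-01.pdf, zero-padded so the files sort correctly."""
    width = max(2, len(str(total_pages)))
    return f"page-{page_number:0{width}d}.pdf"


def merge_pdfs(folder: Path, output: Path) -> None:
    """Combine every PDF in folder into one file (no divider pages in this sketch)."""
    from pypdf import PdfWriter  # imported here so the helpers above work without pypdf

    writer = PdfWriter()
    for pdf in merge_order(folder):
        if pdf.resolve() != output.resolve():  # skip a packet left over from a previous run
            writer.append(str(pdf))
    with output.open("wb") as handle:
        writer.write(handle)


def split_pdf(source: Path, out_dir: Path) -> None:
    """Write each page of source to its own file in out_dir."""
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(str(source))
    out_dir.mkdir(parents=True, exist_ok=True)
    total = len(reader.pages)
    for number, page in enumerate(reader.pages, start=1):
        writer = PdfWriter()
        writer.add_page(page)
        with (out_dir / page_filename(number, total)).open("wb") as handle:
            writer.write(handle)
```

The padding rule matters once a report passes 99 pages: page-012.pdf sorts after page-011.pdf, whereas page-12.pdf would land between page-1.pdf and page-2.pdf in many file browsers.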

Recipe 5 — Generate a recurring report

This is where the time savings compound: the same task, repeated every week.

Every week I get a folder of intake forms (PDFs) and I have to produce a summary report. Build me a reusable tool: it should read all PDFs in a folder I specify, pull out [the fields you need], count how many are [pending vs. complete], and produce a one-page summary as both an Excel sheet and a readable PDF. Walk me through using it so I can run it myself each week.

Once it works, ask Claude Code to document it:

Create a CLAUDE.md and a short README in this project explaining how to run the weekly report, so future-me remembers.

Now you have a repeatable, documented tool. Next week, you drop files in a folder and run one command.
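
The counting part of such a weekly tool is small. As a sketch, assuming each form has already been reduced to a dictionary with a "status" field (a hypothetical field name, like the function names below):

```python
from collections import Counter


def summarize(records: list[dict[str, str]]) -> dict[str, int]:
    """Count pending and complete forms; anything else is reported as 'other'."""
    counts = Counter(r.get("status", "").strip().lower() for r in records)
    pending, complete = counts["pending"], counts["complete"]
    return {
        "total": len(records),
        "pending": pending,
        "complete": complete,
        "other": len(records) - pending - complete,
    }


def format_summary(summary: dict[str, int]) -> str:
    """Render the counts as plain text, one line per figure."""
    return "\n".join(f"{label.title()}: {count}" for label, count in summary.items())
```

Keeping an explicit "other" bucket means a form with a missing or misspelled status shows up in the totals instead of silently disappearing, which is exactly what you want to catch during the weekly spot-check.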

Turning a one-off into a routine

When a tool works and you trust it, you can ask Claude Code to make it dead-simple to re-run:

Make a single command or a tiny script I can run that does the whole weekly report from start to finish, and remind me how to use it.
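
What comes back is often a small command-line entry point. The sketch below shows one plausible shape using Python's standard argparse module; the script name, option names, and default output filename are assumptions, and the real extraction and report steps are only marked with a comment.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Weekly intake report (sketch).")
    parser.add_argument("folder", type=Path, help="folder containing this week's PDFs")
    parser.add_argument("--output", type=Path, default=Path("weekly-summary.xlsx"),
                        help="where to write the report")
    parser.add_argument("--dry-run", action="store_true",
                        help="list the files that would be processed, then stop")
    return parser


def main(argv=None) -> int:
    """Return 0 on success and 1 if the folder is missing."""
    args = build_parser().parse_args(argv)
    folder = args.folder.expanduser()
    if not folder.is_dir():
        print(f"Folder not found: {folder}")
        return 1
    pdfs = sorted(folder.glob("*.pdf"))
    print(f"Found {len(pdfs)} PDF(s) in {folder}")
    if args.dry_run:
        for pdf in pdfs:
            print(f"  would process {pdf.name}")
        return 0
    # The real extraction and report-writing steps would be called here.
    print(f"Report would be written to {args.output}")
    return 0
```

Saved as a file such as weekly_report.py (a hypothetical name) with the usual `if __name__ == "__main__": raise SystemExit(main())` line at the bottom, it could be run from the project folder with uv run weekly_report.py ~/work-automation/inbox --dry-run.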

Down the line, if your company approves it, the same logic can be pointed at your SaaS tool's API to skip the manual download — but the processing you've built stays exactly the same. You're building durable skills, not throwaway hacks.

When you eventually get API access (a preview, not for now)

Most SaaS tools offer an API — a way for a program to connect directly and pull files or data without you downloading by hand. If IT later approves a direct connection, the new ingredients are usually:

An API token (a secret key) issued by your tool's admin.
An official Python SDK or REST API for that tool — Claude Code knows the common ones (Salesforce, Google Drive, SharePoint, Jira, and so on).

When that day comes, you'd tell Claude Code "I now have an API token for [tool], help me set up a secure connection to download files automatically" — and it'll guide you, including storing the token safely (Module 10). But not until it's approved and you've read Module 10.
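
As a preview only, the core of "storing the token safely" is that the secret lives outside your code. One common approach, sketched here under assumptions, is an environment variable. The variable name SAAS_API_TOKEN is made up, and the Bearer header is a widespread convention rather than something every tool uses, so check your tool's API documentation.

```python
import os


def load_api_token(env_var: str = "SAAS_API_TOKEN") -> str:
    """Read the token from the environment so it never appears in a script or in git."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable first; never paste the token into code."
        )
    return token


def auth_headers(token: str) -> dict[str, str]:
    # Bearer tokens are a common convention, but your tool may expect something else.
    return {"Authorization": f"Bearer {token}"}
```

Failing loudly when the variable is missing is deliberate: a script that quietly continues without credentials tends to produce confusing errors further down the line.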

✅ You're done with this module when

You've completed at least one of these recipes on real (copied) files and saved real time.
You instinctively work on copies and dry-run before executing.
You can look at a new task and recognize which of the four shapes it is.
You've built at least one reusable tool with a CLAUDE.md.

Next: Module 10 — Security & Good Habits.