Ever needed to audit cloud APIs across AWS, GCP, and Azure? This tool does exactly that—pulls service definitions, maps out their parameters, and exports everything to Excel workbooks you can actually use.
- Overview
- Features
- Project Layout
- Prerequisites
- Quick Start
- Advanced Usage
- Workflow Reference
- Troubleshooting
- Excel Output Details
- Service Catalog Refresh
- Extending
- License
## Overview

I built this because manually digging through SDK documentation to understand API parameters is tedious. This extractor grabs live service definitions from AWS, GCP, and Azure, organizes all the operations and request parameters into a clean hierarchy, and spits out Excel files you can share with your team. The CLI walks you through picking a provider and service, then handles the rest.
## Features

- Live service catalogs – Pulls the latest service lists from AWS Botocore, GCP Discovery, and Azure REST specs. Everything caches locally in `service_catalog_cache.json`, so you aren't dependent on a network connection after the first run.
- Provider-specific extraction – Each cloud has its quirks. I wrote dedicated extractors for Botocore `service-2` models, GCP Discovery resources, and Azure OpenAPI specs that actually understand their structure.
- Smart Excel output – Generates worksheets with provider-specific headers, hierarchical parameter trees, required-field indicators, and frozen top rows so you can scroll comfortably through massive APIs.
- Interactive CLI – Just run it and answer a few prompts. Press `L` to list available services, and it'll suggest sensible file names like `<csp>-<service>-api-extract.xlsx`.
- Version handling – AWS has weird service variants (looking at you, SageMaker Runtime). GCP has multiple API versions. This tool detects them and creates separate worksheets so nothing gets lost.
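To make "Botocore `service-2` models" concrete, here's a minimal sketch of what that structure looks like and how an extractor might walk it. The inline `service2` dict is a hand-made excerpt in the real model's shape (not an actual AWS definition), and `request_parameters` is a hypothetical helper, not the code in `aws_extractor.py`:

```python
# Hand-made excerpt shaped like a Botocore service-2 model (not real AWS data).
service2 = {
    "metadata": {"serviceId": "ExampleService"},
    "operations": {
        "PutItem": {"name": "PutItem", "input": {"shape": "PutItemRequest"}},
    },
    "shapes": {
        "PutItemRequest": {
            "type": "structure",
            "required": ["TableName"],
            "members": {
                "TableName": {"shape": "String"},
                "Item": {"shape": "AttributeMap"},
            },
        },
        "String": {"type": "string"},
        "AttributeMap": {"type": "map",
                         "key": {"shape": "String"},
                         "value": {"shape": "String"}},
    },
}

def request_parameters(model, operation):
    """Return (name, required) pairs for an operation's top-level request members."""
    input_shape = model["operations"][operation]["input"]["shape"]
    shape = model["shapes"][input_shape]
    required = set(shape.get("required", []))
    return [(name, name in required) for name in shape.get("members", {})]

print(request_parameters(service2, "PutItem"))
```

The real extractor also resolves nested shapes recursively; this only shows the top level of one operation's request.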
## Project Layout

| Path | Description |
|---|---|
| `fetch_api_params.py` | Main CLI that handles provider selection, catalog refresh, prompting, and Excel generation. |
| `extractors/` | Provider-specific modules (`aws_extractor.py`, `gcp_extractor.py`, `azure_extractor.py`) that extract `{resource, method, tree}` data. |
| `writer/excel_writer.py` | Takes extracted data and builds provider-aware Excel workbooks using openpyxl. |
| `utils/schema_parser.py` | Recursively transforms JSON schemas into `(level, label)` rows and marks required parameters. |
| `utils/cache_manager.py` | Manages reading and writing the service catalog cache. |
| `OUTPUT_FILES/` | Where your generated workbooks go by default. |
| `requirements.txt` | Python dependencies (Botocore/Boto3, Requests, Google client, OpenPyXL). |
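The `(level, label)` row format produced by `utils/schema_parser.py` can be sketched roughly like this. `flatten_schema` is a hypothetical stand-in, not the module's real function, and the exact handling of maps and required markers may differ:

```python
def flatten_schema(name, schema, level=1, required=False):
    """Recursively turn a JSON-schema-like dict into (level, label) rows,
    appending "(required)" to mandatory parameters."""
    label = f"{name} (required)" if required else name
    rows = [(level, label)]
    required_children = set(schema.get("required", []))
    for child, child_schema in schema.get("properties", {}).items():
        rows.extend(flatten_schema(child, child_schema, level + 1,
                                   child in required_children))
    # Map/dictionary structures get explicit Key/Value child rows.
    if schema.get("type") == "object" and "additionalProperties" in schema:
        rows.append((level + 1, "Key"))
        rows.extend(flatten_schema("Value", schema["additionalProperties"], level + 1))
    return rows

schema = {
    "type": "object",
    "required": ["name"],
    "properties": {
        "name": {"type": "string"},
        "labels": {"type": "object", "additionalProperties": {"type": "string"}},
    },
}
for level, label in flatten_schema("body", schema):
    print(level, label)
```

The row depth maps directly onto the `Level 1..N` worksheet columns described later.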
## Prerequisites

- Python 3.8 or newer (check with `python --version`).
- Internet connection for the first run when fetching provider catalogs. After that, it uses the cache.
- No cloud credentials needed—this just reads public API descriptions from the providers' SDKs and REST specs.
## Quick Start

1. Clone this repo and navigate into it:

   ```bash
   git clone <csp_api_extractor repo>
   cd csp_api_extractor
   ```

2. [Recommended] If using pyenv, set the local Python version:

   ```bash
   pyenv local 3.8.13  # or your preferred 3.8+ version
   ```

   This creates a `.python-version` file that ensures consistent Python across your team.

3. Run the extractor:

   ```bash
   python run_extractor.py
   ```

4. Answer the prompts:

   - Pick your provider: `aws`, `gcp`, or `azure`.
   - Enter the service name (hit `L` to see available services). For AWS, you can include variants; for GCP, pick the specific version—each creates its own worksheet.
   - Use the suggested workbook name or type your own (defaults to `OUTPUT_FILES/`).

5. The script sets up a local `.venv`, installs what it needs, and prints the path to your finished Excel file when done.
## Advanced Usage

If you're managing your own Python environment, skip the helper script and run the CLI directly:

```bash
pip install -r requirements.txt
python fetch_api_params.py
```

Same functionality—you just control the environment yourself.
## Workflow Reference

- Provider – Choose `aws`, `gcp`, or `azure`.
- Service – Type the identifier or press `L` to list options. Include AWS variants or pick specific GCP versions if you need them.
- Output file – Defaults to `<csp>-<service>-api-extract.xlsx` (adds `.xlsx` automatically if you forget).
- Destination – Hit Enter to save in `OUTPUT_FILES/`, or specify your own path.
- The tool prints the full path to your workbook and refreshes the service catalog cache when newer lists are available.
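The output-file defaulting described above could look something like this. `suggest_workbook_name` is a hypothetical helper for illustration, not the CLI's actual code:

```python
def suggest_workbook_name(csp, service, name=None):
    """Default to <csp>-<service>-api-extract.xlsx; append .xlsx if missing."""
    if not name:
        name = f"{csp}-{service}-api-extract"
    if not name.lower().endswith(".xlsx"):
        name += ".xlsx"
    return name

print(suggest_workbook_name("aws", "s3"))            # default name
print(suggest_workbook_name("gcp", "compute", "my-sheet"))  # user name, extension added
```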
## Troubleshooting

- "Unable to create default output directory" – Check you have write permissions in the project folder, or use a different destination.
- "Unable to write to `<file>.xlsx`" – Close Excel (or whatever has the file open) and try again.
- Network errors while refreshing services – The tool falls back to the cached list. Run again when you're online to refresh.
- "Service '' not found" – Double-check the service name. Press `L` to see the exact spelling.
- Unexpected crash – Try `python run_extractor.py` to rebuild the virtual environment. Still broken? Run with `L` to confirm the service exists and send me the console output.
## Excel Output Details

| Provider | Columns | Notes |
|---|---|---|
| AWS | `<service> API Action`, `Level 1..N` | One worksheet per service or variant (like `sagemaker-runtime`). Each action's request parameters show up as an indented tree. |
| GCP | `<service> REST Resource`, `API Method`, `Level 1..N` | One worksheet per API version. Resources are organized with their REST methods and parameter hierarchies. |
| Azure | `<service> Resource Operation`, `API Method`, `Level 1..N` | Same layout as GCP, just using Azure's terminology. |

- `Level` columns expand automatically if schemas go deeper than Level 5.
- Required parameters show `(required)` in their label.
- Map/dictionary structures get nested Key/Value rows so nothing gets hidden.
- Top row is frozen so you can scroll through huge APIs without losing context.
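A rough sketch of how `(level, label)` rows could be laid out with auto-expanding Level columns. `to_worksheet_rows` is hypothetical and the real `writer/excel_writer.py` likely differs; note that in openpyxl, freezing the top row is just `ws.freeze_panes = "A2"`:

```python
def to_worksheet_rows(header_cols, param_rows, min_levels=5):
    """Turn (level, label) rows into flat spreadsheet rows: each label lands in
    the Level column matching its depth, and Level columns grow past
    min_levels when a schema is deeper."""
    depth = max(min_levels, max((lvl for lvl, _ in param_rows), default=1))
    header = header_cols + [f"Level {i}" for i in range(1, depth + 1)]
    rows = [header]
    for level, label in param_rows:
        row = [""] * len(header)
        row[len(header_cols) + level - 1] = label  # indent by depth
        rows.append(row)
    return rows

rows = to_worksheet_rows(["s3 API Action"],
                         [(1, "Bucket (required)"), (2, "Key")])
for row in rows:
    print(row)
```

Each flat row can then be handed to `worksheet.append(row)` in openpyxl.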
## Service Catalog Refresh

Each run does the following:

- Loads `service_catalog_cache.json` if it exists.
- Tries to fetch the latest service list for your provider.
- Shows you how many services were added or removed.
- Saves the updated catalog back to the cache.
- If the refresh fails (you're offline, got throttled, etc.), it uses the cached list and warns you.
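The refresh flow above can be sketched like this. `refresh_catalog` and the injected `fetch_latest` callable are hypothetical stand-ins for `utils/cache_manager.py` and the real per-provider fetchers:

```python
import json
from pathlib import Path

CACHE = Path("service_catalog_cache.json")

def refresh_catalog(provider, fetch_latest, cache_path=CACHE):
    """Merge a freshly fetched service list into the cache; on any fetch
    failure, fall back to whatever was cached and warn."""
    cached = {}
    if cache_path.exists():
        cached = json.loads(cache_path.read_text())
    old = set(cached.get(provider, []))
    try:
        latest = set(fetch_latest(provider))
    except Exception as err:
        print(f"warning: refresh failed ({err}); using cached list")
        return sorted(old)
    print(f"{len(latest - old)} added, {len(old - latest)} removed")
    cached[provider] = sorted(latest)
    cache_path.write_text(json.dumps(cached, indent=2))
    return cached[provider]
```

Injecting the fetcher keeps the cache logic testable offline, which mirrors the tool's fallback behavior.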
## Extending

Want to customize this? Here are some starting points:

- Azure OpenAPI tweaks – Edit `extractors/azure_extractor.py` to select different tags/spec files or cache Swagger docs locally.
- GCP rate limits – Cache Discovery documents if you're running this in automated workflows and hitting rate limits.
- Excel formatting – Modify `writer/excel_writer.py` to add filters, conditional formatting, or custom layouts.
- Logging – Wrap extractor calls in `fetch_api_params.py` with structured logging for better diagnostics.
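For the logging starting point, one minimal approach is a decorator around extractor calls. `logged` and `extract_aws` are illustrative only, not part of the repo:

```python
import functools
import logging

log = logging.getLogger("extractor")

def logged(fn):
    """Wrap an extractor call with structured start/finish/failure records."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        log.info("extract start", extra={"extractor": fn.__name__})
        try:
            result = fn(*args, **kwargs)
        except Exception:
            log.exception("extract failed", extra={"extractor": fn.__name__})
            raise
        log.info("extract done", extra={"extractor": fn.__name__})
        return result
    return wrapper

@logged
def extract_aws(service):
    # Stand-in for a real extractors.aws_extractor call.
    return {"resource": service, "method": "GetItem", "tree": []}
```

The `extra` dict attaches the extractor name to each record, so a formatter or JSON handler can emit it as a structured field.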
## License

MIT. Use it however you want.