SPAR is a framework that leverages the generative capabilities of LLMs to automatically produce valid, diverse, and semantically accurate PDDL domains from natural language input.
Authors: Songhao Huang*, Yuwei Wu*, Guangyao Shi, Gaurav S. Sukhatme, and Vijay Kumar
Related Paper: Songhao Huang*, Yuwei Wu*, Guangyao Shi, Gaurav S. Sukhatme, and Vijay Kumar. "SPAR: Scalable LLM-based PDDL Domain Generation for Aerial Robotics." arXiv preprint arXiv:2509.13691, 2025.
If this repo helps your research, please cite our paper:
@article{huang2025spar,
title={SPAR: Scalable LLM-based PDDL Domain Generation for Aerial Robotics},
author={Huang, Songhao and Wu, Yuwei and Shi, Guangyao and Sukhatme, Gaurav S and Kumar, Vijay},
journal={arXiv preprint arXiv:2509.13691},
year={2025}
}

Repository layout:
- action_gen.py: generate a PDDL domain for one benchmark domain.
- eval_syntax.py: run domain-generation experiments and aggregate syntax error counts.
- problem_gen.py: generate PDDL problem files compatible with generated domains.
- batch_solve.py: solve generated domain/problem pairs with ENHSP.
- eval/domain_similarity.py: validate plans against generated domains with VAL.
- pddl_validator.py: syntax and semantic checks used during iterative correction.
- llm_model.py: model wrapper, embedding lookup, and retrieval utilities.
- planner/: ENHSP wrapper plus the bundled enhsp-20.jar.
- prompts/: prompt templates, retrieval assets, and embedding-cache scripts.
- uav_domain_benchmark/: benchmark and dataset for UAV task-planning domains.

Requirements:
- Python 3.10+
- Java 17+ available as the java command, or set JAVA_BIN
- At least one model API key:
  - OPENAI_API_KEY for OpenAI models
  - DEEPSEEK_API_KEY for DeepSeek models
- Local sentence-transformer checkpoint for retrieval: local_model/all-mpnet-base-v2

Optional:
- VAL_BIN for eval/domain_similarity.py
- local_model/bge-reranker-v2-m3 for regenerating BGE retrieval caches
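
The requirements above can be checked before a long run. The following is a minimal sketch, not part of the repository: the function name check_prerequisites and its messages are hypothetical, and it only tests that the Java executable resolves, not that it is version 17 or newer.

```python
import os
import shutil
from pathlib import Path


def check_prerequisites(env=None, root="."):
    """Return a list of problems found; an empty list means the basics are in place.

    Hypothetical helper: it mirrors the requirements listed in this README
    and is not shipped with the repository.
    """
    env = os.environ if env is None else env
    problems = []

    # At least one model API key must be set.
    if not (env.get("OPENAI_API_KEY") or env.get("DEEPSEEK_API_KEY")):
        problems.append("set OPENAI_API_KEY or DEEPSEEK_API_KEY")

    # Java is looked up as JAVA_BIN if set, otherwise as "java".
    # The Java version itself is not checked here.
    java_bin = env.get("JAVA_BIN", "java")
    if shutil.which(java_bin) is None:
        problems.append(f"Java executable not found: {java_bin}")

    # Local sentence-transformer checkpoint used for retrieval.
    checkpoint = Path(root) / "local_model" / "all-mpnet-base-v2"
    if not checkpoint.is_dir():
        problems.append(f"missing retrieval checkpoint: {checkpoint}")

    return problems
```

Calling check_prerequisites() from the repository root and printing the returned list gives a quick readiness report; VAL_BIN is left out because it is optional.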
Install dependencies:
    python -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt

Set the environment variables you need:
    export OPENAI_API_KEY=your_key_here
    export DEEPSEEK_API_KEY=your_key_here
    export JAVA_BIN=java
    export VAL_BIN=/path/to/Validate

Edit the variables in the __main__ block of action_gen.py:
- _domain_name_str
- _engine
- _prompt_method
- _result_log_dir

Then run:
    python action_gen.py

Outputs include the generated domain, intermediate LLM transcripts, extracted predicates/functions, and validation error counts.

eval_syntax.py has two entry paths:
- syntax_eval() to generate domains and log validation errors
- total_error_count() to aggregate existing results

Before running, edit the module-level settings in eval_syntax.py, especially:
- engine_list
- prompt_method_restart_list
- restart_domain
- restart_method
- restart_engine

Then run:
    python eval_syntax.py

Results are written under results/<timestamp>/<engine>/<domain>/<prompt_method>/.
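
The results layout above can be navigated programmatically. This is a minimal sketch that assumes only the four-level directory structure stated in this README; the helper names result_dir and iter_result_dirs are hypothetical, and the timestamp format is whatever the scripts actually write.

```python
from pathlib import Path


def result_dir(timestamp, engine, domain, prompt_method, root="results"):
    """Build results/<timestamp>/<engine>/<domain>/<prompt_method> as a Path."""
    return Path(root) / timestamp / engine / domain / prompt_method


def iter_result_dirs(root="results"):
    """Yield (timestamp, engine, domain, prompt_method, path) for each run directory.

    Hypothetical helper: it walks exactly four levels below root and skips
    anything at that depth that is not a directory.
    """
    for path in sorted(Path(root).glob("*/*/*/*")):
        if path.is_dir():
            yield (*path.relative_to(root).parts, path)
```

Iterating over iter_result_dirs() is one way to find which engine, domain, and prompt-method combinations already have output before setting the restart variables.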
problem_gen.py expects generated domains to already exist under results/. Update the module-level variables near the top of the file:
- date_str
- engine
- gpt_engine
- restart controls such as restart_domain, restart_method, and restart_problem

Then run:
    python problem_gen.py

Generated problems are written under:

    results/<date_str>/<engine>/<domain>/<prompt_method>/pddl/

Edit the module-level configuration in batch_solve.py:
- date_str
- engine
- prompt_method_restart_list
- optional restart filters such as restart_domain

Then run:
    python batch_solve.py

The script writes .plan files next to generated problems and prints per-method success rates.
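
For debugging a single domain/problem pair outside batch_solve.py, the planner can be called directly. The sketch below is an illustration rather than the repository's wrapper: the function name enhsp_command is hypothetical, the jar location planner/enhsp-20.jar is assumed from the repository layout, and only ENHSP's -o (domain) and -f (problem) options are used, so any extra planner options the wrapper passes are not reproduced.

```python
import os


def enhsp_command(domain_file, problem_file, jar="planner/enhsp-20.jar", env=None):
    """Build the argument list for one ENHSP run.

    Hypothetical helper. JAVA_BIN is honoured if set, otherwise "java" is used.
    The default jar path is an assumption about where the bundled jar lives.
    """
    env = os.environ if env is None else env
    java_bin = env.get("JAVA_BIN", "java")
    return [
        java_bin, "-jar", str(jar),
        "-o", str(domain_file),   # PDDL domain
        "-f", str(problem_file),  # PDDL problem
    ]
```

The returned list can be passed to subprocess.run(cmd, capture_output=True, text=True) to capture the planner output for inspection.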
eval/domain_similarity.py compares generated plans and generated domains using VAL.
Requirements:
- source plans in the benchmark domain folders under uav_domain_benchmark/<domain>/pddl/*.plan
- generated domains and problems under results/
- VAL_BIN pointing to the Validate executable

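
To check one plan by hand, the inputs listed above can be assembled into a direct call to VAL. This is a minimal sketch under stated assumptions: the helper names source_plans and val_command are hypothetical, and the command uses VAL's basic positional form (domain, problem, plan) without any of the options eval/domain_similarity.py may add.

```python
import os
from pathlib import Path


def source_plans(domain, benchmark_root="uav_domain_benchmark"):
    """List the source .plan files for one benchmark domain, sorted by name."""
    return sorted((Path(benchmark_root) / domain / "pddl").glob("*.plan"))


def val_command(domain_file, problem_file, plan_file, env=None):
    """Build the argument list for validating one plan with VAL.

    Hypothetical helper. VAL_BIN must point at the Validate executable.
    """
    env = os.environ if env is None else env
    val_bin = env.get("VAL_BIN")
    if not val_bin:
        raise RuntimeError("VAL_BIN is not set; point it at the Validate executable")
    return [val_bin, str(domain_file), str(problem_file), str(plan_file)]
```

Running the resulting command with subprocess.run and reading its output shows whether a benchmark plan still executes against a generated domain.
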
Edit the module-level variables in eval/domain_similarity.py, then run:
    python eval/domain_similarity.py

If you need to rebuild the retrieval cache, update the options in prompts/save_action_embed.py and run:

    python prompts/save_action_embed.py

The script requires OPENAI_API_KEY only when it is configured with use_llm=True.

- ENHSP: planner-based evaluation of generated domain and problem pairs.
- VAL: plan validation.
- all-mpnet-base-v2: retrieval-based prompting and action similarity search.
- OpenAI API and DeepSeek API
For any technical issues, please contact Yuwei Wu (yuweiwu@seas.upenn.edu, yuweiwu20001@outlook.com).
