diff --git a/.github/workflows/docs-autogen-pr.yml b/.github/workflows/docs-autogen-pr.yml new file mode 100644 index 000000000..b4e8125bf --- /dev/null +++ b/.github/workflows/docs-autogen-pr.yml @@ -0,0 +1,84 @@ +name: Auto-generate docs PR (PyPI mellea) + +on: + push: + tags: + - "v*" + workflow_dispatch: {} + +permissions: + contents: write + pull-requests: write + +concurrency: + group: docs-autogen-pr-${{ github.ref }} + cancel-in-progress: true + +jobs: + docs_autogen: + runs-on: ubuntu-latest + + steps: + - name: Checkout + uses: actions/checkout@v4 + with: + fetch-depth: 0 + + - name: Set up Python + uses: actions/setup-python@v5 + with: + python-version: "3.11" + cache: "pip" + + - name: Resolve version from tag + id: ver + shell: bash + run: | + set -euo pipefail + TAG="${GITHUB_REF_NAME}" # e.g. v0.3.0 + VERSION="${TAG#v}" # 0.3.0 + echo "tag=$TAG" >> "$GITHUB_OUTPUT" + echo "version=$VERSION" >> "$GITHUB_OUTPUT" + echo "Using version: $VERSION" + + - name: Install tooling dependencies (decorator deps etc.) + run: | + python -m pip install --upgrade pip + pip install -r tooling/docs-autogen/requirements.txt || true + + - name: Generate API docs + merge docs.json (install-mode) + run: | + set -euxo pipefail + python3 tooling/docs-autogen/generate-ast.py \ + --docs-json docs/docs/docs.json \ + --docs-root docs/docs \ + --pypi-name mellea \ + --pypi-version "${{ steps.ver.outputs.version }}" + + - name: Decorate API MDX (SidebarFix + pills) + run: | + set -euxo pipefail + python3 tooling/docs-autogen/decorate_api_mdx.py \ + --docs-root docs/docs + + - name: Create Pull Request (only if there are changes) + id: cpr + uses: peter-evans/create-pull-request@v6 + with: + commit-message: "docs: auto-generate (PyPI mellea ${{ steps.ver.outputs.version }})" + title: "docs: auto-generate (PyPI mellea ${{ steps.ver.outputs.version }})" + body: | + This PR was automatically generated for tag `${{ steps.ver.outputs.tag }}`. + + Target PyPI version: `${{ steps.ver.outputs.version }}` + branch: "automation/docs-autogen-${{ steps.ver.outputs.version }}" + delete-branch: true + labels: | + documentation + automation + + - name: Cleanup local tooling environment (always) + if: always() + run: | + rm -rf .venv-docs-autogen .mdxify-run-cwd || true + rm -rf docs/api || true diff --git a/docs/docs/LICENSE b/docs/docs/LICENSE new file mode 100644 index 000000000..261eeb9e9 --- /dev/null +++ b/docs/docs/LICENSE @@ -0,0 +1,201 @@ + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. 
+ + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. 
If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. 
Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. diff --git a/docs/docs/README.md b/docs/docs/README.md new file mode 100644 index 000000000..6b2a3d914 --- /dev/null +++ b/docs/docs/README.md @@ -0,0 +1,41 @@ +# πŸ“š Mellea Documentation + +This repository contains the documentation for the [**Mellea**](https://github.com/generative-computing/mellea) project. 
It provides clear, developer-focused guides and reference materials for working with the Mellea platform. + +Visit the Mellea documentation site: [https://mellea.ai/](https://mellea.ai) + +--- + +## πŸ”Ž About Mellea + +**Mellea** is a library for writing generative programs. Generative programming replaces flaky agents and brittle prompts with structured, maintainable, robust, and efficient AI workflows. + +--- + +## πŸš€ Getting Started + +Follow these steps to run the documentation site locally: + +### 1️⃣ Install Mintlify CLI + +```bash +npm install -g mint +``` + +### 2️⃣ Run locally + +```bash +mint dev +``` + +Your site will be available at [http://localhost:3000](http://localhost:3000). + +--- diff --git a/docs/docs/analytics.js new file mode 100644 index 000000000..03c0f88c2 --- /dev/null +++ b/docs/docs/analytics.js @@ -0,0 +1,64 @@ +// IBM Analytics setup for Mintlify +// Configure this script as a custom JS file in your Mintlify settings + +(function() { + // Only run in production or when explicitly enabled + if (window.location.hostname === 'localhost' || window.location.hostname === '127.0.0.1') { + console.log('IBM Analytics disabled in development'); + return; + } + + // Set up IBM Analytics configuration + window.idaPageIsSPA = true; + + // Configure digital data + window.digitalData = { + page: { + category: { + primaryCategory: 'PC340' + }, + pageInfo: { + ibm: { + siteId: 'granite-developer-enablement' + } + } + } + }; + + // Configure IBM Analytics settings + window._ibmAnalytics = { + settings: { + name: 'granite-developer-enablement', + isSpa: true + } + }; + + // Load IBM Analytics library + var script = document.createElement('script'); + script.src = 'https://1.www.s81c.com/common/stats/ibm-common.js'; + script.type = 'text/javascript'; + script.defer = true; + document.head.appendChild(script); + + function trackPageview() { + // Wait for IBM Analytics to load, then track pageview + setTimeout(function() { + if (window.ibmStats && typeof window.ibmStats.pageview === 'function') { + window.ibmStats.pageview(); + } + }, 100); + } + + // Track initial page load + if (document.readyState === 'loading') { + document.addEventListener('DOMContentLoaded', trackPageview); + } else { + trackPageview(); + } + + if(window.navigation) { + window.navigation.addEventListener("navigate", trackPageview) + } else { + window.addEventListener('popstate', trackPageview); + } +})(); \ No newline at end of file diff --git a/docs/docs/api/cli/alora/commands.mdx b/docs/docs/api/cli/alora/commands.mdx new file mode 100644 index 000000000..c1c67e65c --- /dev/null +++ b/docs/docs/api/cli/alora/commands.mdx @@ -0,0 +1,38 @@ +--- +title: "cli.alora.commands" +sidebarTitle: "cli.alora.commands" +--- + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Functions + +
+ + +### FUNC `alora_train` + +```python +alora_train(datafile: str = typer.Argument(..., help='JSONL file with item/label pairs'), basemodel: str = typer.Option(..., help='Base model ID or path'), outfile: str = typer.Option(..., help='Path to save adapter weights'), promptfile: str = typer.Option(None, help='Path to load the prompt format file'), adapter: str = typer.Option('alora', help='Adapter type: alora or lora'), epochs: int = typer.Option(6, help='Number of training epochs'), learning_rate: float = typer.Option(6e-06, help='Learning rate'), batch_size: int = typer.Option(2, help='Per-device batch size'), max_length: int = typer.Option(1024, help='Max sequence length'), grad_accum: int = typer.Option(4, help='Gradient accumulation steps')) +``` + + +Train an aLoRA or LoRA model on your dataset. + + +
+ +### FUNC `alora_upload` + +```python +alora_upload(weightfile: str = typer.Argument(..., help='Path to saved adapter weights'), name: str = typer.Option(..., help='Destination model name (e.g., acme/carbchecker-alora)')) +``` + + +Upload trained adapter to remote model registry. + +
diff --git a/docs/docs/api/cli/alora/train.mdx b/docs/docs/api/cli/alora/train.mdx new file mode 100644 index 000000000..f98a53286 --- /dev/null +++ b/docs/docs/api/cli/alora/train.mdx @@ -0,0 +1,76 @@ +--- +title: "cli.alora.train" +sidebarTitle: "cli.alora.train" +--- + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Functions + +
+ + +### FUNC `load_dataset_from_json` + +```python +load_dataset_from_json(json_path, tokenizer, invocation_prompt) +``` + +
+ +### FUNC `formatting_prompts_func` + +```python +formatting_prompts_func(example) +``` + +
+ +### FUNC `train_model` + +```python +train_model(dataset_path: str, base_model: str, output_file: str, prompt_file: str | None = None, adapter: str = 'alora', run_name: str = 'multiclass_run', epochs: int = 6, learning_rate: float = 6e-06, batch_size: int = 2, max_length: int = 1024, grad_accum: int = 4) +``` + +
+ +## Classes + +
+ + +### CLASS `SaveBestModelCallback` + + +
+**Methods:** + +
+ +#### FUNC `on_evaluate` + +```python +on_evaluate(self, args, state, control, **kwargs) +``` + +
+ +### CLASS `SafeSaveTrainer` + + +
+**Methods:** + +
+ +#### FUNC `save_model` + +```python +save_model(self, output_dir: str | None = None, _internal_call: bool = False) +``` + +
diff --git a/docs/docs/api/cli/alora/upload.mdx b/docs/docs/api/cli/alora/upload.mdx new file mode 100644 index 000000000..6cc862cb9 --- /dev/null +++ b/docs/docs/api/cli/alora/upload.mdx @@ -0,0 +1,31 @@ +--- +title: "cli.alora.upload" +sidebarTitle: "cli.alora.upload" +--- + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Functions + +
+ + +### FUNC `upload_model` + +```python +upload_model(weight_path: str, model_name: str, private: bool = True) +``` + + +Upload a trained adapter (LoRA/aLoRA) to Hugging Face Hub. + +**Args:** +- `weight_path`: Directory containing adapter weights (from save_pretrained). +- `model_name`: Target model repo name (e.g., "acme/carbchecker-alora"). +- `private`: Whether the repo should be private. Default\: True. + +
diff --git a/docs/docs/api/cli/decompose/decompose.mdx b/docs/docs/api/cli/decompose/decompose.mdx new file mode 100644 index 000000000..9e49764ed --- /dev/null +++ b/docs/docs/api/cli/decompose/decompose.mdx @@ -0,0 +1,36 @@ +--- +title: "cli.decompose.decompose" +sidebarTitle: "cli.decompose.decompose" +--- + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Functions + +
+ + +### FUNC `run` + +```python +run(out_dir: Annotated[Path, typer.Option(help='Path to an existing directory to save the output files.')], out_name: Annotated[str, typer.Option(help='Name for the output files. Defaults to "m_result".')] = 'm_decomp_result', prompt_file: Annotated[typer.FileText | None, typer.Option(help='Path to a raw text file containing a task prompt.')] = None, model_id: Annotated[str, typer.Option(help='Model name/id to be used to run the decomposition pipeline.' + ' Defaults to "mistral-small3.2:latest", which is valid for the "ollama" backend.' + " If you have a vLLM instance serving a model from HF with vLLM's OpenAI" + " compatible endpoint, then this option should be set to the model's HF name/id," + ' e.g. "mistralai/Mistral-Small-3.2-24B-Instruct-2506" and the "--backend" option' + ' should be set to "openai".')] = 'mistral-small3.2:latest', backend: Annotated[DecompBackend, typer.Option(help='Backend to be used for inference. Defaults to "ollama".' + ' Options are: "ollama" and "openai".' + ' The "ollama" backend runs a local inference server.' + ' The "openai" backend will send inference requests to any' + " endpoint that's OpenAI compatible.", case_sensitive=False)] = DecompBackend.ollama, backend_req_timeout: Annotated[int, typer.Option(help='Time (in seconds) for timeout to be passed on the model inference requests. Defaults to "300"')] = 300, backend_endpoint: Annotated[str | None, typer.Option(help='The "endpoint URL", sometimes called "base URL",' + ' to reach the model when using the "openai" backend.' + ' This option is required if using "--backend openai".')] = None, backend_api_key: Annotated[str | None, typer.Option(help='The API key for the configured "--backend-endpoint".' + ' If using "--backend openai" this option must be set,' + ' even if you are running locally (an OpenAI compatible server), you' + ' must set this option, it can be set to "EMPTY" if your local' + " server doesn't need it.")] = None, version: Annotated[DecompVersion, typer.Option(help='Version of the mellea program generator template to be used.', case_sensitive=False)] = DecompVersion.latest, input_var: Annotated[list[str] | None, typer.Option(help='If your task needs user input data, you must pass' + ' a descriptive variable name using this option, this way' + ' the variable names can be templated into the generated prompts.' + ' You can pass this option multiple times, one for each input variable name.' + ' These names must be all uppercase, alphanumeric, with words separated by underscores.')] = None) -> None +``` + + +Runs the decomposition pipeline. + + +
+ +## Classes + +
+ + +### CLASS `DecompVersion` + +
diff --git a/docs/docs/api/cli/decompose/pipeline.mdx b/docs/docs/api/cli/decompose/pipeline.mdx new file mode 100644 index 000000000..f40cf6166 --- /dev/null +++ b/docs/docs/api/cli/decompose/pipeline.mdx @@ -0,0 +1,44 @@ +--- +title: "cli.decompose.pipeline" +sidebarTitle: "cli.decompose.pipeline" +--- + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Functions + +
+ + +### FUNC `decompose` + +```python +decompose(task_prompt: str, user_input_variable: list[str] | None = None, model_id: str = 'mistral-small3.2:latest', backend: DecompBackend = DecompBackend.ollama, backend_req_timeout: int = 300, backend_endpoint: str | None = None, backend_api_key: str | None = None) -> DecompPipelineResult +``` + +
+ +## Classes + +
+ + +### CLASS `ConstraintResult` + +
+ +### CLASS `DecompSubtasksResult` + +
+ +### CLASS `DecompPipelineResult` + +
+ +### CLASS `DecompBackend` + +
diff --git a/docs/docs/api/cli/decompose/utils.mdx b/docs/docs/api/cli/decompose/utils.mdx new file mode 100644 index 000000000..ce7922281 --- /dev/null +++ b/docs/docs/api/cli/decompose/utils.mdx @@ -0,0 +1,23 @@ +--- +title: "cli.decompose.utils" +sidebarTitle: "cli.decompose.utils" +--- + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Functions + +
+ + +### FUNC `validate_filename` + +```python +validate_filename(candidate_str: str) -> bool +``` + +
diff --git a/docs/docs/api/cli/eval/commands.mdx b/docs/docs/api/cli/eval/commands.mdx new file mode 100644 index 000000000..78850db0f --- /dev/null +++ b/docs/docs/api/cli/eval/commands.mdx @@ -0,0 +1,27 @@ +--- +title: "cli.eval.commands" +sidebarTitle: "cli.eval.commands" +description: "Use the eval command for LLM-as-a-judge evaluation, given a (set of) test file(s) consisting of prompts, instructions, and optionally, targets." +--- + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +Instantiate a generator model to produce candidate responses, and a judge model to determine whether the instructions have been followed. + +## Functions + +
+ + +### FUNC `eval_run` + +```python +eval_run(test_files: list[str] = typer.Argument(..., help='List of paths to json/jsonl files containing test cases'), backend: str = typer.Option('ollama', '--backend', '-b', help='Generation backend'), model: str = typer.Option(None, '--model', help='Generation model name'), max_gen_tokens: int = typer.Option(256, '--max-gen-tokens', help='Max tokens to generate for responses'), judge_backend: str = typer.Option(None, '--judge-backend', '-jb', help='Judge backend'), judge_model: str = typer.Option(None, '--judge-model', help='Judge model name'), max_judge_tokens: int = typer.Option(256, '--max-judge-tokens', help="Max tokens for the judge model's judgement."), output_path: str = typer.Option('eval_results', '--output-path', '-o', help='Output path for results'), output_format: str = typer.Option('json', '--output-format', help='Either json or jsonl format for results'), continue_on_error: bool = typer.Option(True, '--continue-on-error')) +``` + +
diff --git a/docs/docs/api/cli/eval/runner.mdx b/docs/docs/api/cli/eval/runner.mdx new file mode 100644 index 000000000..2b030185a --- /dev/null +++ b/docs/docs/api/cli/eval/runner.mdx @@ -0,0 +1,146 @@ +--- +title: "cli.eval.runner" +sidebarTitle: "cli.eval.runner" +--- + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Functions + +
+ + +### FUNC `create_session` + +```python +create_session(backend: str, model: str | None, max_tokens: int | None) -> mellea.MelleaSession +``` + + +Create a mellea session with the specified backend and model. + + +
+ +### FUNC `run_evaluations` + +```python +run_evaluations(test_files: List[str], backend: str, model: str | None, max_gen_tokens: int | None, judge_backend: str | None, judge_model: str | None, max_judge_tokens: int | None, output_path: str, output_format: str, continue_on_error: bool) +``` + + +Run all 'unit test' evaluations + + +
+ +### FUNC `execute_test_eval` + +```python +execute_test_eval(test_eval: TestBasedEval, generation_session: mellea.MelleaSession, judge_session: mellea.MelleaSession) -> TestEvalResult +``` + + +Execute a single test evaluation +For each input in the test, generate a response using generation_session +Then, after all inputs are processed, validate using judge_session. + + +
+ +### FUNC `parse_judge_output` + +```python +parse_judge_output(judge_output: str) +``` + +
+ +### FUNC `save_results` + +```python +save_results(results: List[TestEvalResult], output_path: str, output_format: str) +``` + +
+ +### FUNC `summary_stats` + +```python +summary_stats(results: List[TestEvalResult]) +``` + +
+ +## Classes + +
+ + +### CLASS `InputEvalResult` + + +Store results of a single input evaluation (within a unit test). + + + +
+**Methods:** + +
+ +#### FUNC `to_dict` + +```python +to_dict(self) +``` + +
+ +### CLASS `TestEvalResult` + + +Store results of a single test evaluation. + + + +
+**Methods:** + +
+ +#### FUNC `to_dict` + +```python +to_dict(self) +``` + +
+ +#### FUNC `passed_count` + +```python +passed_count(self) -> int +``` + +
+ +#### FUNC `total_count` + +```python +total_count(self) -> int +``` + +
+ +#### FUNC `pass_rate` + +```python +pass_rate(self) -> float +``` + +
diff --git a/docs/docs/api/cli/m.mdx b/docs/docs/api/cli/m.mdx new file mode 100644 index 000000000..d4f800aff --- /dev/null +++ b/docs/docs/api/cli/m.mdx @@ -0,0 +1,29 @@ +--- +title: "cli.m" +sidebarTitle: "cli.m" +description: "Entrypoint for the M CLI." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Functions + +
+ + +### FUNC `callback` + +```python +callback() -> None +``` + + +Perform M Tasks. + +
diff --git a/docs/docs/api/mellea/backends/adapters/adapter.mdx b/docs/docs/api/mellea/backends/adapters/adapter.mdx new file mode 100644 index 000000000..b2766fed7 --- /dev/null +++ b/docs/docs/api/mellea/backends/adapters/adapter.mdx @@ -0,0 +1,250 @@ +--- +title: "mellea.backends.adapters.adapter" +sidebarTitle: "mellea.backends.adapters.adapter" +description: "Module for adapters to backends." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Functions + +
+ + +### FUNC `get_adapter_for_intrinsic` + +```python +get_adapter_for_intrinsic(intrinsic_name: str, intrinsic_adapter_types: list[AdapterType] | tuple[AdapterType, ...], available_adapters: dict[str, T]) -> T | None +``` + + +Finds an adapter from a dict of available adapters based on the intrinsic name and its allowed adapter types. + +**Args:** +- `intrinsic_name`: the name of the intrinsic, like "answerability" +- `intrinsic_adapter_types`: the adapter types allowed for this intrinsic, like ALORA / LORA +- `available_adapters`: the available adapters to choose from; maps adapter.qualified_name to the Adapter + +**Returns:** +- an Adapter if found; else None + + +
+ +## Classes + +
+ + +### CLASS `Adapter` + + +An adapter that can be added to a single backend. + + +
+ +### CLASS `OpenAIAdapter` + + +Adapter for OpenAIBackends. + + + +
+**Methods:** + +
+ +#### FUNC `get_open_ai_path` + +```python +get_open_ai_path(self, base_model_name: str, server_type: _ServerType = _ServerType.LOCALHOST, remote_path: str | None = None) -> str +``` + +Returns the path needed to load the adapter. + +**Args:** +- `base_model_name`: the base model; typically the last part of the huggingface model id like "granite-3.3-8b-instruct" +- `server_type`: the server type (ie LOCALHOST / OPENAI); usually the backend has information on this +- `remote_path`: optional; used only if the server_type is REMOTE_VLLM; base path at which to find the adapter + + +
+ +### CLASS `LocalHFAdapter` + + +Adapter for LocalHFBackends. + + + +
+**Methods:** + +
+ +#### FUNC `get_local_hf_path` + +```python +get_local_hf_path(self, base_model_name: str) -> str +``` + +Returns the path needed to load the adapter. + +**Args:** +- `base_model_name`: the base model; typically the last part of the huggingface model id like "granite-3.3-8b-instruct" + + +
+ +### CLASS `GraniteCommonAdapter` + + +Adapter for intrinsics that utilize the ``granite-common`` library. + + + +
+**Methods:** + +
+ +#### FUNC `get_open_ai_path` + +```python +get_open_ai_path(self, base_model_name: str, server_type: _ServerType = _ServerType.LOCALHOST, remote_path: str | None = None) -> str +``` + +Returns the path needed to load the adapter. + +**Args:** +- `base_model_name`: the base model; typically the last part of the huggingface +model id like "granite-3.3-8b-instruct" +- `server_type`: the server type (ie LOCALHOST / OPENAI); usually the backend +has information on this +- `remote_path`: optional; used only if the server_type is REMOTE_VLLM; base +path at which to find the adapter + + +
+ +#### FUNC `get_local_hf_path` + +```python +get_local_hf_path(self, base_model_name: str) -> str +``` + +Returns the path needed to load the adapter. + +**Args:** +- `base_model_name`: the base model; typically the last part of the huggingface +model id like "granite-3.3-8b-instruct" + + +
+ +#### FUNC `download_and_get_path` + +```python +download_and_get_path(self, base_model_name: str) -> str +``` + +Downloads the required rag intrinsics files if necessary and returns the path to them. + +**Args:** +- `base_model_name`: the base model; typically the last part of the huggingface +model id like "granite-3.3-8b-instruct" + +**Returns:** +- a path to the files + + +
+ +#### FUNC `get_path_on_remote` + +```python +get_path_on_remote(self, base_model_name: str, base_path: str) -> str +``` + +Assumes the files have already been downloaded on the remote server. + + +
+ +### CLASS `AdapterMixin` + + +Mixin class for backends capable of utilizing adapters. + + + +
+**Methods:** + +
+ +#### FUNC `base_model_name` + +```python +base_model_name(self) -> str +``` + +Returns the base_model_id of the model used by the backend. For example, `granite-3.3-8b-instruct` for `ibm-granite/granite-3.3-8b-instruct`. + + +
+ +#### FUNC `add_adapter` + +```python +add_adapter(self, *args, **kwargs) +``` + +Adds the given adapter to the backend. Must not have been added to a different backend. + + +
+ +#### FUNC `load_adapter` + +```python +load_adapter(self, adapter_qualified_name: str) +``` + +Loads the given adapter for the backend. Must have previously been added. + + +
+ +#### FUNC `unload_adapter` + +```python +unload_adapter(self, adapter_qualified_name: str) +``` + +Unloads the given adapter from the backend. + + +
+ +#### FUNC `list_adapters` + +```python +list_adapters(self) -> list[str] +``` + +Lists the adapters added via add_adapter(). + +:returns: list of adapter names that are currently registered with this backend + +
diff --git a/docs/docs/api/mellea/backends/adapters/catalog.mdx b/docs/docs/api/mellea/backends/adapters/catalog.mdx new file mode 100644 index 000000000..8e1e0a764 --- /dev/null +++ b/docs/docs/api/mellea/backends/adapters/catalog.mdx @@ -0,0 +1,72 @@ +--- +title: "mellea.backends.adapters.catalog" +sidebarTitle: "mellea.backends.adapters.catalog" +description: "Catalog of available intrinsics." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +Catalog of intrinsics currently known to Mellea, including metadata about where to find +LoRA and aLoRA adapters that implement said intrinsics. + + +## Functions + +
+ + +### FUNC `known_intrinsic_names` + +```python +known_intrinsic_names() -> list[str] +``` + + +:returns: List of all known user-visible names for intrinsics. + + +
+ +### FUNC `fetch_intrinsic_metadata` + +```python +fetch_intrinsic_metadata(intrinsic_name: str) -> IntriniscsCatalogEntry +``` + + +Retrieve information about the adapter that backs an intrinsic. + +:param intrinsic_name: User-visible name of the intrinsic + +:returns: Metadata about the adapter(s) that implement the intrinsic. + + +
+ +## Classes + +
+ + +### CLASS `AdapterType` + + +Possible types of adapters for a backend. + + +
+ +### CLASS `IntriniscsCatalogEntry` + + +A single row in the main intrinsics catalog table. + +We use Pydantic for this dataclass because the rest of Mellea also uses Pydantic. + +
diff --git a/docs/docs/api/mellea/backends/backend.mdx b/docs/docs/api/mellea/backends/backend.mdx new file mode 100644 index 000000000..b3c24e136 --- /dev/null +++ b/docs/docs/api/mellea/backends/backend.mdx @@ -0,0 +1,34 @@ +--- +title: "mellea.backends.backend" +sidebarTitle: "mellea.backends.backend" +description: "FormatterBackend." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Classes + +
+ + +### CLASS `FormatterBackend` + + +`FormatterBackend`s support legacy model types. + +The `mellea` library was designed to support generative computing with [spanned attention](https://generative.computing/what-are-spans.html) over [generative programming primitives](https://generative.computing/what-are-generative-programs.html). +In the ideal world, context management is handled via span scope-relations and all generative programming primitives are baked into the model via fine-tuning. +I.e., the model's instruction tuning is done in terms of generative programming primitives, and the model is then prompted with the same set of templates that were used for that tuning. + +Today, most models do not yet support spans and even those that do are not properly tuned to leverage generative programming primitives. +The `mellea` library supports these legacy models primarily through prompt engineering surfaced via `FormatterBackends`. +A `FormatterBackend` is a backend that uses hand-engineered prompts for rendering generative programming primitives to a model and parsing responses from the model back into `mellea`. +By default, a `FormatterBackend` uses jinja2 templates for pretty-printing, and relies on the user's ad-hoc logic for parsing. + +
diff --git a/docs/docs/api/mellea/backends/cache.mdx b/docs/docs/api/mellea/backends/cache.mdx new file mode 100644 index 000000000..b70093d8f --- /dev/null +++ b/docs/docs/api/mellea/backends/cache.mdx @@ -0,0 +1,107 @@ +--- +title: "mellea.backends.cache" +sidebarTitle: "mellea.backends.cache" +description: "Caching strategies." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Classes + +
+ + +### CLASS `Cache` + + +A Cache for storing model state (e.g., kv cache). + + + +
+**Methods:** + +
+ +#### FUNC `put` + +```python +put(self, key: str, value: Any) +``` + +Inserts into the cache. May result in eviction of other cached values. + + +
+ +#### FUNC `get` + +```python +get(self, key: str) -> Any | None +``` + +Retrieves a value from the cache. Returns `None` if the `id` has no cached value. May impact which cache values are evicted. + + +
+ +#### FUNC `current_size` + +```python +current_size(self) -> int +``` + +Returns the number of things currently in the cache. Mostly useful for debugging. + + +
+ +### CLASS `SimpleLRUCache` + + +A simple [LRU](https://en.wikipedia.org/wiki/Cache_replacement_policies#Least_Recently_Used_(LRU)) cache. + + + +
+**Methods:** + +
+ +#### FUNC `current_size` + +```python +current_size(self) +``` + +Just return the size of the key set. This isn't necessarily safe. + + +
+ +#### FUNC `get` + +```python +get(self, key: str) -> Any | None +``` + +Gets a value from the cache. + + +
+ +#### FUNC `put` + +```python +put(self, key: str, value: Any) +``` + +Put a value into the cache. + +
diff --git a/docs/docs/api/mellea/backends/dummy.mdx b/docs/docs/api/mellea/backends/dummy.mdx new file mode 100644 index 000000000..0d2546a6c --- /dev/null +++ b/docs/docs/api/mellea/backends/dummy.mdx @@ -0,0 +1,40 @@ +--- +title: "mellea.backends.dummy" +sidebarTitle: "mellea.backends.dummy" +description: "This module holds shim backends used for smoke tests." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Classes + +
+ + +### CLASS `DummyBackend` + + +A backend for smoke testing. + + + +
+**Methods:** + +
+ +#### FUNC `generate_from_context` + +```python +generate_from_context(self, action: Component[C] | CBlock, ctx: Context) -> tuple[ModelOutputThunk[C], Context] +``` + +See constructor for an exmplanation of how DummyBackends work. + +
diff --git a/docs/docs/api/mellea/backends/huggingface.mdx b/docs/docs/api/mellea/backends/huggingface.mdx new file mode 100644 index 000000000..9ca7bd662 --- /dev/null +++ b/docs/docs/api/mellea/backends/huggingface.mdx @@ -0,0 +1,183 @@ +--- +title: "mellea.backends.huggingface" +sidebarTitle: "mellea.backends.huggingface" +description: "A backend that uses the Huggingface Transformers library." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +The purpose of the Huggingface backend is to provide a setting for implementing experimental features. If you want a performant local backend, and do not need experimental features such as Span-based context or ALoras, consider using Ollama backends instead. + + +## Classes + +
+ + +### CLASS `HFAloraCacheInfo` + + +A dataclass for holding some KV cache and associated information. + + +
+ +### CLASS `LocalHFBackend` + + +The LocalHFBackend uses Huggingface's transformers library for inference, and uses a Formatter to convert `Component`s into prompts. This backend also supports Activated LoRAs (ALoras)](https://arxiv.org/pdf/2504.12397). + +This backend is designed for running an HF model for small-scale inference locally on your machine. + +This backend is NOT designed for inference scaling on CUDA-enabled hardware. + + + +
+**Methods:** + +
+ +#### FUNC `generate_from_context` + +```python +generate_from_context(self, action: Component[C] | CBlock, ctx: Context) -> tuple[ModelOutputThunk[C], Context] +``` + +Generate using the huggingface model. + + +
+ +#### FUNC `processing` + +```python +processing(self, mot: ModelOutputThunk, chunk: str | GenerateDecoderOnlyOutput, input_ids) +``` + +Process the returned chunks or the complete response. + + +
+ +#### FUNC `post_processing` + +```python +post_processing(self, mot: ModelOutputThunk, conversation: list[dict], _format: type[BaseModelSubclass] | None, tool_calls: bool, tools: dict[str, Callable], seed, input_ids) +``` + +Called when generation is done. + + +
+ +#### FUNC `generate_from_raw` + +```python +generate_from_raw(self, actions: list[Component[C]], ctx: Context) -> list[ModelOutputThunk[C]] +``` + +
+ +#### FUNC `generate_from_raw` + +```python +generate_from_raw(self, actions: list[Component[C] | CBlock], ctx: Context) -> list[ModelOutputThunk[C | str]] +``` + +
+ +#### FUNC `generate_from_raw` + +```python +generate_from_raw(self, actions: Sequence[Component[C] | CBlock], ctx: Context) -> list[ModelOutputThunk] +``` + +Generate using the completions api. Gives the input provided to the model without templating. + + +
+ +#### FUNC `cache_get` + +```python +cache_get(self, id: str) -> HFAloraCacheInfo | None +``` + +Retrieve from cache. + + +
+ +#### FUNC `cache_put` + +```python +cache_put(self, id: str, v: HFAloraCacheInfo) +``` + +Put into cache. + + +
+ +#### FUNC `base_model_name` + +```python +base_model_name(self) +``` + +Returns the base_model_id of the model used by the backend. For example, `granite-3.3-8b-instruct` for `ibm-granite/granite-3.3-8b-instruct`. + + +
+ +#### FUNC `add_adapter` + +```python +add_adapter(self, adapter: LocalHFAdapter) +``` + +Adds the given adapter to the backend. Must not have been added to a different backend. + + +
+ +#### FUNC `load_adapter` + +```python +load_adapter(self, adapter_qualified_name: str) +``` + +Loads the given adapter for the backend. Must have previously been added. Do not call when generation requests are happening. + + +
+ +#### FUNC `unload_adapter` + +```python +unload_adapter(self, adapter_qualified_name: str) +``` + +Unloads the given adapter from the backend. + + +
+ +#### FUNC `list_adapters` + +```python +list_adapters(self) -> list[str] +``` + +Lists the adapters added via add_adapter(). + +:returns: list of adapter names that are currently registered with this backend + +
diff --git a/docs/docs/api/mellea/backends/kv_block_helpers.mdx b/docs/docs/api/mellea/backends/kv_block_helpers.mdx new file mode 100644 index 000000000..9ca29d882 --- /dev/null +++ b/docs/docs/api/mellea/backends/kv_block_helpers.mdx @@ -0,0 +1,53 @@ +--- +title: "mellea.backends.kv_block_helpers" +sidebarTitle: "mellea.backends.kv_block_helpers" +description: "Utilities for KV smashing." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Functions + +
+ + +### FUNC `legacy_cache_smash` + +```python +legacy_cache_smash(a: LegacyCache, b: LegacyCache) -> LegacyCache +``` + + +Concatenates two LegacyCache Ks and Vs along the time axis. + + +
+ +### FUNC `merge_dynamic_caches` + +```python +merge_dynamic_caches(caches: Iterable[DynamicCache]) -> DynamicCache +``` + + +Merges two DynamicCache Ks and Vs along the time axis. + + +
+ +### FUNC `tokens_to_legacy_cache` + +```python +tokens_to_legacy_cache(model, device: str, tokens_or_cache: BatchEncoding | DynamicCache) -> Iterable[LegacyCache] +``` + + +Prefills and returns Ks and Vs as a LegacyCache. + +
diff --git a/docs/docs/api/mellea/backends/litellm.mdx b/docs/docs/api/mellea/backends/litellm.mdx new file mode 100644 index 000000000..3bfd06748 --- /dev/null +++ b/docs/docs/api/mellea/backends/litellm.mdx @@ -0,0 +1,91 @@ +--- +title: "mellea.backends.litellm" +sidebarTitle: "mellea.backends.litellm" +description: "A generic LiteLLM compatible backend that wraps around the openai python sdk." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Classes + +
+ + +### CLASS `LiteLLMBackend` + + +A generic LiteLLM compatible backend. + + + +
+**Methods:** + +
+ +#### FUNC `generate_from_context` + +```python +generate_from_context(self, action: Component[C] | CBlock, ctx: Context) -> tuple[ModelOutputThunk[C], Context] +``` + +See `generate_from_chat_context`. + + +
+ +#### FUNC `processing` + +```python +processing(self, mot: ModelOutputThunk, chunk: litellm.ModelResponse | litellm.ModelResponseStream) +``` + +Called during generation to add information from a single ModelResponse or a chunk / ModelResponseStream to the ModelOutputThunk. + +For LiteLLM, tool call parsing is handled in the post processing step. + + +
+ +#### FUNC `post_processing` + +```python +post_processing(self, mot: ModelOutputThunk, conversation: list[dict], tools: dict[str, Callable], thinking, _format) +``` + +Called when generation is done. + + +
+ +#### FUNC `generate_from_raw` + +```python +generate_from_raw(self, actions: list[Component[C]], ctx: Context) -> list[ModelOutputThunk[C]] +``` + +
+ +#### FUNC `generate_from_raw` + +```python +generate_from_raw(self, actions: list[Component[C] | CBlock], ctx: Context) -> list[ModelOutputThunk[C | str]] +``` + +
+ +#### FUNC `generate_from_raw` + +```python +generate_from_raw(self, actions: Sequence[Component[C] | CBlock], ctx: Context) -> list[ModelOutputThunk] +``` + +Generate using the completions api. Gives the input provided to the model without templating. + +
diff --git a/docs/docs/api/mellea/backends/model_ids.mdx b/docs/docs/api/mellea/backends/model_ids.mdx new file mode 100644 index 000000000..99a85cc9a --- /dev/null +++ b/docs/docs/api/mellea/backends/model_ids.mdx @@ -0,0 +1,25 @@ +--- +title: "mellea.backends.model_ids" +sidebarTitle: "mellea.backends.model_ids" +description: "Dataclasses for ModelIdentifiers." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Classes + +
+ + +### CLASS `ModelIdentifier` + + +The `ModelIdentifier` class wraps around model identification strings. + +
diff --git a/docs/docs/api/mellea/backends/model_options.mdx b/docs/docs/api/mellea/backends/model_options.mdx new file mode 100644 index 000000000..bec040c5e --- /dev/null +++ b/docs/docs/api/mellea/backends/model_options.mdx @@ -0,0 +1,94 @@ +--- +title: "mellea.backends.model_options" +sidebarTitle: "mellea.backends.model_options" +description: "Common ModelOptions for Backend Generation." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Classes + +
+ + +### CLASS `ModelOption` + + +A type that wraps around model options. + +Uses sentinel values (wrapped by @@@) to provide backend and model-agnostic keys for common model options. + +Create a dictionary containing model options like this: + +```python +from mellea.backends import ModelOption +model_options = { + ModelOption.TEMPERATURE : 0.0, + ModelOption.SYSTEM_PROMPT : "You are a helpful assistant" +} +``` + + + +
+**Methods:** + +
+ +#### FUNC `replace_keys` + +```python +replace_keys(options: dict, from_to: dict[str, str]) -> dict[str, Any] +``` + +Returns a new dict with the keys in `options` replaced with the corresponding value for that key in `from_to`. + +* Any key with value == None is treated the same as the key missing. + +* If the destination key already exists in `options`, the original value is kept in the output. + +* Regardless of the presence of the destination key in `options`, + the source key is always absent in the output. + +Example: +```python +>>> options = {"k1": "v1", "k2": "v2", "M1": "m1"} +>>> from_to = {"k1": "M1", "k2": "M2"} + +>>> new_options = replace_keys(options, from_to) +>>> print(new_options) +... {"M1": "m1", "M2": "v2"} +``` + +* Notice that "M1" keeps the original value "m1", rather than "v1". +* Notice that both "k1" and "k2" are absent in the output. + + +
+ +#### FUNC `remove_special_keys` + +```python +remove_special_keys(model_options) -> dict[str, Any] +``` + +Removes all sentiel-valued keys (i.e., those that start with @@@). + + +
+ +#### FUNC `merge_model_options` + +```python +merge_model_options(persistent_opts: dict[str, Any], overwrite_opts: dict[str, Any] | None) -> dict[str, Any] +``` + +Creates a new dict that contains all keys and values from persistent opts and overwrite opts. If there are duplicate keys, overwrite opts key value pairs will be used. + +
diff --git a/docs/docs/api/mellea/backends/ollama.mdx b/docs/docs/api/mellea/backends/ollama.mdx new file mode 100644 index 000000000..be984f6d4 --- /dev/null +++ b/docs/docs/api/mellea/backends/ollama.mdx @@ -0,0 +1,146 @@ +--- +title: "mellea.backends.ollama" +sidebarTitle: "mellea.backends.ollama" +description: "A model backend wrapping the Ollama Python SDK." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Functions + +
+ + +### FUNC `chat_response_delta_merge` + +```python +chat_response_delta_merge(mot: ModelOutputThunk, delta: ollama.ChatResponse) +``` + + +Merges the individual ChatResponse chunks from a streaming response into a single ChatResponse. + +**Args:** +- `mot`: the ModelOutputThunk that the deltas are being used to populated. +- `delta`: the most recent ollama ChatResponse. + + +
+ +## Classes + +
+ + +### CLASS `OllamaModelBackend` + + +A model that uses the Ollama Python SDK for local inference. + + + +
+**Methods:** + +
+ +#### FUNC `is_model_available` + +```python +is_model_available(self, model_name) +``` + +Checks if a specific Ollama model is available locally. + +**Args:** +- `model_name`: The name of the model to check for (e.g., "llama2"). + +**Returns:** +- True if the model is available, False otherwise. + + +
+ +#### FUNC `generate_from_context` + +```python +generate_from_context(self, action: Component[C] | CBlock, ctx: Context) -> tuple[ModelOutputThunk[C], Context] +``` + +See `generate_from_chat_context`. + + +
+ +#### FUNC `generate_from_chat_context` + +```python +generate_from_chat_context(self, action: Component[C] | CBlock, ctx: Context) -> ModelOutputThunk[C] +``` + +Generates a ModelOutputThunk. The final value for this object can be awaited. + +The new completion is generated from the provided Context using this backend's `Formatter`. + +This implementation treats the `Context` as a chat history, and uses the `ollama.Client.chat()` interface to generate a completion. +This will not always work, because sometimes we want to use non-chat models. + +**Raises:** +- `RuntimeError`: If not called from a thread with a running event loop. + + +
+ +#### FUNC `generate_from_raw` + +```python +generate_from_raw(self, actions: list[Component[C]], ctx: Context) -> list[ModelOutputThunk[C]] +``` + +
+ +#### FUNC `generate_from_raw` + +```python +generate_from_raw(self, actions: list[Component[C] | CBlock], ctx: Context) -> list[ModelOutputThunk[C | str]] +``` + +
+ +#### FUNC `generate_from_raw` + +```python +generate_from_raw(self, actions: Sequence[Component[C] | CBlock], ctx: Context) -> list[ModelOutputThunk] +``` + +Generate using the generate api. Gives the input provided to the model without templating. + + +
+ +#### FUNC `processing` + +```python +processing(self, mot: ModelOutputThunk, chunk: ollama.ChatResponse, tools: dict[str, Callable]) +``` + +Called during generation to add information from a single ChatResponse to the ModelOutputThunk. + + +
+ +#### FUNC `post_processing` + +```python +post_processing(self, mot: ModelOutputThunk, conversation: list[dict], tools: dict[str, Callable], _format) +``` + +Called when generation is done. + +
diff --git a/docs/docs/api/mellea/backends/openai.mdx b/docs/docs/api/mellea/backends/openai.mdx new file mode 100644 index 000000000..fa6d49c70 --- /dev/null +++ b/docs/docs/api/mellea/backends/openai.mdx @@ -0,0 +1,196 @@ +--- +title: "mellea.backends.openai" +sidebarTitle: "mellea.backends.openai" +description: "A generic OpenAI compatible backend that wraps around the openai python sdk." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Classes + +
+ + +### CLASS `OpenAIBackend` + + +A generic OpenAI compatible backend. + + + +
+**Methods:** + +
+ +#### FUNC `filter_openai_client_kwargs` + +```python +filter_openai_client_kwargs(**kwargs) -> dict +``` + +Filter kwargs to only include valid OpenAI client parameters. + + +
+ +#### FUNC `filter_chat_completions_kwargs` + +```python +filter_chat_completions_kwargs(self, model_options: dict) -> dict +``` + +Filter kwargs to only include valid OpenAI chat.completions.create parameters. + +https://platform.openai.com/docs/api-reference/chat/create + + +
+ +#### FUNC `filter_completions_kwargs` + +```python +filter_completions_kwargs(self, model_options: dict) -> dict +``` + +Filter kwargs to only include valid OpenAI completions.create parameters. + +https://platform.openai.com/docs/api-reference/completions + + +
+ +#### FUNC `generate_from_context` + +```python +generate_from_context(self, action: Component[C] | CBlock, ctx: Context) -> tuple[ModelOutputThunk[C], Context] +``` + +See `generate_from_chat_context`. + + +
+ +#### FUNC `generate_from_chat_context` + +```python +generate_from_chat_context(self, action: Component[C] | CBlock, ctx: Context) -> tuple[ModelOutputThunk[C], Context] +``` + +Generates a new completion from the provided Context using this backend's `Formatter`. + + +
+ +#### FUNC `processing` + +```python +processing(self, mot: ModelOutputThunk, chunk: ChatCompletion | ChatCompletionChunk) +``` + +Called during generation to add information from a single ChatCompletion or ChatCompletionChunk to the ModelOutputThunk. + +For OpenAI, tool call parsing is handled in the post processing step. + + +
+ +#### FUNC `post_processing` + +```python +post_processing(self, mot: ModelOutputThunk, tools: dict[str, Callable], conversation: list[dict], thinking, seed, _format) +``` + +Called when generation is done. + + +
+ +#### FUNC `generate_from_raw` + +```python +generate_from_raw(self, actions: list[Component[C]], ctx: Context) -> list[ModelOutputThunk[C]] +``` + +
+ +#### FUNC `generate_from_raw` + +```python +generate_from_raw(self, actions: list[Component[C] | CBlock], ctx: Context) -> list[ModelOutputThunk[C | str]] +``` + +
+ +#### FUNC `generate_from_raw` + +```python +generate_from_raw(self, actions: Sequence[Component[C] | CBlock], ctx: Context) -> list[ModelOutputThunk] +``` + +Generate using the completions api. Gives the input provided to the model without templating. + + +
+ +#### FUNC `base_model_name` + +```python +base_model_name(self) +``` + +Returns the base_model_id of the model used by the backend. For example, `granite-3.3-8b-instruct` for `ibm-granite/granite-3.3-8b-instruct`. + + +
+ +#### FUNC `add_adapter` + +```python +add_adapter(self, adapter: OpenAIAdapter) +``` + +Adds the given adapter to the backend. Must not have been added to a different backend. + + +
+ +#### FUNC `load_adapter` + +```python +load_adapter(self, adapter_qualified_name: str) +``` + +Loads the given adapter for the backend. Must have previously been added. + + +
+ +#### FUNC `unload_adapter` + +```python +unload_adapter(self, adapter_qualified_name: str) +``` + +Unloads the given adapter from the backend. + + +
+ +#### FUNC `list_adapters` + +```python +list_adapters(self) -> list[str] +``` + +Lists the adapters added via add_adapter(). + +:returns: list of adapter names that are currently registered with this backend + +
diff --git a/docs/docs/api/mellea/backends/tools.mdx b/docs/docs/api/mellea/backends/tools.mdx new file mode 100644 index 000000000..9ff27093f --- /dev/null +++ b/docs/docs/api/mellea/backends/tools.mdx @@ -0,0 +1,156 @@ +--- +title: "mellea.backends.tools" +sidebarTitle: "mellea.backends.tools" +description: "Utilities for dealing with tools." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Functions + +
+ + +### FUNC `add_tools_from_model_options` + +```python +add_tools_from_model_options(tools_dict: dict[str, Callable], model_options: dict[str, Any]) +``` + + +If model_options has tools, add those tools to the tools_dict. + + +
+ +### FUNC `add_tools_from_context_actions` + +```python +add_tools_from_context_actions(tools_dict: dict[str, Callable], ctx_actions: list[Component | CBlock] | None) +``` + + +If any of the actions in ctx_actions have tools in their template_representation, add those to the tools_dict. + + +
+ +### FUNC `convert_tools_to_json` + +```python +convert_tools_to_json(tools: dict[str, Callable]) -> list[dict] +``` + + +Convert tools to json dict representation. + +Notes: +- Huggingface transformers library lets you pass in an array of functions but doesn't like methods. +- WatsonxAI uses `from langchain_ibm.chat_models import convert_to_openai_tool` in their demos, but it gives the same values. +- OpenAI uses the same format / schema. + + +
+ +### FUNC `json_extraction` + +```python +json_extraction(text: str) -> Generator[dict, None, None] +``` + + +Yields the next valid json object in a given string. + + +
+
+### FUNC `find_func`
+
+```python
+find_func(d) -> tuple[str | None, Mapping | None]
+```
+
+
+Find the first function in a JSON-like dictionary.
+
+Most LLMs output tool requests in the form `...{"name": string, "arguments": {}}...`.
+
+
+
+### FUNC `parse_tools`
+
+```python
+parse_tools(llm_response: str) -> list[tuple[str, Mapping]]
+```
+
+
+A simple parser that scans a string for tool calls and attempts to extract them; only works for JSON-based outputs.
+
+
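+
+For example, a raw response that embeds a JSON tool request could be scanned like this (illustrative values only):
+
+```python
+from mellea.backends.tools import parse_tools
+
+raw = 'Calling a tool now: {"name": "get_weather", "arguments": {"city": "Boston"}}'
+for name, arguments in parse_tools(raw):
+    print(name, dict(arguments))  # e.g. get_weather {'city': 'Boston'}
+```
+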
+ +### FUNC `convert_function_to_tool` + +```python +convert_function_to_tool(func: Callable) -> Tool +``` + + +Imported from Ollama. + + +
+ +## Classes + +
+ + +### CLASS `SubscriptableBaseModel` + + +Class imported from Ollama. + + + +
+**Methods:** + +
+ +#### FUNC `get` + +```python +get(self, key: str, default: Any = None) -> Any +``` + +Get. + +>>> msg = Message(role='user') +>>> msg.get('role') +'user' +>>> msg = Message(role='user') +>>> msg.get('nonexistent') +>>> msg = Message(role='user') +>>> msg.get('nonexistent', 'default') +'default' +>>> msg = Message(role='user', tool_calls=[ Message.ToolCall(function=Message.ToolCall.Function(name='foo', arguments={}))]) +>>> msg.get('tool_calls')[0]['function']['name'] +'foo' + + +
+ +### CLASS `Tool` + + +Class imported from Ollama. + +
diff --git a/docs/docs/api/mellea/backends/utils.mdx b/docs/docs/api/mellea/backends/utils.mdx new file mode 100644 index 000000000..a08be31f3 --- /dev/null +++ b/docs/docs/api/mellea/backends/utils.mdx @@ -0,0 +1,43 @@ +--- +title: "mellea.backends.utils" +sidebarTitle: "mellea.backends.utils" +description: "Utilities for Backends." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Functions + +
+
+
+### FUNC `to_chat`
+
+```python
+to_chat(action: Component | CBlock, ctx: Context, formatter: ChatFormatter, system_prompt: str | None) -> list[Chat]
+```
+
+
+Converts a context and an action into a series of dicts to be passed to `apply_chat_template`.
+
+This function is used by local inference backends.
+
+
+ +### FUNC `to_tool_calls` + +```python +to_tool_calls(tools: dict[str, Callable], decoded_result: str) -> dict[str, ModelToolCall] | None +``` + + +Parse a tool call string. + +
diff --git a/docs/docs/api/mellea/backends/vllm.mdx b/docs/docs/api/mellea/backends/vllm.mdx new file mode 100644 index 000000000..d81cd2181 --- /dev/null +++ b/docs/docs/api/mellea/backends/vllm.mdx @@ -0,0 +1,100 @@ +--- +title: "mellea.backends.vllm" +sidebarTitle: "mellea.backends.vllm" +description: "A backend that uses a VLLM in the current process." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +The purpose of the VLLM backend is to provide a locally running fast inference engine. + + +## Classes + +
+
+
+### CLASS `LocalVLLMBackend`
+
+
+The LocalVLLMBackend uses vLLM's Python interface for inference, and uses a Formatter to convert `Component`s into prompts.
+
+Support for [Activated LoRAs (ALoras)](https://arxiv.org/pdf/2504.12397) is planned.
+
+This backend is designed for running an HF model for small-scale inference locally on your machine.
+
+Its throughput is generally higher than that of LocalHFBackend.
+However, it takes longer to load the model weights at instantiation,
+and it can be slower when requests are submitted one at a time.
+
+
+
+**Methods:** + +
+ +#### FUNC `generate_from_context` + +```python +generate_from_context(self, action: Component[C] | CBlock, ctx: Context) -> tuple[ModelOutputThunk[C], Context] +``` + +Generate using the huggingface model. + + +
+ +#### FUNC `processing` + +```python +processing(self, mot: ModelOutputThunk, chunk: vllm.RequestOutput) +``` + +Process the returned chunks or the complete response. + + +
+ +#### FUNC `post_processing` + +```python +post_processing(self, mot: ModelOutputThunk, conversation: list[dict], _format: type[BaseModelSubclass] | None, tool_calls: bool, tools: dict[str, Callable], seed) +``` + +Called when generation is done. + + +
+ +#### FUNC `generate_from_raw` + +```python +generate_from_raw(self, actions: list[Component[C]], ctx: Context) -> list[ModelOutputThunk[C]] +``` + +
+ +#### FUNC `generate_from_raw` + +```python +generate_from_raw(self, actions: list[Component[C] | CBlock], ctx: Context) -> list[ModelOutputThunk[C | str]] +``` + +
+ +#### FUNC `generate_from_raw` + +```python +generate_from_raw(self, actions: Sequence[Component[C] | CBlock], ctx: Context) -> list[ModelOutputThunk] +``` + +Generate using the completions api. Gives the input provided to the model without templating. + +
diff --git a/docs/docs/api/mellea/backends/watsonx.mdx b/docs/docs/api/mellea/backends/watsonx.mdx new file mode 100644 index 000000000..1ec933882 --- /dev/null +++ b/docs/docs/api/mellea/backends/watsonx.mdx @@ -0,0 +1,113 @@ +--- +title: "mellea.backends.watsonx" +sidebarTitle: "mellea.backends.watsonx" +description: "A generic WatsonX.ai compatible backend that wraps around the watson_machine_learning library." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Classes + +
+ + +### CLASS `WatsonxAIBackend` + + +A generic backend class for watsonx SDK. + + + +
+**Methods:** + +
+ +#### FUNC `filter_chat_completions_kwargs` + +```python +filter_chat_completions_kwargs(self, model_options: dict) -> dict +``` + +Filter kwargs to only include valid watsonx chat.completions.create parameters. + + +
+ +#### FUNC `generate_from_context` + +```python +generate_from_context(self, action: Component[C] | CBlock, ctx: Context) -> tuple[ModelOutputThunk[C], Context] +``` + +See `generate_from_chat_context`. + + +
+ +#### FUNC `generate_from_chat_context` + +```python +generate_from_chat_context(self, action: Component[C] | CBlock, ctx: Context) -> ModelOutputThunk[C] +``` + +Generates a new completion from the provided Context using this backend's `Formatter`. + + +
+ +#### FUNC `processing` + +```python +processing(self, mot: ModelOutputThunk, chunk: dict) +``` + +Called during generation to add information from a single ChatCompletion or ChatCompletionChunk to the ModelOutputThunk. + +For OpenAI-like APIs, tool call parsing is handled in the post processing step. + + +
+ +#### FUNC `post_processing` + +```python +post_processing(self, mot: ModelOutputThunk, conversation: list[dict], tools: dict[str, Callable], seed, _format) +``` + +Called when generation is done. + + +
+ +#### FUNC `generate_from_raw` + +```python +generate_from_raw(self, actions: list[Component[C]], ctx: Context) -> list[ModelOutputThunk[C]] +``` + +
+ +#### FUNC `generate_from_raw` + +```python +generate_from_raw(self, actions: list[Component[C] | CBlock], ctx: Context) -> list[ModelOutputThunk[C | str]] +``` + +
+ +#### FUNC `generate_from_raw` + +```python +generate_from_raw(self, actions: Sequence[Component[C] | CBlock], ctx: Context) -> list[ModelOutputThunk] +``` + +Generates a completion text. Gives the input provided to the model without templating. + +
diff --git a/docs/docs/api/mellea/core/backend.mdx b/docs/docs/api/mellea/core/backend.mdx new file mode 100644 index 000000000..60219e403 --- /dev/null +++ b/docs/docs/api/mellea/core/backend.mdx @@ -0,0 +1,123 @@ +--- +title: "mellea.core.backend" +sidebarTitle: "mellea.core.backend" +description: "Interfaces for Backends and Generation." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Functions + +
+ + +### FUNC `generate_walk` + +```python +generate_walk(c: CBlock | Component | ModelOutputThunk) -> list[ModelOutputThunk] +``` + + +Returns the generation walk ordering for a Span. + + +
+ +## Classes + +
+ + +### CLASS `Backend` + + +An abstract `Backend`. + + + +
+**Methods:** + +
+
+#### FUNC `generate_from_context`
+
+```python
+generate_from_context(self, action: Component[C] | CBlock, ctx: Context) -> tuple[ModelOutputThunk[C], Context]
+```
+
+Generates a model output from a context. May not mutate the context. This must be called from a running event loop as it creates a task to run the generation request.
+
+**Args:**
+- `action`: The last item of the context should be passed in as an `action` instead of as part of the `ctx`. See `docs/dev/generate_signature_decisions.md`.
+- `ctx`: The rest of the context.
+- `format`: A response format to use for structured outputs / constrained decoding.
+- `model_options`: Any model options to upsert into the defaults for this call.
+- `tool_calls`: If `True`, then tool calls are extracted from the `action` `Component`. Assumption\: if tool_calls is enabled, then the action `Component` has a TemplateRepresentation.
+
+**Returns:**
+- A tuple of (ModelOutputThunk, Context) where the Context is the new context after the generation has been completed.
+
+
+ +#### FUNC `generate_from_raw` + +```python +generate_from_raw(self, actions: list[Component[C]], ctx: Context) -> list[ModelOutputThunk[C]] +``` + +
+ +#### FUNC `generate_from_raw` + +```python +generate_from_raw(self, actions: list[Component[C] | CBlock], ctx: Context) -> list[ModelOutputThunk[C | str]] +``` + +
+
+#### FUNC `generate_from_raw`
+
+```python
+generate_from_raw(self, actions: Sequence[Component[C] | CBlock], ctx: Context) -> list[ModelOutputThunk]
+```
+
+Generates a model output from the provided input. Does not use context or templates.
+
+**Args:**
+- `actions`: list of actions to generate responses for. Each action is separate.
+- `ctx`: context passed to generation. Currently not used in `generate_from_raw`.
+- `format`: A response format to use for structured outputs / constrained decoding. Note\: some backends do not support this parameter. They will log warnings and continue to generate.
+- `model_options`: Any model options to upsert into the defaults for this call.
+- `tool_calls`: Always set to `False` unless supported by backend.
+
+
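+
+A sketch of batched raw generation, assuming an already-configured concrete backend and context; it also assumes `wait_for_all_mots` (from `mellea.helpers.async_helpers`) is awaitable.
+
+```python
+from mellea.core.base import blockify
+from mellea.helpers.async_helpers import wait_for_all_mots
+
+async def batch_complete(backend, ctx):
+    # Each action is sent separately, without any templating applied.
+    thunks = backend.generate_from_raw(
+        [blockify("2 + 2 ="), blockify("The capital of France is")], ctx
+    )
+    await wait_for_all_mots(thunks)  # assumption: the helper is awaitable
+    return [t.value for t in thunks]
+```
+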
+ +#### FUNC `do_generate_walk` + +```python +do_generate_walk(self, action: CBlock | Component | ModelOutputThunk) -> None +``` + +Does the generation walk. + + +
+
+#### FUNC `do_generate_walks`
+
+```python
+do_generate_walks(self, actions: list[CBlock | Component | ModelOutputThunk]) -> None
+```
+
+Does the generation walk for each of the given actions.
+
diff --git a/docs/docs/api/mellea/core/base.mdx b/docs/docs/api/mellea/core/base.mdx new file mode 100644 index 000000000..045e88ad0 --- /dev/null +++ b/docs/docs/api/mellea/core/base.mdx @@ -0,0 +1,479 @@ +--- +title: "mellea.core.base" +sidebarTitle: "mellea.core.base" +description: "Core Classes and Data Structures." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Functions + +
+ + +### FUNC `blockify` + +```python +blockify(s: str | CBlock | Component) -> CBlock | Component +``` + + +`blockify` is a helper function that turns raw strings into CBlocks. + + +
+ +### FUNC `get_images_from_component` + +```python +get_images_from_component(c: Component) -> None | list[ImageBlock] +``` + + +Gets images from a `Component` if they are present and a non-empty list, otherwise returns None. + + +
+ +## Classes + +
+ + +### CLASS `CBlock` + + +A `CBlock` is a block of content that can serve as input to or output from an LLM. + + + +
+**Methods:** + +
+ +#### FUNC `value` + +```python +value(self) -> str | None +``` + +Gets the value of the block. + + +
+ +#### FUNC `value` + +```python +value(self, v: str) +``` + +Sets the value of the block. + + +
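+
+As a small sketch, raw strings can be lifted into `CBlock`s with `blockify` and read back via `value`:
+
+```python
+from mellea.core.base import blockify
+
+block = blockify("The sky is blue.")  # strings become CBlocks; Components pass through
+print(block.value)                    # -> "The sky is blue."
+block.value = "The sky is grey."      # value can also be set
+```
+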
+
+### CLASS `ImageBlock`
+
+
+An `ImageBlock` represents an image (as a base64 PNG).
+
+
+
+**Methods:** + +
+
+#### FUNC `is_valid_base64_png`
+
+```python
+is_valid_base64_png(s: str) -> bool
+```
+
+Checks if a string is a valid base64-encoded PNG.
+
+
+ +#### FUNC `pil_to_base64` + +```python +pil_to_base64(image: PILImage.Image) -> str +``` + +Converts a PIL image to a base64 string representation. + + +
+
+#### FUNC `from_pil_image`
+
+```python
+from_pil_image(cls, image: PILImage.Image, meta: dict[str, Any] | None = None) -> ImageBlock
+```
+
+Creates an `ImageBlock` from a PIL image, storing it as a base64 string representation.
+
+
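+
+A minimal sketch of building an `ImageBlock` from an in-memory PIL image:
+
+```python
+from PIL import Image
+
+from mellea.core.base import ImageBlock
+
+img = Image.new("RGB", (8, 8), color="red")        # tiny throwaway image
+encoded = ImageBlock.pil_to_base64(img)            # base64 PNG string
+assert ImageBlock.is_valid_base64_png(encoded)
+block = ImageBlock.from_pil_image(img, meta={"source": "example"})
+```
+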
+ +### CLASS `ComponentParseError` + + +Raised by `Component.parse()` when the underlying parsing method throws an exception. + + +
+ +### CLASS `Component` + + +A `Component` is a composite data structure that is intended to be represented to an LLM. + + + +
+**Methods:** + +
+ +#### FUNC `parts` + +```python +parts(self) -> list[Component | CBlock] +``` + +The set of all the constituent parts of the `Component`. + + +
+ +#### FUNC `format_for_llm` + +```python +format_for_llm(self) -> TemplateRepresentation | str +``` + +Formats the `Component` into a `TemplateRepresentation` or string. + +Returns: a `TemplateRepresentation` or string + + +
+ +#### FUNC `parse` + +```python +parse(self, computed: ModelOutputThunk) -> S +``` + +Parse the expected type from a given `ModelOutputThunk`. + +Calls the Component's underlying `._parse` function. + + +
+ +### CLASS `GenerateType` + + +Used to track what functions can be used to extract a value from a ModelOutputThunk. + + +
+ +### CLASS `ModelOutputThunk` + + +A `ModelOutputThunk` is a special type of `CBlock` that we know came from a model's output. It is possible to instantiate one without the output being computed yet. + + + +
+**Methods:** + +
+ +#### FUNC `is_computed` + +```python +is_computed(self) +``` + +Returns true only if this Thunk has already been filled. + + +
+ +#### FUNC `value` + +```python +value(self) -> str | None +``` + +Gets the value of the block. + + +
+ +#### FUNC `value` + +```python +value(self, v: str) +``` + +Sets the value of the block. + + +
+ +#### FUNC `avalue` + +```python +avalue(self) -> str +``` + +Returns the value of the ModelOutputThunk. Can be used for both async streaming and async non-streaming. + +**Raises:** +- `Exception`: Propagates any errors from the underlying inference engine api request. +- `RuntimeError`: If called when the ModelOutputThunk's generate function is not async compatible. + + +
+ +#### FUNC `astream` + +```python +astream(self) -> str +``` + +Returns the ModelOutputThunk's partial value including the next chunk(s). Can be used for both async streaming and async non-streaming. + +Returns the value of the ModelOutputThunk if streaming is done. + +**Note**: Be careful with calling this function. Only call it from one location at a time. This means you shouldn't pass a ModelOutputThunk to +multiple coroutines/tasks and call astream from those coroutines/tasks simultaneously. We have considered solutions to this but are waiting until +we see this error happen in a real use case. + +**Raises:** +- `Exception`: Propagates any errors from the underlying inference engine api request. +- `RuntimeError`: If called when the ModelOutputThunk's generate function is not async compatible. + + +
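+
+A sketch of consuming a thunk asynchronously; it assumes `astream()` and `avalue()` are coroutines, as the async-compatibility notes above suggest.
+
+```python
+async def consume(mot):
+    # Stream partial text while generation is in flight ...
+    while not mot.is_computed():
+        partial = await mot.astream()  # partial value including the newest chunk(s)
+        print(partial, end="\r")
+    # ... then read the final value once the thunk has been filled.
+    return await mot.avalue()
+```
+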
+ +### CLASS `ContextTurn` + + +A turn of model input and model output. + + +
+ +### CLASS `Context` + + +A `Context` is used to track the state of a `MelleaSession`. + +A context is immutable. Every alteration leads to a new context. + + + +
+**Methods:** + +
+ +#### FUNC `from_previous` + +```python +from_previous(cls: type[ContextT], previous: Context, data: Component | CBlock) -> ContextT +``` + +Constructs a new context from an existing context. + + +
+ +#### FUNC `reset_to_new` + +```python +reset_to_new(cls: type[ContextT]) -> ContextT +``` + +Returns an empty context for convenience. + + +
+ +#### FUNC `is_root_node` + +```python +is_root_node(self) -> bool +``` + +Returns whether this context is the root context node. + + +
+ +#### FUNC `previous_node` + +```python +previous_node(self) -> Context | None +``` + +Returns the context node from which this context node was created. + +Internal use: Users should not need to use this property. + + +
+ +#### FUNC `node_data` + +```python +node_data(self) -> Component | CBlock | None +``` + +Returns the data associated with this context node. + +Internal use: Users should not need to use this property. + + +
+ +#### FUNC `is_chat_context` + +```python +is_chat_context(self) -> bool +``` + +Returns whether this context is a chat context. + + +
+ +#### FUNC `as_list` + +```python +as_list(self, last_n_components: int | None = None) -> list[Component | CBlock] +``` + +Returns a list of the last n components in the context sorted from FIRST TO LAST. + +If `last_n_components` is `None`, then all components are returned. + + +
+
+#### FUNC `actions_for_available_tools`
+
+```python
+actions_for_available_tools(self) -> list[Component | CBlock] | None
+```
+
+Provides a list of actions to extract tools from for use during generation, or None if that's not possible.
+
+Can be used to make the available tools differ from the tools of all the actions in the context. Can be overridden by subclasses.
+
+
+ +#### FUNC `last_output` + +```python +last_output(self, check_last_n_components: int = 3) -> ModelOutputThunk | None +``` + +The last output thunk of the context. + + +
+ +#### FUNC `last_turn` + +```python +last_turn(self) +``` + +The last input/output turn of the context. + +This can be partial. If the last event is an input, then the output is None. + + +
+ +#### FUNC `add` + +```python +add(self, c: Component | CBlock) -> Context +``` + +Returns a new context obtained by adding `c` to this context. + + +
+ +#### FUNC `view_for_generation` + +```python +view_for_generation(self) -> list[Component | CBlock] | None +``` + +Provides a linear list of context components to use for generation, or None if that is not possible to construct. + + +
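+
+Because contexts are immutable, every `add` returns a new node. A rough sketch, assuming `ChatContext` is one of the concrete `Context` subclasses exported by mellea:
+
+```python
+from mellea.core.base import blockify
+
+# Import of ChatContext omitted: its exact module path is an assumption.
+ctx = ChatContext.reset_to_new()          # empty root context
+ctx2 = ctx.add(blockify("Hello there."))  # returns a NEW context; ctx is unchanged
+assert ctx.is_root_node() and not ctx2.is_root_node()
+print(ctx2.as_list())                     # components ordered from first to last
+```
+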
+ +### CLASS `TemplateRepresentation` + + +Representing a component as a set of important attributes that can be consumed by the formatter. + + +
+ +### CLASS `GenerateLog` + + +A dataclass for capturing log entries. + +GenerateLog provides a structured way to include various details in log entries, making it useful for maintaining detailed +records of events or operations where context and additional data are significant. + + +
+ +### CLASS `ModelToolCall` + + +A dataclass for capturing the tool calls a model wants to make. + +Provides a unified way to call tools post generation. + + + +
+**Methods:** + +
+ +#### FUNC `call_func` + +```python +call_func(self) -> Any +``` + +A helper function for calling the function/tool represented by this object. + +
diff --git a/docs/docs/api/mellea/core/formatter.mdx b/docs/docs/api/mellea/core/formatter.mdx new file mode 100644 index 000000000..26eb27c9e --- /dev/null +++ b/docs/docs/api/mellea/core/formatter.mdx @@ -0,0 +1,40 @@ +--- +title: "mellea.core.formatter" +sidebarTitle: "mellea.core.formatter" +description: "Interfaces for Formatters." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Classes + +
+ + +### CLASS `Formatter` + + +A Formatter converts `Component`s into strings and parses `ModelOutputThunk`s into `Component`s (or `CBlock`s). + + + +
+**Methods:** + +
+ +#### FUNC `print` + +```python +print(self, c: Component | CBlock) -> str +``` + +Renders a component for input to a model. + +
diff --git a/docs/docs/api/mellea/core/requirement.mdx b/docs/docs/api/mellea/core/requirement.mdx new file mode 100644 index 000000000..de27a8cdf --- /dev/null +++ b/docs/docs/api/mellea/core/requirement.mdx @@ -0,0 +1,149 @@ +--- +title: "mellea.core.requirement" +sidebarTitle: "mellea.core.requirement" +description: "Interface for Requirements." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Functions + +
+
+
+### FUNC `default_output_to_bool`
+
+```python
+default_output_to_bool(x: CBlock | str) -> bool
+```
+
+
+Checks if a given output should be converted to `True`.
+
+Checks if the output is exactly equal to "yes" or "y" (case-insensitive). If not, it will also
+check if any of the words in the output are "yes" (case-insensitive).
+
+
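+
+For instance, following the rule described above:
+
+```python
+from mellea.core.requirement import default_output_to_bool
+
+default_output_to_bool("yes")              # True: exact match
+default_output_to_bool("Y")                # True: "y" is accepted, case-insensitive
+default_output_to_bool("I would say yes")  # True: one of the words is "yes"
+default_output_to_bool("no")               # False
+```
+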
+ +## Classes + +
+ + +### CLASS `ValidationResult` + + +ValidationResults store the output of a Requirement's validation. They can be used to return additional info from validation functions, which is useful for sampling/repairing. + + + +
+**Methods:** + +
+ +#### FUNC `reason` + +```python +reason(self) -> str | None +``` + +Reason for the validation result. + + +
+ +#### FUNC `score` + +```python +score(self) -> float | None +``` + +An optional score for the validation result. + + +
+ +#### FUNC `thunk` + +```python +thunk(self) -> ModelOutputThunk | None +``` + +The ModelOutputThunk associated with the validation func if an llm was used to generate the final result. + + +
+ +#### FUNC `context` + +```python +context(self) -> Context | None +``` + +The context associated with validation if a backend was used to generate the final result. + + +
+ +#### FUNC `as_bool` + +```python +as_bool(self) -> bool +``` + +Return a boolean value based on the result. + + +
+ +### CLASS `Requirement` + + +Requirements are a special type of Component used as input to the Validate step in Instruct/Validate/Repair patterns. + + + +
+**Methods:** + +
+ +#### FUNC `validate` + +```python +validate(self, backend: Backend, ctx: Context) -> ValidationResult +``` + +Chooses the appropriate validation strategy and applies that strategy. + + +
+ +#### FUNC `parts` + +```python +parts(self) +``` + +Returns all of the constituent parts of a Requirement. + + +
+ +#### FUNC `format_for_llm` + +```python +format_for_llm(self) -> TemplateRepresentation | str +``` + +Some object protocol magic happens here with management of the output. + +
diff --git a/docs/docs/api/mellea/core/sampling.mdx b/docs/docs/api/mellea/core/sampling.mdx new file mode 100644 index 000000000..34af1b5f0 --- /dev/null +++ b/docs/docs/api/mellea/core/sampling.mdx @@ -0,0 +1,114 @@ +--- +title: "mellea.core.sampling" +sidebarTitle: "mellea.core.sampling" +description: "Interfaces for Sampling Strategies." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Classes + +
+ + +### CLASS `SamplingResult` + + +Stores the results from a sampling operation. This includes successful and failed samplings. + + + +
+**Methods:** + +
+ +#### FUNC `result` + +```python +result(self) -> ModelOutputThunk[S] +``` + +The final output or result from applying the sampling strategy. + + +
+ +#### FUNC `result_ctx` + +```python +result_ctx(self) -> Context +``` + +The context of the final output or result from applying the sampling strategy. + + +
+ +#### FUNC `result_action` + +```python +result_action(self) -> Component[S] +``` + +The action that generated the final output or result from applying the sampling strategy. + + +
+ +#### FUNC `result_validations` + +```python +result_validations(self) -> list[tuple[Requirement, ValidationResult]] +``` + +The validation results associated with the final output or result from applying the sampling strategy. + + +
+ +### CLASS `SamplingStrategy` + + +A SamplingStrategy class defines an abstract base class for implementing various sampling strategies. + +This class provides a template for creating concrete sampling strategies that can be used to generate model outputs based on given instructions. +It allows setting custom validation and generation functions through properties. + + + +
+**Methods:** + +
+ +#### FUNC `sample` + +```python +sample(self, action: Component[S], context: Context, backend: Backend, requirements: list[Requirement] | None) -> SamplingResult[S] +``` + +This method is the abstract method for sampling a given component. + +It must be implemented by any concrete subclasses to provide specific sampling logic. + +**Args:** +- `action `: The action object to be sampled. +- `context`: The context to be passed to the sampling strategy. +- `backend`: The backend used for generating samples. +- `requirements`: List of requirements to test against (merged with global requirements). +- `validation_ctx`: Optional context to use for validation. If None, validation_ctx = ctx. +- `format`: output format for structured outputs. +- `model_options`: model options to pass to the backend during generation / validation. +- `tool_calls`: True if tool calls should be used during this sampling strategy. + +**Returns:** +- A result object indicating the success or failure of the sampling process. + +
diff --git a/docs/docs/api/mellea/core/utils.mdx b/docs/docs/api/mellea/core/utils.mdx new file mode 100644 index 000000000..6e40f7ba2 --- /dev/null +++ b/docs/docs/api/mellea/core/utils.mdx @@ -0,0 +1,109 @@ +--- +title: "mellea.core.utils" +sidebarTitle: "mellea.core.utils" +description: "Utils for Core Library." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Classes + +
+ + +### CLASS `RESTHandler` + + +RESTHandler for logging. + + + +
+**Methods:** + +
+ +#### FUNC `emit` + +```python +emit(self, record) +``` + +Attempts to emit a record to FLOG, or silently fails. + + +
+ +### CLASS `JsonFormatter` + + +Logging formatter for JSON. + + + +
+**Methods:** + +
+ +#### FUNC `format` + +```python +format(self, record) +``` + +Formats record as a JSON serializable object. + + +
+
+### CLASS `CustomFormatter`
+
+
+A nice custom formatter copied from [Sergey Pleshakov's post on Stack Overflow](https://stackoverflow.com/questions/384076/how-can-i-color-python-logging-output).
+
+
+
+**Methods:** + +
+ +#### FUNC `format` + +```python +format(self, record) +``` + +The format fn. + + +
+ +### CLASS `FancyLogger` + + +A fancy logger. + + + +
+**Methods:** + +
+ +#### FUNC `get_logger` + +```python +get_logger() +``` + +Returns a FancyLogger.logger and sets level based upon env vars. + +
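+
+A quick usage sketch:
+
+```python
+from mellea.core.utils import FancyLogger
+
+log = FancyLogger.get_logger()  # level is picked up from environment variables
+log.info("starting generation")
+```
+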
diff --git a/docs/docs/api/mellea/formatters/chat_formatter.mdx b/docs/docs/api/mellea/formatters/chat_formatter.mdx new file mode 100644 index 000000000..414f80431 --- /dev/null +++ b/docs/docs/api/mellea/formatters/chat_formatter.mdx @@ -0,0 +1,40 @@ +--- +title: "mellea.formatters.chat_formatter" +sidebarTitle: "mellea.formatters.chat_formatter" +description: "ChatFormatter." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Classes + +
+ + +### CLASS `ChatFormatter` + + +Formatter used by Legacy backends to format Contexts as Messages. + + + +
+**Methods:** + +
+ +#### FUNC `to_chat_messages` + +```python +to_chat_messages(self, cs: list[Component | CBlock]) -> list[Message] +``` + +Helper method that converts a linearized chat history into a list of messages. The purpose of this helper is to prepare a sequence of Messages for input to a chat endpoint. + +
diff --git a/docs/docs/api/mellea/formatters/template_formatter.mdx b/docs/docs/api/mellea/formatters/template_formatter.mdx new file mode 100644 index 000000000..15c0fa690 --- /dev/null +++ b/docs/docs/api/mellea/formatters/template_formatter.mdx @@ -0,0 +1,40 @@ +--- +title: "mellea.formatters.template_formatter" +sidebarTitle: "mellea.formatters.template_formatter" +description: "Template Formatter." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Classes + +
+ + +### CLASS `TemplateFormatter` + + +Formatter that uses jinja2 templates. + + + +
+**Methods:** + +
+ +#### FUNC `print` + +```python +print(self, c: Component | CBlock) -> str +``` + +Uses a jinja2 template to pretty-print components. + +
diff --git a/docs/docs/api/mellea/helpers/async_helpers.mdx b/docs/docs/api/mellea/helpers/async_helpers.mdx new file mode 100644 index 000000000..f0e6e13ee --- /dev/null +++ b/docs/docs/api/mellea/helpers/async_helpers.mdx @@ -0,0 +1,108 @@ +--- +title: "mellea.helpers.async_helpers" +sidebarTitle: "mellea.helpers.async_helpers" +description: "Async helper functions." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Functions + +
+ + +### FUNC `send_to_queue` + +```python +send_to_queue(co: Coroutine[Any, Any, AsyncIterator | Any] | AsyncIterator, aqueue: asyncio.Queue) -> None +``` + + +Processes the output of an async chat request by sending the output to an async queue. + + +
+ +### FUNC `wait_for_all_mots` + +```python +wait_for_all_mots(mots: list[ModelOutputThunk]) +``` + + +Helper function to make waiting for multiple ModelOutputThunks to be computed easier. + +All ModelOutputThunks must be from the same event loop. This should always be the case in sampling +functions, session functions, and top-level mellea functions. + + +
+ +### FUNC `get_current_event_loop` + +```python +get_current_event_loop() -> None | asyncio.AbstractEventLoop +``` + + +Get the current event loop without having to catch exceptions. + + +
+ +## Classes + +
+ + +### CLASS `ClientCache` + + +A simple [LRU](https://en.wikipedia.org/wiki/Cache_replacement_policies#Least_Recently_Used_(LRU)) cache. + +Used to keep track of clients for backends where the client is tied to a specific event loop. + + + +
+**Methods:** + +
+ +#### FUNC `current_size` + +```python +current_size(self) +``` + +Just return the size of the key set. This isn't necessarily safe. + + +
+ +#### FUNC `get` + +```python +get(self, key: int) -> Any | None +``` + +Gets a value from the cache. + + +
+ +#### FUNC `put` + +```python +put(self, key: int, value: Any) +``` + +Put a value into the cache. + +
diff --git a/docs/docs/api/mellea/helpers/openai_compatible_helpers.mdx b/docs/docs/api/mellea/helpers/openai_compatible_helpers.mdx new file mode 100644 index 000000000..2740046c8 --- /dev/null +++ b/docs/docs/api/mellea/helpers/openai_compatible_helpers.mdx @@ -0,0 +1,69 @@ +--- +title: "mellea.helpers.openai_compatible_helpers" +sidebarTitle: "mellea.helpers.openai_compatible_helpers" +description: "A file for helper functions that deal with OpenAI API compatible helpers." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Functions + +
+ + +### FUNC `extract_model_tool_requests` + +```python +extract_model_tool_requests(tools: dict[str, Callable], response: dict[str, Any]) -> dict[str, ModelToolCall] | None +``` + + +Extracts tool calls from the dict representation of an OpenAI-like chat response object. + + +
+ +### FUNC `chat_completion_delta_merge` + +```python +chat_completion_delta_merge(chunks: list[dict], force_all_tool_calls_separate: bool = False) -> dict +``` + + +Takes a list of deltas from `ChatCompletionChunk`s and merges them into a single dict representing the `ChatCompletion` choice. + +**Args:** +- `chunks`: the list of dicts that represent the message deltas +- `force_all_tool_calls_separate`: if `True`, tool calls in separate message deltas will not be merged (even if their index values are the same); use when providers do not return the correct index value for tool calls. If using this option, all tool calls must be fully populated in a single delta since they won't be merged. + + +
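+
+A rough sketch of merging streamed deltas back into a single message dict; the delta fields follow the OpenAI chunk format and the values are made up.
+
+```python
+from mellea.helpers.openai_compatible_helpers import chat_completion_delta_merge
+
+deltas = [
+    {"role": "assistant", "content": "The answer "},
+    {"content": "is 42."},
+]
+merged = chat_completion_delta_merge(deltas)  # a single ChatCompletion-style choice dict
+```
+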
+ +### FUNC `message_to_openai_message` + +```python +message_to_openai_message(msg: Message) +``` + + +Serializes a mellea Message object to the message format required by OpenAI compatible api providers. + + +
+ +### FUNC `messages_to_docs` + +```python +messages_to_docs(msgs: list[Message]) -> list[dict[str, str]] +``` + + +Extracts the docs from a list of messages. + +
diff --git a/docs/docs/api/mellea/stdlib/components/chat.mdx b/docs/docs/api/mellea/stdlib/components/chat.mdx new file mode 100644 index 000000000..988b6bf83 --- /dev/null +++ b/docs/docs/api/mellea/stdlib/components/chat.mdx @@ -0,0 +1,108 @@ +--- +title: "mellea.stdlib.components.chat" +sidebarTitle: "mellea.stdlib.components.chat" +description: "Chat primitives." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Functions + +
+ + +### FUNC `as_chat_history` + +```python +as_chat_history(ctx: Context) -> list[Message] +``` + + +Returns a list of Messages corresponding to a Context. + + +
+ +## Classes + +
+
+
+### CLASS `Message`
+
+
+A single Message in a Chat history.
+
+TODO\: we may want to deprecate this Component entirely.
+The fact that some Component gets rendered as a chat message is `Formatter` miscellanea.
+
+
+
+**Methods:** + +
+
+#### FUNC `images`
+
+```python
+images(self) -> None | list[str]
+```
+
+Returns the images associated with this message as a list of base64 strings.
+
+
+
+#### FUNC `parts`
+
+```python
+parts(self) -> list[Component | CBlock]
+```
+
+Returns all of the constituent parts of a Message.
+
+
+ +#### FUNC `format_for_llm` + +```python +format_for_llm(self) -> TemplateRepresentation +``` + +Formats the content for a Language Model. + +**Returns:** +- The formatted output suitable for language models. + + +
+ +### CLASS `ToolMessage` + + +Adds the name field for function name. + + + +
+**Methods:** + +
+ +#### FUNC `format_for_llm` + +```python +format_for_llm(self) -> TemplateRepresentation +``` + +The same representation as Message with a name field added to args. + +
diff --git a/docs/docs/api/mellea/stdlib/components/docs/document.mdx b/docs/docs/api/mellea/stdlib/components/docs/document.mdx new file mode 100644 index 000000000..0f651b807 --- /dev/null +++ b/docs/docs/api/mellea/stdlib/components/docs/document.mdx @@ -0,0 +1,53 @@ +--- +title: "mellea.stdlib.components.docs.document" +sidebarTitle: "mellea.stdlib.components.docs.document" +description: "Document component." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Classes + +
+ + +### CLASS `Document` + + +Documents should typically be used in a Message object. + + + +
+**Methods:** + +
+ +#### FUNC `parts` + +```python +parts(self) -> list[Component | CBlock] +``` + +The set of all the constituent parts of the `Component`. + + +
+ +#### FUNC `format_for_llm` + +```python +format_for_llm(self) -> str +``` + +Formats the `Document` into a string. + +Returns: a string + +
diff --git a/docs/docs/api/mellea/stdlib/components/docs/richdocument.mdx b/docs/docs/api/mellea/stdlib/components/docs/richdocument.mdx new file mode 100644 index 000000000..c571522e8 --- /dev/null +++ b/docs/docs/api/mellea/stdlib/components/docs/richdocument.mdx @@ -0,0 +1,258 @@ +--- +title: "mellea.stdlib.components.docs.richdocument" +sidebarTitle: "mellea.stdlib.components.docs.richdocument" +description: "Representations of Docling Documents." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Classes + +
+ + +### CLASS `RichDocument` + + +A `RichDocument` is a block of content with an underlying DoclingDocument. + +It has helper functions for working with the document and extracting parts of it. + + + +
+**Methods:** + +
+ +#### FUNC `parts` + +```python +parts(self) -> list[Component | CBlock] +``` + +RichDocument has no parts. + +In the future, we should allow chunking of DoclingDocuments to correspond to parts(). + + +
+ +#### FUNC `format_for_llm` + +```python +format_for_llm(self) -> TemplateRepresentation | str +``` + +Return Document content as Markdown. + +No template needed here. + + +
+ +#### FUNC `docling` + +```python +docling(self) -> DoclingDocument +``` + +Get the underlying Docling Document. + + +
+ +#### FUNC `to_markdown` + +```python +to_markdown(self) +``` + +Get the full text of the document as markdown. + + +
+ +#### FUNC `get_tables` + +```python +get_tables(self) -> list[Table] +``` + +Return the `Table`s that are a part of this document. + + +
+ +#### FUNC `save` + +```python +save(self, filename: str | Path) -> None +``` + +Save the underlying DoclingDocument for reuse later. + + +
+ +#### FUNC `load` + +```python +load(cls, filename: str | Path) -> RichDocument +``` + +Load a DoclingDocument from a file. The file must already be a DoclingDocument. + + +
+ +#### FUNC `from_document_file` + +```python +from_document_file(cls, source: str | Path | DocumentStream) -> RichDocument +``` + +Process a document with Docling. + + +
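+
+A short sketch of the typical flow (the file paths are placeholders):
+
+```python
+from mellea.stdlib.components.docs.richdocument import RichDocument
+
+rd = RichDocument.from_document_file("quarterly-report.pdf")  # parsed with Docling
+print(rd.to_markdown()[:500])                                 # full text as markdown
+tables = rd.get_tables()                                      # list of Table components
+rd.save("quarterly-report.docling.json")                      # reusable DoclingDocument
+```
+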
+ +### CLASS `TableQuery` + + +Table-specific query. + + + +
+**Methods:** + +
+ +#### FUNC `parts` + +```python +parts(self) -> list[Component | CBlock] +``` + +The list of cblocks/components on which TableQuery depends. + + +
+ +#### FUNC `format_for_llm` + +```python +format_for_llm(self) -> TemplateRepresentation +``` + +Template arguments for Formatter. + + +
+ +### CLASS `TableTransform` + + +Table-specific transform. + + + +
+**Methods:** + +
+ +#### FUNC `parts` + +```python +parts(self) -> list[Component | CBlock] +``` + +The parts for this component. + + +
+ +#### FUNC `format_for_llm` + +```python +format_for_llm(self) -> TemplateRepresentation +``` + +Template arguments for Formatter. + + +
+ +### CLASS `Table` + + +A `Table` represents a single table within a larger Docling Document. + + + +
+**Methods:** + +
+ +#### FUNC `from_markdown` + +```python +from_markdown(cls, md: str) -> Table | None +``` + +Creates a fake document from the markdown and attempts to extract the first table found. + + +
+ +#### FUNC `parts` + +```python +parts(self) +``` + +The current implementation does not necessarily entail any string re-use, so parts is empty. + + +
+ +#### FUNC `to_markdown` + +```python +to_markdown(self) -> str +``` + +Get the `Table` as markdown. + + +
+ +#### FUNC `transpose` + +```python +transpose(self) -> Table | None +``` + +Transposes the table. Will return a new transposed `Table` if successful. + + +
+ +#### FUNC `format_for_llm` + +```python +format_for_llm(self) -> TemplateRepresentation | str +``` + +Return Table representation for Formatter. + +
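+
+For example, a `Table` can be built directly from markdown and transposed; this is a sketch, and the results depend on how Docling parses the markdown.
+
+```python
+from mellea.stdlib.components.docs.richdocument import Table
+
+md = """| name   | qty |
+| ------ | --- |
+| apples | 3   |
+| pears  | 5   |"""
+
+table = Table.from_markdown(md)  # None if no table could be extracted
+if table is not None:
+    flipped = table.transpose()  # a new transposed Table, or None on failure
+    print(table.to_markdown())
+```
+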
diff --git a/docs/docs/api/mellea/stdlib/components/genslot.mdx b/docs/docs/api/mellea/stdlib/components/genslot.mdx new file mode 100644 index 000000000..e26beba21 --- /dev/null +++ b/docs/docs/api/mellea/stdlib/components/genslot.mdx @@ -0,0 +1,358 @@ +--- +title: "mellea.stdlib.components.genslot" +sidebarTitle: "mellea.stdlib.components.genslot" +description: "A method to generate outputs based on python functions and a Generative Slot function." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Functions + +
+ + +### FUNC `create_response_format` + +```python +create_response_format(func: Callable[..., R]) -> type[FunctionResponse[R]] +``` + + +Create a Pydantic response format class for a given function. + +**Args:** +- `func`: A function with exactly one argument + +**Returns:** +- A Pydantic model class that inherits from FunctionResponse[T] + + +
+ +### FUNC `describe_function` + +```python +describe_function(func: Callable) -> FunctionDict +``` + + +Generates a FunctionDict given a function. + +**Args:** +- `func `: Callable function that needs to be passed to generative slot. + +**Returns:** +- Function dict of the passed function. + + +
+ +### FUNC `get_argument` + +```python +get_argument(func: Callable, key: str, val: Any) -> Argument +``` + + +Returns an argument given a parameter. + +Note: Performs additional formatting for string objects, putting them in quotes. + +**Args:** +- `func `: Callable Function +- `key `: Arg key +- `val `: Arg value + +**Returns:** +- an argument object representing the given parameter. + + +
+ +### FUNC `bind_function_arguments` + +```python +bind_function_arguments(func: Callable[P, R], *args: P.args, **kwargs: P.kwargs) -> dict[str, Any] +``` + + +Bind arguments to function parameters and return as dictionary. + +**Args:** +- `func`: The function to bind arguments for. +- `*args`: Positional arguments to bind. +- `**kwargs`: Keyword arguments to bind. + +**Returns:** +- Dictionary mapping parameter names to bound values with defaults applied. + + +
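+
+For example (with a made-up function):
+
+```python
+from mellea.stdlib.components.genslot import bind_function_arguments
+
+def greet(name: str, punctuation: str = "!") -> str:
+    ...
+
+bound = bind_function_arguments(greet, "Ada")
+# -> {"name": "Ada", "punctuation": "!"}  (defaults applied)
+```
+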
+ +### FUNC `generative` + +```python +generative(func: Callable[P, R]) -> GenerativeSlot[P, R] +``` + + +Convert a function into an AI-powered function. + +This decorator transforms a regular Python function into one that uses an LLM +to generate outputs. The function's entire signature - including its name, +parameters, docstring, and type hints - is used to instruct the LLM to imitate +that function's behavior. The output is guaranteed to match the return type +annotation using structured outputs and automatic validation. + +Notes: +- Works with async functions as well. +- Must pass all parameters for the original function as keyword args. +- Most python type-hinters will not show the default values but will correctly infer them; +this means that you can set default values in the decorated function and the only necessary values will be a session or a (context, backend). + +Tip: Write the function and docstring in the most Pythonic way possible, not +like a prompt. This ensures the function is well-documented, easily understood, +and familiar to any Python developer. The more natural and conventional your +function definition, the better the AI will understand and imitate it. + +The requirements and validation for the generative function operate over a textual representation +of the arguments / outputs (not their python objects). + +**Args:** +- `func`: Function with docstring and type hints. Implementation can be empty (...). + +**Returns:** +- An AI-powered function that generates responses using an LLM based on the +- original function's signature and docstring. + +**Raises:** +- `ValueError`: (raised by @generative) if the decorated function has a parameter name used by generative slots +- `ValidationError`: (raised when calling the generative slot) if the generated output cannot be parsed into the expected return type. Typically happens when the token limit for the generated output results in invalid json. +- `TypeError`: (raised when calling the generative slot) if any of the original function's parameters were passed as positional args +- `PreconditionException`: (raised when calling the generative slot) if the precondition validation of the args fails; catch the exception to get the validation results + +**Examples:** + +```python +>>> from mellea import generative, start_session +>>> session = start_session() +>>> @generative +... def summarize_text(text: str, max_words: int = 50) -> str: +... '''Generate a concise summary of the input text.''' +... ... +>>> +>>> summary = summarize_text(session, text="Long text...", max_words=30) + +>>> from typing import List +>>> from dataclasses import dataclass +>>> +>>> @dataclass +... class Task: +... title: str +... priority: str +... estimated_hours: float +>>> +>>> @generative +... async def create_project_tasks(project_desc: str, count: int) -> List[Task]: +... '''Generate a list of realistic tasks for a project. +... +... Args: +... project_desc: Description of the project +... count: Number of tasks to generate +... +... Returns: +... List of tasks with titles, priorities, and time estimates +... ''' +... ... +>>> +>>> tasks = await create_project_tasks(session, project_desc="Build a web app", count=5) + +>>> @generative +... def analyze_code_quality(code: str) -> Dict[str, Any]: +... '''Analyze code quality and provide recommendations. +... +... Args: +... code: Source code to analyze +... +... Returns: +... Dictionary containing: +... - score: Overall quality score (0-100) +... - issues: List of identified problems +... 
- suggestions: List of improvement recommendations +... - complexity: Estimated complexity level +... ''' +... ... +>>> +>>> analysis = analyze_code_quality( +... session, +... code="def factorial(n): return n * factorial(n-1)", +... model_options={"temperature": 0.3} +... ) + +>>> @dataclass +... class Thought: +... title: str +... body: str +>>> +>>> @generative +... def generate_chain_of_thought(problem: str, steps: int = 5) -> List[Thought]: +... '''Generate a step-by-step chain of thought for solving a problem. +... +... Args: +... problem: The problem to solve or question to answer +... steps: Maximum number of reasoning steps +... +... Returns: +... List of reasoning steps, each with a title and detailed body +... ''' +... ... +>>> +>>> reasoning = generate_chain_of_thought(session, problem="How to optimize a slow database query?") +``` + + +
+ +## Classes + +
+ + +### CLASS `FunctionResponse` + + +Generic base class for function response formats. + + +
+ +### CLASS `FunctionDict` + + +Return Type for a Function Component. + + +
+
+### CLASS `ArgumentDict`
+
+
+Return Type for an Argument Component.
+
+ +### CLASS `Argument` + + +An Argument. + + +
+ +### CLASS `Arguments` + +
+ +### CLASS `ArgPreconditionRequirement` + + +Specific requirement with template for validating precondition requirements against a set of args. + + +
+ +### CLASS `PreconditionException` + + +Exception raised when validation fails for a generative slot's arguments. + + +
+ +### CLASS `Function` + + +A Function. + + +
+ +### CLASS `ExtractedArgs` + + +Used to extract the mellea args and original function args. See @generative decorator for additional notes on these fields. + +These args must match those allowed by any overload of GenerativeSlot.__call__. + + +
+ +### CLASS `GenerativeSlot` + + +A generative slot component. + + + +
+**Methods:** + +
+ +#### FUNC `extract_args_and_kwargs` + +```python +extract_args_and_kwargs(*args, **kwargs) -> ExtractedArgs +``` + +Takes a mix of args and kwargs for both the generative slot and the original function and extracts them. Ensures the original function's args are all kwargs. + +**Returns:** +- a dataclass of the required args for mellea and the original function. +- Either session or (backend, context) will be non-None. + +**Raises:** +- `TypeError`: if any of the original function's parameters were passed as positional args + + +
+ +#### FUNC `parts` + +```python +parts(self) -> list[Component | CBlock] +``` + +Parts of Genslot. + + +
+ +#### FUNC `format_for_llm` + +```python +format_for_llm(self) -> TemplateRepresentation +``` + +Formats the instruction for Formatter use. + + +
+ +### CLASS `SyncGenerativeSlot` + +
+ +### CLASS `AsyncGenerativeSlot` + + +A generative slot component that generates asynchronously and returns a coroutine. + +
diff --git a/docs/docs/api/mellea/stdlib/components/instruction.mdx b/docs/docs/api/mellea/stdlib/components/instruction.mdx new file mode 100644 index 000000000..cb0401a29 --- /dev/null +++ b/docs/docs/api/mellea/stdlib/components/instruction.mdx @@ -0,0 +1,84 @@ +--- +title: "mellea.stdlib.components.instruction" +sidebarTitle: "mellea.stdlib.components.instruction" +description: "Instructions." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Classes + +
+ + +### CLASS `Instruction` + + +The Instruction in an instruct/validate/repair loop. + + + +
+**Methods:** + +
+ +#### FUNC `parts` + +```python +parts(self) +``` + +Returns all of the constituent parts of an Instruction. + + +
+ +#### FUNC `format_for_llm` + +```python +format_for_llm(self) -> TemplateRepresentation +``` + +Formats the instruction for Formatter use. + + +
+ +#### FUNC `apply_user_dict_from_jinja` + +```python +apply_user_dict_from_jinja(user_dict: dict[str, str], s: str) -> str +``` + +Treats s as a jinja string and user_dict as the template values dictionary. + + +
+ +#### FUNC `requirements` + +```python +requirements(self) -> list[Requirement] +``` + +Returns a list of Requirement instances. + + +
+ +#### FUNC `copy_and_repair` + +```python +copy_and_repair(self, repair_string: str) -> Instruction +``` + +Creates a copy of the instruction and adds/overwrites the repair string. + +
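+
+A small sketch of the jinja substitution helper, assuming it can be called without constructing an `Instruction`:
+
+```python
+from mellea.stdlib.components.instruction import Instruction
+
+template = "Summarize {{ topic }} in {{ n }} sentences."
+rendered = Instruction.apply_user_dict_from_jinja({"topic": "photosynthesis", "n": "2"}, template)
+# -> "Summarize photosynthesis in 2 sentences."
+```
+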
diff --git a/docs/docs/api/mellea/stdlib/components/intrinsic/intrinsic.mdx b/docs/docs/api/mellea/stdlib/components/intrinsic/intrinsic.mdx new file mode 100644 index 000000000..3b432bdbc --- /dev/null +++ b/docs/docs/api/mellea/stdlib/components/intrinsic/intrinsic.mdx @@ -0,0 +1,80 @@ +--- +title: "mellea.stdlib.components.intrinsic.intrinsic" +sidebarTitle: "mellea.stdlib.components.intrinsic.intrinsic" +description: "Module for Intrinsics." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Classes + +
+ + +### CLASS `Intrinsic` + + +A component representing an intrinsic. + + + +
+**Methods:** + +
+ +#### FUNC `intrinsic_name` + +```python +intrinsic_name(self) +``` + +User-visible name of this intrinsic. + + +
+ +#### FUNC `adapter_types` + +```python +adapter_types(self) -> tuple[AdapterType, ...] +``` + +Tuple of available adapter types that implement this intrinsic. + + +
+ +#### FUNC `parts` + +```python +parts(self) -> list[Component | CBlock] +``` + +The set of all the constituent parts of the `Intrinsic`. + +Will need to be implemented by subclasses since not all intrinsics are output +as text / messages. + + +
+ +#### FUNC `format_for_llm` + +```python +format_for_llm(self) -> TemplateRepresentation | str +``` + +`Intrinsic` doesn't implement `format_for_default`. + +Formats the `Intrinsic` into a `TemplateRepresentation` or string. + +Returns: a `TemplateRepresentation` or string + +
diff --git a/docs/docs/api/mellea/stdlib/components/intrinsic/rag.mdx b/docs/docs/api/mellea/stdlib/components/intrinsic/rag.mdx new file mode 100644 index 000000000..ee045475b --- /dev/null +++ b/docs/docs/api/mellea/stdlib/components/intrinsic/rag.mdx @@ -0,0 +1,176 @@ +--- +title: "mellea.stdlib.components.intrinsic.rag" +sidebarTitle: "mellea.stdlib.components.intrinsic.rag" +description: "Intrinsic functions related to retrieval-augmented generation." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Functions + +
+ + +### FUNC `check_answerability` + +```python +check_answerability(question: str, documents: collections.abc.Iterable[Document], context: ChatContext, backend: AdapterMixin) -> float +``` + + +Test a user's question for answerability. + +Intrinsic function that checks whether the question in the last user turn of a +chat can be answered by a provided set of RAG documents. + +:param context: Chat context containing the conversation thus far +:param question: Question that the user has posed in response to the last turn in + ``context``. +:param documents: Document snippets retrieved that may or may not answer the + indicated question. +:param backend: Backend instance that supports adding the LoRA or aLoRA adapters + for answerability checks + +:return: Answerability score as a floating-point value from 0 to 1. + + +
+ +### FUNC `rewrite_question` + +```python +rewrite_question(question: str, context: ChatContext, backend: AdapterMixin) -> float +``` + + +Rewrite a user's question for retrieval. + +Intrinsic function that rewrites the question in the next user turn into a +self-contained query that can be passed to the retriever. + +:param context: Chat context containing the conversation thus far +:param question: Question that the user has posed in response to the last turn in + ``context``. +:param backend: Backend instance that supports adding the LoRA or aLoRA adapters + +:return: Rewritten version of ``question``. + + +
+
+### FUNC `find_citations`
+
+```python
+find_citations(response: str, documents: collections.abc.Iterable[Document], context: ChatContext, backend: AdapterMixin) -> list[dict]
+```
+
+
+Find information in documents that supports an assistant response.
+
+Intrinsic function that finds sentences in RAG documents that support sentences
+in a potential assistant response to a user question.
+
+:param context: Context of the dialog between user and assistant at the point where
+    the user has just asked a question that will be answered with RAG documents
+:param response: Potential assistant response
+:param documents: Documents that were used to generate ``response``. These documents
+    should set the ``doc_id`` field; otherwise the intrinsic will be unable to
+    specify which document was the source of a given citation.
+:param backend: Backend that supports one of the adapters that implements this
+    intrinsic.
+:return: List of records with the following fields:
+    * ``response_begin``
+    * ``response_end``
+    * ``response_text``
+    * ``citation_doc_id``
+    * ``citation_begin``
+    * ``citation_end``
+    * ``citation_text``
+Begin and end offsets are character offsets into their respective UTF-8 strings.
+
+
+ +### FUNC `check_context_relevance` + +```python +check_context_relevance(question: str, document: Document, context: ChatContext, backend: AdapterMixin) -> float +``` + + +Test whether a document is relevant to a user's question. + +Intrinsic function that checks whether a single document contains part or all of +the answer to a user's question. Does not consider the context in which the +question was asked. + +:param context: The chat up to the point where the user asked a question. +:param question: Question that the user has posed. +:param document: A retrieved document snippet +:param backend: Backend instance that supports the adapters that implement this + intrinsic + +:return: Context relevance score as a floating-point value from 0 to 1. + + +
+ +### FUNC `flag_hallucinated_content` + +```python +flag_hallucinated_content(response: str, documents: collections.abc.Iterable[Document], context: ChatContext, backend: AdapterMixin) -> float +``` + + +Flag potentially-hallucinated sentences in an agent's response. + +Intrinsic function that checks whether the sentences in an agent's response to a +user question are faithful to the retrieved document snippets. Sentences that do not +align with the retrieved snippets are flagged as potential hallucinations. + +:param context: A chat log that ends with a user asking a question +:param response: The assistant's response to the user's question in the last turn + of ``context`` +:param documents: Document snippets that were used to generate ``response`` +:param backend: Backend instance that supports the adapters that implement this + intrinsic + +:return: List of records with the following fields: + * response_begin + * response_end + * response_text + * faithfulness_likelihood + * explanation + + +
+
+### FUNC `rewrite_answer_for_relevance`
+
+```python
+rewrite_answer_for_relevance(rewrite_threshold: float = 0.5) -> str
+```
+
+
+Rewrite an assistant answer to improve relevance to the user's question.
+
+:param context: A chat log that ends with a user asking a question
+:param response: The assistant's response to the user's question in the last turn
+    of ``context``
+:param documents: Document snippets that were used to generate ``response``
+:param backend: Backend instance that supports the adapters that implement this
+    intrinsic
+:param rewrite_threshold: Number between 0.0 and 1.0 that determines how eagerly
+    to rewrite the assistant's answer for relevance. 0.0 means never rewrite
+    and 1.0 means always rewrite.
+
+:returns: Either the original response, or a rewritten version of the original
+    response.
+
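+
+As a rough sketch of calling one of these intrinsics; the chat context, document list, and backend below are placeholders, and `Document` construction is not shown.
+
+```python
+from mellea.stdlib.components.intrinsic.rag import check_answerability
+
+# `chat_ctx` is a ChatContext ending with the user's question, `docs` is an iterable
+# of Document snippets, and `backend` supports the answerability adapter.
+score = check_answerability(
+    question="What is the warranty period?",
+    documents=docs,
+    context=chat_ctx,
+    backend=backend,
+)
+if score < 0.5:
+    print("The retrieved documents likely cannot answer this question.")
+```
+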
diff --git a/docs/docs/api/mellea/stdlib/components/mify.mdx b/docs/docs/api/mellea/stdlib/components/mify.mdx new file mode 100644 index 000000000..02ab0c7ff --- /dev/null +++ b/docs/docs/api/mellea/stdlib/components/mify.mdx @@ -0,0 +1,187 @@ +--- +title: "mellea.stdlib.components.mify" +sidebarTitle: "mellea.stdlib.components.mify" +description: "Mify classes and objects." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Functions + +
+
+
+### FUNC `mify`
+
+```python
+mify(*args, **kwargs)
+```
+
+
+M-ify an object or class.
+
+Allows the object (or instances of the class) to be used in m sessions and with m functions.
+
+For the args below, only specify an _include or an _exclude for fields and funcs. If both are specified,
+include takes precedence. If you specify the same item to be included and excluded, nothing will be included.
+
+If fields_include or fields_exclude are set:
+- the stringify_func will not be used to represent this object to the model
+- you must specify a template field or a template in the template_order field that handles a dict with those fields as keys
+- it's advised to use fields_include due to the many dunder fields and inherited fields an object/class might have
+
+Mify sets attributes on the object/class. If the object isn't already a mified object/MObject, it will overwrite
+the attributes and methods of the object/class necessary for it to be mified.
+
+**Args:**
+- `obj`: either a class or an instance of the class
+- `fields_include`: fields of the object to include in its representation to models
+- `fields_exclude`: fields of the object to exclude from its representation to models
+- `funcs_include`: functions of the object to include in its representation to models
+- `funcs_exclude`: functions of the object to exclude from its representation to models
+- `query_type`: a specific query component type to use when querying a model
+- `transform_type`: a specific transform component type to use when transforming with a model
+- `template`: a string representation of a jinja template; takes precedence over template_order
+- `template_order`: a template ordering to use when searching for applicable templates
+- `parsing_func`: not yet implemented
+- `stringify_func`: used to create a string representation of the object
+
+**Returns:**
+- An object if an object was passed in or a decorator (callable) to mify classes.
+- If an object is returned, that object will be the same object that was passed in.
+- For example,
+- ```
+- obj = mify(obj)
+- obj.format_for_llm()
+- ```
+- and
+- ```
+- mify(obj)
+- obj.format_for_llm()
+- ```
+- are equivalent.
+- Most IDEs will not correctly show the type hints for the newly added functions
+- for either a mified object or instances of a mified class. For IDE support, write
+- ```
+- assert isinstance(obj, MifiedProtocol)
+- ```
+
+
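+
+A minimal sketch of m-ifying an instance in place, per the return-value notes above:
+
+```python
+from mellea.stdlib.components.mify import MifiedProtocol, mify
+
+class Report:
+    def __init__(self, title: str, body: str):
+        self.title = title
+        self.body = body
+
+report = mify(Report("Q3 Summary", "Revenue grew 4%."))  # same object, now mified
+assert isinstance(report, MifiedProtocol)                # recommended for IDE support
+print(report.format_for_llm())
+```
+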
+ +## Classes + +
+ + +### CLASS `MifiedProtocol` + + +Adds additional functionality to the MObjectProtocol and modifies MObject functions so that mified objects can be more easily interacted with and modified. + +See the mify decorator for more information. + + + +
+**Methods:** + +
+ +#### FUNC `parts` + +```python +parts(self) -> list[Component | CBlock] +``` + +TODO\: we need to rewrite this component to use format_for_llm and initializer correctly. + +For now an empty list is the correct behavior. + +[no-index] + + +
+ +#### FUNC `get_query_object` + +```python +get_query_object(self, query: str) -> Query +``` + +Returns the instantiated query object. + +[no-index] + +**Args:** +- `query `: The query string. + + +
+ +#### FUNC `get_transform_object` + +```python +get_transform_object(self, transformation: str) -> Transform +``` + +Returns the instantiated transform object. + +[no-index] + +**Args:** +- `transformation`: the transform string + + +
+ +#### FUNC `content_as_string` + +```python +content_as_string(self) -> str +``` + +Returns the content of the Mified object as a string. + +[no-index] + +Will use the passed in stringify function if provided. + + +
+ +#### FUNC `format_for_llm` + +```python +format_for_llm(self) -> TemplateRepresentation +``` + +The representation of an object given to the backend. + +[no-index] + +Sets the TemplateRepresentation fields based on the object and the values +specified during mify. + +See mify decorator for more details. + + +
+ +#### FUNC `parse` + +```python +parse(self, computed: ModelOutputThunk) -> str +``` + +Parse the model output. Returns string value for now. + +[no-index] + +
diff --git a/docs/docs/api/mellea/stdlib/components/mobject.mdx b/docs/docs/api/mellea/stdlib/components/mobject.mdx new file mode 100644 index 000000000..e3160e354 --- /dev/null +++ b/docs/docs/api/mellea/stdlib/components/mobject.mdx @@ -0,0 +1,245 @@ +--- +title: "mellea.stdlib.components.mobject" +sidebarTitle: "mellea.stdlib.components.mobject" +description: "MObject." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Classes + +
+ + +### CLASS `Query` + + +A Query component. + + + +
+**Methods:** + +
+ +#### FUNC `parts` + +```python +parts(self) -> list[Component | CBlock] +``` + +Get the parts of the query. + + +
+ +#### FUNC `format_for_llm` + +```python +format_for_llm(self) -> TemplateRepresentation | str +``` + +Format the query for llm. + + +
+ +### CLASS `Transform` + + +A Transform component. + + + +
+**Methods:** + +
+ +#### FUNC `parts` + +```python +parts(self) -> list[Component | CBlock] +``` + +Get the parts of the transform. + + +
+ +#### FUNC `format_for_llm` + +```python +format_for_llm(self) -> TemplateRepresentation | str +``` + +Format the transform for llm. + + +
+ +### CLASS `MObjectProtocol` + + +Protocol to describe the necessary functionality of a MObject. Implementers should prefer inheriting from MObject than MObjectProtocol. + + + +
+**Methods:** + +
+ +#### FUNC `parts` + +```python +parts(self) -> list[Component | CBlock] +``` + +Returns a list of parts for MObject. + + +
+ +#### FUNC `get_query_object` + +```python +get_query_object(self, query: str) -> Query +``` + +Returns the instantiated query object. + +**Args:** +- `query `: The query string. + + +
+ +#### FUNC `get_transform_object` + +```python +get_transform_object(self, transformation: str) -> Transform +``` + +Returns the instantiated transform object. + +**Args:** +- `transformation`: the transform string + + +
+ +#### FUNC `content_as_string` + +```python +content_as_string(self) -> str +``` + +Returns the content of MObject as a string. + +The default value is just `str(self)`. +Subclasses should override this method. + + +
+ +#### FUNC `format_for_llm` + +```python +format_for_llm(self) -> TemplateRepresentation | str +``` + +The template representation used by the formatter. + +The default `TemplateRepresentation` uses an automatic +parsing for tools and fields. The content is retrieved +from `content_as_string()`. + + +
+ +### CLASS `MObject` + + +An extension of `Component` for adding query and transform operations. + + + +
+**Methods:** + +
+ +#### FUNC `parts` + +```python +parts(self) -> list[Component | CBlock] +``` + +MObject has no parts because of how format_for_llm is defined. + + +
+ +#### FUNC `get_query_object` + +```python +get_query_object(self, query: str) -> Query +``` + +Returns the instantiated query object. + +**Args:** +- `query `: The query string. + + +
+ +#### FUNC `get_transform_object` + +```python +get_transform_object(self, transformation: str) -> Transform +``` + +Returns the instantiated transform object. + +**Args:** +- `transformation`: the transform string + + +
+ +#### FUNC `content_as_string` + +```python +content_as_string(self) -> str +``` + +Returns the content of MObject as a string. + +The default value is just `str(self)`. +Subclasses should override this method. + + +
+ +#### FUNC `format_for_llm` + +```python +format_for_llm(self) -> TemplateRepresentation | str +``` + +The template representation used by the formatter. + +The default `TemplateRepresentation` uses an automatic +parsing for tools and fields. The content is retrieved +from `content_as_string()`. + +
diff --git a/docs/docs/api/mellea/stdlib/components/simple.mdx b/docs/docs/api/mellea/stdlib/components/simple.mdx new file mode 100644 index 000000000..28d2c45c8 --- /dev/null +++ b/docs/docs/api/mellea/stdlib/components/simple.mdx @@ -0,0 +1,73 @@ +--- +title: "mellea.stdlib.components.simple" +sidebarTitle: "mellea.stdlib.components.simple" +description: "SimpleComponent." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Classes + +
+ + +### CLASS `SimpleComponent` + + +A Component that is made up of named spans. + + + +</Card>
+**Methods:** + +
+ +#### FUNC `parts` + +```python +parts(self) +``` + +Returns the values of the kwargs. + + +
+ +#### FUNC `make_simple_string` + +```python +make_simple_string(kwargs) +``` + +Uses <|key|>value</|key|> to represent a simple component. + + +
+ +#### FUNC `make_json_string` + +```python +make_json_string(kwargs) +``` + +Uses JSON to represent a simple component. + + +</Card>
+ +#### FUNC `format_for_llm` + +```python +format_for_llm(self) +``` + +Uses a string representation. + </Card>
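+
+As an illustration of the two representations above (the exact joining and escaping of the output strings are assumptions of this sketch):
+
+```python
+from mellea.stdlib.components.simple import SimpleComponent
+
+spans = {"question": "What is 2 + 2?", "hint": "basic arithmetic"}
+
+# <|key|>value</|key|> style representation
+print(SimpleComponent.make_simple_string(spans))
+# roughly: <|question|>What is 2 + 2?</|question|> <|hint|>basic arithmetic</|hint|>
+
+# JSON style representation
+print(SimpleComponent.make_json_string(spans))
+# roughly: {"question": "What is 2 + 2?", "hint": "basic arithmetic"}
+```
+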
diff --git a/docs/docs/api/mellea/stdlib/components/unit_test_eval.mdx b/docs/docs/api/mellea/stdlib/components/unit_test_eval.mdx new file mode 100644 index 000000000..0a6e50a3f --- /dev/null +++ b/docs/docs/api/mellea/stdlib/components/unit_test_eval.mdx @@ -0,0 +1,112 @@ +--- +title: "mellea.stdlib.components.unit_test_eval" +sidebarTitle: "mellea.stdlib.components.unit_test_eval" +description: "LLM Evaluation with Unit Tests in Mellea." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Classes + +
+ + +### CLASS `Message` + + +Schema for a message in the test data. + + +
+ +### CLASS `Example` + + +Schema for an example in the test data. + + +
+ +### CLASS `TestData` + + +Schema for test data loaded from json. + + + +
+**Methods:** + +
+ +#### FUNC `validate_examples` + +```python +validate_examples(cls, v) +``` + +Ensure examples list is not empty. + + +
+ +### CLASS `TestBasedEval` + + +Each TestBasedEval represents a single unit test. + + + +
+**Methods:** + +
+ +#### FUNC `parts` + +```python +parts(self) -> list[Component | CBlock] +``` + +The set of constituent parts of the Component. + + +
+ +#### FUNC `format_for_llm` + +```python +format_for_llm(self) -> TemplateRepresentation +``` + +Formats the test for judge evaluation. + + +
+ +#### FUNC `set_judge_context` + +```python +set_judge_context(self, input_text: str, prediction: str, targets_for_input: list[str]) +``` + +Set context for judge evaluation. + + +
+ +#### FUNC `from_json_file` + +```python +from_json_file(cls, filepath: str) -> list['TestBasedEval'] +``` + +Load test evaluations from json/jsonl file, return list of TestBasedEval instances, one per 'unit test'. + +
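+
+A short loading sketch; the file name is hypothetical and the JSON is assumed to follow the `TestData`/`Example` schema above:
+
+```python
+from mellea.stdlib.components.unit_test_eval import TestBasedEval
+
+# One TestBasedEval per "unit test" in the file.
+tests = TestBasedEval.from_json_file("eval_cases.jsonl")
+
+for test in tests:
+    # Attach the model input/prediction/targets before rendering for the judge.
+    test.set_judge_context(
+        input_text="What is the capital of France?",
+        prediction="Paris is the capital of France.",
+        targets_for_input=["Paris"],
+    )
+    print(test.format_for_llm())
+```
+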
diff --git a/docs/docs/api/mellea/stdlib/context.mdx b/docs/docs/api/mellea/stdlib/context.mdx new file mode 100644 index 000000000..69b00d2e2 --- /dev/null +++ b/docs/docs/api/mellea/stdlib/context.mdx @@ -0,0 +1,85 @@ +--- +title: "mellea.stdlib.context" +sidebarTitle: "mellea.stdlib.context" +description: "Basic Contexts." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Classes + +
+ + +### CLASS `ChatContext` + + +Initializes a chat context with unbounded window_size and is_chat=True by default. + + + +
+**Methods:** + +
+ +#### FUNC `add` + +```python +add(self, c: Component | CBlock) -> ChatContext +``` + +Add a new component/cblock to the context. Returns the new context. + + +
+ +#### FUNC `view_for_generation` + +```python +view_for_generation(self) -> list[Component | CBlock] | None +``` + +Returns the context in a linearized form. Uses the window_size set during initialization. + + +
+ +### CLASS `SimpleContext` + + +A `SimpleContext` is a context in which each interaction is a separate and independent turn. The history of all previous turns is NOT saved. + + + +</Card>
+**Methods:** + +
+ +#### FUNC `add` + +```python +add(self, c: Component | CBlock) -> SimpleContext +``` + +Add a new component/cblock to the context. Returns the new context. + + +
+ +#### FUNC `view_for_generation` + +```python +view_for_generation(self) -> list[Component | CBlock] | None +``` + +Returns an empty list. + +
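+
+A small sketch of the immutable `add()` pattern; it assumes `CBlock` lives in `mellea.stdlib.base` and can wrap a plain string, which is not shown in this reference:
+
+```python
+from mellea.stdlib.base import CBlock  # assumed location of CBlock
+from mellea.stdlib.context import ChatContext, SimpleContext
+
+block = CBlock("Hello there!")  # a simple content block for illustration
+
+chat_ctx = ChatContext().add(block)      # add() returns a *new* context
+simple_ctx = SimpleContext().add(block)  # the original contexts are untouched
+
+print(chat_ctx.view_for_generation())    # linearized history (bounded by window_size)
+print(simple_ctx.view_for_generation())  # always an empty list: turns are independent
+```
+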
diff --git a/docs/docs/api/mellea/stdlib/functional.mdx b/docs/docs/api/mellea/stdlib/functional.mdx new file mode 100644 index 000000000..88ddc539f --- /dev/null +++ b/docs/docs/api/mellea/stdlib/functional.mdx @@ -0,0 +1,280 @@ +--- +title: "mellea.stdlib.functional" +sidebarTitle: "mellea.stdlib.functional" +description: "Functions for Mellea operations like Instruct, Chat, etc..." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Functions + +
+ + +### FUNC `act` + +```python +act(action: Component[S], context: Context, backend: Backend) -> tuple[ModelOutputThunk[S], Context] | SamplingResult[S] +``` + + +Runs a generic action, and adds both the action and the result to the context. + +**Args:** +- `action`: the Component from which to generate. +- `context`: the context being used as a history from which to generate the response. +- `backend`: the backend used to generate the response. +- `requirements`: used as additional requirements when a sampling strategy is provided. +- `strategy`: a SamplingStrategy that describes the strategy for validating and repairing/retrying for the instruct-validate-repair pattern. None means that no particular sampling strategy is used. +- `return_sampling_results`: attach the (successful and failed) sampling attempts to the results. +- `format`: if set, the BaseModel to use for constrained decoding. +- `model_options`: additional model options, which will upsert into the model/backend's defaults. +- `tool_calls`: if true, tool calling is enabled. + +**Returns:** +- A (ModelOutputThunk, Context) if `return_sampling_results` is `False`, else returns a `SamplingResult`. + + +
+ +### FUNC `instruct` + +```python +instruct(description: str, context: Context, backend: Backend) -> tuple[ModelOutputThunk[str], Context] | SamplingResult[str] +``` + + +Generates from an instruction. + +**Args:** +- `description`: The description of the instruction. +- `context`: the context being used as a history from which to generate the response. +- `backend`: the backend used to generate the response. +- `requirements`: A list of requirements that the instruction can be validated against. +- `icl_examples`: A list of in-context-learning examples that the instruction can be validated against. +- `grounding_context`: A list of grounding contexts that the instruction can use. They can bind as variables using a (key\: str, value\: str | ContentBlock) tuple. +- `user_variables`: A dict of user-defined variables used to fill in Jinja placeholders in other parameters. This requires that all other provided parameters are provided as strings. +- `prefix`: A prefix string or ContentBlock to use when generating the instruction. +- `output_prefix`: A string or ContentBlock that defines a prefix for the output generation. Usually you do not need this. +- `strategy`: A SamplingStrategy that describes the strategy for validating and repairing/retrying for the instruct-validate-repair pattern. None means that no particular sampling strategy is used. +- `return_sampling_results`: attach the (successful and failed) sampling attempts to the results. +- `format`: If set, the BaseModel to use for constrained decoding. +- `model_options`: Additional model options, which will upsert into the model/backend's defaults. +- `tool_calls`: If true, tool calling is enabled. +- `images`: A list of images to be used in the instruction or None if none. + +**Returns:** +- A (ModelOutputThunk, Context) if `return_sampling_results` is `False`, else returns a `SamplingResult`. + + +
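+
+A minimal sketch of threading a context through the functional API; the backend construction and its import path are assumptions of this example (any configured `Backend` is used the same way):
+
+```python
+from mellea.backends.ollama import OllamaModelBackend  # import path assumed
+from mellea.stdlib.context import ChatContext
+from mellea.stdlib.functional import instruct
+
+backend = OllamaModelBackend()  # default local model; constructor args omitted
+ctx = ChatContext()
+
+# Each call returns the output plus a *new* context, which is threaded forward.
+summary, ctx = instruct(
+    "Summarize the plot of Hamlet in two sentences.",
+    context=ctx,
+    backend=backend,
+)
+print(summary)
+
+follow_up, ctx = instruct("Now do the same for Macbeth.", context=ctx, backend=backend)
+print(follow_up)
+```
+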
+ +### FUNC `chat` + +```python +chat(content: str, context: Context, backend: Backend) -> tuple[Message, Context] +``` + + +Sends a simple chat message and returns the response. Adds both messages to the Context. + + +
+ +### FUNC `validate` + +```python +validate(reqs: Requirement | list[Requirement], context: Context, backend: Backend) -> list[ValidationResult] +``` + + +Validates a set of requirements over the output (if provided) or the current context (if the output is not provided). + + +
+ +### FUNC `query` + +```python +query(obj: Any, query: str, context: Context, backend: Backend) -> tuple[ModelOutputThunk, Context] +``` + + +Query method for retrieving information from an object. + +**Args:** +- `obj `: The object to be queried. It should be an instance of MObject or can be converted to one if necessary. +- `query`: The string representing the query to be executed against the object. +- `context`: the context being used as a history from which to generate the response. +- `backend`: the backend used to generate the response. +- `format`: format for output parsing. +- `model_options`: Model options to pass to the backend. +- `tool_calls`: If true, the model may make tool calls. Defaults to False. + +**Returns:** +- The result of the query as processed by the backend. + + +
+ +### FUNC `transform` + +```python +transform(obj: Any, transformation: str, context: Context, backend: Backend) -> tuple[ModelOutputThunk | Any, Context] +``` + + +Transform method for creating a new object with the transformation applied. + +**Args:** +- `obj`: The object to be transformed. It should be an instance of MObject or can be converted to one if necessary. +- `transformation`: The string representing the transformation to apply to the object. +- `context`: the context being used as a history from which to generate the response. +- `backend`: the backend used to generate the response. +- `format`: format for output parsing; usually not needed with transform. +- `model_options`: Model options to pass to the backend. + +**Returns:** +- The result of the transformation as processed by the backend. If no tools were called, +- the return type will always be (ModelOutputThunk, Context). If a tool was called, the return type will be the return type +- of the function called, usually the type of the object passed in. + + +</Card>
+ +### FUNC `aact` + +```python +aact(action: Component[S], context: Context, backend: Backend) -> tuple[ModelOutputThunk[S], Context] | SamplingResult +``` + + +Asynchronous version of .act; runs a generic action, and adds both the action and the result to the context. + +**Args:** +- `action`: the Component from which to generate. +- `context`: the context being used as a history from which to generate the response. +- `backend`: the backend used to generate the response. +- `requirements`: used as additional requirements when a sampling strategy is provided +- `strategy`: a SamplingStrategy that describes the strategy for validating and repairing/retrying for the instruct-validate-repair pattern. None means that no particular sampling strategy is used. +- `return_sampling_results`: attach the (successful and failed) sampling attempts to the results. +- `format`: if set, the BaseModel to use for constrained decoding. +- `model_options`: additional model options, which will upsert into the model/backend's defaults. +- `tool_calls`: if true, tool calling is enabled. +- `silence_context_type_warning`: if called directly from an asynchronous function, will log a warning if not using a SimpleContext + +**Returns:** +- A (ModelOutputThunk, Context) if `return_sampling_results` is `False`, else returns a `SamplingResult`. + + +
+ +### FUNC `ainstruct` + +```python +ainstruct(description: str, context: Context, backend: Backend) -> tuple[ModelOutputThunk[str], Context] | SamplingResult +``` + + +Generates from an instruction. + +**Args:** +- `description`: The description of the instruction. +- `context`: the context being used as a history from which to generate the response. +- `backend`: the backend used to generate the response. +- `requirements`: A list of requirements that the instruction can be validated against. +- `icl_examples`: A list of in-context-learning examples that the instruction can be validated against. +- `grounding_context`: A list of grounding contexts that the instruction can use. They can bind as variables using a (key\: str, value\: str | ContentBlock) tuple. +- `user_variables`: A dict of user-defined variables used to fill in Jinja placeholders in other parameters. This requires that all other provided parameters are provided as strings. +- `prefix`: A prefix string or ContentBlock to use when generating the instruction. +- `output_prefix`: A string or ContentBlock that defines a prefix for the output generation. Usually you do not need this. +- `strategy`: A SamplingStrategy that describes the strategy for validating and repairing/retrying for the instruct-validate-repair pattern. None means that no particular sampling strategy is used. +- `return_sampling_results`: attach the (successful and failed) sampling attempts to the results. +- `format`: If set, the BaseModel to use for constrained decoding. +- `model_options`: Additional model options, which will upsert into the model/backend's defaults. +- `tool_calls`: If true, tool calling is enabled. +- `images`: A list of images to be used in the instruction or None if none. + +**Returns:** +- A (ModelOutputThunk, Context) if `return_sampling_results` is `False`, else returns a `SamplingResult`. + + +
+ +### FUNC `achat` + +```python +achat(content: str, context: Context, backend: Backend) -> tuple[Message, Context] +``` + + +Sends a simple chat message and returns the response. Adds both messages to the Context. + + +
+ +### FUNC `avalidate` + +```python +avalidate(reqs: Requirement | list[Requirement], context: Context, backend: Backend) -> list[ValidationResult] +``` + + +Asynchronous version of .validate; validates a set of requirements over the output (if provided) or the current context (if the output is not provided). + + +
+ +### FUNC `aquery` + +```python +aquery(obj: Any, query: str, context: Context, backend: Backend) -> tuple[ModelOutputThunk, Context] +``` + + +Query method for retrieving information from an object. + +**Args:** +- `obj `: The object to be queried. It should be an instance of MObject or can be converted to one if necessary. +- `query`: The string representing the query to be executed against the object. +- `context`: the context being used as a history from which to generate the response. +- `backend`: the backend used to generate the response. +- `format`: format for output parsing. +- `model_options`: Model options to pass to the backend. +- `tool_calls`: If true, the model may make tool calls. Defaults to False. + +**Returns:** +- The result of the query as processed by the backend. + + +
+ +### FUNC `atransform` + +```python +atransform(obj: Any, transformation: str, context: Context, backend: Backend) -> tuple[ModelOutputThunk | Any, Context] +``` + + +Transform method for creating a new object with the transformation applied. + +**Args:** +- `obj`: The object to be transformed. It should be an instance of MObject or can be converted to one if necessary. +- `transformation`: The string representing the transformation to apply to the object. +- `context`: the context being used as a history from which to generate the response. +- `backend`: the backend used to generate the response. +- `format`: format for output parsing; usually not needed with transform. +- `model_options`: Model options to pass to the backend. + +**Returns:** +- ModelOutputThunk|Any: The result of the transformation as processed by the backend. If no tools were called, +- the return type will always be ModelOutputThunk. If a tool was called, the return type will be the return type +- of the function called, usually the type of the object passed in. + </Card>
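+
+The asynchronous variants mirror their synchronous counterparts. A sketch under the same assumptions as above (backend construction and import paths assumed):
+
+```python
+import asyncio
+
+from mellea.backends.ollama import OllamaModelBackend  # import path assumed
+from mellea.stdlib.context import SimpleContext
+from mellea.stdlib.functional import achat, ainstruct
+
+
+async def main() -> None:
+    backend = OllamaModelBackend()  # constructor args omitted
+    ctx = SimpleContext()           # independent turns suit concurrent calls
+
+    # Run two instructions concurrently against the same backend.
+    results = await asyncio.gather(
+        ainstruct("Name one prime number.", context=ctx, backend=backend),
+        ainstruct("Name one even number.", context=ctx, backend=backend),
+    )
+    for output, _ctx in results:
+        print(output)
+
+    reply, _ = await achat("Hello!", context=ctx, backend=backend)
+    print(reply)
+
+
+asyncio.run(main())
+```
+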
diff --git a/docs/docs/api/mellea/stdlib/requirements/md.mdx b/docs/docs/api/mellea/stdlib/requirements/md.mdx new file mode 100644 index 000000000..d26e733fa --- /dev/null +++ b/docs/docs/api/mellea/stdlib/requirements/md.mdx @@ -0,0 +1,29 @@ +--- +title: "mellea.stdlib.requirements.md" +sidebarTitle: "mellea.stdlib.requirements.md" +description: "This file contains various requirements for Markdown-formatted files." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Functions + +
+ + +### FUNC `as_markdown_list` + +```python +as_markdown_list(ctx: Context) -> list[str] | None +``` + + +Attempts to format the last_output of the given context as a markdown list. + +
diff --git a/docs/docs/api/mellea/stdlib/requirements/python_reqs.mdx b/docs/docs/api/mellea/stdlib/requirements/python_reqs.mdx new file mode 100644 index 000000000..0515662c7 --- /dev/null +++ b/docs/docs/api/mellea/stdlib/requirements/python_reqs.mdx @@ -0,0 +1,25 @@ +--- +title: "mellea.stdlib.requirements.python_reqs" +sidebarTitle: "mellea.stdlib.requirements.python_reqs" +description: "Requirements for Python code generation validation." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Classes + +
+ + +### CLASS `PythonExecutionReq` + + +Verifies that Python code runs without raising exceptions. + +
diff --git a/docs/docs/api/mellea/stdlib/requirements/requirement.mdx b/docs/docs/api/mellea/stdlib/requirements/requirement.mdx new file mode 100644 index 000000000..0c0c1255a --- /dev/null +++ b/docs/docs/api/mellea/stdlib/requirements/requirement.mdx @@ -0,0 +1,113 @@ +--- +title: "mellea.stdlib.requirements.requirement" +sidebarTitle: "mellea.stdlib.requirements.requirement" +description: "Requirements are a special type of Component used as input to the \"validate\" step in Instruct/Validate/Repair design patterns." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Functions + +
+ + +### FUNC `requirement_check_to_bool` + +```python +requirement_check_to_bool(x: CBlock | str) -> bool +``` + + +Checks whether a given output should be converted to `True`. + +By default, the requirement check alora outputs `{"requirement_likelihood": 0.0}`; the result is `True` if the likelihood is greater than 0.5. + + +</Card>
+ +### FUNC `reqify` + +```python +reqify(r: str | Requirement) -> Requirement +``` + + +Maps strings to Requirements. + +This is a utility method for functions that allow you to pass in Requirements as either explicit Requirement objects or strings that you intend to be interpreted as requirements. + + +
+ +### FUNC `req` + +```python +req(*args, **kwargs) -> Requirement +``` + + +Shorthand for Requirement.__init__. + + +
+ +### FUNC `check` + +```python +check(*args, **kwargs) -> Requirement +``` + + +Shorthand for Requirement.__init__(..., check_only=True). + + +
+ +### FUNC `simple_validate` + +```python +simple_validate(fn: Callable[[str], Any]) -> Callable[[Context], ValidationResult] +``` + + +Syntactic sugar for writing validation functions that only operate over the last output from the model (interpreted as a string). + +This is useful when your validation logic only depends upon the most recent model output. For example: + +`Requirement("Answer 'yes' or 'no'", simple_validate(lambda x: x == 'yes' or x == 'no'))` + +Validation functions operate over `Context`. Often you do not care about the entire context, and just want to consider the most recent output from the model. + +**Args:** +- `fn`: the simple validation function that takes a string and returns either a bool or (bool, str) +- `reason`: only used if the provided function returns a bool; if the validation function fails, a static reason for that failure to give to the llm when repairing + + +</Card>
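+
+Two short validator sketches; following the `Requirement(...)` example above, the validation function is assumed to be accepted as the second positional argument of `req`:
+
+```python
+from mellea.stdlib.requirements.requirement import req, simple_validate
+
+# Bool-only validator over the last model output.
+yes_or_no = req(
+    "Answer 'yes' or 'no'.",
+    simple_validate(lambda out: out.strip().lower() in {"yes", "no"}),
+)
+
+
+# (bool, reason) validator: the reason can guide a later repair step.
+def short_enough(out: str) -> tuple[bool, str]:
+    words = len(out.split())
+    return words <= 50, f"The answer has {words} words; keep it under 50."
+
+
+concise = req("Keep the answer under 50 words.", simple_validate(short_enough))
+```
+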
+ +## Classes + +
+ + +### CLASS `LLMaJRequirement` + + +A requirement that always uses LLM-as-a-Judge. Any available constraint ALoRA will be ignored. + + +
+ +### CLASS `ALoraRequirement` + + +A requirement that always uses a (possibly specified) ALora. If an exception is thrown during the ALora execution path, `mellea` will fall back to LLMaJ. But that is the only case where LLMaJ will be used. + </Card>
diff --git a/docs/docs/api/mellea/stdlib/requirements/safety/guardian.mdx b/docs/docs/api/mellea/stdlib/requirements/safety/guardian.mdx new file mode 100644 index 000000000..ebbe4fbc8 --- /dev/null +++ b/docs/docs/api/mellea/stdlib/requirements/safety/guardian.mdx @@ -0,0 +1,87 @@ +--- +title: "mellea.stdlib.requirements.safety.guardian" +sidebarTitle: "mellea.stdlib.requirements.safety.guardian" +description: "Risk checking with Granite Guardian models via existing backends." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Classes + +
+ + +### CLASS `GuardianRisk` + + +Risk definitions for Granite Guardian models. + +Based on https://github.com/ibm-granite/granite-guardian but updated for 3.3 8B support. + + + +
+**Methods:** + +
+ +#### FUNC `get_available_risks` + +```python +get_available_risks(cls) -> list[str] +``` + +Get list of all available risk types. + + +
+ +### CLASS `GuardianCheck` + + +Enhanced risk checking using Granite Guardian 3.3 8B with multiple backend support. + + + +
+**Methods:** + +
+ +#### FUNC `get_effective_risk` + +```python +get_effective_risk(self) -> str +``` + +Get the effective risk criteria to use for validation. + + +
+ +#### FUNC `get_available_risks` + +```python +get_available_risks(cls) -> list[str] +``` + +Get list of all available standard risk types. + + +
+ +#### FUNC `validate` + +```python +validate(self, backend: Backend, ctx: Context) -> ValidationResult +``` + +Validate conversation using Granite Guardian via selected backend. + +
diff --git a/docs/docs/api/mellea/stdlib/requirements/tool_reqs.mdx b/docs/docs/api/mellea/stdlib/requirements/tool_reqs.mdx new file mode 100644 index 000000000..7e081173f --- /dev/null +++ b/docs/docs/api/mellea/stdlib/requirements/tool_reqs.mdx @@ -0,0 +1,56 @@ +--- +title: "mellea.stdlib.requirements.tool_reqs" +sidebarTitle: "mellea.stdlib.requirements.tool_reqs" +description: "Requirements for tool-use workflows." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Functions + +
+ + +### FUNC `uses_tool` + +```python +uses_tool(tool_name: str | Callable, check_only = False) +``` + + +Forces the model to call a given tool. + +**Args:** +- `tool_name`: The tool that must be called; this can be either the name of the tool or the Callable for the tool. +- `check_only`: Propagates to the Requirement. + +Prefer the OpenAI `tool_choice` model option instead if your model and inference engine support it. + + +</Card>
+ +### FUNC `tool_arg_validator` + +```python +tool_arg_validator(description: str, tool_name: str | Callable | None, arg_name: str, validation_fn: Callable, check_only: bool = False) -> Requirement +``` + + +A requirement that passes only if `validation_fn` returns a True value for the *value* of the `arg_name` argument to `tool_name`. + +If `tool_name` is not specified, then this requirement is enforced for *every* tool that is called. + +**Args:** +- `description`: The Requirement description. +- `tool_name`: The (optional) name of the tool whose `arg_name` argument should be checked. +- `arg_name`: The argument to check. +- `validation_fn`: A validation function for validating the value of the `arg_name` argument. +- `check_only`: propagates the `check_only` flag to the requirement. + </Card>
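+
+A sketch of both requirements using a toy tool; the resulting requirements would then be passed wherever `requirements` are accepted (for example to `instruct`):
+
+```python
+from mellea.stdlib.requirements.tool_reqs import tool_arg_validator, uses_tool
+
+
+def get_weather(city: str) -> str:
+    """Toy tool used only for illustration."""
+    return f"It is sunny in {city}."
+
+
+# The model must actually call get_weather...
+must_call_weather = uses_tool(get_weather)
+
+# ...and the `city` value it passes must be a non-empty string.
+valid_city = tool_arg_validator(
+    "The city passed to get_weather must be a non-empty string.",
+    get_weather,
+    "city",
+    lambda value: isinstance(value, str) and bool(value.strip()),
+)
+```
+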
diff --git a/docs/docs/api/mellea/stdlib/sampling/base.mdx b/docs/docs/api/mellea/stdlib/sampling/base.mdx new file mode 100644 index 000000000..b08598064 --- /dev/null +++ b/docs/docs/api/mellea/stdlib/sampling/base.mdx @@ -0,0 +1,253 @@ +--- +title: "mellea.stdlib.sampling.base" +sidebarTitle: "mellea.stdlib.sampling.base" +description: "Base Sampling Strategies." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Classes + +
+ + +### CLASS `BaseSamplingStrategy` + + +Base class for multiple strategies that reject samples based on given instructions. + + + +</Card>
+**Methods:** + +
+ +#### FUNC `repair` + +```python +repair(old_ctx: Context, new_ctx: Context, past_actions: list[Component], past_results: list[ModelOutputThunk], past_val: list[list[tuple[Requirement, ValidationResult]]]) -> tuple[Component, Context] +``` + +Repair function that is being invoked if not all requirements are fulfilled. It should return a next action component. + +**Args:** +- `old_ctx`: The context WITHOUT the last action + output. +- `new_ctx`: The context including the last action + output. +- `past_actions`: List of actions that have been executed (without success). +- `past_results`: List of (unsuccessful) generation results for these actions. +- `past_val`: List of validation results for the results. + +**Returns:** +- The next action component and context to be used for the next generation attempt. + + +
+ +#### FUNC `select_from_failure` + +```python +select_from_failure(sampled_actions: list[Component], sampled_results: list[ModelOutputThunk], sampled_val: list[list[tuple[Requirement, ValidationResult]]]) -> int +``` + +This function returns the index of the result that should be selected as `.value` iff the loop budget is exhausted and no attempt succeeded. + +**Args:** +- `sampled_actions`: List of actions that have been executed (without success). +- `sampled_results`: List of (unsuccessful) generation results for these actions. +- `sampled_val`: List of validation results for the results. + +**Returns:** +- The index of the result that should be selected as `.value`. + + +</Card>
+ +#### FUNC `sample` + +```python +sample(self, action: Component[S], context: Context, backend: Backend, requirements: list[Requirement] | None) -> SamplingResult[S] +``` + +This method performs a sampling operation based on the given instruction. + +**Args:** +- `action `: The action object to be sampled. +- `context`: The context to be passed to the sampling strategy. +- `backend`: The backend used for generating samples. +- `requirements`: List of requirements to test against (merged with global requirements). +- `validation_ctx`: Optional context to use for validation. If None, validation_ctx = ctx. +- `format`: output format for structured outputs. +- `model_options`: model options to pass to the backend during generation / validation. +- `tool_calls`: True if tool calls should be used during this sampling strategy. +- `show_progress`: if true, a tqdm progress bar is used. Otherwise, messages will still be sent to flog. + +**Returns:** +- A result object indicating the success or failure of the sampling process. + +**Raises:** +- `AssertionError`: Asserts that all required components (repair, select_from_failure, validate, and generate) are provided before proceeding with the sampling. + + +
+ +### CLASS `RejectionSamplingStrategy` + + +Simple rejection sampling strategy that just repeats the same call on failure. + + + +
+**Methods:** + +
+ +#### FUNC `select_from_failure` + +```python +select_from_failure(sampled_actions: list[Component], sampled_results: list[ModelOutputThunk], sampled_val: list[list[tuple[Requirement, ValidationResult]]]) -> int +``` + +Always returns the 0th index. + +**Args:** +- `sampled_actions`: List of actions that have been executed (without success). +- `sampled_results`: List of (unsuccessful) generation results for these actions. +- `sampled_val`: List of validation results for the results. + +**Returns:** +- The index of the result that should be selected as `.value`. + + +
+ +#### FUNC `repair` + +```python +repair(old_ctx: Context, new_ctx: Context, past_actions: list[Component], past_results: list[ModelOutputThunk], past_val: list[list[tuple[Requirement, ValidationResult]]]) -> tuple[Component, Context] +``` + +Always returns the unedited, last action. + +**Args:** +- `old_ctx`: The context WITHOUT the last action + output. +- `new_ctx`: The context including the last action + output. +- `past_actions`: List of actions that have been executed (without success). +- `past_results`: List of (unsuccessful) generation results for these actions. +- `past_val`: List of validation results for the results. + +**Returns:** +- The next action component and context to be used for the next generation attempt. + + +
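+
+A sketch of the instruct-validate-repair pattern with rejection sampling; the `loop_budget` keyword name and the backend construction are assumptions of this example:
+
+```python
+from mellea.backends.ollama import OllamaModelBackend  # import path assumed
+from mellea.stdlib.context import ChatContext
+from mellea.stdlib.functional import instruct
+from mellea.stdlib.sampling.base import RejectionSamplingStrategy
+
+backend = OllamaModelBackend()  # constructor args omitted
+ctx = ChatContext()
+
+result = instruct(
+    "Write a haiku about the ocean.",
+    context=ctx,
+    backend=backend,
+    requirements=["The poem must have exactly three lines."],
+    strategy=RejectionSamplingStrategy(loop_budget=3),  # hypothetical kwarg name
+    return_sampling_results=True,
+)
+print(result.value)  # the selected output, even if no attempt fully succeeded
+```
+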
+ +### CLASS `RepairTemplateStrategy` + + +A sampling strategy that adds a repair string to the instruction object. + + + +
+**Methods:** + +
+ +#### FUNC `select_from_failure` + +```python +select_from_failure(sampled_actions: list[Component], sampled_results: list[ModelOutputThunk], sampled_val: list[list[tuple[Requirement, ValidationResult]]]) -> int +``` + +Always returns the 0th index. + +**Args:** +- `sampled_actions`: List of actions that have been executed (without success). +- `sampled_results`: List of (unsuccessful) generation results for these actions. +- `sampled_val`: List of validation results for the results. + +**Returns:** +- The index of the result that should be selected as `.value`. + + +
+ +#### FUNC `repair` + +```python +repair(old_ctx: Context, new_ctx: Context, past_actions: list[Component], past_results: list[ModelOutputThunk], past_val: list[list[tuple[Requirement, ValidationResult]]]) -> tuple[Component, Context] +``` + +Adds a description of the requirements that failed to a copy of the original instruction. + +**Args:** +- `old_ctx`: The context WITHOUT the last action + output. +- `new_ctx`: The context including the last action + output. +- `past_actions`: List of actions that have been executed (without success). +- `past_results`: List of (unsuccessful) generation results for these actions. +- `past_val`: List of validation results for the results. + +**Returns:** +- The next action component and context to be used for the next generation attempt. + + +
+ +### CLASS `MultiTurnStrategy` + + +Rejection sampling strategy with (agentic) multi-turn repair. + + + +
+**Methods:** + +
+ +#### FUNC `select_from_failure` + +```python +select_from_failure(sampled_actions: list[Component], sampled_results: list[ModelOutputThunk], sampled_val: list[list[tuple[Requirement, ValidationResult]]]) +``` + +Always returns the last index. The last message from the model will always be returned if all results are failures. + +**Args:** +- `sampled_actions`: List of actions that have been executed (without success). +- `sampled_results`: List of (unsuccessful) generation results for these actions. +- `sampled_val`: List of validation results for the results. + +**Returns:** +- The index of the result that should be selected as `.value`. + + +
+ +#### FUNC `repair` + +```python +repair(old_ctx: Context, new_ctx: Context, past_actions: list[Component], past_results: list[ModelOutputThunk], past_val: list[list[tuple[Requirement, ValidationResult]]]) -> tuple[Component, Context] +``` + +Returns a Message with a description of the failed requirements. + +**Args:** +- `old_ctx`: The context WITHOUT the last action + output. +- `new_ctx`: The context including the last action + output. +- `past_actions`: List of actions that have been executed (without success). +- `past_results`: List of (unsuccessful) generation results for these actions. +- `past_val`: List of validation results for the results. + +**Returns:** +- The next action component and context to be used for the next generation attempt. + +
diff --git a/docs/docs/api/mellea/stdlib/sampling/budget_forcing.mdx b/docs/docs/api/mellea/stdlib/sampling/budget_forcing.mdx new file mode 100644 index 000000000..2c13be2dd --- /dev/null +++ b/docs/docs/api/mellea/stdlib/sampling/budget_forcing.mdx @@ -0,0 +1,57 @@ +--- +title: "mellea.stdlib.sampling.budget_forcing" +sidebarTitle: "mellea.stdlib.sampling.budget_forcing" +description: "Sampling Strategies for budget forcing generation." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Classes + +
+ + +### CLASS `BudgetForcingSamplingStrategy` + + +Budget forcing sampling class. + + + +
+**Methods:** + +
+ +#### FUNC `sample` + +```python +sample(self, action: Component[S], context: Context, backend: Backend, requirements: list[Requirement] | None) -> SamplingResult[S] +``` + +This method performs a sampling operation based on the given instruction. + +**Args:** +- `action `: The action object to be sampled. +- `context`: The context to be passed to the sampling strategy. +- `backend`: The backend used for generating samples. +- `requirements`: List of requirements to test against (merged with global requirements). +- `validation_ctx`: Optional context to use for validation. If None, validation_ctx = ctx. +- `format`: output format for structured outputs. +- `model_options`: model options to pass to the backend during generation / validation. +- `tool_calls`: True if tool calls should be used during this sampling strategy. +- `show_progress`: if true, a tqdm progress bar is used. Otherwise, messages will still be sent to flog. + +**Returns:** +- A result object indicating the success or failure of the sampling process. + +**Raises:** +- `AssertionError`: Asserts that all required components (repair, select_from_failure, validate, and generate) are provided before proceeding with the sampling. + +
diff --git a/docs/docs/api/mellea/stdlib/sampling/majority_voting.mdx b/docs/docs/api/mellea/stdlib/sampling/majority_voting.mdx new file mode 100644 index 000000000..4ecf776a5 --- /dev/null +++ b/docs/docs/api/mellea/stdlib/sampling/majority_voting.mdx @@ -0,0 +1,122 @@ +--- +title: "mellea.stdlib.sampling.majority_voting" +sidebarTitle: "mellea.stdlib.sampling.majority_voting" +description: "Sampling Strategies for Minimum Bayes Risk Decoding (MBRD)." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Classes + +
+ + +### CLASS `BaseMBRDSampling` + + +Abstract Minimum Bayes Risk Decoding (MBRD) Sampling Strategy. + + + +
+**Methods:** + +
+ +#### FUNC `compare_strings` + +```python +compare_strings(self, ref: str, pred: str) -> float +``` + +This method is the abstract method for the MBRD similarity metric. + + +</Card>
+ +#### FUNC `maybe_apply_weighted` + +```python +maybe_apply_weighted(self, scr: np.ndarray) +``` + +Applies weights if self.weighted is True. Not Implemented. + + +
+ +#### FUNC `sample` + +```python +sample(self, action: Component[S], context: Context, backend: Backend, requirements: list[Requirement] | None) -> SamplingResult[S] +``` + +Samples using majority voting. + +**Args:** +- `action `: The action object to be sampled. +- `context`: The context to be passed to the sampling strategy. +- `backend`: The backend used for generating samples. +- `requirements`: List of requirements to test against (merged with global requirements). +- `validation_ctx`: Optional context to use for validation. If None, validation_ctx = ctx. +- `format`: output format for structured outputs; ignored for this sampling strategy. +- `model_options`: model options to pass to the backend during generation / validation. +- `tool_calls`: True if tool calls should be used during this sampling strategy. +- `show_progress`: if true, a tqdm progress bar is used. Otherwise, messages will still be sent to flog. + +**Returns:** +- A result object indicating the success or failure of the sampling process. + + +
+ +### CLASS `MajorityVotingStrategyForMath` + + +MajorityVoting Sampling Strategy for Math Expressions. + + + +
+**Methods:** + +
+ +#### FUNC `compare_strings` + +```python +compare_strings(self, ref: str, pred: str) -> float +``` + +Helper function to compare strings using the math extraction metrics. + + +
+ +### CLASS `MBRDRougeLStrategy` + + +Sampling Strategy that uses RougeL to compute symbol-level distances for majority voting. + + + +
+**Methods:** + +
+ +#### FUNC `compare_strings` + +```python +compare_strings(self, ref: str, pred: str) -> float +``` + +Helper function to compare strings using the RougeL metric. + </Card>
diff --git a/docs/docs/api/mellea/stdlib/sampling/sampling_algos/budget_forcing_alg.mdx b/docs/docs/api/mellea/stdlib/sampling/sampling_algos/budget_forcing_alg.mdx new file mode 100644 index 000000000..9d7391173 --- /dev/null +++ b/docs/docs/api/mellea/stdlib/sampling/sampling_algos/budget_forcing_alg.mdx @@ -0,0 +1,46 @@ +--- +title: "mellea.stdlib.sampling.sampling_algos.budget_forcing_alg" +sidebarTitle: "mellea.stdlib.sampling.sampling_algos.budget_forcing_alg" +description: "Budget forcing implementation." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Functions + +
+ + +### FUNC `think_budget_forcing` + +```python +think_budget_forcing(backend: OllamaModelBackend, action: CBlock | Component) -> ModelOutputThunk +``` + + +Generate with budget forcing using the completions APIs. + +This relies on raw autocompletion and assumes the model's output is structured in the following form: '<think> ... </think> summary answer' +The budget forcing method is proposed in the paper: https://arxiv.org/abs/2501.19393 +This implementation tries to follow the key outlines in the paper while ensuring stable and fail-safe operation. +This is performed via multi-step generation. The model will be called multiple times until requirements are met, in other words, the response will be assembled conditionally. + +**Args:** +- `backend`: OllamaModelBackend +- `action`: The last item of the context should be passed in as an `action` instead of as part of the `ctx`. See `docs/dev/generate_signature_decisions.md`. +- `think_max_tokens`: Budget in number of tokens allocated for the think block +- `answer_max_tokens`: Budget in number of tokens allocated for the summary and answer block, None indicates unbounded answer, generating till EoS +- `start_think_token`: String indicating start of think block, default <think> +- `end_think_token`: String indicating end of think block, default </think> +- `begin_response_token`: Used by certain models, string indicating start of response block, e.g. "<response>", default None +- `think_more_suffix`: String to append to force continued thinking, e.g. "\nWait" if set to None we will not force additional thinking. Use None for upper-bound budget case +- `answer_suffix`: String to append to force a final answer +- `model_options`: Any model options to upsert into the defaults for this call. + +
diff --git a/docs/docs/api/mellea/stdlib/sampling/sofai.mdx b/docs/docs/api/mellea/stdlib/sampling/sofai.mdx new file mode 100644 index 000000000..b3550bd11 --- /dev/null +++ b/docs/docs/api/mellea/stdlib/sampling/sofai.mdx @@ -0,0 +1,130 @@ +--- +title: "mellea.stdlib.sampling.sofai" +sidebarTitle: "mellea.stdlib.sampling.sofai" +description: "SOFAI (Slow and Fast AI) Sampling Strategy." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +This module implements a two-solver sampling strategy that uses: +1. S1 Solver (fast model) - Iterative solving with feedback-based repair +2. S2 Solver (slow model) - Single attempt escalation when S1 fails or shows no improvement + +The strategy leverages ValidationResult.reason fields to provide targeted +feedback for repair, enabling more effective iterative improvement. + + +## Classes + +
+ + +### CLASS `SOFAISamplingStrategy` + + +SOFAI sampling strategy. + +Uses S1 Solver (fast model) in a loop with targeted feedback from validation results. +If S1 Solver fails after exhausting the budget or shows no improvement, +escalates to a single attempt with S2 Solver (slow model). + +The strategy leverages ValidationResult.reason fields to provide targeted +feedback for repair, enabling more effective iterative improvement. + + + +
+**Methods:** + +
+ +#### FUNC `repair` + +```python +repair(old_ctx: Context, new_ctx: Context, past_actions: list[Component], past_results: list[ModelOutputThunk], past_val: list[list[tuple[Requirement, ValidationResult]]]) -> tuple[Component, Context] +``` + +Create targeted feedback message from validation results. + +Extracts failed requirements and uses their ValidationResult.reason fields +to provide specific, actionable feedback for the next attempt. + +**Args:** +- `old_ctx`: The context WITHOUT the last action + output. +- `new_ctx`: The context including the last action + output. +- `past_actions`: List of actions executed. +- `past_results`: List of generation results. +- `past_val`: List of validation results. + +**Returns:** +- Tuple of (Message component with repair feedback, new context). + + +
+ +#### FUNC `select_from_failure` + +```python +select_from_failure(sampled_actions: list[Component], sampled_results: list[ModelOutputThunk], sampled_val: list[list[tuple[Requirement, ValidationResult]]]) -> int +``` + +Select the most informed attempt (last) when all fail. + +Returns the last attempt as it has benefited from the most feedback. + +**Args:** +- `sampled_actions`: List of actions that have been executed (without success). +- `sampled_results`: List of (unsuccessful) generation results for these actions. +- `sampled_val`: List of validation results for the results. + +**Returns:** +- The index of the result that should be selected as `.value`. + + +
+ +#### FUNC `sample` + +```python +sample(self, action: Component[S], context: Context, backend: Backend, requirements: list[Requirement] | None) -> SamplingResult[S] +``` + +Execute SOFAI two-solver sampling strategy. + +SOFAI Flow Overview: +==================== +1. PHASE 1 - S1 Solver Loop: + - Generate candidate solution with fast S1 model + - Validate against requirements + - If success: return immediately + - If failure: generate repair feedback and iterate + - If no improvement detected: early exit to Phase 2 + +2. PHASE 2 - S2 Solver Escalation: + - Prepare context based on s2_solver_mode: + * fresh_start: clean slate with original prompt + * continue_chat: full S1 conversation history + * best_attempt: best S1 result with feedback summary + - Generate single attempt with slow S2 model + - Validate and return result (success or failure) + +**Args:** +- `action`: The component to sample (Instruction, Message, etc.). +- `context`: The session context (must be ChatContext). +- `backend`: Session backend (used for validation fallback). +- `requirements`: Requirements to validate against. +- `validation_ctx`: Optional separate validation context (unused). +- `format`: Output format for structured outputs. +- `model_options`: Model options to pass to backends. +- `tool_calls`: True if tool calls should be used. + +**Returns:** +- SamplingResult with success status and all generation history. + +
diff --git a/docs/docs/api/mellea/stdlib/session.mdx b/docs/docs/api/mellea/stdlib/session.mdx new file mode 100644 index 000000000..8f8fbe9c5 --- /dev/null +++ b/docs/docs/api/mellea/stdlib/session.mdx @@ -0,0 +1,502 @@ +--- +title: "mellea.stdlib.session" +sidebarTitle: "mellea.stdlib.session" +description: "Mellea Sessions." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Functions + +
+ + +### FUNC `get_session` + +```python +get_session() -> MelleaSession +``` + + +Get the current session from context. + +**Raises:** +- `RuntimeError`: If no session is currently active. + + +
+ +### FUNC `backend_name_to_class` + +```python +backend_name_to_class(name: str) -> Any +``` + + +Resolves backend names to Backend classes. + + +
+ +### FUNC `start_session` + +```python +start_session(backend_name: Literal['ollama', 'hf', 'openai', 'watsonx', 'litellm'] = 'ollama', model_id: str | ModelIdentifier = IBM_GRANITE_4_MICRO_3B, ctx: Context | None = None, **backend_kwargs) -> MelleaSession +``` + + +Start a new Mellea session. Can be used as a context manager or called directly. + +This function creates and configures a new Mellea session with the specified backend +and model. When used as a context manager (with `with` statement), it automatically +sets the session as the current active session for use with convenience functions +like `instruct()`, `chat()`, `query()`, and `transform()`. When called directly, +it returns a session object whose methods can be used directly. + +**Args:** +- `backend_name`: The backend to use. Options are\: +- "ollama"\: Use Ollama backend for local models +- "hf" or "huggingface"\: Use HuggingFace transformers backend +- "openai"\: Use OpenAI API backend +- "watsonx"\: Use IBM WatsonX backend +- "litellm"\: Use the LiteLLM backend +- `model_id`: Model identifier or name. Can be a `ModelIdentifier` from +mellea.backends.model_ids or a string model name. +- `ctx`: Context manager for conversation history. Defaults to SimpleContext(). +Use ChatContext() for chat-style conversations. +- `model_options`: Additional model configuration options that will be passed +to the backend (e.g., temperature, max_tokens, etc.). +- `**backend_kwargs`: Additional keyword arguments passed to the backend constructor. + +**Returns:** +- A session object that can be used as a context manager +- or called directly with session methods. + +**Examples:** + +```python +# Basic usage with default settings +with start_session() as session: + response = session.instruct("Explain quantum computing") + +# Using OpenAI with custom model options +with start_session("openai", "gpt-4", model_options={"temperature": 0.7}) as session: + response = session.chat("Write a poem") + +# Using HuggingFace with ChatContext for conversations +from mellea.stdlib.base import ChatContext +with start_session("hf", "microsoft/DialoGPT-medium", ctx=ChatContext()) as session: + session.chat("Hello!") + session.chat("How are you?") # Remembers previous message + +# Direct usage. +session = start_session() +response = session.instruct("Explain quantum computing") +session.cleanup() +``` + + +</Card>
+ +## Classes + +
+ + +### CLASS `MelleaSession` + + +Mellea sessions are a THIN wrapper around `m` convenience functions with NO special semantics. + +Using a Mellea session is not required, but it does represent the "happy path" of Mellea programming. Some nice things about using a `MelleaSession`: +1. In most cases you want to keep a Context together with the Backend from which it came. +2. You can directly run an instruction or send a chat, instead of first creating the `Instruction` or `Chat` object and then later calling backend.generate on the object. +3. The context is "threaded-through" for you, which allows you to issue a sequence of commands instead of first calling backend.generate on something and then appending it to your context. + +These are all relatively simple code hygiene and state management benefits, but they add up over time. +If you are doing complicated programming (e.g., non-trivial inference scaling) then you might be better off forgoing `MelleaSession`s and managing your Context and Backend directly. + +Note: we put the `instruct`, `validate`, and other convenience functions here instead of in `Context` or `Backend` to avoid import resolution issues. + + + +</Card>
+**Methods:** + +
+ +#### FUNC `clone` + +```python +clone(self) +``` + +Useful for running multiple generation requests while keeping the context at a given point in time. + +**Returns:** +- a copy of the current session. Keeps the context, backend, and session logger. + +**Examples:** + +```python +>>> from mellea import start_session +>>> m = start_session() +>>> m.instruct("What is 2x2?") +>>> +>>> m1 = m.clone() +>>> out = m1.instruct("Multiply that by 2") +>>> print(out) +... 8 +>>> +>>> m2 = m.clone() +>>> out = m2.instruct("Multiply that by 3") +>>> print(out) +... 12 +``` + + +
+ +#### FUNC `reset` + +```python +reset(self) +``` + +Reset the context state. + + +
+ +#### FUNC `cleanup` + +```python +cleanup(self) -> None +``` + +Clean up session resources. + + +
+ +#### FUNC `act` + +```python +act(self, action: Component[S]) -> ModelOutputThunk[S] +``` + +
+ +#### FUNC `act` + +```python +act(self, action: Component[S]) -> SamplingResult[S] +``` + +
+ +#### FUNC `act` + +```python +act(self, action: Component[S]) -> ModelOutputThunk[S] | SamplingResult +``` + +Runs a generic action, and adds both the action and the result to the context. + +**Args:** +- `action`: the Component from which to generate. +- `requirements`: used as additional requirements when a sampling strategy is provided +- `strategy`: a SamplingStrategy that describes the strategy for validating and repairing/retrying for the instruct-validate-repair pattern. None means that no particular sampling strategy is used. +- `return_sampling_results`: attach the (successful and failed) sampling attempts to the results. +- `format`: if set, the BaseModel to use for constrained decoding. +- `model_options`: additional model options, which will upsert into the model/backend's defaults. +- `tool_calls`: if true, tool calling is enabled. + +**Returns:** +- A ModelOutputThunk if `return_sampling_results` is `False`, else returns a `SamplingResult`. + + +
+ +#### FUNC `instruct` + +```python +instruct(self, description: str) -> ModelOutputThunk[str] +``` + +
+ +#### FUNC `instruct` + +```python +instruct(self, description: str) -> SamplingResult[str] +``` + +
+ +#### FUNC `instruct` + +```python +instruct(self, description: str) -> ModelOutputThunk[str] | SamplingResult +``` + +Generates from an instruction. + +**Args:** +- `description`: The description of the instruction. +- `requirements`: A list of requirements that the instruction can be validated against. +- `icl_examples`: A list of in-context-learning examples that the instruction can be validated against. +- `grounding_context`: A list of grounding contexts that the instruction can use. They can bind as variables using a (key\: str, value\: str | ContentBlock) tuple. +- `user_variables`: A dict of user-defined variables used to fill in Jinja placeholders in other parameters. This requires that all other provided parameters are provided as strings. +- `prefix`: A prefix string or ContentBlock to use when generating the instruction. +- `output_prefix`: A string or ContentBlock that defines a prefix for the output generation. Usually you do not need this. +- `strategy`: A SamplingStrategy that describes the strategy for validating and repairing/retrying for the instruct-validate-repair pattern. None means that no particular sampling strategy is used. +- `return_sampling_results`: attach the (successful and failed) sampling attempts to the results. +- `format`: If set, the BaseModel to use for constrained decoding. +- `model_options`: Additional model options, which will upsert into the model/backend's defaults. +- `tool_calls`: If true, tool calling is enabled. +- `images`: A list of images to be used in the instruction or None if none. + + +
+ +#### FUNC `chat` + +```python +chat(self, content: str, role: Message.Role = 'user') -> Message +``` + +Sends a simple chat message and returns the response. Adds both messages to the Context. + + +
+ +#### FUNC `validate` + +```python +validate(self, reqs: Requirement | list[Requirement]) -> list[ValidationResult] +``` + +Validates a set of requirements over the output (if provided) or the current context (if the output is not provided). + + +
+ +#### FUNC `query` + +```python +query(self, obj: Any, query: str) -> ModelOutputThunk +``` + +Query method for retrieving information from an object. + +**Args:** +- `obj `: The object to be queried. It should be an instance of MObject or can be converted to one if necessary. +- `query`: The string representing the query to be executed against the object. +- `format`: format for output parsing. +- `model_options`: Model options to pass to the backend. +- `tool_calls`: If true, the model may make tool calls. Defaults to False. + +**Returns:** +- The result of the query as processed by the backend. + + +
+ +#### FUNC `transform` + +```python +transform(self, obj: Any, transformation: str) -> ModelOutputThunk | Any +``` + +Transform method for creating a new object with the transformation applied. + +**Args:** +- `obj `: The object to be transformed. It should be an instance of MObject or can be converted to one if necessary. +- `transformation`: The string representing the transformation to apply to the object. +- `format`: format for output parsing; usually not needed with transform. +- `model_options`: Model options to pass to the backend. + +**Returns:** +- ModelOutputThunk|Any: The result of the transformation as processed by the backend. If no tools were called, +- the return type will always be ModelOutputThunk. If a tool was called, the return type will be the return type +- of the function called, usually the type of the object passed in. + + +</Card>
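+
+A sketch combining a session with a mified object (see the `mify` docs earlier in this reference); the `Email` class and its fields are purely illustrative, and applying the decorator bare is assumed to work as in the earlier sketch:
+
+```python
+from mellea import start_session
+from mellea.stdlib.components.mify import mify
+
+
+@mify
+class Email:
+    def __init__(self, title: str, body: str):
+        self.title = title
+        self.body = body
+
+
+email = Email("Quarterly results", "Revenue grew 12% compared to last quarter.")
+
+with start_session() as m:
+    summary = m.query(email, "What is the main point of this email?")
+    print(summary)
+
+    rewritten = m.transform(email, "Rewrite the body in a more formal tone.")
+    print(rewritten)
+```
+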
+ +#### FUNC `aact` + +```python +aact(self, action: Component[S]) -> ModelOutputThunk[S] +``` + +
+ +#### FUNC `aact` + +```python +aact(self, action: Component[S]) -> SamplingResult[S] +``` + +
+ +#### FUNC `aact` + +```python +aact(self, action: Component[S]) -> ModelOutputThunk[S] | SamplingResult +``` + +Runs a generic action, and adds both the action and the result to the context. + +**Args:** +- `action`: the Component from which to generate. +- `requirements`: used as additional requirements when a sampling strategy is provided +- `strategy`: a SamplingStrategy that describes the strategy for validating and repairing/retrying for the instruct-validate-repair pattern. None means that no particular sampling strategy is used. +- `return_sampling_results`: attach the (successful and failed) sampling attempts to the results. +- `format`: if set, the BaseModel to use for constrained decoding. +- `model_options`: additional model options, which will upsert into the model/backend's defaults. +- `tool_calls`: if true, tool calling is enabled. + +**Returns:** +- A ModelOutputThunk if `return_sampling_results` is `False`, else returns a `SamplingResult`. + + +
+ +#### FUNC `ainstruct` + +```python +ainstruct(self, description: str) -> ModelOutputThunk[str] +``` + +
+ +#### FUNC `ainstruct` + +```python +ainstruct(self, description: str) -> SamplingResult[str] +``` + +
+ +#### FUNC `ainstruct` + +```python +ainstruct(self, description: str) -> ModelOutputThunk[str] | SamplingResult[str] +``` + +Generates from an instruction. + +**Args:** +- `description`: The description of the instruction. +- `requirements`: A list of requirements that the instruction can be validated against. +- `icl_examples`: A list of in-context-learning examples that the instruction can be validated against. +- `grounding_context`: A list of grounding contexts that the instruction can use. They can bind as variables using a (key\: str, value\: str | ContentBlock) tuple. +- `user_variables`: A dict of user-defined variables used to fill in Jinja placeholders in other parameters. This requires that all other provided parameters are provided as strings. +- `prefix`: A prefix string or ContentBlock to use when generating the instruction. +- `output_prefix`: A string or ContentBlock that defines a prefix for the output generation. Usually you do not need this. +- `strategy`: A SamplingStrategy that describes the strategy for validating and repairing/retrying for the instruct-validate-repair pattern. None means that no particular sampling strategy is used. +- `return_sampling_results`: attach the (successful and failed) sampling attempts to the results. +- `format`: If set, the BaseModel to use for constrained decoding. +- `model_options`: Additional model options, which will upsert into the model/backend's defaults. +- `tool_calls`: If true, tool calling is enabled. +- `images`: A list of images to be used in the instruction or None if none. + + +
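+
+Example (a minimal sketch; it assumes the call is made from inside an async event loop):
+
+```python
+import mellea
+
+m = mellea.start_session()
+
+# ainstruct returns a thunk immediately; await it to get the generated text.
+thunk = m.ainstruct("Write a haiku about compilers.")
+result = await thunk
+print(str(result))
+```
+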
+ +#### FUNC `achat` + +```python +achat(self, content: str, role: Message.Role = 'user') -> Message +``` + +Sends a simple chat message and returns the response. Adds both messages to the Context. + + +
+ +#### FUNC `avalidate` + +```python +avalidate(self, reqs: Requirement | list[Requirement]) -> list[ValidationResult] +``` + +Validates a set of requirements over the output (if provided) or the current context (if the output is not provided). + + +
+ +#### FUNC `aquery` + +```python +aquery(self, obj: Any, query: str) -> ModelOutputThunk +``` + +Query method for retrieving information from an object. + +**Args:** +- `obj `: The object to be queried. It should be an instance of MObject or can be converted to one if necessary. +- `query`: The string representing the query to be executed against the object. +- `format`: format for output parsing. +- `model_options`: Model options to pass to the backend. +- `tool_calls`: If true, the model may make tool calls. Defaults to False. + +**Returns:** +- The result of the query as processed by the backend. + + +
+
+#### FUNC `atransform`
+
+```python
+atransform(self, obj: Any, transformation: str) -> ModelOutputThunk | Any
+```
+
+Transform method for creating a new object with the transformation applied.
+
+**Args:**
+- `obj`: The object to be transformed. It should be an instance of MObject or can be converted to one if necessary.
+- `transformation`: The string describing the transformation to apply to the object.
+- `format`: format for output parsing; usually not needed with transform.
+- `model_options`: Model options to pass to the backend.
+
+**Returns:**
+- ModelOutputThunk|Any: The result of the transformation as processed by the backend. If no tools were called, the return type will always be ModelOutputThunk. If a tool was called, the return type will be the return type of the function called, usually the type of the object passed in.
+
+
+ +#### FUNC `powerup` + +```python +powerup(cls, powerup_cls: type) +``` + +Appends methods in a class object `powerup_cls` to MelleaSession. + + +
+ +#### FUNC `last_prompt` + +```python +last_prompt(self) -> str | list[dict] | None +``` + +Returns the last prompt that has been called from the session context. + +**Returns:** +- A string if the last prompt was a raw call to the model OR a list of messages (as role-msg-dicts). Is None if none could be found. + +
diff --git a/docs/docs/api/mellea/stdlib/tools/interpreter.mdx b/docs/docs/api/mellea/stdlib/tools/interpreter.mdx new file mode 100644 index 000000000..58e2a47bc --- /dev/null +++ b/docs/docs/api/mellea/stdlib/tools/interpreter.mdx @@ -0,0 +1,180 @@ +--- +title: "mellea.stdlib.tools.interpreter" +sidebarTitle: "mellea.stdlib.tools.interpreter" +description: "Code interpreter tool." +--- + + + + + +import { SidebarFix } from "/snippets/SidebarFix.mdx"; + + + +## Functions + +
+ + +### FUNC `code_interpreter` + +```python +code_interpreter(code: str) -> ExecutionResult +``` + + +Executes python code. + +**Args:** +- `code`: The Python code to execute. + + +
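+
+Example (a minimal sketch; assumes the default execution environment is available in your installation):
+
+```python
+from mellea.stdlib.tools.interpreter import code_interpreter
+
+result = code_interpreter("print(2 + 2)")
+if result.success:
+    print(result.stdout)  # expected to include "4"
+else:
+    # Execution can be skipped, e.g. when prohibited imports are detected.
+    print(result.skipped, result.stderr)
+```
+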
+ +### FUNC `local_code_interpreter` + +```python +local_code_interpreter(code: str) -> ExecutionResult +``` + + +Executes python code in the cwd. + +**Args:** +- `code`: The Python code to execute. + + +
+ +## Classes + +
+ + +### CLASS `ExecutionResult` + + +Result of code execution. + +Code execution can be aborted prior to spinning up an interpreter (e.g., if prohibited imports are used). +In these cases, the `success` flag is set to False and the `skipped` flag is set to True. + +If code is executed, then `success` is set to true iff the exit code is 0, and the `stdout` and `stderr` outputs +are set to non-None values. + +We also use the `ExecutionResult` object to communicate the result of static and dynamic analyses. Those are passed back +using the `analysis_result` field. + +TODO\: should we also be trying to pass back the value of the final expression evaluated, or the value of locals() and globals()? + + + +
+**Methods:** + +
+ +#### FUNC `to_validationresult_reason` + +```python +to_validationresult_reason(self) +``` + +Maps an ExecutionResult to a ValidationResult reason. + +TODO\: Downstream use of this method is really hacky. A far better solution is for `ExecutionResult` to implement the `ValidationResult` interface. + + +
+ +### CLASS `ExecutionEnvironment` + + +Abstract environment for executing Python code. + + + +
+**Methods:** + +
+ +#### FUNC `execute` + +```python +execute(self, code: str, timeout: int) -> ExecutionResult +``` + +Execute code and return result. + + +
+ +### CLASS `StaticAnalysisEnvironment` + + +Safe environment that validates but does not execute code. + + + +
+**Methods:** + +
+ +#### FUNC `execute` + +```python +execute(self, code: str, timeout: int) -> ExecutionResult +``` + +Validate code syntax and imports without executing. + + +
+ +### CLASS `UnsafeEnvironment` + + +Unsafe environment that executes code directly with subprocess. + + + +
+**Methods:** + +
+ +#### FUNC `execute` + +```python +execute(self, code: str, timeout: int) -> ExecutionResult +``` + +Execute code with subprocess after checking imports. + + +
+ +### CLASS `LLMSandboxEnvironment` + + +Environment using llm-sandbox for secure Docker-based execution. + + + +
+**Methods:** + +
+ +#### FUNC `execute` + +```python +execute(self, code: str, timeout: int) -> ExecutionResult +``` + +Execute code using llm-sandbox. + +
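+
+Example of using the environments directly (a sketch; the no-argument constructors are an assumption -- check the class signatures above before relying on them):
+
+```python
+from mellea.stdlib.tools.interpreter import (
+    StaticAnalysisEnvironment,
+    UnsafeEnvironment,
+)
+
+snippet = "import math\nprint(math.sqrt(16))"
+
+# Assumption: both environments can be constructed without arguments.
+# Validate syntax and imports only; nothing is executed here.
+check = StaticAnalysisEnvironment().execute(snippet, timeout=10)
+
+# Run with subprocess only if the static check passed.
+if check.success:
+    run = UnsafeEnvironment().execute(snippet, timeout=10)
+    print(run.stdout)
+```
+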
diff --git a/docs/docs/core-concept/adapters.mdx b/docs/docs/core-concept/adapters.mdx
new file mode 100644
index 000000000..2274ff890
--- /dev/null
+++ b/docs/docs/core-concept/adapters.mdx
@@ -0,0 +1,40 @@
+---
+title: "Tool calling"
+description: "Making tools available to models from components, the context, and model options."
+---
+
+Mellea supports tool calling for providers/models that support it. Most session-level functions accept a `tool_calls` boolean. Setting this to true allows tools to be called, but there's no guarantee that a model will call them.
+Tools can be made available for the model to call in a few ways:
+
+1. Components: components can have a TemplateRepresentation object that contains tools.
+2. Context: depending on the context, the components in that context can be used as sources of additional tools, in exactly the same way they would be if they were the current action.
+3. `ModelOption.TOOLS`: model options can include a tools parameter. The preferred way of passing these tools is as a list of function objects.
+
+Currently, tools are identified by the name of the function. If there are conflicts, the most recent tool with that name will be preferred. This means the tools available to the model follow the same priority listed above:
+
+1. Tools from the current component will always be included.
+2. Tools from the context will be included if there are no name conflicts. A given context can decide what tools to surface, but in most cases, tools from the most recent component in the context will take priority over tools from older requests.
+3. Tools from `ModelOption.TOOLS` will only be added if they do not conflict with any of the above functions.
+
+For examples of adding tools to the template representation of a component, see the `Table` object in [richdocument.py](https://github.com/generative-computing/mellea/blob/main/mellea/stdlib/docs/richdocument.py).
+
+Here's an example of adding a tool through model options. This can be useful when you want to add a tool like web search that should almost always be available:
+
+```python
+import mellea
+from mellea.backends.types import ModelOption
+
+def web_search(query: str) -> str:
+    # Call your search provider of choice here.
+    ...
+
+m = mellea.start_session()
+
+output = m.instruct(
+    "Who is the 1st President of the United States?",
+    model_options={
+        ModelOption.TOOLS: [web_search],
+    },
+    tool_calls=True,
+)
+
+assert "web_search" in output.tool_calls
+
+result = output.tool_calls["web_search"].call_func()
+```
diff --git a/docs/docs/core-concept/agents.mdx b/docs/docs/core-concept/agents.mdx
new file mode 100644
index 000000000..ed4d97e32
--- /dev/null
+++ b/docs/docs/core-concept/agents.mdx
@@ -0,0 +1,231 @@
+---
+title: "Agents"
+description: "Building agents using Mellea."
+---
+
+> **Definition:** An _agent_ is a generative program in which an LLM determines the control flow of the program.
+
+In the generative programs we have seen so far, the developer orchestrates a sequence of LLM calls. In contrast, agentic generative programs delegate control flow to the model itself. In this chapter we will see a couple of different ways of developing agents in Mellea:
+
+1. **Classical Agents:** How to implement agentic loops in Mellea using the ReACT pattern.
+2. **Guarded Nondeterminism:** We will return to the idea of generative slots, and see how this abstraction can help build more robust agents.
+
+## Case Study: Implementing ReACT in Mellea
+
+Let's build up to a full agent example using the ReACT pattern. We'll start with pseudocode and then incrementally build our Mellea ReACT program.
+
+The core idea of ReACT is to alternate between reasoning ("Thought") and acting ("Action"):
+
+```
+## Pseudocode
+while not done:
+    get the model's next thought
+    take an action based upon the thought
+    choose arguments for the selected action
+    observe the tool output
+    check if a final answer can be obtained
+return the final answer
+```
+
+Let's look at how this agent is implemented in Mellea:
+
+````python
+## file: https://github.com/generative-computing/mellea/blob/main/docs/examples/agents/react.py#L99
+def react(
+    m: mellea.MelleaSession,
+    goal: str,
+    react_toolbox: ReactToolbox,
+    budget: int = 5,
+):
+    assert m.ctx.is_chat_context, "ReACT requires a chat context."
+    test_ctx_lin = m.ctx.render_for_generation()
+    assert (
+        test_ctx_lin is not None and len(test_ctx_lin) == 0
+    ), "ReACT expects a fresh context."
+
+    # Construct the system prompt for ReACT.
+    _sys_prompt = react_system_template.render(
+        {"today": datetime.date.today(), "tools": react_toolbox.tools}
+    )
+
+    # Add the system prompt and the goal to the chat history.
+    m.ctx.insert(mellea.stdlib.chat.Message(role="system", content=_sys_prompt))
+    m.ctx.insert(mellea.stdlib.chat.Message(role="user", content=f"{goal}"))
+
+    done = False
+    turn_num = 0
+    while not done:
+        turn_num += 1
+        print(f"## ReACT TURN NUMBER {turn_num}")
+
+        print("### Thought")
+        thought = m.chat(
+            "What should you do next? Respond with a description of the next piece of information you need or the next action you need to take."
+        )
+        print(thought.content)
+
+        print("### Action")
+        act = m.chat(
+            "Choose your next action. Respond with nothing other than a tool name.",
+            # model_options={mellea.backends.types.ModelOption.TOOLS: react_toolbox.tools_dict()},
+            format=react_toolbox.tool_name_schema(),
+        )
+        selected_tool: ReactTool = react_toolbox.get_tool_from_schema(
+            act.content)
+        print(selected_tool.get_name())
+
+        print("### Arguments for action")
+        act_args = m.chat(
+            "Choose arguments for the tool. Respond using JSON and include only the tool arguments in your response.",
+            format=selected_tool.args_schema(),
+        )
+        print(
+            f"```json\n{json.dumps(json.loads(act_args.content), indent=2)}\n```")
+
+        # TODO: handle exceptions.
+        print("### Observation")
+        tool_output = react_toolbox.call_tool(selected_tool, act_args.content)
+        m.ctx.insert(
+            mellea.stdlib.chat.Message(role="tool", content=tool_output)
+        )
+        print(tool_output)
+
+        is_done = IsDoneModel.model_validate_json(
+            m.chat(
+                f"Do you know the answer to the user's original query ({goal})? If so, respond with Yes. If you need to take more actions, then respond No.",
+                format=IsDoneModel,
+            ).content
+        ).is_done
+        if is_done:
+            print("Done. Will summarize and return output now.")
+            done = True
+            return m.chat(
+                f"Please provide your final answer to the original query ({goal})."
+            ).content
+        elif turn_num == budget:
+            return None
+
+````
+
+## Case Study: Guarded Nondeterminism
+
+Recall Chapter 4, where we saw how libraries of `GenerativeSlot` components can be composed by introducing compositionality contracts. We will now build an "agentic" mechanism for automating the task of chaining together possibly-composable generative functions. Let's get started on our guarded nondeterminism agent ("guarded nondeterminism" is a bit of a mouthful, so we'll call this a [Kripke](https://en.wikipedia.org/wiki/Saul_Kripke) agent going forward).
+
+The first step is to add a new `Component` that adds preconditions and postconditions to generative slots:
+
+```python
+## file: https://github.com/generative-computing/kripke_agents/blob/main/kripke/base.py#L10-L38 # TODO: MOVE THESE TO FAKE KRIPKE
+class ConstrainedGenerativeSlot(Component):
+    template = GEN_SLOT_TEMPLATE  # the same template as is used for generative slots.
+
+    def __init__(self, generative_slot: GenerativeSlot, preconds: list[Requirement | str], postconds: list[Requirement | str]):
+        self._genslot = generative_slot
+        self._preconds = [reqify(precond) for precond in preconds]
+        self._postconds = [reqify(postcond) for postcond in postconds]
+
+    def format_for_llm(self):
+        return self._genslot.format_for_llm()
+
+    def action_name(self):
+        return self._genslot._function._function_dict["name"]
+```
+
+We'll also add a decorator for convenience:
+
+```python
+## file: https://github.com/generative-computing/kripke_agents/blob/main/kripke/base.py#L41-L44
+def constrained(preconds: list[Requirement | str], postconds: list[Requirement | str]):
+    def _decorator(genslot: GenerativeSlot):
+        return ConstrainedGenerativeSlot(genslot, preconds, postconds)
+    return _decorator
+```
+
+We can now write down constrained generative slots like so:
+
+```python
+## file: https://github.com/generative-computing/kripke_agents/blob/main/main.py#L23-L27
+@constrained(preconds=["contains a summary of the story's theme"], postconds=["each element of the list is the title and author of a significant novel"])
+@generative
+def suggest_novels_based_on_theme(summary: str) -> list[str]:
+    """Based upon a summary of a short story, suggests novels with similar themes."""
+    ...
+```
+
+Notice that we have used the `Requirement` component throughout, so we now have all the power of Mellea requirement validation semantics at our disposal for defining and checking pre/post-conditions.
+
+We are now ready to provide the skeleton of our Kripke agent:
+
+```python
+## file: https://github.com/generative-computing/kripke_agents/blob/main/kripke/base.py#L54-L99
+def filter_actions(m: mellea.MelleaSession, actions: list[ConstrainedGenerativeSlot], *, output: ModelOutputThunk | None = None):
+    ...
+
+
+def select_action(m: mellea.MelleaSession, actions: list[ConstrainedGenerativeSlot], goal: Requirement):
+    ...
+
+
+def kripke_agent(
+    m: mellea.MelleaSession,
+    actions: list[ConstrainedGenerativeSlot],
+    goal: Requirement | str,
+    budget: int = 10
+) -> Callable[[str], str | None]:
+    goal = reqify(goal)
+
+    def _agent(initial_state: str) -> str | None:
+        print(f"Goal: {goal.description}")
+        m.ctx.insert(ModelOutputThunk(initial_state))
+        # Budget-bounded search loop.
+        for _ in tqdm.tqdm(range(budget)):
+            print(m.ctx.last_output())
+            available_actions = filter_actions(m, actions)
+            next_action = select_action(m, available_actions, goal)
+            m.act(next_action)
+            if goal.validate(m.backend, m.ctx):
+                return m.ctx.last_output().value
+        return None
+    return _agent
+```
+
+The magic of the Kripke agent happens in `filter_actions`.
The basic idea is simple: select only actions whose preconditions are implied by the current state:
+
+```python
+## file: https://github.com/generative-computing/kripke_agents/blob/main/kripke/base.py#L47-L55
+def _check_action_preconditions(m: mellea.MelleaSession, action: ConstrainedGenerativeSlot, *, output: ModelOutputThunk | None = None) -> bool:
+    for precondition in action._preconds:
+        if not m.validate(precondition, output=output):
+            return False
+    return True
+
+
+def filter_actions(m: mellea.MelleaSession, actions: list[ConstrainedGenerativeSlot], *, output: ModelOutputThunk | None = None):
+    return [act for act in actions if _check_action_preconditions(m, act, output=output)]
+```
+
+And we finish off the agent by defining the selection criteria, using familiar constrained decoding techniques from our ReACT agent:
+
+```python
+## file: https://github.com/generative-computing/kripke_agents/blob/main/kripke/base.py#L58-L71
+def select_action(m: mellea.MelleaSession, actions: list[ConstrainedGenerativeSlot], goal: Requirement):
+    # Set up a pydantic model for the next action.
+    action_names = [action.action_name() for action in actions]
+    fields = dict()
+    fields["next_action"] = Literal[*action_names]
+    pydantic_model = pydantic.create_model("NextActionSelectionSchema", **fields)
+    # Prompt the model for the next action.
+    actions_list = "\n".join([f" * {action.action_name()}" for action in actions])
+    action_selection_response = m.chat(f"Your ultimate goal is {goal.description}. Select the next action from the list of actions:\n{actions_list}", format=pydantic_model)
+    # Return the selected action.
+    next_action_name = pydantic_model.model_validate_json(action_selection_response.content).next_action
+    selected_action = [a for a in actions if a.action_name() == next_action_name]
+    assert len(selected_action) == 1
+    return selected_action[0]
+```
+
+We will stop here for the basic tutorial, but notice that there are several natural extensions:
+
+1. We have not yet used the postconditions. Kripke agents can be optimized by **pre-computing** entailments between sets of pre-conditions and post-conditions; in this way, we only have to pay the cost of figuring out permissible interleaving of actions once.
+2. We can execute multiple actions at once, then prune likely unfruitful portions of the search process.
+
+We will dive into a full implementation of these and other Kripke agent tricks during a future deep-dive session on inference scaling with Mellea.
diff --git a/docs/docs/core-concept/alora.mdx b/docs/docs/core-concept/alora.mdx
new file mode 100644
index 000000000..345da55ef
--- /dev/null
+++ b/docs/docs/core-concept/alora.mdx
@@ -0,0 +1,124 @@
+---
+title: "Mellea CLI — Train & Upload LoRA/aLoRA Adapters"
+description: "Train and use LoRA / aLoRA adapters as requirement validators in Mellea."
+sidebarTitle: "Training CLI"
+---
+
+Mellea provides a command-line interface for training and uploading [LoRA](https://arxiv.org/abs/2106.09685) or [aLoRA](https://github.com/IBM/alora) adapters for causal language models. This tool is useful for adapting base models like IBM Granite to custom tasks using prompt-based classification. The major goal is to help customers train a requirement validator.
+ +--- + +## πŸ”§ Installation + +From the root of the repository: + +```bash +pip install mellea +huggingface-cli login # Optional: only needed for uploads +``` + +--- + +## πŸ“„ Training Data Format + +Mellea expects training data in a `.jsonl` file, where each line contains: + +- `item`: A user prompt or message +- `label`: A string classification label + +### πŸ“¦ Example `data.jsonl` + +```json +{"item": "The stembolt doesn't adjust at high RPM.", "label": "F"} +{"item": "Normal sensor readings but inconsistent throttle.", "label": "T"} +{"item": "Sluggish acceleration from idle.", "label": "T"} +``` + +--- + +## πŸš€ Train a Model + +Use the `m alora train` command to fine-tune a LoRA or aLoRA adapter requirement validator. + +```bash +m alora train path/to/data.jsonl \ + --basemodel ibm-granite/granite-3.2-8b-instruct \ + --outfile ./checkpoints/alora_adapter \ + --adapter alora \ + --epochs 6 \ + --learning-rate 6e-6 \ + --batch-size 2 \ + --max-length 1024 \ + --grad-accum 4 +``` + +### πŸ“Œ Parameters + +| Flag | Type | Default | Description | +| ----------------- | ------- | ---------- | ----------------------------------------- | +| `--basemodel` | `str` | _required_ | Hugging Face model ID or local path | +| `--outfile` | `str` | _required_ | Directory to save the adapter weights | +| `--adapter` | `str` | `"alora"` | Choose between `alora` or standard `lora` | +| `--epochs` | `int` | `6` | Number of training epochs | +| `--learning-rate` | `float` | `6e-6` | Learning rate | +| `--batch-size` | `int` | `2` | Per-device batch size | +| `--max-length` | `int` | `1024` | Max tokenized input length | +| `--grad-accum` | `int` | `4` | Gradient accumulation steps | + +--- + +## ⬆️ Upload to Hugging Face + +Use the `m alora upload` command to publish your trained adapter: + +```bash +m alora upload ./checkpoints/alora_adapter \ + --name acme/carbchecker-alora +``` + +This will: + +- Create the Hugging Face model repo (if it doesn't exist) +- Upload the contents of the `outfile` directory +- Requires a valid `HF_TOKEN` via `huggingface-cli login` + +--- + +## πŸ›  Requirements + +- Python 3.8+ +- Install the following dependencies manually or via `pip install mellea`: + - `transformers` + - `trl` + - `peft` + - `datasets` + - `huggingface_hub` + - `alora` + +--- + +## πŸ§ͺ Example Datasets for Testing + +To verify the `alora-train` and `alora-upload` functionality, we tested the CLI using two well-known benchmark datasets: **TREC** and **SST-2**. These datasets are small, well-structured, and suitable for validating training pipelines. + +### πŸ“š 1. TREC (Question Classification) + +- **Link**: [Hugging Face: TREC Dataset](https://huggingface.co/datasets/trec) +- **Description**: The TREC dataset consists of open-domain, fact-based questions divided into broad semantic categories. Each example contains a question and a label such as `DESC`, `HUM`, `LOC`, etc. +- **Used format**: + ```json + { "item": "What is the capital of France?", "label": "LOC" } + ``` + +### πŸ“š 2. SST-2 (Stanford Sentiment Treebank v2) + +- **Link**: [Hugging Face: sst-2 Dataset](https://huggingface.co/datasets/stanfordnlp/sst2) +- **Description**: SST-2 is a binary sentiment classification dataset based on movie review sentences. Each entry is labeled as either `POSITIVE` or `NEGATIVE`. 
+- **Used format**: + ```json + { "item": "A beautiful, poetic piece of cinema.", "label": "POSITIVE" } + ``` + +## Further reading + +- [Requirement β†’ aLoRA rerouting semantics](/dev/requirement-alora-rerouting) diff --git a/docs/docs/core-concept/context-management.mdx b/docs/docs/core-concept/context-management.mdx new file mode 100644 index 000000000..3c2c3a81b --- /dev/null +++ b/docs/docs/core-concept/context-management.mdx @@ -0,0 +1,67 @@ +--- +title: "Context Management" +description: "Context management using Mellea sessions" +--- + +Mellea manages context using two complementary mechanisms: + +1. `Component`s themselves, which generally contain all of the context needed for a single-turn request. MObjects manage context using fields and methods, Instructions have a grounding_context for RAG-style requests, etc. + +2. The `Context`, which stores and represents a (sometimes partial) history of all previous requests to the LLM made during the current session. + +We have already seen a lot about how Components can be used to define the context of an LLM request, so in this chapter we will focus on the `Context` mechanism. + +When you use the `start_session()` method, you are actually instantiating a `Mellea` with a default inference engine, a default model choice, and a default context manager. The following code is equivalent to `m.start_session()`: + +```python +from mellea import MelleaSession + +m = mellea.MelleaSession( + backend=OllamaBackend(model_id=IBM_GRANITE_3_3_8B) + context=SimpleContext() +) +``` + +The `SimpleContext` -- which is the only context we have used so far -- is a context manager that resets the chat message history on each model call. That is, the model's context is entirely determined by the current Component. Mellea also provides a `ChatContext`, which behaves like a chat history. We can use the ChatContext to interact with chat models: + +```python +## file: https://github.com/generative-computing/mellea/blob/main/docs/examples/tutorial/context_example.py#L1-L5 +from mellea import start_session + +m = mellea.start_session(ctx=ChatContext()) +m.chat("Make up a math problem.") +m.chat("Solve your math problem.") +``` + +The `Context` object provides a few useful helpers for introspecting on the current model context; for example, you can always get the last model output: + +```python +## file: https://github.com/generative-computing/mellea/blob/main/docs/examples/tutorial/context_example.py#L7 +print(m.ctx.last_output()) +``` + +or the entire last turn (user query + assistant response): + +```python +## file: https://github.com/generative-computing/mellea/blob/main/docs/examples/tutorial/context_example.py#L9 +print(m.ctx.last_turn()) +``` + +You can also use `session.clone()` to create a copy of a given session with its context at given point in time. This allows you to make multiple generation requests with the same objects in your context: + +```python +m = start_session(ctx=ChatContext()) +m.instruct("Multiply 2x2.") + +m1 = m.clone() +m2 = m.clone() + +## Need to run this code in an async event loop. +co1 = m1.ainstruct("Multiply that by 3") +co2 = m2.ainstruct("Multiply that by 5") + +print(await co1) # 12 +print(await co2) # 20 +``` + +In the above example, both requests have `Multiply 2x2` and the LLM's response to that (presumably `4`) in their context. By cloning the session, the new requests both operate independently on that context to get the correct answers to 4 x 3 and 4 x 5. 
diff --git a/docs/docs/core-concept/contribution-guide.mdx b/docs/docs/core-concept/contribution-guide.mdx
new file mode 100644
index 000000000..c4bb66785
--- /dev/null
+++ b/docs/docs/core-concept/contribution-guide.mdx
@@ -0,0 +1,56 @@
+---
+title: "Contributor Guide"
+---
+
+# Contributor Guide
+
+### Contributor Guide: Requirements and Verifiers
+
+Contributing new Requirements (i.e., verifiers) is an easy way to get started contributing to Mellea. Requirements can be as general or as domain-specific as you'd like, but must encapsulate a coherent and testable property. We have seen many examples of Requirements throughout this tutorial.
+
+If you write a Requirement that is general-purpose and likely useful to others, consider contributing your _general-purpose_ component to Mellea's standard library:
+
+1. Find a file in `mellea/stdlib/reqlib/` where your requirement belongs; if no file fits, create a new one.
+2. Implement your requirement. Ideally, your verifier should be robust, which typically means not using the default LLMaJ behavior. If the requirement can be checked with code, you should write a validation function. See [our Markdown requirements](/core-concept/requirements) for some examples of how this works. You could also [tune (and evaluate) a well-calibrated aLoRA](/core-concept/tuning) for requirements that are not possible to implement in code.
+3. Open a PR. If your Requirement uses LLMaJ, be sure to include a robust evaluation suite in your PR demonstrating that LLMaJ verification is good enough.
+
+One important note: if your requirement can be easily specified in terms of a grammatical constraint, then you should consider using constrained generation (by passing `format=` into your session or generate call -- see [agent implementation](/core-concept/agents) for some examples) instead of using requirements.
+
+### Contributor Guide: Components
+
+Components are the building blocks of Mellea. The point of a Component is that it has a way to represent itself to a Backend, its `format_for_llm` function. When creating a new component, you will most likely want to have `format_for_llm` return a `TemplateRepresentation`, a structured representation of itself that includes template args, tools, and the template itself.
+
+Components are best created when you find yourself with data/objects that you are frequently formatting and marshalling into text to interact with LLMs.
+
+To create a new component, you must both define it in code and (in most cases) create a template for it. Components are also runtime checkable protocols, so you need not inherit from the base class; you can simply add the required methods to an existing class as well.
+
+When distributing a new Component, think of the Component the same way you think about a software library. Components are self-contained, well-documented, amenable to reuse, and hopefully also composable with other Components.
+
+You have a couple of options for distributing your Component. You can distribute the Component as a library in user-space, or you can request that the Component is incorporated into the Mellea stdlib. Most Components are best positioned as third party libraries. You can distribute third-party generative programming components just like you distribute any third party library (github, pypi).
+
+For Components that implement useful and widely used patterns, inclusion in the Mellea stdlib may make sense.
These are the early days of generative programming; we expect that some contributions will have pride-of-place in the Mellea standard library. We encourage contributors to ask early and often about inclusion in the stdlib. + +### Contributor Guide: Specialized Mify + +Mifying an object is another way to make it compatible with `Mellea`. Just like with Components, there is a `MifiedProtocol` that is a runtime checkable protocol. `@mify` or `mify(object)` adds the required methods to any object. + +Since it's a protocol, you can create your own `mify` functions that wrap a class/object or add the required functionality to that class/object in any way you want. + +For instance, you may have an ORM library where most of your objects follow the same pattern and structure. To integrate that library with `Mellea`, one approach would be to write a specific `mify` function that knows about that structure. It could look something like this: + +```python +T = TypeVar("T") +def mify_orm(obj: T): + setattr(obj, "format_for_llm", obj.sql) + ... +``` + +In this way, you can define a common way to `mify` all components of this library on the fly, assuming they all have a `sql` function. + +For a specialized mify function to be added to the stdlib, it must work as both a decorator and a function that can be called directly on objects/classes. It must also be a generic but useful pattern or a pattern for a widely used library. + +### Contributor Guide: Sessions + +While a less common need, Mellea allows you to create new types of sessions. When you need fine-grained control over context, it's advised that you completely override the `MelleaSession` methods. + +To institute gates on calls that get made or modify calls without modifying the underlying context, overriding the methods but calling the `MelleaSession` supermethod is advised. diff --git a/docs/docs/core-concept/generative-slots.mdx b/docs/docs/core-concept/generative-slots.mdx new file mode 100644 index 000000000..e474fe3d5 --- /dev/null +++ b/docs/docs/core-concept/generative-slots.mdx @@ -0,0 +1,185 @@ +--- +title: "Generative Slots" +description: "A method to generate outputs based on python functions and a Generative Slot function." +--- + +In classical programming, pure (stateless) functions are a simple and powerful abstraction. A pure function takes inputs, computes outputs, and has no side effects. Generative programs can also use functions as abstraction boundaries, but in a generative program the meaning of the function can be given by an LLM instead of an interpreter or compiler. This is the idea behind a **GenerativeSlot**. + +A `GenerativeSlot` is a function whose implementation is provided by an LLM. In Mellea, you define these using the `@generative` decorator. The function signature specifies the interface, and the docstring (or type annotations) guide the LLM in producing the output. + +#### Example: Sentiment Classifier + +Let's start with a simple example: a function that classifies the sentiment of a string as "positive" or "negative". + +```python +## file: https://github.com/generative-computing/mellea/blob/main/docs/examples/tutorial/sentiment_classifier.py#L1-L13 +from typing import Literal +from mellea import generative, start_session + +@generative +def classify_sentiment(text: str) -> Literal["positive", "negative"]: + """Classify the sentiment of the input text as 'positive' or 'negative'.""" + ... 
+ +m = start_session() +sentiment = classify_sentiment(m, text="I love this!") +print("Output sentiment is:", sentiment) +``` + +Here, `classify_sentiment` is a GenerativeSlot: it looks like a normal function, but its implementation is handled by the LLM. The type annotation (`Literal["positive", "negative"]`) constrains the output, and the prompt is automatically constructed from the function signature and docstring. + +Many more examples of generative slots are provided in the `docs/examples` directory. + + + +Generative slots can also be implemented as code-generation calls instead of black-box structured output generators. This is most useful when correct code generation is difficult without some dynamic analysis (i.e., runtime information). In these cases, the problem can be solved by prompting with a FiTM code generation request, augmented with pieces of runtime state. This advanced functionality may result in untrusted code execution, and should therefore be used with caution and/or in conjunction with some combination of sandboxing and human validation prior to execution. + + + +#### Using Generative slots to Provide Compositionality Across Module Boundaries + +Instruct-validate-repair provides compositionality within a given module. As the examples listed above demonstrate, generative slots can do the same. But generative slots are not just about local validity; their real power comes from safe interoperability between independently designed systems. + +Consider the following two independently developed libraries: a **Summarizer** library that contains a set of functions for summarizing various types of documents, and a **Decision Aides** library that aides in decision making for particular situations. + +```python +## file: https://github.com/generative-computing/mellea/blob/main/docs/examples/tutorial/compositionality_with_generative_slots.py#L1-L18 +from mellea import generative + +## The Summarizer Library +@generative +def summarize_meeting(transcript: str) -> str: + """Summarize the meeting transcript into a concise paragraph of main points.""" + +@generative +def summarize_contract(contract_text: str) -> str: + """Produce a natural language summary of contract obligations and risks.""" + +@generative +def summarize_short_story(story: str) -> str: + """Summarize a short story, with one paragraph on plot and one paragraph on broad themes.""" +``` + +```python +## file: https://github.com/generative-computing/mellea/blob/main/docs/examples/tutorial/compositionality_with_generative_slots.py#L20-L33 +from mellea import generative + +## The Decision Aides Library +@generative +def propose_business_decision(summary: str) -> str: + """Given a structured summary with clear recommendations, propose a business decision.""" + +@generative +def generate_risk_mitigation(summary: str) -> str: + """If the summary contains risk elements, propose mitigation strategies.""" + +@generative +def generate_novel_recommendations(summary: str) -> str: + """Provide a list of novel recommendations that are similar in plot or theme to the short story summary.""" +``` + +Notice that these two libraries do not necessarily always compose -- meeting notes may or may not contain semantic content for which risk analysis even makes sense. 
+ +To help us compose these libraries, we introduce a set of contracts that gate function composition and then use those contracts to short-circuit non-sensical compositions of library components: + +```python +## file: https://github.com/generative-computing/mellea/blob/main/docs/examples/tutorial/compositionality_with_generative_slots.py#L36-L52 +from mellea import generative +from typing import Literal + +## Compose the libraries. +@generative +def has_structured_conclusion(summary: str) -> Literal["yes", "no"]: + """Determine whether the summary contains a clearly marked conclusion or recommendation.""" + +@generative +def contains_actionable_risks(summary: str) -> Literal["yes", "no"]: + """Check whether the summary contains references to business risks or exposure.""" + +@generative +def has_theme_and_plot(summary: str) -> Literal["yes", "no"]: + """Check whether the summary contains both a plot and thematic elements.""" +``` + +```python +## file: https://github.com/generative-computing/mellea/blob/main/docs/examples/tutorial/compositionality_with_generative_slots.py#L55-L129 +from mellea import start_session + +m = start_session() +transcript = """Meeting Transcript: Market Risk Review -- Self-Sealing Stembolts Division +Date: December 1, 3125 +Attendees: + +Karen Rojas, VP of Product Strategy + +Derek Madsen, Director of Global Procurement + +Felicia Zheng, Head of Market Research + +Tom Vega, CFO + +Luis Tran, Engineering Liaison + +Karen Rojas: +Thanks, everyone, for making time on short notice. As you've all seen, we've got three converging market risks we need to address: tariffs on micro-carburetors, increased adoption of the self-interlocking leafscrew, and, believe it or not, the "hipsterfication" of the construction industry. I need all on deck and let's not waste time. Derek, start. + +Derek Madsen: +Right. As of Monday, the 25% tariff on micro-carburetors sourced from the Pan-Alpha Centauri confederacy is active. We tried to pre-purchase a three-month buffer, but after that, our unit cost rises by $1.72. That's a 9% increase in the BOM cost of our core model 440 stembolt. Unless we find alternative suppliers or pass on the cost, we're eating into our already narrow margin. + +Tom Vega: +We cannot absorb that without consequences. If we pass the cost downstream, we risk losing key mid-tier OEM clients. And with the market already sniffing around leafscrew alternatives, this makes us more vulnerable. + +Karen: +Lets pause there. Felicia, give us the quick-and-dirty on the leafscrew. + +Felicia Zheng: +It's ugly. Sales of the self-interlocking leafscrewβ€”particularly in modular and prefab constructionβ€”are up 38% year-over-year. It's not quite a full substitute for our self-sealing stembolts, but they are close enough in function that some contractors are making the switch. Their appeal? No micro-carburetors, lower unit complexity, and easier training for install crews. We estimate we've lost about 12% of our industrial segment to the switch in the last two quarters. + +Karen: +Engineering, Luis; your take on how real that risk is? + +Luis Tran: +Technically, leafscrews are not as robust under high-vibration loads. But here's the thing: most of the modular prefab sites don not need that level of tolerance. If the design spec calls for durability over 10 years, we win. But for projects looking to move fast and hit 5-year lifespans? The leafscrew wins on simplicity and cost. + +Tom: +So they're eating into our low-end. That's our volume base. + +Karen: +Exactly. 
Now let's talk about this last one: the β€œhipsterfication” of construction. Felicia? + +Felicia: +So this is wild. We're seeing a cultural shift in boutique and residential constructionβ€”especially in markets like Beckley, West Sullivan, parts of Osborne County, where clients are requesting "authentic" manual fasteners. They want hand-sealed bolts, visible threads, even mismatched patinas. It's an aesthetic thing. Function is almost secondary. Our old manual-seal line from the 3180s? People are hunting them down on auction sites. + +Tom: +Well, I'm glad I don't have to live in the big cities... nothing like this would ever happen in downt-to-earth places Brooklyn, Portland, or Austin. + +Luis: +We literally got a request from a design-build firm in Keough asking if we had any bolts β€œpre-distressed.” + +Karen: +Can we spin this? + +Tom: +If we keep our vintage tooling and market it right, maybe. But that's niche. It won't offset losses in industrial and prefab. + +Karen: +Not yet. But we may need to reframe it as a prestige lineβ€”low volume, high margin. Okay, action items. Derek, map alternative micro-carburetor sources. Felicia, get me a forecast on leafscrew erosion by sector. Luis, feasibility of reviving manual seal production. Tom, let's scenario-plan cost pass-through vs. feature-based differentiation. + +Let's reconvene next week with hard numbers. Thanks, all.""" +summary = summarize_meeting(m, transcript=transcript) + +if contains_actionable_risks(m, summary=summary) == "yes": + mitigation = generate_risk_mitigation(m, summary=summary) + print(f"Mitigation: {mitigation}") +else: + print("Summary does not contain actionable risks.") +if has_structured_conclusion(m, summary=summary) == "yes": + decision = propose_business_decision(m, summary=summary) + print(f"Decision: {decision}") +else: + print("Summary lacks a structured conclusion.") +``` + +Without these Hoare-style contracts, the only way to ensure composition is to couple the libraries, either by rewriting `summarize_meeting` to conform to `propose_business_decision`, or adding Requirements to `propose_business_decision` that may silently fail if unmet. These approaches can work, but require tight coupling between these two otherwise loosely coupled libraries. + +With contracts, we **decouple** the libraries without sacrificing safe dynamic composition, by moving the coupling logic into pre- and post-condition checks. This is another LLM-native software engineering pattern: **guarded nondeterminism**. 
diff --git a/docs/docs/core-concept/instruct-validate-repair.mdx b/docs/docs/core-concept/instruct-validate-repair.mdx
new file mode 100644
index 000000000..cca135124
--- /dev/null
+++ b/docs/docs/core-concept/instruct-validate-repair.mdx
@@ -0,0 +1,41 @@
+---
+title: "Instruct-Validate-Repair"
+---
+
+Now, we bring it all together into a first generative program using the instruct-validate-repair pattern:
+
+```python
+import mellea
+from mellea.stdlib.requirement import req, check, simple_validate
+from mellea.stdlib.sampling import RejectionSamplingStrategy
+
+def write_email(m: mellea.MelleaSession, name: str, notes: str) -> str:
+    email_candidate = m.instruct(
+        "Write an email to {{name}} using the following notes: {{notes}}.",
+        requirements=[
+            req("The email should have a salutation"),  # == r1
+            req(
+                "Use only lower-case letters",
+                validation_fn=simple_validate(lambda x: x.lower() == x),
+            ),  # == r2
+            check("Do not mention purple elephants."),  # == r3
+        ],
+        strategy=RejectionSamplingStrategy(loop_budget=5),
+        user_variables={"name": name, "notes": notes},
+        return_sampling_results=True,
+    )
+    if email_candidate.success:
+        return str(email_candidate.result)
+    else:
+        return email_candidate.sample_generations[0].value
+
+
+m = mellea.start_session()
+print(write_email(m, "Olivia",
+    "Olivia helped the lab over the last few weeks by organizing intern events, advertising the speaker series, and handling issues with snack delivery."))
+```
+
+<Tip>
+The instruct() method is a convenience function that creates and then generates from an Instruction Component, req() similarly wraps the Requirement Component, etc. Chapter 2 will take us one level deeper into understanding what happens under the hood when you call m.instruct().
+</Tip>
+
diff --git a/docs/docs/core-concept/interoperability.mdx b/docs/docs/core-concept/interoperability.mdx
new file mode 100644
index 000000000..20bae8723
--- /dev/null
+++ b/docs/docs/core-concept/interoperability.mdx
@@ -0,0 +1,65 @@
+---
+title: "Interoperability with Other Frameworks"
+description: "Connect Mellea programs with other (agentic) frameworks."
+sidebarTitle: "Framework Interoperability"
+---
+
+Mellea programs are ultimately just Python programs. Mellea programs can be shared via the Model Context Protocol or via the A2A protocol. Mellea programs can also consume tools and agents that implement these protocols.
+
+### Simple MCP server running Mellea
+
+As we mentioned, Mellea programs are, in the end, Python programs. We can wrap a simple `mcp` server around a program and use the server as-is. Here is an example using [Pydantic AI's built-in MCP server](https://ai.pydantic.dev/mcp/server/).
+ +```python +## file: https://github.com/generative-computing/mellea/blob/main/docs/examples/agents/mcp_example.py#L15-L40 +## Create an MCP server +mcp = FastMCP("Demo") + + +@mcp.tool() +def write_a_poem(word_limit: int) -> str: + """Write a poem with a word limit.""" + m = MelleaSession(OllamaModelBackend(model_ids.QWEN3_8B)) + wl_req = Requirement( + f"Use only {word_limit} words.", + validation_fn=simple_validate(lambda x: len(x.split(" ")) < word_limit), + ) + + res = m.instruct( + "Write a poem", + requirements=[wl_req], + strategy=RejectionSamplingStrategy(loop_budget=4), + ) + assert isinstance(res, ModelOutputThunk) + return str(res.value) + +if __name__ == '__main__': + mcp.run() +``` + +### Running Mellea programs as an openai compatible server (Experimental) + +We also provide an expiermental `m serve` utility for serving up an OpenAI-compatible **chat** endpoint. This allows you to write `m` programs that masquerade as a "model". To learn more about this functionality, run: + +```shell +m serve --help +``` + +#### Example `m serve` application + +While deploying programs using `m serve`, it is important for the programs to follow a specific structure. The programs needs a have function called `serve` with the following signature: + +```python +## file: https://github.com/generative-computing/mellea/blob/main/docs/examples/agents/m_serve_example.py#L25-L29 +def serve( + input: list[ChatMessage], + model_options: None | dict = None, + **kwargs +) +``` + +the `m serve` command then subsequently takes this function and runs a server that is openai compatible. For more information, please have a look at [this file](./examples/tutorial/m_serve_example.py) for how to write an `m serve` compatible program. To run the example: + +```shell +m serve docs/examples/tutorial/m_serve_example.py +``` diff --git a/docs/docs/core-concept/mobjects.mdx b/docs/docs/core-concept/mobjects.mdx new file mode 100644 index 000000000..deb0107ae --- /dev/null +++ b/docs/docs/core-concept/mobjects.mdx @@ -0,0 +1,175 @@ +--- +title: "MObjects" +description: "Bringing object-oriented programming to LLMs with MObjects" +--- + +Object-oriented programming (OOP) is a powerful paradigm for organizing code: you group related data and the methods that operate on that data into classes. In the world of LLMs, a similar organizational principle emergesβ€”especially when you want to combine structured data with LLM-powered "tools" or operations. This is where Mellea's **MObject** abstraction comes in. + +**The MObject Pattern:** You should store data alongside its relevant operations (tools). This allows LLMs to interact with both the data and methods in a unified, structured manner. It also simplifies the process of exposing only the specific fields and methods you want the LLM to access. + +The `MOBject` pattern also provides a way of evolving existing classical codebases into generative programs. Mellea's `@mify` decorator lets you turn **any** class into an `MObject`. If needed, you can specify which fields and methods are included, and provide a template for how the object should be represented to the LLM. 
+ +### Example: A Table as an MObject + +Suppose you have a table of sales data and want to let the LLM answer questions about it: + +```python +## file: https://github.com/generative-computing/mellea/blob/main/docs/examples/tutorial/table_mobject.py#L1-L31 +import mellea +from mellea.stdlib.mify import mify, MifiedProtocol +import pandas +from io import StringIO + + +@mify(fields_include={"table"}, template="{{ table }}") +class MyCompanyDatabase: + table: str = """| Store | Sales | + | ---------- | ------- | + | Northeast | $250 | + | Southeast | $80 | + | Midwest | $420 |""" + + def transpose(self): + pandas.read_csv( + StringIO(self.table), + sep='|', + skipinitialspace=True, + header=0, + index_col=False + ) + + +m = mellea.start_session() +db = MyCompanyDatabase() +assert isinstance(db, MifiedProtocol) +answer = m.query(db, "What were sales for the Northeast branch this month?") +print(str(answer)) +``` + +In this example, the `@mify` decorator transforms MyCompanyDatabase into an MObject. Only the _table_ field is incorporated into the Large Language Model (LLM) prompt, as designated by `fields_include`. The `template` describes how the object is presented to the model. The `.query()` method now enables you to pose questions about the data, allowing the LLM to utilize the table as contextual information. + +**When to use MObjects?** +MObjects offer a sophisticated and modular approach to linking structured data with operations powered by Large Language Models (LLMs). They provide precise control over what the LLM can access, allowing for the exposure of custom tools or methods. This design pattern can be particularly useful for tool-calling, document querying, and any scenario where data needs to be "wrapped" with behaviors accessible to an LLM. + +We'll see more advanced uses of MObjects -- including tool registration and custom operations -- in our next case study on working with rich-text documents. + +### Case Study: Working with Documents + +Mellea makes it easy to work with documents. For that we provide `mified` wrappers +around [docling](https://github.com/docling-project/docling) documents. + +Let's create a RichDocument from an arxiv paper: + +```python +## file: https://github.com/generative-computing/mellea/blob/main/docs/examples/tutorial/document_mobject.py#L1-L3 +from mellea.stdlib.docs.richdocument import RichDocument +rd = RichDocument.from_document_file("https://arxiv.org/pdf/1906.04043") +``` + +this loads the PDF file and parses it using the Docling parser into an +intermediate representation. + +From the rich document we can extract some document content, e.g. the +first table: + +```python +## file: https://github.com/generative-computing/mellea/blob/main/docs/examples/tutorial/document_mobject.py#L5-L8 +from mellea.stdlib.docs.richdocument import Table +table1: Table = rd.get_tables()[0] +print(table1.to_markdown()) +``` + +Output: + +```markdown +| Feature | AUC | +| ------------------------------------ | ----------- | +| Bag of Words | 0.63 Β± 0.11 | +| (Test 1 - GPT-2) Average Probability | 0.71 Β± 0.25 | +| (Test 2 - GPT-2) Top-K Buckets | 0.87 Β± 0.07 | +| (Test 1 - BERT) Average Probability | 0.70 Β± 0.27 | +| (Test 2 - BERT) Top-K Buckets | 0.85 Β± 0.09 | +``` + +The `Table` object is Mellea-ready and can be used immediately with LLMs. 
+Let's just get it to work: + +```python +## file: https://github.com/generative-computing/mellea/blob/main/docs/examples/tutorial/document_mobject.py#L10-L24 +from mellea.backends.types import ModelOption +from mellea import start_session + +m = start_session() +for seed in [x*12 for x in range(5)]: + table2 = m.transform(table1, + "Add a column 'Model' that extracts which model was used or 'None' if none.", + model_options={ModelOption.SEED: seed}) + if isinstance(table2, Table): + print(table2.to_markdown()) + break + else: + print(f"==== TRYING AGAIN after non-useful output.====") +``` + +In this example, `table1` should be transformed to have an extra column `Model` which contains the model string from the `Feature` column or `None` if there is none. Iterating through some seed values, we try to find a version which returns a parsable representation of the table. If found, print it out. + +The output for this code sample could be: + +```markdown +table1= +| Feature | AUC | +|--------------------------------------|-------------| +| Bag of Words | 0.63 Β± 0.11 | +| (Test 1 - GPT-2) Average Probability | 0.71 Β± 0.25 | +| (Test 2 - GPT-2) Top-K Buckets | 0.87 Β± 0.07 | +| (Test 1 - BERT) Average Probability | 0.70 Β± 0.27 | +| (Test 2 - BERT) Top-K Buckets | 0.85 Β± 0.09 | + +===== 18:21:00-WARNING ====== +added a tool message from transform to the context as well. + +table2= +| Feature | AUC | Model | +|--------------------------------------|-------------|---------| +| Bag of Words | 0.63 Β± 0.11 | None | +| (Test 1 - GPT-2) Average Probability | 0.71 Β± 0.25 | GPT-2 | +| (Test 2 - GPT-2) Top-K Buckets | 0.87 Β± 0.07 | GPT-2 | +| (Test 1 - BERT) Average Probability | 0.70 Β± 0.27 | BERT | +| (Test 2 - BERT) Top-K Buckets | 0.85 Β± 0.09 | BERT | +``` + +The model has done a great job at fulfilling the task and coming back with a parsable syntax. You could now call (e.g. `m.query(table2, "Are there any GPT models referenced?")`) or continue transformation (e.g. `m.transform(table2, "Transpose the table.")`). + +### MObject methods are tools + +When an object is `mified` all methods with a docstring get registered as tools for the LLM call. You can control if you only want a subset of these functions to be exposed by two parameters (`funcs_include` and `funcs_exclude`): + +```python +from mellea.stdlib.mify import mify + +@mify(funcs_include={"from_markdown"}) +class MyDocumentLoader: + def __init__(self) -> None: + self.content = "" + + @classmethod + def from_markdown(cls, text: str) -> "MyDocumentLoader": + doc = MyDocumentLoader() + # Your parsing functions here. + doc.content = text + return doc + + def do_hoops(self) -> str: + return "hoop hoop" +``` + +Above, the `mified` class `MyDocumentLoader` only exposes the `from_markdown()` method as tool to the LLM. + +Here is an example, how the methods are handled with an LLM call. Imagine the following two calls that should lead to the same result: + +```python +table1_t = m.transform(table1, "Transpose the table.") # the LLM function +table1_t2 = table1.transpose() # the table method +``` + +Every native function of `Table` is automatically registered as a tool to the transform function. I.e., here the `.transform()` function calls the LLM and the LLM will get back suggesting to use the very own `.transpose()` function to achieve the result - it will also give you a friendly warning that you could directly use the function call instead of using the transform function. 
diff --git a/docs/docs/core-concept/modeloptions.mdx b/docs/docs/core-concept/modeloptions.mdx new file mode 100644 index 000000000..8472665f2 --- /dev/null +++ b/docs/docs/core-concept/modeloptions.mdx @@ -0,0 +1,74 @@
+---
+title: "Model Options"
+---
+
+Most LLM APIs allow you to specify options that modify the request: temperature, max_tokens, seed, and so on. Mellea supports specifying these options during backend initialization and when calling session-level functions with the `model_options` parameter.
+
+Mellea supports many different types of inference engines (ollama, openai-compatible vllm, huggingface, etc.). These inference engines, which we call Backends, provide different and sometimes inconsistent dictionary keys for specifying model options. For the most common options among model providers, Mellea provides engine-agnostic options, which you can discover by typing [`ModelOption`](https://github.com/generative-computing/mellea/blob/main/mellea/backends/types.py) in your favorite IDE; for example, temperature can be specified as `{ModelOption.TEMPERATURE: 0}` and this will "just work" across all inference engines.
+
+You can add any key-value pair supported by the backend to the `model_options` dictionary, and those options will be passed along to the inference engine even if a Mellea-specific `ModelOption` key is defined for that option. This means you can safely copy model option parameters from existing codebases as-is:
+
+```python
+import mellea
+from mellea.backends.types import ModelOption
+from mellea.backends.ollama import OllamaModelBackend
+from mellea.backends import model_ids
+
+m = mellea.MelleaSession(backend=OllamaModelBackend(
+    model_id=model_ids.IBM_GRANITE_3_2_8B,
+    model_options={ModelOption.SEED: 42}
+))
+
+answer = m.instruct(
+    "What is 2x2?",
+    model_options={
+        "temperature": 0.5,
+        "num_predict": 5,
+    },
+)
+
+print(str(answer))
+```
+
+You can always update the model options of a given backend; however, Mellea offers a few additional ways to change the specified options.
+
+- Specifying options during m.\* calls. Options specified here update the previously specified model options for that call only. If you specify an already existing key (with either the `ModelOption.OPTION` version or the native name for that option in the given API), the value will be the one associated with the new key. If you specify the same key in different ways (i.e., `ModelOption.TEMPERATURE` and `temperature`), the `ModelOption.OPTION` key takes precedence.
+
+```python
+# options passed during backend initialization
+backend_model_options = {
+    "seed": "1",
+    ModelOption.MAX_NEW_TOKENS: 1,
+    "temperature": 1,
+}
+
+# options passed during m.*
+instruct_model_options = {
+    "seed": "2",
+    ModelOption.SEED: "3",
+    "num_predict": 2,
+}
+
+# options passed to the model provider API
+final_options = {
+    "temperature": 1,
+    "seed": 3,
+    "num_predict": 2
+}
+```
+
+- Pushing and popping model state. Sessions offer the ability to push and pop model state. This means you can temporarily change the model_options for a series of calls by pushing a new set of model_options and then revert those changes with a pop.
+
+## System Messages
+In Mellea, `ModelOption.SYSTEM_PROMPT` is the recommended way to add or change the system message for a prompt. Setting it at the backend/session level will use the provided message as the system prompt for all future calls (just like any other model option).
Similarly, you can specify the system prompt parameter for any session-level function (like m.instruct) to replace it for just that call.
+
+Mellea recommends applying the system message this way because some model-provider APIs don't properly serialize messages with the system role and expect them as a separate parameter.
+
+## Conclusion
+We have now worked up from a simple "Hello, World" example to our first generative programming design pattern: Instruct - Validate - Repair (IVR).
+
+When LLMs work well, the software developer experiences the LLM as a sort of oracle that can handle almost any input and produce a sufficiently desirable output. When LLMs do not work at all, the software developer experiences the LLM as a naive Markov chain that produces junk. In both cases, the LLM is just sampling from a distribution.
+
+The crux of generative programming is that most applications find themselves somewhere in between these two extremes -- the LLM mostly works, enough to demo a tantalizing MVP. But failure modes are common enough and severe enough that complete automation is beyond the developer's grasp.
+
+Traditional software deals with failure modes by carefully describing what can go wrong and then providing precise error-handling logic. When working with LLMs, however, this approach suffers a Sisyphean curse. There is always one more failure mode, one more special case, one more new feature request. In the next chapter, we will explore how to build generative programs that are compositional and that grow gracefully.
diff --git a/docs/docs/core-concept/prompt-engineering.mdx b/docs/docs/core-concept/prompt-engineering.mdx new file mode 100644 index 000000000..cc3497f49 --- /dev/null +++ b/docs/docs/core-concept/prompt-engineering.mdx @@ -0,0 +1,53 @@
+---
+title: "Prompt Engineering"
+---
+
+Most backends operate on text. For these backends/models, Mellea has an opinionated stance on how to transform Python objects into text: the `TemplateFormatter`.
+
+In most cases, you will want to create templates when adding a new component to the standard library or when customizing an existing component for a new model.
+
+## Templates
+
+Mellea's `TemplateFormatter` uses jinja2 templates to format objects when passing them to models for generation.
+
+These templates can be stored directly in the class/object, or, more typically, in a directory, with each object having a specific file. For examples of the templates, see `mellea/templates/prompts/default`.
+See the [Customization section](#customization) below for a description of how the formatter chooses which template to use.
+
+## Customization
+
+By writing a new template and/or changing the TemplateRepresentation of a component, you can customize its textual representation. You can also customize based on the model.
+
+#### Choosing a Template
+
+Assuming a component's TemplateRepresentation contains a `template_order` field, the default TemplateFormatter grabs the relevant template by looking at the following places, in order, for each template in the `template_order`:
+
+1. the formatter's cached templates if the template has been looked up recently
+2. the formatter's specified template path
+3.
the package that the object getting formatted is from (either 'mellea' or some third party package) + +If the default formatter searches the template path or the package, it uses the following logic: + +- look in the `.../templates/prompts/...` directory +- traverse sub-directories in that path that match the formatter's model id (ie `ibm-granite/granite-3.2-8b-instruct` will match `.../templates/prompts/granite/granite-3-2/instruct`) or default (ie `.../templates/prompts/default`) +- return the template at the deepest directory path +- the default template formatter assumes that a model will only have one match in any given directory; in other words, traversing a `templates` directory with both `prompts/granite/...` and `prompts/ibm/...` for `ibm-granite/granite-3.2-8b-instruct` should not happen + +#### Editing an Existing Class + +To customize the template and template representation of an existing class, simply create a new class that inherits from the class you want to edit. Then, override the format_for_llm function and create a new template. + +See [`mellea/docs/examples/mify/rich_document_advanced.py`](https://github.com/generative-computing/mellea/blob/main/docs/examples/mify/rich_document_advanced.py) + +## Template Representations + +Along with a template, each class/object needs to define the arguments that will be supplied when rendering the template. This happens in the component's `format_for_llm()` function. It returns either a string or a `TemplateRepresentation`. + +`string`: the simplest approach is for this method to return a string representation of the object. This avoids templating altogether. + +`TemplateRepresentation`: It can also return a `TemplateRepresentation` object. +This representation contains: - a reference to the component - a dictionary of arguments that will be passed to the template renderer - a list of tools/functions that relate to the component + +It also contains either of the following fields + +- template: a string representation of a jinja2 template that can be rendered with the provided args +- template_order: a list of strings describing the name of the template file to look up (without the ".jinja2" suffix); `*` denotes the class name. diff --git a/docs/docs/core-concept/requirements.mdx b/docs/docs/core-concept/requirements.mdx new file mode 100644 index 000000000..1ecaa8b19 --- /dev/null +++ b/docs/docs/core-concept/requirements.mdx @@ -0,0 +1,110 @@ +--- +title: "Requirements" +description: "Use pre- and post-conditions to validate your LLM outputs meet specific requirements." +--- + +But how do we know that the generated email is a good one? +Good generative programmers don't leave this up to chance -- instead, they use pre-conditions to ensure that inputs to the LLM are as expected and then check post-conditions to ensure that the LLM's outputs are fit-for-purpose. + +Suppose that in this case we want to ensure that the email has a salutation and contains only lower-case letters. 
We can capture these post-conditions by specifying **requirements** on the `m.instruct` call: + +```python +import mellea + +def write_email_with_requirements(m: mellea.MelleaSession, name: str, notes: str) -> str: + email = m.instruct( + "Write an email to {{name}} using the notes following: {{notes}}.", + requirements=[ + "The email should have a salutation", + "Use only lower-case letters", + ], + user_variables={"name": name, "notes": notes}, + ) + return str(email) + +m = mellea.start_session() +print(write_email_with_requirements( + m, + name="Olivia", + notes="Olivia helped the lab over the last few weeks by organizing intern events, advertising the speaker series, and handling issues with snack delivery.", +)) +``` + +We just added two requirements to the instruction which will be added to the model request. But we don't check yet if these requirements are satisfied. Let's add a **strategy** for validating the requirements: + +```python +import mellea +from mellea.stdlib.sampling import RejectionSamplingStrategy + +def write_email_with_strategy(m: mellea.MelleaSession, name: str, notes: str) -> str: + email_candidate = m.instruct( + "Write an email to {{name}} using the notes following: {{notes}}.", + requirements=[ + "The email should have a salutation", + "Use only lower-case letters", + ], + strategy=RejectionSamplingStrategy(loop_budget=5), + user_variables={"name": name, "notes": notes}, + return_sampling_results=True, + ) + if email_candidate.success: + return str(email_candidate.result) + else: + print("Expect sub-par result.") + return email_candidate.sample_generations[0].value + +m = mellea.start_session() +print( + write_email_with_strategy( + m, + "Olivia", + "Olivia helped the lab over the last few weeks by organizing intern events, advertising the speaker series, and handling issues with snack delivery.", + ) +) +``` + +A couple of things happened here. First, we added a sampling `strategy` to the instruction. +This strategy (`RejectionSamplingStrategy()`) checks if all requirements are met. +If any requirement fails, then the sampling strategy will sample a new email from the LLM. +This process will repeat until the `loop_budget` on retries is consumed or all requirements are met. + +Even with retries, sampling might not generate results that fulfill all requirements (`email_candidate.success==False`). +Mellea forces you to think about what it means for an LLM call to fail; +in this case, we handle the situation by simply returning the first sample as the final result. + + + +When using the `return_sampling_results=True` parameter, the `instruct()` function returns a `SamplingResult` object (not a `ModelOutputThunk`) which carries the full history of sampling and validation results for each sample. + + + +### Validating Requirements + +Now that we defined requirements and sampling we should have a +look into **how requirements are validated**. The default validation strategy is [LLM-as-a-judge](https://arxiv.org/abs/2306.05685). + +Let's look on how we can customize requirement definitions: + +```python +from mellea.stdlib.requirement import req, check, simple_validate + +requirements = [ + req("The email should have a salutation"), # == r1 + req("Use only lower-case letters", validation_fn=simple_validate(lambda x: x.lower() == x)), # == r2 + check("Do not mention purple elephants.") # == r3 +] +``` + +Here, the first requirement (r1) will be validated by LLM-as-a-judge on the output (last turn) of the instruction. 
This is the default behavior, since nothing else is specified.
+
+The second requirement (r2) simply uses a function that takes the output of a sampling step and returns a boolean value indicating (un-)successful validation. While the `validation_fn` parameter expects a function that runs validation over the full session context (see [Chapter 7](#chapter-7-on-context-management)), Mellea provides a wrapper (`simple_validate(fn: Callable[[str], bool])`) for simpler validation functions that take the output string and return a boolean, as seen in this case.
+
+The third requirement is a `check()`. Checks are only used for validation, not for generation.
+Checks aim to avoid the "do not think about B" effect that often primes models (and humans)
+to do the opposite and "think" about B.
+
+
+
+LLMaJ is not presumptively robust. Whenever possible, implement requirement validation using plain old Python code. When a model is necessary, it can often be a good idea to train a **calibrated** model specifically for your validation problem. [Chapter 6](#chapter-6-tuning-requirements-and-components) explains how to use Mellea's `m tune` subcommand to train your own LoRAs for requirement checking (and for other types of Mellea components as well).
+
+
diff --git a/docs/docs/core-concept/tuning.mdx b/docs/docs/core-concept/tuning.mdx new file mode 100644 index 000000000..ca47b1000 --- /dev/null +++ b/docs/docs/core-concept/tuning.mdx @@ -0,0 +1,209 @@
+---
+title: "Tuning Requirements and Components"
+sidebarTitle: "Tuning"
+description: "Command-line tool for adapting base models like IBM Granite to custom tasks."
+---
+
+One of the main principles of generative programming is that you should prompt models in the same way that the models were aligned. But sometimes off-the-shelf models are insufficient. Here are some scenarios we have encountered:
+
+- you are introducing a custom Component with non-trivial semantics that are not well-covered by any existing model's training data
+- off-the-shelf models fail to recognize important business constraints
+- you have a proprietary labeled dataset which you would like to use for improving classification, intent detection, or another requirement-like task.
+
+The third case is very common, and it is the one this tutorial explores: we walk through fine-tuning a LoRA adapter using classification data to enhance a requirement checker. We then explain how this fine-tuned adapter can be incorporated into a Mellea program.
+
+### Problem Statement
+
+The Stembolt MFG Corporation we encountered in [Generative Slots](/core-concept/generative-slots) is now developing an AI agent to improve its operational efficiency and resilience. A key component of this pipeline is the AutoTriage module. AutoTriage is responsible for automatically mapping free-form defect reports into categories like mini-carburetor, piston, connecting rod, flywheel, piston rings, and no_failure.
+
+To ensure the generated output meets specific downstream system requirements, we require that each defect summary contains an identified failure mode. Unfortunately, LLMs perform poorly on this task out-of-the-box; stembolts are a niche device and defect reports are not commonly discussed on the open internet. Fortunately, over the years, Stembolt MFG has collected a large dataset mapping notes to part failures, and this is where the classifier trained via aLoRA comes in.
+
+Here's a peek at a small subset of Stembolt MFG's carefully curated [dataset of stembolt failure modes](https://github.com/generative-computing/mellea/blob/main/docs/examples/aLora/stembolt_failure_dataset.jsonl):
+
+
+```json JSON
+{"item": "Observed black soot on intake. Seal seems compromised under thermal load.", "label": "piston rings"}
+{"item": "Rotor misalignment caused torsion on connecting rod. High vibration at 3100 RPM.", "label": "connecting rod"}
+{"item": "Combustion misfire traced to a cracked mini-carburetor flange.", "label": "mini-carburetor"}
+{"item": "stembolt makes a whistling sound and does not complete the sealing process", "label": "no_failure"}
+```
+
+
+Notice that the last item is labeled "no_failure", because the root cause of that issue is user error. Stembolts are difficult to use and require specialized training; approximately 20% of reported failures are actually operator error. Classifying operator error as early in the process as possible -- and with sufficient accuracy -- is an important KPI for the customer service and repairs department of the Stembolt division.
+
+Let's see how Stembolt MFG Corporation can use tuned LoRAs to implement the AutoTriage step in a larger Mellea application.
+
+### Training the aLoRA Adapter
+
+Mellea provides a command-line interface for training [LoRA](https://arxiv.org/abs/2106.09685) or [aLoRA](https://github.com/IBM/activated-lora) adapters. Classical LoRAs must re-process our entire context, which can get expensive for quick checks happening within an inner loop (such as requirement checking). The aLoRA method allows us to adapt a base LLM to new tasks, and then run the adapter with minimal compute overhead. The adapters are fast to train and fast to switch between.
+
+We will train a lightweight adapter with the `m alora train` command on this small dataset:
+
+
+```bash Bash
+m alora train /to/stembolts_data.jsonl \
+  --promptfile ./prompt_config.json \
+  --basemodel ibm-granite/granite-3.2-8b-instruct \
+  --outfile ./checkpoints/alora_adapter \
+  --adapter alora \
+  --epochs 6 \
+  --learning-rate 6e-6 \
+  --batch-size 2 \
+  --max-length 1024 \
+  --grad-accum 4
+```
+
+The default prompt format is `<|start_of_role|>check_requirement<|end_of_role|>`; this prompt should be appended to the context just before activating our newly trained aLoRA. If needed, you can customize this prompt using the `--promptfile` argument.
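+
+Before kicking off a training run, it can also help to sanity-check the JSONL dataset. The following is a minimal sketch only -- it assumes the `item`/`label` schema shown above and uses a hypothetical local path for the dataset file:
+
+```python
+# Quick sanity check on the training data before running `m alora train`.
+# Assumptions: the JSONL schema shown above ({"item": ..., "label": ...}) and a
+# hypothetical local path; point this at wherever your dataset actually lives.
+import json
+from collections import Counter
+
+label_counts: Counter = Counter()
+with open("./stembolts_data.jsonl", encoding="utf-8") as f:
+    for line_no, line in enumerate(f, start=1):
+        record = json.loads(line)
+        assert "item" in record and "label" in record, f"malformed record on line {line_no}"
+        label_counts[record["label"]] += 1
+
+print("Examples per label:")
+for label, count in label_counts.most_common():
+    print(f"  {label}: {count}")
+```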
+
+#### Parameters
+
+When training adapters, you can tune the hyperparameters listed below:
+
+| Flag              | Type    | Default    | Description                               |
+| ----------------- | ------- | ---------- | ----------------------------------------- |
+| `--basemodel`     | `str`   | _required_ | Hugging Face model ID or local path       |
+| `--outfile`       | `str`   | _required_ | Directory to save the adapter weights     |
+| `--adapter`       | `str`   | `"alora"`  | Choose between `alora` or standard `lora` |
+| `--epochs`        | `int`   | `6`        | Number of training epochs                 |
+| `--learning-rate` | `float` | `6e-6`     | Learning rate                             |
+| `--batch-size`    | `int`   | `2`        | Per-device batch size                     |
+| `--max-length`    | `int`   | `1024`     | Max tokenized input length                |
+| `--grad-accum`    | `int`   | `4`        | Gradient accumulation steps               |
+| `--promptfile`    | `str`   | None       | Path to load the prompt format from       |
+
+### Upload to Hugging Face (Optional)
+
+To share or reuse the trained adapter, publish it with the `m alora upload` command:
+
+
+```bash Bash
+m alora upload ./checkpoints/alora_adapter \
+  --name stembolts/failuremode-alora
+```
+
+This will:
+
+- Create the Hugging Face model repo (if it doesn't exist)
+- Upload the contents of the `outfile` directory
+- Require a valid `HF_TOKEN` obtained via `huggingface-cli login`
+
+If you get a permissions error, make sure you are logged in to Hugging Face:
+
+```bash Bash
+huggingface-cli login  # Optional: only needed for uploads
+```
+
+**Warning on Privacy:** Before uploading your trained model to the Hugging
+Face Hub, review the visibility carefully. If you will be sharing your model
+with the public, consider whether your training data includes any proprietary,
+confidential, or sensitive information. Language models can unintentionally
+memorize details, and this problem compounds when operating over small or
+domain-specific datasets.
+
+### Integrating the Tuned Model into Mellea
+
+After training an aLoRA classifier for our task, we would like to use that classifier to check requirements in a Mellea program. First, we need to set up our backend to use the aLoRA classifier:
+
+```python Python
+backend = ...
+
+# Assumption: the backend must be a Huggingface or alora-compatible vLLM backend,
+# built on the same base model from which we trained the alora.
+# Note that ollama does NOT yet support LoRA or aLoRA adapters.
+
+backend.add_alora(
+    HFConstraintAlora(
+        name="stembolts_failuremode_alora",
+        path_or_model_id="stembolts/failuremode-alora",  # can also be the checkpoint path
+        generation_prompt="<|start_of_role|>check_requirement<|end_of_role|>",
+        backend=backend,
+    )
+)
+```
+
+In the above arguments, `path_or_model_id` refers to the model checkpoint from the last step, i.e., the `m alora train` process.
+
+The `generation_prompt` passed to your `backend.add_alora` call should exactly match the prompt used for training.
+
+We are now ready to create an M session, define the requirement, and run the instruction:
+
+```python Python
+m = MelleaSession(backend, ctx=ChatContext())
+failure_check = req("The failure mode should not be none.")
+res = m.instruct("Write triage summaries based on technician note.", requirements=[failure_check])
+```
+
+To make the requirement work well with the trained aLoRA model, we also need to define a requirement validator function:
+
+```python Python
+def validate_reqs(reqs: list[Requirement]):
+    """Validate the requirements against the last output in the session."""
+    print("==== Validation =====")
+    print(
+        "using aLora"
+        if backend.default_to_constraint_checking_alora
+        else "using NO alora"
+    )
+
+    # helper to collect validation prompts (because validation calls never get added to session contexts).
+    logs: list[GenerateLog] = []  # type: ignore
+
+    # Run the validation. No output needed, because the last output in "m" will be used. Timing added.
+    start_time = time.time()
+    val_res = m.validate(reqs, generate_logs=logs)
+    end_time = time.time()
+    delta_t = end_time - start_time
+
+    print(f"Validation took {delta_t} seconds.")
+    print("Validation Results:")
+
+    # Print list of requirements and validation results
+    for i, r in enumerate(reqs):
+        print(f"- [{val_res[i]}]: {r.description}")
+
+    # Print prompts using the logs list
+    print("Prompts:")
+    for log in logs:
+        if isinstance(log, GenerateLog):
+            print(f" - {{prompt: {log.prompt}\n raw result: {log.result.value} }}")  # type: ignore
+
+    return end_time - start_time, val_res
+```
+
+Then we can use this validator function to check the generated defect report:
+
+```python Python
+validate_reqs([failure_check])
+```
+
+If a constraint-checking aLoRA has been added to a backend, it will be used by default. You can also force validation to run without the aLoRA:
+
+```python Python
+backend.default_to_constraint_checking_alora = False
+```
+
+In this chapter, we have seen how a classification dataset can be used to tune a
+LoRA adapter on proprietary data. We then saw how the resulting model can be incorporated into a Mellea generative program. This is the tip of a very big iceberg.
diff --git a/docs/docs/dev/constrained-decoding.mdx b/docs/docs/dev/constrained-decoding.mdx new file mode 100644 index 000000000..bd24f89e1 --- /dev/null +++ b/docs/docs/dev/constrained-decoding.mdx @@ -0,0 +1,28 @@
+---
+title: "Constrained Decoding"
+description: "Developer notes on Constrained Decoding."
+---
+
+# Constrained Decoding
+
+## How do constraints get defined?
+
+Should we be thinking bigger than pydantic? Should it be possible to pass arbitrary grammars? If so, what's the abstract interface for those? Should this be factored out into llm-io?
+
+## How do constraints get passed around?
+
+The `m` framework currently passes pydantic schemas via the `format` argument, **outside of model args**. Should we be using `@@@format@@@` within ModelArgs instead? Hendrik describes the behavior of model args like this (paraphrased by Nathan):
+
+> If a keyword has meaning across multiple types of backends, and if it means the same thing in all of those backends but has different names, then we use the `@@@`-style args so that the user can pass these args across all backends in the same way. Otherwise, the arguments in model_args are passed along verbatim.
+
+This argues for `@@@format@@@` as opposed to a dedicated `format` option in the method signature. Or, in the alternative, for an entire re-think of ModelArgs.
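+
+For concreteness, here is a minimal sketch of the status-quo path, in which the schema rides along in the `format` argument rather than in model args. It assumes an existing session `m` and that the session-level call accepts a pydantic model via `format` (as described above); the `TriageTicket` model and its fields are purely illustrative:
+
+```python
+# Sketch of the status-quo "format as a pydantic schema" path discussed above.
+# Assumptions: `m` is an existing Mellea session whose instruct call accepts a
+# pydantic BaseModel via `format`; the TriageTicket model is illustrative only.
+from pydantic import BaseModel
+
+
+class TriageTicket(BaseModel):
+    summary: str
+    failure_mode: str
+
+
+res = m.instruct(
+    "Summarize the technician note as a triage ticket.",
+    format=TriageTicket,  # the constraint travels outside of model args today
+)
+print(str(res))
+```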
+
+## Integration with grammar-targeted LLMs
+
+Some LLMs target generation in a particular grammar. Examples include:
+ * ALoRAs that target very simple grammars
+ * code generators that target particular PLs
+ * models (or model modes) tuned to generate JSON
+ * models (or model modes) tuned to generate YAML or particular fragments of YAML (such as k8s configs)
+
+Should we be doing constrained decoding in these cases, or should we treat deviation from the grammar as an exception? Probably the answer is "it depends". Masataro had a nice idea of **taking the sum of logits of grammatically feasible completions** and ensuring that this sum is above some threshold. How would supporting this change the interface described in the "How do constraints get defined?" section?
\ No newline at end of file
diff --git a/docs/docs/dev/generate-ctx-signature.mdx b/docs/docs/dev/generate-ctx-signature.mdx new file mode 100644 index 000000000..b04a35f6e --- /dev/null +++ b/docs/docs/dev/generate-ctx-signature.mdx @@ -0,0 +1,20 @@
+---
+title: "Splitting the `head` and `tail` of the Context on generate calls"
+description: "Developer notes on Splitting the `head` and `tail` of the Context on generate calls."
+---
+
+# Splitting the `head` and `tail` of the Context on generate calls
+
+We have decided to split the context into an "action" and "the rest of the context"; i.e., instead of `generate : ctx, ... -> output`, we use `generate: action, ctx, ... -> output`.
+
+This "car/cdr" separation of the final element from the rest is done because there are many situations where many different requests are made over the same context. Examples include multiple requirement checking, rejection sampling, and so on.
+
+Advantages of this approach:
+ * Shared context is referentially equal, which makes memory management extremely simple.
+ * Certain types of code -- especially requirement checking -- are much easier to write, because the Context does not have to be deep-copied.
+
+Disadvantages of this approach:
+ * This solution is extremely specific to a few examples/patterns from stdlib. When we have `span`-based backends, there could be many different points in the span from which generation could continue. The solution to that problem will sort of rhyme -- separating the generation target from the rest of the context. However, the current signature is NOT a good solution, so it's possible we will have to change how this works in the future.
+ * Not parsimonious with how context is normally used, and perhaps confusing, particularly in the most common situation where the context is "just" a normal chat history.
+ * It is not yet clear what meaning this will have when contexts cannot be linearized. In particular: what if there's a poset and multiple generation opportunities within that poset? How do we "place the cursor"? Does this design choice make it harder to "place the cursor"?
+ * Contexts are not in fact immutable, so we have to be extremely careful about when a context gets modified, and may even need to introduce semaphores.
\ No newline at end of file
diff --git a/docs/docs/dev/intrinsics-and-adapters.mdx b/docs/docs/dev/intrinsics-and-adapters.mdx new file mode 100644 index 000000000..9242c93b3 --- /dev/null +++ b/docs/docs/dev/intrinsics-and-adapters.mdx @@ -0,0 +1,44 @@
+---
+title: "Intrinsics and Adapters"
+description: "Developer notes on Intrinsics and Adapters."
+---
+
+# Intrinsics and Adapters
+
+Note: Mellea currently only supports GraniteCommonAdapters and Intrinsics.
+ +## Basics +In Mellea, intrinsics are a type of Component that signals one or more of the following to a backend: +- a special adapter must be used for generation +- the input/output for generation must be transformed in a particular way +- the model options must be modified in a particular way + +These changes only happen when the intrinsic is the "action" of the request. Intrinsics should usually not be used as an item in the context of generation (in fact, by default, Intrinsics have no string representation). + +These changes are specified by the Adapter that corresponds to a given Intrinsic. Matching happens based on the adapter name and type. + +## Parts of an Intrinsic +Intrinsics specify: +- an adapter name (ie requirement_check) +- types of adapters suitable to be used (ie alora) +- any kwargs necessary (ie a requirement like "make sure the last user message is...") + +## Parts of an Adapter +Adapters specify: +- compatible backends +- adapter type +- functions for getting a path to load them + +## Using Intrinsics +Mellea Intrinsics currently utilize the granite-common package for loading adapters and formatting input/outputs (https://github.com/ibm-granite/granite-common). This means Mellea only allows intrinsics/adapters that follow this pattern. + +## Needed Future Work +### Custom Adapters / Intrinsics +Mellea should support custom intrinsic / adapter implementations. To do this: +- make backend `_generate_from_intrinsic` functions generic and utilize only common adapter functions +- adapters must specify a transformation function that encapsulates the input/output modifications necessary for their generation requests + +### Concurrency Checks +Some backends (currently only LocalHFBackend) that allow adapters to be loaded, cannot independently utilize these adapters without impacting other generation requests. + +These backends should support a generation lock that ensures requests are only performed when the correct set of adapters (or no adapters) are active. \ No newline at end of file diff --git a/docs/docs/dev/mellea-library.mdx b/docs/docs/dev/mellea-library.mdx new file mode 100644 index 000000000..050d88c30 --- /dev/null +++ b/docs/docs/dev/mellea-library.mdx @@ -0,0 +1,20 @@ +--- +title: "Mellea should be as close to a library as possible" +description: "Developer notes on Mellea should be as close to a library as possible." +--- + +# Mellea should be as close to a library as possible + +We should make it possible to use mellea as a library (as opposed to a framework). + +In the context of LLM applications, the library vs framework distinction really boils down to how you treat the backend. + +If a piece of software insists on having an exclusive handle on the backend, then that piece of software does nto compose with any other piece of software that also insists on an exclusive handle. They both want to be privileged with respect to the backend, so they cannot "play well" together. The `outlines` library is a good example of software that could've been a library but instead acts like a framework. Even `granite-io` takes on a framework-like role when it decides to actually call the backend, as opposed to operating over strings (or perhaps chat histories). + +Writing LLM libraries is kind of difficult. There is a very strong instinct to try to grab control of the backend. Mellea is no exception. In the "intro path", mellea definitely behaves like a framework. We hide the actual backend objects (`PretrainedModel`, `openai.Client`, etc.) from the user. 
+ +But should try to make it easy for certain parts of mellea to be used as a library. There are many ways in which we could allow mellea to compose with other librares: + +1. We could have a `m.start_session_with_shared_backend(client:openai.Client)` and similarly for local ollama models and transformers models. Everything would work mostly the same after that, except we would have to make much weaker assumptions about the state of the backend (e.g., cache and LoRAs). +2. We could strive to keep the `Formatter` logic completely separate from Backend-specific code, and the legacy model behavior should treat each Component like a standalone user message. This way people could use `mellea` components without using the `mellea` backend and context managemetn code. +3. We could trive to keep the `Cache` strategies agnostic to the rest of the code base, and figure out what their interface should be with respect to various backend sdks (and transformers in particular) \ No newline at end of file diff --git a/docs/docs/dev/mify.mdx b/docs/docs/dev/mify.mdx new file mode 100644 index 000000000..ccf345554 --- /dev/null +++ b/docs/docs/dev/mify.mdx @@ -0,0 +1,78 @@ +--- +title: "mify" +description: "Developer notes on mify." +--- + +# mify + +In classical programming, object-orientation provides a way to couple data and functionality. +Classes have fields and methods. Fields store data and methods operate over that data. + +The mellea library allows you to interface with objects in the same way, but with the added benefit that an LLM can perform operations for you. + +```python +import mellea + +m = mellea.start_session() + + +class Circle: + """A circle is defined by its center and a radius.""" + center_x: float + center_y: float + radius: float + + +c = Circle(1, 0, 1) + +mify(c) + +## .query is used to compute things. +circumference: float = m.query(c, "compute the circumference of the circle", + format=float) + +## .transform is used to create a new class of the same type but mutated. +flipped_circle = m.transform(c, "Mirror the circle across the y axis.") +``` + +Let's consider a slightly more complicated example. + +```python +class Customer: + customer_id: int + name: str + age: int + email_addr: str + employer: str + meeting_notes: List[str] + + def __init__(customer_id: int): + ... + + def send_email(subject: str, body: str): + ... + + def get_meeting_notes() -> List[str]: + ... +``` + +... 
+ +```python +ctx = mellea.SingleShotContext(backend=WatsonX("ibm/granite4")) + +customer = Customer(customer_id=42) +mify(c) + +meetings_summary = m.query(c, "Summarize the last three interactions with this customer.") + +email_body = ctx.instruct("Based upon the summary of notes from recent meetings, write an email body encouraging the customer to purchase three cases of self-sealing stembolts", grouning_context={"meetings_summary": meetings_summary}) + +email_subject = ctx.instruct("Write a subject for this sales email.", grounding_context={"email_body": email_body}) + +customer.execute("send an email.", email_body, email_subject) +``` + +For more examples and information, see +- [Mify Examples](../examples/mify.py) +- [Mify Implementation](../../mellea/stdlib/mify.py) \ No newline at end of file diff --git a/docs/docs/dev/requirement-alora-rerouting.mdx b/docs/docs/dev/requirement-alora-rerouting.mdx new file mode 100644 index 000000000..d5fad43f3 --- /dev/null +++ b/docs/docs/dev/requirement-alora-rerouting.mdx @@ -0,0 +1,76 @@ +--- +title: "Rerouting Requirement Actions in `Backend.generate_*` calls" +description: "Developer notes on Rerouting Requirement Actions in `Backend.generate_*` calls." +--- + +# Rerouting Requirement Actions in `Backend.generate_*` calls + +Backend will often re-route a `generate` call where `action : Requirement` to an ALora. This document explains how and why that happens. + +## The Requirement Rerouting Rule + +## The Simple Rule + +The simplest version of the Requirement Rerouting Rule is: + +> The most specific constraint checking method will be used when validating generic `Requirement`s. + +The actual rule is slightly more complicated. + +## The Actual Rule + +If a `Requirement` is validated using a backend that could either use a `requirement_check` aLoRA or perform an LLMaJ prompt on the underlying model, then the aLoRA is used for validation, even if the `backend.generate_from_context` method is called instead of the `backend._generate_from_intrinsic` method. + +There are three exceptions to this rule: +1. `Backend.default_to_constraint_checking_alora` is set to `False` (this parameter defaults to `True`). +2. The `Requirement` has a more specific subtype that indicates a more specific intent (`LLMaJRequirement`). +3. The `ALoRA` requirement checker throws an exception. + +There is an exception (or disambiguation) to the first exception: If the user provides an `ALoRARequirement`, then the `backend.generate_from_context` call is rerouted to the constraint checking LoRA, regardless of the value of `default_to_constraint_checking_alora`. + +## Decision Rationale + +### Background and Problem Statement + +The `stdlib` has a `Requirement` class whose `validate` behavior is an LLMaJ call. + +Suppose that the user creates a backend and then adds a generic constraint checking aLoRA: + +```python +from mellea import start_session +from mellea.stdlib.requirement import Requirement + +m = start_session( + "huggingface.LocalHFBackend:ibm-granite/granite-3.2-8b-instruct") + +## By default, the AloraRequirement uses a GraniteCommonAdapter with "requirement_check". 
+m.backend.add_adapter(GraniteCommonAdapter("ibm-granite/rag-intrinsics-lib", "requirement_check", base_model_name="granite-3.2-8b-instruct")) + +m.instruct( + "Corporate wants you to find the difference between these two strings:\n\naaa\naba") +assert m.validate(Requirement( + description="The answer should mention that one of the strings has the letter b while the other doesn't.")) +``` + +Both the underlying model and the aLoRA adapter know how to validate this requirement, so which should be used? + +## Alternatives to the Proposed Rule + +1. Avoid the problem by forcing the user to be more explicit. +2. Respect control flow in the backends/alora mixins, and have the MelleaSession or the user explicitly implement the appropriate control flow. +3. Have the `Requirement.validate` implementation specify whatever control flow is desired for that particular requirement. + +### Advantages + +1. Reduced cognitive load. To first approximation, there is a simple rule that produces unsurprising results. The exceptions are rare and require explicit intervention from the user. If these exceptions are used, the user almost certainly knows exactly what they are doing. +2. Control is retained. If the user wants to specify the precise semantics of their validate call, then they can use the mpore specific `LLMaJRequirement` and `ALoraRequirement` classes. +3. The backend is the one that needs to make the choice about whether to handle KV cache. + + +### Disadvantages + +All backends that implement the aLoRA mixin need to implement this semantics. + + * This might be a blessing in disguise. It's actually not clear that ALora context construction can be done WLOG outside of the specific backend. + * That code is written rarely in any case. + * Depending on the truth of the first bullet point's conjecture, we can mitigate by implementing this routing in `m.validate` so that even if a backend contributor gets this wrong the proper behavior is still usually observed by most users. \ No newline at end of file diff --git a/docs/docs/dev/spans.mdx b/docs/docs/dev/spans.mdx new file mode 100644 index 000000000..e34faef99 --- /dev/null +++ b/docs/docs/dev/spans.mdx @@ -0,0 +1,24 @@ +--- +title: "Design Document for Spans" +description: "Developer notes on Design Document for Spans." +--- + +# Design Document for Spans + +## Span Contexts + +We will introduce a SpanContext which will behave kind of like a heap but with transformer-running-on-GPU memory primitives instead of malloc/realloc/free. The public interface to a SpanContext will roughly correspond to the sort of stuff you can do in Span algebras, if you've seen some of that work. + +## Mapping STDLIB to Spans + +There are two broad philosophies to choose from for Spans. + +### The Span Representation Approach + +All Components and CBlocks get a __span_repr__ which maps the all things to a Span representation. The Component owner is responsible for saying how something gets represented as a Span, and is also responsible for defining caching boundaries (via a cache_boundary tag). + +### The Span Formatter Approach + +There is a Formatter which maps Components and CBlocks to Spans, as a pure function. Similar to how the TemplateFormatter works today. + +We need to document which approach we choose and discuss why it was chosen. 
\ No newline at end of file diff --git a/docs/docs/dev/tool-calling.mdx b/docs/docs/dev/tool-calling.mdx new file mode 100644 index 000000000..b8c455cd9 --- /dev/null +++ b/docs/docs/dev/tool-calling.mdx @@ -0,0 +1,78 @@ +--- +title: "Tool Calling" +description: "Developer notes on Tool Calling." +--- + +# Tool Calling + +## Problem Statement + +Context management and execution of tool calls are inextricably linked, because most +models expect the output of a tool call to be added to the context at the +moment when the too lcall happens. This means that the `Session` must own the +code that actual performs a tool call. + +This is annoying because *what to do with a tool call* -- or even *how to +implement a tool call* -- is going to vary from application to application. + +We are then faced with two options: + +1. Provide some sort of object protocol for handling tool calls, whereby the + client responsible for tool calling is also responsible for executing a + callback on the session which appropriately modifies the session's context + in light of the tool response; or, +2. Come up with a small number of ways in which a tool may be called, and + expose those in the session. Anyone who wants to do something more complex + must then extend the Session class and implement their own too lcalling + logic. + +## Proposals + + +### Tool Calling Protocol Option + +Basically (2). + +Certain things such as `transform` have a default semantics in the +`MelleaSession` base class. + +For anyone who wants to do free-form tool calling, +there is a `MelleaSessionToolProtocol` mixin which must be inherited from and +implemented. + +### Nothing Fancy Option + +Pass back the `ModelOutputThunk` with tool calls, and do nothing else. + +Note that we already have a `ctx.insert` function, si instead of a mixin with +a protocol, the user is just supposed to know what they are supposed to do and +then use `m.ctx.insert` to implement the relevant logic. + +This is what's done with openai sdk in the status quo anyways. + +### Compromise? + +Can this be implemented such that if you don't specify a tool calling protocol +implementation then the behavior is equivalent to the Nothing Fancy Option? +Probably so. + + +## Final Proposal + +The ModelOutputThunk has a `tools` field where parsed tool calls are surfaced +to the user. This already exists and probably does not need additional +modification. + +1. For certain special tool calling protocols, the Session handles things + automatically for the user. E.g., `m.transform` and `m.query`. We need to + specify the precise semantics for what happens when a user provides tools + in the model_options when using `m.transform` -- probably, you flow through + into the next two cases. +2. If the `Session` has a `SessionToolCallingProtocol` implemented, then the + `def tool_call_result(...)` on that protocol must be called by the user + after a tool is executed. When that method is called, the context is + updated appropriately. We can also provide a `def call_tool(tool)` method + for convienance, which does both the tool call and the context management + for the user. +3. Otherwise, nothing happens. The user is responsible for updating their + context as needed. 
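+
+For reference, the "Nothing Fancy" path might look roughly like the sketch below. The `tools` field on the thunk and `m.ctx.insert` come from the notes above; everything else -- the per-call attributes (`.name`, `.args`) and how results are wrapped before insertion -- is assumed for illustration and is not a description of the actual Mellea API:
+
+```python
+# Rough sketch of the "Nothing Fancy" option: the caller owns tool execution
+# and context updates. ASSUMPTIONS (not the real API): each parsed call exposes
+# `.name` and `.args`, and raw results can be inserted into the context as-is.
+def run_tools_and_update_context(m, output, available_tools: dict) -> None:
+    for call in output.tools:                # parsed tool calls surfaced on the ModelOutputThunk
+        fn = available_tools.get(call.name)  # assumed attribute: the tool's name
+        if fn is None:
+            continue                         # unknown tool; leave the context untouched
+        result = fn(**call.args)             # assumed attribute: parsed arguments
+        m.ctx.insert(result)                 # caller-managed context update, per the notes above
+```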
\ No newline at end of file diff --git a/docs/docs/docs.json b/docs/docs/docs.json new file mode 100644 index 000000000..9ff7304e4 --- /dev/null +++ b/docs/docs/docs.json @@ -0,0 +1,256 @@ +{ + "$schema": "https://leaves.mintlify.com/schema/docs.json", + "theme": "maple", + "name": "Mellea", + "colors": { + "primary": "#000000", + "light": "#c5c5c5", + "dark": "#000000" + }, + "fonts": { + "heading": { + "family": "IBM Plex Sans" + }, + "body": { + "family": "IBM Plex Sans" + } + }, + "favicon": "/images/favicon.svg", + "navigation": { + "tabs": [ + { + "tab": "Docs", + "groups": [ + { + "group": "Introduction", + "pages": [ + "overview/mellea-welcome", + "overview/architecture", + "overview/generative-programming" + ] + }, + { + "group": "Quick Start", + "pages": [ + "overview/overview", + "core-concept/requirements", + "core-concept/instruct-validate-repair", + "core-concept/modeloptions" + ] + }, + { + "group": "Core Concepts", + "pages": [ + "core-concept/generative-slots", + "core-concept/mobjects", + "core-concept/context-management", + "core-concept/agents", + "core-concept/prompt-engineering" + ] + }, + { + "group": "Extending Mellea", + "pages": [ + "core-concept/tuning", + "core-concept/adapters", + "core-concept/alora", + "core-concept/interoperability" + ] + } + ] + }, + { + "tab": "API Reference", + "pages": [ + { + "group": "mellea", + "pages": [ + { + "group": "backends", + "pages": [ + "api/mellea/backends/backend", + "api/mellea/backends/cache", + "api/mellea/backends/dummy", + "api/mellea/backends/huggingface", + "api/mellea/backends/kv_block_helpers", + "api/mellea/backends/litellm", + "api/mellea/backends/model_ids", + "api/mellea/backends/model_options", + "api/mellea/backends/ollama", + "api/mellea/backends/openai", + "api/mellea/backends/tools", + "api/mellea/backends/utils", + "api/mellea/backends/vllm", + "api/mellea/backends/watsonx", + { + "group": "adapters", + "pages": [ + "api/mellea/backends/adapters/adapter", + "api/mellea/backends/adapters/catalog" + ] + } + ] + }, + { + "group": "core", + "pages": [ + "api/mellea/core/backend", + "api/mellea/core/base", + "api/mellea/core/formatter", + "api/mellea/core/requirement", + "api/mellea/core/sampling", + "api/mellea/core/utils" + ] + }, + { + "group": "formatters", + "pages": [ + "api/mellea/formatters/chat_formatter", + "api/mellea/formatters/template_formatter" + ] + }, + { + "group": "helpers", + "pages": [ + "api/mellea/helpers/async_helpers", + "api/mellea/helpers/openai_compatible_helpers" + ] + }, + { + "group": "stdlib", + "pages": [ + "api/mellea/stdlib/context", + "api/mellea/stdlib/functional", + "api/mellea/stdlib/session", + { + "group": "components", + "pages": [ + "api/mellea/stdlib/components/chat", + "api/mellea/stdlib/components/genslot", + "api/mellea/stdlib/components/instruction", + "api/mellea/stdlib/components/mify", + "api/mellea/stdlib/components/mobject", + "api/mellea/stdlib/components/simple", + "api/mellea/stdlib/components/unit_test_eval", + { + "group": "docs", + "pages": [ + "api/mellea/stdlib/components/docs/document", + "api/mellea/stdlib/components/docs/richdocument" + ] + }, + { + "group": "intrinsic", + "pages": [ + "api/mellea/stdlib/components/intrinsic/intrinsic", + "api/mellea/stdlib/components/intrinsic/rag" + ] + } + ] + }, + { + "group": "requirements", + "pages": [ + "api/mellea/stdlib/requirements/md", + "api/mellea/stdlib/requirements/python_reqs", + "api/mellea/stdlib/requirements/requirement", + "api/mellea/stdlib/requirements/tool_reqs", + { + "group": "safety", + 
"pages": [ + "api/mellea/stdlib/requirements/safety/guardian" + ] + } + ] + }, + { + "group": "sampling", + "pages": [ + "api/mellea/stdlib/sampling/base", + "api/mellea/stdlib/sampling/budget_forcing", + "api/mellea/stdlib/sampling/majority_voting", + "api/mellea/stdlib/sampling/sofai", + { + "group": "sampling_algos", + "pages": [ + "api/mellea/stdlib/sampling/sampling_algos/budget_forcing_alg" + ] + } + ] + }, + { + "group": "tools", + "pages": [ + "api/mellea/stdlib/tools/interpreter" + ] + } + ] + } + ] + }, + { + "group": "cli", + "pages": [ + "api/cli/m", + { + "group": "alora", + "pages": [ + "api/cli/alora/commands", + "api/cli/alora/train", + "api/cli/alora/upload" + ] + }, + { + "group": "decompose", + "pages": [ + "api/cli/decompose/decompose", + "api/cli/decompose/pipeline", + "api/cli/decompose/utils" + ] + }, + { + "group": "eval", + "pages": [ + "api/cli/eval/commands", + "api/cli/eval/runner" + ] + } + ] + } + ] + } + ] + }, + "logo": { + "light": "/logo/logo-light.svg", + "dark": "/logo/logo-dark.svg" + }, + "navbar": { + "primary": { + "type": "button", + "label": "GitHub", + "href": "https://github.com/generative-computing/mellea" + }, + "links": [ + { + "label": "Blog", + "href": "https://generative-computing.github.io/blog/" + }, + { + "label": "Community", + "href": "https://github.com/generative-computing/mellea/discussions" + }, + { + "label": "Contribution Guide", + "href": "/core-concept/contribution-guide" + }, + { + "label": "Support", + "href": "https://github.com/generative-computing/mellea/issues" + } + ] + }, + "search": { + "prompt": "Search documentation..." + } +} diff --git a/docs/docs/images/Gen_Comp_Explainer.png b/docs/docs/images/Gen_Comp_Explainer.png new file mode 100644 index 000000000..d68b61530 Binary files /dev/null and b/docs/docs/images/Gen_Comp_Explainer.png differ diff --git a/docs/docs/images/architecture/architecture.png b/docs/docs/images/architecture/architecture.png new file mode 100644 index 000000000..f13a60971 Binary files /dev/null and b/docs/docs/images/architecture/architecture.png differ diff --git a/docs/docs/images/architecture/complex.png b/docs/docs/images/architecture/complex.png new file mode 100644 index 000000000..ed31ca7ab Binary files /dev/null and b/docs/docs/images/architecture/complex.png differ diff --git a/docs/docs/images/architecture/formatter.png b/docs/docs/images/architecture/formatter.png new file mode 100644 index 000000000..ed11745a1 Binary files /dev/null and b/docs/docs/images/architecture/formatter.png differ diff --git a/docs/docs/images/architecture/instruction_with_requirement_validation.png b/docs/docs/images/architecture/instruction_with_requirement_validation.png new file mode 100644 index 000000000..028ec6d7a Binary files /dev/null and b/docs/docs/images/architecture/instruction_with_requirement_validation.png differ diff --git a/docs/docs/images/architecture/simple.png b/docs/docs/images/architecture/simple.png new file mode 100644 index 000000000..cb591e4cb Binary files /dev/null and b/docs/docs/images/architecture/simple.png differ diff --git a/docs/docs/images/cheat-sheet.png b/docs/docs/images/cheat-sheet.png new file mode 100644 index 000000000..e29e22fad Binary files /dev/null and b/docs/docs/images/cheat-sheet.png differ diff --git a/docs/docs/images/checks-passed.png b/docs/docs/images/checks-passed.png new file mode 100644 index 000000000..3303c7736 Binary files /dev/null and b/docs/docs/images/checks-passed.png differ diff --git a/docs/docs/images/favicon.ico 
b/docs/docs/images/favicon.ico new file mode 100644 index 000000000..ab4edbbfe Binary files /dev/null and b/docs/docs/images/favicon.ico differ diff --git a/docs/docs/images/favicon.svg b/docs/docs/images/favicon.svg new file mode 100644 index 000000000..f49ae90aa --- /dev/null +++ b/docs/docs/images/favicon.svg @@ -0,0 +1 @@ +bee \ No newline at end of file diff --git a/docs/docs/images/gif/getting-started.gif b/docs/docs/images/gif/getting-started.gif new file mode 100644 index 000000000..ba9e90ab6 Binary files /dev/null and b/docs/docs/images/gif/getting-started.gif differ diff --git a/docs/docs/images/granite-icon.svg b/docs/docs/images/granite-icon.svg new file mode 100644 index 000000000..b541d1f21 --- /dev/null +++ b/docs/docs/images/granite-icon.svg @@ -0,0 +1,17 @@ + + + + + + + + + + + + + + + + + diff --git a/docs/docs/images/granite-pictogram.svg b/docs/docs/images/granite-pictogram.svg new file mode 100644 index 000000000..725f1a9c2 --- /dev/null +++ b/docs/docs/images/granite-pictogram.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/docs/docs/images/hero-dark.png b/docs/docs/images/hero-dark.png new file mode 100644 index 000000000..a61cbb125 Binary files /dev/null and b/docs/docs/images/hero-dark.png differ diff --git a/docs/docs/images/hero-light.png b/docs/docs/images/hero-light.png new file mode 100644 index 000000000..68c712d6d Binary files /dev/null and b/docs/docs/images/hero-light.png differ diff --git a/docs/docs/images/langflow_example.jpg b/docs/docs/images/langflow_example.jpg new file mode 100644 index 000000000..5a3ceeace Binary files /dev/null and b/docs/docs/images/langflow_example.jpg differ diff --git a/docs/docs/images/mellea_draft_logo_300.png b/docs/docs/images/mellea_draft_logo_300.png new file mode 100644 index 000000000..daf5a45a0 Binary files /dev/null and b/docs/docs/images/mellea_draft_logo_300.png differ diff --git a/docs/docs/logo/logo-dark-32.svg b/docs/docs/logo/logo-dark-32.svg new file mode 100644 index 000000000..613b86700 --- /dev/null +++ b/docs/docs/logo/logo-dark-32.svg @@ -0,0 +1,16 @@ + + + + + + + + + + + + \ No newline at end of file diff --git a/docs/docs/logo/logo-dark.svg b/docs/docs/logo/logo-dark.svg new file mode 100644 index 000000000..c03cfc8c3 --- /dev/null +++ b/docs/docs/logo/logo-dark.svg @@ -0,0 +1,16 @@ + + + + + + + + + + + + \ No newline at end of file diff --git a/docs/docs/logo/logo-light-32.svg b/docs/docs/logo/logo-light-32.svg new file mode 100644 index 000000000..ffa1d35cd --- /dev/null +++ b/docs/docs/logo/logo-light-32.svg @@ -0,0 +1,9 @@ + + + + + + + + + \ No newline at end of file diff --git a/docs/docs/logo/logo-light.svg b/docs/docs/logo/logo-light.svg new file mode 100644 index 000000000..c79b9d250 --- /dev/null +++ b/docs/docs/logo/logo-light.svg @@ -0,0 +1,9 @@ + + + + + + + + + \ No newline at end of file diff --git a/docs/docs/overview/architecture.mdx b/docs/docs/overview/architecture.mdx new file mode 100644 index 000000000..7a2635ff2 --- /dev/null +++ b/docs/docs/overview/architecture.mdx @@ -0,0 +1,49 @@ +--- +title: "Overview of the Standard Library" +sidebarTitle: "Standard Library" +--- + +Before going any further, we need to overview the architecture of Mellea. + +Mellea's core abstraction is called a `Component`. A `Component` is a structured object that represents a unit of interaction with an LLM. The Mellea `stdlib` contains a set of useful components, but you can also define your own. 
We have already seen some components -- `Instruction` and `Requirement` are both `Component`s. + +Components are composite data structures; that is, a `Component` can be made up of many other parts. Each of those parts is either a `CBlock` or another `Component`. `CBlock`s, or "content blocks", are an atomic unit of text or data. CBlocks hold raw text (or sometimes parsed representations) and can be used as leaves in the Component DAG. + +Backends are the engine that actually run the LLM. Backends consume Components, format the Component, pass the formatted input to an LLM, and return model outputs, which are then parsed back into CBlocks or Components. + +During the course of an interaction with an LLM, several Components and CBlocks may be created. Logic for handling this trace of interactions is provided by a `Context` object. Some book-keeping needs to be done in order for Contexts to approporiately handle a trace of Components and CBlocks. The `MelleaSession` class, which is created by `mellea.start_session()`, does this book-keeping a simple wrapper around Contexts and Backends. + +When we call `m.instruct()`, the `MelleaSession.instruct` method creates a component called an `Instruction`. Instructions are part of the Mellea standard library. + +So far we have seen Instructions with descriptions and requirements, but an Instruction can also have in-context learning examples and grounding_context (for RAG): + +```python +class Instruction(Component): + """The Instruction in an instruct/validate/repair loop.""" + + def __init__( + self, + description: str | CBlock | None = None, + requirements: list[Requirement | str] | None = None, + icl_examples: list[str | CBlock] | None = None, + grounding_context: dict[str, str | CBlock | Component] | None = None, + user_variables: dict[str, str] | None = None, + prefix: str | CBlock | None = None, + output_prefix: str | CBlock | None = None, + ): +``` + +The following Cheat Sheet concisely visualizes the relationship between Components/CBlocks, Backends, Contexts, and Sessions. + +TODO INSERT HENDRIK'S CHEAT SHEET + +M's standard library contains four basic types of Components: + +1. [Instructions](#chapter-2-getting-started-with-generative-programming-in-mellea), which we have already seen. +2. [Requirements](#chapter-2-getting-started-with-generative-programming-in-mellea), which we have already seen and will continue to use heavily throughout the remainder of the tutorial. +3. [Generative Slots](#chapter-4-generative-slots), which treat LLM calls as functions. +4. [MObjects](#chapter-5-mobjects), which help with context engineering for tool use by placing tools next to the data that those tools most reasonably operate over. + +This is not an exhaustive list of possible component types. New components can be created as [user libraries or as stdlib contributions](#appendix-contributing-to-m). Where it makes sense, you can also back new components by [fine-tuned models designed especially to work with your Component types](#chapter-6-tuning-requirements-and-components). + +But before getting into these advanced modalities, let's finish our overview of the standard library of Components that ship with Mellea. diff --git a/docs/docs/overview/generative-programming.mdx b/docs/docs/overview/generative-programming.mdx new file mode 100644 index 000000000..73efad3df --- /dev/null +++ b/docs/docs/overview/generative-programming.mdx @@ -0,0 +1,27 @@ +--- +title: "Generative Programming" +description: "Mellea is a library for writing generative programs." 
+--- + +This tutorial is about Mellea. Mellea helps you write better generative programs. + +A _generative program_ is any computer program that contains calls to an LLM. As we will see throughout the tutorial, LLMs can be incorporated into software in a wide variety of ways. Some ways of incorporating LLMs into programs tend to result in robust and performant systems, while others result in software that is brittle and error-prone. + +Generative programs are distinguished from classical programs by their use of functions that invoke generative models. These generative calls can produce many different data types -- strings, booleans, structured data, code, images/video, and so on. The model(s) and software underlying generative calls can sometimes be combined and composed (e.g., LoRA adapters are a special case). In addition to invoking generative calls, generative programs can invoke ordinary functions that have no LLM behind them, so that we can, for example, pass the output of a generative function into a DB retrieval system and feed the output of that into another generator. Writing generative programs is difficult because generative programs interleave deterministic and stochastic operations. + +Requirement verification plays an important role in circumscribing periods of nondeterminism in a generative program. We can implement validators that produce boolean or other outputs, and loop until the validator passes, or until the iteration count gets too high and we trigger some exception-handling process. Thus we can determine the degree of certainty in the output of a generative function and then act based upon that degree of certainty. Verification can happen in a variety of ways -- from querying a generative function, to precise programmatic checks, to various combinations of the two. + +In programs that contain long computation paths -- including most that contain iteration or recursion -- incremental accrual of uncertainty is multiplicative, and therefore must itself be occasionally circumscribed by incremental requirement verification throughout the generative program's execution. These incremental checks can be used to establish patterns of variation, or properties which are invariant, both of which can help ensure that the execution converges to a desired state and does not "go wrong". The construction of these incremental checks is one of the important tasks in generative programming, and can itself be treated as a task amenable to generative programming. Like other requirement checks, these variants and invariants may be explicit and programmatic or can be solved via a generative function. In any case, each generative program results in a trace of computations -- some successful, others failures. + +Figuring out what to do about failure paths is yet another crux faced by authors of generative programs. Successful traces can be collected, leading to a final high-confidence result; alternatively, traces with some failures or low-confidence answers can accumulate. Generative programs then try to repair these failed validations. The repair process can be manual, or automated, or offer a combination of user interactions and automated repair mechanisms. As a generative program executes in this way, context accrues. The accrual of ever-larger contexts becomes a challenge unto itself. + +Memory management therefore plays an important role in context engineering.
Mellea therefore provides a mechanism for mapping components of the KV cache onto developer and user-facing abstractions, and for automating the construction of context and handling of cached keys and values. + +As the Mellea developers built this library for generative programming, we found some useful principles that you will see recur throughout this tutorial: + +- **Circumscribe LLM calls with requirement verifiers.** We will see variations on this principle throughout the tutorial. +- **Generative programs should use simple and composable prompting styles.** Mellea takes a middle ground between the "framework chooses the prompt" and "client code chooses the prompt" paradigms. By keeping prompts small and self-contained, then chaining together many such prompts, we can usually get away with one of a few prompt styles. When a new prompt style is needed, that prompt should be co-designed with the software that will use the prompt. In Mellea, we encourage this by decomposing generative programs into _Components_; more on this in [Chapter 3](#chapter-3-overview-of-the-standard-library). +- **Generative models and inference-time programs should be co-designed.** Ideally, the style and domain of prompting used at inference time should match the style and domain of prompting used in pretraining, mid-training, and/or post-training. And, similarly, models should be built with runtime components and use-patterns in mind. We will see some early examples of this in [Chapter 6](#chapter-6-tuning-requirements-and-components). +- **Generative programs should carefully manage context.** Each Component manages the context of a single call, as we see in Chapters [2](#chapter-2-getting-started-with-generative-programming-in-mellea), [3](#chapter-3-overview-of-the-standard-library), [4](#chapter-4-generative-slots), and [5](#chapter-5-mobjects). Additionally, Mellea provides some useful mechanisms for re-using context across multiple calls ([Chapter 7](#chapter-7-on-context-management)). + +Although good generative programs can be written in any language and framework, getting it right is not trivial. Mellea is just one point in the design space of LLM libraries, but we think it is a good one. Our hope is that Mellea will help you write generative programs that are robust, performant, and fit-for-purpose. diff --git a/docs/docs/overview/mellea-welcome.mdx b/docs/docs/overview/mellea-welcome.mdx new file mode 100644 index 000000000..017d0f2a7 --- /dev/null +++ b/docs/docs/overview/mellea-welcome.mdx @@ -0,0 +1,27 @@ +--- +title: "Welcome" +description: "Mellea is a library for writing generative programs." +--- + + +Welcome! This project takes us **back to the future** of computing by formally introducing the concept of **generative programs** -- software systems that strategically integrate calls to Large Language Models (LLMs) -- and the demanding engineering required to make them reliable. The fundamental challenge we address is how to safely and predictably harness the powerful but inherently **stochastic** operations of LLMs within traditionally deterministic codebases. This documentation establishes a rigorous framework, emphasizing core techniques like **requirement verification** to circumscribe periods of non-determinism, mechanisms for repairing **failure traces**, and advanced **context management**.
Ultimately, this work outlines essential principles and architectural patterns needed to construct robust, high-confidence generative software that effectively merges the capabilities of LLMs with reliable computational predictability. + +But let's get started! Choose your path: + + + + Set up your project with our quickstart guide. + + + Browse through some examples (on GitHub) + + + Explore endpoints, parameters, and examples for our API. + + + Read more about the ideas of Generative Programming + + + + + diff --git a/docs/docs/overview/overview.mdx b/docs/docs/overview/overview.mdx new file mode 100644 index 000000000..8fbd60deb --- /dev/null +++ b/docs/docs/overview/overview.mdx @@ -0,0 +1,148 @@ +--- +title: "Overview" +description: "Get up and running with Mellea" +--- + +Before we get started, you will need to download and install [ollama](https://ollama.com/). Mellea can work with many different types of backends, but everything in this tutorial will "just work" on a MacBook running IBM's Granite 4 Micro 3B model. + +We also recommend that you download and install [uv](https://docs.astral.sh/uv/#installation). You can run any of the examples in the tutorial with: + +```bash +uv run --with mellea example_name.py +``` + + + +If running on an Intel Mac, you may get errors related to torch/torchvision versions. Conda maintains updated versions of these packages. You will need to create a conda environment and run `conda install 'torchvision>=0.22.0'` (this should also install pytorch and torchvision-extra). Then, you should be able to run `uv pip install mellea`. To run the examples, you will need to use `python example_name.py` inside the conda environment instead of `uv run --with mellea example_name.py`. + + + + + +If you are using Python >= 3.13, you may encounter an issue where outlines cannot be installed due to Rust compiler issues (`error: can't find Rust compiler`). You can either downgrade to Python 3.12 or install the [Rust compiler](https://www.rust-lang.org/tools/install) to build the wheel for outlines locally. + + + +Once you have Ollama installed and running, we can get started with our first piece of generative code: + +```python +## file: https://github.com/generative-computing/mellea/blob/main/docs/examples/tutorial/simple_email.py#L1-L8 +import mellea + +## INFO: this line will download IBM's Granite 4 Micro 3B model. +m = mellea.start_session() + +email = m.instruct("Write an email inviting interns to an office party at 3:30pm.") +print(str(email)) +``` + +Here, we initialize a backend running Ollama on the local machine using Mellea's default Granite model. +We then ask the model to generate an email and print it to the console. + + + +Mellea supports many other models and backends. By default, a new Mellea session will run a capable IBM Granite model on your own laptop. This is a good (and free!) way to get started. If you would like to try out other models or backends, you can explicitly specify the backend and model in the `start_session` method. For example, `mellea.start_session(backend_name="ollama", model_id=mellea.model_ids.IBM_GRANITE_3_3_8B)`.
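+ +As a minimal sketch (reusing the `backend_name` and `model_id` arguments shown above, and assuming a locally running Ollama install), an explicitly configured session is used exactly like the default one: + +```python +## Hedged sketch: explicitly select the Ollama backend and a Granite model for this session. +## The backend_name/model_id arguments are the ones shown in the example above; adjust them to your setup. +import mellea + +## Start a session against a specific Ollama-served Granite model instead of the default. +m = mellea.start_session(backend_name="ollama", model_id=mellea.model_ids.IBM_GRANITE_3_3_8B) + +answer = m.instruct("In one sentence, introduce yourself as the configured model.") +print(str(answer)) +```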
+ + + +Before continuing, let's wrap this call into a function with some arguments: + +```python +## file: https://github.com/generative-computing/mellea/blob/main/docs/examples/tutorial/simple_email.py#L13-L27 +import mellea + +def write_email(m: mellea.MelleaSession, name: str, notes: str) -> str: + email = m.instruct( + "Write an email to {{name}} using the notes following: {{notes}}.", + user_variables={"name": name, "notes": notes}, + ) + return email.value # str(email) also works. + +m = mellea.start_session() +print(write_email(m, "Olivia", + "Olivia helped the lab over the last few weeks by organizing intern events, advertising the speaker series, and handling issues with snack delivery.")) +``` + +Voila, we now have an email-writing function! + +Notice how the `instruct` method can take a dictionary of variables as `user_variables`. These are filled by treating the instruction description as a Jinja template. + +The `m.instruct()` function returns a `ModelOutputThunk` by default, which has the model output string bound to the field `.value`. + +## ModelOptions + +Most LLM APIs allow you to specify options to modify the request: temperature, max_tokens, seed, and so on. Mellea supports specifying these options during backend initialization and when calling session-level functions with the `model_options` parameter. + +Mellea supports many different types of inference engines (ollama, openai-compatible vllm, huggingface, etc.). These inference engines, which we call `Backend`s, provide different and sometimes inconsistent dict keysets for specifying model options. For the most common options among model providers, Mellea provides some engine-agnostic options, which can be used by typing [`ModelOption.`](../mellea/backends/types.py) in your favorite IDE; for example, temperature can be specified as `{ModelOption.TEMPERATURE: 0}` and this will "just work" across all inference engines. + +You can add any key-value pair supported by the backend to the `model_options` dictionary, and those options will be passed along to the inference engine *even if* a Mellea-specific `ModelOption` is defined for that option. This means you can safely copy over model option parameters from existing codebases as-is: + +```python +## file: https://github.com/generative-computing/mellea/blob/main/docs/examples/tutorial/model_options_example.py#L1-L16 +import mellea +from mellea.backends.types import ModelOption +from mellea.backends.ollama import OllamaModelBackend +from mellea.backends import model_ids + +m = mellea.MelleaSession(backend=OllamaModelBackend( + model_id=model_ids.IBM_GRANITE_3_2_8B, + model_options={ModelOption.SEED: 42} +)) + +answer = m.instruct( + "What is 2x2?", + model_options={ + "temperature": 0.5, + "num_predict": 5, + }, +) + +print(str(answer)) +``` + +You can always update the model options of a given backend; however, Mellea offers a few additional approaches to changing the specified options. + +1. **Specifying options during `m.*` calls**. Options specified here will update the previously specified model options for that call only. If you specify an already existing key (with either the `ModelOption.OPTION` version or the native name for that option for the given API), the value will be the one associated with the new key. If you specify the same key in different ways (i.e., `ModelOption.TEMPERATURE` and `temperature`), the `ModelOption.OPTION` key will take precedence.
+ +```python +## options passed during backend initialization +backend_model_options = { + "seed": "1", + ModelOption.MAX_NEW_TOKENS: 1, + "temperature": 1, +} + +## options passed during m.* +instruct_model_options = { + "seed": "2", + ModelOption.SEED: "3", + "num_predict": 2, +} + +## options passed to the model provider API +final_options = { + "temperature": 1, + "seed": 3, + "num_predict": 2 +} +``` + +2. **Pushing and popping model state**. Sessions offer the ability to push and pop model state. This means you can temporarily change the `model_options` for a series of calls by pushing a new set of `model_options` and then revert those changes with a pop. + +### System Messages + +In Mellea, `ModelOption.SYSTEM_PROMPT` is the recommended way to add/change the system message for a prompt. Setting it at the backend/session level will use the provided message as the system prompt for all future calls (just like any other model option). Similarly, you can specify the system prompt parameter for any session-level function (like `m.instruct`) to replace it for just that call. + +Mellea recommends applying the system message this way because some model-provider APIs don't properly serialize messages with the `system` role and expect them as a separate parameter. + +## Conclusion + +We have now worked up from a simple "Hello, World" example to our first generative programming design pattern: **Instruct - Validate - Repair (IVR)**. + +When LLMs work well, the software developer experiences the LLM as a sort of oracle that can handle almost any input and produce a sufficiently desirable output. When LLMs do not work at all, the software developer experiences the LLM as a naive Markov chain that produces junk. In both cases, the LLM is just sampling from a distribution. + +The crux of generative programming is that most applications find themselves somewhere in between these two extremes -- the LLM mostly works, enough to demo a tantalizing MVP. But failure modes are common enough and severe enough that complete automation is beyond the developer's grasp. + +Traditional software deals with failure modes by carefully describing what can go wrong and then providing precise error-handling logic. When working with LLMs, however, this approach suffers from a Sisyphean curse. There is always one more failure mode, one more special case, one more new feature request. In the next chapter, we will explore how to build generative programs that are compositional and that grow gracefully. diff --git a/docs/docs/snippets/Divider.mdx b/docs/docs/snippets/Divider.mdx new file mode 100644 index 000000000..f2e6a5a08 --- /dev/null +++ b/docs/docs/snippets/Divider.mdx @@ -0,0 +1,3 @@ +export const Divider = () => ( +
+); diff --git a/docs/docs/snippets/SidebarFix.mdx b/docs/docs/snippets/SidebarFix.mdx new file mode 100644 index 000000000..ddb73710c --- /dev/null +++ b/docs/docs/snippets/SidebarFix.mdx @@ -0,0 +1,89 @@ +export const SidebarFix = () => ( +