diff --git a/README.md b/README.md
index 651f707..fcbbb37 100644
--- a/README.md
+++ b/README.md
@@ -8,9 +8,9 @@ A web application for the ensemble is available at https://chebifier.hastingslab
 
 Not all models can be installed automatically at the moment:
 - `chebai-graph` and its dependencies. To install them, follow
-the instructions in the [chebai-graph repository](https://github.com/ChEB-AI/python-chebai-graph). 
+the instructions in the [chebai-graph repository](https://github.com/ChEB-AI/python-chebai-graph).
 - `chemlog-extra` can be installed with `pip install git+https://github.com/ChEB-AI/chemlog-extra.git`
-- The automatically installed version of `c3p` may not work under Windows. If you want to run chebifier on Windows, we 
+- The automatically installed version of `c3p` may not work under Windows. If you want to run chebifier on Windows, we
 recommend using this forked version: `pip install git+https://github.com/sfluegel05/c3p.git`
 
@@ -38,11 +38,26 @@ The package provides a command-line interface (CLI) for making predictions using
 
 The ensemble configuration is given by a configuration file (by default, this is `chebifier/ensemble.yml`). If you want
 to change which models are included in the ensemble or how they are weighted, you can create your own configuration file.
-Model weights for deep learning models are downloaded automatically from [Hugging Face](https://huggingface.co/chebai).
+Model weights for deep learning models are automatically downloaded from [Hugging Face](https://huggingface.co/chebai).
+To use specific model weights from Hugging Face, add the `load_model` key to your configuration file. For example:
+
+```yaml
+my_electra:
+  type: electra
+  load_model: "electra_chebi50_v241"
+```
+
+### Available model weights:
+
+* `electra_chebi50_v241`
+* `resgated_chebi50_v241`
+* `c3p_with_weights`
+
+
 However, you can also supply your own model checkpoints (see `configs/example_config.yml` for an example).
 ```bash
-# Make predictions 
+# Make predictions
 python -m chebifier predict --smiles "CC(=O)OC1=CC=CC=C1C(=O)O" --smiles "C1=CC=C(C=C1)C(=O)O"
 
 # Make predictions using SMILES from a file
@@ -96,7 +111,7 @@ Currently, the following models are supported:
 | `c3p` | A collection _Chemical Classifier Programs_, generated by LLMs based on the natural language definitions of ChEBI classes. | 338 | [Mungall, Christopher J., et al., 2025: Chemical classification program synthesis using generative artificial intelligence, arXiv](https://arxiv.org/abs/2505.18470) | [c3p](https://github.com/chemkg/c3p) |
 
 In addition, Chebifier also includes a ChEBI lookup that automatically retrieves the ChEBI superclasses for a class
-matched by a SMILES string. This is not activated by default, but can be included by adding 
+matched by a SMILES string. This is not activated by default, but can be included by adding
 ```yaml
 chebi_lookup:
   type: chebi_lookup
@@ -109,7 +124,7 @@ to your configuration file.
 
 Given a sample (i.e., a SMILES string) and models $m_1, m_2, \ldots, m_n$, the ensemble works as follows:
 1. Get predictions from each model $m_i$ for the sample.
-2. For each class $c$, aggregate predictions $p_c^{m_i}$ from all models that made a prediction for that class. 
+2. For each class $c$, aggregate predictions $p_c^{m_i}$ from all models that made a prediction for that class.
 The aggregation happens separately for all positive predictions (i.e., $p_c^{m_i} \geq 0.5$) and all negative
 predictions ($p_c^{m_i} < 0.5$). If the aggregated value is larger for the positive predictions than for the negative
 predictions, the ensemble makes a positive prediction for class $c$:
@@ -117,7 +132,7 @@ the ensemble makes a positive prediction for class $c$:
 image
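
Reviewer note: the two-step decision rule described in the last hunk can be sketched in Python. This is only an illustration under an assumed aggregation — the exact formula is rendered as an image in the README and may weight models differently; a plain sum of confidences (distance from the 0.5 threshold) is assumed here, and `ensemble_decision` is a hypothetical name, not part of the chebifier API.

```python
def ensemble_decision(predictions):
    """Decide membership in class c from per-model probabilities.

    predictions: probabilities p_c^{m_i} in [0, 1], one from every
    model m_i that made a prediction for class c.

    NOTE: the aggregation below (summed distance from the 0.5
    threshold) is an assumption for illustration only; chebifier's
    actual formula may differ.
    """
    # Split into positive (>= 0.5) and negative (< 0.5) predictions,
    # as described in step 2 of the README.
    positive = [p for p in predictions if p >= 0.5]
    negative = [p for p in predictions if p < 0.5]
    # Aggregate each side separately.
    pos_score = sum(p - 0.5 for p in positive)
    neg_score = sum(0.5 - p for p in negative)
    # Positive prediction iff the positive side aggregates higher.
    return pos_score > neg_score

# Two confident positives outweigh one mild negative:
print(ensemble_decision([0.9, 0.8, 0.4]))  # True under this assumed aggregation
```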