Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion client/go/kserve-api/Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@ RUN go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34
RUN go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@v1.4.0

# Compile API
RUN wget https://raw.githubusercontent.com/openvinotoolkit/model_server/main/src/kfserving_api/grpc_predict_v2.proto
RUN wget https://raw.githubusercontent.com/openvinotoolkit/model_server/releases/2026/1/src/kfserving_api/grpc_predict_v2.proto
RUN echo 'option go_package = "./grpc-client";' >> grpc_predict_v2.proto
RUN protoc --go_out="./" --go-grpc_out="./" ./grpc_predict_v2.proto

Expand Down
2 changes: 1 addition & 1 deletion client/java/kserve-api/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -84,7 +84,7 @@
</goals>
<configuration>
<url>
https://raw.githubusercontent.com/openvinotoolkit/model_server/main/src/kfserving_api/grpc_predict_v2.proto</url>
https://raw.githubusercontent.com/openvinotoolkit/model_server/releases/2026/1/src/kfserving_api/grpc_predict_v2.proto</url>
<outputFileName>grpc_predict_v2.proto</outputFileName>
<outputDirectory>src/main/proto</outputDirectory>
</configuration>
Expand Down
14 changes: 7 additions & 7 deletions demos/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -52,7 +52,7 @@ OpenVINO Model Server demos have been created to showcase the usage of the model
|[VLM Text Generation with continuous batching](continuous_batching/vlm/README.md)|Generate text with VLM models and continuous batching pipeline|
|[OpenAI API text embeddings ](embeddings/README.md)|Get text embeddings via endpoint compatible with OpenAI API|
|[Reranking with Cohere API](rerank/README.md)| Rerank documents via endpoint compatible with Cohere|
|[RAG with OpenAI API endpoint and langchain](https://github.com/openvinotoolkit/model_server/blob/main/demos/continuous_batching/rag/rag_demo.ipynb)| Example how to use RAG with model server endpoints|
|[RAG with OpenAI API endpoint and langchain](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/demos/continuous_batching/rag/rag_demo.ipynb)| Example how to use RAG with model server endpoints|
|[LLM on NPU](./llm_npu/README.md)| Generate text with LLM models and NPU acceleration|
|[VLM on NPU](./vlm_npu/README.md)| Generate text with VLM models and NPU acceleration|
|[Long context LLMs](./continuous_batching/long_context/README.md)| Recommendations for handling very long context in LLM models|
Expand All @@ -67,7 +67,7 @@ Check out the list below to see complete step-by-step examples of using OpenVINO
| Demo | Description |
|---|---|
|[Image Classification](image_classification/python/README.md)|Run prediction on a JPEG image using image classification model via gRPC API.|
|[Using ONNX Model](using_onnx_model/python/README.md)|Run prediction on a JPEG image using image classification ONNX model via gRPC API in two preprocessing variants. This demo uses [pipeline](../docs/dag_scheduler.md) with [image_transformation custom node](https://github.com/openvinotoolkit/model_server/tree/main/src/custom_nodes/image_transformation). |
|[Using ONNX Model](using_onnx_model/python/README.md)|Run prediction on a JPEG image using image classification ONNX model via gRPC API in two preprocessing variants. This demo uses [pipeline](../docs/dag_scheduler.md) with [image_transformation custom node](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/src/custom_nodes/image_transformation). |
|[Using TensorFlow Model](image_classification_using_tf_model/python/README.md)|Run image classification using directly imported TensorFlow model. |
|[Age gender recognition](age_gender_recognition/python/README.md) | Run prediction on a JPEG image using age gender recognition model via gRPC API.|
|[Face Detection](face_detection/python/README.md)|Run prediction on a JPEG image using face detection model via gRPC API.|
Expand Down Expand Up @@ -95,13 +95,13 @@ Check out the list below to see complete step-by-step examples of using OpenVINO
## With DAG Pipelines
| Demo | Description |
|---|---|
|[Horizontal Text Detection in Real-Time](horizontal_text_detection/python/README.md) | Run prediction on camera stream using a horizontal text detection model via gRPC API. This demo uses [pipeline](../docs/dag_scheduler.md) with [horizontal_ocr custom node](https://github.com/openvinotoolkit/model_server/tree/main/src/custom_nodes/horizontal_ocr) and [demultiplexer](../docs/demultiplexing.md). |
|[Optical Character Recognition Pipeline](optical_character_recognition/python/README.md) | Run prediction on a JPEG image using a pipeline of text recognition and text detection models with a custom node for intermediate results processing via gRPC API. This demo uses [pipeline](../docs/dag_scheduler.md) with [east_ocr custom node](https://github.com/openvinotoolkit/model_server/tree/main/src/custom_nodes/east_ocr) and [demultiplexer](../docs/demultiplexing.md). |
|[Horizontal Text Detection in Real-Time](horizontal_text_detection/python/README.md) | Run prediction on camera stream using a horizontal text detection model via gRPC API. This demo uses [pipeline](../docs/dag_scheduler.md) with [horizontal_ocr custom node](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/src/custom_nodes/horizontal_ocr) and [demultiplexer](../docs/demultiplexing.md). |
|[Optical Character Recognition Pipeline](optical_character_recognition/python/README.md) | Run prediction on a JPEG image using a pipeline of text recognition and text detection models with a custom node for intermediate results processing via gRPC API. This demo uses [pipeline](../docs/dag_scheduler.md) with [east_ocr custom node](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/src/custom_nodes/east_ocr) and [demultiplexer](../docs/demultiplexing.md). |
|[Single Face Analysis Pipeline](single_face_analysis_pipeline/python/README.md)|Run prediction on a JPEG image using a simple pipeline of age-gender recognition and emotion recognition models via gRPC API to analyze image with a single face. This demo uses [pipeline](../docs/dag_scheduler.md) |
|[Multi Faces Analysis Pipeline](multi_faces_analysis_pipeline/python/README.md)|Run prediction on a JPEG image using a pipeline of age-gender recognition and emotion recognition models via gRPC API to extract multiple faces from the image and analyze all of them. This demo uses [pipeline](../docs/dag_scheduler.md) with [model_zoo_intel_object_detection custom node](https://github.com/openvinotoolkit/model_server/tree/main/src/custom_nodes/model_zoo_intel_object_detection) and [demultiplexer](../docs/demultiplexing.md) |
|[Multi Faces Analysis Pipeline](multi_faces_analysis_pipeline/python/README.md)|Run prediction on a JPEG image using a pipeline of age-gender recognition and emotion recognition models via gRPC API to extract multiple faces from the image and analyze all of them. This demo uses [pipeline](../docs/dag_scheduler.md) with [model_zoo_intel_object_detection custom node](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/src/custom_nodes/model_zoo_intel_object_detection) and [demultiplexer](../docs/demultiplexing.md) |
|[Model Ensemble Pipeline](model_ensemble/python/README.md)|Combine multiple image classification models into one [pipeline](../docs/dag_scheduler.md) and aggregate results to improve classification accuracy. |
|[Face Blur Pipeline](face_blur/python/README.md)|Detect faces and blur image using a pipeline of object detection models with a custom node for intermediate results processing via gRPC API. This demo uses [pipeline](../docs/dag_scheduler.md) with [face_blur custom node](https://github.com/openvinotoolkit/model_server/tree/main/src/custom_nodes/face_blur). |
|[Vehicle Analysis Pipeline](vehicle_analysis_pipeline/python/README.md)|Detect vehicles and recognize their attributes using a pipeline of vehicle detection and vehicle attributes recognition models with a custom node for intermediate results processing via gRPC API. This demo uses [pipeline](../docs/dag_scheduler.md) with [model_zoo_intel_object_detection custom node](https://github.com/openvinotoolkit/model_server/tree/main/src/custom_nodes/model_zoo_intel_object_detection). |
|[Face Blur Pipeline](face_blur/python/README.md)|Detect faces and blur image using a pipeline of object detection models with a custom node for intermediate results processing via gRPC API. This demo uses [pipeline](../docs/dag_scheduler.md) with [face_blur custom node](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/src/custom_nodes/face_blur). |
|[Vehicle Analysis Pipeline](vehicle_analysis_pipeline/python/README.md)|Detect vehicles and recognize their attributes using a pipeline of vehicle detection and vehicle attributes recognition models with a custom node for intermediate results processing via gRPC API. This demo uses [pipeline](../docs/dag_scheduler.md) with [model_zoo_intel_object_detection custom node](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/src/custom_nodes/model_zoo_intel_object_detection). |

## With C++ Client
| Demo | Description |
Expand Down
2 changes: 1 addition & 1 deletion demos/age_gender_recognition/python/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -53,7 +53,7 @@ Install python dependencies:
```console
pip3 install -r requirements.txt
```
Run [age_gender_recognition.py](https://github.com/openvinotoolkit/model_server/blob/main/demos/age_gender_recognition/python/age_gender_recognition.py) script to make an inference:
Run [age_gender_recognition.py](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/demos/age_gender_recognition/python/age_gender_recognition.py) script to make an inference:
```console
python age_gender_recognition.py --image_input_path age-gender-recognition-retail-0001.jpg --rest_port 8000
```
Expand Down
12 changes: 6 additions & 6 deletions demos/audio/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,12 +19,12 @@ Check supported [Speech Recognition Models](https://openvinotoolkit.github.io/op
### Prepare speaker embeddings
When generating speech you can use default speaker voice or you can prepare your own speaker embedding file. Here you can see how to do it with downloaded file from online repository, but you can try with your own speech recording as well:
```bash
pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/audio/requirements.txt
pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/audio/requirements.txt
mkdir -p audio_samples
curl --output audio_samples/audio.wav "https://www.voiptroubleshooter.com/open_speech/american/OSR_us_000_0032_8k.wav"
mkdir -p models
mkdir -p models/speakers
curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/audio/create_speaker_embedding.py -o create_speaker_embedding.py
curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/audio/create_speaker_embedding.py -o create_speaker_embedding.py
python create_speaker_embedding.py audio_samples/audio.wav models/speakers/voice1.bin
```

Expand All @@ -41,8 +41,8 @@ Execution parameters will be defined inside the `graph.pbtxt` file.

Download export script, install it's dependencies and create directory for the models:
```console
curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/export_model.py -o export_model.py
pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/requirements.txt
curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/common/export_models/export_model.py -o export_model.py
pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/common/export_models/requirements.txt
mkdir models
```

Expand Down Expand Up @@ -180,8 +180,8 @@ Execution parameters will be defined inside the `graph.pbtxt` file.

Download export script, install it's dependencies and create directory for the models:
```console
curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/export_model.py -o export_model.py
pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/requirements.txt
curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/common/export_models/export_model.py -o export_model.py
pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/common/export_models/requirements.txt
mkdir models
```

Expand Down
2 changes: 1 addition & 1 deletion demos/benchmark/python/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -379,4 +379,4 @@ docker run -v ${PWD}/workspace:/workspace --network host benchmark_client -a loc
```
Many other client options together with benchmarking examples are presented in
[an additional PDF document](https://github.com/openvinotoolkit/model_server/blob/main/docs/python-benchmarking-client-16feb.pdf).
[an additional PDF document](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/docs/python-benchmarking-client-16feb.pdf).
2 changes: 1 addition & 1 deletion demos/bert_question_answering/python/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@

This document demonstrates how to run inference requests for [BERT model](https://github.com/openvinotoolkit/open_model_zoo/tree/2022.1.0/models/intel/bert-small-uncased-whole-word-masking-squad-int8-0002) with OpenVINO Model Server. It provides questions answering functionality.

In this example docker container with [bert-client image](https://github.com/openvinotoolkit/model_server/blob/main/demos/bert_question_answering/python/Dockerfile) runs the script [bert_question_answering.py](https://github.com/openvinotoolkit/model_server/blob/main/demos/bert_question_answering/python/bert_question_answering.py). It runs inference request for each paragraph on a given page in order to answer the provided question. Since each paragraph can have different size the functionality of dynamic shape is used.
In this example docker container with [bert-client image](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/demos/bert_question_answering/python/Dockerfile) runs the script [bert_question_answering.py](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/demos/bert_question_answering/python/bert_question_answering.py). It runs inference request for each paragraph on a given page in order to answer the provided question. Since each paragraph can have different size the functionality of dynamic shape is used.

NOTE: With `min_request_token_num` parameter you can specify the minimum size of the request. If the paragraph has too short, it is concatenated with the next one until it has required length. When there is no paragraphs left to concatenate request is created with the remaining content.

Expand Down
2 changes: 1 addition & 1 deletion demos/code_local_assistant/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -123,7 +123,7 @@ Models which are not published in OpenVINO format can be exported and quantized
```
mkdir models
python export_model.py text_generation --source_model unsloth/Devstral-Small-2507 --weight-format int4 --config_file_path models/config_all.json --model_repository_path models --tool_parser devstral --target_device GPU
curl -L -o models/unsloth/Devstral-Small-2507/chat_template.jinja https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/extras/chat_template_examples/chat_template_devstral.jinja
curl -L -o models/unsloth/Devstral-Small-2507/chat_template.jinja https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/extras/chat_template_examples/chat_template_devstral.jinja

ovms --model_repository_path models --source_model unsloth/Devstral-Small-2507 --task text_generation --target_device GPU --tool_parser devstral --rest_port 8000 --cache_dir .ovcache
```
Expand Down
4 changes: 2 additions & 2 deletions demos/continuous_batching/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -271,11 +271,11 @@ Check the demo [AI agent with MCP server and OpenVINO acceleration](./agentic_ai

The service deployed above can be used in RAG chain using `langchain` library with OpenAI endpoint as the LLM engine.

Check the example in the [RAG notebook](https://github.com/openvinotoolkit/model_server/blob/main/demos/continuous_batching/rag/rag_demo.ipynb)
Check the example in the [RAG notebook](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/demos/continuous_batching/rag/rag_demo.ipynb)

## Scaling the Model Server

Check this simple [text generation scaling demo](https://github.com/openvinotoolkit/model_server/blob/main/demos/continuous_batching/scaling/README.md).
Check this simple [text generation scaling demo](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/demos/continuous_batching/scaling/README.md).

## Use Speculative Decoding

Expand Down
2 changes: 1 addition & 1 deletion demos/continuous_batching/accuracy/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -115,7 +115,7 @@ Use [Berkeley function call leaderboard ](https://github.com/ShishirPatil/gorill
git clone https://github.com/ShishirPatil/gorilla
cd gorilla/berkeley-function-call-leaderboard
git checkout 9b8a5202544f49a846aced185a340361231ef3e1
curl -s https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/continuous_batching/accuracy/gorilla.patch | git apply -v
curl -s https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/continuous_batching/accuracy/gorilla.patch | git apply -v
pip install -e . --extra-index-url "https://download.pytorch.org/whl/cpu"
```
The commands below assumes the models is deployed with the name `ovms-model`. It must match the name set in the `bfcl_eval/constants/model_config.py`.
Expand Down
Loading