diff --git a/client/go/kserve-api/Dockerfile b/client/go/kserve-api/Dockerfile
index 6679cbda29..085668c6c2 100644
--- a/client/go/kserve-api/Dockerfile
+++ b/client/go/kserve-api/Dockerfile
@@ -26,7 +26,7 @@ RUN go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34
RUN go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@v1.4.0
# Compile API
-RUN wget https://raw.githubusercontent.com/openvinotoolkit/model_server/main/src/kfserving_api/grpc_predict_v2.proto
+RUN wget https://raw.githubusercontent.com/openvinotoolkit/model_server/releases/2026/1/src/kfserving_api/grpc_predict_v2.proto
RUN echo 'option go_package = "./grpc-client";' >> grpc_predict_v2.proto
RUN protoc --go_out="./" --go-grpc_out="./" ./grpc_predict_v2.proto
diff --git a/client/java/kserve-api/pom.xml b/client/java/kserve-api/pom.xml
index 189dbb8da5..1a3bcfcbb5 100644
--- a/client/java/kserve-api/pom.xml
+++ b/client/java/kserve-api/pom.xml
@@ -84,7 +84,7 @@
- https://raw.githubusercontent.com/openvinotoolkit/model_server/main/src/kfserving_api/grpc_predict_v2.proto
+ https://raw.githubusercontent.com/openvinotoolkit/model_server/releases/2026/1/src/kfserving_api/grpc_predict_v2.proto
grpc_predict_v2.proto
src/main/proto
diff --git a/demos/README.md b/demos/README.md
index c8a808fa1c..9dda623a8e 100644
--- a/demos/README.md
+++ b/demos/README.md
@@ -52,7 +52,7 @@ OpenVINO Model Server demos have been created to showcase the usage of the model
|[VLM Text Generation with continuous batching](continuous_batching/vlm/README.md)|Generate text with VLM models and continuous batching pipeline|
|[OpenAI API text embeddings ](embeddings/README.md)|Get text embeddings via endpoint compatible with OpenAI API|
|[Reranking with Cohere API](rerank/README.md)| Rerank documents via endpoint compatible with Cohere|
-|[RAG with OpenAI API endpoint and langchain](https://github.com/openvinotoolkit/model_server/blob/main/demos/continuous_batching/rag/rag_demo.ipynb)| Example how to use RAG with model server endpoints|
+|[RAG with OpenAI API endpoint and langchain](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/demos/continuous_batching/rag/rag_demo.ipynb)| Example how to use RAG with model server endpoints|
|[LLM on NPU](./llm_npu/README.md)| Generate text with LLM models and NPU acceleration|
|[VLM on NPU](./vlm_npu/README.md)| Generate text with VLM models and NPU acceleration|
|[Long context LLMs](./continuous_batching/long_context/README.md)| Recommendations for handling very long context in LLM models|
@@ -67,7 +67,7 @@ Check out the list below to see complete step-by-step examples of using OpenVINO
| Demo | Description |
|---|---|
|[Image Classification](image_classification/python/README.md)|Run prediction on a JPEG image using image classification model via gRPC API.|
-|[Using ONNX Model](using_onnx_model/python/README.md)|Run prediction on a JPEG image using image classification ONNX model via gRPC API in two preprocessing variants. This demo uses [pipeline](../docs/dag_scheduler.md) with [image_transformation custom node](https://github.com/openvinotoolkit/model_server/tree/main/src/custom_nodes/image_transformation). |
+|[Using ONNX Model](using_onnx_model/python/README.md)|Run prediction on a JPEG image using image classification ONNX model via gRPC API in two preprocessing variants. This demo uses [pipeline](../docs/dag_scheduler.md) with [image_transformation custom node](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/src/custom_nodes/image_transformation). |
|[Using TensorFlow Model](image_classification_using_tf_model/python/README.md)|Run image classification using directly imported TensorFlow model. |
|[Age gender recognition](age_gender_recognition/python/README.md) | Run prediction on a JPEG image using age gender recognition model via gRPC API.|
|[Face Detection](face_detection/python/README.md)|Run prediction on a JPEG image using face detection model via gRPC API.|
@@ -95,13 +95,13 @@ Check out the list below to see complete step-by-step examples of using OpenVINO
## With DAG Pipelines
| Demo | Description |
|---|---|
-|[Horizontal Text Detection in Real-Time](horizontal_text_detection/python/README.md) | Run prediction on camera stream using a horizontal text detection model via gRPC API. This demo uses [pipeline](../docs/dag_scheduler.md) with [horizontal_ocr custom node](https://github.com/openvinotoolkit/model_server/tree/main/src/custom_nodes/horizontal_ocr) and [demultiplexer](../docs/demultiplexing.md). |
-|[Optical Character Recognition Pipeline](optical_character_recognition/python/README.md) | Run prediction on a JPEG image using a pipeline of text recognition and text detection models with a custom node for intermediate results processing via gRPC API. This demo uses [pipeline](../docs/dag_scheduler.md) with [east_ocr custom node](https://github.com/openvinotoolkit/model_server/tree/main/src/custom_nodes/east_ocr) and [demultiplexer](../docs/demultiplexing.md). |
+|[Horizontal Text Detection in Real-Time](horizontal_text_detection/python/README.md) | Run prediction on camera stream using a horizontal text detection model via gRPC API. This demo uses [pipeline](../docs/dag_scheduler.md) with [horizontal_ocr custom node](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/src/custom_nodes/horizontal_ocr) and [demultiplexer](../docs/demultiplexing.md). |
+|[Optical Character Recognition Pipeline](optical_character_recognition/python/README.md) | Run prediction on a JPEG image using a pipeline of text recognition and text detection models with a custom node for intermediate results processing via gRPC API. This demo uses [pipeline](../docs/dag_scheduler.md) with [east_ocr custom node](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/src/custom_nodes/east_ocr) and [demultiplexer](../docs/demultiplexing.md). |
|[Single Face Analysis Pipeline](single_face_analysis_pipeline/python/README.md)|Run prediction on a JPEG image using a simple pipeline of age-gender recognition and emotion recognition models via gRPC API to analyze image with a single face. This demo uses [pipeline](../docs/dag_scheduler.md) |
-|[Multi Faces Analysis Pipeline](multi_faces_analysis_pipeline/python/README.md)|Run prediction on a JPEG image using a pipeline of age-gender recognition and emotion recognition models via gRPC API to extract multiple faces from the image and analyze all of them. This demo uses [pipeline](../docs/dag_scheduler.md) with [model_zoo_intel_object_detection custom node](https://github.com/openvinotoolkit/model_server/tree/main/src/custom_nodes/model_zoo_intel_object_detection) and [demultiplexer](../docs/demultiplexing.md) |
+|[Multi Faces Analysis Pipeline](multi_faces_analysis_pipeline/python/README.md)|Run prediction on a JPEG image using a pipeline of age-gender recognition and emotion recognition models via gRPC API to extract multiple faces from the image and analyze all of them. This demo uses [pipeline](../docs/dag_scheduler.md) with [model_zoo_intel_object_detection custom node](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/src/custom_nodes/model_zoo_intel_object_detection) and [demultiplexer](../docs/demultiplexing.md) |
|[Model Ensemble Pipeline](model_ensemble/python/README.md)|Combine multiple image classification models into one [pipeline](../docs/dag_scheduler.md) and aggregate results to improve classification accuracy. |
-|[Face Blur Pipeline](face_blur/python/README.md)|Detect faces and blur image using a pipeline of object detection models with a custom node for intermediate results processing via gRPC API. This demo uses [pipeline](../docs/dag_scheduler.md) with [face_blur custom node](https://github.com/openvinotoolkit/model_server/tree/main/src/custom_nodes/face_blur). |
-|[Vehicle Analysis Pipeline](vehicle_analysis_pipeline/python/README.md)|Detect vehicles and recognize their attributes using a pipeline of vehicle detection and vehicle attributes recognition models with a custom node for intermediate results processing via gRPC API. This demo uses [pipeline](../docs/dag_scheduler.md) with [model_zoo_intel_object_detection custom node](https://github.com/openvinotoolkit/model_server/tree/main/src/custom_nodes/model_zoo_intel_object_detection). |
+|[Face Blur Pipeline](face_blur/python/README.md)|Detect faces and blur image using a pipeline of object detection models with a custom node for intermediate results processing via gRPC API. This demo uses [pipeline](../docs/dag_scheduler.md) with [face_blur custom node](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/src/custom_nodes/face_blur). |
+|[Vehicle Analysis Pipeline](vehicle_analysis_pipeline/python/README.md)|Detect vehicles and recognize their attributes using a pipeline of vehicle detection and vehicle attributes recognition models with a custom node for intermediate results processing via gRPC API. This demo uses [pipeline](../docs/dag_scheduler.md) with [model_zoo_intel_object_detection custom node](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/src/custom_nodes/model_zoo_intel_object_detection). |
## With C++ Client
| Demo | Description |
diff --git a/demos/age_gender_recognition/python/README.md b/demos/age_gender_recognition/python/README.md
index e3d0929e00..594019de27 100644
--- a/demos/age_gender_recognition/python/README.md
+++ b/demos/age_gender_recognition/python/README.md
@@ -53,7 +53,7 @@ Install python dependencies:
```console
pip3 install -r requirements.txt
```
-Run [age_gender_recognition.py](https://github.com/openvinotoolkit/model_server/blob/main/demos/age_gender_recognition/python/age_gender_recognition.py) script to make an inference:
+Run [age_gender_recognition.py](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/demos/age_gender_recognition/python/age_gender_recognition.py) script to make an inference:
```console
python age_gender_recognition.py --image_input_path age-gender-recognition-retail-0001.jpg --rest_port 8000
```
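
The script above posts the preprocessed image to the model server's REST endpoint on port 8000. As a rough, hypothetical sketch of the request body shape (the exact input name and tensor layout depend on the model and on the script's preprocessing, which are not shown here):

```python
import json

def build_predict_payload(image_array):
    # Wrap image data in a TensorFlow-Serving-style REST predict request body,
    # which the model server's REST API accepts.
    return {"instances": [{"data": image_array}]}

# A 1x1 "pixel" stands in for the real preprocessed input tensor.
payload = build_predict_payload([[[0.0, 0.0, 0.0]]])
body = json.dumps(payload)
```

The real demo handles preprocessing (resize, layout change) before building the request; this sketch only shows the envelope.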
diff --git a/demos/audio/README.md b/demos/audio/README.md
index 9e49fc0ddc..ffb2672a6a 100644
--- a/demos/audio/README.md
+++ b/demos/audio/README.md
@@ -19,12 +19,12 @@ Check supported [Speech Recognition Models](https://openvinotoolkit.github.io/op
### Prepare speaker embeddings
When generating speech you can use the default speaker voice, or you can prepare your own speaker embedding file. The steps below use a file downloaded from an online repository, but you can try your own speech recording as well:
```bash
-pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/audio/requirements.txt
+pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/audio/requirements.txt
mkdir -p audio_samples
curl --output audio_samples/audio.wav "https://www.voiptroubleshooter.com/open_speech/american/OSR_us_000_0032_8k.wav"
mkdir -p models
mkdir -p models/speakers
-curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/audio/create_speaker_embedding.py -o create_speaker_embedding.py
+curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/audio/create_speaker_embedding.py -o create_speaker_embedding.py
python create_speaker_embedding.py audio_samples/audio.wav models/speakers/voice1.bin
```
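
The `create_speaker_embedding.py` script writes the embedding to a `.bin` file. As a minimal sketch of saving and reloading such a vector (an assumption for illustration: the actual file layout produced by the script may differ, e.g. in dtype or endianness):

```python
import struct

def save_embedding(path, vector):
    # Serialize a speaker embedding as raw little-endian float32 values.
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(vector)}f", *vector))

def load_embedding(path, dim):
    # Read back exactly `dim` float32 values.
    with open(path, "rb") as f:
        return list(struct.unpack(f"<{dim}f", f.read(4 * dim)))

emb = [0.1, -0.2, 0.3]
save_embedding("voice_sketch.bin", emb)
restored = load_embedding("voice_sketch.bin", len(emb))
```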
@@ -41,8 +41,8 @@ Execution parameters will be defined inside the `graph.pbtxt` file.
Download the export script, install its dependencies and create a directory for the models:
```console
-curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/export_model.py -o export_model.py
-pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/requirements.txt
+curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/common/export_models/export_model.py -o export_model.py
+pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/common/export_models/requirements.txt
mkdir models
```
@@ -180,8 +180,8 @@ Execution parameters will be defined inside the `graph.pbtxt` file.
Download the export script, install its dependencies and create a directory for the models:
```console
-curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/export_model.py -o export_model.py
-pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/requirements.txt
+curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/common/export_models/export_model.py -o export_model.py
+pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/common/export_models/requirements.txt
mkdir models
```
diff --git a/demos/benchmark/python/README.md b/demos/benchmark/python/README.md
index 09e6ffddda..d60efd7340 100644
--- a/demos/benchmark/python/README.md
+++ b/demos/benchmark/python/README.md
@@ -379,4 +379,4 @@ docker run -v ${PWD}/workspace:/workspace --network host benchmark_client -a loc
```
Many other client options together with benchmarking examples are presented in
-[an additional PDF document](https://github.com/openvinotoolkit/model_server/blob/main/docs/python-benchmarking-client-16feb.pdf).
+[an additional PDF document](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/docs/python-benchmarking-client-16feb.pdf).
diff --git a/demos/bert_question_answering/python/README.md b/demos/bert_question_answering/python/README.md
index 8f51fe7136..ab5cba8046 100644
--- a/demos/bert_question_answering/python/README.md
+++ b/demos/bert_question_answering/python/README.md
@@ -4,7 +4,7 @@
This document demonstrates how to run inference requests for [BERT model](https://github.com/openvinotoolkit/open_model_zoo/tree/2022.1.0/models/intel/bert-small-uncased-whole-word-masking-squad-int8-0002) with OpenVINO Model Server. It provides question answering functionality.
-In this example docker container with [bert-client image](https://github.com/openvinotoolkit/model_server/blob/main/demos/bert_question_answering/python/Dockerfile) runs the script [bert_question_answering.py](https://github.com/openvinotoolkit/model_server/blob/main/demos/bert_question_answering/python/bert_question_answering.py). It runs inference request for each paragraph on a given page in order to answer the provided question. Since each paragraph can have different size the functionality of dynamic shape is used.
+In this example, a docker container built from the [bert-client image](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/demos/bert_question_answering/python/Dockerfile) runs the [bert_question_answering.py](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/demos/bert_question_answering/python/bert_question_answering.py) script. It sends an inference request for each paragraph on a given page in order to answer the provided question. Since each paragraph can have a different size, the dynamic shape functionality is used.
NOTE: With the `min_request_token_num` parameter you can specify the minimum size of the request. If a paragraph is too short, it is concatenated with the next one until it reaches the required length. When there are no paragraphs left to concatenate, a request is created with the remaining content.
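
The concatenation behavior described in the note can be sketched as follows. This is a simplification that counts tokens by whitespace splitting; the demo script counts tokens with the BERT tokenizer, so the exact boundaries will differ:

```python
def merge_paragraphs(paragraphs, min_request_token_num):
    """Concatenate short paragraphs until each request reaches the minimum token count."""
    requests, current = [], ""
    for p in paragraphs:
        current = (current + " " + p).strip()
        # Rough whitespace token count; the real script uses the model tokenizer.
        if len(current.split()) >= min_request_token_num:
            requests.append(current)
            current = ""
    if current:  # remaining content forms the last request
        requests.append(current)
    return requests

reqs = merge_paragraphs(["one two", "three four five", "six"], 4)
# → ["one two three four five", "six"]
```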
diff --git a/demos/code_local_assistant/README.md b/demos/code_local_assistant/README.md
index 9479818eaa..5a837f32e6 100644
--- a/demos/code_local_assistant/README.md
+++ b/demos/code_local_assistant/README.md
@@ -123,7 +123,7 @@ Models which are not published in OpenVINO format can be exported and quantized
```
mkdir models
python export_model.py text_generation --source_model unsloth/Devstral-Small-2507 --weight-format int4 --config_file_path models/config_all.json --model_repository_path models --tool_parser devstral --target_device GPU
-curl -L -o models/unsloth/Devstral-Small-2507/chat_template.jinja https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/extras/chat_template_examples/chat_template_devstral.jinja
+curl -L -o models/unsloth/Devstral-Small-2507/chat_template.jinja https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/extras/chat_template_examples/chat_template_devstral.jinja
ovms --model_repository_path models --source_model unsloth/Devstral-Small-2507 --task text_generation --target_device GPU --tool_parser devstral --rest_port 8000 --cache_dir .ovcache
```
diff --git a/demos/continuous_batching/README.md b/demos/continuous_batching/README.md
index ab8c7be951..eeb8dfcae6 100644
--- a/demos/continuous_batching/README.md
+++ b/demos/continuous_batching/README.md
@@ -271,11 +271,11 @@ Check the demo [AI agent with MCP server and OpenVINO acceleration](./agentic_ai
The service deployed above can be used in RAG chain using `langchain` library with OpenAI endpoint as the LLM engine.
-Check the example in the [RAG notebook](https://github.com/openvinotoolkit/model_server/blob/main/demos/continuous_batching/rag/rag_demo.ipynb)
+Check the example in the [RAG notebook](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/demos/continuous_batching/rag/rag_demo.ipynb)
## Scaling the Model Server
-Check this simple [text generation scaling demo](https://github.com/openvinotoolkit/model_server/blob/main/demos/continuous_batching/scaling/README.md).
+Check this simple [text generation scaling demo](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/demos/continuous_batching/scaling/README.md).
## Use Speculative Decoding
diff --git a/demos/continuous_batching/accuracy/README.md b/demos/continuous_batching/accuracy/README.md
index 7625b7d874..0c465a649f 100644
--- a/demos/continuous_batching/accuracy/README.md
+++ b/demos/continuous_batching/accuracy/README.md
@@ -115,7 +115,7 @@ Use [Berkeley function call leaderboard ](https://github.com/ShishirPatil/gorill
git clone https://github.com/ShishirPatil/gorilla
cd gorilla/berkeley-function-call-leaderboard
git checkout 9b8a5202544f49a846aced185a340361231ef3e1
-curl -s https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/continuous_batching/accuracy/gorilla.patch | git apply -v
+curl -s https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/continuous_batching/accuracy/gorilla.patch | git apply -v
pip install -e . --extra-index-url "https://download.pytorch.org/whl/cpu"
```
The commands below assume the model is deployed with the name `ovms-model`. It must match the name set in `bfcl_eval/constants/model_config.py`.
diff --git a/demos/continuous_batching/agentic_ai/README.md b/demos/continuous_batching/agentic_ai/README.md
index 50d54c7698..ed53874a9d 100644
--- a/demos/continuous_batching/agentic_ai/README.md
+++ b/demos/continuous_batching/agentic_ai/README.md
@@ -33,7 +33,7 @@ pip install python-dateutil mcp_weather_server
Install the application requirements
```console
-curl https://raw.githubusercontent.com/openvinotoolkit/model_server/main/demos/continuous_batching/agentic_ai/openai_agent.py -O -L
+curl https://raw.githubusercontent.com/openvinotoolkit/model_server/releases/2026/1/demos/continuous_batching/agentic_ai/openai_agent.py -O -L
pip install openai-agents openai
```
@@ -421,7 +421,7 @@ Let me know if you'd like forecast details or anything else!
### Deploying in a docker container on NPU
The case of NPU is similar to GPU, but `--device` should be set to `/dev/accel`, `--group-add` parameter should be the same.
-Running `docker run` command, use the image with GPU support. Export the models with precision matching the [NPU capacity](https://docs.openvino.ai/nightly/openvino-workflow-generative/inference-with-genai/inference-with-genai-on-npu.html) and adjust pipeline configuration.
+Running `docker run` command, use the image with GPU support. Export the models with precision matching the [NPU capacity](https://docs.openvino.ai/2026/openvino-workflow-generative/inference-with-genai/inference-with-genai-on-npu.html) and adjust pipeline configuration.
It can be applied using the commands below:
::::{tab-set}
@@ -481,7 +481,7 @@ docker run -d --user $(id -u):$(id -g) --rm -p 8000:8000 -v ${HOME}/models:/mode
You can also try a similar implementation based on the llama_index library, working the same way as openai-agent:
```bash
pip install llama-index-llms-openai-like==0.5.3 llama-index-core==0.14.5 llama-index-tools-mcp==0.4.2
-curl https://raw.githubusercontent.com/openvinotoolkit/model_server/main/demos/continuous_batching/agentic_ai/llama_index_agent.py -o llama_index_agent.py
+curl https://raw.githubusercontent.com/openvinotoolkit/model_server/releases/2026/1/demos/continuous_batching/agentic_ai/llama_index_agent.py -o llama_index_agent.py
python llama_index_agent.py --query "What is the current weather in Tokyo?" --model OpenVINO/Qwen3-8B-int4-ov --base-url http://localhost:8000/v3 --mcp-server-url http://localhost:8080/sse --mcp-server weather --stream --enable-thinking
```
@@ -512,8 +512,8 @@ Use those steps to convert the model from HuggingFace Hub to OpenVINO format and
```text
# Download export script, install its dependencies and create directory for the models
-curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/export_model.py -o export_model.py
-pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/requirements.txt
+curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/common/export_models/export_model.py -o export_model.py
+pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/common/export_models/requirements.txt
mkdir models
```
Run `export_model.py` script to download and quantize the model:
@@ -529,4 +529,4 @@ curl -L -o models/meta-llama/Llama-3.2-3B-Instruct/chat_template.jinja https://r
```text
python export_model.py text_generation --source_model meta-llama/Llama-3.2-3B-Instruct --weight-format nf4 --config_file_path models/config.json --model_repository_path models --tool_parser llama3 --extra_quantization_params "--library transformers --sym group_size -1"
```
-For more details, see [OpenVINO GenAI on NPU](https://docs.openvino.ai/nightly/openvino-workflow-generative/inference-with-genai/inference-with-genai-on-npu.html).
\ No newline at end of file
+For more details, see [OpenVINO GenAI on NPU](https://docs.openvino.ai/2026/openvino-workflow-generative/inference-with-genai/inference-with-genai-on-npu.html).
\ No newline at end of file
diff --git a/demos/continuous_batching/long_context/README.md b/demos/continuous_batching/long_context/README.md
index 2ab0d64181..49dc971394 100644
--- a/demos/continuous_batching/long_context/README.md
+++ b/demos/continuous_batching/long_context/README.md
@@ -25,8 +25,8 @@ Let's demonstrate all the optimizations combined and test it with the real life
Export the model Qwen/Qwen2.5-7B-Instruct-1M which has the max context length of 1 million tokens!
```bash
-curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/export_model.py -o export_model.py
-pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/requirements.txt
+curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/common/export_models/export_model.py -o export_model.py
+pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/common/export_models/requirements.txt
mkdir models
python export_model.py text_generation --source_model Qwen/Qwen2.5-7B-Instruct-1M --weight-format int4 --config_file_path models/config.json --model_repository_path models
```
@@ -41,7 +41,7 @@ docker run -it --rm -u $(id -u) -p 8000:8000 -v $(pwd)/models/:/models:rw openvi
To test the performance using the vllm benchmarking script, let's create a custom dataset with a long shared context and a set of questions in each request. That way, each request carries an identical very long context with a different query related to it. This is a common scenario for RAG applications, which generate responses based on a complete knowledge base. To make this experiment similar to real life, the context is not synthetic but built from the content of the Don Quixote story, with 10 different questions related to the story. Because the context is reused, it is a perfect case for benefiting from prefix caching.
```bash
-curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/continuous_batching/long_context/custom_dataset.py -o custom_dataset.py
+curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/continuous_batching/long_context/custom_dataset.py -o custom_dataset.py
pip install requests transformers
python custom_dataset.py --limit_context_tokens 50000
```
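
The shared-context dataset idea can be sketched as below. The function name and the exact output schema are assumptions for illustration (`custom_dataset.py`'s real output format is not shown here); the sketch assumes a ShareGPT-style JSON layout, since that is what the vllm benchmark script consumes:

```python
import json

def build_shared_context_dataset(context, questions):
    # Each entry repeats the same long context with a different question,
    # so a server with prefix caching can reuse the common prefix.
    return [
        {"conversations": [{"from": "human", "value": f"{context}\n\nQuestion: {q}"}]}
        for q in questions
    ]

dataset = build_shared_context_dataset(
    "<long Don Quixote text>",
    ["Who is the squire?", "Where does the story begin?"],
)
with open("dataset_sketch.json", "w") as f:
    json.dump(dataset, f)
```

Because every prompt starts with the same prefix, the server only computes the KV cache for the shared context once.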
diff --git a/demos/continuous_batching/rag/README.md b/demos/continuous_batching/rag/README.md
index a7951647de..d1cc7929da 100644
--- a/demos/continuous_batching/rag/README.md
+++ b/demos/continuous_batching/rag/README.md
@@ -75,8 +75,8 @@ docker run --user $(id -u):$(id -g) --rm -v $(pwd)/models:/models:rw openvino/mo
**Required:** OpenVINO Model Server package - see [deployment instructions](../../../docs/deploying_server_baremetal.md) for details.
```bat
-pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/requirements.txt
-pip3 install -q -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/continuous_batching/rag/requirements.txt
+pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/common/export_models/requirements.txt
+pip3 install -q -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/continuous_batching/rag/requirements.txt
mkdir models
set HF_HOME=C:\hf_home\cache # export HF_HOME=/hf_home/cache if using linux
ovms --pull --model_repository_path models --source_model meta-llama/Meta-Llama-3-8B-Instruct --task text_generation --weight-format int8
@@ -96,8 +96,8 @@ ovms --add_to_config --config_path /models/config.json --model_name BAAI/bge-rer
Use this procedure for all the models outside of OpenVINO organization in HuggingFace Hub.
```console
-curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/export_model.py -o export_model.py
-pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/requirements.txt
+curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/common/export_models/export_model.py -o export_model.py
+pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/common/export_models/requirements.txt
mkdir models
python export_model.py text_generation --source_model meta-llama/Meta-Llama-3-8B-Instruct --weight-format int8 --kv_cache_precision u8 --config_file_path models/config.json --model_repository_path models
@@ -126,4 +126,4 @@ sc start ovms
```
## Using RAG
-When the model server is deployed and serving all 3 endpoints, run the [jupyter notebook](https://github.com/openvinotoolkit/model_server/blob/main/demos/continuous_batching/rag/rag_demo.ipynb) to use RAG chain with a fully remote execution.
+When the model server is deployed and serving all 3 endpoints, run the [jupyter notebook](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/demos/continuous_batching/rag/rag_demo.ipynb) to use RAG chain with a fully remote execution.
diff --git a/demos/continuous_batching/scaling/README.md b/demos/continuous_batching/scaling/README.md
index 47ee62b879..41b65232fe 100644
--- a/demos/continuous_batching/scaling/README.md
+++ b/demos/continuous_batching/scaling/README.md
@@ -28,8 +28,8 @@ NUMA node5 CPU(s): 160-191,352-383
Download the export_model.py script and install python dependencies:
```bash
-curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/export_model.py -o export_model.py
-pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/requirements.txt
+curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/common/export_models/export_model.py -o export_model.py
+pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/common/export_models/requirements.txt
mkdir models
```
Use the export_model.py script:
@@ -86,7 +86,7 @@ git clone --branch v0.7.3 --depth 1 https://github.com/vllm-project/vllm
cd vllm
pip3 install -r requirements-cpu.txt --extra-index-url https://download.pytorch.org/whl/cpu
cd benchmarks
-curl -L https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json -o ShareGPT_V3_unfiltered_cleaned_split.json
+curl -L https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json -o ShareGPT_V3_unfiltered_cleaned_split.json
```
Then, we can run the benchmark script:
```bash
diff --git a/demos/continuous_batching/speculative_decoding/README.md b/demos/continuous_batching/speculative_decoding/README.md
index 9b3398677e..b1620fc251 100644
--- a/demos/continuous_batching/speculative_decoding/README.md
+++ b/demos/continuous_batching/speculative_decoding/README.md
@@ -31,8 +31,8 @@ both in INT4 precision.
Python environment setup:
```console
# Install regular requirements for OVMS export script
-curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/export_model.py -o export_model.py
-pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/requirements.txt
+curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/common/export_models/export_model.py -o export_model.py
+pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/common/export_models/requirements.txt
# Override optimum-intel with version supporting eagle3
python -m pip install git+https://github.com/xufang-lisa/optimum-intel.git@xufang/add_eagle3_draft_model_conversion
@@ -185,8 +185,8 @@ LLM engine parameters will be defined inside the `graph.pbtxt` file.
Download the export script, install its dependencies and create a directory for the models:
```console
-curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/export_model.py -o export_model.py
-pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/requirements.txt
+curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/common/export_models/export_model.py -o export_model.py
+pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/common/export_models/requirements.txt
mkdir models
```
diff --git a/demos/continuous_batching/vlm/README.md b/demos/continuous_batching/vlm/README.md
index 0de93e27b8..f26ffa4b68 100644
--- a/demos/continuous_batching/vlm/README.md
+++ b/demos/continuous_batching/vlm/README.md
@@ -123,7 +123,7 @@ curl http://localhost:8000/v3/chat/completions -H "Content-Type: application/js
```console
pip3 install requests
-curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/static/images/zebra.jpeg -o zebra.jpeg
+curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/common/static/images/zebra.jpeg -o zebra.jpeg
```
```python
import requests
@@ -234,7 +234,7 @@ Check [VLM usage with NPU acceleration](../../vlm_npu/README.md)
## References
-- [Export models to OpenVINO format](../common/export_models/README.md)
+- [Export models to OpenVINO format](../../../demos/common/export_models/README.md)
- [Supported VLM models](https://openvinotoolkit.github.io/openvino.genai/docs/supported-models/#visual-language-models-vlms)
- [Chat Completions API](../../../docs/model_server_rest_api_chat.md)
- [Writing client code](../../../docs/clients_genai.md)
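The curl-based flow above can also be driven from Python; below is a minimal sketch of assembling the OpenAI-style chat request with a base64-encoded image (the model name and image bytes are placeholders, the payload shape follows the OpenAI chat API conventions):

```python
import base64


def build_image_chat_request(model, prompt, image_bytes, max_tokens=128):
    """Assemble an OpenAI-style chat/completions payload with an inline image."""
    # Images can be passed as base64 data URIs in an "image_url" content part.
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{encoded}"},
                    },
                ],
            }
        ],
    }
```

The resulting dictionary can be POSTed with `requests` to the `/v3/chat/completions` endpoint, mirroring the curl example.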
diff --git a/demos/embeddings/README.md b/demos/embeddings/README.md
index 814e505515..a5aea8f115 100644
--- a/demos/embeddings/README.md
+++ b/demos/embeddings/README.md
@@ -95,8 +95,8 @@ That ensures faster initialization time, better performance and lower memory con
-Download export script, install it's dependencies and create directory for the models:
+Download the export script, install its dependencies and create a directory for the models:
```console
-curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/export_model.py -o export_model.py
-pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/requirements.txt
+curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/common/export_models/export_model.py -o export_model.py
+pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/common/export_models/requirements.txt
mkdir models
```
@@ -475,14 +475,14 @@ Average document length: 66.568 tokens
## RAG with Model Server
-Embeddings endpoint can be applied in RAG chains to delegated text feature extraction both for documented vectorization and in context retrieval.
+The embeddings endpoint can be applied in RAG chains to delegate text feature extraction, both for document vectorization and for context retrieval.
-Check this demo to see the langchain code example which is using OpenVINO Model Server both for text generation and embedding endpoint in [RAG application demo](https://github.com/openvinotoolkit/model_server/tree/main/demos/continuous_batching/rag)
+See the langchain code example that uses OpenVINO Model Server for both the text generation and embeddings endpoints in the [RAG application demo](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/demos/continuous_batching/rag)
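The retrieval side of such a RAG chain reduces to ranking document embeddings by similarity to the query embedding; here is a minimal pure-Python sketch, with the vectors assumed to come from the embeddings endpoint:

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def retrieve(query_vec, doc_vecs, top_k=2):
    """Return indices of the top_k documents most similar to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:top_k]
```

In a real chain, a vector store performs this ranking at scale; the sketch only illustrates the scoring that the embeddings endpoint enables.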
## Testing the model accuracy over serving API
-A simple method of testing the response accuracy is via comparing the response for a sample prompt from the model server and with local python execution based on HuggingFace python code.
+A simple method of testing the response accuracy is to compare the model server response for a sample prompt with local Python execution based on the HuggingFace code.
-The script [compare_results.py](https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/embeddings/compare_results.py) can assist with such experiment.
+The script [compare_results.py](https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/embeddings/compare_results.py) can assist with such an experiment.
```bash
popd
cd model_server/demos/embeddings
@@ -517,7 +517,7 @@ Difference score with HF AutoModel: 0.020293646680283224
-It is easy also to run model evaluation using [MTEB](https://github.com/embeddings-benchmark/mteb) framework using a custom class based on openai model:
+It is also easy to run model evaluation using the [MTEB](https://github.com/embeddings-benchmark/mteb) framework with a custom class based on the openai model:
```console
pip install "mteb<2" einops openai --extra-index-url "https://download.pytorch.org/whl/cpu"
-curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/embeddings/ovms_mteb.py -o ovms_mteb.py
+curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/embeddings/ovms_mteb.py -o ovms_mteb.py
python ovms_mteb.py --model BAAI/bge-large-en-v1.5 --service_url http://localhost:8000/v3/embeddings
```
Results will be stored in `results` folder:
diff --git a/demos/face_blur/python/README.md b/demos/face_blur/python/README.md
index 10d73bae2e..fe25be8f0d 100644
--- a/demos/face_blur/python/README.md
+++ b/demos/face_blur/python/README.md
@@ -1,6 +1,6 @@
# Face Blur Pipeline Demo with OVMS {#ovms_demo_face_blur_pipeline}
-This document demonstrates how to create pipelines using object detection models from OpenVINO Model Zoo in order to blur the image. As an example, we will use [face-detection-retail-0004](https://github.com/openvinotoolkit/open_model_zoo/blob/2022.1.0/models/intel/face-detection-retail-0004/README.md) to detect multiple faces on the image. Then, for each detected face we will blur it using [face_blur](https://github.com/openvinotoolkit/model_server/blob/main/src/custom_nodes/face_blur) example custom node.
+This document demonstrates how to create pipelines using object detection models from OpenVINO Model Zoo in order to blur the image. As an example, we will use [face-detection-retail-0004](https://github.com/openvinotoolkit/open_model_zoo/blob/2022.1.0/models/intel/face-detection-retail-0004/README.md) to detect multiple faces on the image. Then we will blur each detected face using the [face_blur](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/src/custom_nodes/face_blur) example custom node.
## Pipeline Configuration Graph
@@ -10,7 +10,7 @@ Below is depicted graph implementing face blur pipeline execution.
-It include the following Nodes:
+It includes the following Nodes:
-- Model `face-detection-retail-0004` - deep learning model which takes user image as input. Its output contain information about faces coordinates and confidence levels.
+- Model `face-detection-retail-0004` - deep learning model which takes the user image as input. Its output contains information about face coordinates and confidence levels.
-- Custom node `face_blur` - it includes C++ implementation of image blurring. By analysing the output it produces image blurred in spots detected by object detection model based on the configurable score level threshold. Custom node also resizes it to the target resolution. All operations on the images employ OpenCV libraries which are preinstalled in the OVMS. Learn more about the [face_blur custom node](https://github.com/openvinotoolkit/model_server/blob/main/src/custom_nodes/face_blur).
+- Custom node `face_blur` - it includes a C++ implementation of image blurring. By analysing the model output it produces an image blurred in the regions detected by the object detection model, based on the configurable score threshold. The custom node also resizes the image to the target resolution. All operations on the images employ the OpenCV libraries which are preinstalled in OVMS. Learn more about the [face_blur custom node](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/src/custom_nodes/face_blur).
- Response - image blurred in spots detected by object detection model.
## Prepare workspace to run the demo
diff --git a/demos/face_detection/python/README.md b/demos/face_detection/python/README.md
index 96496170dd..05232025ce 100644
--- a/demos/face_detection/python/README.md
+++ b/demos/face_detection/python/README.md
@@ -8,7 +8,7 @@
## Overview
-The script [face_detection.py](https://github.com/openvinotoolkit/model_server/blob/main/demos/face_detection/python/face_detection.py) runs face detection inference requests for all the images
+The script [face_detection.py](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/demos/face_detection/python/face_detection.py) runs face detection inference requests for all the images
saved in `input_images_dir` directory.
The script can adjust the input image size and change the batch size in the request. It demonstrates how to use
diff --git a/demos/image_classification/go/README.md b/demos/image_classification/go/README.md
index 0615d092e6..775c3ec9af 100644
--- a/demos/image_classification/go/README.md
+++ b/demos/image_classification/go/README.md
@@ -15,7 +15,7 @@ To run end to end flow and get correct results, please download `resnet-50` mode
For example:
```bash
curl --fail -L --create-dirs https://github.com/onnx/models/raw/main/validated/vision/classification/resnet/model/resnet50-caffe2-v1-9.onnx -o models/resnet/1/resnet50-caffe2-v1-9.onnx
chmod -R 755 models
```
diff --git a/demos/image_classification/python/README.md b/demos/image_classification/python/README.md
index f2e57539dc..53f4ce9fe0 100644
--- a/demos/image_classification/python/README.md
+++ b/demos/image_classification/python/README.md
@@ -2,7 +2,7 @@
## Overview
-The script [image_classification.py](https://github.com/openvinotoolkit/model_server/blob/main/demos/image_classification/python/image_classification.py) reads all images and their labels specified in the text file. It then classifies them with [ResNet50](https://github.com/openvinotoolkit/open_model_zoo/blob/releases/2023/1/models/intel/resnet50-binary-0001/README.md) model and presents accuracy results.
+The script [image_classification.py](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/demos/image_classification/python/image_classification.py) reads all images and their labels specified in the text file. It then classifies them with [ResNet50](https://github.com/openvinotoolkit/open_model_zoo/blob/releases/2023/1/models/intel/resnet50-binary-0001/README.md) model and presents accuracy results.
## Download ResNet50 model
diff --git a/demos/image_classification_using_tf_model/python/README.md b/demos/image_classification_using_tf_model/python/README.md
index 29c60f6d20..d8429367fc 100644
--- a/demos/image_classification_using_tf_model/python/README.md
+++ b/demos/image_classification_using_tf_model/python/README.md
@@ -32,7 +32,7 @@ chmod -R 755 model
docker run -d -v $PWD/model:/models -p 9000:9000 openvino/model_server:latest --model_path /models --model_name resnet --port 9000
```
-Alternatively see (instructions)[https://github.com/openvinotoolkit/model_server/blob/main/docs/deploying_server_baremetal.md] for deployment on bare metal.
+Alternatively see [instructions](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/docs/deploying_server_baremetal.md) for deployment on bare metal.
Make sure to:
diff --git a/demos/image_classification_with_string_output/README.md b/demos/image_classification_with_string_output/README.md
index fb2711f1bf..7855be6acf 100644
--- a/demos/image_classification_with_string_output/README.md
+++ b/demos/image_classification_with_string_output/README.md
@@ -31,7 +31,7 @@ model
docker run -d -u $(id -u):$(id -g) -v $(pwd):/workspace -p 8000:8000 openvino/model_server:latest \
--model_path /workspace/model --model_name mobile_net --rest_port 8000
```
-Alternatively see (instructions)[https://github.com/openvinotoolkit/model_server/blob/main/docs/deploying_server_baremetal.md] for deployment on bare metal.
+Alternatively see [instructions](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/docs/deploying_server_baremetal.md) for deployment on bare metal.
Make sure to:
diff --git a/demos/image_generation/README.md b/demos/image_generation/README.md
index f37257e2de..72ce9028c5 100644
--- a/demos/image_generation/README.md
+++ b/demos/image_generation/README.md
@@ -182,8 +182,8 @@ Image generation pipeline parameters will be defined inside the `graph.pbtxt` fi
-Download export script, install it's dependencies and create directory for the models:
+Download the export script, install its dependencies and create a directory for the models:
```console
-curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/export_model.py -o export_model.py
-pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/requirements.txt
+curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/common/export_models/export_model.py -o export_model.py
+pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/common/export_models/requirements.txt
mkdir models
```
diff --git a/demos/integration_with_OpenWebUI/README.md b/demos/integration_with_OpenWebUI/README.md
index 38420b27de..a3b42beaf8 100644
--- a/demos/integration_with_OpenWebUI/README.md
+++ b/demos/integration_with_OpenWebUI/README.md
@@ -98,7 +98,7 @@ Click **New Chat** and select the model to start chatting
### (optional) Step 3: Set request parameters
-There are multiple configurable parameters in OVMS, all of them for `/v3/chat/completions` endpoint are accessible in [chat api documentation](https://github.com/openvinotoolkit/model_server/blob/main/docs/model_server_rest_api_chat.md#request).
+There are multiple configurable parameters in OVMS; all of them for the `/v3/chat/completions` endpoint are documented in the [chat api documentation](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/docs/model_server_rest_api_chat.md#request).
To configure them in *OpenWebUI* with an example of turning off reasoning:
1. Go to **Admin Panel** -> **Settings** -> **Models** ([http://localhost:8080/admin/settings/models](http://localhost:8080/admin/settings/models))
@@ -109,7 +109,7 @@ To configure them in *OpenWebUI* with an example of turning off reasoning:

### Reference
-[https://docs.openwebui.com/getting-started/quick-start/starting-with-openai-compatible](https://docs.openwebui.com/getting-started/quick-start/starting-with-openai-compatible/#step-2-connect-your-server-to-open-webui)
+[https://docs.openwebui.com/getting-started/quick-start/starting-with-openai-compatible](https://docs.openwebui.com/getting-started/quick-start/)
---
@@ -271,7 +271,7 @@ Method 2:
### Reference
[https://docs.openvino.ai/2026/model-server/ovms_demos_image_generation.html](https://docs.openvino.ai/2026/model-server/ovms_demos_image_generation.html#export-model-for-cpu)
-[https://docs.openwebui.com/features/image-generation-and-editing](https://docs.openwebui.com/features/image-generation-and-editing/openai)
+[https://docs.openwebui.com/features/media-generation/image-generation-and-editing](https://docs.openwebui.com/features/media-generation/image-generation-and-editing/usage)
---
## VLM
@@ -351,7 +351,7 @@ mcpo --port 9000 -- python -m mcp_weather_server

### Reference
-[https://docs.openwebui.com/features/plugin/tools/openapi-servers/open-webui](https://docs.openwebui.com/features/plugin/tools/openapi-servers/open-webui#step-2-connect-tool-server-in-open-webui)
+[https://docs.openwebui.com/features/extensibility/plugin/tools/openapi-servers/open-webui](https://docs.openwebui.com/features/extensibility/plugin/tools/openapi-servers/open-webui#step-2-connect-tool-server-in-open-webui)
## Audio
@@ -362,8 +362,8 @@ mcpo --port 9000 -- python -m mcp_weather_server
Start by downloading `export_model.py` script and run it to download and quantize the model for speech generation:
```console
-curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/export_model.py -o export_model.py
-pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/requirements.txt
+curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/common/export_models/export_model.py -o export_model.py
+pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/common/export_models/requirements.txt
python export_model.py text2speech --source_model microsoft/speecht5_tts --weight-format fp32 --model_name microsoft/speecht5_tts --config_file_path models/config.json --model_repository_path models --vocoder microsoft/speecht5_hifigan
```
diff --git a/demos/llm_npu/README.md b/demos/llm_npu/README.md
index f2d60c2660..6fabf84b92 100644
--- a/demos/llm_npu/README.md
+++ b/demos/llm_npu/README.md
@@ -286,7 +286,7 @@ The three main tourist attractions in Paris are the Eiffel Tower, the Louvre, an
## Testing the model accuracy over serving API
-Check the [guide of using lm-evaluation-harness](https://github.com/openvinotoolkit/model_server/blob/main/demos/continuous_batching/accuracy/README.md)
+Check the [guide of using lm-evaluation-harness](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/demos/continuous_batching/accuracy/README.md)
> **Note:** Text generation on NPU is not returning the log_probs which are required to calculate some of the metrics. Only the tasks of type `generate_until` can be used.
For example `--tasks leaderboard_ifeval`.
diff --git a/demos/mediapipe/holistic_tracking/README.md b/demos/mediapipe/holistic_tracking/README.md
index a5476120d4..403178eeb7 100644
--- a/demos/mediapipe/holistic_tracking/README.md
+++ b/demos/mediapipe/holistic_tracking/README.md
@@ -114,5 +114,5 @@ Results saved to :image_0.jpg
## Real time stream analysis
-For demo featuring real time stream application see [real_time_stream_analysis](https://github.com/openvinotoolkit/model_server/tree/main/demos/real_time_stream_analysis/python)
+For demo featuring real time stream application see [real_time_stream_analysis](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/demos/real_time_stream_analysis/python)
diff --git a/demos/mediapipe/iris_tracking/README.md b/demos/mediapipe/iris_tracking/README.md
index 99e2dc7510..07e58b489b 100644
--- a/demos/mediapipe/iris_tracking/README.md
+++ b/demos/mediapipe/iris_tracking/README.md
@@ -51,7 +51,7 @@ ovms --config_path config_iris.json --port 9000
```console
pip install -r requirements.txt
# download a sample image for analysis
-wget https://raw.githubusercontent.com/openvinotoolkit/model_server/main/demos/common/static/images/people/people2.jpeg
+wget https://raw.githubusercontent.com/openvinotoolkit/model_server/releases/2026/1/demos/common/static/images/people/people2.jpeg
echo people2.jpeg>input_images.txt
# launch the client
python mediapipe_iris_tracking.py --grpc_port 9000 --images_list input_images.txt
diff --git a/demos/mediapipe/multi_model_graph/README.md b/demos/mediapipe/multi_model_graph/README.md
index 50ff80c266..df5c9ad314 100644
--- a/demos/mediapipe/multi_model_graph/README.md
+++ b/demos/mediapipe/multi_model_graph/README.md
@@ -99,7 +99,7 @@ xcopy /s /e /q /y ..\..\..\src\test\dummy .\dummyAdd\dummy\
## Server Deployment
:::{dropdown} **Deploying with Docker**
-Prepare virtualenv according to [kserve samples readme](https://github.com/openvinotoolkit/model_server/blob/main/client/python/kserve-api/samples/README.md)
+Prepare virtualenv according to [kserve samples readme](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/client/python/kserve-api/samples/README.md)
```bash
docker run -d -v $PWD:/mediapipe -p 9000:9000 openvino/model_server:latest --config_path /mediapipe/config.json --port 9000
```
diff --git a/demos/mediapipe/object_detection/README.md b/demos/mediapipe/object_detection/README.md
index 81a50a42c5..67d825c96b 100644
--- a/demos/mediapipe/object_detection/README.md
+++ b/demos/mediapipe/object_detection/README.md
@@ -61,4 +61,4 @@ Received images with bounding boxes will be located in ./results directory.
## Real time stream analysis
-For demo featuring real time stream application see [real_time_stream_analysis](https://github.com/openvinotoolkit/model_server/tree/main/demos/real_time_stream_analysis/python)
+For demo featuring real time stream application see [real_time_stream_analysis](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/demos/real_time_stream_analysis/python)
diff --git a/demos/model_ensemble/python/README.md b/demos/model_ensemble/python/README.md
index 6a7e768b67..9a702563bf 100644
--- a/demos/model_ensemble/python/README.md
+++ b/demos/model_ensemble/python/README.md
@@ -24,7 +24,7 @@ make
The steps in `Makefile` are:
1. Download and use the models from [open model zoo](https://github.com/openvinotoolkit/open_model_zoo).
-2. Use [python script](https://github.com/openvinotoolkit/model_server/blob/main/tests/models/argmax_sum.py) located in this repository. Since it uses tensorflow to create models in _saved model_ format, hence tensorflow pip package is required.
+2. Use the [python script](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/tests/models/argmax_sum.py) located in this repository. Since it uses TensorFlow to create models in _saved model_ format, the tensorflow pip package is required.
-3. Prepare argmax model with `(1, 1001)` input shapes to match output of the googlenet and resnet output shapes. The generated model will sum inputs and calculate the index with the highest value. The model output will indicate the most likely predicted class from the ImageNet* dataset.
+3. Prepare the argmax model with `(1, 1001)` input shapes to match the googlenet and resnet output shapes. The generated model will sum its inputs and calculate the index with the highest value. The model output will indicate the most likely predicted class from the ImageNet* dataset.
4. Convert models to IR format and [prepare models repository](../../../docs/models_repository.md).
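The combination step performed by the generated argmax model can be sketched in plain Python (illustrative only; the actual model is created as a TensorFlow saved model by `argmax_sum.py`):

```python
def argmax_sum(googlenet_probs, resnet_probs):
    """Sum two (flattened) classification outputs element-wise and
    return the index of the largest value, i.e. the predicted class."""
    summed = [a + b for a, b in zip(googlenet_probs, resnet_probs)]
    return max(range(len(summed)), key=summed.__getitem__)
```

With the real `(1, 1001)` outputs, the returned index maps to an ImageNet class id.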
@@ -54,7 +54,7 @@ models
## Step 2: Define required models and pipeline
Pipelines need to be defined in the configuration file to use them. The same configuration file is used to define served models and served pipelines.
-Use the [config.json located here](https://github.com/openvinotoolkit/model_server/blob/main/demos/model_ensemble/python/config.json), the content is as follows:
+Use the [config.json located here](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/demos/model_ensemble/python/config.json), the content is as follows:
```bash
cat config.json
{
diff --git a/demos/multi_faces_analysis_pipeline/python/README.md b/demos/multi_faces_analysis_pipeline/python/README.md
index e6148ef29f..9654e7cc2e 100644
--- a/demos/multi_faces_analysis_pipeline/python/README.md
+++ b/demos/multi_faces_analysis_pipeline/python/README.md
@@ -1,7 +1,7 @@
# Multi Faces Analysis Pipeline Demo {#ovms_demo_multi_faces_analysis_pipeline}
-This document demonstrates how to create complex pipelines using object detection and object recognition models from OpenVINO Model Zoo. As an example, we will use [face-detection-retail-0004](https://github.com/openvinotoolkit/open_model_zoo/blob/2022.1.0/models/intel/face-detection-retail-0004/README.md) to detect multiple faces on the image. Then, for each detected face we will crop it using [model_zoo_intel_object_detection](https://github.com/openvinotoolkit/model_server/tree/main/src/custom_nodes/model_zoo_intel_object_detection) example custom node. Finally, each image face image will be forwarded to [age-gender-recognition-retail-0013](https://github.com/openvinotoolkit/open_model_zoo/blob/2022.1.0/models/intel/age-gender-recognition-retail-0013/README.md) and [emotion-recognition-retail-0003](https://github.com/openvinotoolkit/open_model_zoo/blob/2022.1.0/models/intel/emotions-recognition-retail-0003/README.md) models.
+This document demonstrates how to create complex pipelines using object detection and object recognition models from OpenVINO Model Zoo. As an example, we will use [face-detection-retail-0004](https://github.com/openvinotoolkit/open_model_zoo/blob/2022.1.0/models/intel/face-detection-retail-0004/README.md) to detect multiple faces on the image. Then we will crop each detected face using the [model_zoo_intel_object_detection](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/src/custom_nodes/model_zoo_intel_object_detection) example custom node. Finally, each cropped face image will be forwarded to the [age-gender-recognition-retail-0013](https://github.com/openvinotoolkit/open_model_zoo/blob/2022.1.0/models/intel/age-gender-recognition-retail-0013/README.md) and [emotion-recognition-retail-0003](https://github.com/openvinotoolkit/open_model_zoo/blob/2022.1.0/models/intel/emotions-recognition-retail-0003/README.md) models.

@@ -20,7 +20,7 @@ Below is depicted graph implementing faces analysis pipeline execution.
It includes the following Nodes:
- Model `face-detection` - deep learning model which takes user image as input. Its outputs contain information about face coordinates and confidence levels.
- Custom node `model_zoo_intel_object_detection` - it includes C++ implementation of common object detection models results processing. By analysing the output it produces cropped face images based on the configurable score level threshold. Custom node also resizes them to the target resolution and combines into a single output of a dynamic batch size. The output batch size is determined by the number of detected
-boxes according to the configured criteria. All operations on the images employ OpenCV libraries which are preinstalled in the OVMS. Learn more about the [model_zoo_intel_object_detection custom node](https://github.com/openvinotoolkit/model_server/tree/main/src/custom_nodes/model_zoo_intel_object_detection).
+boxes according to the configured criteria. All operations on the images employ OpenCV libraries which are preinstalled in the OVMS. Learn more about the [model_zoo_intel_object_detection custom node](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/src/custom_nodes/model_zoo_intel_object_detection).
- demultiplexer - outputs from the custom node model_zoo_intel_object_detection have variable batch size. In order to match it with the sequential recognition models, data is split into individual images with each batch size equal to 1.
Such smaller requests can be submitted for inference in parallel to the next Model Nodes. Learn more about the [demultiplexing](../../../docs/demultiplexing.md).
- Model `age-gender-recognition` - this model recognizes age and gender on given face image
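The demultiplexing step described above is conceptually a trivial transformation; a sketch (illustrative only, the real splitting happens inside the server's pipeline engine):

```python
def demultiplex(dynamic_batch):
    """Split a dynamic-batch output of N detections into N batch-1 requests,
    which can then be dispatched in parallel to the recognition models."""
    return [[item] for item in dynamic_batch]
```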
@@ -111,7 +111,7 @@ docker run -p 9000:9000 -d -v ${PWD}/workspace:/workspace openvino/model_server
## Requesting the Service
-Exemplary client [multi_faces_analysis_pipeline.py](https://github.com/openvinotoolkit/model_server/blob/main/demos/multi_faces_analysis_pipeline/python/multi_faces_analysis_pipeline.py) can be used to request pipeline deployed in previous step.
+Exemplary client [multi_faces_analysis_pipeline.py](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/demos/multi_faces_analysis_pipeline/python/multi_faces_analysis_pipeline.py) can be used to request the pipeline deployed in the previous step.
```bash
pip3 install -r requirements.txt
diff --git a/demos/optical_character_recognition/python/README.md b/demos/optical_character_recognition/python/README.md
index fb410488ef..651092a29b 100644
--- a/demos/optical_character_recognition/python/README.md
+++ b/demos/optical_character_recognition/python/README.md
@@ -18,7 +18,7 @@ It includes the following nodes:
- Custom node east_ocr - it includes C++ implementation of east-resnet50 model results processing. It analyses the detected boxes coordinates, filters the results
-based on the configurable score level threshold and and applies non-max suppression algorithm to remove overlapping boxes. Finally the custom node east-ocr crops all detected boxes
+based on the configurable score level threshold and applies a non-max suppression algorithm to remove overlapping boxes. Finally the custom node east-ocr crops all detected boxes
from the original image, resize them to the target resolution and combines into a single output of a dynamic batch size. The output batch size is determined by the number of detected
-boxes according to the configured criteria. All operations on the images employ OpenCV libraries which are preinstalled in the OVMS. Learn more about the [east_ocr custom node](https://github.com/openvinotoolkit/model_server/tree/main/src/custom_nodes/east_ocr)
+boxes according to the configured criteria. All operations on the images employ OpenCV libraries which are preinstalled in the OVMS. Learn more about the [east_ocr custom node](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/src/custom_nodes/east_ocr)
-- demultiplexer - output from the Custom node east_ocr have variable batch size. In order to match it with the sequential text detection model, the data is split into individual images with batch size 1 each.
+- demultiplexer - the output from the custom node east_ocr has a variable batch size. In order to match it with the sequential text recognition model, the data is split into individual images with batch size 1 each.
Such smaller requests can be submitted for inference in parallel to the next Model Node. Learn more about the [demultiplexing](../../../docs/demultiplexing.md)
- Model text-recognition - this model recognizes characters included in the input image.
@@ -103,11 +103,11 @@ text-recognition model will have the following interface:
## Building the Custom Node "east_ocr" Library
-Custom nodes are loaded into OVMS as dynamic library implementing OVMS API from [custom_node_interface.h](https://github.com/openvinotoolkit/model_server/blob/main/src/custom_node_interface.h).
+Custom nodes are loaded into OVMS as a dynamic library implementing the OVMS API from [custom_node_interface.h](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/src/custom_node_interface.h).
It can use OpenCV libraries included in OVMS or it could use other third party components.
The custom node east_ocr can be built inside a docker container via the following procedure:
-- go to the directory with custom node examples [src/custom_node](https://github.com/openvinotoolkit/model_server/blob/main/src/custom_nodes)
+- go to the directory with custom node examples [src/custom_nodes](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/src/custom_nodes)
- run `make` command:
```bash
@@ -131,7 +131,7 @@ cp -R EAST/IR/1 OCR/east_fp32/1
## OVMS Configuration File
-The configuration file for running the OCR demo is stored in [config.json](https://github.com/openvinotoolkit/model_server/blob/main/demos/optical_character_recognition/python/config.json)
+The configuration file for running the OCR demo is stored in [config.json](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/demos/optical_character_recognition/python/config.json)
Copy this file along with the model files and the custom node library like presented below:
```bash
cp model_server/demos/optical_character_recognition/python/config.json OCR
diff --git a/demos/real_time_stream_analysis/python/README.md b/demos/real_time_stream_analysis/python/README.md
index 4c57021cd9..3e20c1071a 100644
--- a/demos/real_time_stream_analysis/python/README.md
+++ b/demos/real_time_stream_analysis/python/README.md
@@ -30,7 +30,7 @@ In the demo will be used two gRPC communication patterns which might be advantag
## gRPC streaming with MediaPipe graphs
gRPC stream connection is allowed for served [MediaPipe graphs](../../../docs/mediapipe.md). It allows sending asynchronous calls to the endpoint all linked in a single session context. Responses are sent back via a stream and processed in the callback function.
-The helper class [StreamClient](https://github.com/openvinotoolkit/model_server/blob/main/demos/common/stream_client/stream_client.py) provides a mechanism for flow control and tracking the sequence of the requests and responses. In the StreamClient initialization the streaming mode is set via the parameter `streaming_api=True`.
+The helper class [StreamClient](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/demos/common/stream_client/stream_client.py) provides a mechanism for flow control and tracking the sequence of the requests and responses. In the StreamClient initialization the streaming mode is set via the parameter `streaming_api=True`.
Using the streaming API has the following advantages:
- good performance thanks to asynchronous calls and sharing the graph execution for multiple calls
@@ -39,7 +39,7 @@ Using the streaming API has the following advantages:
### Preparing the model server for gRPC streaming with a Holistic graph
-The [holistic graph](https://github.com/openvinotoolkit/model_server/blob/main/demos/mediapipe/holistic_tracking/holistic_tracking.pbtxt) is expecting and IMAGE object on the input and returns an IMAGE on the output.
+The [holistic graph](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/demos/mediapipe/holistic_tracking/holistic_tracking.pbtxt) expects an IMAGE object on the input and returns an IMAGE on the output.
As such it doesn't require any preprocessing and postprocessing. In this demo the returned stream will be just visualized or sent to the target sink.
The model server with the holistic use case can be deployed using steps from [this](../../mediapipe/holistic_tracking/README.md#server-deployment) article.
diff --git a/demos/rerank/README.md b/demos/rerank/README.md
index 32379ddcaf..3536f80016 100644
--- a/demos/rerank/README.md
+++ b/demos/rerank/README.md
@@ -15,8 +15,8 @@ That ensures faster initialization time, better performance and lower memory con
Download the export script, install its dependencies and create a directory for the models:
```console
-curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/export_model.py -o export_model.py
-pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/requirements.txt
+curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/common/export_models/export_model.py -o export_model.py
+pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/common/export_models/requirements.txt
mkdir models
```
diff --git a/demos/single_face_analysis_pipeline/python/README.md b/demos/single_face_analysis_pipeline/python/README.md
index 30f31b4272..0cc35767ae 100644
--- a/demos/single_face_analysis_pipeline/python/README.md
+++ b/demos/single_face_analysis_pipeline/python/README.md
@@ -75,7 +75,7 @@ ovms --config_path workspace/config.json --port 9001
:::
## Requesting the Service
-Exemplary client [single_face_analysis_pipeline.py](https://github.com/openvinotoolkit/model_server/blob/main/demos/single_face_analysis_pipeline/python/single_face_analysis_pipeline.py) can be used to request pipeline deployed in previous step.
+An example client [single_face_analysis_pipeline.py](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/demos/single_face_analysis_pipeline/python/single_face_analysis_pipeline.py) can be used to request the pipeline deployed in the previous step.
```console
pip3 install -r requirements.txt
diff --git a/demos/universal-sentence-encoder/README.md b/demos/universal-sentence-encoder/README.md
index ad36dcbcb0..76caba73d6 100644
--- a/demos/universal-sentence-encoder/README.md
+++ b/demos/universal-sentence-encoder/README.md
@@ -46,7 +46,7 @@ Check the container logs to confirm successful start:
docker logs ovms
```
-Alternatively see (instructions)[https://github.com/openvinotoolkit/model_server/blob/main/docs/deploying_server_baremetal.md] for deployment on bare metal.
+Alternatively, see the [instructions](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/docs/deploying_server_baremetal.md) for deployment on bare metal.
Make sure to:
diff --git a/demos/vehicle_analysis_pipeline/python/README.md b/demos/vehicle_analysis_pipeline/python/README.md
index eb19a9b04d..531aeb44de 100644
--- a/demos/vehicle_analysis_pipeline/python/README.md
+++ b/demos/vehicle_analysis_pipeline/python/README.md
@@ -1,5 +1,5 @@
# Vehicle Analysis Pipeline Demo {#ovms_demo_vehicle_analysis_pipeline}
-This document demonstrates how to create complex pipelines using object detection and object recognition models from OpenVINO Model Zoo. As an example, we will use [vehicle-detection-0202](https://github.com/openvinotoolkit/open_model_zoo/blob/2022.1.0/models/intel/vehicle-detection-0202/README.md) to detect multiple vehicles on the image. Then, for each detected vehicle we will crop it using [model_zoo_intel_object_detection](https://github.com/openvinotoolkit/model_server/tree/main/src/custom_nodes/model_zoo_intel_object_detection) example custom node. Finally, each vehicle image will be forwarded to [vehicle-attributes-recognition-barrier-0042](https://github.com/openvinotoolkit/open_model_zoo/blob/2022.1.0/models/intel/vehicle-attributes-recognition-barrier-0042/README.md) model.
+This document demonstrates how to create complex pipelines using object detection and object recognition models from OpenVINO Model Zoo. As an example, we will use [vehicle-detection-0202](https://github.com/openvinotoolkit/open_model_zoo/blob/2022.1.0/models/intel/vehicle-detection-0202/README.md) to detect multiple vehicles on the image. Then, for each detected vehicle we will crop it using [model_zoo_intel_object_detection](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/src/custom_nodes/model_zoo_intel_object_detection) example custom node. Finally, each vehicle image will be forwarded to [vehicle-attributes-recognition-barrier-0042](https://github.com/openvinotoolkit/open_model_zoo/blob/2022.1.0/models/intel/vehicle-attributes-recognition-barrier-0042/README.md) model.

@@ -14,7 +14,7 @@ Below is depicted graph implementing vehicles analysis pipeline execution.
It includes the following Nodes:
- Model `vehicle_detection` - deep learning model which takes user image as input. Its outputs contain information about vehicle coordinates and confidence levels.
- Custom node `model_zoo_intel_object_detection` - it includes C++ implementation of common object detection models results processing. By analysing the output it produces cropped vehicle images based on the configurable score level threshold. Custom node also resizes them to the target resolution and combines into a single output of a dynamic batch size. The output batch size is determined by the number of detected
-boxes according to the configured criteria. All operations on the images employ OpenCV libraries which are preinstalled in the OVMS. Learn more about the [model_zoo_intel_object_detection custom node](https://github.com/openvinotoolkit/model_server/tree/main/src/custom_nodes/model_zoo_intel_object_detection).
+boxes according to the configured criteria. All operations on the images employ OpenCV libraries, which are preinstalled in OVMS. Learn more about the [model_zoo_intel_object_detection custom node](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/src/custom_nodes/model_zoo_intel_object_detection).
- demultiplexer - outputs from the custom node model_zoo_intel_object_detection have variable batch size. In order to match it with the sequential recognition models, data is split into individual images with each batch size equal to 1.
Such smaller requests can be submitted for inference in parallel to the next Model Nodes. Learn more about the [demultiplexing](../../../docs/demultiplexing.md).
- Model `vehicle_attributes_recognition` - this model recognizes type and color for given vehicle image
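The demultiplexing step described above can be sketched in NumPy terms; a minimal illustration (the shapes are placeholders, not the exact model dimensions):

```python
import numpy as np

def demultiplex(batch):
    """Split a dynamic-batch output of shape (N, C, H, W) into N
    single-image requests of shape (1, C, H, W) for the next model node."""
    return [image[np.newaxis, ...] for image in batch]

# e.g. three cropped vehicles of 3x72x72 produced by the custom node
crops = np.zeros((3, 3, 72, 72), dtype=np.float32)
requests = demultiplex(crops)
print(len(requests), requests[0].shape)  # 3 (1, 3, 72, 72)
```

Each resulting request has batch size 1 and can be submitted to the recognition model nodes in parallel.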
diff --git a/demos/vlm_npu/README.md b/demos/vlm_npu/README.md
index c38144b08f..fe1511c2ab 100644
--- a/demos/vlm_npu/README.md
+++ b/demos/vlm_npu/README.md
@@ -77,9 +77,9 @@ curl http://localhost:8000/v3/models
```console
pip3 install requests
-curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/static/images/zebra.jpeg -o zebra.jpeg
+curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/common/static/images/zebra.jpeg -o zebra.jpeg
```
-
+
:::{dropdown} **Unary call with curl using image from local filesystem**
@@ -237,7 +237,7 @@ python benchmark_serving.py --backend openai-chat --dataset-name hf --dataset-pa
## Testing the model accuracy over serving API
-Check the [guide of using lm-evaluation-harness](https://github.com/openvinotoolkit/model_server/blob/main/demos/continuous_batching/accuracy/README.md)
+Check the [guide of using lm-evaluation-harness](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/demos/continuous_batching/accuracy/README.md)
## Limitations
diff --git a/docs/binary_input_kfs.md b/docs/binary_input_kfs.md
index 94339f39bc..a0cf2d55a9 100644
--- a/docs/binary_input_kfs.md
+++ b/docs/binary_input_kfs.md
@@ -23,7 +23,7 @@ KServe API also allows sending encoded images via HTTP interface to the model or
For binary inputs, the `parameters` map in the JSON part contains a `binary_data_size` field for each binary input that indicates the size of the data on that input. Since there are no strict limitations on image resolution and format (as long as it can be loaded by OpenCV), images might be of different sizes. To send a batch of images, precede the data of every image with 4 bytes (little endian) containing the size of that image and specify their combined size in `binary_data_size`. For example, if a batch contained three images of sizes 370, 480 and 500 bytes, the content of the input buffer inside the binary extension would look like this:
<0x72010000 (=370)><370 bytes of first image><0xE0010000 (=480)><480 bytes of second image> <0xF4010000 (=500)><500 bytes of third image>
In that case, `binary_data_size` would be 1350 (370 + 480 + 500).
-Function set_data_from_numpy in triton client lib that we use in our [REST sample](https://github.com/openvinotoolkit/model_server/blob/main/client/python/kserve-api/samples/http_infer_binary_resnet.py) automatically converts given images to this format.
+The `set_data_from_numpy` function in the Triton client library that we use in our [REST sample](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/client/python/kserve-api/samples/http_infer_binary_resnet.py) automatically converts given images to this format.
If the request contains only one input, the `binary_data_size` parameter can be omitted - in this case, the whole buffer is treated as an input image.
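The length-prefixed layout described above can be produced with a few lines of Python; a minimal sketch (the image bytes are stand-ins for real JPEG data):

```python
import struct

def pack_binary_batch(images):
    """Prefix each image's bytes with its 4-byte little-endian size
    and concatenate them, matching the binary extension layout."""
    buffer = b""
    for data in images:
        buffer += struct.pack("<I", len(data)) + data
    return buffer

# three fake "images" of 370, 480 and 500 bytes, as in the example above
batch = [b"\x00" * 370, b"\x01" * 480, b"\x02" * 500]
payload = pack_binary_batch(batch)
# total buffer: 1350 bytes of image data + 12 bytes of size prefixes
print(len(payload))  # 1362
```

The first four bytes of the payload are `0x72010000`, the little-endian encoding of 370, matching the layout shown above.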
@@ -48,8 +48,8 @@ For the Raw Data binary inputs `binary_data_size` parameter can be omitted since
## Usage examples
-Sample clients that use binary inputs via KFS API can be found here ([REST sample](https://github.com/openvinotoolkit/model_server/blob/main/client/python/kserve-api/samples/http_infer_binary_resnet.py))/([GRPC sample](https://github.com/openvinotoolkit/model_server/blob/main/client/python/kserve-api/samples/grpc_infer_binary_resnet.py))
-Also, see the ([README](https://github.com/openvinotoolkit/model_server/blob/main/client/python/kserve-api/samples/README.md))
+Sample clients that use binary inputs via KFS API can be found here ([REST sample](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/client/python/kserve-api/samples/http_infer_binary_resnet.py))/([GRPC sample](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/client/python/kserve-api/samples/grpc_infer_binary_resnet.py))
+Also, see the ([README](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/client/python/kserve-api/samples/README.md))
## Recommendations:
diff --git a/docs/clients_kfs.md b/docs/clients_kfs.md
index d08d6bd4f9..f90bf205b7 100644
--- a/docs/clients_kfs.md
+++ b/docs/clients_kfs.md
@@ -8,7 +8,7 @@ hidden:
gRPC API
RESTful API
-Examples
+Examples
```
## Python Client
@@ -821,4 +821,4 @@ client.stop_stream()
:::
::::
-For complete usage examples see [Kserve samples](https://github.com/openvinotoolkit/model_server/tree/main/client/python/kserve-api/samples).
+For complete usage examples see [Kserve samples](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/client/python/kserve-api/samples).
diff --git a/docs/clients_tfs.md b/docs/clients_tfs.md
index b288420fcd..a617c390c6 100644
--- a/docs/clients_tfs.md
+++ b/docs/clients_tfs.md
@@ -8,7 +8,7 @@ hidden:
gRPC API
RESTful API
-Examples
+Examples
```
## Python Client
diff --git a/docs/custom_model_loader.md b/docs/custom_model_loader.md
index 0cfd1cd409..17778d8a71 100644
--- a/docs/custom_model_loader.md
+++ b/docs/custom_model_loader.md
@@ -37,7 +37,7 @@ To enable a particular model to load using custom loader, add extra parameter in
### C++ API Interface for custom loader:
-A base class **CustomLoaderInterface** along with interface API is defined in [src/customloaderinterface.hpp](https://github.com/openvinotoolkit/model_server/blob/main/src/customloaderinterface.hpp)
+A base class **CustomLoaderInterface** along with interface API is defined in [src/customloaderinterface.hpp](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/src/customloaderinterface.hpp)
Refer to this file for API details.
diff --git a/docs/custom_node_development.md b/docs/custom_node_development.md
index 6e797aef6d..9c9302ee1b 100644
--- a/docs/custom_node_development.md
+++ b/docs/custom_node_development.md
@@ -11,7 +11,7 @@ developed in C++ or C to perform arbitrary data transformations.
## Custom Node API
-The custom node library must implement the API interface defined in [custom_node_interface.h](https://github.com/openvinotoolkit/model_server/tree/main/src/custom_node_interface.h).
+The custom node library must implement the API interface defined in [custom_node_interface.h](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/src/custom_node_interface.h).
The interface is defined in `C` to simplify compatibility with various compilers. The library can use third-party components
linked statically or dynamically. OpenCV is a built-in component in OVMS which can be used to perform manipulations on the image
data.
@@ -67,7 +67,7 @@ Note that during the function execution all the output data buffers need to be a
the request processing is completed and returned to the user. The cleanup is triggered by calling the `release` function
which also needs to be implemented in the custom library.
-In some cases, dynamic allocation in `execute` call might be a performance bottleneck or cause memory fragmentation. Starting from 2022.1 release, it is possible to preallocate memory during DAG initialization and reuse it in subsequent inference requests. Refer to `initialize` and `deinitialize` functions below. Those can be used to implement preallocated memory pool. Example implementation can be seen in [custom node example source](https://github.com/openvinotoolkit/model_server/blob/main/src/custom_nodes/add_one/add_one.cpp#L141).
+In some cases, dynamic allocation in the `execute` call might be a performance bottleneck or cause memory fragmentation. Starting from the 2022.1 release, it is possible to preallocate memory during DAG initialization and reuse it in subsequent inference requests. Refer to the `initialize` and `deinitialize` functions below. Those can be used to implement a preallocated memory pool. An example implementation can be seen in the [custom node example source](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/src/custom_nodes/add_one/add_one.cpp#L141).
The execute function returns an integer value that indicates success (`0`) or failure (any other value). When the function
reports an error, the pipeline execution is stopped and the error is returned to the user.
@@ -147,7 +147,7 @@ would be converted to ["String_123", "", "zebra"].
## Building
Custom node library can be compiled using any tool. It is recommended to follow the example based
-a docker container with all build dependencies included. It is described in this [Makefile](https://github.com/openvinotoolkit/model_server/blob/main/src/custom_nodes/Makefile).
+on a docker container with all build dependencies included. It is described in this [Makefile](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/src/custom_nodes/Makefile).
## Testing
The recommended method for testing the custom library is via OVMS execution:
@@ -157,7 +157,7 @@ The recommended method for testing the custom library is via OVMS execution:
- Submit a request to OVMS endpoint using a gRPC or REST client.
- Analyse the logs on the OVMS server.
-For debugging steps, refer to the OVMS [developer guide](https://github.com/openvinotoolkit/model_server/blob/main/docs/developer_guide.md)
+For debugging steps, refer to the OVMS [developer guide](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/docs/developer_guide.md)
## Built-in custom nodes
@@ -167,16 +167,16 @@ Below you can see the list of fully functional custom nodes embedded in the mode
| Custom Node | Location in the container |
| :--- | :---- |
-| [east-resnet50 OCR custom node](https://github.com/openvinotoolkit/model_server/tree/main/src/custom_nodes/east_ocr) | `/ovms/lib/custom_nodes/libcustom_node_east_ocr.so`|
-| [horizontal OCR custom node](https://github.com/openvinotoolkit/model_server/tree/main/src/custom_nodes/horizontal_ocr) | `/ovms/lib/custom_nodes/libcustom_node_horizontal_ocr.so`|
-| [model zoo intel object detection custom node](https://github.com/openvinotoolkit/model_server/tree/main/src/custom_nodes/model_zoo_intel_object_detection) | `/ovms/lib/custom_nodes/libcustom_node_model_zoo_intel_object_detection.so`|
-| [image transformation custom node](https://github.com/openvinotoolkit/model_server/tree/main/src/custom_nodes/image_transformation) | `/ovms/lib/custom_nodes/libcustom_node_image_transformation.so`|
-| [add one custom node](https://github.com/openvinotoolkit/model_server/tree/main/src/custom_nodes/add_one) | `/ovms/lib/custom_nodes/libcustom_node_add_one.so`|
-| [face blur custom node](https://github.com/openvinotoolkit/model_server/tree/main/src/custom_nodes/face_blur) | `/ovms/lib/custom_nodes/libcustom_node_face_blur.so`|
+| [east-resnet50 OCR custom node](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/src/custom_nodes/east_ocr) | `/ovms/lib/custom_nodes/libcustom_node_east_ocr.so`|
+| [horizontal OCR custom node](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/src/custom_nodes/horizontal_ocr) | `/ovms/lib/custom_nodes/libcustom_node_horizontal_ocr.so`|
+| [model zoo intel object detection custom node](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/src/custom_nodes/model_zoo_intel_object_detection) | `/ovms/lib/custom_nodes/libcustom_node_model_zoo_intel_object_detection.so`|
+| [image transformation custom node](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/src/custom_nodes/image_transformation) | `/ovms/lib/custom_nodes/libcustom_node_image_transformation.so`|
+| [add one custom node](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/src/custom_nodes/add_one) | `/ovms/lib/custom_nodes/libcustom_node_add_one.so`|
+| [face blur custom node](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/src/custom_nodes/face_blur) | `/ovms/lib/custom_nodes/libcustom_node_face_blur.so`|
**Example:**
-Including built-in [horizontal OCR custom node](https://github.com/openvinotoolkit/model_server/tree/main/src/custom_nodes/horizontal_ocr) in the `config.json` would look like:
+Including built-in [horizontal OCR custom node](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/src/custom_nodes/horizontal_ocr) in the `config.json` would look like:
```json
...
"custom_node_library_config_list": [
@@ -191,8 +191,8 @@ Including built-in [horizontal OCR custom node](https://github.com/openvinotoolk
The custom node is already available under this path. There is no need to build anything or mount it to the container.
Additional examples are included in the unit tests:
-- [node_add_sub.c](https://github.com/openvinotoolkit/model_server/tree/main/src/test/custom_nodes/node_add_sub.c)
-- [node_choose_maximum.cpp](https://github.com/openvinotoolkit/model_server/tree/main/src/test/custom_nodes/node_choose_maximum.cpp)
-- [node_missing_implementation.c](https://github.com/openvinotoolkit/model_server/tree/main/src/test/custom_nodes/node_missing_implementation.c)
-- [node_perform_different_operations.cpp](https://github.com/openvinotoolkit/model_server/tree/main/src/test/custom_nodes/node_perform_different_operations.cpp)
+- [node_add_sub.c](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/src/test/custom_nodes/node_add_sub.c)
+- [node_choose_maximum.cpp](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/src/test/custom_nodes/node_choose_maximum.cpp)
+- [node_missing_implementation.c](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/src/test/custom_nodes/node_missing_implementation.c)
+- [node_perform_different_operations.cpp](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/src/test/custom_nodes/node_perform_different_operations.cpp)
diff --git a/docs/dag_scheduler.md b/docs/dag_scheduler.md
index 14a888268a..64c5e54108 100644
--- a/docs/dag_scheduler.md
+++ b/docs/dag_scheduler.md
@@ -44,7 +44,7 @@ There are two special kinds of nodes - Request and Response node. Both of them a
### Custom node type
* custom - this node can be used to implement all operations on the data which cannot be handled by the neural network model. It is represented by
-a C++ dynamic library implementing OVMS API defined in [custom_node_interface.h](https://github.com/openvinotoolkit/model_server/blob/main/src/custom_node_interface.h). Custom nodes can run the data
+a C++ dynamic library implementing OVMS API defined in [custom_node_interface.h](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/src/custom_node_interface.h). Custom nodes can run the data
processing using OpenCV, which is included in OVMS, or include other third-party components. Custom node libraries are loaded into OVMS
by adding their definition to the pipeline configuration. The configuration includes a path to the compiled binary with the `.so` extension.
Custom nodes are not versioned, meaning one custom node library is bound to one name. To load another version, another name needs to be used.
diff --git a/docs/deploying_server_docker.md b/docs/deploying_server_docker.md
index 097834f5ed..6896d38ff5 100644
--- a/docs/deploying_server_docker.md
+++ b/docs/deploying_server_docker.md
@@ -43,8 +43,8 @@ docker run -u $(id -u) -v $(pwd)/models:/models -p 9000:9000 openvino/model_serv
##### 2.2 Download input files: an image and a label mapping file
```bash
-wget https://raw.githubusercontent.com/openvinotoolkit/model_server/main/demos/common/static/images/zebra.jpeg
-wget https://raw.githubusercontent.com/openvinotoolkit/model_server/main/demos/common/python/classes.py
+wget https://raw.githubusercontent.com/openvinotoolkit/model_server/releases/2026/1/demos/common/static/images/zebra.jpeg
+wget https://raw.githubusercontent.com/openvinotoolkit/model_server/releases/2026/1/demos/common/python/classes.py
```
##### 2.3 Install the Python-based ovmsclient package
diff --git a/docs/deploying_server_kubernetes.md b/docs/deploying_server_kubernetes.md
index 264627bffc..2be8304e51 100644
--- a/docs/deploying_server_kubernetes.md
+++ b/docs/deploying_server_kubernetes.md
@@ -5,7 +5,7 @@ The recommended deployment method in Kubernetes is via Kserve operator for Kuber
## ServingRuntime configuration:
```
-curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/extras/kserve/kserve-openvino.yaml -O
+curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/extras/kserve/kserve-openvino.yaml -O
sed -i 's/openvino\/model_server:replace/openvino\/model_server:2025.4-py/' kserve-openvino.yaml
kubectl apply -f kserve-openvino.yaml
```
diff --git a/docs/dynamic_bs_auto_reload.md b/docs/dynamic_bs_auto_reload.md
index 67bd917291..a5c968e162 100644
--- a/docs/dynamic_bs_auto_reload.md
+++ b/docs/dynamic_bs_auto_reload.md
@@ -7,7 +7,7 @@ This guide shows how to configure a model to accept input data with different ba
Enabling dynamic batch size via model reload is as simple as setting the `batch_size` parameter to `auto`. To configure and use the dynamic batch size, take advantage of:
-- An example client in Python [grpc_predict_resnet.py](https://github.com/openvinotoolkit/model_server/blob/main/client/python/tensorflow-serving-api/samples/grpc_predict_resnet.py) that can be used to request inference with the desired batch size.
+- An example client in Python [grpc_predict_resnet.py](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/client/python/tensorflow-serving-api/samples/grpc_predict_resnet.py) that can be used to request inference with the desired batch size.
- A sample [resnet](https://github.com/openvinotoolkit/open_model_zoo/blob/2022.1.0/models/intel/resnet50-binary-0001/README.md) model.
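For reference, a minimal `config.json` sketch with this setting (the model name and path are placeholders for this guide's resnet setup):

```json
{
    "model_config_list": [
        {
            "config": {
                "name": "resnet",
                "base_path": "/models/resnet",
                "batch_size": "auto"
            }
        }
    ]
}
```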
diff --git a/docs/dynamic_bs_demultiplexer.md b/docs/dynamic_bs_demultiplexer.md
index d00a9754f5..3e69bc807f 100644
--- a/docs/dynamic_bs_demultiplexer.md
+++ b/docs/dynamic_bs_demultiplexer.md
@@ -9,7 +9,7 @@ More information about this feature can be found in [dynamic batch size in demul
> **NOTE**: Only one dynamic demultiplexer (`demultiply_count` with value `-1`) can exist in the pipeline.
-- Example client in python [grpc_predict_resnet.py](https://github.com/openvinotoolkit/model_server/blob/main/client/python/tensorflow-serving-api/samples/grpc_predict_resnet.py) can be used to request the pipeline. Use `--dag-batch-size-auto` flag to add an additional dimension to the input shape which is required for demultiplexing feature.
+- Example client in Python [grpc_predict_resnet.py](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/client/python/tensorflow-serving-api/samples/grpc_predict_resnet.py) can be used to request the pipeline. Use the `--dag-batch-size-auto` flag to add an additional dimension to the input shape, which is required for the demultiplexing feature.
- The example uses model [resnet](https://github.com/openvinotoolkit/open_model_zoo/blob/2022.1.0/models/intel/resnet50-binary-0001/README.md).
diff --git a/docs/dynamic_input.md b/docs/dynamic_input.md
index b4191601d5..17fd510c02 100644
--- a/docs/dynamic_input.md
+++ b/docs/dynamic_input.md
@@ -43,5 +43,5 @@ OpenVINO Model Server accepts several data types that can be handled on [MediaPi
- Next node in the graph uses a calculator that can decode a raw KServe request. In such a case, dynamic input handling must be implemented as part of the calculator logic since the model server passes the request to the calculator as-is. Such a node expects an input stream with a tag starting with the `REQUEST` prefix.
-- Next node in the graph uses `PythonExecutorCalculator`. In such case data in the KServe request will be available to the user as input argument of their Python [execute function](https://github.com/openvinotoolkit/model_server/blob/main/docs/python_support/reference.md#ovmspythonmodel-class). Such node expects input stream with a tag starting with `OVMS_PY_TENSOR` prefix.
+- Next node in the graph uses `PythonExecutorCalculator`. In such a case, data in the KServe request will be available to the user as an input argument of their Python [execute function](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/docs/python_support/reference.md#ovmspythonmodel-class). Such a node expects an input stream with a tag starting with the `OVMS_PY_TENSOR` prefix.
diff --git a/docs/dynamic_shape_auto_reload.md b/docs/dynamic_shape_auto_reload.md
index dd2e256b59..91c6b4b1d3 100644
--- a/docs/dynamic_shape_auto_reload.md
+++ b/docs/dynamic_shape_auto_reload.md
@@ -7,7 +7,7 @@ This guide explains how to configure a model to accept input data in different s
Enable dynamic shape via model reloading by setting the `shape` parameter to `auto`. To configure and use the dynamic shape, take advantage of:
-- Example client in Python [face_detection.py](https://github.com/openvinotoolkit/model_server/blob/main/demos/face_detection/python/face_detection.py) that can be used to request inference with the desired input shape.
+- Example client in Python [face_detection.py](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/demos/face_detection/python/face_detection.py) that can be used to request inference with the desired input shape.
- An example [face_detection_retail_0004](https://github.com/openvinotoolkit/open_model_zoo/blob/releases/2021/4/models/intel/face-detection-retail-0004/README.md) model.
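For reference, a minimal `config.json` sketch with this setting (the model name and path are placeholders for the face detection setup above):

```json
{
    "model_config_list": [
        {
            "config": {
                "name": "face-detection",
                "base_path": "/models/face-detection",
                "shape": "auto"
            }
        }
    ]
}
```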
diff --git a/docs/dynamic_shape_binary_inputs.md b/docs/dynamic_shape_binary_inputs.md
index 9bf8dcda81..3324e3f3b5 100644
--- a/docs/dynamic_shape_binary_inputs.md
+++ b/docs/dynamic_shape_binary_inputs.md
@@ -36,9 +36,9 @@ pip3 install ovmsclient
### Download a Sample Image and Label Mappings
```bash
-wget https://raw.githubusercontent.com/openvinotoolkit/model_server/main/demos/common/static/images/zebra.jpeg
+wget https://raw.githubusercontent.com/openvinotoolkit/model_server/releases/2026/1/demos/common/static/images/zebra.jpeg
-wget https://raw.githubusercontent.com/openvinotoolkit/model_server/main/demos/common/python/classes.py
+wget https://raw.githubusercontent.com/openvinotoolkit/model_server/releases/2026/1/demos/common/python/classes.py
```
### Run Inference
diff --git a/docs/dynamic_shape_custom_node.md b/docs/dynamic_shape_custom_node.md
index 9423e45ec8..83a2909ac8 100644
--- a/docs/dynamic_shape_custom_node.md
+++ b/docs/dynamic_shape_custom_node.md
@@ -3,12 +3,12 @@
## Introduction
This guide shows how to configure a simple Directed Acyclic Graph (DAG) with a custom node that performs input resizing before passing input data to the model.
-The node below is provided as a demonstration. See instructions for how to build and use the custom node: [Image Transformation](https://github.com/openvinotoolkit/model_server/tree/main/src/custom_nodes/image_transformation).
+The node below is provided as a demonstration. See instructions for how to build and use the custom node: [Image Transformation](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/src/custom_nodes/image_transformation).
To run inference with this setup, we will use the following:
-- Example client in Python [face_detection.py](https://github.com/openvinotoolkit/model_server/blob/main/demos/face_detection/python/face_detection.py) that can be used to request inference on with the desired input shape.
+- Example client in Python [face_detection.py](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/demos/face_detection/python/face_detection.py) that can be used to request inference with the desired input shape.
- An example [face_detection_retail_0004](https://github.com/openvinotoolkit/open_model_zoo/blob/releases/2021/4/models/intel/face-detection-retail-0004/README.md) model.
diff --git a/docs/dynamic_shape_dynamic_model.md b/docs/dynamic_shape_dynamic_model.md
index cf4c0083fe..c0499aed86 100644
--- a/docs/dynamic_shape_dynamic_model.md
+++ b/docs/dynamic_shape_dynamic_model.md
@@ -14,7 +14,7 @@ Another option to use dynamic shape feature is to export the model with dynamic
To demonstrate dynamic dimensions, take advantage of:
-- Example client in Python [face_detection.py](https://github.com/openvinotoolkit/model_server/blob/main/demos/face_detection/python/face_detection.py) that can be used to request inference with the desired input shape.
+- Example client in Python [face_detection.py](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/demos/face_detection/python/face_detection.py) that can be used to request inference with the desired input shape.
- An example [face_detection_retail_0004](https://github.com/openvinotoolkit/open_model_zoo/blob/releases/2021/4/models/intel/face-detection-retail-0004/README.md) model.
diff --git a/docs/mediapipe.md b/docs/mediapipe.md
index 73f0eb1f15..67c2f97c5e 100644
--- a/docs/mediapipe.md
+++ b/docs/mediapipe.md
@@ -256,8 +256,8 @@ the version parameter is ignored. MediaPipe graphs are not versioned. Though, th
MediaPipe graphs can include only the calculators built into the model server image.
If you want to add your own MediaPipe calculator to OpenVINO Model Server, you need to add it as a dependency and rebuild the OpenVINO Model Server binary.
-If you have it in external repository, you need to add the http_archive() definition or git_repository() definition to the bazel [WORKSPACE](https://github.com/openvinotoolkit/model_server/blob/main/WORKSPACE) file.
-Then you need to add the calculator target as a bazel dependency to the [src/BUILD](https://github.com/openvinotoolkit/model_server/blob/main/src/BUILD) file. This should be done for:
+If you have it in an external repository, you need to add an http_archive() or git_repository() definition to the bazel [WORKSPACE](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/WORKSPACE) file.
+Then you need to add the calculator target as a bazel dependency to the [src/BUILD](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/src/BUILD) file. This should be done for:
```
cc_library(
diff --git a/docs/mediapipe_conversion.md b/docs/mediapipe_conversion.md
index 24dc01357d..d63649a04a 100644
--- a/docs/mediapipe_conversion.md
+++ b/docs/mediapipe_conversion.md
@@ -190,7 +190,7 @@ input_order_list: ["Identity","Identity_1","Identity_2","Identity_3"]
### 3. Adjust graph input/output streams
This step is required if you plan to deploy the graph in OpenVINO Model Server and the existing graph does not have supported input/output packet types. Check the supported input and output packet types [here](./mediapipe.md).
-In that cases you may need to add converter calculators as it was done [here](https://github.com/openvinotoolkit/model_server/blob/main/demos/mediapipe/object_detection/graph.pbtxt#L31).
+In such cases, you may need to add converter calculators, as was done [here](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/demos/mediapipe/object_detection/graph.pbtxt#L31).
### 4. Set the config.json file path in the session calculator
diff --git a/docs/metrics.md b/docs/metrics.md
index d18c512f6c..950aa9cf67 100644
--- a/docs/metrics.md
+++ b/docs/metrics.md
@@ -209,7 +209,7 @@ To use data from metrics endpoint you can use the curl command:
```bash
curl http://localhost:8000/metrics
```
-[Example metrics output](https://raw.githubusercontent.com/openvinotoolkit/model_server/main/docs/metrics_output.out)
+[Example metrics output](https://raw.githubusercontent.com/openvinotoolkit/model_server/releases/2026/1/docs/metrics_output.out)
## Performance considerations
Collecting metrics has negligible performance overhead when used with models of average size and complexity. However, when used with very lightweight, fast models whose inference time is very short, metric incrementation can take a noticeable proportion of the processing time. Consider this when enabling metrics for such models.
@@ -251,7 +251,7 @@ Exposing custom metrics in calculator implementations (MediaPipe graph nodes) is
With server metrics being scraped by [Prometheus](https://prometheus.io/), it is possible to integrate [Grafana](https://grafana.com/) to visualize them on dashboards. Once you have Grafana configured with Prometheus as a data source, you can create your own dashboard or import one.
-In OpenVINO Model Server repository you can find [grafana_dashboard.json](https://github.com/openvinotoolkit/model_server/blob/main/extras/grafana_dashboard.json) file that can be used to visualize per model metrics like:
+In the OpenVINO Model Server repository you can find the [grafana_dashboard.json](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/extras/grafana_dashboard.json) file, which can be used to visualize per-model metrics like:
- Throughput [RPS] - number of requests being processed by the model per second.
- Mean Latency [ms] - latency averaged across all requests processed by the model in a certain timeframe.
- Latency Quantile [ms] - value of latency for quantiles [0.75, 0.90, 0.99], meaning the latency that has NOT been exceeded by 75%, 90% and 99% of the requests.
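The metrics endpoint referenced in the patch above returns data in the Prometheus text exposition format. As a minimal sketch (the metric name and labels below are illustrative samples, not guaranteed OVMS output), such output could be parsed in Python like this:

```python
# Minimal sketch: parse Prometheus text exposition format, as returned by an
# endpoint like http://localhost:8000/metrics. The sample metric below is
# illustrative; real metric names and labels depend on the server configuration.
def parse_metrics(text):
    """Return {metric_name: [(labels_dict, value), ...]} from exposition text."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank, HELP and TYPE lines
            continue
        name_part, _, value = line.rpartition(" ")
        labels = {}
        if "{" in name_part:
            name, _, label_str = name_part.partition("{")
            # naive split: assumes no commas inside quoted label values
            for item in label_str.rstrip("}").split(","):
                if item:
                    key, _, val = item.partition("=")
                    labels[key] = val.strip('"')
        else:
            name = name_part
        metrics.setdefault(name, []).append((labels, float(value)))
    return metrics

sample = """# HELP ovms_requests_success Number of successful requests
# TYPE ovms_requests_success counter
ovms_requests_success{api="KServe",interface="gRPC",method="ModelInfer",name="resnet"} 42
"""
parsed = parse_metrics(sample)
```

In practice, a Prometheus server scrapes this endpoint directly, so manual parsing like this is mainly useful for quick ad-hoc checks.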
diff --git a/docs/model_server_c_api.md b/docs/model_server_c_api.md
index eba4dab99e..45b396b50e 100644
--- a/docs/model_server_c_api.md
+++ b/docs/model_server_c_api.md
@@ -19,7 +19,7 @@ With OpenVINO Model Server 2023.1 release C-API is no longer in preview state an
## API Description
-Server functionalities are encapsulated in shared library built from OpenVINO Model Server source. To include OpenVINO Model Server you need to link this library with your application and use C API defined in [header file](https://github.com/openvinotoolkit/model_server/blob/main/src/ovms.h).
+Server functionalities are encapsulated in a shared library built from the OpenVINO Model Server source. To include OpenVINO Model Server, you need to link this library with your application and use the C API defined in the [header file](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/src/ovms.h).
Calling a method to start the model serving in your application initiates OpenVINO Model Server as a separate thread. Then you can schedule inference both directly from the app using the C API and via the gRPC/HTTP endpoints.
diff --git a/docs/model_server_grpc_api_kfs.md b/docs/model_server_grpc_api_kfs.md
index 2ed4b0390e..0fd03b60b5 100644
--- a/docs/model_server_grpc_api_kfs.md
+++ b/docs/model_server_grpc_api_kfs.md
The API includes the following endpoints:
* [Inference API](#inference-api)
* [Streaming Inference API](#streaming-inference-api-extension)
-> **NOTE**: Examples of using each of above endpoints can be found in [KServe samples](https://github.com/openvinotoolkit/model_server/tree/main/client/python/kserve-api/samples/README.md).
+> **NOTE**: Examples of using each of the above endpoints can be found in [KServe samples](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/client/python/kserve-api/samples/README.md).
## Server Live API
@@ -57,7 +57,7 @@ Check documentation for more [details](./streaming_endpoints.md).
## See Also
-- [Example client code](https://github.com/openvinotoolkit/model_server/tree/main/client/python/kserve-api/samples/README.md) shows how to use GRPC API and REST API.
+- [Example client code](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/client/python/kserve-api/samples/README.md) shows how to use the gRPC API and REST API.
- [KServe API](https://github.com/kserve/kserve/tree/master/docs/predict-api/v2)
- [gRPC](https://grpc.io/)
diff --git a/docs/model_server_grpc_api_tfs.md b/docs/model_server_grpc_api_tfs.md
index 73c0a91ffc..15faf67902 100644
--- a/docs/model_server_grpc_api_tfs.md
+++ b/docs/model_server_grpc_api_tfs.md
@@ -18,7 +18,7 @@ Gets information about the status of served models including Model Version
[Get Model Status proto](https://github.com/tensorflow/serving/blob/master/tensorflow_serving/apis/get_model_status.proto) defines three message definitions used while calling the Status endpoint: *GetModelStatusRequest*, *ModelVersionStatus*, and *GetModelStatusResponse*, which are used to report all exposed versions, including their state in the lifecycle.
- Read more about [Get Model Status API usage](https://github.com/openvinotoolkit/model_server/blob/main/client/python/tensorflow-serving-api/samples/README.md#model-status-api).
+ Read more about [Get Model Status API usage](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/client/python/tensorflow-serving-api/samples/README.md#model-status-api).
## Model Metadata API
@@ -27,7 +27,7 @@ Gets information about the served models. A function called GetModelMetadata acc
[Get Model Metadata proto](https://github.com/tensorflow/serving/blob/master/tensorflow_serving/apis/get_model_metadata.proto) has three message definitions: *SignatureDefMap*, *GetModelMetadataRequest*, *GetModelMetadataResponse*.
-Read more about [Get Model Metadata API usage](https://github.com/openvinotoolkit/model_server/blob/main/client/python/tensorflow-serving-api/samples/README.md#model-metadata-api).
+Read more about [Get Model Metadata API usage](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/client/python/tensorflow-serving-api/samples/README.md#model-metadata-api).
## Predict API
@@ -40,13 +40,13 @@ Endpoint for running an inference with loaded models or [DAGs](./dag_scheduler.m
* *PredictResponse* includes a map of outputs serialized by
[TensorProto](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/tensor.proto) and information about the used model spec.
-Read more about [Predict API usage](https://github.com/openvinotoolkit/model_server/blob/main/client/python/tensorflow-serving-api/samples/README.md#predict-api)
+Read more about [Predict API usage](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/client/python/tensorflow-serving-api/samples/README.md#predict-api)
Also, using the `string_val` field, it is possible to send binary-encoded images that are preprocessed by OVMS using OpenCV and converted to an OpenVINO-friendly format. For more information, check [how binary data is handled in OpenVINO Model Server](./binary_input_tfs.md)
## See Also
-- [Example client code](https://github.com/openvinotoolkit/model_server/blob/main/client/python/tensorflow-serving-api/samples/README.md) shows how to use GRPC API and REST API.
+- [Example client code](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/client/python/tensorflow-serving-api/samples/README.md) shows how to use the gRPC API and REST API.
- [TensorFlow Serving](https://github.com/tensorflow/serving)
- [gRPC](https://grpc.io/)
diff --git a/docs/model_server_rest_api_kfs.md b/docs/model_server_rest_api_kfs.md
index 3a46c14c79..ad811e825a 100644
--- a/docs/model_server_rest_api_kfs.md
+++ b/docs/model_server_rest_api_kfs.md
@@ -36,7 +36,7 @@ Date: Tue, 09 Aug 2022 09:20:24 GMT
Content-Length: 2
```
-See also [code samples](https://github.com/openvinotoolkit/model_server/tree/main/client/python/kserve-api/samples) for getting server liveness with KServe API on HTTP Server Live endpoint.
+See also [code samples](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/client/python/kserve-api/samples) for getting server liveness with KServe API on HTTP Server Live endpoint.
## Server Ready API
**Description**
@@ -63,7 +63,7 @@ Date: Tue, 09 Aug 2022 09:22:14 GMT
Content-Length: 2
```
-See also [code samples](https://github.com/openvinotoolkit/model_server/tree/main/client/python/kserve-api/samples) for getting server readiness with KServe API on HTTP Server Ready endpoint.
+See also [code samples](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/client/python/kserve-api/samples) for getting server readiness with KServe API on HTTP Server Ready endpoint.
## Server Metadata API
**Description**
@@ -103,7 +103,7 @@ $ curl http://localhost:5000/v2
For a detailed description of the response contents, see [KServe API docs](https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/required_api.md#server-metadata).
-See also [code samples](https://github.com/openvinotoolkit/model_server/tree/main/client/python/kserve-api/samples) for getting server metadata with KServe API on HTTP Server Metadata endpoint.
+See also [code samples](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/client/python/kserve-api/samples) for getting server metadata with KServe API on HTTP Server Metadata endpoint.
## Model Ready API
**Description**
@@ -130,7 +130,7 @@ Date: Tue, 09 Aug 2022 09:25:31 GMT
Content-Length: 2
```
-See also [code samples](https://github.com/openvinotoolkit/model_server/tree/main/client/python/kserve-api/samples) for getting model readiness with KServe API on HTTP Model Ready endpoint.
+See also [code samples](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/client/python/kserve-api/samples) for getting model readiness with KServe API on HTTP Model Ready endpoint.
@@ -206,7 +206,7 @@ $ curl http://localhost:8000/v2/models/resnet
For a detailed description of the response contents, see [KServe API docs](https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/required_api.md#model-metadata).
-See also [code samples](https://github.com/openvinotoolkit/model_server/tree/main/client/python/kserve-api/samples) for running getting model metadata with KServe API on HTTP Model Metadata endpoint.
+See also [code samples](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/client/python/kserve-api/samples) for getting model metadata with KServe API on HTTP Model Metadata endpoint.
## Inference API
**Description**
@@ -373,4 +373,4 @@ For detailed description of request and response contents see [KServe API docs](
> Note: Using //.. at the end of the request URI results in a truncated path, which might produce a different response than expected.
-See also [code samples](https://github.com/openvinotoolkit/model_server/tree/main/client/python/kserve-api/samples) for running inference with KServe API on HTTP Inference endpoint.
+See also [code samples](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/client/python/kserve-api/samples) for running inference with KServe API on HTTP Inference endpoint.
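As a quick illustration of the HTTP Inference endpoint referenced in the patch above, a request body following the KServe v2 REST schema can be assembled as sketched below (the model name, input name, shape, and port are placeholders, not values from this repository):

```python
import json

# Sketch: build a KServe v2 REST inference request body.
# Input name, shape, and data here are placeholders for illustration.
def build_infer_request(input_name, shape, data, datatype="FP32"):
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": shape,
                "datatype": datatype,
                "data": data,  # flat list of values in row-major order
            }
        ]
    }

body = build_infer_request("input0", [1, 4], [0.1, 0.2, 0.3, 0.4])
payload = json.dumps(body)
# The payload would be POSTed to an endpoint of the form:
#   http://localhost:8000/v2/models/<model_name>/infer
```

The linked code samples show complete clients; this fragment only demonstrates the shape of the request body.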
diff --git a/docs/model_server_rest_api_tfs.md b/docs/model_server_rest_api_tfs.md
index 315256acb8..3d83f6857b 100644
--- a/docs/model_server_rest_api_tfs.md
+++ b/docs/model_server_rest_api_tfs.md
@@ -65,7 +65,7 @@ $ curl http://localhost:8001/v1/models/person-detection/versions/1
]
}
```
-Read more about [Get Model Status API usage](https://github.com/openvinotoolkit/model_server/blob/main/client/python/tensorflow-serving-api/samples/README.md#model-status-api-1)
+Read more about [Get Model Status API usage](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/client/python/tensorflow-serving-api/samples/README.md#model-status-api-1)
## Model Metadata API
**Description**
@@ -148,7 +148,7 @@ $ curl http://localhost:8001/v1/models/person-detection/versions/1/metadata
}
```
-Read more about [Get Model Metadata API usage](https://github.com/openvinotoolkit/model_server/blob/main/client/python/tensorflow-serving-api/samples/README.md#model-metadata-api-1)
+Read more about [Get Model Metadata API usage](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/client/python/tensorflow-serving-api/samples/README.md#model-metadata-api-1)
## Predict API
**Description**
@@ -212,7 +212,7 @@ On the server side, the binary encoded data is loaded using OpenCV which then co
Check [how binary data is handled in OpenVINO Model Server](./binary_input.md) for more information.
-Read more about [Predict API usage](https://github.com/openvinotoolkit/model_server/blob/main/client/python/tensorflow-serving-api/samples/README.md#predict-api-1)
+Read more about [Predict API usage](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/client/python/tensorflow-serving-api/samples/README.md#predict-api-1)
## Config Reload API
**Description**
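For the Predict API usage referenced in the patch above, a TensorFlow Serving REST request in the "row" format can be sketched as follows (the signature name shown is the TFS default; instance values and the endpoint path parameters are placeholders):

```python
import json

# Sketch of a TensorFlow Serving REST Predict request body in "row" format,
# which would be POSTed to a URL of the form:
#   http://localhost:8001/v1/models/<model_name>/versions/<version>:predict
# Instance values below are placeholders for illustration.
request_body = {
    "signature_name": "serving_default",
    "instances": [
        [0.0, 1.0, 2.0],  # one instance; nesting follows the input shape
    ],
}
payload = json.dumps(request_body)
```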
diff --git a/docs/ovms_quickstart.md b/docs/ovms_quickstart.md
index dcc54e98f4..c6da821ebc 100644
--- a/docs/ovms_quickstart.md
+++ b/docs/ovms_quickstart.md
@@ -95,8 +95,8 @@ ovms --model_name faster_rcnn --model_path model --port 9000
Client scripts are available for quick access to the Model Server. Run an example command to download all required components:
```console
-wget https://raw.githubusercontent.com/openvinotoolkit/model_server/main/demos/object_detection/python/object_detection.py
-wget https://raw.githubusercontent.com/openvinotoolkit/model_server/main/demos/object_detection/python/requirements.txt
+wget https://raw.githubusercontent.com/openvinotoolkit/model_server/releases/2026/1/demos/object_detection/python/object_detection.py
+wget https://raw.githubusercontent.com/openvinotoolkit/model_server/releases/2026/1/demos/object_detection/python/requirements.txt
wget https://raw.githubusercontent.com/openvinotoolkit/open_model_zoo/master/data/dataset_classes/coco_91cl.txt
```
diff --git a/docs/pull_hf_models.md b/docs/pull_hf_models.md
index d39653be6a..2c093332a4 100644
--- a/docs/pull_hf_models.md
+++ b/docs/pull_hf_models.md
@@ -4,7 +4,7 @@ This document describes how to leverage OpenVINO Model Server (OVMS) pull featur
- pulling pre-configured models in IR format (described below)
- pulling GGUF models from Hugging Face
-- pulling models with automatic conversion and quantization via optimum-cli. Described in the [pulling with conversion](https://github.com/openvinotoolkit/model_server/blob/main/docs/pull_optimum_cli.md)
+- pulling models with automatic conversion and quantization via optimum-cli, described in [pulling with conversion](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/docs/pull_optimum_cli.md)
> **Note:** Models in IR format must be exported using `optimum-cli`, including tokenizer and detokenizer files also in IR format, if applicable. If missing, the tokenizer and detokenizer should be added using the `convert_tokenizer --with-detokenizer` tool.
diff --git a/docs/pull_optimum_cli.md b/docs/pull_optimum_cli.md
index 0c046d3932..f226a8075f 100644
--- a/docs/pull_optimum_cli.md
+++ b/docs/pull_optimum_cli.md
@@ -18,7 +18,7 @@ mkdir models
curl -L https://github.com/openvinotoolkit/model_server/releases/download/v2026.0/ovms_windows_python_on.zip -o ovms.zip
tar -xf ovms.zip
ovms\setupvars.bat
-ovms\python\python -m pip install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/requirements.txt
+ovms\python\python -m pip install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2026/1/demos/common/export_models/requirements.txt
```
Then use the ovms CLI commands described in the `Pulling the models` section.
diff --git a/docs/python_support/reference.md b/docs/python_support/reference.md
index 7bb5e82a2d..85c4ba1d0b 100644
--- a/docs/python_support/reference.md
+++ b/docs/python_support/reference.md
@@ -25,7 +25,7 @@ RUN pip3 install numpy
ENTRYPOINT [ "/ovms/bin/ovms" ]
```
-You can also modify `requirements.txt` from our [python demos](https://github.com/openvinotoolkit/model_server/tree/main/demos/python_demos) and from repository top level directory run `make python_image`
+You can also modify `requirements.txt` from our [python demos](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/demos/python_demos) and, from the repository top-level directory, run `make python_image`.
## `OvmsPythonModel` class
@@ -1015,4 +1015,4 @@ node {
}
```
-See a [CLIP demo](https://github.com/openvinotoolkit/model_server/tree/main/demos/python_demos/clip_image_classification) for a complete example of a graph that uses Python nodes, OV Inference nodes and converter nodes.
+See a [CLIP demo](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/demos/python_demos/clip_image_classification) for a complete example of a graph that uses Python nodes, OV Inference nodes and converter nodes.
diff --git a/docs/starting_server.md b/docs/starting_server.md
index f9d35a96e6..980518a3ca 100644
--- a/docs/starting_server.md
+++ b/docs/starting_server.md
@@ -75,7 +75,7 @@ The required Model Server parameters are listed below. For additional configurat
### Starting the GenAI model from Hugging Face directly
-For models outside of OpenVINO organization follow the additional prerequisites described here: [Ovms pull mode](https://github.com/openvinotoolkit/model_server/blob/main/docs/pull_optimum_cli.md)
+For models outside of the OpenVINO organization, follow the additional prerequisites described here: [OVMS pull mode](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/docs/pull_optimum_cli.md)
In case you do not want to prepare a model repository before starting the server and you want to serve a model from [HuggingFace](https://huggingface.co/), you can just run OVMS with:
diff --git a/extras/nginx-mtls-auth/get_model.sh b/extras/nginx-mtls-auth/get_model.sh
index b68049c6b3..ceb721066a 100755
--- a/extras/nginx-mtls-auth/get_model.sh
+++ b/extras/nginx-mtls-auth/get_model.sh
@@ -17,7 +17,7 @@
curl --create-dirs https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/2/face-detection-retail-0004/FP32/face-detection-retail-0004.xml https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/2/face-detection-retail-0004/FP32/face-detection-retail-0004.bin -o model/face-detection-retail-0004.xml -o model/face-detection-retail-0004.bin
-curl --fail --create-dirs https://raw.githubusercontent.com/openvinotoolkit/model_server/main/demos/common/static/images/people/people1.jpeg -o images/people1.jpeg
+curl --fail --create-dirs https://raw.githubusercontent.com/openvinotoolkit/model_server/releases/2026/1/demos/common/static/images/people/people1.jpeg -o images/people1.jpeg
chmod 666 -vR ./images/ ./model/
chmod +x ./images/ ./model/
diff --git a/src/custom_nodes/east_ocr/README.md b/src/custom_nodes/east_ocr/README.md
index a0e9b0c5f0..147889c7ad 100644
--- a/src/custom_nodes/east_ocr/README.md
+++ b/src/custom_nodes/east_ocr/README.md
@@ -7,7 +7,7 @@ DAG pipeline.
In addition to the detected text boxes, two additional outputs return their coordinates, with information about geometry
and confidence levels for the filtered list of detections.
-**NOTE** Exemplary [configuration file](https://github.com/openvinotoolkit/model_server/blob/main/demos/optical_character_recognition/python/config.json) is available in [optical character recognition demo](https://github.com/openvinotoolkit/model_server/blob/main/demos/optical_character_recognition/python/).
+**NOTE** An example [configuration file](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/demos/optical_character_recognition/python/config.json) is available in the [optical character recognition demo](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/demos/optical_character_recognition/python/).
# Building custom node library
diff --git a/src/custom_nodes/face_blur/README.md b/src/custom_nodes/face_blur/README.md
index 21afb00c5d..aa90545a91 100644
--- a/src/custom_nodes/face_blur/README.md
+++ b/src/custom_nodes/face_blur/README.md
@@ -18,7 +18,7 @@ All [OpenVINO Model Zoo](https://github.com/openvinotoolkit/open_model_zoo/tree/
- vehicle-license-plate-detection
- pedestrian-and-vehicle-detector
-**NOTE** Exemplary [configuration file](https://github.com/openvinotoolkit/model_server/blob/main/demos/face_blur/python/config.json) is available in [face_blur demo](https://github.com/openvinotoolkit/model_server/blob/main/demos/face_blur/python/).
+**NOTE** An example [configuration file](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/demos/face_blur/python/config.json) is available in the [face_blur demo](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/demos/face_blur/python/).
# Building custom node library
@@ -48,7 +48,7 @@ make BASE_OS=redhat NODES=face_blur
| image | Returns blurred image in place of detected boxes. Boxes are filtered based on confidence_threshold param. Resolution is defined by the node parameters. | `N,C,H,W` | FP32 |
# Custom node parameters
-Parameters can be defined in pipeline definition in OVMS configuration file. [Read more](https://github.com/openvinotoolkit/model_server/blob/main/docs/custom_node_development.md) about node parameters.
+Parameters can be defined in the pipeline definition in the OVMS configuration file. [Read more](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/docs/custom_node_development.md) about node parameters.
| Parameter | Description | Default | Required |
| ------------- | ------------- | ------------- | ------------ |
| original_image_width | Required input image width | | ✓ |
diff --git a/src/custom_nodes/horizontal_ocr/README.md b/src/custom_nodes/horizontal_ocr/README.md
index 85ae4e2b95..d374413de0 100644
--- a/src/custom_nodes/horizontal_ocr/README.md
+++ b/src/custom_nodes/horizontal_ocr/README.md
@@ -8,7 +8,7 @@ Additionally to the detected text boxes, in the two additional outputs are retur
This custom node can be used to process video frames via [camera example](../../../demos/horizontal_text_detection/python/README.md).
-**NOTE** Exemplary [configuration file](https://github.com/openvinotoolkit/model_server/blob/main/demos/horizontal_text_detection/python/config.json) is available in [demo with camera](https://github.com/openvinotoolkit/model_server/blob/main/demos/horizontal_text_detection/python/).
+**NOTE** An example [configuration file](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/demos/horizontal_text_detection/python/config.json) is available in the [demo with camera](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/demos/horizontal_text_detection/python/).
# Building custom node library
diff --git a/src/custom_nodes/image_transformation/README.md b/src/custom_nodes/image_transformation/README.md
index 0ce9cb633c..b520bbe8c6 100644
--- a/src/custom_nodes/image_transformation/README.md
+++ b/src/custom_nodes/image_transformation/README.md
@@ -9,7 +9,7 @@ This custom node takes image with dynamic shape (color, width, height) as an inp
Note that this node uses OpenCV for processing, so for good performance it prefers the NHWC layout.
Otherwise, a conversion applies, which reduces the performance of this node.
-**NOTE** Exemplary configuration files are available in [onnx model with server preprocessing demo](https://github.com/openvinotoolkit/model_server/tree/main/demos/using_onnx_model/python) and [config with single node](example_config.json).
+**NOTE** Example configuration files are available in the [onnx model with server preprocessing demo](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/demos/using_onnx_model/python) and in a [config with single node](example_config.json).
# Building custom node library
diff --git a/src/custom_nodes/model_zoo_intel_object_detection/README.md b/src/custom_nodes/model_zoo_intel_object_detection/README.md
index 556b858c15..03c4cda4c6 100644
--- a/src/custom_nodes/model_zoo_intel_object_detection/README.md
+++ b/src/custom_nodes/model_zoo_intel_object_detection/README.md
@@ -25,7 +25,7 @@ All [OpenVINO Model Zoo](https://github.com/openvinotoolkit/open_model_zoo/tree/
Public [OpenVINO Model Zoo](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public) object detection models with output tensor shape: `[1, 1, 100, 7]`:
- ssdlite_mobilenet_v2
-**NOTE** Exemplary configuration files are available in [vehicle analysis pipeline demo](https://github.com/openvinotoolkit/model_server/blob/main/demos/horizontal_text_detection/python/config.json) and [multiple faces analysis demo](https://github.com/openvinotoolkit/model_server/blob/main/demos/multi_faces_analysis_pipeline/python/config.json).
+**NOTE** Example configuration files are available in the [vehicle analysis pipeline demo](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/demos/horizontal_text_detection/python/config.json) and the [multiple faces analysis demo](https://github.com/openvinotoolkit/model_server/blob/releases/2026/1/demos/multi_faces_analysis_pipeline/python/config.json).
# Building custom node library
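The `[1, 1, 100, 7]` output mentioned in the patch above follows the common SSD-style detection layout, where each of the rows holds `[image_id, label, confidence, x_min, y_min, x_max, y_max]` with coordinates normalized to [0, 1]. As a minimal sketch (detection values below are made up for illustration), filtering such an output by confidence looks like:

```python
# Minimal sketch: filter SSD-style detections shaped [1, 1, N, 7], where each
# row is [image_id, label, confidence, x_min, y_min, x_max, y_max] with
# normalized coordinates. Values below are made up for illustration.
def filter_detections(output, confidence_threshold=0.5):
    """Return rows from output[0][0] whose confidence exceeds the threshold."""
    return [det for det in output[0][0] if det[2] > confidence_threshold]

output = [[[  # shape [1, 1, 2, 7]
    [0.0, 1.0, 0.92, 0.10, 0.20, 0.30, 0.40],
    [0.0, 1.0, 0.15, 0.50, 0.50, 0.60, 0.60],
]]]
kept = filter_detections(output, confidence_threshold=0.5)
```

The custom node itself performs this filtering in C++ via its `confidence_threshold` parameter; this fragment only illustrates the tensor layout.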
diff --git a/src/example/SampleCpuExtension/README.md b/src/example/SampleCpuExtension/README.md
index 2b6cf14970..ed6f112de8 100644
--- a/src/example/SampleCpuExtension/README.md
+++ b/src/example/SampleCpuExtension/README.md
@@ -8,7 +8,7 @@ custom extension execution.
## Creating cpu_extension library
-Compile the library by running `make cpu_extension BASE_OS=ubuntu` in root directory of [Model Server repository](https://github.com/openvinotoolkit/model_server/tree/main). The implementation of this library slightly differs from the template in OpenVINO™ repository and can be found in [SampleCpuExtension directory](https://github.com/openvinotoolkit/model_server/tree/main/src/example/SampleCpuExtension).
+Compile the library by running `make cpu_extension BASE_OS=ubuntu` in the root directory of the [Model Server repository](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1). The implementation of this library slightly differs from the template in the OpenVINO™ repository and can be found in the [SampleCpuExtension directory](https://github.com/openvinotoolkit/model_server/tree/releases/2026/1/src/example/SampleCpuExtension).
The shared library will be generated in the `lib` folder. It can then be used to run Model Server with the `--cpu_extension` argument.