2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/config.yml
@@ -1,4 +1,4 @@
# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0

blank_issues_enabled: false
2 changes: 1 addition & 1 deletion .github/dependabot.yml
@@ -1,4 +1,4 @@
# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0

# To get started with Dependabot version updates, you'll need to specify which
2 changes: 1 addition & 1 deletion .github/workflows/autoupdate.yml
@@ -1,4 +1,4 @@
# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0

name: autoupdate
2 changes: 1 addition & 1 deletion .github/workflows/codacy.yml
@@ -1,4 +1,4 @@
# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0

# This workflow uses actions that are not certified by GitHub.
2 changes: 1 addition & 1 deletion .github/workflows/dependabot.autoapprove.yml
@@ -1,4 +1,4 @@
# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0

name: Auto approve
2 changes: 1 addition & 1 deletion .github/workflows/dependabot.automerge.yml
@@ -1,4 +1,4 @@
# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0

name: automerge
2 changes: 1 addition & 1 deletion .github/workflows/format.yml
@@ -1,4 +1,4 @@
# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0

name: Google Java Format
2 changes: 1 addition & 1 deletion .github/workflows/pr-title-checker.yml
@@ -1,4 +1,4 @@
# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0

name: "PR Title Checker"
2 changes: 1 addition & 1 deletion .gitlab-ci.yml
@@ -1,4 +1,4 @@
# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0

# default image used by all pipelines if not overridden per job
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -1,4 +1,4 @@
# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0

repos:
2 changes: 1 addition & 1 deletion cicd/.gitlab-ci-lib.yml
@@ -1,4 +1,4 @@
# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0

# This is a shared library with common util gitlab "jobs" that can be reused across any project
2 changes: 1 addition & 1 deletion cicd/build.sh
@@ -1,5 +1,5 @@
#!/bin/bash -e
# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" >/dev/null 2>&1 && pwd)"
2 changes: 1 addition & 1 deletion cicd/docker_parse.sh
@@ -1,6 +1,6 @@
#!/bin/bash

# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0

# Parses a Docker image name and extracts information such as the registry, username, repository, and tag.
2 changes: 1 addition & 1 deletion cicd/docker_push_vdk.sh
@@ -1,6 +1,6 @@
#!/usr/bin/env bash

# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0

# This Bash script provides a utility to push a containers image to multiple container registries.
2 changes: 1 addition & 1 deletion cicd/notify.sh
@@ -1,5 +1,5 @@
#!/bin/bash -e
# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0

curl -X POST -H "Content-type: application/json" --data "{\"text\":\"$1\"}" $2
@@ -1,4 +1,4 @@
# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0
from vdk.api.job_input import IJobInput

@@ -1,4 +1,4 @@
# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0
import json
import pathlib
@@ -1,4 +1,4 @@
# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0
import json
import pathlib
2 changes: 1 addition & 1 deletion examples/airflow-example/dags/airflow_example_dag.py
@@ -1,4 +1,4 @@
# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0
from datetime import datetime

2 changes: 1 addition & 1 deletion examples/chunker/00_properties.py
@@ -1,4 +1,4 @@
# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0
# Copyright 2021-2024 VMware, Inc.
# SPDX-License-Identifier: Apache-2.0
2 changes: 1 addition & 1 deletion examples/chunker/10_chunk_data.py
@@ -1,4 +1,4 @@
# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0
# Copyright 2021-2024 VMware, Inc.
# SPDX-License-Identifier: Apache-2.0
2 changes: 1 addition & 1 deletion examples/chunker/config.py
@@ -1,4 +1,4 @@
# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0
# Copyright 2021-2024 VMware, Inc.
# SPDX-License-Identifier: Apache-2.0
@@ -1,4 +1,4 @@
# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0
import logging
import os
@@ -1,4 +1,4 @@
# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0
import logging
import os
2 changes: 1 addition & 1 deletion examples/confluence-reader/confluence_document.py
@@ -1,4 +1,4 @@
# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0


2 changes: 1 addition & 1 deletion examples/confluence-reader/fetch_confluence_space.py
@@ -1,4 +1,4 @@
# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0
import logging
import os
2 changes: 1 addition & 1 deletion examples/dag-example/example-dag/example_dag.py
@@ -1,4 +1,4 @@
# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0
from vdk.plugin.dag.dag_runner import DagInput

2 changes: 1 addition & 1 deletion examples/dag-example/ingest-job1/10_insert_data.py
@@ -1,4 +1,4 @@
# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0
import json
import pathlib
2 changes: 1 addition & 1 deletion examples/dag-example/ingest-job2/10_insert_data.py
@@ -1,4 +1,4 @@
# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0
import json
import pathlib
2 changes: 1 addition & 1 deletion examples/dag-example/read-data-job/10_read.py
@@ -1,4 +1,4 @@
# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0
from vdk.api.job_input import IJobInput

2 changes: 1 addition & 1 deletion examples/dag-with-args-example/dag-job/dag_job.py
@@ -1,4 +1,4 @@
# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0
from vdk.plugin.dag.dag_runner import DagInput

@@ -1,4 +1,4 @@
# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0
import json
import pathlib
@@ -1,4 +1,4 @@
# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0
import json
import pathlib
2 changes: 1 addition & 1 deletion examples/dag-with-args-example/read-job-canada/10_read.py
@@ -1,4 +1,4 @@
# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0
from vdk.api.job_input import IJobInput

@@ -1,4 +1,4 @@
# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0
from vdk.api.job_input import IJobInput

2 changes: 1 addition & 1 deletion examples/dag-with-args-example/read-job-usa/10_read.py
@@ -1,4 +1,4 @@
# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0
from vdk.api.job_input import IJobInput

2 changes: 1 addition & 1 deletion examples/energy/dashboard.py
@@ -1,4 +1,4 @@
# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0
import math

@@ -1,4 +1,4 @@
# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0
import inspect
import logging
@@ -1,4 +1,4 @@
# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0
import inspect
import logging
2 changes: 1 addition & 1 deletion examples/gdp-execution-id-example/20_python_step.py
@@ -1,4 +1,4 @@
# Copyright 2023-2025 Broadcom
# Copyright 2023-2026 Broadcom
# SPDX-License-Identifier: Apache-2.0
import logging

74 changes: 74 additions & 0 deletions examples/incremental-ingest-from-db-example-notebook/README.md
@@ -0,0 +1,74 @@
# Incremental Ingestion using Job Properties - Colab Notebook

This folder contains a Google Colab notebook that demonstrates incremental ingestion from a local SQLite database using Versatile Data Kit (VDK).

## Overview

The notebook provides an interactive tutorial on how to:
- Use VDK's job properties to track state between job runs
- Perform incremental ingestion (only ingesting new/changed records)
- Configure VDK for SQLite database connections
- Use VDK's IPython extension in a notebook environment

## Contents

- `incremental-ingest-example-notebook.ipynb` - The Google Colab notebook with step-by-step instructions

## Usage

### Option 1: Run on Google Colab (Recommended)

Click the "Open in Colab" badge below (the same badge appears at the top of the notebook) to run it directly in Google Colab without any local setup.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vmware/versatile-data-kit/blob/main/examples/incremental-ingest-from-db-example-notebook/incremental-ingest-example-notebook.ipynb)

### Option 2: Run Locally

1. Install dependencies:

   ```bash
   pip install vdk-ipython vdk-sqlite vdk-ingest-http
   ```

2. Open the notebook in Jupyter:

   ```bash
   jupyter notebook incremental-ingest-example-notebook.ipynb
   ```

## What You'll Learn

1. How to set up a source SQLite database with sample data
2. How to configure VDK environment variables for database connections
3. How to use the `%%vdksql` magic command for SQL operations
4. How to implement incremental ingestion using job properties
5. How to use `job_input.get_property()` and `job_input.set_all_properties()`
6. How to ingest tabular data using `send_tabular_data_for_ingestion()`

## Key Concepts

### Incremental Ingestion

Incremental ingestion is a pattern where only new or changed records are processed, rather than reloading the entire dataset each time. This is achieved by:

- Storing a "watermark" (like the last processed date) in job properties
- Querying only records newer than the watermark
- Updating the watermark after successful ingestion
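
The three steps above can be sketched with only the Python standard library. This is a minimal illustration, not the notebook's code: the table `source_table`, the column `reported_date`, the property key `last_date`, and the helper names are all hypothetical, a plain dict stands in for VDK job properties, and a plain list stands in for the ingestion target.

```python
import sqlite3


def fetch_new_rows(conn, watermark):
    """Return rows whose reported_date is strictly newer than the watermark."""
    cursor = conn.execute(
        "SELECT id, reported_date FROM source_table "
        "WHERE reported_date > ? ORDER BY reported_date",
        (watermark,),
    )
    return cursor.fetchall()


def incremental_run(conn, properties, sink):
    """One run: read past the watermark, ingest, then advance the watermark."""
    # Step 1: read the stored watermark (hypothetical key and default).
    watermark = properties.get("last_date", "1900-01-01")
    # Step 2: query only records newer than the watermark.
    rows = fetch_new_rows(conn, watermark)
    if rows:
        sink.extend(rows)  # stand-in for the real ingestion call
        # Step 3: advance the watermark only after ingestion succeeded.
        properties["last_date"] = max(row[1] for row in rows)
    return len(rows)


# Hypothetical source data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_table (id INTEGER, reported_date TEXT)")
conn.executemany(
    "INSERT INTO source_table VALUES (?, ?)",
    [(1, "2024-01-01"), (2, "2024-01-02")],
)

properties = {}  # plain dict standing in for VDK job properties
sink = []        # plain list standing in for the ingestion target

first = incremental_run(conn, properties, sink)   # both rows are new
second = incremental_run(conn, properties, sink)  # nothing newer than the watermark
conn.execute("INSERT INTO source_table VALUES (3, '2024-01-03')")
third = incremental_run(conn, properties, sink)   # only the newly added row
```

The sketch stores dates as ISO 8601 strings, which sort correctly as text; a watermark column in another format would need a proper date comparison.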

### Job Properties

VDK provides a key-value store for persisting state between job runs:

- `get_property(key, default_value)` - Retrieve a property value
- `set_all_properties(dict)` - Store multiple property values
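
As a rough sketch of how these two calls might fit together in a job step, the example below reads the watermark, then writes it back. `InMemoryJobInput` is a hypothetical stand-in so the snippet runs without VDK installed; it is not VDK's implementation. The sketch also uses `get_all_properties()` and assumes that `set_all_properties()` overwrites the whole property set, so existing keys are read first and passed back. The key `last_date` and the placeholder date are illustrative.

```python
class InMemoryJobInput:
    """Hypothetical stand-in exposing only the property methods used here."""

    def __init__(self):
        self._properties = {}

    def get_property(self, name, default_value=None):
        return self._properties.get(name, default_value)

    def get_all_properties(self):
        return dict(self._properties)

    def set_all_properties(self, properties):
        # Assumed behaviour: the stored set is replaced as a whole.
        self._properties = dict(properties)


def run(job_input):
    # Read the watermark, falling back to a default on the very first run.
    last_date = job_input.get_property("last_date", "1900-01-01")

    new_last_date = "2024-01-02"  # placeholder for the newest date seen in this run

    # Read-modify-write so that unrelated properties are preserved.
    props = job_input.get_all_properties()
    props["last_date"] = max(last_date, new_last_date)
    job_input.set_all_properties(props)
    return last_date


job_input = InMemoryJobInput()
seen_on_first_run = run(job_input)   # default watermark
seen_on_second_run = run(job_input)  # watermark stored by the first run
```

In a real data job, VDK passes its own `job_input` object to `run`, so the stand-in class would not be needed.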

## Related Resources

- [Original Example (File-based)](https://github.com/vmware/versatile-data-kit/tree/main/examples/incremental-ingest-from-db-example)
- [VDK Wiki - Job Properties](https://github.com/vmware/versatile-data-kit/wiki/Job-Properties)
- [VDK Wiki - Getting Started](https://github.com/vmware/versatile-data-kit/wiki/Getting-Started)
- [Issue #3060](https://github.com/vmware/versatile-data-kit/issues/3060) - The issue this notebook addresses

## Contributing

This notebook was created as part of the [Google Colab Notebooks for VDK Examples](https://github.com/vmware/versatile-data-kit/milestone/29) milestone.

See the main [CONTRIBUTING.md](../../CONTRIBUTING.md) for contribution guidelines.