Skip to content

[Fixes #14000] Refact extraMetadata#14286

Open
etj wants to merge 4 commits into
masterfrom
14000_extrametadata_refact
Open

[Fixes #14000] Refact extraMetadata#14286
etj wants to merge 4 commits into
masterfrom
14000_extrametadata_refact

Conversation

@etj

@etj etj commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Checklist

Reviewing is a process done by project maintainers, mostly on a volunteer basis. We try to keep the overhead as small as possible and appreciate if you help us to do so by completing the following items. Feel free to ask in a comment if you have troubles with any of them.

For all pull requests:

  • Confirm you have read the contribution guidelines
  • You have sent a Contribution Licence Agreement (CLA) as necessary (not required for small changes, e.g., fixing typos in the documentation)
  • Make sure the first PR targets the master branch, eventual backports will be managed later. This can be ignored if the PR is fixing an issue that only happens in a specific branch, but not in newer ones.

The following are required only for core and extension modules (they are welcomed, but not required, for contrib modules):

  • There is a ticket in https://github.com/GeoNode/geonode/issues describing the issue/improvement/feature (a notable exemption is, changes not visible to end-users)
  • The issue connected to the PR must have Labels and Milestone assigned
  • PR for bug fixes and small new features are presented as a single commit
  • PR title must be in the form "[Fixes #<issue_number>] Title of the PR"
  • New unit tests have been added covering the changes, unless there is an explanation on why the tests are not necessary/implemented

Submitting the PR does not require you to check all items, but by the time it gets merged, they should be either satisfied or inapplicable.

@etj etj self-assigned this Jun 1, 2026
Copilot AI review requested due to automatic review settings June 1, 2026 15:16
@cla-bot cla-bot Bot added the cla-signed CLA Bot: community license agreement signed label Jun 1, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request deprecates the legacy ExtraMetadata model and migrates its storage to SparseField entries, introducing a backward-compatible adapter to maintain the deprecated /extra_metadata/ endpoint and serializer field. The review feedback highlights several important improvement opportunities in the adapter, such as preventing potential TypeError and AttributeError exceptions when handling non-dictionary metadata, avoiding log pollution from deprecation warnings during bulk serialization, and returning a 400 Bad Request error instead of silently skipping oversized metadata entries.

Comment on lines +96 to +100
try:
metadata = json.loads(sparse_field.value) if sparse_field.value else {}
except (json.JSONDecodeError, TypeError):
metadata = {}
return {**{"id": sparse_field.pk}, **metadata}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If metadata is not a dictionary (e.g., if it is parsed as a list, string, or integer, or if it is None), unpacking it with **metadata will raise a TypeError. We should explicitly verify that metadata is a dictionary before unpacking to prevent potential 500 server errors.

Suggested change
try:
metadata = json.loads(sparse_field.value) if sparse_field.value else {}
except (json.JSONDecodeError, TypeError):
metadata = {}
return {**{"id": sparse_field.pk}, **metadata}
try:
metadata = json.loads(sparse_field.value) if sparse_field.value else {}
except (json.JSONDecodeError, TypeError):
metadata = {}
if not isinstance(metadata, dict):
metadata = {}
return {**{"id": sparse_field.pk}, **metadata}

Comment on lines +212 to +218
for sf in qs:
try:
meta = json.loads(sf.value) if sf.value else {}
except (json.JSONDecodeError, TypeError):
continue
if str(meta.get(key)) == str(value):
filtered.append(sf)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If meta is not a dictionary (e.g., if it is a list or string), calling meta.get(key) will raise an AttributeError, causing a 500 server error. We should verify that meta is a dictionary before attempting to access its keys.

Suggested change
for sf in qs:
try:
meta = json.loads(sf.value) if sf.value else {}
except (json.JSONDecodeError, TypeError):
continue
if str(meta.get(key)) == str(value):
filtered.append(sf)
for sf in qs:
try:
meta = json.loads(sf.value) if sf.value else {}
if not isinstance(meta, dict):
continue
except (json.JSONDecodeError, TypeError):
continue
if str(meta.get(key)) == str(value):
filtered.append(sf)

Comment on lines +149 to +151
def get_attribute(self, instance):
warnings.warn(DEPRECATION_REASON, DeprecationWarning, stacklevel=2)
logger.warning(DEPRECATION_REASON)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Calling logger.warning inside get_attribute will flood the logs on bulk serialization (e.g., when listing resources). It is better to rely on warnings.warn to signal deprecation, avoiding log pollution.

Suggested change
def get_attribute(self, instance):
warnings.warn(DEPRECATION_REASON, DeprecationWarning, stacklevel=2)
logger.warning(DEPRECATION_REASON)
def get_attribute(self, instance):
warnings.warn(DEPRECATION_REASON, DeprecationWarning, stacklevel=2)

Comment on lines +251 to +256
if len(value) > SPARSE_FIELD_VALUE_MAX_LENGTH:
logger.warning(
"extra_metadata entry skipped: serialized value exceeds "
f"{SPARSE_FIELD_VALUE_MAX_LENGTH} characters"
)
continue

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Silently skipping metadata entries that exceed the maximum length can lead to silent data loss and make debugging difficult for API clients. It is highly recommended to return a 400 Bad Request response with an informative error message instead of silently continuing.

Comment on lines +295 to +300
if len(value) > SPARSE_FIELD_VALUE_MAX_LENGTH:
logger.warning(
"extra_metadata entry skipped: serialized value exceeds "
f"{SPARSE_FIELD_VALUE_MAX_LENGTH} characters"
)
continue

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Silently skipping metadata entries that exceed the maximum length can lead to silent data loss and make debugging difficult for API clients. It is highly recommended to return a 400 Bad Request response with an informative error message instead of silently continuing.

This comment was marked as outdated.

@codecov

codecov Bot commented Jun 1, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 71.15987% with 92 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.84%. Comparing base (91c4ec5) to head (4297467).
⚠️ Report is 19 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #14286      +/-   ##
==========================================
- Coverage   74.92%   74.84%   -0.09%     
==========================================
  Files         975      979       +4     
  Lines       59902    60384     +482     
  Branches     8157     8244      +87     
==========================================
+ Hits        44884    45192     +308     
- Misses      13194    13349     +155     
- Partials     1824     1843      +19     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@etj etj requested a review from Copilot June 3, 2026 14:40
@etj etj force-pushed the 14000_extrametadata_refact branch 2 times, most recently from b315c76 to 48e7442 Compare June 3, 2026 14:47

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 6 comments.

Comment on lines +253 to +269
continue
meta_dict.pop("id", None)
value = json.dumps(meta_dict)
if len(value) > SPARSE_FIELD_VALUE_MAX_LENGTH:
return Response(
{"detail": f"serialized value exceeds {SPARSE_FIELD_VALUE_MAX_LENGTH} characters"}, status=400
)
name = _next_sparse_name(resource)
try:
SparseField.objects.create(resource=resource, name=name, value=value)
except IntegrityError:
# Concurrent request created the same name; retry with UUID
name = f"{SPARSE_FIELD_PREFIX}{uuid.uuid4().hex[:12]}"
SparseField.objects.create(resource=resource, name=name, value=value)

result = [_sparse_to_legacy(sf) for sf in _sparse_fields_for_resource(resource)]
return Response(result, status=201)
Comment thread geonode/documents/api/tests.py Outdated
Comment on lines +144 to +145
def test_creation_should_rase_exec_for_unsupported_files(self):
self.client.force_login(self.admin)
Comment thread geonode/documents/api/tests.py Outdated
Comment on lines +610 to +613
def test_creation_from_url_should_create_the_doc(self):
"""
If file_path is not available, should raise error
"""
Comment on lines +268 to +292
for filter_key, filter_value in metadata_filters.items():
# Extract field name from "metadata__fieldname"
if not filter_key.startswith("metadata__"):
continue

found_metadata_filter = True
field_name = filter_key[len("metadata__") :]

# Query SparseFields for this field name
sparse_fields = SparseField.objects.filter(name=field_name)

batch_pks = set()
for sf in sparse_fields:
try:
# Try to parse as JSON if it looks like JSON
stored_value = json.loads(sf.value) if sf.value and sf.value.startswith("{") else sf.value
except (json.JSONDecodeError, TypeError):
stored_value = sf.value

# Compare values (convert to string for consistent comparison)
if str(stored_value) == str(filter_value):
batch_pks.add(sf.resource.pk)

filtered_pks.update(batch_pks)

Comment thread geonode/documents/api/tests.py Outdated
"extension": "jpeg",
}
}
actual = self.client.post(self.url, data=payload, format="json")
Comment on lines +179 to +183
detail=True,
methods=["get", "put", "delete", "post"],
url_path=r"extra_metadata",
url_name="extra-metadata",
)
@etj etj force-pushed the 14000_extrametadata_refact branch 2 times, most recently from 57c60c9 to 98776ad Compare June 3, 2026 16:11
@etj etj requested a review from Copilot June 3, 2026 16:31

This comment was marked as outdated.

@etj etj force-pushed the 14000_extrametadata_refact branch 4 times, most recently from adad877 to 4e975f2 Compare June 5, 2026 15:14
@etj

etj commented Jun 5, 2026

Copy link
Copy Markdown
Contributor Author

/gemini review

gemini-code-assist[bot]

This comment was marked as outdated.

@etj etj force-pushed the 14000_extrametadata_refact branch from 4e975f2 to 4297467 Compare June 5, 2026 16:08
@etj

etj commented Jun 5, 2026

Copy link
Copy Markdown
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request replaces the legacy ExtraMetadata model with SparseField entries to store custom metadata, introducing backward-compatible adapters and a migration script to handle existing data. The code review highlights several critical issues: a logic bug in filter_sparse_fields that changes query filtering from AND to OR logic, potential N+1 query performance bottlenecks in both the sparse field filtering and the deprecated metadata POST endpoint, a prefiltering bug when dealing with boolean or null values, and database performance concerns in the migration script that could be resolved by using batched bulk_create instead of get_or_create in a loop.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +248 to +315
def filter_sparse_fields(self, queryset, metadata_filters):
"""
Filter queryset by sparse field values (metadata custom fields).

Queryset is filtered by interrogating SparseField entries that match
the given metadata filter specifications.

Args:
queryset: ResourceBase queryset to filter
metadata_filters: dict with keys like "metadata__key" and values to match

Returns:
Filtered queryset containing only resources with matching sparse fields
"""
if not metadata_filters:
return queryset

filtered_pks = set()
found_metadata_filter = False

for filter_key, filter_value in metadata_filters.items():
# Extract field name from "metadata__fieldname"
if not filter_key.startswith("metadata__"):
continue

found_metadata_filter = True
field_name = filter_key[len("metadata__") :]

# Text prefilter to reduce the set to deserialize; the actual
# semantic match is performed after json.loads.
sparse_fields = (
SparseField.objects.filter(name__startswith="extra_")
.filter(value__icontains=field_name)
.filter(value__icontains=str(filter_value))
)

batch_pks = set()
for sf in sparse_fields:
try:
# ExtraMetadata was a JSONfield
stored_value = json.loads(sf.value) if sf.value and sf.value.startswith("{") else sf.value
except (json.JSONDecodeError, TypeError):
logger.warning(
f"Bad migrated ExtraMetadata into SparseField: {sf.name} for resource {sf.resource.id}:{sf.resource.title}"
)
continue

if not isinstance(stored_value, dict):
logger.warning(
f"Unexpected non-dict value in SparseField: {sf.name} for resource {sf.resource.id}:{sf.resource.title}"
)
continue

# Compare values in a type-agnostic way.
if str(stored_value.get(field_name, None)) == str(filter_value):
batch_pks.add(sf.resource.pk)

filtered_pks.update(batch_pks)

# Filter by the collected PKs
# If we processed metadata filters, return only resources with matching values
if found_metadata_filter:
filtered = queryset.filter(pk__in=filtered_pks) if filtered_pks else queryset.none()
else:
# No metadata filters found, return queryset as-is
filtered = queryset

return filtered

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There are multiple critical issues in the filter_sparse_fields implementation:

  1. Logic Bug (AND vs OR): The original implementation used Django's .filter(**metadata_filters) which combined multiple metadata filters using AND logic. The new implementation accumulates PKs across all filters into filtered_pks using filtered_pks.update(batch_pks), which effectively changes the logic to OR. This will return incorrect results when filtering by multiple metadata fields.
  2. N+1 Queries: Accessing sf.resource.id and sf.resource.pk inside the loop triggers a database query for each SparseField record because select_related("resource") is not used. Using sf.resource_id directly avoids these queries.
  3. Boolean/None Prefilter Bug: If filter_value is a boolean (e.g., True), str(filter_value) evaluates to "True". However, the JSON string stored in value contains "true" (lowercase). This causes the icontains prefilter to fail to find any records. Booleans and None should be mapped to their JSON string representations ("true", "false", "null").
    def filter_sparse_fields(self, queryset, metadata_filters):
        """
        Filter queryset by sparse field values (metadata custom fields).

        Queryset is filtered by interrogating SparseField entries that match
        the given metadata filter specifications.

        Args:
            queryset: ResourceBase queryset to filter
            metadata_filters: dict with keys like "metadata__key" and values to match

        Returns:
            Filtered queryset containing only resources with matching sparse fields
        """
        if not metadata_filters:
            return queryset

        filtered_pks = None
        found_metadata_filter = False

        for filter_key, filter_value in metadata_filters.items():
            # Extract field name from "metadata__fieldname"
            if not filter_key.startswith("metadata__"):
                continue

            found_metadata_filter = True
            field_name = filter_key[len("metadata__") :]

            if isinstance(filter_value, bool):
                prefilter_val = "true" if filter_value else "false"
            elif filter_value is None:
                prefilter_val = "null"
            else:
                prefilter_val = str(filter_value)

            # Text prefilter to reduce the set to deserialize; the actual
            # semantic match is performed after json.loads.
            sparse_fields = (
                SparseField.objects.filter(name__startswith="extra_")
                .filter(value__icontains=field_name)
                .filter(value__icontains=prefilter_val)
            )

            batch_pks = set()
            for sf in sparse_fields:
                try:
                    # ExtraMetadata was a JSONfield
                    stored_value = json.loads(sf.value) if sf.value and sf.value.startswith("{") else sf.value
                except (json.JSONDecodeError, TypeError):
                    logger.warning(
                        f"Bad migrated ExtraMetadata into SparseField: {sf.name} for resource {sf.resource_id}"
                    )
                    continue

                if not isinstance(stored_value, dict):
                    logger.warning(
                        f"Unexpected non-dict value in SparseField: {sf.name} for resource {sf.resource_id}"
                    )
                    continue

                # Compare values in a type-agnostic way.
                if str(stored_value.get(field_name, None)) == str(filter_value):
                    batch_pks.add(sf.resource_id)

            if filtered_pks is None:
                filtered_pks = batch_pks
            else:
                filtered_pks.intersection_update(batch_pks)

        # Filter by the collected PKs
        # If we processed metadata filters, return only resources with matching values
        if found_metadata_filter:
            filtered = queryset.filter(pk__in=filtered_pks) if filtered_pks else queryset.none()
        else:
            # No metadata filters found, return queryset as-is
            filtered = queryset

        return filtered

Comment on lines +9 to +33
def migrate_extrametadata_to_sparsefields(apps, schema_editor):
"""
Migrate ExtraMetadata to SparseField.
Each ExtraMetadata object (with its JSON dict) is stored as a single
SparseField entry with name 'extra_<pk>' and value as JSON string.
Entries whose serialized JSON exceeds 1024 characters are skipped with a warning.
"""
ExtraMetadata = apps.get_model("base", "ExtraMetadata")
SparseField = apps.get_model("metadata", "SparseField")

for extra_meta in ExtraMetadata.objects.select_related("resource").iterator():
name = f"extra_{extra_meta.pk}"
value = json.dumps(extra_meta.metadata)
if len(value) > 1024:
logger.warning(
f"ExtraMetadata pk={extra_meta.pk} for resource {extra_meta.resource.id}:{extra_meta.resource.title} "
f"skipped during migration to SparseField: "
f"serialized value exceeds 1024 characters"
)
continue
SparseField.objects.get_or_create(
resource=extra_meta.resource,
name=name,
defaults={"value": value},
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using SparseField.objects.get_or_create inside a loop executes a SELECT and potentially an INSERT query for every single ExtraMetadata record. On large production databases with thousands of records, this will cause significant migration delays and potential timeouts. Using bulk_create with batching is highly recommended for performance and efficiency.

def migrate_extrametadata_to_sparsefields(apps, schema_editor):
    """
    Migrate ExtraMetadata to SparseField.
    Each ExtraMetadata object (with its JSON dict) is stored as a single
    SparseField entry with name 'extra_<pk>' and value as JSON string.
    Entries whose serialized JSON exceeds 1024 characters are skipped with a warning.
    """
    ExtraMetadata = apps.get_model("base", "ExtraMetadata")
    SparseField = apps.get_model("metadata", "SparseField")

    to_create = []
    for extra_meta in ExtraMetadata.objects.select_related("resource").iterator():
        name = f"extra_{extra_meta.pk}"
        value = json.dumps(extra_meta.metadata)
        if len(value) > 1024:
            logger.warning(
                f"ExtraMetadata pk={extra_meta.pk} for resource {extra_meta.resource.id}:{extra_meta.resource.title} "
                f"skipped during migration to SparseField: "
                f"serialized value exceeds 1024 characters"
            )
            continue
        to_create.append(
            SparseField(
                resource=extra_meta.resource,
                name=name,
                value=value,
            )
        )
        if len(to_create) >= 1000:
            SparseField.objects.bulk_create(to_create, ignore_conflicts=True)
            to_create = []
    if to_create:
        SparseField.objects.bulk_create(to_create, ignore_conflicts=True)

Comment on lines +107 to +133
def _next_sparse_name(resource):
"""Generate the next available ``extra_<N>`` name.

Falls back to a UUID-based suffix if a name collision occurs (e.g. under
concurrent requests), keeping the ``extra_`` prefix intact.
The numeric part is still preferred for readability.
"""
existing = (
SparseField.objects.filter(
resource=resource,
name__startswith=SPARSE_FIELD_PREFIX,
)
.order_by("-name")
.values_list("name", flat=True)
)
max_n = 0
for name in existing:
suffix = name[len(SPARSE_FIELD_PREFIX) :]
try:
max_n = max(max_n, int(suffix))
except (ValueError, TypeError):
pass
candidate = f"{SPARSE_FIELD_PREFIX}{max_n + 1}"
# Guard against a race: if the candidate already exists, use a UUID suffix
if SparseField.objects.filter(resource=resource, name=candidate).exists():
candidate = f"{SPARSE_FIELD_PREFIX}{uuid.uuid4().hex[:12]}"
return candidate

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To avoid N+1 queries when creating multiple metadata entries, _next_sparse_name should accept an optional list of existing_names that are already loaded in memory.

def _next_sparse_name(resource, existing_names=None):
    """Generate the next available ``extra_<N>`` name.

    Falls back to a UUID-based suffix if a name collision occurs (e.g. under
    concurrent requests), keeping the ``extra_`` prefix intact.
    The numeric part is still preferred for readability.
    """
    if existing_names is None:
        existing_names = list(
            SparseField.objects.filter(
                resource=resource,
                name__startswith=SPARSE_FIELD_PREFIX,
            ).values_list("name", flat=True)
        )
    max_n = 0
    for name in existing_names:
        suffix = name[len(SPARSE_FIELD_PREFIX) :]
        try:
            max_n = max(max_n, int(suffix))
        except (ValueError, TypeError):
            pass
    candidate = f"{SPARSE_FIELD_PREFIX}{max_n + 1}"
    # Guard against a race: if the candidate already exists, use a UUID suffix
    if candidate in existing_names or SparseField.objects.filter(resource=resource, name=candidate).exists():
        candidate = f"{SPARSE_FIELD_PREFIX}{uuid.uuid4().hex[:12]}"
    return candidate

Comment on lines +236 to +278
@staticmethod
def _extra_metadata_post(request, resource):
data = request.data
if isinstance(data, str):
try:
data = json.loads(data)
except (json.JSONDecodeError, TypeError):
return Response(
{"detail": "Invalid JSON payload."},
status=400,
)

if not isinstance(data, list):
return Response(
{"detail": "Expected a JSON list of metadata objects."},
status=400,
)

cleaned = []
for meta_dict in data:
if not isinstance(meta_dict, dict):
logger.warning(f"Skipping non-dict metadata entry in POST payload: {meta_dict}")
continue
meta_dict = dict(meta_dict) # avoid mutating caller input
meta_dict.pop("id", None)
value = json.dumps(meta_dict)
if len(value) > SPARSE_FIELD_VALUE_MAX_LENGTH:
return Response(
{"detail": f"serialized value exceeds {SPARSE_FIELD_VALUE_MAX_LENGTH} characters"}, status=400
)
cleaned.append(value)

for value in cleaned:
name = _next_sparse_name(resource)
try:
SparseField.objects.create(resource=resource, name=name, value=value)
except IntegrityError:
# Concurrent request created the same name; retry with UUID
name = f"{SPARSE_FIELD_PREFIX}{uuid.uuid4().hex[:12]}"
SparseField.objects.create(resource=resource, name=name, value=value)

result = [_sparse_to_legacy(sf) for sf in _sparse_fields_for_resource(resource)]
return Response(result, status=201)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In _extra_metadata_post, _next_sparse_name is called inside a loop, which executes a database query on every iteration. Pre-fetching the existing names once before the loop and passing them to _next_sparse_name avoids these N+1 queries.

    @staticmethod
    def _extra_metadata_post(request, resource):
        data = request.data
        if isinstance(data, str):
            try:
                data = json.loads(data)
            except (json.JSONDecodeError, TypeError):
                return Response(
                    {"detail": "Invalid JSON payload."},
                    status=400,
                )

        if not isinstance(data, list):
            return Response(
                {"detail": "Expected a JSON list of metadata objects."},
                status=400,
            )

        cleaned = []
        for meta_dict in data:
            if not isinstance(meta_dict, dict):
                logger.warning(f"Skipping non-dict metadata entry in POST payload: {meta_dict}")
                continue
            meta_dict = dict(meta_dict)  # avoid mutating caller input
            meta_dict.pop("id", None)
            value = json.dumps(meta_dict)
            if len(value) > SPARSE_FIELD_VALUE_MAX_LENGTH:
                return Response(
                    {"detail": f"serialized value exceeds {SPARSE_FIELD_VALUE_MAX_LENGTH} characters"}, status=400
                )
            cleaned.append(value)

        existing_names = list(
            SparseField.objects.filter(
                resource=resource,
                name__startswith=SPARSE_FIELD_PREFIX,
            ).values_list("name", flat=True)
        )
        for value in cleaned:
            name = _next_sparse_name(resource, existing_names=existing_names)
            existing_names.append(name)
            try:
                SparseField.objects.create(resource=resource, name=name, value=value)
            except IntegrityError:
                # Concurrent request created the same name; retry with UUID
                name = f"{SPARSE_FIELD_PREFIX}{uuid.uuid4().hex[:12]}"
                SparseField.objects.create(resource=resource, name=name, value=value)

        result = [_sparse_to_legacy(sf) for sf in _sparse_fields_for_resource(resource)]
        return Response(result, status=201)

Comment thread geonode/api/tests.py
)

# Verify sparse field was created
_sf_check = SparseField.objects.filter(resource=_r, name="extra_1")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MINOR: the update_or_create return the new object and a boolean saying if the object was created or not, so this part is not required. it should be:

obj, created = SparseField.objects.update_or_create(
            resource=_r,
            name="extra_1",
            defaults={"value": json.dumps({"category": "category"})},
        )
self.assertTrue(created, "SparseField should have been created")

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_sf_check is also used for first()

Comment thread geonode/base/utils.py
import re
import json
import logging
from schema import Schema

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can remove the schema==0.7.7 dependency from the pyproject.toml

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 5.1.x cla-signed CLA Bot: community license agreement signed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants