
Update the embed_content in RecommendationsAdapter #4455

@akolson

Overview

This task updates embed_content in RecommendationsAdapter so that it collects all file URLs for each node from which textual content will be extracted.

Description and outcomes

  • Update embed_content in RecommendationsAdapter
    • embed_content accepts a list of nodes (ContentNode) as a parameter.
    • For each node, find all file URLs from which text will be extracted.
    • Use the kind field of the ContentNode model and the preset field of the File model to determine which file URLs to extract.
    • Currently, all Studio files are stored in this bucket.
  • Finding file URLs
    • Audio files (mp3)
      • Return the corresponding URL(s)
    • Video files (mp4, webm)
      • Return the corresponding subtitle URL(s) if they exist; otherwise return the URL(s) of the video files themselves
    • HTML files (html5)
      • HTML files are uploaded as zip files and extracted into this bucket
      • Return the corresponding URL(s) of the extracted zip location.
    • H5P files (h5p)
      • Return the corresponding URL(s)
    • ZIM files (zim)
      • Return the corresponding URL(s)
    • Document files (pdf, epub)
      • Return the corresponding URL(s)
    • Exercise files (Perseus)
      • Return the corresponding URL(s)
  • Making a request
    • Make a request to the recommendations backend, for example:
      body = {
          'resources': resources,
          'metadata': {},
      }
      embed_content_request = EmbedContentRequest(
          headers={},  # kept so headers can be passed to the external API
          params={},  # kept so query parameters can be passed as well
          body=body,
      )
      return self.backend.make_request(embed_content_request)

      where resources follows the format of the updated embed_content_request.json.
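The URL selection rules above can be sketched in Python. This is a minimal, hypothetical sketch, not the actual Studio implementation: the File and Node dataclasses stand in for Studio's File and ContentNode models, the kind and preset strings are assumed to mirror the le_utils constants, extracted_zip_url is a placeholder for the bucket's real layout of extracted HTML5 zips, and the resource shape ({"id", "file_urls"}) is an assumption rather than the real embed_content_request.json schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical kind/preset strings, assumed to mirror le_utils constants.
# Kinds whose text comes straight from the listed file presets:
PRESETS_BY_KIND: Dict[str, List[str]] = {
    "audio": ["audio"],
    "h5p": ["h5p"],
    "zim": ["zim"],
    "document": ["document", "epub"],
    "exercise": ["exercise"],  # Perseus exercise files
}
VIDEO_PRESETS = ["high_res_video", "low_res_video"]
SUBTITLE_PRESET = "video_subtitle"
HTML5_PRESET = "html5_zip"


@dataclass
class File:
    """Hypothetical stand-in for Studio's File model."""
    preset: str
    url: str


@dataclass
class Node:
    """Hypothetical stand-in for Studio's ContentNode model."""
    id: str
    kind: str
    files: List[File] = field(default_factory=list)


def extracted_zip_url(url: str) -> str:
    # Placeholder: assumes the zip is extracted into a folder named after the
    # archive. Replace with the bucket's real layout for extracted HTML5 zips.
    base = url[: -len(".zip")] if url.endswith(".zip") else url
    return base + "/"


def urls_with_presets(node: Node, presets: List[str]) -> List[str]:
    """Return the URLs of the node's files whose preset is in `presets`."""
    return [f.url for f in node.files if f.preset in presets]


def file_urls_for_node(node: Node) -> List[str]:
    """Pick the file URLs to extract text from, based on kind and preset."""
    if node.kind == "video":
        # Prefer subtitles; fall back to the video files themselves.
        subtitles = urls_with_presets(node, [SUBTITLE_PRESET])
        return subtitles or urls_with_presets(node, VIDEO_PRESETS)
    if node.kind == "html5":
        # HTML5 zips are extracted in the bucket; point at the extracted location.
        return [extracted_zip_url(u) for u in urls_with_presets(node, [HTML5_PRESET])]
    # Kinds without an entry (e.g. topics) yield no URLs.
    return urls_with_presets(node, PRESETS_BY_KIND.get(node.kind, []))


def build_embed_body(nodes: List[Node]) -> dict:
    # Hypothetical resource shape; the real one follows embed_content_request.json.
    resources = [{"id": n.id, "file_urls": file_urls_for_node(n)} for n in nodes]
    return {"resources": resources, "metadata": {}}


if __name__ == "__main__":
    nodes = [
        Node("v1", "video", [
            File("high_res_video", "https://bucket.example/v.mp4"),
            File("video_subtitle", "https://bucket.example/v.vtt"),
        ]),
        Node("h1", "html5", [File("html5_zip", "https://bucket.example/site.zip")]),
    ]
    for resource in build_embed_body(nodes)["resources"]:
        print(resource["id"], resource["file_urls"])
```

Keeping the kind-to-preset mapping in one dictionary leaves only video (subtitle fallback) and HTML5 (extracted zip location) as special cases in the control flow; the returned body has the same 'resources'/'metadata' keys as the request example above.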

Acceptance Criteria

  • All file URLs associated with the nodes passed to embed_content are extracted correctly

Assumptions and Dependencies

Scope

The scope of this task is limited to:

  • Updating embed_content to gather all file URLs required for content extraction.

Accessibility Requirements

NA

Resources
