Skip to content

In a multi repo the create_external_models command has problems with models being defined in one project and selected from in another project #5326

@blecourt-private

Description

@blecourt-private

Problem statement

In this multi repo setup the model silver.c from repo_2 selects from the model bronze.a which is defined in repo_1.

It seems that sqlmesh create_external_models treats bronze.a as an external model for repo_2 and fails to discover that the model is in fact defined in repo_1.

SQLMesh version: 0.216.0

Reproducible example

Using forementioned multi repo setup:

cd  sqlmesh/examples/multi

Trying to create external models:

sqlmesh -p repo_1 -p repo_2 create_external_models

Output:

Initializing new project state...
[WARNING] Unable to get schema for '"db"."bronze"."a"': 'Catalog Error: Table with name a does not exist!
Did you mean "information_schema.tables"?

LINE 1: DESCRIBE "db"."bronze"."a"

Despite including both projects SQLMesh doesn't infer that bronze.a is defined within the multi-repo.

Planning first:

sqlmesh -p repo_1 -p repo_2 plan

Output:

[WARNING] Linter warnings for /home/statsdb/projects/sqlmesh/examples/multi/repo_1/models/b.sql:
 - nomissingdescription: All models should be documented.
 - Line 1: nomissingaudits - Model `audits` must be configured to test data quality.
 - Line 7: noselectstar - Query should not contain SELECT * on its outer most projections, even if it can be expanded.
[WARNING] Linter warnings for /home/statsdb/projects/sqlmesh/examples/multi/repo_1/models/a.sql:
 - nomissingdescription: All models should be documented.
 - Line 1: nomissingaudits - Model `audits` must be configured to test data quality.

`prod` environment will be initialized

Environment statements:

before_all:                                                                                                                                                                                   
  + CREATE TABLE IF NOT EXISTS before_1 AS select @one()                                                                                                                                      
  + CREATE TABLE IF NOT EXISTS before_2 AS select @two()                                                                                                                                      
                                                                                                                                                                                              
after_all:                                                                                                                                                                                    
  + CREATE TABLE IF NOT EXISTS after_1 AS select @dup()                                                                                                                                       
  + CREATE TABLE IF NOT EXISTS after_2 AS select @dup()                                                                                                                                       
                                                                                                                                                                                              
Models:
??? Added:
    ??? bronze.a
    ??? bronze.b
    ??? silver.c
    ??? silver.d
    ??? silver.e
Models needing backfill:
??? bronze.a: [full refresh]
??? bronze.b: [full refresh]
??? silver.c: [full refresh]
??? silver.d: [full refresh]
??? silver.e: [full refresh]
Apply - Backfill Tables [y/n]: y
[1/1] bronze.a   [full refresh]                            0.10s   
[1/1] bronze.b   [full refresh]                            0.10s   
[1/1] silver.c   [full refresh]                            0.10s   
[1/1] silver.e   [full refresh]                            0.11s   
[1/1] silver.d   [full refresh]                            0.10s   
Executing model batches ???????????????????????????????????????? 100.0% ? 5/5 ? 0:00:00                                                                                                       
                                                                                                                                                                                              
? Model batches executed

Updating virtual layer  ???????????????????????????????????????? 100.0% ? 5/5 ? 0:00:00

? Virtual layer updated

Now creating external models:

sqlmesh -p repo_1 -p repo_2 create_external_models

Output:

[WARNING] The following models already exist and can't be converted to external: "db"."bronze"."a". Perhaps these models have been removed, while downstream models that reference them 
weren't updated accordingly.

Again - despite including both projects SQLMesh doesn't infer that bronze.a is defined within the multi-repo.

Creating external models for repo_2 only:

sqlmesh -p repo_2 create_external_models

There is no terminal output. The file external_models.yaml just contains []. So in this case SQLMesh correctly decides that bronze.a is not an external model, presumably because bronze.a is present in the state of the prod environment - but shouldn't the command that includes both projects be able to do the same?

Metadata

Metadata

Assignees

No one assigned

    Labels

    BugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions