read_csv from filehandle, when turned into a view, stops working with No files found that match the pattern "DUCKDB_INTERNAL_OBJECTSTORE://... #477

@nickzoic

What happens?

cursor.read_csv(filehandle) returns a DuckDBPyRelation object on which you can call .to_view(viewname), but the view stops being usable once the returned DuckDBPyRelation object has gone out of scope. If you chain the calls, as in cursor.read_csv(filehandle).to_view('viewname'), the view doesn't work at all.

This doesn't seem to be a problem when opening a CSV by filename, or for relations made into tables, only for CSVs opened from filehandles and made into views. I think I can understand why it's happening, but it is unexpected.

(In case you're wondering, I'm opening files from filehandles as a workaround for duckdb/duckdb#12232, so the more typical usage is with bz2.open(filename) as fh: cursor.read_csv(fh).to_view(viewname) or similar, but a StringIO makes for a simpler demo to reproduce.)

To Reproduce

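import duckdb
from io import StringIO

cursor = duckdb.connect()

csv_file = StringIO("foo,bar\nhello,world")
rel = cursor.read_csv(csv_file)  # keep a reference to the relation
rel.to_view("view1")
print(rel.alias)  # prints the DUCKDB_INTERNAL_OBJECTSTORE:// alias
print(cursor.sql("select * from view1"))  # works while rel is in scope

csv_file = StringIO("foo,bar\nhello,world")
cursor.read_csv(csv_file).to_view("view2")  # no reference to the relation is kept
print(cursor.sql("select * from view2"))  # raises duckdb.IOException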

The first way works, the second way doesn't:

$ python x.py 
DUCKDB_INTERNAL_OBJECTSTORE://b38cc260dcc16094
┌─────────┬─────────┐
│   foo   │   bar   │
│ varchar │ varchar │
├─────────┼─────────┤
│ hello   │ world   │
└─────────┴─────────┘

Traceback (most recent call last):
  File "/home/nick/Work/wehi/countess/x.py", line 14, in <module>
    print(cursor.sql("select * from view2"))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
duckdb.duckdb.IOException: IO Error: No files found that match the pattern "DUCKDB_INTERNAL_OBJECTSTORE://ce4251be44deb137"

It also fails with the same error if rel is deleted or goes out of scope before the view is queried. Note also that read_csv(filename).to_view(viewname) works fine.
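
Until this is fixed, one possible stopgap is to keep the relation referenced for as long as the view is needed. The sketch below assumes, based only on the view1 case above, that the view stays usable while any reference to the relation exists. ViewKeeper, make_view, release and demo are hypothetical names for illustration, not part of DuckDB; the only relation method used is to_view, as in the report.

```python
class ViewKeeper:
    """Hypothetical helper: creates views and keeps their relations alive."""

    def __init__(self):
        # Maps view name -> relation object backing that view.
        self._relations = {}

    def make_view(self, relation, view_name):
        # to_view() is the same call used in the report above.
        relation.to_view(view_name)
        # Holding the relation here stops it from being garbage collected.
        self._relations[view_name] = relation
        return relation

    def release(self, view_name):
        # Drop our reference; the view may stop working after this.
        self._relations.pop(view_name, None)


def demo():
    # Requires the duckdb package; mirrors the failing "view2" case.
    import duckdb
    from io import StringIO

    cursor = duckdb.connect()
    keeper = ViewKeeper()
    # Keep the filehandle referenced too, as in the working "view1" case.
    fh = StringIO("foo,bar\nhello,world")
    keeper.make_view(cursor.read_csv(fh), "view2")
    # keeper (and therefore the relation) is still alive when the view is queried.
    return cursor.sql("select * from view2").fetchall()
```

If the assumption holds, demo() should behave like the view1 case rather than raising the IOException. Since relations made into tables are not affected, materialising the data into a table instead of a view is another option when the extra copy is acceptable.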

OS:

Linux 6.8.0 x86_64

DuckDB Package Version:

1.5.3 from pypi

Also a source build of duckdb-python 1.6.0-dev45 @ ab63b5f
with duckdb v1.5.2-4685-g01eda16d6e

Python Version:

3.12.3

Full Name:

Nick Moore

Affiliation:

Mnemote Pty Ltd

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with stable releases 1.5.3 and 1.3.1.
I have also tested with 1.6.0.dev45 @ ab63b5f.

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration to reproduce the issue?

  • Yes, I have
