Hi,
- What versions are you using?
con.version: 23.26.1.1.0
platform.platform: macOS-26.2-arm64-arm-64bit
sys.maxsize > 2**32: True
platform.python_version: 3.12.12
oracledb.__version__: 3.4.2
- Is it an error or a hang or a crash?
Error
- What error(s) or behavior you are seeing?
import pandas as pd
import oracledb

with oracledb.connect(...) as con:
    cur = con.cursor()
    cur.execute('create private temporary table ora$ptt_mytest (a int) on commit drop definition')
    df = pd.DataFrame({'A': [0, 1]}).convert_dtypes(dtype_backend='pyarrow')
    df = df.iloc[1:2]
    print('Inserted:\n', df)
    cur.executemany('insert into ora$ptt_mytest (a) values (:a)', df)
    print('Selected:\n', pd.DataFrame.from_arrow(con.fetch_df_all('select * from ora$ptt_mytest')))
Both the "Inserted" and "Selected" dataframes should be the same, but they are different:
Inserted:
A
1 1
Selected:
A
0 0.0
We have isolated the problem to be only when the dataframe is some kind of view of another one.
Here we are creating a dataframe with 2 rows, and then calling df.iloc[1:2], which selects only the second row.
But instead, oracledb inserted the first row.
This does not happen if we (I believe):
- don't call .convert_dtypes(), because then the Arrow PyCapsule interface creates new buffers
- round-trip the dataframe, e.g. through a file, so that we are sure it owns its data
This leads me to believe that when I call df.iloc[1:2], the result points to the same Arrow buffer with an offset, and oracledb ignores this offset (but not the length, because only one row from the buffer is inserted).
We ran into this problem with manual batching, i.e.

big_df: pd.DataFrame
batches = [big_df.iloc[i:i+1000] for i in range(0, len(big_df), 1000)]
for b in batches:
    cur.executemany('INSERT ...', b)

which kept inserting the first batch again and again.
This may be a problem with Pandas not correctly exporting the Arrow PyCapsule, but probably not, because polars.from_arrow(df) gives the correct result.
- Does your application call init_oracle_client()?
No, using thin mode.