Function can't be used as column name for :on option in join/3 #1088

@viniciussbs

I was testing dynamic column names on some functions like select, group_by and join.

select and group_by

It's already possible to use functions and regular expressions with select and group_by:

require Explorer.DataFrame, as: DF

df = DF.new([
  %{foo_xyz: "bar", id: 1, price: 1000},
  %{foo_xyz: "baz", id: 2, price: 5000},
  %{foo_xyz: "bar", id: 3, price: 3000}
])

df
|> DF.group_by(&String.starts_with?(&1, "foo_"))
|> DF.summarise(price: mean(price))
|> DF.select(~r/^(foo_.*|price)$/)
+--------------------------------------------+
| Explorer DataFrame: [rows: 2, columns: 2]  |
+-----------------------+--------------------+
|        foo_xyz        |       price        |
|       <string>        |       <f64>        |
+=======================+====================+
| bar                   | 2.0e3              |
+-----------------------+--------------------+
| baz                   | 5.0e3              |
+-----------------------+--------------------+

join

It fails when you try to use them with join:

foo_df = DF.new([
  %{foo_xyz: "bar", price: 2000},
  %{foo_xyz: "baz", price: 5000},
  %{foo_xyz: "bar", price: 2000}
])

df
|> DF.join(foo_df, on: &String.starts_with?(&1, "foo_"), how: :outer)
|> DF.discard(~r/_right$/)
** (FunctionClauseError) no function clause matching in anonymous fn/2 in Explorer.Shared.to_existing_columns/3    
    
    The following arguments were given to anonymous fn/2 in Explorer.Shared.to_existing_columns/3:
    
        # 1
        #Function<42.39164016/1 in :erl_eval.expr/6>
    
        # 2
        nil
    
    (explorer 0.10.1) lib/explorer/shared.ex:187: anonymous fn/2 in Explorer.Shared.to_existing_columns/3
    (elixir 1.17.2) lib/enum.ex:1829: Enum."-map_reduce/3-lists^mapfoldl/2-0-"/3
    (explorer 0.10.1) lib/explorer/shared.ex:187: Explorer.Shared.to_existing_columns/3
    (explorer 0.10.1) lib/explorer/data_frame.ex:5240: anonymous fn/3 in Explorer.DataFrame.join/3
    (elixir 1.17.2) lib/enum.ex:1703: Enum."-map/2-lists^map/1-1-"/2
    (explorer 0.10.1) lib/explorer/data_frame.ex:5233: Explorer.DataFrame.join/3
    #cell:rob7fvisapm5gynn:8: (file)


It looks like the argument passed to :on is wrapped in a list before being passed down to Explorer.Shared.to_existing_columns/3, at least in the current implementation (explorer 0.10.1). I don't know if this is a bug or expected behaviour.
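In the meantime, a possible workaround is to resolve the callback against the column names yourself and pass the resulting list of names to :on. The sketch below is untested against every Explorer version. It assumes explorer ~> 0.10, reuses the data frames from above, and introduces the variable names `on` and `joined` purely for illustration.

```elixir
# Only needed in a standalone script; in Livebook, add the dependency in the setup cell instead.
Mix.install([{:explorer, "~> 0.10"}])

require Explorer.DataFrame, as: DF

df = DF.new([
  %{foo_xyz: "bar", id: 1, price: 1000},
  %{foo_xyz: "baz", id: 2, price: 5000},
  %{foo_xyz: "bar", id: 3, price: 3000}
])

foo_df = DF.new([
  %{foo_xyz: "bar", price: 2000},
  %{foo_xyz: "baz", price: 5000},
  %{foo_xyz: "bar", price: 2000}
])

# Apply the predicate to the left frame's column names by hand,
# instead of handing the function to join/3.
on =
  df
  |> DF.names()
  |> Enum.filter(&String.starts_with?(&1, "foo_"))

# :on now receives plain column names, which join/3 accepts.
joined =
  df
  |> DF.join(foo_df, on: on, how: :outer)
  |> DF.discard(~r/_right$/)
```

This only checks the predicate against the left data frame, so it assumes the matching columns exist under the same names in `foo_df` as well.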
