37 changes: 35 additions & 2 deletions r/tests/testthat/test-dplyr-join.R
@@ -188,8 +188,41 @@ test_that("Error handling for unsupported expressions in join_by", {
)
})

# TODO: test duplicate col names
# TODO: casting: int and float columns?
test_that("joins with duplicate column names", {
  # When column names are duplicated (not in by), suffixes are added
  left_dup <- tibble::tibble(
    x = 1:5,
    y = 1:5,
    z = letters[1:5]
  )
  right_dup <- tibble::tibble(
    x = 1:5,
    y = 6:10,
    z = LETTERS[1:5]
  )

  compare_dplyr_binding(
    .input |>
      left_join(right_dup, by = "x") |>
      collect(),
    left_dup
  )

  compare_dplyr_binding(
    .input |>
      inner_join(right_dup, by = "x") |>
      collect(),
    left_dup
  )

  # Test with custom suffixes
  compare_dplyr_binding(
    .input |>
      left_join(right_dup, by = "x", suffix = c("_left", "_right")) |>
      collect(),
    left_dup
  )
})
Comment on lines +218 to +225
Member

This test in particular is effectively duplicated by:

test_that("suffix", {
  left_suf <- Table$create(
    key = c(1, 2),
    left_unique = c(2.1, 3.1),
    shared = c(10.1, 10.3)
  )
  right_suf <- Table$create(
    key = c(1, 2, 3, 10, 20),
    right_unique = c(1.1, 1.2, 3.1, 4.1, 4.3),
    shared = c(20.1, 30, 40, 50, 60)
  )
  join_op <- inner_join(left_suf, right_suf, by = "key", suffix = c("_left", "_right"))
  output <- collect(join_op)
  res_col_names <- names(output)
  expected_col_names <- c("key", "left_unique", "shared_left", "right_unique", "shared_right")
  expect_equal(expected_col_names, res_col_names)
})

Though I will admit that the tests there do not follow the established pattern that you follow here. Would you mind merging these two tests together? You can use your fixture for that purpose if you like, but if you do, would you mind adding a column with floats and also making the key column more obvious (e.g. by naming it key or to_join or something like that)?

I'm happy to help describe in more detail if that's helpful!
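
As a rough sketch of what a merged test might look like: the fixture below renames the join column to `key` and adds a float column, as suggested above. The column names (`shared_int`, `shared_dbl`, `shared_chr`) and the values are hypothetical. To keep the example runnable outside the arrow test suite it compares against plain dplyr directly instead of using `compare_dplyr_binding()`; inside the suite the same fixture could presumably be passed to that helper, as the test in this PR does.

```r
library(dplyr)
library(arrow)
library(testthat)

# Hypothetical fixtures: `key` is the join column. Every `shared_*` column
# exists on both sides, so each one needs a suffix in the join output.
left_dup <- tibble::tibble(
  key = 1:5,
  shared_int = 1:5,
  shared_dbl = c(1.1, 2.2, 3.3, 4.4, 5.5),
  shared_chr = letters[1:5]
)
right_dup <- tibble::tibble(
  key = 1:5,
  shared_int = 6:10,
  shared_dbl = c(10.1, 20.2, 30.3, 40.4, 50.5),
  shared_chr = LETTERS[1:5]
)

test_that("joins with duplicate column names apply suffixes", {
  suffix <- c("_left", "_right")

  # Reference result computed by dplyr on the in-memory tibbles.
  expected <- left_join(left_dup, right_dup, by = "key", suffix = suffix)

  # Same join evaluated by arrow. Row order after an arrow join is not
  # guaranteed, so sort by the key before comparing.
  actual <- arrow_table(left_dup) |>
    left_join(arrow_table(right_dup), by = "key", suffix = suffix) |>
    collect() |>
    arrange(key)

  expect_identical(names(actual), names(expected))
  expect_equal(as.data.frame(actual), as.data.frame(expected))
})
```

Checking the column names and the values in one test covers what the existing `suffix` test asserts while also exercising integer, float, and string columns.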


test_that("right_join", {
compare_dplyr_binding(