Native TEXTFILE reader produces incorrect results when varchars contain the field separator escape character #18215

Closed
alexjo2144 opened this issue Jul 10, 2023 · 2 comments · Fixed by #18265

@alexjo2144
Member

alexjo2144 commented Jul 10, 2023

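Here's a test case that reproduces the problem. With `textfile_field_separator_escape='|'`, the Hadoop reader returns the correct row for every value, but the native reader fails on the value that contains the escape character (`'a|pipe'`):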

    @Test
    public void testTextfileFieldSeparator()
    {
        List<String> specialCharacterValues = ImmutableList.of(
                "1, 'a,comma'",
                "2, 'a|pipe'",
                "3, 'an''escaped quote'",
                "4, 'a~null encoding'");

        Session withHadoopReaders = Session.builder(getSession())
                .setCatalogSessionProperty("hive", "json_native_reader_enabled", "false")
                .setCatalogSessionProperty("hive", "text_file_native_reader_enabled", "false")
                .build();
        Session withNativeReaders = Session.builder(getSession())
                .setCatalogSessionProperty("hive", "json_native_reader_enabled", "true")
                .setCatalogSessionProperty("hive", "text_file_native_reader_enabled", "true")
                .build();

        try (TestTable table = new TestTable(
                getQueryRunner()::execute,
                "test_textfile_field_separator",
                "(id INT, varchar_t VARCHAR) WITH (format = 'TEXTFILE', textfile_field_separator=',', textfile_field_separator_escape='|', null_format='~')",
                specialCharacterValues)) {
            assertQuery(withHadoopReaders, "SELECT id FROM " + table.getName() + " WHERE varchar_t = 'a,comma'", "VALUES 1");
            assertQuery(withHadoopReaders, "SELECT id FROM " + table.getName() + " WHERE varchar_t = 'a|pipe'", "VALUES 2");
            assertQuery(withHadoopReaders, "SELECT id FROM " + table.getName() + " WHERE varchar_t = 'an''escaped quote'", "VALUES 3");
            assertQuery(withHadoopReaders, "SELECT id FROM " + table.getName() + " WHERE varchar_t = 'a~null encoding'", "VALUES 4");

            assertQuery(withNativeReaders, "SELECT id FROM " + table.getName() + " WHERE varchar_t = 'a,comma'", "VALUES 1");
            // This assertion fails: the value contains the escape character '|'
            assertQuery(withNativeReaders, "SELECT id FROM " + table.getName() + " WHERE varchar_t = 'a|pipe'", "VALUES 2");
            assertQuery(withNativeReaders, "SELECT id FROM " + table.getName() + " WHERE varchar_t = 'an''escaped quote'", "VALUES 3");
            assertQuery(withNativeReaders, "SELECT id FROM " + table.getName() + " WHERE varchar_t = 'a~null encoding'", "VALUES 4");
        }
    }
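
For readers unfamiliar with escaped text formats, here is a minimal sketch of the decoding behavior the test expects. It is not Trino's or Hive's actual reader code: the class `EscapedFieldSplitter` and its `split` method are hypothetical, and the on-disk encodings in `main` (a doubled escape character for a literal `|`, and `|,` for a literal comma) are an assumption about how the writer escapes values.

```java
import java.util.ArrayList;
import java.util.List;

public class EscapedFieldSplitter
{
    // Hypothetical helper: splits a line on `separator`, treating
    // `escape` followed by any character as that literal character.
    public static List<String> split(String line, char separator, char escape)
    {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == escape && i + 1 < line.length()) {
                // Consume the escape and keep the next character verbatim
                current.append(line.charAt(++i));
            }
            else if (c == separator) {
                // Unescaped separator ends the current field
                fields.add(current.toString());
                current.setLength(0);
            }
            else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args)
    {
        // Assumed on-disk encodings for rows 1 and 2 of the test table
        System.out.println(split("1,a|,comma", ',', '|')); // [1, a,comma]
        System.out.println(split("2,a||pipe", ',', '|'));  // [2, a|pipe]
    }
}
```

Under these assumptions, a reader that handles `|,` but not `||` would pass the `'a,comma'` assertion and fail the `'a|pipe'` one, which matches the failure pattern in the test above.
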
@alexjo2144
Member Author

FYI @dain

@kokosing
Member

Could #18254 unblock the release?
