Fixed issues with query strings containing spaces and/or wildcards for Lucene Backend #43

Koen1999 · 2024-01-26T13:36:45Z

This PR fixes several issues with the current Lucene backend:

Queries that match whitespace-containing values against fields are quoted, which Lucene interprets as phrases. Phrases cannot contain wildcards, so this breaks other functionality. This PR removes the quotes and escapes the whitespace instead. Additionally, the pipeline is changed so that matching is performed against keyword fields whenever Elasticsearch considers the field type to be a string. In this solution, the only string value that is still quoted is the empty string.
Search queries that do not match against any field are currently quoted as well. As a result, these queries cannot contain wildcards either, which again breaks other functionality. The solution is to replace the quotes with wildcards.
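To illustrate the escaping approach described above, here is a minimal sketch. The helper names (`escape_lucene_value`, `field_match`) and the exact set of escaped characters are assumptions for illustration, not the backend's actual code. The idea: instead of wrapping a value in quotes (which makes it a phrase, where `*` and `?` lose their wildcard meaning), each reserved character and each whitespace character is escaped with a backslash, and only the empty string stays quoted.

```python
# Hypothetical helper illustrating the escaping approach; NOT the backend's
# actual implementation.

# Characters reserved by the Lucene query_string syntax. "*" and "?" are
# deliberately left out so that they keep working as wildcards.
LUCENE_RESERVED = set('+-=&|><!(){}[]^"~:\\/')


def escape_lucene_value(value: str) -> str:
    """Escape a value for a field match without quoting it as a phrase."""
    if value == "":
        # An empty value cannot be written unquoted, so it stays quoted.
        return '""'
    escaped = []
    for ch in value:
        if ch in LUCENE_RESERVED or ch.isspace():
            # A backslash makes the parser treat the character literally,
            # so whitespace no longer splits the value into separate terms.
            escaped.append("\\" + ch)
        else:
            escaped.append(ch)
    return "".join(escaped)


def field_match(field: str, value: str) -> str:
    # Hypothetical: build a single "field:value" clause.
    return f"{field}:{escape_lucene_value(value)}"


print(field_match("process.command_line", "cmd.exe /c *whoami*"))
# process.command_line:cmd.exe\ \/c\ *whoami*
print(field_match("user.name", ""))
# user.name:""
```

Because the escaped value is a single term rather than a phrase, it is matched as a whole against a keyword field, and the `*` wildcards keep working.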
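A minimal sketch of replacing the quotes with wildcards for search terms that are not bound to a field. The names `escape_term` and `keyword_query` are hypothetical. The escaping rules are the same assumed set as for field matches: reserved characters and whitespace are escaped, while `*` and `?` are left alone.

```python
# Hypothetical sketch; NOT the backend's actual implementation.
LUCENE_RESERVED = set('+-=&|><!(){}[]^"~:\\/')


def escape_term(value: str) -> str:
    # Escape reserved characters and whitespace, keeping "*" and "?" as wildcards.
    return "".join(
        "\\" + ch if ch in LUCENE_RESERVED or ch.isspace() else ch for ch in value
    )


def keyword_query(value: str) -> str:
    # Before: '"' + value + '"' -> a phrase, in which wildcards are not allowed.
    # After: "*" in place of each quote, so the value is matched as a substring
    # and any wildcards inside it keep their meaning.
    return f"*{escape_term(value)}*"


print(keyword_query("invoke-*mimikatz"))
# *invoke\-*mimikatz*
```

Note that the resulting query starts with a wildcard. Elasticsearch's `query_string` query only accepts this while `allow_leading_wildcard` is enabled, which is its default.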

Some of these issues were introduced by commit 83afccc in an attempt to fix some of the problems mentioned in #15. This PR should also fix the issues mentioned in #28 and #36.

Attached to this PR are several example Sigma rules together with the Lucene queries they compile to. Given the correct field-name mapping in the pipeline, these Lucene queries behave as the Sigma syntax specifies.

sigma-rules.zip

Note: Since I do not have an Elasticsearch instance with field names similar to those typically produced by Winlogbeat, I cannot check which fields are string fields, and hence which fields should be mapped to keyword fields in the pipeline. For similar reasons, the field names in the attached Lucene queries differ slightly. Other contributors should check that field names are mapped correctly.

Edit: I realized there are more pipelines that I had never heard of. For these pipelines as well, every field that Elasticsearch marks as a string field should be mapped to its .keyword variant.

andurin · 2024-01-29T13:30:56Z

Merged. Thank you!

Koen1999 · 2024-01-29T14:01:04Z

> Note: Since I do not have an Elasticsearch instance with field names similar to those typically produced by Winlogbeat, I cannot check which fields are string fields, and hence which fields should be mapped to keyword fields in the pipeline. For similar reasons, the field names in the attached Lucene queries differ slightly. Other contributors should check that field names are mapped correctly.

@andurin, did you manage to check whether the field mappings are correct and complete? If you make a new release with incorrect mappings, things might break for users. The important point is that every field indexed as a string by Elasticsearch should use the .keyword subfield.

andurin · 2024-01-30T16:23:47Z

@Koen1999, that's my current headache. Data typing here is a bit frustrating.

ES Mapping and extra .keyword fields

Elastic doesn't really dictate which mappings one should use, and the way their *beats map fields can change from one version to the next.

For example, an index template from packetbeat 8.7.1:

            "command_line": {
              "fields": {
                "text": {
                  "norms": false,
                  "type": "text"
                }
              },
              "ignore_above": 1024,
              "type": "keyword"
            },

versus packetbeat 8.12.0:

            "command_line": {
              "fields": {
                "text": {
                  "type": "match_only_text"
                }
              },
              "type": "wildcard"
            },

After reviewing the packetbeat "default" template, I'll undo your changes to the pipeline. Those fields are already of type keyword or "wildcard", which is also a keyword-family type (https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html#wildcard-field-type).
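The reasoning above can be sketched as a check on a single field's mapping. This is a hypothetical helper (`query_field`), not code from the backend or its pipelines. It assumes that a `.keyword`-style suffix is only needed when the top-level field is not already in the keyword family (`keyword`, `constant_keyword`, `wildcard`). In both packetbeat templates quoted above, `command_line` is already `keyword` or `wildcard`, and its only subfield is named `text`, so `command_line.keyword` would not exist there. The third mapping is Elasticsearch's default dynamic mapping for strings: a `text` field with a `keyword` subfield.

```python
# Hypothetical sketch; NOT the backend's or pipelines' actual code.
# Field types in the keyword family (see the Elastic docs linked above).
KEYWORD_FAMILY = {"keyword", "constant_keyword", "wildcard"}


def query_field(name: str, mapping: dict) -> str:
    """Pick the field name a Lucene query should target, given its mapping."""
    if mapping.get("type") in KEYWORD_FAMILY:
        # Already a keyword-family field: query it directly.
        return name
    for sub_name, sub_mapping in mapping.get("fields", {}).items():
        if sub_mapping.get("type") in KEYWORD_FAMILY:
            # A text field with a keyword multi-field, e.g. "<name>.keyword".
            return f"{name}.{sub_name}"
    # No keyword variant available: fall back to the field itself.
    return name


# The two command_line mappings quoted above (packetbeat 8.7.1 and 8.12.0).
pb_871 = {"fields": {"text": {"norms": False, "type": "text"}},
          "ignore_above": 1024, "type": "keyword"}
pb_8120 = {"fields": {"text": {"type": "match_only_text"}}, "type": "wildcard"}
# Default dynamic mapping of a string field: text plus a .keyword subfield.
dynamic = {"type": "text",
           "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}

for m in (pb_871, pb_8120, dynamic):
    print(query_field("process.command_line", m))
# process.command_line
# process.command_line
# process.command_line.keyword
```

This is one way to see why a hard-coded `.keyword` suffix in the pipeline depends on the template in use. The same Sigma field can need a different target name depending on which Beat version created the index.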

That said, I suspect there is still plenty of room for "wrong" queries in the Lucene backend, which I would like to cover with more test cases. I'd like to invite you to the discussion in #46.

Koen1999 added 2 commits January 26, 2024 14:21

Fixed issues with query strings containing spaces for Lucene Backend

53f7aec

Fixed tests

5fc5de7

Koen1999 force-pushed the main branch from da3af77 to 5fc5de7 January 26, 2024 14:54

Koen1999 mentioned this pull request Jan 26, 2024

Lucene Rule Generation Quotation Mark Issue #36

Closed

andurin self-assigned this Jan 29, 2024

andurin merged commit b6a6d58 into SigmaHQ:main Jan 29, 2024
3 checks passed

andurin added a commit that referenced this pull request Jan 30, 2024

Manually revert .keyword changes from #43

4792d29
