
Fixed issues with query strings containing spaces and/or wildcards for Lucene Backend #43

Merged (2 commits) on Jan 29, 2024

Koen1999 (Contributor) commented Jan 26, 2024

This PR fixes several issues in the current Lucene backend:

  • Field values containing whitespace are quoted, which Lucene interprets as a phrase. Phrases cannot contain wildcards, so this breaks other functionality. This PR removes the quotes and escapes the whitespace instead. It also changes the pipeline so that matching is performed against keyword fields whenever Elasticsearch treats the field type as a string. The only string value that is still quoted is the empty string.
  • Search terms that are not matched against any field are currently quoted. As a result, they cannot contain wildcards either, which again breaks other functionality. The fix is to replace the quotes with wildcards.
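
The two fixes above can be sketched in Python. This is a minimal illustration, not the backend's actual code: the helper names (`escape_value`, `field_match`, `keyword_search`) are hypothetical, and the sketch assumes the Lucene classic query parser, where a backslash-escaped space stays inside a single term. For simplicity, every `*` and `?` in the input is treated as a Sigma wildcard; escaped Sigma wildcards are not handled.

```python
# Special characters of the Lucene classic query parser, plus whitespace.
# `*` and `?` are deliberately left out: in this sketch they are Sigma
# wildcards and must stay unescaped so Lucene still expands them.
LUCENE_SPECIAL = set('+-&|!(){}[]^"~:\\/') | {" ", "\t"}


def _escape(value: str) -> str:
    """Backslash-escape special characters and whitespace instead of quoting."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in value)


def escape_value(value: str) -> str:
    """Escape a field value; only the empty string is emitted quoted."""
    if value == "":
        return '""'
    return _escape(value)


def field_match(field: str, value: str) -> str:
    """Build a field-bound match without turning the value into a phrase."""
    return f"{field}:{escape_value(value)}"


def keyword_search(value: str) -> str:
    """Search not bound to a field: wrap in wildcards rather than quotes."""
    return f"*{_escape(value)}*"


print(field_match("CommandLine", "* /c whoami*"))  # CommandLine:*\ \/c\ whoami*
print(keyword_search("foo bar"))                   # *foo\ bar*
```

With the quoted form `CommandLine:"* /c whoami*"`, Lucene parses a phrase in which `*` is not expanded; the escaped form keeps the leading and trailing wildcards working.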

Some of these issues were introduced by commit 83afccc in an attempt to fix some of the problems mentioned in #15. This PR should also fix the issues mentioned in #28 and #36.

Attached to this PR are several example Sigma rules together with the Lucene queries they compile to. Given the correct field-name mapping by the pipeline, these Lucene queries behave as the Sigma syntax specifies.

sigma-rules.zip

Note: Since I do not have an Elasticsearch instance with field names like those commonly produced by Winlogbeat, I cannot check which fields are string fields and hence which fields should be keyword fields in the pipeline. For the same reason, the field names in the attached Lucene queries differ slightly. Other contributors should check that field names are mapped correctly.

Edit: I realized there are more pipelines that I had never heard of. For these pipelines too, all fields that Elasticsearch marks as string fields should be mapped to their .keyword variant.

andurin self-assigned this on Jan 29, 2024
andurin merged commit b6a6d58 into SigmaHQ:main on Jan 29, 2024 (3 checks passed)
andurin (Collaborator) commented Jan 29, 2024

Merged. Thank you!

Koen1999 (Contributor, Author) commented


@andurin, did you manage to check whether the field mappings are correct and complete? If you make a new release with incorrect mappings, things might break for users. The important thing is that all fields Elasticsearch indexes as strings should use the .keyword subfield.
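
The renaming asked for here can be sketched as a post-processing step over a Sigma-to-Elasticsearch field mapping. The helper `keyword_mapping` and the contents of `STRING_FIELDS` are hypothetical; which fields are actually string-typed depends on the index template, which is exactly what still needs checking.

```python
# Hypothetical: fields assumed to be indexed as analyzed strings (`text`)
# with a `.keyword` subfield. The real set depends on the index template.
STRING_FIELDS = {"process.command_line", "process.executable"}


def keyword_mapping(mapping: dict[str, str], string_fields: set[str]) -> dict[str, str]:
    """Point every string-typed target field at its `.keyword` subfield."""
    result = {}
    for sigma_field, es_field in mapping.items():
        # Only rename string fields, and never append the suffix twice.
        if es_field in string_fields and not es_field.endswith(".keyword"):
            es_field += ".keyword"
        result[sigma_field] = es_field
    return result


print(keyword_mapping(
    {"CommandLine": "process.command_line", "ProcessId": "process.pid"},
    STRING_FIELDS,
))
# {'CommandLine': 'process.command_line.keyword', 'ProcessId': 'process.pid'}
```

Non-string fields such as a numeric process ID pass through unchanged, so a wrong entry in the string-field set is the main way this step can break queries.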

andurin (Collaborator) commented Jan 30, 2024

@Koen1999, that's my current headache: data typing here is a little frustrating.

ES Mapping and extra .keyword fields

Elastic doesn't really dictate which mappings one should use, and the way their *beats map fields is subject to change.

For example, an index template from Packetbeat 8.7.1:

            "command_line": {
              "fields": {
                "text": {
                  "norms": false,
                  "type": "text"
                }
              },
              "ignore_above": 1024,
              "type": "keyword"
            },

versus Packetbeat 8.12.0:

            "command_line": {
              "fields": {
                "text": {
                  "type": "match_only_text"
                }
              },
              "type": "wildcard"
            },

After reviewing the default Packetbeat template, I'll undo your changes to the pipeline. Those fields are already of type keyword or wildcard, and wildcard also belongs to the keyword type family (https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html#wildcard-field-type).
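
The check described here can be sketched by inspecting a field's mapping: if the field itself has a keyword-family type, it can be queried directly; otherwise a keyword-family subfield is needed. The function `query_field` is hypothetical. The first two mappings are the Packetbeat fragments quoted above; the third is Elasticsearch's default dynamic mapping for strings (`text` with a `keyword` subfield).

```python
import json

# Types in Elasticsearch's keyword family.
KEYWORD_FAMILY = {"keyword", "constant_keyword", "wildcard"}


def query_field(name: str, mapping: dict) -> str:
    """Return the field to target for exact and wildcard matches."""
    if mapping.get("type") in KEYWORD_FAMILY:
        return name  # already keyword-family: no subfield needed
    for sub_name, sub_mapping in mapping.get("fields", {}).items():
        if sub_mapping.get("type") in KEYWORD_FAMILY:
            return f"{name}.{sub_name}"
    raise ValueError(f"{name} has no keyword-family field or subfield")


packetbeat_8_7_1 = json.loads("""
{"fields": {"text": {"norms": false, "type": "text"}},
 "ignore_above": 1024, "type": "keyword"}
""")
packetbeat_8_12_0 = json.loads("""
{"fields": {"text": {"type": "match_only_text"}}, "type": "wildcard"}
""")
dynamic_string = {"type": "text",
                  "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}

print(query_field("command_line", packetbeat_8_7_1))   # command_line
print(query_field("command_line", packetbeat_8_12_0))  # command_line
print(query_field("message", dynamic_string))          # message.keyword
```

Both Packetbeat versions resolve to the field itself, so no `.keyword` suffix is needed for that template; only a `text` field with a keyword subfield would need one.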

That said, I think there is still enough room for "wrong" queries in the Lucene backend that I'd like to cover with more test cases. I'd like to invite you to the discussion in #46.

andurin added a commit that referenced this pull request Jan 30, 2024