Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Depending on pyyaml can make things slow. #128

Open
jonathan-s opened this issue Jun 29, 2023 · 4 comments
Open

Depending on pyyaml can make things slow. #128

jonathan-s opened this issue Jun 29, 2023 · 4 comments
Assignees
Labels
optimization Improvement of performance or usability

Comments

@jonathan-s
Copy link

I ran some profiling on our code that runs pysigma. It turns out that a significant amount of time is spent in pyyaml. Would you be open to accept a PR using? https://pypi.org/project/ryaml/

I'm thinking that the best solution, if it turns out to be faster, would be to have pip install pysigma use pyyaml and pip install pysigma[rust] (or similar) to use ryaml.

Screenshot 2023-06-29 at 11 14 18
@jonathan-s
Copy link
Author

I have confirmed that the part parsing and loading yaml have completely disappeared, and takes a lot less time using ryaml.

@thomaspatzke
Copy link
Member

Sounds great! It really takes a while to parse the whole SigmaHQ rule repository, so if there's the possibility to speed up, lets do it!

One thing that's customized with YAML parsing is the check for duplicate keys, an error that is sometimes made in writing Sigma rules. Is this something that can also be done with ryaml or is it possibly already there?

@jonathan-s
Copy link
Author

@thomaspatzke, one thing that I bumped into, that you may have some insight into. There are several versions of the yaml spec. Version 1.1 uses on, yes and represents these as booleans. Version 1.2 of that yaml spec drops these and only uses true | True | TRUE | false | False | FALSE. The ryaml package only supports version 1.2 of the yaml spec.

Do you have any hunch to what extent this is a common practice to use the prior writing of booleans among people who write sigma rules?
 
Either way, that is one change that will be made if a user chooses to use the rust version.

@thomaspatzke
Copy link
Member

I think we should stick on the 1.2 standard and only accept the boolean values supported there. I just see the version 2 specification already refers to YAML 1.2.2, therefore a rule that is compliant to the specification shouldn't contain version 1.1 YAML.

@thomaspatzke thomaspatzke added the enhancement New feature or request label Sep 1, 2023
@thomaspatzke thomaspatzke added optimization Improvement of performance or usability and removed enhancement New feature or request labels Nov 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
optimization Improvement of performance or usability
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants