Store fingerprinting scripts #7

cooperq opened this issue Mar 24, 2015 · 7 comments

cooperq commented Mar 24, 2015

It would be really great to be able to store a copy of all the scripts identified as fingerprinting scripts. That way we could see if any scripts are commonly being used by different attackers. This could also help us come up with heuristics if people are using similar tactics across the board.

ghostwords (Owner) commented

Sorry for the late reply. Could you elaborate on "if any scripts are commonly being used by different attackers" a bit? Do you see us parsing script contents somehow?

cooperq (Author) commented Apr 24, 2015

I mean, just a sha sum would do the trick. I think it's also worth reverse engineering any popular scripts to think about how we can build heuristics to detect them.

ghostwords (Owner) commented

Absolutely!

Hashing: Ah, cool, that would help us in cases where the same script goes by different filenames or is used by different domains. Perhaps we could also strip comments/whitespace when hashing to allow for trivial differences.
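
As a rough sketch, assuming script bodies have already been collected somewhere (e.g. during a crawl), the normalize-then-hash idea could look something like the Python below. The comment/whitespace stripping is deliberately naive and just illustrative:

```python
import hashlib
import re


def normalize(script_text):
    """Strip comments and collapse whitespace so trivially different copies hash alike."""
    # Naive removal of /* ... */ block comments and // line comments; a real
    # implementation would use a JS tokenizer so string literals and URLs
    # (e.g. "http://...") are not mangled.
    text = re.sub(r"/\*.*?\*/", "", script_text, flags=re.DOTALL)
    text = re.sub(r"//[^\n]*", "", text)
    # Collapse every run of whitespace into a single space.
    return re.sub(r"\s+", " ", text).strip()


def script_fingerprint(script_text):
    """SHA-256 digest of the normalized script body."""
    return hashlib.sha256(normalize(script_text).encode("utf-8")).hexdigest()
```

Two copies of the same script served under different filenames or from different domains would then collapse to the same digest, which could serve as the grouping key when storing scripts.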

cooperq (Author) commented Apr 24, 2015

I think stripping comments and whitespace is a great idea. This at least lets us discover whether there are standard FP scripts floating around, which I suspect there are. Many people were using the same script for canvas-based FP.

gunesacar commented

In addition to detecting common scripts, this could be very useful for post-crawl analysis. While going through the crawl results, we had many cases where suspicious scripts had been changed, taken offline, or were simply missing from the pages where they had previously been found.

Also, I think simhash and MOSS can be very useful for finding near-duplicate scripts. In addition to comments and whitespace, scripts may include unique identifiers, timestamps, or different endpoint URLs. As long as the scripts have very similar content, simhash should produce identical or nearly identical digests, and MOSS should report a very high similarity score.
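
To illustrate the near-duplicate case, here is a minimal simhash sketch over word shingles (Python again, just to show the idea; in practice an existing simhash library and some tuning of shingle size and digest width would be preferable):

```python
import hashlib
import re


def simhash(text, shingle_size=4, bits=64):
    """Compute a simhash: near-identical inputs yield digests with a small Hamming distance."""
    tokens = re.findall(r"\w+", text)
    shingles = [" ".join(tokens[i:i + shingle_size])
                for i in range(max(1, len(tokens) - shingle_size + 1))]
    weights = [0] * bits
    for shingle in shingles:
        # Hash each shingle and vote +1/-1 per bit position.
        h = int(hashlib.md5(shingle.encode("utf-8")).hexdigest(), 16)
        for bit in range(bits):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    # The final digest keeps a 1 wherever the votes are positive.
    return sum(1 << bit for bit in range(bits) if weights[bit] > 0)


def hamming_distance(a, b):
    """Number of differing bits; a small distance suggests near-duplicate scripts."""
    return bin(a ^ b).count("1")
```

Two scripts that differ only in an embedded identifier or endpoint URL should end up only a few bits apart, while unrelated scripts land around 32 bits apart on average for 64-bit digests.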

cooperq (Author) commented Apr 29, 2015

Great ideas, @gunesacar!

ghostwords (Owner) commented

Being able to access response bodies through the WebRequest API in Chrome will make this much easier to implement.
