[METRICS] get times on queries using UAST functions #606

Closed
ajnavarro opened this issue Nov 6, 2018 · 7 comments

Comments

@ajnavarro
Contributor

  • Get times from several queries using UAST functions (from the examples or custom ones; try to cover as many real use cases as possible)
  • Output size of those queries (the total number of bytes sent to the client)
  • Speed of sending data to clients
  • Time difference between a call to a UAST function and the underlying call to bblfsh over gRPC
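
One way to read the last point is as two nested timers: one around the whole UAST function and one around only the parser request inside it. The following is a minimal sketch of that idea, not the gitbase implementation. `fake_parse`, `timed_uast` and `Timings` are hypothetical names, and the sleep stands in for the real gRPC round trip.

```python
import time
from dataclasses import dataclass


@dataclass
class Timings:
    calls: int = 0
    inner_s: float = 0.0  # time spent inside the parser request only
    outer_s: float = 0.0  # time spent in the whole wrapper function

    @property
    def overhead_s(self) -> float:
        # Work done by the wrapper outside the parser request.
        return self.outer_s - self.inner_s


def fake_parse(content: bytes) -> bytes:
    """Hypothetical stand-in for the bblfsh gRPC request."""
    time.sleep(0.001)
    return content * 2


def timed_uast(content: bytes, timings: Timings, parse=fake_parse) -> bytes:
    outer_start = time.perf_counter()

    inner_start = time.perf_counter()
    tree = parse(content)
    timings.inner_s += time.perf_counter() - inner_start

    result = tree[::-1]  # placeholder for post-processing such as marshaling

    timings.outer_s += time.perf_counter() - outer_start
    timings.calls += 1
    return result


if __name__ == "__main__":
    t = Timings()
    for blob in (b"package main", b"func main() {}"):
        timed_uast(blob, t)
    print(f"calls={t.calls} inner={t.inner_s:.4f}s "
          f"outer={t.outer_s:.4f}s overhead={t.overhead_s:.4f}s")
```

Accumulating totals and dividing by the call count afterwards keeps the per-call bookkeeping cheap, which matters when the measured overhead is itself small.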
@ajnavarro ajnavarro added the research Something that requires research label Nov 6, 2018
@erizocosmico erizocosmico self-assigned this Nov 7, 2018
@erizocosmico
Contributor

erizocosmico commented Nov 8, 2018

Because of a bblfshd issue, it's impossible to gather data from a big dataset, so I had to resort to getting this data from a much smaller one. It has to be executed with --parallelism=1, so it would have been terribly slow anyway. I used 3 queries: one with uast, one with uast_extract and one with uast_xpath. Two of them use 200 rows and the other one uses 1k rows.

Query 1

SELECT f.file_path, uast(f.blob_content, language(f.file_path))
FROM ref_commits r
INNER JOIN commit_files c ON r.commit_hash = c.commit_hash
AND r.repository_id = c.repository_id
INNER JOIN files f ON c.file_path = f.file_path AND c.tree_hash = f.tree_hash
AND f.blob_hash = c.blob_hash
AND f.repository_id = c.repository_id
WHERE r.ref_name = 'HEAD' AND language(f.file_path) = 'Go' AND NOT is_binary(f.blob_content)
LIMIT 200

Results

Total sent to client: 13590150 bytes
Total Bytes Parsed: 1447903 bytes
Total Bytes Received: 13581842 bytes
Total Failed UASTs: 0
Total Parsed UASTs: 200
Total Time bblfsh: 60.87s
Total Time UAST: 66.17s
Total Time UDF: 5.30s
AVG Time UAST: 0.33s
AVG Time bblfsh: 0.30s
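
The Query 1 totals can be cross-checked with a little arithmetic. This is a rough sketch using only the numbers reported above. It assumes that "Total Bytes Parsed" is the source code sent to bblfsh and "Total Bytes Received" is the UAST data that came back. The function names are made up for illustration.

```python
# Totals copied from the Query 1 results above.
BYTES_PARSED = 1_447_903        # assumed: source bytes sent to bblfsh
BYTES_RECEIVED = 13_581_842     # assumed: UAST bytes returned by bblfsh
BYTES_TO_CLIENT = 13_590_150
TIME_BBLFSH_S = 60.87
TIME_UAST_S = 66.17
TIME_UDF_S = 5.30


def expansion_ratio(received: int, parsed: int) -> float:
    """How many UAST bytes come back per byte of source code."""
    return received / parsed


def udf_share(udf_s: float, uast_s: float) -> float:
    """Fraction of the UAST time not spent waiting for bblfsh."""
    return udf_s / uast_s


if __name__ == "__main__":
    # The reported totals add up: bblfsh time plus UDF time is the UAST time.
    assert abs(TIME_BBLFSH_S + TIME_UDF_S - TIME_UAST_S) < 1e-6

    print(f"UAST/source size ratio: "
          f"{expansion_ratio(BYTES_RECEIVED, BYTES_PARSED):.2f}")
    print(f"UDF share of UAST time: {udf_share(TIME_UDF_S, TIME_UAST_S):.1%}")
    print(f"extra bytes sent to client: {BYTES_TO_CLIENT - BYTES_RECEIVED}")
```

On these figures the serialized UASTs are roughly nine times larger than the source they were parsed from, and most of the time is spent in bblfsh rather than in the UDF itself.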

Query 2

SELECT f.file_path, uast_xpath(uast(f.blob_content, language(f.file_path)), "//uast:Identifier")
FROM ref_commits r
INNER JOIN commit_files c ON r.commit_hash = c.commit_hash
AND r.repository_id = c.repository_id
INNER JOIN files f ON c.file_path = f.file_path AND c.tree_hash = f.tree_hash
AND f.blob_hash = c.blob_hash
AND f.repository_id = c.repository_id
WHERE r.ref_name = 'HEAD' AND language(f.file_path) = 'Go' AND NOT is_binary(f.blob_content)
LIMIT 200

Results

Total sent to client: 5875504
Total Bytes Parsed: 1447903
Total Bytes Received: 13581842
Total Failed UASTs: 0
Total Parsed UASTs: 200
Total Time bblfsh: 44.778349917 s
Total Time UAST: 48.356236486 s
Total Time UDF: 3.577886569 s
AVG Time UAST: 0.24178118243 s
AVG Time bblfsh: 0.223891749585 s
Total Time UAST xpath: 56.925413308 s
AVG Time UAST xpath: 0.28462706654000003 s

Query 3

SELECT f.file_path, uast_extract(uast(f.blob_content, language(f.file_path), "//uast:Block"), "@pos")
FROM ref_commits r
INNER JOIN commit_files c ON r.commit_hash = c.commit_hash
AND r.repository_id = c.repository_id
INNER JOIN files f ON c.file_path = f.file_path AND c.tree_hash = f.tree_hash
AND f.blob_hash = c.blob_hash
AND f.repository_id = c.repository_id
WHERE r.ref_name = 'HEAD' AND language(f.file_path) = 'Go' AND NOT is_binary(f.blob_content)
LIMIT 1000

Results

Total sent to client: 6679540
Total Bytes Parsed: 19912429
Total Bytes Received: 103448009
Total Failed UASTs: 0
Total Parsed UASTs: 1000
Total Time bblfsh: 493.207216871 s
Total Time UAST: 1348.310618189 s
Total Time UDF: 855.103401318 s
AVG Time UAST: 1.348310618189 s
AVG Time bblfsh: 0.493207216871 s
Total Time extract UAST: 701.984547075 s
AVG Time extract UAST: 0.7825914683110368 s

@bzz
Contributor

bzz commented Nov 26, 2018

@erizocosmico nice! On

Because of a bblfshd issue it's impossible to gather data from a big dataset.

if you could provide a link to the issue I'll be happy to take a look from the bblfshd side.

@erizocosmico
Contributor

@bzz it's this issue: bblfsh/bblfshd#209

@ajnavarro
Contributor Author

Moving to TODO to give it a second chance with newer bblfsh versions.

@erizocosmico
Contributor

Query 1

SELECT f.file_path, uast(f.blob_content, language(f.file_path))
FROM ref_commits r
INNER JOIN commit_files c ON r.commit_hash = c.commit_hash
AND r.repository_id = c.repository_id
INNER JOIN files f ON c.file_path = f.file_path AND c.tree_hash = f.tree_hash
AND f.blob_hash = c.blob_hash
AND f.repository_id = c.repository_id
WHERE r.ref_name = 'HEAD' AND language(f.file_path) = 'Go' AND NOT is_binary(f.blob_content)
LIMIT 200

Results

Bytes sent to client: 13581842
Bytes sent to bblfsh: 1447903
Bytes received from bblfsh: 13579609
Failed UASTs: 0
Parsed UASTs: 200
Time bblfsh (s): 42.442334624
Time UAST (s): 42.46458593
Time UDF (s): 46.354474774
Time marshal (s): 3.883643922
Time unmarshal (s): 0
AVG bblfsh (s): 0.21221167312
AVG UAST (s): 0.21232292965
AVG UDF (s): 0.23177237387000002
AVG marshal (s): 0.01941821961
AVG unmarshal (s): 0

Query 2

SELECT f.file_path, uast(f.blob_content, language(f.file_path))
FROM ref_commits r
INNER JOIN commit_files c ON r.commit_hash = c.commit_hash
AND r.repository_id = c.repository_id
INNER JOIN files f ON c.file_path = f.file_path AND c.tree_hash = f.tree_hash
AND f.blob_hash = c.blob_hash
AND f.repository_id = c.repository_id
WHERE r.ref_name = 'HEAD' AND language(f.file_path) = 'Go' AND NOT is_binary(f.blob_content)
LIMIT 1000

Results

Can't make it work with bblfshd.

Query 3

SELECT f.file_path, uast_xpath(uast(f.blob_content, language(f.file_path)), "//uast:Identifier")
FROM ref_commits r
INNER JOIN commit_files c ON r.commit_hash = c.commit_hash
AND r.repository_id = c.repository_id
INNER JOIN files f ON c.file_path = f.file_path AND c.tree_hash = f.tree_hash
AND f.blob_hash = c.blob_hash
AND f.repository_id = c.repository_id
WHERE r.ref_name = 'HEAD' AND language(f.file_path) = 'Go' AND NOT is_binary(f.blob_content)
LIMIT 200

Results

Bytes sent to client: 13581842
Bytes sent to bblfsh: 1447903
Bytes received from bblfsh: 13579609
Failed UASTs: 0
Parsed UASTs: 200
Time bblfsh (s): 54.802263305
Time UAST (s): 54.808935807
Time UDF (s): 59.802896322
Time marshal (s): 4.987847189
Time unmarshal (s): 4.735422825
Time XPath (s): 4.526671017
AVG bblfsh (s): 0.274011316525
AVG UAST (s): 0.27404467903499996
AVG UDF (s): 0.29901448161
AVG marshal (s): 0.024939235945
AVG unmarshal (s): 0.023677114125
AVG XPath (s): 0.022633355085

Query 4

SELECT f.file_path, uast_xpath(uast(f.blob_content, language(f.file_path)), "//uast:Identifier")
FROM ref_commits r
INNER JOIN commit_files c ON r.commit_hash = c.commit_hash
AND r.repository_id = c.repository_id
INNER JOIN files f ON c.file_path = f.file_path AND c.tree_hash = f.tree_hash
AND f.blob_hash = c.blob_hash
AND f.repository_id = c.repository_id
WHERE r.ref_name = 'HEAD' AND language(f.file_path) = 'Go' AND NOT is_binary(f.blob_content)
LIMIT 1000

Results

Can't be done due to bblfsh issues.

Query 5

SELECT f.file_path, uast_extract(uast(f.blob_content, language(f.file_path), "//uast:Block"), "@pos")
FROM ref_commits r
INNER JOIN commit_files c ON r.commit_hash = c.commit_hash
AND r.repository_id = c.repository_id
INNER JOIN files f ON c.file_path = f.file_path AND c.tree_hash = f.tree_hash
AND f.blob_hash = c.blob_hash
AND f.repository_id = c.repository_id
WHERE r.ref_name = 'HEAD' AND language(f.file_path) = 'Go' AND NOT is_binary(f.blob_content)
LIMIT 1000

Results

Bytes sent to client: 14865664
Bytes sent to bblfsh: 2185858
Bytes received from bblfsh: 20417400
Failed UASTs: 0
Parsed UASTs: 300
Time bblfsh (s): 58.495291385
Time UAST (s): 58.568144499
Time UDF (s): 69.460720492
Time marshal (s): 6.406234967
Time unmarshal (s): 5.563922599
Time XPath (s): 4.478470163
AVG bblfsh (s): 0.19498430461666666
AVG UAST (s): 0.19522714833
AVG UDF (s): 0.23153573497333332
AVG marshal (s): 0.021354116556666667
AVG unmarshal (s): 0.018546408663333333
AVG XPath (s): 0.014928233876666667

Slightly faster than before, but the issue with large datasets still persists: queries last for hours, nothing gets processed, there are lots of failures, etc. So we can only take metrics for small datasets like this one. Whenever I run it with LIMIT 1000 it starts failing.
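
For reference, the two 200-row queries that appear in both rounds of results can be compared directly. The sketch below only uses the bblfsh totals reported in this thread. Which figures the "faster" remark refers to is not stated, and with samples this small a single run may well be within noise, so treat the ratios as indicative only.

```python
# (total bblfsh seconds, parsed UASTs) per query, copied from the two
# result sets in this thread. Only the LIMIT 200 queries ran in both rounds.
RUNS = {
    "uast": {
        "first": (60.87, 200),
        "second": (42.442334624, 200),
    },
    "uast_xpath": {
        "first": (44.778349917, 200),
        "second": (54.802263305, 200),
    },
}


def per_file(total_s: float, parsed: int) -> float:
    """Average bblfsh time per parsed file."""
    return total_s / parsed


def speedup(first: tuple, second: tuple) -> float:
    """Values above 1 mean the second round was faster per file."""
    return per_file(*first) / per_file(*second)


if __name__ == "__main__":
    for name, rounds in RUNS.items():
        ratio = speedup(rounds["first"], rounds["second"])
        print(f"{name}: first={per_file(*rounds['first']):.3f}s "
              f"second={per_file(*rounds['second']):.3f}s "
              f"speedup={ratio:.2f}x")
```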

@ajnavarro
Contributor Author

Thanks a lot @erizocosmico

@erizocosmico
Contributor

I pushed the metrics code to feature/metrics-bblfsh on my fork in case we need it again, because I deleted it the first time and had to write it again 🙃
