Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mark Ox::sax_parse as Ractor-safe and add a Ractor-based Ox::Sax example. #278

Merged
merged 1 commit into from
Jul 27, 2021

Conversation

okeeblow
Copy link
Contributor

For #277

I've been experimenting with Ractors since upgrading to Ruby 3.0 for #275 but quickly ran into Ractor::UnsafeError when trying to call Ox::sax_parse in anything except the main Ractor.

Per Ruby's Ractor C extension documentation (link below):
"By default, all C extensions are recognized as Ractor-unsafe. If C extension becomes Ractor-safe,
the extension should call rb_ext_ractor_safe(true) at the Init_ function and all defined method marked as Ractor-safe. Ractor-unsafe C-methods only been called from main-ractor. If non-main ractor calls it, then Ractor::UnsafeError is raised."

I don't like to open seemingly-large feature requests like this without making some attempt at it myself first,
and luckily it seems like Ox::sax_parse Just Works™ since I marked it Ractor-safe, even with the class_cache. Confirming this safety, making any remaining changes to Ox::Sax, and expanding this to the non-Sax parts of Ox are all unfortunately out of my depth as a n00b C coder, so I would appreciate if you could take this over if it interests you. I am happy with just Sax support since I have no current need for marshalling, but I imagine other Ox users wouldn't be satisfied if stratified.

In this commit:

  • Enable rb_ext_ractor_safe preprocessor macro via have_func in extconf.rb.
  • Mark Init_Ox and ox_sax_parse as Ractor -safe.
  • Add a new Ractor-based Ox::Sax example exercising both parallel and serial Ox::Sax handler Ractors to parse data from shared-mime-info XML files many users likely already have on their systems.

Official Ractor info:

Blogs:

Before this change:

[okeeblow@emi#CHECKING-YOU-OUT] time ./bin/checking-you-out ~/224031-dot-jpg
Received /home/okeeblow/.local/share/mime/packages/user-extension-rsrc.xml
/home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival/mr_mime.rb:375:in `sax_parse': ractor unsafe method called from not main ractor (Ractor::UnsafeError)
        from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival/mr_mime.rb:375:in `block in open'
        from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival/mr_mime.rb:331:in `open'
        from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival/mr_mime.rb:331:in `open'
        from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival.rb:24:in `block (2 levels) in remember_me'
        from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival.rb:19:in `loop'
        from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival.rb:19:in `block in remember_me'
<internal:ractor>:583:in `send': The incoming-port is already closed (Ractor::ClosedError)
        from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival.rb:86:in `block in extended'
        from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival.rb:63:in `each'
        from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival.rb:63:in `each_with_object'
        from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival.rb:63:in `extended'
        from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/inner_spirit.rb:216:in `extend'
        from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/inner_spirit.rb:216:in `<top (required)>'
        from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out.rb:10:in `require_relative'
        from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out.rb:10:in `<top (required)>'
        from ./bin/checking-you-out:3:in `require_relative'
        from ./bin/checking-you-out:3:in `<main>'
./bin/checking-you-out ~/224031-dot-jpg  0.12s user 0.05s system 100% cpu 0.168 total

My new Ox::Sax Ractor example script's usage:

[okeeblow@emi#ox] ./examples/sax_ractor.rb
Please provide the path to a `shared-mime-info` XML package and some media-type query arguments (e.g. 'image/jpeg')
[okeeblow@emi#ox] ./examples/sax_ractor.rb /usr/share/mime/packages/freedesktop.org.xml
Please provide some media-type query arguments (e.g. 'image/jpeg')

Finding all-extant types:

[okeeblow@emi#ox] ./examples/sax_ractor.rb /usr/share/mime/packages/freedesktop.org.xml image/jpeg font/ttf application/xhtml+xml image/x-pict
Parallel Ractors
["Worker 0 gave us JPEG 影像 (image/jpeg) [.jpeg,.jpg,.jpe]",
 "Worker 1 gave us TrueType 字型 (font/ttf) [.ttf]",
 "Worker 2 gave us XHTML 網頁 (application/xhtml+xml) [.xhtml,.xht]",
 "Worker 3 gave us Macintosh Quickdraw/PICT 繪圖 (image/x-pict) [.pct,.pict,.pict1,.pict2]"]

Serial Ractor
"ONLY ONE OX gave us [#<CYO JPEG 影像 (image/jpeg) [.jpeg,.jpg,.jpe]>, #<CYO TrueType 字型 (font/ttf) [.ttf]>, #<CYO XHTML 網頁 (application/xhtml+xml) [.xhtml,.xht]>, #<CYO Macintosh Quickdraw/PICT 繪圖 (image/x-pict) [.pct,.pict,.pict1,.pict2]>]"

…and not finding invalid ones:

[okeeblow@emi#ox] ./examples/sax_ractor.rb /usr/share/mime/packages/freedesktop.org.xml lol/rofl fart/butt image/jpeg
Parallel Ractors
["Worker 0 gave us nothing",
 "Worker 1 gave us nothing",
 "Worker 2 gave us JPEG 影像 (image/jpeg) [.jpeg,.jpg,.jpe]"]

Serial Ractor
"ONLY ONE OX gave us [nil, nil, #<CYO JPEG 影像 (image/jpeg) [.jpeg,.jpg,.jpe]>]"

Unit tests pass.

…ax` example.

For ohler55#277

I've been experimenting with `Ractor`s since upgrading to Ruby 3.0 for ohler55#275
but quickly ran into `Ractor::UnsafeError` when trying to call `Ox::sax_parse` in anything except the main `Ractor`.

Per Ruby's `Ractor` C extension documentation (link below):
"By default, all C extensions are recognized as `Ractor`-unsafe. If C extension becomes `Ractor`-safe,
the extension should call `rb_ext_ractor_safe(true)` at the `Init_` function and all defined method marked as `Ractor`-safe.
`Ractor`-unsafe C-methods only been called from main-ractor. If non-main ractor calls it, then `Ractor::UnsafeError` is raised."

I don't like to open seemingly-large feature requests like this without making some attempt at it myself first,
and luckily it seems like `Ox::sax_parse` Just Works™ since I marked it `Ractor`-safe, even with the `class_cache`.
Confirming this safety, making any remaining changes to `Ox::Sax`, and expanding this to the non-`Sax` parts of `Ox`
are all unfortunately out of my depth as a n00b C coder, so I would appreciate if you could take this over if it interests you.
I am happy with just `Sax` support since I have no current need for marshalling,
but I imagine other `Ox` users wouldn't be satisfied if stratified.

In this commit:
- Enable `rb_ext_ractor_safe` preprocessor macro via `have_func` in `extconf.rb`.
- Mark `Init_Ox` and `ox_sax_parse` as `Ractor` -safe.
- Add a new `Ractor`-based `Ox::Sax` example exercising both parallel and serial `Ox::Sax` handler `Ractor`s
  to parse data from `shared-mime-info` XML files many users likely already have on their systems.

Official `Ractor` info:

- "Ractor: a proposal for a new concurrent abstraction without thread-safety issues": https://bugs.ruby-lang.org/issues/17100 (ruby/ruby#3365)
- Ruby's official `Ractor` documentation: https://docs.ruby-lang.org/en/master/doc/ractor_md.html
- "A way to mark C extensions as thread-safe, Ractor-safe, or unsafe": https://bugs.ruby-lang.org/issues/17307 (ruby/ruby#3824)
- Ruby's C Extension `Ractor` documention covering `rb_ext_ractor_safe`: https://docs.ruby-lang.org/en/master/doc/extension_rdoc.html#label-Appendix+F.+Ractor+support
- A `Ractor` C Extension from the creator of `Ractor` that might serve as a useful example: https://github.com/ko1/ractor-tvar

Blogs:

- "Ractors: Multi-Core Parallel Processing Comes to Ruby 3": https://www.ruby3.dev/ruby-3-fundamentals/2021/01/27/ractors-multi-core-parallel-processing-in-ruby-3/
- "Ruby Ractor Experiments: Safe async communication" :https://ivoanjo.me/blog/2021/02/14/ractor-experiments-safe-async/
- "Playing with Ruby Ractors": https://billy-ruffian.co.uk/playing-with-ruby-ractors/
- "How Fast are Ractors?": https://www.fastruby.io/blog/ruby/performance/how-fast-are-ractors.html (https://github.com/noahgibbs/ractor_basic_benchmarks/tree/main/benchmarks)

Before this change:

```
[okeeblow@emi#CHECKING-YOU-OUT] time ./bin/checking-you-out ~/224031-dot-jpg
Received /home/okeeblow/.local/share/mime/packages/user-extension-rsrc.xml
/home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival/mr_mime.rb:375:in `sax_parse': ractor unsafe method called from not main ractor (Ractor::UnsafeError)
        from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival/mr_mime.rb:375:in `block in open'
        from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival/mr_mime.rb:331:in `open'
        from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival/mr_mime.rb:331:in `open'
        from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival.rb:24:in `block (2 levels) in remember_me'
        from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival.rb:19:in `loop'
        from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival.rb:19:in `block in remember_me'
<internal:ractor>:583:in `send': The incoming-port is already closed (Ractor::ClosedError)
        from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival.rb:86:in `block in extended'
        from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival.rb:63:in `each'
        from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival.rb:63:in `each_with_object'
        from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival.rb:63:in `extended'
        from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/inner_spirit.rb:216:in `extend'
        from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/inner_spirit.rb:216:in `<top (required)>'
        from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out.rb:10:in `require_relative'
        from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out.rb:10:in `<top (required)>'
        from ./bin/checking-you-out:3:in `require_relative'
        from ./bin/checking-you-out:3:in `<main>'
./bin/checking-you-out ~/224031-dot-jpg  0.12s user 0.05s system 100% cpu 0.168 total
```

My new `Ox::Sax` Ractor example script's usage:

```
[okeeblow@emi#ox] ./examples/sax_ractor.rb
Please provide the path to a `shared-mime-info` XML package and some media-type query arguments (e.g. 'image/jpeg')
```

```
[okeeblow@emi#ox] ./examples/sax_ractor.rb /usr/share/mime/packages/freedesktop.org.xml
Please provide some media-type query arguments (e.g. 'image/jpeg')
```

Finding all-extant types:

```
[okeeblow@emi#ox] ./examples/sax_ractor.rb /usr/share/mime/packages/freedesktop.org.xml image/jpeg font/ttf application/xhtml+xml image/x-pict
Parallel Ractors
["Worker 0 gave us JPEG 影像 (image/jpeg) [.jpeg,.jpg,.jpe]",
 "Worker 1 gave us TrueType 字型 (font/ttf) [.ttf]",
 "Worker 2 gave us XHTML 網頁 (application/xhtml+xml) [.xhtml,.xht]",
 "Worker 3 gave us Macintosh Quickdraw/PICT 繪圖 (image/x-pict) [.pct,.pict,.pict1,.pict2]"]

Serial Ractor
"ONLY ONE OX gave us [#<CYO JPEG 影像 (image/jpeg) [.jpeg,.jpg,.jpe]>, #<CYO TrueType 字型 (font/ttf) [.ttf]>, #<CYO XHTML 網頁 (application/xhtml+xml) [.xhtml,.xht]>, #<CYO Macintosh Quickdraw/PICT 繪圖 (image/x-pict) [.pct,.pict,.pict1,.pict2]>]"
```

…and not finding invalid ones:

```
[okeeblow@emi#ox] ./examples/sax_ractor.rb /usr/share/mime/packages/freedesktop.org.xml lol/rofl fart/butt image/jpeg
Parallel Ractors
["Worker 0 gave us nothing",
 "Worker 1 gave us nothing",
 "Worker 2 gave us JPEG 影像 (image/jpeg) [.jpeg,.jpg,.jpe]"]

Serial Ractor
"ONLY ONE OX gave us [nil, nil, #<CYO JPEG 影像 (image/jpeg) [.jpeg,.jpg,.jpe]>]"
```

Unit tests pass.
@ohler55
Copy link
Owner

ohler55 commented Jul 26, 2021

Let me do a little reading on actor before I merge this.

@ohler55
Copy link
Owner

ohler55 commented Jul 26, 2021

After a quick scan of Ox code it looks like the only part that needs some work to be ractor (thread) safe is the cache code. Adding a mutex to that should take care of that though.

From the ractor information it sounds like rb_ext_ractor_safe need to be called in Init_ox() as you have done and every other method function. Do you know how much overhead there is in the rb_ext_ractor_safe() call? If it is expensive is there a way to check to see if the code is running in a ractor?

@okeeblow
Copy link
Contributor Author

rb_ext_ractor_safe doesn't appear to be expensive at all, just setting a flag as a hoop to jump through to make sure extension authors have tested it. Here's the relevant part of the commit that added the function showing it just looks for the flag and bails out if unset: ruby/ruby@a58c322#diff-f8c174347e6ea8889b5036064a1ff4fe5e7c53a821befa9bdc5ccbf17800a649R2472-L2472

@ohler55
Copy link
Owner

ohler55 commented Jul 26, 2021

The check internal to ruby is fine. I was wondering more about the calling of rb_ext_ractor_safe() itself. It does need to find the ractor for the current thread and set the flag. I guess the only way to find out is to run some benchmarks.

I'm inclined to merge this MR and then make a second pass to finish it before doing a release. Let me look more carefully this evening.

@okeeblow
Copy link
Contributor Author

That check is in the part I linked, not in rb_ext_ractor_safe itself, which just sets the flag: ruby/ruby@a58c322#diff-cb1f5822bd6dce7e2bc5e5ae0927004df17afe0034ed9a69ab34c50beed12034R1016

@ohler55
Copy link
Owner

ohler55 commented Jul 26, 2021

Yes, I got that. I was not concerned about that as there is nothing I can do about that. My concern was over whether rb_ext_ractor_safe() was expensive to call or not. If it is I will have Ox keep track of whether or not a check should be made before calling it.

@okeeblow
Copy link
Contributor Author

Ah I get what you mean. I don't have enough experience with threads in Ruby to have a sense for how expensive GET_THREAD() is in any other context v(._. )v

@ohler55
Copy link
Owner

ohler55 commented Jul 26, 2021

I'll try some benchmarks. Might take a day or two though.

@okeeblow
Copy link
Contributor Author

Take all the time you like. Thanks for your attention to these Ruby 3 features :)

@ohler55
Copy link
Owner

ohler55 commented Jul 26, 2021

Are you part of the Ruby support team?

@okeeblow
Copy link
Contributor Author

Nope, just a happy Ox user with some specific goals for my Ox-using library

@ohler55 ohler55 merged commit 5cc4bff into ohler55:develop Jul 27, 2021
@ohler55
Copy link
Owner

ohler55 commented Jul 27, 2021

A data point for you. When parsing with the new parse on Oj placing a mutex around the cache (string cache so it's used more) takes the performance for 68% of the old to 81% so a significant drop in performance. I think there will need to be a check against the cache option to determine if the code is actually ractor_safe.

@okeeblow
Copy link
Contributor Author

I see. I have a lot to learn in that regard so I appreciate your patience with my amateur C changes :)

@ohler55
Copy link
Owner

ohler55 commented Jul 27, 2021

We are all learning. You've taught me a bit about Ractor. Sharing what we knew help us all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants