
Initial async API and uring usage #4078

Open
wants to merge 1 commit into main
Conversation

@vertexclique vertexclique commented Jan 27, 2024

Rationale of the change

I selected the most frictionless bit to change: fs read/write.
I didn't improve read much; I focused on write. Writes are improved, as shown below. I am certain vectored writes would be even faster.

Tests are passing locally. No public API changes except the async_read module rename and the NucleiReader rename. To be honest, the NucleiReader rename is also poor; I wanted to call it AsyncReader, but that would be confused with the AsyncRead trait.

Some Roadmap Items

  • Slowly move to futures-util and futures-lite.
  • Migrate towards pin-project.
  • Make all interfaces use the futures::Async* traits; drop all Tokio read-alikes.
  • Operate over raw byte types like &[u8] and &mut [u8]; drop Bytes.
  • Slowly phase out all spawn_ code in actual async contexts so that users do this themselves. It is not our job to manage async I/O on behalf of the user; don't incorporate a runtime's hardcoded methods unless necessary. (Some storage APIs actually require it; we keep spawning for those.)
  • Write a Makefile or tasks to easily execute benchmarks.
  • Decide what we want from the uring backend: low latency (with more resource consumption, e.g. CPU), or higher latency with lower resource usage?
  • Other backends like epoll and kqueue should be selectable via features.
  • Migrate tests, benchmarks, and the like to nuclei attributes such as #[nuclei::test], #[nuclei::bench], etc.
  • Add reader benchmarks with more contention. It is already obvious that all of them improved, but the gains will shine on harder reads; we need more than 16 concurrent read tasks.

Performance

Full benchmark results against master:
https://gist.github.com/vertexclique/cd1307d38210bec2d5e3b768502daef0

If the buffers fit in memory, we get immense performance:

exact_buf_write/256 KiB time:   [71.903 ns 72.184 ns 72.523 ns]
                        thrpt:  [3366.4 GiB/s 3382.2 GiB/s 3395.4 GiB/s]
                 change:
                        time:   [-37.945% -37.559% -37.162%] (p = 0.00 < 0.05)
                        thrpt:  [+59.140% +60.151% +61.147%]
                        Performance has improved.

Bench PDF: exact_buf_write_256 KiB - Criterion.rs.pdf

@Xuanwo (Member) commented Apr 30, 2024

We have a discussion thread at https://discord.com/channels/1081052318650339399/1200638279939260436
