Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement FileDescriptor.Pipe() #58

Merged
merged 17 commits into from
Sep 27, 2021
Merged

Conversation

GeorgeLyon
Copy link
Contributor

See #57 for discussion.

@milseman
Copy link
Contributor

milseman commented Aug 5, 2021

@lorentey I haven't looked into pipes too deeply, but this seems like the right direction. There are some minor details I'm not sure about, such as an initializer backed by a syscall vs a static create method (just haven't thought it through), etc. WDYT?

@milseman milseman mentioned this pull request Aug 5, 2021
@GeorgeLyon GeorgeLyon changed the title Implement FileDescriptor.Pipe() Implement FileDescriptor.Pipe Aug 5, 2021
@GeorgeLyon GeorgeLyon changed the title Implement FileDescriptor.Pipe Implement FileDescriptor.Pipe() Aug 5, 2021
Copy link
Member

@lorentey lorentey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be very welcome addition! 👍

I think I'd prefer if we went with a simple pipe() method, unless there is a good reason to have the Pipe type that I'm not seeing.

Sources/System/FileOperations.swift Outdated Show resolved Hide resolved
Sources/System/FileOperations.swift Outdated Show resolved Hide resolved
Sources/System/FileOperations.swift Outdated Show resolved Hide resolved
Sources/System/FileOperations.swift Outdated Show resolved Hide resolved
Sources/System/FileOperations.swift Outdated Show resolved Hide resolved
Sources/System/FileHelpers.swift Outdated Show resolved Hide resolved
Sources/System/FileHelpers.swift Outdated Show resolved Hide resolved
@GeorgeLyon
Copy link
Contributor Author

I've addressed some of @lorentey's feedback. I still think a separate structure is warranted for the reasons listed here: #58 (comment), but let me know if you disagree.

@karwa
Copy link

karwa commented Aug 31, 2021

I think the "fileDescriptor" part of the name is superfluous.

What if we went further with the pipe terminology and called them "inlet" and "outlet"? i.e. the descriptor used to push data in to the pipe and the descriptor where data emerges from?

@milseman
Copy link
Contributor

milseman commented Aug 31, 2021

The more I think about this PR, the more I like having a nominal type (inside FileDescriptor) for the pipe with well-named members. Just as FileDescriptor is clearly modeling the OS concept of an entry in a descriptor table and shouldn't be confusable with a theoretical future File moveonly struct, I feel that FileDescriptor.Pipe won't be confused with some theoretical future type used with a Process actor or anything like that.

I really like @karwa 's suggestion (inlet / outlet) as thematically consistent and clear, assuming there's not a better "standard" Swift name for the concept.

@GeorgeLyon
Copy link
Contributor Author

I changed it to inlet/outlet, great suggestion @karwa!

@lorentey
Copy link
Member

lorentey commented Aug 31, 2021

Naming them inlet and outlet is an attractive idea, but I think they just make things worse:

let (outlet, inlet) = try FileDescriptor.pipe()
defer { 
  try! outlet.close() // Closing a valid pipe never errors out.
  try! inlet.close()
}
try launchChildProcess(input: .standardInput, output: inlet) // ???
try launchChildProcess(input: outlet, output: .standardOutput) // ???

Repeating #58 (comment):

The terms input and output in my suggestion reflect the naming convention as defined by FileDescriptor.standardInput and .standardOutput -- an "input filedescriptor" is one that you can read from; an "output fd" is one you can write to.

let (input, output) = try FileDescriptor.pipe()
defer { 
  try! input.close() // Closing a valid pipe never errors out.
  try! output.close()
}
try launchChildProcess(input: .standardInput, output: output)
try launchChildProcess(input: input, output: .standardOutput)

@karwa
Copy link

karwa commented Aug 31, 2021

It's not so bad if you use them from the pipe object/tuple:

let pipe = try FileDescriptor.Pipe()
defer { 
  try! pipe.close() // Closing a valid pipe never errors out.
}
try launchChildProcess(input: .standardInput, output: pipe.inlet) // Oh, ok - outputting to the pipe
try launchChildProcess(input: pipe.outlet, output: .standardOutput) // Oh, ok - reading from the pipe

But yeah, if you immediately destructure the result in to 2 file descriptors in the calling function, those names might not make sense and you can choose to bind them to other names in your destructuring.

@lorentey
Copy link
Member

I don't know, pipe.inlet doesn't look like a huge improvement to me. For an argument labeled output:, my default instinct would be to type pipe. then select whatever matches the label best from the completion list.

Going from output to inlet requires a bit of mental gymnastics. It's clever and correct, but it's not helpful.

@GeorgeLyon
Copy link
Contributor Author

To further improve on @karwa's example, you can just pass the pipe:

try launchChildProcess(input: .standardInput, output: pipe)
try launchChildProcess(input: pipe, output: .standardOutput)

And launchChildProcess would know to use inlet for output and outlet for input (we can accomplish this with overloads or protocols). This way there is not danger of making a mistake.


Even without that, inlet may not be perfect but it is certainly better in that case than input since it would be much easier to type input: pipe.input and miss the error. Having it be inlet at least gives you a bit more opportunity to do a double-take.

@lorentey
Copy link
Member

lorentey commented Sep 1, 2021

Even without that, inlet may not be perfect but it is certainly better in that case than input since it would be much easier to type input: pipe.input and miss the error.

What error? 😊

My entire point is that passing pipe.input for the input: argument would be correct: the labels align perfectly with the names for pipe components, as well as with the existing standardInput/standardOutput terminology. (pipe.input is what @karwa calls the "outlet".)

extension FileDescriptor {
  public static func pipe() throws -> (input: Self, output: Self)
}

let pipe = try FileDescriptor.pipe()
defer {
  try! pipe.input.close()
  try! pipe.output.close()
}

// This looks obviously correct, and is actually correct. It's also eminently consistent.
try spawnChildProcess(input: .standardInput, output: pipe.output)
try spawnChildProcess(input: pipe.input, output: .standardOutput)

// This looks incorrect, but is actually correct. It's clever, but tiresome. I'll get it wrong every single time.
try spawnChildProcess(input: .standardInput, output: pipe.inlet)
try spawnChildProcess(input: pipe.outlet, output: .standardOutput)

This also rhymes with how pipe() happens to line up the indices of its result array with the "0 means stdin, 1 means stdout" convention.

To further improve on @karwa's example, you can just pass the pipe:

Yep. However, does this trivial little syscall deserve an ABI stable named type plus a never-ending list of overloads? I am not convinced it does.

Let's suppose we wanted to implement the super simple spawnChildProcess API above:

public func spawnChildProcess(
  executable: FilePath,
  arguments: [String],
  input: FileDescriptor = .standardInput,
  output: FileDescriptor = .standardOutput,
  error: FileDescriptor = .standardError,
  environment: ProcessEnvironment? = nil
) throws -> ProcessID

(I don't think we'd want this as the primary frontend to process creation -- first and foremost, System ought to expose the full power of posix_spawn and friends. But something like this could perhaps be useful as a convenience construct.)

How exactly would we allow people to supply Pipe values to these three parameters?

It seems we have two options:

  1. Add seven(!!!) more overloads for the various combinations of pipe/nonpipe arguments. If we ever add more two-way channels like this, then we'll need exponentially more overloads.
  2. Introduce some silly protocols like FileDescriptorInputConvertible/FileDescriptorOutputConvertible.

These seem both terrible.


My position is that pipe() is not important or complicated enough to deserve more than a single public entry point in System. Getting the directions right isn't difficult even with the original pipe() -- the only thing System needs to do is to replace the awkward and error-prone pointer-based C API.

We don't need to overthink this.

@lorentey
Copy link
Member

lorentey commented Sep 1, 2021

It seems we have two options:

I realize it gets a little worse, as we'd need spawnChildProcess to close the unused half of the pipes it is given at the start of the new process. This eliminates option 2, and upgrades the overloads in option 1 from being just a pipepipe.input/output shortcut to something a little more involved.

Practically, I don't think spawnChildProcess would be the right way to expose a high-level interface to pipes. A more likely high-level approach would be to create some sort of declarative API for hooking processes into a pipeline of connected stdout/stdins:

let pipeline: Pipe = [
  Command("cat", "swift-system/README.md")
  Command("tr", "[:space:]", "\n")
  Command("sort")
  Command("uniq", "-c")
  Command("sort", "-nr")
  Command("head", "-10")
])
try await try pipeline.run()
// Prints to stdout:
//   96 
//   9 the
//   7 for
//   6 to
//   6 and
//   6 a
//   5 is
//   5 System
//   5 =
//   5 ##

I believe such a facility would be very desirable, but (in my view) it belongs in a package on top of System, rather than embedded in it. Whether pipe() returned a named type or a tuple wouldn't matter one bit -- that is not where the difficulties lie.


On the other hand, having a named FileDescriptor.Pipe type would allow us to add a finite, small set of convenience methods to System's posix_spawn wrapper, though:

func startProcess() throws -> (pid: ProcessID, results: FileDescriptor) {
  let file = try FileDescriptor.open("/tmp/foo.txt", .readOnly)
  defer { try! file.close() }

  let pipe = try FileDescriptor.pipe()
  defer { try! pipe.output.close() } // Not both!

  var actions: Spawn.Actions = []

  actions.setStandardInput(to: file)
  // Expands to:
    actions.close(.standardInput)
    actions.duplicate(file, as: .standardInput)
    actions.close(file)

  actions.setStandardOutput(to: pipe)
  // Expands to:
    actions.close(pipe.input)
    actions.close(.standardOutput)
    actions.duplicate(pipe.output, as: .standardOutput) // Note how the terminology lines up again!
    actions.close(pipe.output)

  do {
    let pid = try ProcessID.spawn(
      path: "/usr/bin/sort", 
      actions: actions, 
      arguments: ["/usr/bin/sort", "-r"])
    return (pid, pipe.input)
  } catch {
    try! pipe.input.close()
    throw error
  }
}

I don't think that System must provide such action shortcuts (they may be a little too magical) -- but if y'all think this is something worth spending a named type on, I will not continue arguing against it, either. At the end of the day, it is a relatively minor thing. 😅

The dup2 example action above underscores the importance of consistent naming though -- I think actions.duplicate(pipe.inlet, as: .standardOutput) wouldn't work nearly as well.

(On the other hand, I do realize I'm being a bit underhanded about naming the second component of startProcess's result tuple -- it wants to be called output, even though it is an input file descriptor.)

@GeorgeLyon
Copy link
Contributor Author

I'm swayed by @lorentey's argument. I have replaced the Pipe struct with a typealias which should not impact the ABI, but give a bit of sugar for those that want it.
I agree that writing to the wrong end of a pipe is a "cheap" mistake, and the various affordances we have discussed (automatically selecting the correct end of a pipe, closeAfter) can be implemented in a higher level API.

Sources/System/FileOperations.swift Outdated Show resolved Hide resolved
Sources/System/FileOperations.swift Outdated Show resolved Hide resolved
Copy link
Contributor

@milseman milseman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This naming is backwards from the pipe's perspective and is confusing. I get the stdin/stdout analogy, but I do not believe it applies in this case (or, rather, it can just as easily apply in reverse).

If a "thing" (i.e. an encapsulation) has a member named input or output, it's the thing's input/output. It's not the input/output of whatever is holding the thing. Input and output are terms relative to some frame of reference and the concepts flip when viewed from the inside vs the outside.

Stdin/stdout are the input and output of the currently executing process, that's why they're globals. That is, globals are "local" members inside of the current process. Inside the process, we read from the input and write to the output, hence the names. Outside that process, we communicate with a process by writing to its input and/or reading from its output.

Even if it's just a typealias, giving it a name like Pipe is part of the source stability story. Giving it a name (even if via typealias) reifies it into a "thing" that has members.

If we're going with just a tuple, then the labels are less salient and can be ignored by someone familiar with the ordering. Otherwise they provide good documentation and I would prefer a clearer name. "Input/output" is relative to some frame of reference. man 2 pipe terminology of "read end" and "write end" is absolute, as is plumbing terminology of "inlet" and "outlet". (readEnd: FD, writeEnd: FD), (outlet: FD, inlet: FD), etc., read better to me than (input: FD, output: FD) while lacking a frame of reference.

Trouble with Tuples

As for tuples, I've usually regretted any time that I've settled on a typealias for a tuple rather than a real nominal type. It's unfortunate that real nominal types are such a pain to write in Swift.

An API pattern I've been playing around with elsewhere for such nominalized types is to have a var destructure: (fieldA: A, fieldB: B, ...), which gives you the tuple version if you really want/need it (e.g. switch expressions). This probably isn't the time to formalize this pattern in System.

I want to have a more comprehensive treatment of the file descriptor type hierarchy at some point and I don't mean to balloon this PR into that. The OS concept of file descriptor has been ad-hoc overloaded and extended over the decades (always easier to awkwardly extend an existing construct than invent a new one), but we want to provide stronger/better types when we can deliver extra value in doing so. I don't think it's too crazy to make a RawRepresentable struct for pipes and a FileDescriptorProtocol or some such family of related protocols. We currently have (albeit in sketches/draft PRs) stronger types such as SocketDescriptor and KernelQueue. These have different subsets of operations that make sense on them and each has additional operations available.

IIUC, pipe.write(...) and pipe.read(...) have an obvious behavior of forwarding to the appropriate end. Closing a pipe is weird though because (IIUC) you're likely to close the write end, but the read end is still open for reading anything buffered until it gets a zero (meaning EOF). A nominal type also lets us say FileDescriptor.Pipe.open() instead of just FileDescriptor.pipe(), which helps drive some of the intuition that the result should be closed (just like FileDescriptor.open()).

For this PR

I'm inclined to drop the type alias for now. That at least saves the name FileDescriptor.Pipe for future nominalization. Or, we could just nominalize it now...

Sources/System/Internals/Syscalls.swift Outdated Show resolved Hide resolved
@milseman
Copy link
Contributor

@GeorgeLyon sorry for the long back-and-forth. It's frustrating that such a simple API sparks so much debate and deferred API design tradeoffs. I think @lorentey and I have settled on:

  1. This not being a big enough deal to agonize over, so let's get something decent in place and merge.
  2. Avoid the tuple type alias. Type aliased tuples are just not quite production-ready in Swift.
  3. Let's call it (readEnd: FD, writeEnd: FD). A pipe has 2 ends, one for reading and the other for writing, so let's not add a layer of terminology indirection via "outlet/inlet" nor assume an inside/outside polarity with "input/output".

Sources/System/FileOperations.swift Outdated Show resolved Hide resolved
Sources/System/FileOperations.swift Outdated Show resolved Hide resolved
Sources/System/FileOperations.swift Outdated Show resolved Hide resolved
Sources/System/FileOperations.swift Outdated Show resolved Hide resolved
Sources/System/FileOperations.swift Outdated Show resolved Hide resolved
Sources/System/FileOperations.swift Outdated Show resolved Hide resolved

func testAdHocPipe() throws {
// Ad-hoc test testing `Pipe` functionality.
// We cannot test `Pipe` using `MockTestCase` because it calls `pipe` with a pointer to an array local to the `Pipe`, the address of which we do not know prior to invoking `Pipe`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably do (not gonna hold up this PR for it though) want to allow for mock testing of such stack local pointers. We'd probably have a variant that we'd pass in an array and it would compare the contents.

@GeorgeLyon
Copy link
Contributor Author

No worries! This matters so it is design, not bike-shedding and design is important :)

I like the "readEnd", "writeEnd" nomenclature. For now I just blindly accepted the review suggestions, but I can do a second pass to make sure everything builds, etc. I'm not sure how you all handle formatting/CI etc so let me know if there is something else I can do (or if it is simpler feel free to just commandeer this PR).

return valueOrErrno(retryOnInterrupt: false) {
system_pipe(fds.baseAddress!)
}.map { _ in (FileDescriptor(rawValue: fds[0]), FileDescriptor(rawValue: fds[1])) }
}
Copy link
Contributor

@milseman milseman Sep 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might be able to say var fds: Array<CInt> = [-1, -1] and use the implicit array-to-pointer conversion by passing &fds to the syscall (though I don't recall if it still works or has limitations). Otherwise, you can probably use withUnsafeMutablePointer to avoid the extra bind-memory step.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lorentey suggested I use the tuple form for a stronger guarantee that those values would end up on the stack. I haven't checked but I believe withUnsafeMutablePointer will still require rebinding the type (I think this was a signedness issue), but would also require manually passing the count (1), whereas withUnsafeMutableBytes takes this information from the buffer pointer.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do make sure that this builds on windows - I'm almost certain that this will break the Windows builds. Also note that Windows should transact in HANDLEs, aka void *, so this is going to truncate at least. See CreatePipe.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it's fine to simply surround these additions with #if !os(Windows) for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great! @atrick / @glessard , what's the lesser of evils for the pointer binding stuff?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pointer to a homogenous tuple is also bound to its element type, so you can use assumingMemoryBound.

(I hope. I use this quite a lot)

Copy link
Contributor

@glessard glessard Sep 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As written, this case indeed calls for assumingMemoryBound.
The memory is already bound to Int32, so bindMemory doesn't add information.

This being said, we unfortunately don't have assumingMemoryBound on the RawBufferPointer types at this point. We also gain nothing from using a Buffer in this case: we must rely in the C call to be well behaved instead of having any sort of bounds checking. Given that, I suggest this for lines 389-393:

    return withUnsafeMutablePointer(to: &fds) { pointer in
      valueOrErrno(retryOnInterrupt: false) {
        system_pipe(UnsafeMutableRawPointer(pointer).assumingMemoryBound(to: Int32.self))
      }
    }.map { _ in (FileDescriptor(rawValue: fds.0), FileDescriptor(rawValue: fds.1)) }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karl is correct, and @glessard 's code edit looks good.
Although the original code was harmless, it doesn't really make sense to bind memory that belongs to a variable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a document detailing what "binding memory" actually does? I had thought this was just managing compile-time information so I'm not 100% clear on the difference between "assuming" memory bound and actually binding memory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some scattered documentation, including the doc-comments, but the largest chunk is in the RawPointer proposal: https://github.com/apple/swift-evolution/blob/main/proposals/0107-unsaferawpointer.md
Experience has shown that those sources are insufficient.

@NevinBR (I think) came up with a nice description of binding for humans a few weeks ago: https://forums.swift.org/t/pitch-implicit-pointer-conversion-for-c-interoperability/51129/36.

In general, if you're reminding the compiler of type information it should already have known, but had been obscured for some reason, then assumingMemoryBound is the thing to reach for (as we did here). If you are telling the compiler new information, then reach for bindMemory or withMemoryRebound.

@milseman
Copy link
Contributor

@swift-ci please test

Tests/SystemTests/FileOperationsTest.swift Outdated Show resolved Hide resolved
Tests/SystemTests/FileOperationsTest.swift Outdated Show resolved Hide resolved
Tests/SystemTests/FileOperationsTest.swift Outdated Show resolved Hide resolved
Tests/SystemTests/FileOperationsTest.swift Outdated Show resolved Hide resolved
@milseman
Copy link
Contributor

@swift-ci please test

@milseman
Copy link
Contributor

@swift-ci please test

Copy link
Contributor

@milseman milseman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ready to merge as soon as the pointer binding thing is figured out.

return valueOrErrno(retryOnInterrupt: false) {
system_pipe(fds.baseAddress!)
}.map { _ in (FileDescriptor(rawValue: fds[0]), FileDescriptor(rawValue: fds[1])) }
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great! @atrick / @glessard , what's the lesser of evils for the pointer binding stuff?

@milseman
Copy link
Contributor

@swift-ci please test

@milseman
Copy link
Contributor

@swift-ci please test

@milseman milseman merged commit 689390f into apple:main Sep 27, 2021
@milseman
Copy link
Contributor

@GeorgeLyon do you have a Twitter handle I can use for a shoutout?

@GeorgeLyon
Copy link
Contributor Author

I do not, but thank you for the thought :) Happy to have this landed!

@lin7sh
Copy link

lin7sh commented Sep 28, 2021

Late for the party, curious about is this optimised for Darwin's mach port implementation?

@lorentey
Copy link
Member

Late for the party, curious about is this optimised for Darwin's mach port implementation?

I'm sorry, I don't understand this question. This PR adds a lightweight Swift wrapper around the classic UNIX pipe system call. Pipes are a primitive IPC mechanism that predate Mach ports.

The PR doesn't change how pipe(2) works (that's not within Swift System's mandate) -- it merely exposes an idiomatic Swift interface for it that faithfully reflects its existing semantics.

@glessard
Copy link
Contributor

I believe that pipe(2) is implemented in the "open-source" part of Darwin's xnu kernel. You can probably find an answer at opensource.apple.com, where the periodic source releases are organized.

@lin7sh
Copy link

lin7sh commented Oct 1, 2021

thanks for information

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants