Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update knit proposal #73

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

Conversation

devongovett
Copy link

Based on my experiments on yarnpkg/yarn#3753, I'm updating the knit RFC with a few additional sections as requested by @bestander:

  1. Added yarn install --knit option to automatically link dependencies when installing.
  2. Section comparing several other yarn features for linking deps.
  3. Mentioned requirement of --preserve-symlinks option in node for this to work correctly.

@bestander
Copy link
Member

Awesome!
Thanks, @devongovett, for stepping up

* `yarn pack` + `yarn install dep.tgz` is similar to `file://` dependencies. The pack + install process must be re-run every time a change is made. It does correctly dedupe dependencies, however, as `node_modules` are excluded by `yarn pack`.

* [Workspaces](https://github.com/yarnpkg/rfcs/pull/66) are similar but solve a different problem. Where workspaces are great for a tree of related modules (e.g. a monorepo), `yarn knit` is for linking together modules in separate trees, e.g. things that might be shared between multiple workspaces.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice overview.

# Alternatives

Another similar approach is to run `yarn pack`, which creates a tarball of the library's contents as if it were downloaded from npm, and install the tarball in the app used to test the library. This has the benefits of reducing divergence between development and production -- `library/node_modules` is not shared across apps, the developer can work on multiple versions of the library, and the `node_modules` hierarchy is faithfully represented. The downside is that everytime the developer edits a file, they need to re-pack and reinstall the library.
Finally, another issue is with the way the node require resolution algorithm works. Dependencies of symlinked modules are resolved relative to the realpath, not the symlink. This means that you'll still get duplicates if both modules depend on a third dependency, or errors if that dependency is not installed in either place. This is solved by the recently added runtime option for node `--preserve-symlinks`, which skips getting the realpath when resolving modules. Something similar would need to be added to browserify/webpack to solve this there as well. I recently opened a [PR for browserify](https://github.com/substack/node-browserify/pull/1742) to support the same option.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure this is a problem with knitting because on install the knitted dependencies should not have subdependencies packed in the "repository".
So all subdependencies should be installed fresh at referer level.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that the linked module may still have a node_modules folder where you are developing. When you require('dep') from your app, it gets resolved using the realpath, e.g. ~/dev/dep/index.js. If that file has require('dep2'), it is resolved to ~/dev/dep/node_modules/dep2, and that copy will be used instead of the one installed in your app. I verified this to be the case. It is not the case with the --preserve-symlinks option, because the resolution algorithm uses the symlink path (~/dev/app/node_modules/dep/index.js) instead of the realpath, so when resolving dep2, ~/dev/app/node_modules/dep2 will be used instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does knitting mean that dependencies need run prepublish script and the files listed in package.json/files should be copied to some temp location?

I think there would not be a symlink ~/dev/app -> ~/dev/dep, it would probably be ~/dev/app -> (yarn-cache)/dep-knit-<uuid> and node_modules would be empty there.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copying files kinda defeats the purpose, no? Then you'd have to run yarn knit every time you make a change to a dependency. I think the symlink would be ~/dev/app/index.js -> ~/.yarn-knit/dep/index.js -> ~/dev/dep/index.js for example. The realpath is still the last one, which may have a node_modules folder in it.

Copy link
Member

@bestander bestander Jul 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then there is a contradiction we need to sort out:

https://github.com/devongovett/rfcs/blob/50712e27d2d242bf466224186fcbb87b9bbab0ba/accepted/0000-yarn-knit.md#isolating-node_modules-correctly

https://github.com/devongovett/rfcs/blob/50712e27d2d242bf466224186fcbb87b9bbab0ba/accepted/0000-yarn-knit.md#faithfully-representing-the-node_modules-hierarchy

https://github.com/devongovett/rfcs/blob/50712e27d2d242bf466224186fcbb87b9bbab0ba/accepted/0000-yarn-knit.md#a-practical-proposal----knitting

The links above indicate that knitting would simulate full npm publishing/installation but without actual publish/install to npm registry.
Symlinking app -> dep breaks this simulation because dep will have non publishable artifacts (node_modules, .yarnignore files etc)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what the middle symlink is for. Only the published files would have a symlink in ~/.yarn-knit/dep, so (as long as --preserve-symlinks is used) non-published files would not show up to the app.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see.
Then node_modules would not be symlinked and this is not a problem.

@bestander
Copy link
Member

@devongovett, what do you think of making knit just a for of link and not a separate command?
E.g. all link:/ dependencies would be knit instead of symlinked to the source if yarn install/add/upgrade is called with --knit-link flag?

That might reduce the terms overload on users.

@devongovett
Copy link
Author

@bestander so you mean the following?

  • yarn link --knit - same as yarn knit in RFC
  • yarn add dep --knit - does what yarn knit dep did in the RFC
  • yarn install --knit - links all dependencies that have been setup with yarn link --knit

@bestander
Copy link
Member

  • yarn link --knit - same as yarn knit in RFC

Yes

  • yarn add dep --knit - does what yarn knit dep did in the RFC

Only if dep is like dep@link:../path/to/dep

  • yarn install --knit - links all dependencies that have been setup with yarn link --knit

And also all dependencies with link:/ and file:/ (when used as link:/) specifier.

@devongovett
Copy link
Author

  1. is fine
  2. doesn't make sense to me because you'd need to know the path to the global yarn registry of knit dependencies to specify the link path. I'm on board with yarn add dep --knit means install dep from the knit registry though.
  3. So it would treat file:// deps as link://? I suppose so, but this seems separate from knit - i.e. it has nothing to do with deps in the knit registry.

@devongovett
Copy link
Author

cc @ide who wrote the original knit proposal

@bestander
Copy link
Member

bestander commented Jul 13, 2017

What I tried to say that knit is a mode of link feature, the implementation could probably piggy-back ride on the link with a conditional action at the end.

And we can do linking 3 different ways:

  • yarn link command
  • yarn install/add/upgrade with package.json that contains link: dependencies
  • yarn install/add/upgrade with package.json that contains file: dependencies but called with yarn-link-file-dependencies true in yarnrc, this basically makes them behave as link: dependencies

So knitting should be compatible with all of them

@devongovett
Copy link
Author

Ah I see, sorry for the misunderstanding.

Yes, I think knit could be a mode of link:

  • yarn link --knit - same as yarn knit in RFC.
  • yarn link dep --knit - same as yarn link now, but also installs dependencies of dep locally.

What do you think of dropping the need for yarn link dep and just using yarn add dep --knit instead though? This would have the effect of also automatically linking any sub-dependencies of dep which might have been setup locally (with yarn link --knit) as well, since implementation of add is shared with install. This might have happened previously with yarn link because it included node_modules, so sub-dependencies of dep that were linked would also be carried over.

@bestander
Copy link
Member

Yes, I think knit could be a mode of link:

  • yarn link --knit - same as yarn knit in RFC.

👍
What do you think of giving it a more self explanatory name like --prepublish or --prepare?

  • yarn link dep --knit - same as yarn link now, but also installs dependencies of dep locally.

Considering that at step one, yarn link --knit, we simulated a publishing to npm by putting files to a temp/cache folder it should create a symlink not to the folder of knitted project but to the folder where we put the build artifacts.
So it would work slightly differently.

What do you think of dropping the need for yarn link dep and just using yarn add dep --knit instead though?

I am worried that it will be confusing.
Would it create a symlink or would it install dep by copying files?
link implies that the dependency will be installed via a link from a local place.

@devongovett
Copy link
Author

Sorry for the delay in reply.

Yes, I think the knit name isn't very descriptive. In my POC pull (yarnpkg/yarn#3753), I called it yarn link --deep because it makes links to each file rather than the directory. Maybe that makes sense? Ideally this would be the default mode for yarn link, but I suppose we can't change compatibility with npm link. Maybe a config option in yarnrc to make it the default?

yarn add dep --knit is a bit confusing, but what about yarn add dep --link? Then it's clear it's linking. It also seems clearer than yarn link dep since it is both linking and installing dependencies, whereas yarn link dep is only linking.

So my proposal would be:

  • yarn link --deep - links each file in the cwd to the global link registry
  • yarn add dep --link - links dep to the global link registry and installs its dependencies
  • yarn install --link - installs all dependencies and auto-links ones that are in the global link registry

No new commands, only two new options.

@ide
Copy link
Contributor

ide commented Aug 2, 2017

That naming proposal mostly makes sense to me. I prefer a flag other than --deep like --pack (as in yarn pack) or --publish since part of the RFC's intent is to mimic what happens when you publish a package. If your package has src and build directories and your package.json specifies "files": ["build"], then only the build directory should get symlinked (and shallowly, not the files within it).

I like the link:// URLs too, really useful for internal projects.

@devongovett
Copy link
Author

I think yarn link --pack is probably the best option so far. --publish might be confused with actual publishing I guess.

@devongovett
Copy link
Author

Another open question is what put in the lock file. I would say that installing with the --link option would not affect the lock file: it would remain as if you had installed from the normal package registry. Otherwise we might run the risk of committing changes to a lock file that don't work in a real environment. The only case where this doesn't work is if the package isn't published to the registry yet. In that case I guess it would work as link:// urls do now, and you would have to update it manually when the package is published. Thoughts?

@bestander
Copy link
Member

link --pack sounds good

@bestander
Copy link
Member

As for yarn.lock - good question.
What would be the result of this operation:

yarn link project-b --pack
yarn add left-pad

Would the second command overwrite what is inside node_modules?
I think lockfile should indicate something to conform with the promise that the same yarn.lock produces the same node_modules

@devongovett
Copy link
Author

yeah its a bit weird if you commit your yarn.lock file though, since you may end up with local-only artifacts being committed.

@devongovett
Copy link
Author

Updated the RFC with the new naming as discussed above, and renamed the file. Sorry if it makes it harder to see the diff. :/

@devongovett
Copy link
Author

@bestander @ide thoughts on the update? Would like to get started on an implementation if you think this is good.


This is the step that simulates publishing `dep`. Running `yarn link --pack` in `dep` finds all the files that "yarn publish" would pack up and upload to npm. Crucially, this excludes `node_modules`, and would follow the same algorithm as "yarn publish" such as reading package.json's `files` field.

Then it simulates publishing `dep`: it creates a directory named `dep-X.Y.Z` (where `X.Y.Z` is the version of `dep` in its package.json) inside of a global directory like `~/.yarn-link`. A symlink is created for each file or directory that `yarn publish` would normally have packed up. This step shares some conceptual similarities with publishing to a registry, except it uses symlinks and it's local on your computer.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use cache folder for this instead?
If we opt in to a new ~/.yarn-link it will fall out of OS cache radars and becomes a risk of growing out of control.

How about <yarn cache>/links/?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially, though there should only be symlinks in the global registry anyway, so I don't think it should get that large. Putting it in the cache folder might cause your linked modules to suddenly stop working e.g. after a reboot, which would be weird IMO.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, we'll defer this discussion once we agree on the model


### Running "yarn link --pack" inside of `dep`

This is the step that simulates publishing `dep`. Running `yarn link --pack` in `dep` finds all the files that "yarn publish" would pack up and upload to npm. Crucially, this excludes `node_modules`, and would follow the same algorithm as "yarn publish" such as reading package.json's `files` field.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The disadvantage is that you need to run yarn link --pack every time you want to iterate.
What do you think of running the --pack at the time of link consumption?

  1. yarn link would register dep as "linkable"
  2. yarn link dep --pack would run the packing and installation

Copy link
Author

@devongovett devongovett Aug 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite. You have to run yarn link --pack once to set up, and only again if you add/remove a file in the root. If you add a file to a subdirectory (e.g. src/) it should automatically show up everywhere since we symlink directories in the root. I think this is reasonable. I definitely don't want to have to run a command every time any change is made: might as well use yarn pack + yarn install in that case.

Copy link
Member

@bestander bestander Aug 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I've missed that point.

I understand that your goal for this change is to improve iteration speed when you are actively developing dep that is used in one of your projects.

I think the goal should be to achieve correctness of dep and simulate full publish/install cycle.
We could think of performance tweaks in the next iteration and people who absolutely need speed will just set up their environment to use vanilla yarn link.

Am I right you want to have this setup?

/path/to/app/node_modules/dep ->(symlink) ~/.yarn-links/dex-x.y.z ->(multiple symlinks) /path/to/dep

I think you won't be able to achieve the right isolation with symlinks inside ~/.yarn-links/dep-x.y.z -> dep. You'll end up cherry-picking symlinks of files and folders while respecting rules in .yarnignore/.npmingore.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's the right idea. The dir in the app symlinks to a global directory which contains symlinks to individual files/directories in the root of the dependency.

My goal for this feature is to have something that works as closely to yarn link as possible but avoids the problem of having multiple copies of dependencies. This is solved by not including node_modules in the list of symlinks, and installing the dependencies locally in the project you're linking into.

It's a benefit that it happens to be closer to what you'd actually publish, but not the main goal for me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not a big fan of multiple symlinks that work around node_modules and .yarnignore.
That would be quite hard to debug in non trivial cases IMHO.

And yeah, it goes the other direction with the --pack idea which should be about simulating a publish.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason it make it global is so when you re-"publish" (e.g. after adding a new file) you don't also have to re-"install" in every project that uses that dependency. This is because those projects point to the global directory, not to the individual files, so updating the global cascades the updates everywhere.

As for cleaning up references, we have a couple options:

  1. We could clean them up when you run yarn link --pack again - we'd blow away everything inside the global dir for the package and create new symlinks. This way anything that didn't exist anymore would go away.

  2. We could do (1) but across all packages in the global dir as a cleanup step: find any broken links and delete them.

Copy link
Member

@bestander bestander Aug 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense if you have multiple projects refer the same package.
Is it a common use case for you?
I though that yarn link is something that you do temporarily while testing a dependency and don't leave as a link for long.

Copy link
Author

@devongovett devongovett Aug 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah at work we tend to build our apps as a composition of a bunch of libraries, and we have several different apps which all use some of the same libraries. Setting up and maintaining the links between all of our repos has been a huge pain for us, which is why I'm invested in this feature. Our devs work on different parts of our apps which live across repos all the time, so we want to keep things linked generally. I've written some automation to setup the repos locally for everyone and link everything together using the POC of this yarn feature, and it seems to work pretty well. Ideally we'd use workspaces, but since we have many libraries that are used across different applications, it would be hard to go the monorepo route for us.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying and investing in this feature, @devongovett.
I agree in general that Yarn is in the position to solve your problem but before merging your RFC I want us to sort out a few things that bother me.

  1. I want to keep knit proposal separate from this one.
    I think knit is about running prepublish scripts and making files available to projects that use a dep .
    This proposal is about improving link command with a trick around symlink so that node_modules and dev files don't leak from dep into a project using it.

  2. Semantics.
    After thinking more, --pack seems misleading.
    I would say yarn link --exclude-ignored-files or something like that, I don't insist on the actual flag name but it should not imply that some "packing" or prepublish will be happening.

  3. I am still not convinced that symlinks should be created in ~/yarn-links or any other global folder.
    Symlinks are cheap to create and any issues with the implementation would be contained inside a project that uses this feature.

  4. Another thing is this issue [docs] create a page for working with non npm dependencies website#573.
    We have quite a few way of working with npm packages locally and I am concerned that we are getting too many very similar solutions that approach the problem from different angles.
    I suggest hiding this RFC implementation behind a flag, similar to "workspaces-experimental true", and running with it for a while as an opt-in feature.
    So that if we find a better solution we could remove or change it without concerns of breaking the world.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW the linking behavior is what we (Expo) want, whatever the name of the feature ends up being. These are our needs:

(a) we have a repo with multiple apps and multiple libraries
(b) we want the apps to be able to use the latest code in the libraries
(c) we want the libraries to be installed/linked as if we had published them to npm and then installed them in the apps. npm link/yarn link today has very different behavior than installing published code and we can't trust those commands.
(d) we want to be able to write code in the libraries and easily see how it affects an app. symlinks seem like a good way to do this.


Separately, we also do have a need for packing up a library (to simulate publishing) and installing in an app -- this happens in our continuous integration systems. This behavior is even closer to publishing+installing a library, which is why we use it in CI. We run something like: cd library && yarn pack && cd app && yarn add library.tgz.

However, in CI we don't need a tight feedback loop between editing library code and seeing the effects of those edits in an app. During development, we want to edit library code and see the app change immediately, which is why symlinks are appealing. We have a solution for CI already (yarn pack && yarn add) but no good one for development today.

@bestander
Copy link
Member

@devongovett, I think we are almost there, thank you for your work on this feature, it should be really useful.
I've added a few suggestions above

@BYK BYK self-assigned this Sep 29, 2017
@bestander
Copy link
Member

Looks like this PR got stuck :(
I'll be happy to iterate more on the details.

@devongovett
Copy link
Author

@bestander sorry about that. I got really busy, and a bit confused about what the next steps are for this proposal. However, we really are in desperate need of this feature, so I am happy to work on it if we can come to a well-defined agreement about what we need.

Here is my proposal to move things forward:

  1. Split this proposal out into a separate document, leaving knit as it is. We can update the current doc to be much more clear about how the feature should work on some of the finer points discussed in the comments.
  2. yarn link --pack (or some other name) should run prepublish scripts, and link all files that would be published (following .npmignore, excluding node_modules, etc.) into a global directory, just as yarn link works currently, but linking individual files instead of the project directory.
  3. yarn install --link installs all dependencies, and for packages that are in the global link directory, those packages are linked.

The global directory is needed so that yarn install --link knows which projects are available to link. It also enables changes to be made within a project and automatically propagate those changes to all places that use it.

Please let me know if there are other things that we should cover.

@BYK BYK removed their assignment Oct 6, 2017
@bestander
Copy link
Member

Hey, @devongovett, no worries, I can totally understand.
Same here, I got busy with other projects and could not be very responsive recently.

I wish Yarn would have a plugin API so that people could code their needs to get unblocked.

On your proposal.

  1. +1 let's make the minimal possible change you need to get unblocked.
    It is way easier to get a smaller feature in.
    And +1 on updating the current doc with a few comments guiding the future contributors.

  2. Ok, we can discuss the flag name and the folder layout in the new RFC.

  3. Do we need this flag?
    It seems more consistent if yarn link pkg-a links a package that was declared with yarn link command before.


#### Isolating `node_modules` correctly

You can install `dep` in two different apps without sharing the `node_modules` of `dep`. This is a problem with Electron apps, whose V8 version is different than Node's and uses a different ABI. If you have `node-app` and `electron-app` that both depend on `dep`, the native dependencies of `dep` need to be recompiled separately for each app; `node-app/n_m/dep/n_m` must not be the same as `electron-app/n_m/dep/n_m`.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the proposal will solve this issue. Node.js calls realpath on every "module". So even when files are linked instead of the whole folder, any lookup origining from the linked module will resolve in the realpath -> in the node_modules folder of the linked dependency and not in the app's node_modules.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not when run with --preserve-symlinks: https://nodejs.org/api/cli.html#cli_preserve_symlinks. See drawbacks section. I think webpack already supports such an option as well, and I tried to patch browserify. browserify/browserify#1742

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, that makes sense.

@fenduru
Copy link

fenduru commented Jun 15, 2018

How will yarn knit work when knitting a package that is in a monorepo? For instance imagine the following directory structure

/
  my-mono-repo/
    packages/
      foo
      bar
  baz/

Where bar has a dependency on foo, and from baz you run yarn knit ../my-mono-repo/packages/bar. Without special handling for workspaces I'm not sure that the current knit proposal would properly pull in the local copy of bar or not.

@arcanis
Copy link
Member

arcanis commented Jun 15, 2018

This rfc is pretty old, and predates our workspace feature which is a much safer alternative to yarn link. I don't think we'll implement yarn knit.

@SamB
Copy link

SamB commented Jul 25, 2019

@arcanis: workspaces may be safer, but they seem to be fairly explicitly intended for planned usage, not ad-hoc "oh man, there's an issue in one of our dependencies, let me go make a PR for that, testing it in our thing so I can easily iterate on fixing our actual problem" use cases.

So, features that would actually help with those ad-hoc use cases are still desirable. Since yarn link exists and can be used for this, it is the baseline against which other approaches would be measured (with presumably a glance or two at npm link), and the advantages weighed against (a) the maintenance burden, (b) the expected cognitive load on users trying to decide what to do, and (c) is it likely that someone will soon discover a much more elegant approach?

Note: I have no position on whether any given yarn knit proposal would in fact improve things.

[Hmm. I'm really not good at writing short comments, am I?]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants