This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

Add GloVe #359

Merged
merged 5 commits into from
Nov 23, 2018

Conversation

leezu
Contributor

@leezu leezu commented Oct 12, 2018

Description

This adds an implementation of GloVe. Unlike the original C implementation, which uses asynchronous AdaGrad, we use synchronous batched AdaGrad to make use of the GPU. This also adds optimized C++ tools, vocab_count and cooccur, to construct the input for the train_glove.py script. Both tools use a (single-machine) map-reduce pattern to scale better than the single-threaded versions in https://github.com/stanfordnlp/GloVe.
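
To illustrate what "synchronous batched AdaGrad" means for the GloVe objective, here is a minimal NumPy sketch. It is not the code in train_glove.py: the function names, the dictionary layout of the parameters, and the default hyperparameters (`x_max=100`, `alpha=0.75`, `lr=0.05`) are assumptions made for illustration only. All gradients for a batch of (row, column, count) triples are computed from the same parameter snapshot and applied in a single update.

```python
import numpy as np


def glove_weight(x, x_max=100.0, alpha=0.75):
    """GloVe weighting f(x) = (x / x_max)^alpha, capped at 1 (assumed defaults)."""
    return np.minimum(1.0, (x / x_max) ** alpha)


def batched_adagrad_step(params, accum, rows, cols, counts, lr=0.05, eps=1e-8):
    """One synchronous AdaGrad update on a batch of co-occurrence triples.

    params: dict with "W" (word vectors), "C" (context vectors),
            "bw" (word biases), "bc" (context biases); hypothetical layout.
    accum:  dict of same-shaped arrays holding summed squared gradients.
    Returns the weighted least-squares loss of the batch before the update.
    """
    W, C, bw, bc = params["W"], params["C"], params["bw"], params["bc"]

    # Residual of the GloVe model: w_i . c_j + b_i + b_j - log(X_ij)
    diff = np.sum(W[rows] * C[cols], axis=1) + bw[rows] + bc[cols] - np.log(counts)
    weight = glove_weight(counts)
    loss = 0.5 * np.sum(weight * diff ** 2)

    # Gradient of the loss with respect to each prediction in the batch.
    g = weight * diff

    # Sum per-sample gradients into dense gradients; np.add.at handles
    # indices that occur more than once in the batch.
    grads = {name: np.zeros_like(value) for name, value in params.items()}
    np.add.at(grads["W"], rows, g[:, None] * C[cols])
    np.add.at(grads["C"], cols, g[:, None] * W[rows])
    np.add.at(grads["bw"], rows, g)
    np.add.at(grads["bc"], cols, g)

    # AdaGrad: scale each coordinate by the root of its accumulated squares.
    for name in params:
        accum[name] += grads[name] ** 2
        params[name] -= lr * grads[name] / (np.sqrt(accum[name]) + eps)
    return loss
```

Because every sample in the batch reads the same parameters, the whole step maps onto a few dense array operations, which is the property that makes a GPU useful here. The asynchronous variant instead updates parameters one sample at a time from several threads.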

Checklist

Essentials

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Changes

  • Add GloVe
  • Scalable vocab_count and cooccur tools
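
The single-machine map-reduce pattern mentioned in the description can be sketched as follows for the vocabulary-counting case. This is a Python illustration of the pattern only, not a port of the C++ vocab_count tool; the function names and the chunking scheme are hypothetical. Each worker counts tokens in its own chunk (map), and the partial counts are then merged (reduce).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def count_chunk(lines):
    """Map step: count whitespace-separated tokens in one chunk of lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def merge_counts(left, right):
    """Reduce step: merge two partial counts into the left one."""
    left.update(right)
    return left


def vocab_count(lines, num_chunks=4):
    """Count tokens by splitting the input into chunks and merging the results."""
    lines = list(lines)
    # Ceiling division so that at most num_chunks chunks are produced.
    chunk_size = max(1, -(-len(lines) // num_chunks))
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    with ThreadPoolExecutor(max_workers=num_chunks) as pool:
        partial_counts = list(pool.map(count_chunk, chunks))
    return reduce(merge_counts, partial_counts, Counter())
```

Threads are used here only to keep the sketch self-contained; in CPython they do not run this counting loop in parallel, whereas a native implementation can use real threads for the map step. The structure is the same either way: workers share no state until the merge.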

Comments

@leezu leezu requested a review from szha as a code owner October 12, 2018 08:09
@leezu leezu force-pushed the glove branch 2 times, most recently from 0028c31 to e1f9433 October 12, 2018 08:48
@mli
Member

mli commented Oct 12, 2018

Job PR-359/3 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-359/3/index.html

@codecov

codecov bot commented Oct 18, 2018

Codecov Report

Merging #359 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #359   +/-   ##
=======================================
  Coverage   84.25%   84.25%           
=======================================
  Files          94       94           
  Lines        8313     8313           
=======================================
  Hits         7004     7004           
  Misses       1309     1309

Flag          Coverage Δ
#PR359        83.19% <0%> (ø) ⬆️
#PR416        84.2% <0%> (ø) ⬆️
#master       83.19% <0%> (ø) ⬆️
#notserial    58.04% <0%> (ø) ⬆️
#py2          83.92% <0%> (ø) ⬆️
#py3          84.03% <0%> (ø) ⬆️
#serial       67.98% <0%> (ø) ⬆️

@mli
Member

mli commented Oct 18, 2018

Job PR-359/4 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-359/4/index.html

@mli
Member

mli commented Oct 21, 2018

Job PR-359/5 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-359/5/index.html

@mli
Member

mli commented Oct 27, 2018

Job PR-359/6 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-359/6/index.html

Member

@eric-haibin-lin eric-haibin-lin left a comment

I think we need a readme.md or some rst file explaining how to use this training script (steps for compilation and cmd to run) and the expected result

@mli
Member

mli commented Oct 30, 2018

Job PR-359/7 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-359/7/index.html

Member

@eric-haibin-lin eric-haibin-lin left a comment

Great work! I haven't gone through all the code in tools/cooccur.cc. Some comments so far

.gitmodules
scripts/word_embeddings/train_glove.py (outdated)
scripts/word_embeddings/tools/cooccur.cc
scripts/word_embeddings/tools/cooccur.cc
scripts/word_embeddings/tools/vocab_count.cc (outdated)
scripts/word_embeddings/tools/utils.h
scripts/word_embeddings/tools/cooccur.cc (outdated)

@leezu
Contributor Author

leezu commented Oct 31, 2018

Will update this PR (rebase) to follow the structure adopted in #384 once merged.

@mli
Member

mli commented Oct 31, 2018

Job PR-359/8 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-359/8/index.html

Member

@eric-haibin-lin eric-haibin-lin left a comment

Some minor comments

scripts/word_embeddings/tools/cooccur.cc
scripts/word_embeddings/tools/cooccur.cc (outdated)
scripts/word_embeddings/tools/cooccur.cc (outdated)
scripts/word_embeddings/tools/cooccur.cc (outdated)

@mli
Member

mli commented Nov 5, 2018

Job PR-359/9 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-359/9/index.html

@mli
Member

mli commented Nov 9, 2018

Job PR-359/10 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-359/10/index.html

@szha szha added the release focus label (Progress focus for release) Nov 11, 2018
@mli
Member

mli commented Nov 23, 2018

Job PR-359/12 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-359/12/index.html

@szha szha merged commit bbab1e1 into dmlc:master Nov 23, 2018
@leezu leezu deleted the glove branch November 24, 2018 02:21
paperplanet pushed a commit to paperplanet/gluon-nlp that referenced this pull request Jun 9, 2019
* Add GloVe

* Address comments

* Address comments

* Add dropout

* Autoformat
Labels
release focus Progress focus for release
4 participants