This repository has been archived by the owner on Mar 27, 2023. It is now read-only.

Merge #1

Open
wants to merge 158 commits into
base: master
158 commits
dc0a0ac
Added tests for more versions of Active Record
ankane Dec 10, 2020
3eef8e3
Test with Ruby 3
ankane Dec 27, 2020
b2341df
Updated readme [skip ci]
ankane Jan 22, 2021
b6f5416
Removed unnecessary sorting
ankane Jan 27, 2021
80c4882
Removed norms calculation for optimized similarity
ankane Jan 27, 2021
7360d75
Added safety check for optimized similarity
ankane Jan 27, 2021
7b611bc
Use minmax
ankane Jan 28, 2021
9189dcb
Reduced allocations
ankane Jan 28, 2021
d24ea8e
Use inner for consistency
ankane Jan 29, 2021
2dbd0a8
Added comment [skip ci]
ankane Jan 29, 2021
eb46ba0
Fixed similarity calculation
ankane Jan 29, 2021
0ec80cd
Reverted calculation, but use norms[i] instead of max score
ankane Jan 29, 2021
6b4d91a
Updated comment [skip ci]
ankane Jan 29, 2021
dce5855
Improved optimize_similar_items test [skip ci]
ankane Jan 31, 2021
ca8257e
Test score
ankane Jan 31, 2021
797edc9
Added comment [skip ci]
ankane Jan 31, 2021
109ee0e
Fixed warning
ankane Feb 16, 2021
73ee6ef
Added methods to get ids and to get factors for specific users and items
ankane Feb 16, 2021
4126bca
Version bump to 0.2.4 [skip ci]
ankane Feb 16, 2021
b16cbc2
Added note about storing factors [skip ci]
ankane Feb 16, 2021
98b92ee
Added optimize_similar_users method
ankane Feb 20, 2021
54f209d
Updated headers [skip ci]
ankane Feb 20, 2021
b90e81f
Updated readme [skip ci]
ankane Feb 20, 2021
8306175
Fixed CI
ankane Feb 20, 2021
b123907
Improved variable naming [skip ci]
ankane Feb 20, 2021
f92b766
Fixed test name [skip ci]
ankane Feb 20, 2021
48e5816
Added support for Faiss for optimize_item_recs and optimize_similar_u…
ankane Feb 20, 2021
db9105b
Added comment [skip ci]
ankane Feb 20, 2021
dd899bd
Added rmse method [skip ci]
ankane Feb 20, 2021
93222e8
Moved optimize tests to new file
ankane Feb 20, 2021
3865c17
Test similar_users with NGT [skip ci]
ankane Feb 20, 2021
6d841b8
Prep to change key
ankane Feb 20, 2021
3b12b03
Improved performance
ankane Feb 20, 2021
f12c6b1
Fixed test
ankane Feb 20, 2021
a9df9f0
Use Numo for performance
ankane Feb 20, 2021
fe27323
Improved performance of user_recs
ankane Feb 20, 2021
099ae86
Improved code
ankane Feb 20, 2021
7ce9a58
Improved code [skip ci]
ankane Feb 20, 2021
d1140d6
Improved similar code
ankane Feb 20, 2021
3f55970
Fixed CI for now
ankane Feb 20, 2021
d9253be
Fixed CI matrix
ankane Feb 20, 2021
3a05058
Re-add Faiss to CI (error likely due to gem cache between different m…
ankane Feb 20, 2021
560e127
Fixed flaky test
ankane Feb 20, 2021
08968d8
Fixed CI
ankane Feb 20, 2021
a8b04d8
Improved performance by memoizing normalized factors
ankane Feb 20, 2021
0b4fafa
Added tests for count: nil
ankane Feb 20, 2021
96edff5
Improved predict test
ankane Feb 20, 2021
3871733
Fixed test
ankane Feb 20, 2021
73fd0e3
Use create_index method for optimize_user_recs
ankane Feb 20, 2021
3a85d9e
Updated comment [skip ci]
ankane Feb 20, 2021
df7819e
Moved optimize methods [skip ci]
ankane Feb 20, 2021
084e761
Reset indexes after fit [skip ci]
ankane Feb 20, 2021
868b3e8
Fixed test [skip ci]
ankane Feb 20, 2021
3059239
Added top_items method - #8
ankane Feb 20, 2021
897584f
Added wilson_score to all Gemfiles
ankane Feb 20, 2021
20ed83b
Version bump to 0.2.5 [skip ci]
ankane Feb 20, 2021
2d2ae1f
Updated readme [skip ci]
ankane Feb 20, 2021
a788de1
Updated readme [skip ci]
ankane Feb 20, 2021
ae7a457
Added missing word [skip ci]
ankane Feb 20, 2021
de2f05e
Reset norms when refitting [skip ci]
ankane Feb 20, 2021
2449811
Added basic benchmarking [skip ci]
ankane Feb 20, 2021
998f7b2
Improved naming [skip ci]
ankane Feb 21, 2021
877d5cc
Improved inspect method [skip ci]
ankane Feb 21, 2021
dd019c9
Fixed error with fit after loading
ankane Feb 21, 2021
8e1e6c8
Improved performance [skip ci]
ankane Feb 21, 2021
f39d765
Update maps and build matrix in single pass [skip ci]
ankane Feb 21, 2021
a378161
Added tests for invalid ratings [skip ci]
ankane Feb 21, 2021
f9c44b5
Updated comment [skip ci]
ankane Feb 21, 2021
17db0a8
Added rating tests for validation set [skip ci]
ankane Feb 21, 2021
7257fb3
Updated comment [skip ci]
ankane Feb 21, 2021
5488042
Improved code [skip ci]
ankane Feb 21, 2021
3fc049f
Fixed issue with similar_users and item_recs returning the original u…
ankane Feb 24, 2021
45ff023
Version bump to 0.2.6 [skip ci]
ankane Feb 24, 2021
551ca0f
Simplify test [skip ci]
ankane Feb 24, 2021
c6bd1ba
Prep to use IndexHNSWFlat in 0.3.0 [skip ci]
ankane Feb 26, 2021
642bcc0
Updated readme [skip ci]
ankane Feb 27, 2021
cc1ba71
Prep to use Numo for top items
ankane Feb 28, 2021
cbfc701
Added code to remove wilson_score dependency for top_items [skip ci]
ankane Feb 28, 2021
73e16d8
Updated comment [skip ci]
ankane Feb 28, 2021
d37ccc8
Added link to Neighbor examples [skip ci]
ankane Apr 23, 2021
b0fa612
Use ActiveRecord::Schema.define for test setup [skip ci]
ankane Jul 29, 2021
21c7f0f
Added warning for value - closes #13
ankane Aug 6, 2021
2ec82bc
Only check if implicit
ankane Aug 6, 2021
02c4796
Version bump to 0.2.7 [skip ci]
ankane Aug 6, 2021
ea93db1
Fixed formula [skip ci]
ankane Aug 12, 2021
676dded
No need for max [skip ci]
ankane Aug 12, 2021
20deab5
Use group_prop in example [skip ci]
ankane Oct 21, 2021
3d772fd
Test with Active Record 7 rc1
ankane Dec 7, 2021
d936911
Added note [skip ci]
ankane Dec 9, 2021
82101e2
Added comment [skip ci]
ankane Dec 15, 2021
a2a510d
Test with Active Record 7 by default
ankane Dec 16, 2021
b78561f
Test with Ruby 3.1 on CI
ankane Jan 7, 2022
f72634d
Fixed CI
ankane Jan 7, 2022
903c6e3
Added test for top items with no range - #20
ankane Feb 15, 2022
f868b82
Fixed error with top_items with all same rating - fixes #20
ankane Feb 15, 2022
75708a7
Version bump to 0.2.8 [skip ci]
ankane Mar 13, 2022
34063fc
Fixed error with load_movielens
ankane Mar 22, 2022
5d5ed61
Updated readme [skip ci]
ankane Mar 22, 2022
422bbf2
Updated readme [skip ci]
ankane Mar 22, 2022
555e482
Version bump to 0.2.9 [skip ci]
ankane Mar 22, 2022
694a92e
Dropped support for Ruby < 2.6 [skip ci]
ankane Mar 22, 2022
e17ba9c
Removed dependency on wilson_score gem for top_items
ankane Mar 22, 2022
a4ad124
Updated comment [skip ci]
ankane Mar 22, 2022
01cbd20
Added test for similar_users [skip ci]
ankane Mar 22, 2022
ba09dcd
Improved test [skip ci]
ankane Mar 22, 2022
0d03738
Changed item_id to user_id for similar_users
ankane Mar 22, 2022
7d76ffe
Use fetch for item_id and score [skip ci]
ankane Mar 22, 2022
f25709b
Changed warning to an error when value passed to fit
ankane Mar 22, 2022
f9b53db
Fixed tests
ankane Mar 22, 2022
b412475
Changed to use Faiss over NGT for optimize_item_recs and optimize_sim…
ankane Mar 22, 2022
f4aa745
Updated license year [skip ci]
ankane Mar 22, 2022
c7cd3fc
Version bump to 0.3.0 [skip ci]
ankane Mar 22, 2022
15fafdd
Added support for JSON serialization
ankane Jul 10, 2022
91c81ea
Added note about serialized recommender [skip ci]
ankane Jul 10, 2022
e629c89
Version bump to 0.3.1 [skip ci]
ankane Jul 10, 2022
3b432e4
Updated readme [skip ci]
ankane Jul 10, 2022
6bba087
Added link to Trove [skip ci]
ankane Jul 10, 2022
1f6b45b
Improved tests [skip ci]
ankane Aug 7, 2022
2b55ff3
Fixed issue when fit is called multiple times
ankane Aug 8, 2022
95b18da
Updated link [skip ci]
ankane Aug 8, 2022
508ceb8
Version bump to 0.3.2 [skip ci]
ankane Sep 27, 2022
2331116
Updated actions
ankane Nov 14, 2022
c7f2e83
Added Ruby 3.2 to CI
ankane Dec 26, 2022
ecae29f
Removed unused gemfiles [skip ci]
ankane Jan 11, 2023
033052a
Fixed issue with has_recommended and inheritance with Active Record <…
ankane Jan 11, 2023
380f4ec
Switched to require_relative
ankane Jan 11, 2023
e9c5bd3
Dropped support for Ruby < 2.7
ankane Jan 11, 2023
a75e1c7
Dropped support for Rails < 6
ankane Jan 30, 2023
df5eb2b
Deprecated marshal serialization
ankane Jan 30, 2023
e05b8be
Added deprecation warning to load
ankane Jan 30, 2023
51d6d90
Version bump to 0.4.0 [skip ci]
ankane Jan 30, 2023
35d7740
Added Active Record 7.1 to CI
ankane Jul 26, 2023
ad5c272
Fixed test
ankane Sep 10, 2023
3b7318d
Updated readme [skip ci]
ankane Sep 10, 2023
993018d
Test with Rails 7.1.0.beta1
ankane Sep 15, 2023
f91b3f5
Updated readme [skip ci]
ankane Sep 29, 2023
af04f3c
Fixed encoding for MovieLens data
ankane Sep 30, 2023
a25c955
Fixed count for all same scores
ankane Oct 1, 2023
40855e2
Made code consistent with user_recs
ankane Oct 1, 2023
06be249
Test with Rails 7.1
ankane Oct 6, 2023
6d785f8
Test with Ruby 3.3 on CI
ankane Dec 26, 2023
7c5555c
Updated badge [skip ci]
ankane Feb 17, 2024
bce703c
Fixed CI
ankane May 23, 2024
7c8a397
Reduced memory for item_recs and similar_users
ankane May 23, 2024
ea3a19f
Updated order [skip ci]
ankane May 23, 2024
6a1a478
Version bump to 0.4.1 [skip ci]
ankane May 23, 2024
77ff5bc
Test with Active Record 7.2.0.beta2 on CI
ankane Jun 10, 2024
5bf3a58
Updated CI
ankane Jun 10, 2024
cb9ba53
Removed dependency on csv gem for load_movielens
ankane Jun 10, 2024
ea66188
Removed comment [skip ci]
ankane Jun 10, 2024
ebf4336
Improved style [skip ci]
ankane Jun 18, 2024
869f141
Updated license year [skip ci]
ankane Jun 24, 2024
a19430b
Version bump to 0.4.2 [skip ci]
ankane Jun 24, 2024
cbbe4db
Improved code [skip ci]
ankane Jun 25, 2024
3047aa6
Improved test [skip ci]
ankane Jul 21, 2024
506eba2
Updated cache action [skip ci]
ankane Jul 27, 2024
90b8495
Test with Active Record 7.2.0 on CI
ankane Aug 10, 2024
8b5c4dd
Updated link [skip ci]
ankane Aug 27, 2024
33 changes: 26 additions & 7 deletions .github/workflows/build.yml
@@ -2,12 +2,31 @@ name: build
on: [push, pull_request]
jobs:
build:
if: "!contains(github.event.head_commit.message, '[skip ci]')"
strategy:
fail-fast: false
matrix:
include:
- ruby: 3.3
gemfile: Gemfile
- ruby: 3.2
gemfile: gemfiles/activerecord71.gemfile
- ruby: 3.1
gemfile: gemfiles/activerecord70.gemfile
- ruby: "3.0"
gemfile: gemfiles/activerecord61.gemfile
- ruby: 2.7
gemfile: gemfiles/activerecord60.gemfile
runs-on: ubuntu-latest
env:
BUNDLE_GEMFILE: ${{ matrix.gemfile }}
steps:
- uses: actions/checkout@v2
- uses: ruby/setup-ruby@v1
with:
ruby-version: 2.7
bundler-cache: true
- run: bundle exec rake test
- uses: actions/checkout@v4
- uses: ruby/setup-ruby@v1
with:
ruby-version: ${{ matrix.ruby }}
bundler-cache: true
- uses: actions/cache@v4
with:
path: ~/.disco
key: disco
- run: bundle exec rake test
63 changes: 63 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,66 @@
## 0.4.2 (2024-06-24)

- Removed dependency on `csv` gem for `load_movielens`

## 0.4.1 (2024-05-23)

- Reduced memory for `item_recs` and `similar_users`

## 0.4.0 (2023-01-30)

- Fixed issue with `has_recommended` and inheritance with Rails < 6.1
- Deprecated marshal serialization
- Dropped support for Ruby < 2.7 and Rails < 6

## 0.3.2 (2022-09-26)

- Fixed issue when `fit` is called multiple times

## 0.3.1 (2022-07-10)

- Added support for JSON serialization

## 0.3.0 (2022-03-22)

- Changed `item_id` to `user_id` for `similar_users`
- Changed warning to an error when `value` passed to `fit`
- Changed to use Faiss over NGT for `optimize_item_recs` and `optimize_similar_users` when both are installed
- Removed dependency on `wilson_score` gem for `top_items`
- Dropped support for Ruby < 2.6

## 0.2.9 (2022-03-22)

- Fixed error with `load_movielens`

## 0.2.8 (2022-03-13)

- Fixed error with `top_items` with all same rating

## 0.2.7 (2021-08-06)

- Added warning for `value`

## 0.2.6 (2021-02-24)

- Improved performance
- Improved `inspect` method
- Fixed issue with `similar_users` and `item_recs` returning the original user/item
- Fixed error with `fit` after loading

## 0.2.5 (2021-02-20)

- Added `top_items` method
- Added `optimize_similar_users` method
- Added support for Faiss for `optimize_item_recs` and `optimize_similar_users` methods
- Added `rmse` method
- Improved performance

## 0.2.4 (2021-02-15)

- Added `user_ids` and `item_ids` methods
- Added `user_id` argument to `user_factors`
- Added `item_id` argument to `item_factors`

## 0.2.3 (2020-11-28)

- Added `predict` method
4 changes: 3 additions & 1 deletion Gemfile
@@ -4,8 +4,10 @@ gemspec

gem "rake"
gem "minitest", ">= 5"
gem "activerecord"
gem "activerecord", "~> 7.2.0"
gem "sqlite3"
gem "daru"
gem "matrix" # for daru
gem "rover-df"
gem "ngt", ">= 0.3.0"
gem "faiss"
2 changes: 1 addition & 1 deletion LICENSE.txt
@@ -1,4 +1,4 @@
Copyright (c) 2019-2020 Andrew Kane
Copyright (c) 2019-2024 Andrew Kane

MIT License

95 changes: 60 additions & 35 deletions README.md
@@ -6,14 +6,14 @@
- Works with explicit and implicit feedback
- Uses high-performance matrix factorization

[![Build Status](https://github.com/ankane/disco/workflows/build/badge.svg?branch=master)](https://github.com/ankane/disco/actions)
[![Build Status](https://github.com/ankane/disco/actions/workflows/build.yml/badge.svg)](https://github.com/ankane/disco/actions)

## Installation

Add this line to your application’s Gemfile:

```ruby
gem 'disco'
gem "disco"
```

## Getting Started
@@ -35,24 +35,24 @@ recommender.fit([

> IDs can be integers, strings, or any other data type

If users don’t rate items directly (for instance, they’re purchasing items or reading posts), this is known as implicit feedback. Leave out the rating, or use a value like number of purchases, number of page views, or time spent on page:
If users don’t rate items directly (for instance, they’re purchasing items or reading posts), this is known as implicit feedback. Leave out the rating.

```ruby
recommender.fit([
{user_id: 1, item_id: 1, value: 1},
{user_id: 2, item_id: 1, value: 1}
{user_id: 1, item_id: 1},
{user_id: 2, item_id: 1}
])
```

> Use `value` instead of rating for implicit feedback
> Each `user_id`/`item_id` combination should only appear once
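
One way to satisfy the uniqueness requirement before calling `fit` is to collapse duplicate pairs first. The following is a minimal sketch that is not part of this diff or the README; the `dedupe_pairs` helper name is hypothetical, and it relies only on core Ruby (`Array#uniq` with a block), keeping the first row seen for each pair.

```ruby
# Hypothetical helper (not part of Disco): keep one row per
# user_id/item_id pair, preferring the first occurrence.
def dedupe_pairs(rows)
  rows.uniq { |row| [row[:user_id], row[:item_id]] }
end

data = [
  {user_id: 1, item_id: 1},
  {user_id: 2, item_id: 1},
  {user_id: 1, item_id: 1} # duplicate pair, dropped below
]

unique = dedupe_pairs(data)
```

The resulting `unique` array can then be passed to `recommender.fit` in place of `data`.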

Get user-based (user-item) recommendations - “users like you also liked”
Get user-based recommendations - “users like you also liked”

```ruby
recommender.user_recs(user_id)
```

Get item-based (item-item) recommendations - “users who liked this item also liked”
Get item-based recommendations - “users who liked this item also liked”

```ruby
recommender.item_recs(item_id)
@@ -99,18 +99,13 @@ recommender.item_recs("Star Wars (1977)")
[Ahoy](https://github.com/ankane/ahoy) is a great source for implicit feedback

```ruby
views = Ahoy::Event.
where(name: "Viewed post").
group(:user_id).
group("properties->>'post_id'"). # postgres syntax
count
views = Ahoy::Event.where(name: "Viewed post").group(:user_id).group_prop(:post_id).count

data =
views.map do |(user_id, post_id), count|
views.map do |(user_id, post_id), _|
{
user_id: user_id,
item_id: post_id,
value: count
item_id: post_id
}
end
```
@@ -181,26 +176,26 @@ user.update_recommended_products_v2(recs)
user.recommended_products_v2
```

For Rails < 6, speed up inserts by adding [activerecord-import](https://github.com/zdennis/activerecord-import) to your app.

## Storing Recommenders

If you’d prefer to perform recommendations on-the-fly, store the recommender

```ruby
bin = Marshal.dump(recommender)
File.binwrite("recommender.bin", bin)
json = recommender.to_json
File.write("recommender.json", json)
```

> You can save it to a file, database, or any other storage system
The serialized recommender includes user activity from the training data (to avoid recommending previously rated items), so be sure to protect it. You can save it to a file, database, or any other storage system, or use a tool like [Trove](https://github.com/ankane/trove). Also, user and item IDs should be integers or strings for this.

Load a recommender

```ruby
bin = File.binread("recommender.bin")
recommender = Marshal.load(bin)
json = File.read("recommender.json")
recommender = Disco::Recommender.load_json(json)
```

Alternatively, you can store only the factors and use a library like [Neighbor](https://github.com/ankane/neighbor). See the [examples](https://github.com/ankane/neighbor/tree/master/examples/disco).

## Algorithms

Disco uses high-performance matrix factorization.
@@ -226,16 +221,26 @@ recommender.fit(data, validation_set: validation_set)

## Cold Start

Collaborative filtering suffers from the [cold start problem](https://www.yuspify.com/blog/cold-start-problem-recommender-systems/). It’s unable to make good recommendations without data on a user or item, which is problematic for new users and items.
Collaborative filtering suffers from the [cold start problem](https://en.wikipedia.org/wiki/Cold_start_(recommender_systems)). It’s unable to make good recommendations without data on a user or item, which is problematic for new users and items.

```ruby
recommender.user_recs(new_user_id) # returns empty array
```

There are a number of ways to deal with this, but here are some common ones:

- For user-based recommendations, show new users the most popular items.
- For item-based recommendations, make content-based recommendations with a gem like [tf-idf-similarity](https://github.com/jpmckinney/tf-idf-similarity).
- For user-based recommendations, show new users the most popular items
- For item-based recommendations, make content-based recommendations with a gem like [tf-idf-similarity](https://github.com/jpmckinney/tf-idf-similarity)
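
The first strategy can be sketched as a small wrapper. This is an illustration rather than code from the diff: the `recs_with_fallback` name is hypothetical, and it assumes only that the object passed in responds to `user_recs(user_id)` (which returns an empty array for unknown users, as shown above) and to `top_items`, which this diff adds and which requires the recommender to be created with `top_items: true`.

```ruby
# Hypothetical helper (not part of Disco): fall back to the most popular
# items when the recommender has nothing for this user.
# Assumes `recommender` responds to `user_recs` and `top_items`.
def recs_with_fallback(recommender, user_id)
  recs = recommender.user_recs(user_id)
  recs.empty? ? recommender.top_items : recs
end
```

Keeping the fallback outside the recommender makes it easy to swap in a different default later, such as editorially chosen items.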

Get top items with:

```ruby
recommender = Disco::Recommender.new(top_items: true)
recommender.fit(data)
recommender.top_items
```

This uses [Wilson score](https://www.evanmiller.org/how-not-to-sort-by-average-rating.html) for explicit feedback and item frequency for implicit feedback.
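
For intuition, the lower bound of the Wilson score interval described in the linked article can be computed as below. This is a minimal sketch for the binary (positive/negative) case only; it is not taken from Disco's source, the `wilson_lower_bound` name is hypothetical, and how Disco adapts the score to multi-level ratings is not shown in this diff.

```ruby
# Lower bound of the Wilson score interval for a binomial proportion.
# `positive` is the number of positive ratings, `total` the number of ratings,
# and `z` the normal quantile (1.96 is roughly a 95% interval).
# Illustrative only; not Disco's internal implementation.
def wilson_lower_bound(positive, total, z: 1.96)
  return 0.0 if total.zero?

  phat = positive.to_f / total
  z2 = z * z
  center = phat + z2 / (2 * total)
  margin = z * Math.sqrt((phat * (1 - phat) + z2 / (4 * total)) / total)
  (center - margin) / (1 + z2 / total)
end

few = wilson_lower_bound(9, 10)    # 90% positive, little evidence
many = wilson_lower_bound(90, 100) # 90% positive, more evidence
```

With the same share of positive ratings, the item with more ratings gets the higher lower bound, which is why this ordering is preferred over sorting by the raw average.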

## Data

@@ -257,45 +262,65 @@ Or a Daru data frame
Daru::DataFrame.from_csv("ratings.csv")
```

## Faster Similarity
## Performance

If you have a large number of users/items, you can use an approximate nearest neighbors library like [NGT](https://github.com/ankane/ngt) to speed up item-based recommendations and similar users.
If you have a large number of users or items, you can use an approximate nearest neighbors library like [Faiss](https://github.com/ankane/faiss) to improve the performance of certain methods.

Add this line to your application’s Gemfile:

```ruby
gem 'ngt', '>= 0.3.0'
gem "faiss"
```

Speed up the `user_recs` method with:

```ruby
recommender.optimize_user_recs
```

Speed up item-based recommendations with:
Speed up the `item_recs` method with:

```ruby
model.optimize_item_recs
recommender.optimize_item_recs
```

Speed up similar users with:
Speed up the `similar_users` method with:

```ruby
model.optimize_similar_users
recommender.optimize_similar_users
```

This should be called after fitting or loading the model.
This should be called after fitting or loading the recommender.

## Reference

Get ids

```ruby
recommender.user_ids
recommender.item_ids
```

Get the global mean

```ruby
recommender.global_mean
```

Get the factors
Get factors

```ruby
recommender.user_factors
recommender.item_factors
```

Get factors for specific users and items

```ruby
recommender.user_factors(user_id)
recommender.item_factors(item_id)
```

## Credits

Thanks to:
54 changes: 53 additions & 1 deletion Rakefile
@@ -5,5 +5,57 @@ task default: :test
Rake::TestTask.new do |t|
t.libs << "test"
t.pattern = "test/**/*_test.rb"
t.warning = false
t.warning = false # for daru
end

# TODO use benchmark-ips
def benchmark_user_recs(name, recommender)
ms = Benchmark.realtime do
recommender.user_ids.each do |user_id|
recommender.user_recs(user_id)
end
end
puts "%-8s %f" % [name, ms]
end

# TODO use benchmark-ips
def benchmark_item_recs(name, recommender)
ms = Benchmark.realtime do
recommender.item_ids.each do |item_id|
recommender.item_recs(item_id)
end
end
puts "%-8s %f" % [name, ms]
end

namespace :benchmark do
task :user_recs do
require "bundler/setup"
Bundler.require
require "benchmark"

data = Disco.load_movielens
recommender = Disco::Recommender.new
recommender.fit(data)

benchmark_user_recs("none", recommender)
recommender.optimize_user_recs
benchmark_user_recs("faiss", recommender)
end

task :item_recs do
require "bundler/setup"
Bundler.require
require "benchmark"

data = Disco.load_movielens
recommender = Disco::Recommender.new
recommender.fit(data)

benchmark_item_recs("none", recommender)
recommender.optimize_item_recs(library: "ngt")
benchmark_item_recs("ngt", recommender)
recommender.optimize_item_recs(library: "faiss")
benchmark_item_recs("faiss", recommender)
end
end
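
Note that `Benchmark.realtime` returns elapsed wall-clock time in seconds as a Float, so the numbers printed by the tasks above are seconds even though the local variable is named `ms`. A minimal sketch of a wrapper that reports milliseconds follows; the `realtime_ms` name is hypothetical and not part of this diff.

```ruby
require "benchmark"

# Hypothetical helper: Benchmark.realtime returns seconds,
# so multiply by 1000 to report milliseconds.
def realtime_ms(&block)
  Benchmark.realtime(&block) * 1000.0
end

elapsed = realtime_ms { sleep 0.01 }
puts "%-8s %.1f ms" % ["sleep", elapsed]
```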