Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Various improvements #41

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 13 additions & 0 deletions .editorconfig
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
root = true

[*]
indent_style = space
indent_size = 2
end_of_line = lf
charset = utf-8
trim_trailing_whitespace = true
insert_final_newline = true
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is needed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not required, but it helps to configure an editor of any contributor to right (for project) settings and prevent too long lines, whitespaces at line ends, no EOL symbols in files, etc. Like this: https://github.com/biola/Voight-Kampff/pull/41/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5L54

It's pretty common standard, without OS or editor lock, so I don't see a problem with it, only help.


[*.md]
indent_style = space
indent_size = 4
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -5,3 +5,4 @@ Gemfile.lock
*.rbc
coverage
doc
*.gem
14 changes: 14 additions & 0 deletions .rubocop.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
AllCops:
TargetRubyVersion: 2.4
NewCops: enable

Layout/MultilineMethodCallIndentation:
EnforcedStyle: indented

Metrics/BlockLength:
Exclude:
- spec/**/*

Metrics/LineLength:
Exclude:
- spec/support/*
1 change: 0 additions & 1 deletion .ruby-version

This file was deleted.

2 changes: 2 additions & 0 deletions Gemfile
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
# frozen_string_literal: true

source 'https://rubygems.org'

gemspec
31 changes: 19 additions & 12 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -28,29 +28,36 @@ Configuration

A JSON file is used to match [user agent strings](http://simplyfast.info/browser) to a list of known bots.

If you'd like to use an [updated list](https://github.com/monperrus/crawler-user-agents) or make your own customizations, run `rake voight_kampff:import_user_agents`. This will download a `crawler-user-agents.json` file into the `./config` directory.
If you'd like to use an [updated list](https://github.com/monperrus/crawler-user-agents) or make your own customizations, run `rake voight_kampff:import_user_agents`. This will download a `crawler-user-agents.json` file into the `./tmp` directory.

__Note:__ The pattern entries in the JSON file are evaluated as [regular expressions](http://en.wikipedia.org/wiki/Regular_expression).

Usage
-----
There are three ways to use Voight-Kampff

1. Through Rack::Request such as in your [Ruby on Rails](http://rubyonrails.org) controllers:
`request.bot?`
1. Through `Rack::Request` in your app such as [Ruby on Rails](http://rubyonrails.org):
```ruby
request.bot?
```

2. Through the `VoightKampff` module:
AlexWayfer marked this conversation as resolved.
Show resolved Hide resolved
`VoightKampff.bot? 'your user agent string'`
2. Through the `VoightKampff` module:
```ruby
VoightKampff.bot? 'your user agent string'
```

3. Through a `VoightKampff::Test` instance:
`VoightKampff::Test.new('your user agent string').bot?`
3. Through a `VoightKampff::Test` instance:
```ruby
VoightKampff::Test.new('your user agent string').bot?
```

All of the above examples accept `human?` and `bot?` methods. All of these methods will return `true` or `false`.
All of the above examples accept `human?` and `bot?` methods.
All of these methods will return `true` or `false`.

Upgrading to version 1.0
------------------------

Version 1.0 uses a new source for a list of bot user agent strings since the old source was no longer maintained. This new source, unfortuately, does not include as much detail. Therefore the following methods have been deprecated:
Version 1.0 uses a new source for a list of bot user agent strings since the old source was no longer maintained. This new source, unfortunately, does not include as much detail. Therefore the following methods have been deprecated:
- `#browser?`
- `#checker?`
- `#downloader?`
Expand Down Expand Up @@ -81,10 +88,10 @@ If you use Rack, change your gemfile:

FAQ
---
__Q:__ __What's with the name?__
__Q:__ __What's with the name?__
__A:__ It's the [machine in Blade Runner](https://en.wikipedia.org/wiki/Blade_Runner#Voight-Kampff_machine) that is used to test whether someone is a human or a replicant.

__Q:__ __I've found a bot that isn't being matched__
__Q:__ __I've found a bot that isn't being matched__
__A:__ The list is being pulled from [github.com/monperrus/crawler-user-agents](https://github.com/monperrus/crawler-user-agents).
If you'd like to have entries added to the list, please create a pull request with that project. Once that pull request is merged, feel free to create an issue here and I'll release a new gem version with the updated list. In the meantime you can always run `rake voight_kampff:import_user_agents` on your project to get that updated list.

Expand All @@ -97,7 +104,7 @@ Thanks to [github.com/monperrus/crawler-user-agents](https://github.com/monperru

Contributing
------------
PR without tests will not get merged, Make sure you write tests for api and rails app.
PR without tests will not get merged, Make sure you write tests for API and Rails app.
Feel free to ask for help, if you do not know how to write a determined test.

Running Tests?
Expand Down
7 changes: 0 additions & 7 deletions config.ru

This file was deleted.

10 changes: 7 additions & 3 deletions lib/tasks/voight_kampff.rake
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
# frozen_string_literal: true

namespace :voight_kampff do
desc 'Import a new crawler-user-agents.json file'
task :import_user_agents, :url do |t, args|
task :import_user_agents, :url do |_t, args|
args.with_defaults url: 'https://raw.githubusercontent.com/monperrus/crawler-user-agents/master/crawler-user-agents.json'

require 'net/http'
Expand All @@ -9,8 +11,10 @@ namespace :voight_kampff do
contents = Net::HTTP.get(uri)

if contents.present?
file = File.open('./config/crawler-user-agents.json', 'w')
file.write(contents.force_encoding(Encoding::UTF_8))
File.write(
'./tmp/crawler-user-agents.json',
contents.force_encoding(Encoding::UTF_8)
)
else
puts "voight_kampff:import_user_agents - empty file received from #{uri}"
end
Expand Down
12 changes: 5 additions & 7 deletions lib/voight_kampff.rb
Original file line number Diff line number Diff line change
@@ -1,23 +1,21 @@
require 'json'
# frozen_string_literal: true

require 'voight_kampff/test'
require 'voight_kampff/methods'

# Class helper methods
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary comment

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you try to run RuboCop without it?

module VoightKampff
class << self
def root
require 'pathname'
Pathname.new File.expand_path '..', File.dirname(__FILE__)
end
ROOT = File.expand_path '..', __dir__
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to use brackets for more readability

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're writing text in natural language (like English) mostly without parentheses, so "readability" is very subjective.


class << self
def human?(user_agent_string)
test(user_agent_string).human?
end

def bot?(user_agent_string)
test(user_agent_string).bot?
end
alias :replicant? :bot?
alias replicant? bot?

private

Expand Down
5 changes: 4 additions & 1 deletion lib/voight_kampff/engine.rb
Original file line number Diff line number Diff line change
@@ -1,10 +1,13 @@
# frozen_string_literal: true

module VoightKampff
# Integration with Rails
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary comment

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Necessary for RuboCop success.

class Engine < Rails::Engine
rake_tasks do
load 'tasks/voight_kampff.rake'
end

initializer :add_voight_kampff_methods do |app|
initializer :add_voight_kampff_methods do |_app|
ActionDispatch::Request.class_eval do
include VoightKampff::Methods
end
Expand Down
19 changes: 12 additions & 7 deletions lib/voight_kampff/methods.rb
Original file line number Diff line number Diff line change
@@ -1,10 +1,15 @@
module VoightKampff::Methods
def human?
VoightKampff::Test.new(user_agent).human?
end
# frozen_string_literal: true

module VoightKampff
# Helper for Rack::Request
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

module Methods
extend Forwardable
def_delegators :voight_kampff, :human?, :bot?, :replicant?

private

def bot?
VoightKampff::Test.new(user_agent).bot?
def voight_kampff
VoightKampff::Test.new(user_agent)
end
end
alias :replicant? :bot?
end
2 changes: 2 additions & 0 deletions lib/voight_kampff/rack.rb
Original file line number Diff line number Diff line change
@@ -1,2 +1,4 @@
# frozen_string_literal: true

require 'voight_kampff'
require 'voight_kampff/rack_request'
6 changes: 3 additions & 3 deletions lib/voight_kampff/rack_request.rb
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# frozen_string_literal: true

# Reopen the Rack::Request class to add bot detection methods
Rack::Request.class_eval do
include VoightKampff::Methods
end
Rack::Request.include VoightKampff::Methods if defined?(Rack::Request)
2 changes: 2 additions & 0 deletions lib/voight_kampff/rails.rb
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
# frozen_string_literal: true

require 'voight_kampff'
require 'voight_kampff/rack'
require 'voight_kampff/engine'
79 changes: 48 additions & 31 deletions lib/voight_kampff/test.rb
Original file line number Diff line number Diff line change
@@ -1,7 +1,50 @@
# frozen_string_literal: true

module VoightKampff
# Test User-Agent against Voight-Kampff
class Test
CRAWLERS_FILENAME = 'crawler-user-agents.json'

class << self
def crawlers
@crawlers ||= JSON.parse File.read preferred_path
end

def crawler_regexp
@crawler_regexp ||= begin
# NOTE: This is admittedly a bit convoluted
# but the performance gains make it worthwhile
crawler_patterns =
crawlers.map.with_index do |crawler, index|
"(?<match#{index}>#{crawler['pattern']})"
end.join('|')
crawler_patterns = "(#{crawler_patterns})"
Regexp.new(crawler_patterns, Regexp::IGNORECASE)
end
end

private

def lookup_paths
AlexWayfer marked this conversation as resolved.
Show resolved Hide resolved
# These paths should be orderd by priority
[
(Rails.root if defined? Rails),
Dir.pwd,
VoightKampff::ROOT
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The constant ROOT is needed only here, may be calculate root path of the gem here without unnecessary constants?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made it similar to Rails.root at least, I yes, it's using only in single place right now, but I also think this approach is more safe if the test.rb file will be renamed or just a method will be moved to another file. I see no problem with this simple constant. Again: it's more mnemonic to Rails and maybe even Dir.pwd, also it adds readability (without additional ..s and File and ()).

]
end

def preferred_path
AlexWayfer marked this conversation as resolved.
Show resolved Hide resolved
lookup_paths.find do |base_path|
next unless base_path

path = File.join base_path, 'tmp', CRAWLERS_FILENAME

break path if File.exist? path
end
end
AlexWayfer marked this conversation as resolved.
Show resolved Hide resolved
end

attr_accessor :user_agent_string

def initialize(user_agent_string)
Expand All @@ -19,42 +62,16 @@ def human?
def bot?
!human?
end
alias :replicant? :bot?
alias replicant? bot?

private

def lookup_paths
# These paths should be orderd by priority
base_paths = []
base_paths << Rails.root if defined? Rails
base_paths << VoightKampff.root

base_paths.map { |p| p.join('config', CRAWLERS_FILENAME) }
end

def preferred_path
lookup_paths.find { |path| File.exist? path }
end

def matching_crawler
if match = crawler_regexp.match(@user_agent_string)
index = match.names.first.sub(/match/, '').to_i
crawlers[index]
end
end

def crawler_regexp
@@crawler_regexp ||= begin
# NOTE: This is admittedly a bit convoluted but the performance gains make it worthwhile
index = -1
crawler_patterns = crawlers.map{|c| index += 1; "(?<match#{index}>#{c["pattern"]})" }.join("|")
crawler_patterns = "(#{crawler_patterns})"
Regexp.new(crawler_patterns, Regexp::IGNORECASE)
end
end
match = self.class.crawler_regexp.match(@user_agent_string)
return unless match

def crawlers
@@crawlers ||= JSON.load(File.open(preferred_path, 'r'))
index = match.names.first.sub(/match/, '').to_i
self.class.crawlers[index]
end
end
end
5 changes: 4 additions & 1 deletion spec/controllers/replicants_controller_spec.rb
Original file line number Diff line number Diff line change
@@ -1,9 +1,12 @@
# frozen_string_literal: true

require 'spec_helper'

describe ReplicantsController, type: :controller do
let(:user_agent_string) { '' }
before do
expect_any_instance_of(ActionController::TestRequest).to receive(:user_agent).and_return user_agent_string
expect_any_instance_of(ActionController::TestRequest)
.to receive(:user_agent).and_return user_agent_string
get :index
end

Expand Down
13 changes: 8 additions & 5 deletions spec/internal/app/controllers/replicants_controller.rb
Original file line number Diff line number Diff line change
@@ -1,12 +1,15 @@
# frozen_string_literal: true

class ReplicantsController < ActionController::Base
def index
header = "Replicants:\n===========\n"

status, content = if request.bot?
[200, '- Rick Deckard']
else
[403, 'No replicants here']
end
status, content =
if request.bot?
[200, '- Rick Deckard']
else
[403, 'No replicants here']
end

render plain: header + content, status: status
end
Expand Down
2 changes: 2 additions & 0 deletions spec/internal/config/routes.rb
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
# frozen_string_literal: true

Rails.application.routes.draw do
resources :replicants, only: :index
root to: 'replicants#index'
Expand Down
8 changes: 6 additions & 2 deletions spec/lib/voight_kampff/rack_request_spec.rb
Original file line number Diff line number Diff line change
@@ -1,10 +1,14 @@
# frozen_string_literal: true

require 'spec_helper'

describe Rack::Request do
let(:user_agent_string) { }
let(:env) { {'HTTP_USER_AGENT' => user_agent_string} }
let(:user_agent_string) { nil }
let(:env) { { 'HTTP_USER_AGENT' => user_agent_string } }
subject { Rack::Request.new(env) }

require_relative '../../../lib/voight_kampff/rack_request'

it { expect(subject).to respond_to :human? }
it { expect(subject).to respond_to :bot? }
it { expect(subject).to respond_to :replicant? }
Expand Down
Loading