v0.12.0

@aecio aecio released this 18 Jan 17:58
· 178 commits to master since this release

We are pleased to announce version 0.12.0 of ACHE Crawler!

The following is a detailed log of the changes since the previous version:

  • Upgrade crawler-commons dependency to version 0.9
  • Removed Elasticsearch transport-client-based repository
  • Removed Elasticsearch 1.4.4 binaries dependency
  • Added DumpDataFromElasticsearch tool for dumping documents from Elasticsearch
    repositories
  • Added configuration for minimum relevance in link selectors
  • Added configuration for selecting whether to re-crawl sitemaps and
    robots.txt links
  • Added documentation about relevance_threshold parameters to the target page
    classifiers documentation page
  • Added support for crawling via HTTP proxy in okhttp3 fetcher (by @maqzi)
  • Added tracking of more HTTP error messages (301, 302, 3xx, 402) (by @maqzi)
  • Upgrade crawler-commons library to version 1.0
  • Upgrade commons-validator library to version 1.6
  • Upgrade okhttp3 library to version 3.14.0
  • Fix issue #177: Links from recent TLDs are considered invalid
  • Upgrade RocksDB dependency (rocksdbjni) to version 6.2.2
  • Added error code details to RocksDB exception logs
  • Upgrade gradle-node-plugin to version 1.3.1
  • Upgrade npm version to 6.10.2
  • Upgrade ache-dashboard npm dependencies
  • Upgrade gradle wrapper to version 5.6.1
  • Update Dockerfile to use openjdk:11-jdk (Java 11)
  • Added content_type field to RegexTargetClassifier
  • Change default link classifier to LinkClassifierBreadthSearch
  • Update io.airlift:airline dependency to version 0.8
  • Update gradle build script to use new plugins DSL
  • Update coveralls gradle plugin to version 2.9.0
  • Update searchkit to version ^2.4.0