htmldate-0.6.1

@adbar adbar released this 17 Jan 12:37
· 389 commits to master since this release

htmldate finds original and updated publication dates of any web page. All the steps needed from web page download to HTML parsing, scraping and text analysis are included.

In a nutshell, with Python:

>>> from htmldate import find_date
>>> find_date('http://blog.python.org/2016/12/python-360-is-now-available.html')
'2016-12-23'
>>> find_date('https://netzpolitik.org/2016/die-cider-connection-abmahnungen-gegen-nutzer-von-creative-commons-bildern/', original_date=True)
'2016-06-23'

On the command-line:

$ htmldate -u http://blog.python.org/2016/12/python-360-is-now-available.html
'2016-12-23'
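
As a rough illustration of the parsing and scraping steps named in the description above, the following sketch uses only the Python standard library. It is not htmldate's actual implementation: the function name sketch_find_date, the helper class, the list of metadata names and the URL pattern are all assumptions made for this example, and the real library covers far more cases.

```python
import re
from datetime import date
from html.parser import HTMLParser

# Hypothetical set of metadata keys, chosen for illustration only.
DATE_META_NAMES = {"article:published_time", "date", "dc.date"}

# A YYYY-MM-DD pattern for metadata values, and a /YYYY/MM/DD/ pattern for URLs.
ISO_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
URL_DATE = re.compile(r"/(\d{4})/(\d{2})/(\d{2})(?:/|$)")


class _MetaDateParser(HTMLParser):
    """Collect the content of <meta> tags whose name or property looks date-related."""

    def __init__(self):
        super().__init__()
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        key = (attr.get("property") or attr.get("name") or "").lower()
        if key in DATE_META_NAMES:
            self.candidates.append(attr.get("content", ""))


def _to_iso(year, month, day):
    """Return an ISO date string, or None if the numbers do not form a valid date."""
    try:
        return date(int(year), int(month), int(day)).isoformat()
    except ValueError:
        return None


def sketch_find_date(html, url=None):
    """Try metadata first, then fall back to a date embedded in the URL."""
    parser = _MetaDateParser()
    parser.feed(html)
    for value in parser.candidates:
        match = ISO_DATE.search(value)
        if match:
            iso = _to_iso(*match.groups())
            if iso:
                return iso
    if url:
        match = URL_DATE.search(url)
        if match:
            return _to_iso(*match.groups())
    return None


if __name__ == "__main__":
    page = '<html><head><meta property="article:published_time" content="2016-12-23T10:00:00+01:00"></head></html>'
    print(sketch_find_date(page))
```

The sketch checks structured metadata before the URL because markup written by the publisher is usually a more direct signal than a path segment. Both the ordering and the two heuristics are simplifications for this example.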

Releases are used in production and are meant to be archived on Zenodo for reproducibility and citability.

For more information, see the documentation at htmldate.readthedocs.io