Releases: pystorm/streamparse

streamparse 3.0.0.dev0

12 Mar 02:12
Pre-release

This is the first developer preview release of streamparse 3.0. It has not been tested extensively in production yet, so we are looking for as much feedback as we can get from users who are willing to test it out.

You can install this release with pip install --pre streamparse==3.0.0.dev0. pip will not install it by default because it is a pre-release.

⚠️ API Breaking Changes ⚠️

  • Topologies are now specified via a Python Topology DSL instead of the Clojure Topology DSL. This means you can/must now write your topologies in Python! Components can still be written in any language supported by Storm, of course. (Issues #84 and #136, PRs #199 and #226)
  • The deprecated Spout.emit_many method has been removed. (pystorm/pystorm@004dc27)
  • As a consequence of using the new Python Topology DSL, all Bolts and Spouts that emit anything are expected to have the outputs attribute declared. It must be a list of either str or Stream objects, as described in the docs.
  • We temporarily removed the sparse run command, as we've removed all of our Clojure code, and this was the only thing that still had to be done in Clojure. (Watch issue #213 for future developments)

Features

  • Added sparse slot_usage command that can show you how balanced your topologies are across nodes. This is something that isn't currently possible with the Storm UI on its own. (PR #218)
  • Can now specify ssh_password in config.json if you don't have SSH keys set up. Storing your password in plaintext is not recommended, but the option is handy for local VMs. (PR #224, thanks @motazreda)
  • Now fully Python 3 compatible (and tested on up to 3.5), because we rely on fabric3 instead of plain old fabric now. (4acfa2f)
  • The _resources directory is now removed after the JAR has been created.
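
As a rough sketch of the ssh_password option above, the snippet below builds a config.json-style document with Python's json module. Only the ssh_password key name comes from these notes. The surrounding layout (an envs mapping with user, nimbus, and workers keys), the environment name, and all host names and values are assumptions for illustration.

```python
import json

# Hypothetical environment for a local VM; every value here is a placeholder.
local_vm_env = {
    "user": "vagrant",
    "ssh_password": "vagrant",  # plaintext: only reasonable for a throwaway VM
    "nimbus": "streamparse-box",
    "workers": ["streamparse-box"],
}

# Assumed layout: environments keyed by name under "envs".
config = {"envs": {"local_vm": local_vm_env}}

config_text = json.dumps(config, indent=4, sort_keys=True)
print(config_text)
```

For anything beyond a local VM, SSH keys remain the safer choice, as the note above says.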

Other Changes

  • Now rely on pystorm package for handling Multi-Lang IPC between Storm and Python. This library is essentially the same as our old storm subpackage with a few enhancements (e.g., the ability to use MessagePack instead of JSON to serialize messages). (Issue #174, Commits aaeb3e9 and 1347ded)
  • All Bolt, Spout, and Topology-related classes are now available directly at the streamparse package level (i.e., you can just do from streamparse import Bolt now). (Commit b9bf4ae)
  • sparse kill now will kill inactive topologies. (Issue #156)
  • All examples now use the Python DSL
  • The Kafka-JVM example has been cleaned up a bit, so now you can click on Storm UI log links and they'll work.

streamparse 2.1.4

11 Jan 18:11

This minor release adds support for specifying ui.port in config.json to make the sparse stats and sparse worker_uptime commands work when ui.port is not set to the default 8080.
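
The fallback behaviour described above can be sketched as follows. This is not streamparse's own code: the helper name storm_ui_port is hypothetical, and treating ui.port as a key of a single environment's settings is an assumption about the config.json layout.

```python
DEFAULT_UI_PORT = 8080  # Storm UI default mentioned in the release note


def storm_ui_port(env_config):
    """Return the Storm UI port for one environment, falling back to the default."""
    return int(env_config.get("ui.port", DEFAULT_UI_PORT))


# Hypothetical environment settings; the host name is a placeholder.
print(storm_ui_port({"nimbus": "nimbus.example.com"}))
print(storm_ui_port({"nimbus": "nimbus.example.com", "ui.port": 8081}))
```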

streamparse 2.1.3

20 Oct 17:07

Fix a race condition in TicklessBatchingBolt that could cause a tuple to be part of more than one batch. (PR #193)

streamparse 2.1.2

13 Oct 20:33

This release fixes an issue where reraise wasn't being imported from six in bolt.py (commit d743188).

streamparse 2.1.1

13 Oct 20:03

This bugfix release just fixes an issue where TicklessBatchingBolt was crashing when trying to handle exceptions in TicklessBatchingBolt.run() (commit 48bace6).

streamparse 2.1.0

01 Oct 14:57

Features

  • Added TicklessBatchingBolt, an updated version of the pre-2.0 BatchingBolt, which did not rely on tick tuples. This is useful in cases where you know your spout will not replay tuples after a topology shutdown. Because Storm is not guaranteed to keep sending tick tuples while the topology is shutting down, the standard BatchingBolt may still hold a batch of tuples waiting to be processed (that were never ACKed) when the topology shuts down. When you resubmit and start it back up, those tuples will be lost unless the spout saves state between runs (which is pretty uncommon). With the TicklessBatchingBolt this is much less likely to happen because it uses a timer thread that is independent of Storm and continues to execute even while the topology is shutting down. As long as the time you give Storm to shut down is greater than the time it takes to process the last batch, your last batch will always be fully processed. (PR #191)
  • Can now specify virtualenv command-line arguments in config.json via virtualenv_flags (issue #94, PR #159)
  • Added support for pulling out source->stream->fields mapping with Storm 0.10.0+ (commit 61f163d)
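
To illustrate the timer-thread idea behind TicklessBatchingBolt, here is a minimal standard-library sketch. It is not the streamparse implementation: the TimerBatcher class and its methods are hypothetical stand-ins, and it batches plain Python objects rather than Storm tuples. The point is that the flush is driven by a thread that keeps running regardless of external tick messages, and that a final flush happens on stop.

```python
import threading


class TimerBatcher:
    """Collects items and flushes them from a background timer thread."""

    def __init__(self, secs_between_batches, process_batch):
        self.secs_between_batches = secs_between_batches
        self.process_batch = process_batch
        self._batch = []
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def add(self, item):
        with self._lock:
            self._batch.append(item)

    def _flush(self):
        # Swap the batch out under the lock so an item lands in exactly one batch.
        with self._lock:
            batch, self._batch = self._batch, []
        if batch:
            self.process_batch(batch)

    def _run(self):
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self.secs_between_batches):
            self._flush()

    def stop(self):
        """Stop the timer thread, then flush whatever is still pending."""
        self._stop.set()
        self._thread.join()
        self._flush()


processed = []
batcher = TimerBatcher(0.05, processed.append)
for word in ["a", "b", "c"]:
    batcher.add(word)
batcher.stop()
print(processed)
```

How the items are split across batches depends on timing, but each item is processed exactly once and nothing is left behind after stop().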

Bug fixes

  • Restored the --version argument to sparse, which was accidentally removed in the previous release. (commit 48b6de7)
  • Fixed missing comma in setup.py (issue #160, commit bde3cc3)
  • Fixed issue where an empty tasks.py file (for invoke) was necessary to make fabric pre-submit hooks work. (issue #157, commit a10c478)
  • Fixed issue where run and submit couldn't parse email addresses and git hashes properly (PR #189, thanks @eric7j, commit 8670e3f)
  • Fixed issue where fabric env wasn't being populated when use_virtualenv was False (commit a10c478)
  • Fixed issue where updating virtualenvs would hang when VCS path changed. (commits e923a3c and 3e27cf0)

Dependencies

  • simplejson is now a required dependency instead of just recommended (commit 08ef3d4)
  • Updated invoke requirements to now require 0.8 or later (commit 360128c)
  • Updated requirements to specify six >= 1.5 specifically to avoid issues for OS X users (issues #113 and #190, commit a0c1309)

streamparse 2.0.2

29 Jun 17:56

This release fixes an issue where tick tuples were not being acked in the new BatchingBolt implementation (e38a024). It also updates the documentation for BatchingBolt to indicate that you can enable tick tuples on a per-bolt basis in your topology file by adding :conf {"topology.tick.tuple.freq.secs", 1} to your python-bolt-spec arguments (d1c405a).

streamparse 2.0.1

23 Jun 13:53

This bugfix release fixes an issue where reading non-ASCII messages on Python 2.7 would cause a UnicodeDecodeError (#154). Thanks to @daTokenizer for reporting this!

streamparse 2.0.0

16 Jun 20:27

This release adds a bunch of new functionality (e.g., additional subcommands), but also changes some things that were not used by a lot of people in backward-incompatible ways.

⚠️ API BREAKING CHANGES ⚠️

  • BatchingBolt now uses tick tuples instead of a separate timer thread. This is an API-breaking change, as BatchingBolt.secs_between_batches is now BatchingBolt.ticks_between_batches. You also will need to make sure you run your Storm topology with topology.tick.tuple.freq.secs set to how frequently you want the ticks to occur. Read the docs for more details. (#125, #137)
  • streamparse fabric and invoke tasks have been moved to sparse sub-commands:
      • fab remove_logs ➡️ sparse remove_logs
      • fab tail_logs ➡️ sparse tail
      • fab activate_env is no longer necessary, as all commands that need the fabric environment modified do this automatically.
      • fab create_or_update_virtualenvs ➡️ sparse update_virtualenv (note the change from plural to singular, since this only ever worked on a single virtualenv at a time)
      • inv jar_for_deploy ➡️ sparse jar
      • inv list_topologies ➡️ sparse list
      • inv kill_topology ➡️ sparse kill
      • inv run_local_topology ➡️ sparse run
      • inv submit_topology ➡️ sparse submit
      • inv tail_topology ➡️ sparse tail
      • inv visualize_topology ➡️ sparse visualize
      • inv prepare_topology has been removed because the commands that relied on it (sparse run, sparse submit, and sparse jar) all call streamparse.util.prepare_topology automatically.
  • The streamparse.ext package has been removed and so have the streamparse.ext.fabric and streamparse.ext.invoke modules.
      • streamparse.ext.util ➡️ streamparse.util
      • Users should no longer do from streamparse.ext.fabric import * and from streamparse.ext.invoke import * in their projects' fabfile.py and tasks.py files. pre_submit and post_submit hooks will be executed automatically even without this.
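
The BatchingBolt change listed above (counting ticks instead of running a timer thread) can be sketched roughly as follows. This is a standard-library illustration, not streamparse code: TickBatcher is a hypothetical stand-in, its process_tick takes no argument, and it handles plain Python objects rather than Storm tuples. Only the ticks_between_batches name comes from the notes.

```python
class TickBatcher:
    """Flushes accumulated items after a fixed number of ticks."""

    def __init__(self, ticks_between_batches, process_batch):
        self.ticks_between_batches = ticks_between_batches
        self.process_batch = process_batch
        self._batch = []
        self._ticks_seen = 0

    def process(self, item):
        # Regular items are only accumulated; nothing is flushed here.
        self._batch.append(item)

    def process_tick(self):
        # Count ticks and flush once enough of them have arrived.
        self._ticks_seen += 1
        if self._ticks_seen < self.ticks_between_batches:
            return
        self._ticks_seen = 0
        if self._batch:
            batch, self._batch = self._batch, []
            self.process_batch(batch)


batches = []
batcher = TickBatcher(ticks_between_batches=2, process_batch=batches.append)
batcher.process("a")
batcher.process_tick()  # first tick: nothing flushed yet
batcher.process("b")
batcher.process_tick()  # second tick: flushes ["a", "b"]
print(batches)  # → [['a', 'b']]
```

Under this scheme the time between batches is roughly ticks_between_batches multiplied by the tick interval, so with topology.tick.tuple.freq.secs set to 1 and ticks_between_batches set to 2, a batch would be flushed about every two seconds.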

Major enhancements

  • sparse run now runs indefinitely by default (#122)
  • Added Bolt.process_tick(tup) method for processing tick tuples (#116, #124)
  • Added sparse worker_uptime and sparse stats commands for getting information about running Storm topologies and their workers. (#17, #52)
  • --ackers and --workers can now be specified as separate arguments to sparse submit and sparse run, instead of just using --par. (#74, #97)
  • Bolt.emit_many() is now deprecated and will be removed in streamparse 3.0. Please just call Bolt.emit() repeatedly instead. (#66)
  • Added lots of documentation about how topologies work and how to get started with streamparse. (#26, #103)
  • Added conda recipe template for building a streamparse conda package. (#105)
  • SSH tunnels are no longer required for kill, list, and submit commands (#96, #98, #112).
  • env.use_ssh_config is True by default now (#54)
  • Can now deploy/build simple JARs in addition to Uber-JARs. This speeds up sparse submit for pure Python projects. (#106)
  • Added sparse jar, sparse remove_logs, and sparse update_virtualenv commands to replace old Fabric and Invoke tasks.

Minor enhancements

  • Removed dependency on docopt and switched to using argparse for command-line arguments. Now sub-commands all have their own detailed --help switches (e.g., sparse run --help) and sparse --help will list all of the available commands with a brief description of what they do. (#115, #152)
  • Added first pieces of support for a Python DSL for defining topologies (#84) as part of a grander vision to move away from Clojure (#136). Please note that this cannot actually be used yet, because the utility to take the Python DSL and then generate something Storm understands out of it has not been written yet.
  • Overhauled unit tests to separate and simplify IPC testing (#41, #47).
  • Added documentation on using an unofficial version of Storm (#142)
  • Added support for Tox (#128)
  • Updated spouts and bolts to allow Python tuples to be emitted. (#119)
  • Switched to using Travis Docker containers for building (#90)
  • Made update of virtualenvs optional by seeing if requirements.txt exists (#60)
  • Created a new storm subpackage, which will be split off into its own package (pystorm) for version 3.0 of streamparse. This contains all of the IPC/Multi-Lang related code. In the future streamparse will just be a collection of utilities for managing Storm topologies/clusters.
  • Moved a lot of code from the Spout and Bolt classes into the Component parent class to cut down on code duplication.

Bugfixes

  • Fixed multithreaded emitting (#101, #133)
  • sparse commands that use lein underneath now display output from lein immediately. (#109)
  • Fixed typo in config name to get maxbytes (#110)
  • We now reset Bolt.current_tups even when receiving a heartbeat (#107)
  • Spout.emit_many() works again (#144)
  • sparse tail tails all machines now. (#104)

Thanks to all our contributors!

Storm 0.9.3 support

26 Jan 22:40

This release adds support for Storm 0.9.3 in addition to a number of bug fixes.
New and updated examples are also available.