Transform and Stack

oceandlr edited this page Mar 25, 2019 · 25 revisions

To facilitate computations, including storage of intermediate results, the Stack class enables manipulations of DataSets as a stack. The Transform class has a Stack and enables operations on that stack.

class Transform(object):
    def __init__(self, dataSrc, dataSink, limit=1024):
        self.source = dataSrc
        self.sink = dataSink
        self.stack = Stack()
        self.window = limit
        self.keep = 0
        # Map operation names to the methods that implement them
        self.ops = {
            '+'    : self.add,
            '-'    : self.subtract,
            '*'    : self.multiply,
            '/'    : self.divide,
            '--'   : self.diff,
            'grad' : self.gradient,
            'hist' : self.histogram,
            'min'  : self.min,
            'max'  : self.max,
            'mean' : self.mean,
            'std'  : self.std
        }
        ...
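
The name-to-method table in the constructor can be illustrated with a small stand-alone sketch. Everything here is hypothetical: MiniTransform, its op_* methods, and apply() are invented for illustration and are not the numsos API. The sketch only shows how a dictionary of bound methods lets an operation be selected by its string name, with the result replacing the top item of a list-based stack.

```python
class MiniTransform:
    """Hypothetical stand-in for illustration; not the numsos Transform."""

    def __init__(self):
        self.stack = []  # plain list used as a stack; the top is the last item
        # Map operation names to the bound methods that implement them
        self.ops = {
            "min": self.op_min,
            "max": self.op_max,
            "mean": self.op_mean,
        }

    def push(self, values):
        self.stack.append(list(values))
        return self.stack[-1]

    def _replace_top(self, fn):
        # Pop the top item, reduce it with fn, and push the result in its place
        top = self.stack.pop()
        return self.push([fn(top)])

    def op_min(self):
        return self._replace_top(min)

    def op_max(self):
        return self._replace_top(max)

    def op_mean(self):
        return self._replace_top(lambda values: sum(values) / len(values))

    def apply(self, name):
        # Look the operation up by name and call it
        return self.ops[name]()


t = MiniTransform()
t.push([10, 20, 30])
mean_result = t.apply("mean")
print(mean_result)  # [20.0]
```

Storing bound methods in the dictionary means a caller needs only the operation's name, which is convenient when operations arrive as strings (for example, from a configuration file).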

The Transform can initially be populated from a DataSource. This simplifies the common pattern of reading raw data from an existing database, using it as the basis for some derived data (e.g., rates, thresholds), and saving the derived data. Items can also be pushed onto the Stack externally. Both approaches are illustrated below.

In Python, you can store your setup material in a file and load it:

> cat loads.py
from sosdb import Sos
from numsos.DataSource import SosDataSource
from numsos.DataSink import SosDataSink
from numsos.Transform import Transform
from numsos.Stack import Stack

src = SosDataSource()
src.config(path='/dir/my-container')
src.show_schemas()
src.show_schema('meminfo_E5-2698')
src.select(['timestamp','component_id','Active'],from_ = ['meminfo_E5-2698'],order_by = 'comp_time')
dst1 = src.get_results()

Then start IPython (the %run magic is not available in the plain python interpreter):
> ipython
In [1]: %run loads.py

You can push the current DataSet to the Stack in a Transform:

# Here we create a Transform, which is _not_ associated with a DataSource or DataSink:
In [9]: t = Transform(None, None)

# There is nothing on the Stack
In [10]: t.show()

# Push the current DataSet to the Stack:
In [11]: t.push(dst1)
Out[11]: <sosdb.DataSet.DataSet at 0x7f22dc97e450>

In [12]: t.show()
[TOP] 1000  ['timestamp', 'component_id', 'Active']

You can examine the DataSet at the top of the Stack:

In [13]: t.top().show(limit=3)
   timestamp     component_id           Active 
---------------- ---------------- ---------------- 
2018-02-16T17:59:13.003055             12.0          82672.0 
2018-02-16T17:59:14.002905             12.0          82672.0 
2018-02-16T17:59:15.002761             12.0          82672.0 
---------------- ---------------- ---------------- 
3 results

To get delta values for Active, you need to subtract consecutive values of Active, grouped by component_id and in timestamp order. You can do this as follows:

# The operation will operate on the DataSet at the top of the Stack.
# _keep_ is a list of columns from the original DataSet which will be retained in the result. 
# This is a feature of operations in the _Transform_ class, which would not be available in 
# numpy or basic _DataSet_ operations
In [15]: t.diff(['Active'], group_name="component_id",keep=['timestamp'])
Out[15]: <sosdb.DataSet.DataSet at 0x7f22d6c29fd0>

In [16]: t.show()
[TOP]  900  ['component_id', 'timestamp', 'Active_diff']

In [21]: t.top().show(limit=3)
component_id        timestamp      Active_diff 
---------------- ---------------- ---------------- 
        12.0 2018-02-16T17:59:13.003055              0.0 
        12.0 2018-02-16T17:59:14.002904              0.0 
        12.0 2018-02-16T17:59:15.002760              0.0 
---------------- ---------------- ---------------- 
3 results

This leverages the fact that the select statement was ordered by 'comp_time', which means the results are time ordered for each component.
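
To make the dependence on ordering concrete, here is a minimal pure-Python sketch of a grouped difference. It is not the numsos implementation: grouped_diff and the sample rows are invented for illustration, and dropping the first sample of each group (which has no predecessor) is an assumption. The function subtracts only between adjacent rows, so it is correct only when each group's rows are contiguous and in time order.

```python
def grouped_diff(rows, value, group, keep):
    """Difference of `value` between adjacent rows of the same group.

    Hypothetical helper. Assumes `rows` is ordered by group, then time,
    so that every row's predecessor in the same group is the row before it.
    """
    out = []
    for prev, cur in zip(rows, rows[1:]):
        if prev[group] != cur[group]:
            continue  # group boundary: no earlier sample to subtract
        result = {group: cur[group]}
        for name in keep:
            result[name] = cur[name]
        result[value + "_diff"] = cur[value] - prev[value]
        out.append(result)
    return out


# Made-up sample data, ordered by component and then by time
rows = [
    {"timestamp": 1, "component_id": 12, "Active": 100.0},
    {"timestamp": 2, "component_id": 12, "Active": 130.0},
    {"timestamp": 3, "component_id": 12, "Active": 125.0},
    {"timestamp": 1, "component_id": 13, "Active": 50.0},
    {"timestamp": 2, "component_id": 13, "Active": 80.0},
]

diffs = grouped_diff(rows, "Active", "component_id", keep=["timestamp"])
for row in diffs:
    print(row)
```

If the same rows were ordered by time alone, samples from components 12 and 13 would interleave and no adjacent pair would share a component, so this approach would produce no differences at all.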

If you pass a DataSource when constructing the Transform, you can load its data onto the Stack with begin():

In [24]: t_src = Transform(src, None)

In [25]: t_src.show()

In [26]: t_src.begin()
Out[26]: <sosdb.DataSet.DataSet at 0x7f22d6bfbe10>

# In this case, the Stack holds the results of the source's current select:
In [27]: t_src.show()
[TOP] 1000  ['timestamp', 'component_id', 'Active']

When a Transform uses a DataSource, the data on the top of the Stack will appear to be that of the last select on the source. Additional calls to select and begin will add additional items to the Stack:

  In [64]: src.select(['timestamp','component_id','Inactive'],from_ = ['meminfo_E5-2698'],order_by = 'comp_time')

  In [65]: t_src.show()
  [TOP] 1000  ['timestamp', 'component_id', 'Active']

  In [66]: t_src.begin()
  Out[66]: <sosdb.DataSet.DataSet at 0x7f22d6bbd950>

  In [67]: t_src.show()
  [TOP] 1000  ['timestamp', 'component_id', 'Inactive']
  [  1] 1000  ['timestamp', 'component_id', 'Active']

Changing the select on a DataSource after its results have been pushed onto the Stack does not change the Stack:

  In [69]: t = Transform(None, None)

  In [70]: dst1 = src.get_results()

  In [71]: dst1.series
  Out[71]: ['timestamp', 'component_id', 'Inactive']

  In [72]: t.push(dst1)
  Out[72]: <sosdb.DataSet.DataSet at 0x7f22d6bbdb10>

  In [73]: t.show()
  [TOP] 1000  ['timestamp', 'component_id', 'Inactive']

  In [74]: src.select(['timestamp','component_id','Active','MemFree'],from_ = ['meminfo_E5-2698'],order_by = 'comp_time')

  In [75]: dst1 = src.get_results()

  In [76]: dst1.series
  Out[76]: ['timestamp', 'component_id', 'Active', 'MemFree']

  In [77]: t.show()
  [TOP] 1000  ['timestamp', 'component_id', 'Inactive']

Currently, an operation that fails removes the item it was operating on from the Stack (this is issue 2). Under consideration is a debug mode in which the entire Stack would be copied so that it could be restored if an exception were thrown. However, that would affect performance.
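
The trade-off behind such a debug mode can be sketched in a few lines. This is only an illustration under stated assumptions: GuardedStack, its debug flag, and inactive_total are hypothetical names and do not exist in numsos. With debug off, a failing operation loses the item it popped; with debug on, a deep copy of the stack taken beforehand is restored.

```python
import copy


class GuardedStack:
    """Hypothetical sketch of snapshot-and-restore; not part of numsos."""

    def __init__(self, debug=False):
        self.items = []
        self.debug = debug

    def push(self, item):
        self.items.append(item)

    def apply(self, fn):
        # In debug mode, copy the whole stack before touching it
        snapshot = copy.deepcopy(self.items) if self.debug else None
        top = self.items.pop()
        try:
            self.items.append(fn(top))
        except Exception:
            if snapshot is not None:
                self.items = snapshot  # restore the pre-operation state
            raise
        return self.items[-1]


def inactive_total(dataset):
    # Raises KeyError when the 'Inactive' series is missing
    return {"Inactive_total": sum(dataset["Inactive"])}


data = {"Active": [1.0, 2.0, 3.0]}

plain = GuardedStack(debug=False)
plain.push(data)
try:
    plain.apply(inactive_total)
except KeyError:
    pass
print(len(plain.items))  # 0: the failed operation consumed the item

guarded = GuardedStack(debug=True)
guarded.push(data)
try:
    guarded.apply(inactive_total)
except KeyError:
    pass
print(len(guarded.items))  # 1: the snapshot was restored
```

The deep copy runs on every operation, including successful ones, and its cost grows with the amount of data on the stack, which is the performance concern noted above.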

Operations that succeed replace the top item. You can duplicate items you want to preserve:

In [86]: t = Transform(None, None)

In [87]: src.select(['timestamp','component_id','Active','MemFree'],from_ = ['meminfo_E5-2698'],order_by = 'comp_time')

In [88]: dst1 = src.get_results()

In [89]: t.push(dst1)
Out[89]: <sosdb.DataSet.DataSet at 0x7f22d6bbd290>

In [90]: t.show()
[TOP] 1000  ['timestamp', 'component_id', 'Active', 'MemFree']

# duplicate the top item
In [91]: t.dup()
Out[91]: <sosdb.DataSet.DataSet at 0x7f22d6bbd290>

In [92]: t.show()
[TOP] 1000  ['timestamp', 'component_id', 'Active', 'MemFree']
[  1] 1000  ['timestamp', 'component_id', 'Active', 'MemFree']

# A successful operation replaces the top item:
In [93]: t.diff(['Active'], group_name='component_id',keep=['timestamp','component_id','Active'])
Out[93]: <sosdb.DataSet.DataSet at 0x7f22d6bbd150>

In [94]: t.show()
[TOP]  900  ['timestamp', 'component_id', 'Active', 'Active_diff']
[  1] 1000  ['timestamp', 'component_id', 'Active', 'MemFree']


# a failed operation currently removes the item on the top of the stack that it failed upon:
In [93]: t.diff(['Inactive'], group_name='component_id',keep=['timestamp','component_id','Inactive'])
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-93-d1b9acf0438c> in <module>()
----> 1 t.diff(['Inactive'], group_name='component_id',keep=['timestamp','component_id','Inactive'])
....
KeyError: 'Inactive'

In [95]: t.show()
[TOP] 1000  ['timestamp', 'component_id', 'Active', 'MemFree']

Series can be renamed. This is a SOS feature and does not require numSOS:

# Starting from:
In [144]: t.show()
[TOP] 1000  ['timestamp', 'component_id', 'Active', 'MemFree']
[  1] 1000  ['timestamp', 'component_id', 'Active', 'Junk']

In [145]: t.top().rename('Active','Foo')

In [146]: t.show()
[TOP] 1000  ['timestamp', 'component_id', 'Foo', 'MemFree']
[  1] 1000  ['timestamp', 'component_id', 'Active', 'Junk']

Next: need iteration examples
