Transform and Stack
The Stack class lets DataSets be manipulated as a stack, which simplifies computations that need to store intermediate results. The Transform class owns a Stack and provides operations on it.
class Transform(object):
    def __init__(self, dataSrc, dataSink, limit=1024):
        self.source = dataSrc
        self.sink = dataSink
        self.stack = Stack()
        self.window = limit
        self.keep = 0
        self.ops = {
            "+": self.add,
            "-": self.subtract,
            "*": self.multiply,
            "/": self.divide,
            "--": self.diff,
            "grad": self.gradient,
            "hist": self.histogram,
            "min": self.min,
            "max": self.max,
            "mean": self.mean,
            "std": self.std,
        }
    ...
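Each entry in `ops` maps an operator name to a bound method, so an operation can be selected by its string name. The following is a minimal stand-alone sketch of that dispatch pattern, not the numSOS code: `MiniTransform`, `apply`, and the use of plain Python lists in place of DataSets are all hypothetical.

```python
class MiniTransform:
    """Hypothetical stand-in for Transform: a stack of lists plus a name -> method table."""

    def __init__(self):
        self.stack = []
        # Same idea as Transform.ops: operator name -> bound method
        self.ops = {
            "+": self.add,
            "-": self.subtract,
            "max": self.maximum,
        }

    def push(self, item):
        self.stack.append(item)
        return item

    def add(self):
        # Binary op: pop two operands, push their elementwise sum
        b = self.stack.pop()
        a = self.stack.pop()
        return self.push([x + y for x, y in zip(a, b)])

    def subtract(self):
        # Binary op: the item pushed first is the left operand
        b = self.stack.pop()
        a = self.stack.pop()
        return self.push([x - y for x, y in zip(a, b)])

    def maximum(self):
        # Unary op: replace the top item with a one-element list of its maximum
        return self.push([max(self.stack.pop())])

    def apply(self, name):
        """Look up an operation by name and run it against the stack."""
        op = self.ops.get(name)
        if op is None:
            raise ValueError("unknown operation: %r" % name)
        return op()
```

For example, pushing `[1, 2, 3]` and `[10, 20, 30]` and then calling `apply("+")` leaves `[11, 22, 33]` on top. A table like this keeps the set of operations in one place, so adding an operation means adding one method and one dictionary entry.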
A Transform can initially be populated from a DataSource. This supports a common workflow: read raw data from an existing database, use it as the basis for derived data (e.g., rates, thresholds), and save the derived data. Items can also be pushed onto the Stack externally. This is illustrated below.
In Python, you can store your setup code in a file and load it:
> cat loads.py
from sosdb import Sos
from numsos.DataSource import SosDataSource
from numsos.DataSink import SosDataSink
from numsos.Transform import Transform
from numsos.Stack import Stack
src = SosDataSource()
src.config(path='/dir/my-container')
src.show_schemas()
src.show_schema('meminfo_E5-2698')
src.select(['timestamp','component_id','Active'],from_ = ['meminfo_E5-2698'],order_by = 'comp_time')
dst1 = src.get_results()
Then start IPython and run it (%run is an IPython magic command):
> ipython
In [1]: %run loads.py
You can push the current DataSet to the Stack in a Transform:
# Here we create a Transform, which is _not_ associated with a DataSource or DataSink:
In [9]: t = Transform(None, None)
# There is nothing on the Stack
In [10]: t.show()
# Push the current DataSet to the Stack:
In [11]: t.push(dst1)
Out[11]: <sosdb.DataSet.DataSet at 0x7f22dc97e450>
In [12]: t.show()
[TOP] 1000 ['timestamp', 'component_id', 'Active']
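The `show()` listing prints the top entry first, marked `[TOP]`, followed by its row count and series names; deeper entries are numbered from 1. A minimal sketch of a stack that produces a listing in that shape, assuming each entry is a dict of equal-length column lists (`MiniStack` and `listing` are hypothetical names, and the exact spacing of the real output is not reproduced):

```python
class MiniStack:
    """Hypothetical stack of column dicts; the last list element is the top."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)
        return item

    def top(self):
        return self._items[-1]

    def listing(self):
        """Return one line per entry, top first, in a [TOP] / [ n] style."""
        lines = []
        for depth, item in enumerate(reversed(self._items)):
            label = "[TOP]" if depth == 0 else "[%2d]" % depth
            # Row count taken from the first column (all columns assumed equal length)
            nrows = len(next(iter(item.values()))) if item else 0
            lines.append("%s %d %s" % (label, nrows, list(item.keys())))
        return lines
```

An empty stack yields an empty listing, which corresponds to `t.show()` printing nothing above.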
You can examine the DataSet at the top of the Stack:
In [13]: t.top().show(limit=3)
timestamp component_id Active
---------------- ---------------- ----------------
2018-02-16T17:59:13.003055 12.0 82672.0
2018-02-16T17:59:14.002905 12.0 82672.0
2018-02-16T17:59:15.002761 12.0 82672.0
---------------- ---------------- ----------------
3 results
To get delta values for Active, you need to subtract successive values of Active, grouped by component_id and in timestamp order. You can do this as follows:
# The operation will operate on the DataSet at the top of the Stack.
# _keep_ is a list of columns from the original DataSet which will be retained in the result.
# This is a feature of operations in the _Transform_ class, which would not be available in
# numpy or basic _DataSet_ operations
In [15]: t.diff(['Active'], group_name="component_id",keep=['timestamp'])
Out[15]: <sosdb.DataSet.DataSet at 0x7f22d6c29fd0>
In [16]: t.show()
[TOP] 900 ['component_id', 'timestamp', 'Active_diff']
In [21]: t.top().show(limit=3)
component_id timestamp Active_diff
---------------- ---------------- ----------------
12.0 2018-02-16T17:59:13.003055 0.0
12.0 2018-02-16T17:59:14.002904 0.0
12.0 2018-02-16T17:59:15.002760 0.0
---------------- ---------------- ----------------
3 results
This relies on the fact that the select statement was ordered by 'comp_time', which means the results are in time order for each component.
If you pass a DataSource to the Transform constructor, you can load its data onto the Stack with begin():
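The effect of `diff` with `group_name` can be illustrated in plain Python. This sketch assumes the rows are already ordered by component and then by time (as the `comp_time` ordering provides), takes differences only between consecutive rows of the same component, and so produces one fewer row per component. Following what the output above appears to show, each difference is labeled with the earlier row of the pair. The function name and the dict-of-lists layout are hypothetical; numSOS's own handling of group boundaries may differ.

```python
def grouped_diff(columns, series, group_name, keep=()):
    """Difference `series` between consecutive rows that share a `group_name` value.

    `columns` is a dict of equal-length lists, assumed sorted by group, then time.
    The result holds the group column, any `keep` columns, and `<series>_diff`.
    """
    groups = columns[group_name]
    values = columns[series]
    extra = [name for name in keep if name != group_name]  # avoid a duplicate group column
    diff_name = series + "_diff"
    out = {group_name: []}
    for name in extra:
        out[name] = []
    out[diff_name] = []
    for i in range(1, len(values)):
        if groups[i] != groups[i - 1]:
            continue  # group boundary: never difference across components
        # Label each difference with the earlier row of the pair
        out[group_name].append(groups[i - 1])
        for name in extra:
            out[name].append(columns[name][i - 1])
        out[diff_name].append(values[i] - values[i - 1])
    return out
```

Because only adjacent rows are compared, unordered input would silently mix samples from different components or times, which is why the ordering of the select matters.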
In [24]: t_src = Transform(src, None)
In [25]: t_src.show()
In [26]: t_src.begin()
Out[26]: <sosdb.DataSet.DataSet at 0x7f22d6bfbe10>
# begin() has pushed the results of the most recent select on src:
In [27]: t_src.show()
[TOP] 1000 ['timestamp', 'component_id', 'Active']
When a Transform uses a DataSource, begin() pushes the results of the most recent select on that source. A select on its own does not change the Stack; each subsequent select followed by begin() adds another item:
In [64]: src.select(['timestamp','component_id','Inactive'],from_ = ['meminfo_E5-2698'],order_by = 'comp_time')
In [65]: t_src.show()
[TOP] 1000 ['timestamp', 'component_id', 'Active']
In [66]: t_src.begin()
Out[66]: <sosdb.DataSet.DataSet at 0x7f22d6bbd950>
In [67]: t_src.show()
[TOP] 1000 ['timestamp', 'component_id', 'Inactive']
[ 1] 1000 ['timestamp', 'component_id', 'Active']
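The transcript shows that `select` alone leaves the Stack unchanged and only `begin()` pushes the latest results. A hypothetical model of that behavior, where `FakeSource` and `SourceTransform` are illustrative names (not numSOS classes) and a "table" is a dict of columns:

```python
class FakeSource:
    """Hypothetical source: select() records a query, get_results() runs it."""

    def __init__(self, table):
        self.table = table          # dict: series name -> list of values
        self.series = None

    def select(self, series):
        self.series = list(series)  # remember the query; nothing is pushed yet

    def get_results(self):
        return {name: list(self.table[name]) for name in self.series}


class SourceTransform:
    """Hypothetical Transform bound to a source; begin() pushes the latest results."""

    def __init__(self, source):
        self.source = source
        self.stack = []

    def begin(self):
        result = self.source.get_results()
        self.stack.append(result)
        return result
```

Keeping the query and its execution separate is what allows a new select to be prepared without disturbing what is already on the Stack.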
Changing the select on a DataSource after its results have been pushed onto the Stack does not change the Stack:
In [69]: t = Transform(None, None)
In [70]: dst1 = src.get_results()
In [71]: dst1.series
Out[71]: ['timestamp', 'component_id', 'Inactive']
In [72]: t.push(dst1)
Out[72]: <sosdb.DataSet.DataSet at 0x7f22d6bbdb10>
In [73]: t.show()
[TOP] 1000 ['timestamp', 'component_id', 'Inactive']
In [74]: src.select(['timestamp','component_id','Active','MemFree'],from_ = ['meminfo_E5-2698'],order_by = 'comp_time')
In [75]: dst1 = src.get_results()
In [76]: dst1.series
Out[76]: ['timestamp', 'component_id', 'Active', 'MemFree']
In [77]: t.show()
[TOP] 1000 ['timestamp', 'component_id', 'Inactive']
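This matches ordinary Python reference semantics, assuming each `get_results()` call returns a new DataSet object: the Stack holds a reference to the pushed object, and rebinding the name `dst1` afterwards does not affect it. A stand-alone illustration with plain dicts standing in for DataSets (`get_results` here is a hypothetical function, not the SosDataSource method):

```python
def get_results(series):
    """Hypothetical query: returns a fresh dict on every call, like a new DataSet."""
    return {name: [] for name in series}

stack = []
dst1 = get_results(["timestamp", "component_id", "Inactive"])
stack.append(dst1)                   # the stack now refers to this object

dst1 = get_results(["timestamp", "component_id", "Active", "MemFree"])
# Rebinding dst1 created a new object; the stack still refers to the old one.
print(list(stack[-1].keys()))        # ['timestamp', 'component_id', 'Inactive']
```

In this sketch the converse also holds: mutating the pushed object in place, rather than rebinding the name, would be visible through the stack, because both refer to the same object.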
Currently, an operation that fails removes the item it was operating on from the Stack. This is issue 2. Under consideration is a debug mode in which the entire Stack is copied before an operation so that it can be restored if an exception is thrown; however, that would affect performance.
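The trade-off can be sketched as follows. `run_op` models the current behavior by popping the operand before the operation runs, so an exception loses it; `debug=True` models the proposed debug mode by snapshotting the stack first and restoring it on failure. The function and the pop-first model are assumptions for illustration, not the numSOS implementation.

```python
def run_op(stack, op, debug=False):
    """Pop the top item, apply `op` to it, and push the result.

    With debug=True, a snapshot of the stack is taken first and restored
    if `op` raises, so a failed operation leaves the stack unchanged.
    """
    snapshot = list(stack) if debug else None   # shallow copy: a list of references
    try:
        operand = stack.pop()
        result = op(operand)
    except Exception:
        if debug:
            stack[:] = snapshot                 # restore the pre-operation stack
        raise
    stack.append(result)
    return result
```

The snapshot here is shallow, which is cheap but only restores which items are on the stack; if an operation could modify a DataSet in place, a deep copy of every item would be needed, and that is where the performance cost would grow.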
Operations that succeed replace the top item. You can duplicate items you want to preserve:
In [86]: t = Transform(None, None)
In [87]: src.select(['timestamp','component_id','Active','MemFree'],from_ = ['meminfo_E5-2698'],order_by = 'comp_time')
In [88]: dst1 = src.get_results()
In [89]: t.push(dst1)
Out[89]: <sosdb.DataSet.DataSet at 0x7f22d6bbd290>
In [90]: t.show()
[TOP] 1000 ['timestamp', 'component_id', 'Active', 'MemFree']
# duplicate the top item
In [91]: t.dup()
Out[91]: <sosdb.DataSet.DataSet at 0x7f22d6bbd290>
In [92]: t.show()
[TOP] 1000 ['timestamp', 'component_id', 'Active', 'MemFree']
[ 1] 1000 ['timestamp', 'component_id', 'Active', 'MemFree']
# A successful operation replaces the top item:
In [93]: t.diff(['Active'], group_name='component_id',keep=['timestamp','component_id','Active'])
Out[93]: <sosdb.DataSet.DataSet at 0x7f22d6bbd150>
In [94]: t.show()
[TOP] 900 ['timestamp', 'component_id', 'Active', 'Active_diff']
[ 1] 1000 ['timestamp', 'component_id', 'Active', 'MemFree']
# A failed operation currently removes the item it failed on from the top of the Stack:
In [93]: t.diff(['Inactive'], group_name='component_id',keep=['timestamp','component_id','Inactive'])
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-93-d1b9acf0438c> in <module>()
----> 1 t.diff(['Inactive'], group_name='component_id',keep=['timestamp','component_id','Inactive'])
....
KeyError: 'Inactive'
In [95]: t.show()
[TOP] 1000 ['timestamp', 'component_id', 'Active', 'MemFree']
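In the transcript above, `dup()` returns the same object address (`0x7f22d6bbd290`) as the DataSet that was pushed, which suggests it pushes a second reference rather than a copy. The duplicate survives because `diff` produces a new DataSet instead of modifying its input. A sketch of both behaviors with plain dicts (`dup` and `replace_top` are hypothetical helpers):

```python
def dup(stack):
    """Push a second reference to the top item (no copy is made)."""
    stack.append(stack[-1])
    return stack[-1]

def replace_top(stack, op):
    """Apply `op` to the top item and replace it with the new result."""
    result = op(stack.pop())
    stack.append(result)
    return result

original = {"Active": [1, 4, 9]}
stack = [original]
dup(stack)
assert stack[0] is stack[1]          # both entries refer to the same object

# The operation builds a new object, so the lower entry is untouched.
replace_top(stack, lambda d: {"Active_diff": [b - a for a, b in zip(d["Active"], d["Active"][1:])]})
print(stack[-1])                     # {'Active_diff': [3, 5]}
print(stack[0] is original)          # True
```

If an operation modified its input in place instead, the duplicate would change as well, since both entries point to the same object; `copy.deepcopy` would be one way to get an independent copy in that case.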
Series can be renamed (this is a SOS feature and does not require numSOS):
# Starting from:
In [144]: t.show()
[TOP] 1000 ['timestamp', 'component_id', 'Active', 'MemFree']
[ 1] 1000 ['timestamp', 'component_id', 'Active', 'Junk']
In [145]: t.top().rename('Active','Foo')
In [146]: t.show()
[TOP] 1000 ['timestamp', 'component_id', 'Foo', 'MemFree']
[ 1] 1000 ['timestamp', 'component_id', 'Active', 'Junk']
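Renaming changes only a series' name, not its data or position, and in the transcript it affects only the DataSet it is called on. A stand-alone sketch on a dict of columns that preserves column order (`rename_series` is a hypothetical helper, not the sosdb `rename` implementation; rejecting a name collision is a choice of this sketch):

```python
def rename_series(columns, old, new):
    """Rename series `old` to `new` in place, keeping the column order."""
    if old not in columns:
        raise KeyError(old)
    if new in columns:
        raise ValueError("series %r already exists" % new)
    # Rebuild with the same order, swapping only the one key
    renamed = {(new if name == old else name): values for name, values in columns.items()}
    columns.clear()
    columns.update(renamed)
    return columns
```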
Next: need iteration examples