robomc edited this page Sep 14, 2010 · 4 revisions

Dumped from mediawiki

The script uses [http://www.ruby-lang.org/en/downloads/ Ruby 1.8.7]

If you install ruby with a one-click installer with rails rolled in, the only extra libraries you should need are:

  1. highline 1.6.1 (sudo gem install highline)
  2. garb 0.7.6 (sudo gem install garb)

You don’t need to be root on Linux, only on OS X, I think.

highline is used for formatting some terminal output.

[http://github.com/vigetlabs/garb garb is the analytics library]

You need at least garb 0.7.6 to use analytics segments (the more complex reports depend on custom profile segments).

==Basic structure==

===lib/analytics/reports===

Reports boot up the session and call data classes in lib/analytics/api. They are children of ‘Report’.

Data returned from api objects is usually put into the array @collector. This is then either printed to screen (report.to_screen) or written to a file (report.to_file). The file path is set as a string in @file_path when the report object is initialized.

For extremely long slow reports, it is suggested you use the new Array method “@collector.capture string” instead of “@collector << string”, as this will print the results to screen as they are collected, so you can follow output as it happens…
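The definition of capture is not shown on this page. A minimal sketch, assuming it simply echoes each string before appending it to the array, might look like the following (the sample strings are made up for illustration):

```ruby
# Hypothetical sketch of the Array#capture helper described above.
# The project's real definition may differ.
class Array
  def capture(string)
    puts string     # echo immediately so progress is visible
    self << string  # then collect as usual
  end
end

collector = []
collector.capture("first line of output")
collector.capture("second line of output")
```

Because capture still appends to the array, report.to_screen and report.to_file style methods would keep working unchanged under this assumption.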

Example of a report method:

    def page_one
      @collector << self.page_heading(1)
      visits = Visits.new
      visits.main_graph.each {|thing| @collector << thing}
      visits.three_with_changes.each {|thing| @collector << thing}
      uniques = Uniques.new
      uniques.three_monthly_averages.each {|thing| @collector << thing}
    end

==$periods, $display, $profile, Startup, and /bin/==

=$periods=
Every report should have a $periods object ($periods = Periods.new), which contains the dates for the report. There are getter and setter methods such as $periods.start_date_reporting, $periods.end_date_baseline etc.

This will usually be populated by methods in the Startup class, like Startup.new.select_reporting_period

Data classes then import date values from $periods on initialization.

$periods can also be changed on the fly with setter methods ($periods.start_date_previous = Date.new(2010, 3, 9))

Reports are generally structured around a reporting period (the main period); a previous period, which is of the same length and runs up to the start of the reporting period; and a baseline period, which ends at the same point as the previous period but is much longer.

ie:

Reporting period: 2010-04-01 to 2010-04-30

Previous period: 2010-03-01 to 2010-03-31

Baseline period: 2009-03-31 to 2010-03-31

The previous period is usually calculated automatically with the method Startup.new.generate_previous_period
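The relationship between the three periods can be sketched with the standard library's Date#<<, which moves a date back by whole months. The helper names below are hypothetical and are not the project's Startup methods; this only reproduces the example dates above.

```ruby
require 'date'

# Hypothetical helpers (not the project's Startup methods).

# Previous period: same number of months, ending the day before reporting starts.
def previous_period(reporting_start, reporting_months)
  [reporting_start << reporting_months, reporting_start - 1]
end

# Baseline period: ends where the previous period ends, but is much longer.
def baseline_period(previous_end, baseline_months)
  [previous_end << baseline_months, previous_end]
end

previous = previous_period(Date.new(2010, 4, 1), 1)
baseline = baseline_period(previous.last, 12)

puts previous.map { |d| d.to_s }.join(' to ')  # 2010-03-01 to 2010-03-31
puts baseline.map { |d| d.to_s }.join(' to ')  # 2009-03-31 to 2010-03-31
```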

'''Note'''

Life is a lot simpler when all periods are whole numbers of months (ie not 4.5 months etc). This is because some methods use Ruby’s .month methods to adjust ranges, subdivide ranges etc.

Another month-related limitation is that you must set the $periods values @reporting_number_of_months and @baseline_number_of_months.

This is usually taken care of by the relevant Startup methods, which will ask the user for the length of the baseline and reporting periods, in months.

These are used by some Crunch class methods which subdivide the baseline period so that it becomes an average over a period equivalent to the reporting period, so that comparing a 24 month baseline visits count to a 2 month reporting period visits count makes sense.
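As a rough illustration of that scaling (the method name and the figures are assumptions, not taken from Crunch), the baseline total can be divided by the number of reporting-length periods it contains:

```ruby
# Hypothetical sketch: scale a baseline total down to the reporting period length.
def equivalent_baseline(baseline_total, baseline_months, reporting_months)
  periods_in_baseline = baseline_months.to_f / reporting_months
  baseline_total / periods_in_baseline
end

# A 24 month baseline holds twelve 2 month periods, so 2400 visits average to 200.0.
average = equivalent_baseline(2400, 24, 2)
```

The result can then be compared directly with the 2 month reporting count.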

This allows (horrific) helper methods like:

    @dates << date # first date is returned unchanged
    
    intervals_in_range = months_in_range / month_interval # setting correct amount of dates to collect
    intervals_in_range -= 1 # offset because first value is already entered
    
    while intervals_in_range > 0
      @dates << date.months_ago(month_interval)
      date = date.months_ago(month_interval)
      intervals_in_range -= 1
    end
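For comparison, a self-contained sketch of the same idea using only the standard library is shown below. It assumes months_ago in the fragment above comes from ActiveSupport, and the function name is hypothetical. Note that stepping back from the original date each time can differ from the repeated months_ago calls above when the start date falls at the end of a month.

```ruby
require 'date'

# Hypothetical stand-alone equivalent: collect dates stepping back
# month_interval months at a time across months_in_range months.
def dates_stepping_back(date, months_in_range, month_interval)
  count = months_in_range / month_interval
  (0...count).map { |i| date << (i * month_interval) }
end

dates = dates_stepping_back(Date.new(2010, 4, 1), 6, 2)
puts dates.map { |d| d.to_s }.join(', ')  # 2010-04-01, 2010-02-01, 2009-12-01
```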

=$display=

Every report also requires a $display object. These are instances of classes that are children of ‘Display’, like ‘Windows’, ‘Unix’ etc, ie $display = Unix.new

Different Display classes handle methods like ask_user, tell_user, alert_user etc differently.

They also handle the ‘arrows’ variables differently. For percentage changes, some increases are good, some are bad etc. The script tells you which it is: “up red” for a bad increase, “down green” for a positive decrease etc. What these actually amount to in terms of script output is decided by which Display sub-class is used.
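A minimal sketch of how two Display sub-classes might differ is shown below. The class names are suffixed with Sketch to make clear they are not the project's classes; grey_up, tell_user and alert_user are named on this page, but every method body and the ANSI colour codes are assumptions.

```ruby
# Hypothetical sketch only; the real Display classes may look quite different.
class DisplaySketch
  def tell_user(message)
    puts message
  end
end

class UnixSketch < DisplaySketch
  def grey_up
    "\e[90m^\e[0m"  # ANSI escape codes: grey caret, then reset
  end

  def alert_user(message)
    puts "\e[31m#{message}\e[0m"  # red text
  end
end

class WindowsSketch < DisplaySketch
  def grey_up
    "up (grey)"  # plain text, no escape codes
  end

  def alert_user(message)
    puts "!! #{message}"
  end
end
```

Report and data classes only ever call the shared method names, so swapping the object assigned to $display changes the output style without touching them.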

===Notes on “arrows”===

Every data class has methods up_is_nothing? and up_is_good? which are used by the class Crunch to decide how to interpret percentage changes.

Percentage change results strings will include something like #{self.arrow(@baseline_percentage_change)}; Crunch’s method “arrow” then runs through like:


    if change > 0
      if self.up_is_nothing?
        @arrow = $display.grey_up
        return @arrow
      # ... etc etc ...

The actual string comes from $display, depending on which booleans are returned by the calling class’s up_is_nothing? and up_is_good? methods.

This should just work, but it is something to be aware of when writing new classes, or making major changes to an existing class.
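The full decision can be sketched as follows. This is an illustration under assumptions, not the Crunch implementation: only grey_up appears on this page, so the other display method names, the empty string for no change, and passing the display as an argument (instead of using $display) are all hypothetical.

```ruby
# Hypothetical display with plain-text arrows.
class PlainArrows
  def grey_up;    "up (grey)";    end
  def grey_down;  "down (grey)";  end
  def green_up;   "up (green)";   end
  def green_down; "down (green)"; end
  def red_up;     "up (red)";     end
  def red_down;   "down (red)";   end
end

module ArrowSketch
  def arrow(change, display)
    return "" if change == 0  # assumption: no arrow when nothing changed
    direction = change > 0 ? "up" : "down"
    colour = if up_is_nothing?
               "grey"
             elsif (change > 0) == up_is_good?
               "green"  # moved in the good direction
             else
               "red"    # moved in the bad direction
             end
    display.send("#{colour}_#{direction}")
  end
end

class VisitsArrowSketch
  include ArrowSketch
  def up_is_nothing?; false; end
  def up_is_good?;    true;  end
end

class BouncesArrowSketch
  include ArrowSketch
  def up_is_nothing?; false; end
  def up_is_good?;    false; end
end
```

A new data class only has to answer the two questions; the colour and direction follow from them.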

===Startup class===

The Startup class contains methods for prompting the user to authenticate, enter date ranges etc. See startup.rb for what is available.

The Startup class usually instantiates and populates $profile, which is basically a holder for the account, profile and segment details. Every report needs this. It has accessor methods so these can be changed throughout a report, ie $profile.segment.

Example of a Startup method that populates $profile:


    def select_profile
      $display.ask_user('Enter the profile you want stats for (ie 20425901)')
      chosen_profile = gets.chomp
      $profile = Profile.new
      $profile.string = chosen_profile
      $profile.garb = Garb::Profile.first(chosen_profile)
    end

===Using segments with $profile===

Set segments like: $profile.segment = "18378974"

Be sure to set segments back to nil when not in use ($profile.segment = nil) (many data methods check for segments with $profile.segment? which is true if the segment variable isn’t nil).
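One way to make the reset hard to forget is a block helper that restores the old value even if the block raises. The sketch below is hypothetical: segment, segment? and segment_string? are named on this page, but the class name and with_segment are illustrations, not part of the project's Profile class.

```ruby
# Hypothetical sketch of a profile holder with a self-resetting segment.
class ProfileSketch
  attr_accessor :segment, :segment_string

  def segment?
    !@segment.nil?
  end

  def segment_string?
    !@segment_string.nil?
  end

  # Set a segment for the duration of the block only.
  def with_segment(id)
    previous = @segment
    @segment = id
    yield
  ensure
    @segment = previous  # restored even if the block raises
  end
end

profile = ProfileSketch.new
seen_inside = nil
profile.with_segment("18378974") { seen_inside = profile.segment? }
```

After the block, profile.segment? is false again without an explicit reset to nil.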

The value segment_string is a sort of ad-hoc segment, which is used by some content related data classes to limit results by path. ie:

  def content_summaries
    content = Content.new

    $profile.segment_string = "/Contexts"
    content.info

    $profile.segment_string = "/Contexts/Earthquakes"
    content.info

    $profile.segment_string = nil # reset so later data classes are not filtered
  end

Used by class Content like:

      if $profile.segment_string?
        report.filters :pageviews.gt => @limit, :pagePath.contains => $profile.segment_string
      else
        report.filters :pageviews.gt => @limit
      end

Again you need to set these back to nil to avoid problems…

Segments and segment_string can also be combined (ie return values containing /Some_Section/, within segment 128744).

===Example /bin/ file===

These objects are generally instantiated in the /bin/ files.

    require File.expand_path(File.dirname(__FILE__) + "/../lib/analytics.rb")

    $display = Unix.new

    $periods = Periods.new

    interface = Startup.new

    interface.get_dates_with_options   # populates $periods

    interface.authenticate_session     # creates and populates $profile

    report = TKI_Check.new
    report.all

===lib/analytics/api===

The data crunching classes. They inherit helper methods from class Crunch (methods like percentage_change, get_array_of_months, make_bounces_rates).
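The Crunch source is not reproduced here. As an illustration of what a helper like percentage_change typically computes, the sketch below assumes an (old, new) argument order and a nil result when the old value is zero. Both are assumptions, so it is given a different name.

```ruby
# Hypothetical sketch; Crunch's real percentage_change may differ.
def percentage_change_sketch(old_value, new_value)
  return nil if old_value == 0  # a change from zero has no defined percentage
  (new_value - old_value).to_f / old_value * 100
end

change = percentage_change_sketch(200, 150)  # (150 - 200) / 200 * 100 = -25.0
```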

Many methods in Crunch depend upon there being a suitable method called ‘arbitrary’ in the calling class. As a test for this, classes in /api contain a method ‘arbitrary?’ which returns true if it has an appropriate ‘arbitrary’ method.

Most classes in /api have a basic method called arbitrary, which takes arguments for date range and possibly other parameters, and gets the data. There are then 3 or more methods which call arbitrary, one for each period (‘reporting’, ‘previous’, ‘baseline’), with the arguments for arbitrary baked in, pulling dates from the $periods object.

So go:


    visits = Visits.new

    p visits.reporting # don't need arguments, they use the dates pulled out of $periods as arguments for arbitrary
    p visits.previous
    p visits.baseline

    p visits.arbitrary(some_start_date, some_end_date) # needs date arguments (and possibly others...)
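The pattern can be sketched without the Analytics API by summing values from an in-memory hash. Everything here is a stand-in: the class names, the hash used as a data source, passing periods in explicitly (the project uses the global $periods), and the accessor names other than start_date_reporting, start_date_previous and end_date_baseline are assumptions that follow the naming seen on this page.

```ruby
require 'date'

# Hypothetical stand-in for the Periods object.
PeriodsSketch = Struct.new(:start_date_reporting, :end_date_reporting,
                           :start_date_previous,  :end_date_previous,
                           :start_date_baseline,  :end_date_baseline)

class VisitsSketch
  def initialize(periods, source)
    @periods = periods
    @source  = source  # stand-in for the API: Hash of Date => visit count
  end

  def arbitrary?
    true  # signals to shared helpers that arbitrary is available
  end

  # The one method that actually fetches data.
  def arbitrary(start_date, end_date)
    (start_date..end_date).inject(0) { |sum, day| sum + @source.fetch(day, 0) }
  end

  # Period methods only bake in the dates.
  def reporting
    arbitrary(@periods.start_date_reporting, @periods.end_date_reporting)
  end

  def previous
    arbitrary(@periods.start_date_previous, @periods.end_date_previous)
  end

  def baseline
    arbitrary(@periods.start_date_baseline, @periods.end_date_baseline)
  end
end

periods = PeriodsSketch.new(Date.new(2010, 4, 1),  Date.new(2010, 4, 30),
                            Date.new(2010, 3, 1),  Date.new(2010, 3, 31),
                            Date.new(2009, 3, 31), Date.new(2010, 3, 31))
sample  = { Date.new(2010, 4, 1) => 10, Date.new(2010, 4, 2) => 20,
            Date.new(2010, 3, 31) => 5 }  # made-up numbers
visits_sketch = VisitsSketch.new(periods, sample)
```

Keeping all fetching in arbitrary means a new period only needs one more thin wrapper method.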

==How to use it==
After your initial installation you should navigate to the root folder and run “ruby test/all” to make sure your dependencies are in place.

These tests aren’t as complete as they should be, but the most dodgy and frequently called data classes have a fairly complete range of unit tests. If these pass you can assume you have everything you need to run reports.

To run the tests you may need to enter authentication info in the mockup session classes in /test/session/.

Currently these require a [email protected] login and password to work, and test against the sciencelearn and biotechlearn hubs data.

A report should have a file in lib/analytics/report/ that grabs all the numbers and such.

And then a file in /bin/ which boots up the $display, $profile and $periods objects (probably via calls to the Startup class), and calls the main method for the report class.

So running the report should be a case of typing “ruby bin/the_report”, and then entering your date ranges, authentication details etc.

===Community usage statistics / CMIS===

    ~/analytics $ ruby bin/community_usage
    ~/analytics $ ruby bin/cmis

These reports run through a long list of communities, by profile id:

In /analytics/reports/community_usage.rb:

    self.report("14745867", "Ako Panuku") # pass in profile id as string, and name

To get the profile id string, navigate to the profile’s dashboard, and take the first integer from the URL, after id=.

Example:

https://www.google.com/analytics/reporting/?reset=1&id=16845949&pdr=20100707-20100806

The ID is “16845949”.
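If you have many dashboard URLs to go through, the same extraction can be done with a small regular expression. The helper name is hypothetical and is not part of the project.

```ruby
# Hypothetical helper: pull the digits that follow "id=" in the query string.
def profile_id_from_url(url)
  url[/[?&]id=(\d+)/, 1]  # returns nil when there is no id parameter
end

url = "https://www.google.com/analytics/reporting/?reset=1&id=16845949&pdr=20100707-20100806"
puts profile_id_from_url(url)  # 16845949
```

The result is already a string, which is the form self.report expects.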

The community_usage report outputs a csv file, for dumping into the monthly community usage report spreadsheet.

===Generic report===

    ~/analytics $ ruby bin/prototype

This is a generic report. It asks for date ranges and a profile ID, and outputs a basic set of information. This doesn’t really have any advantages over simply using the Analytics web interface to export a PDF of the dashboard, except that it provides previous and baseline percentage changes for metrics, and crunches monthly uniques for these periods too.

The report lives in lib/reports/prototype.rb, and should be reasonably self-explanatory.

Example of the report: image:Science hubs analytics.pdf

===Hubs custom reporting===

Example of hubs report:

image:Biotech-March1-June30-v1.pdf

The biotech and sciencelearn hubs have custom quarterly reports produced with:

    ~/analytics $ ruby bin/hubs

You then authenticate – you need to use the [email protected] account at this point, as it contains the appropriate custom segments.

Then enter the date range for the reporting period. Baseline and previous are generated for you based on some options.

The profile IDs for both hubs reports are hardcoded in (they are site specific because of segments and path filtering anyway).

This report makes extensive use of segments and ad hoc segment strings. See Custom_analytics_reporting#Using_segments_with_.24profile

Segment codes are laid out in the comments of each report.

Reports are @:

lib/reports/sciencelearn.rb

lib/reports/biotech.rb

===Quirks===

The Sciencelearn hubs had an upgrade at some point which changed urls from http://some_url/some_url to http://Some-Url/Some-Url.

As a result, some methods that work with pagePath etc [http://wiki.github.com/vigetlabs/garb/filtering-with-andor use OR filters in blocks like]:

    report.filters do
      contains(:pagePath, path)
    end
    
    report.filters do
      contains(:pagePath, path.downcase)  #for domains that have capitalized and non capitalized instances in their history
    end
    
    report.filters do
      contains(:pagePath, path.downcase.gsub("-", "_"))  #for changes in how spaces are handled...
    end

Note that you can’t mix in filters for metrics with filters for dimensions using OR, in the same request. So where you see filtering OR blocks like above, you can’t then add in another filter on a ‘metric’, ie pageviews, visits etc.
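The three path forms used in the filter blocks above can be generated in one place. The helper below is a hypothetical convenience, not part of the project; it only performs the same string transformations that appear in the blocks.

```ruby
# Hypothetical helper: every historical spelling of a path worth filtering on.
def path_variants(path)
  [
    path,                            # current capitalized, hyphenated form
    path.downcase,                   # older lower-case form
    path.downcase.gsub("-", "_")     # older form with underscores for spaces
  ].uniq
end

variants = path_variants("/Some-Url/Some-Url")
puts variants.inspect  # ["/Some-Url/Some-Url", "/some-url/some-url", "/some_url/some_url"]
```

The uniq call avoids sending duplicate OR blocks when a path is already lower case or has no hyphens.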

The sciencelearn and biotech learn analytics accounts also make use of a trailing slash filter, taken from here:

[http://insightr.com/blog/2009/9/3/two-google-analytics-filters-that-will-fix-problems-with-dou.html http://insightr.com/blog/2009/9/3/two-google-analytics-filters-that-will-fix-problems-with-dou.html]

There is also a filter to exclude traffic from bugs.cwa.co.nz, which escapes the normal local-network filter, and makes its way into the top 10 or so sources, due to [http://en.wikipedia.org/wiki/Work-life_balance the extra-ordinary out-of-office commitment] of the CWA Hubs team :D

===Check===

    ~/analytics $ ruby bin/check
    ~/analytics $ ruby bin/cwa_check
    ~/analytics $ ruby bin/tki_check

The Check class (with children tki_check and cwa_check) collects visits and pageview numbers for the current day and yesterday, for multiple communities, and checks for large decreases (flagging a 70%+ drop) and for 0 values for either day.
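The rule described above can be sketched as a small function. This is an illustration only: the real logic lives in lib/reports/check.rb and may differ, and the function name, flag symbols and sample numbers are assumptions.

```ruby
# Hypothetical sketch of the check rule: flag zeros and drops of 70% or more.
def check_flags(yesterday, today, drop_threshold = 0.7)
  flags = []
  flags << :zero_yesterday if yesterday == 0
  flags << :zero_today     if today == 0
  if yesterday > 0 && today > 0
    drop = (yesterday - today).to_f / yesterday
    flags << :large_drop if drop >= drop_threshold
  end
  flags
end

p check_flags(1000, 250)  # [:large_drop]  (a 75% drop)
p check_flags(1000, 900)  # []
p check_flags(1000, 0)    # [:zero_today]
```

Since the current day is still in progress when the check runs, a fixed threshold like this is only a coarse alarm.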

This is used as a regular last-ditch test to make sure a release hasn’t killed GA tracking.

Reports need to be kept up to date with the list of profiles you want checked.

Reports are @:

lib/reports/cwa_check.rb

lib/reports/tki_check.rb

Logic for the checks is in:

lib/reports/check.rb