Skip to content

Writing your own UI for Facade

Brian Warner edited this page Mar 20, 2018 · 1 revision

Facade is designed so that the UI remains separate from the analysis engine. This is because the original author is admittedly terrible at UI work, and cobbled something together using a quick and easy language, with the hopes that somebody, anybody, could do better than him.

Anyhow, from a logistical standpoint it should be pretty straightforward to introduce an alternate UI. At a high level, the separation of duties between frontend and backend are like this:

Frontend:

  • Add new projects
  • Add new repos associated with those projects
  • Add/edit mappings between domains/emails and affiliations
  • Add/edit mappings between primary emails and aliases
  • View results
  • Download a CSV of the results
  • Edit, export, or input settings

Backend:

  • All of the initial setup and configuration
  • All of the repository cloning, updating, and removal
  • All of the analysis
  • All of the caching
  • Creating an optional Excel file with custom contents

Basically, so long as you can feed new projects and repos into the database and edit the aliases and affiliations, the backend can do the rest. The specifics are documented below.

Mandatory stuff to feed the analysis engine

Adding a project

The projects table looks like this:

+---------------+------------------+------+-----+----------------------+--------------------------------+
| Field         | Type             | Null | Key | Default              | Extra                          |
+---------------+------------------+------+-----+----------------------+--------------------------------+
| id            | int(10) unsigned | NO   | PRI | NULL                 | auto_increment                 |
| name          | varchar(128)     | NO   |     | NULL                 |                                |
| description   | varchar(256)     | YES  |     | NULL                 |                                |
| website       | varchar(128)     | YES  |     | NULL                 |                                |
| recache       | tinyint(1)       | YES  |     | 1                    |                                |
| last_modified | timestamp(6)     | NO   |     | CURRENT_TIMESTAMP(6) | on update CURRENT_TIMESTAMP(6) |
+---------------+------------------+------+-----+----------------------+--------------------------------+

To create a new project, add the following:

  • name - The name of the project

You can optionally add description and/or website, but these are just cosmetic. Don't touch id, recache, or last_modified.

Removing a project

The frontend is responsible for some cleanup when removing a project, although facade-worker.py does the hard work. Here are the actions you should take to delete a project, in order:

  1. For each repo in repos where projects_id matches the project you're deleting:
    • If the repo's status is still '(New)' then there aren't any files on disk. You can just delete the entry from repos and move on.
    • If the repo's status is anything other than '(New)' then there may be files to clean up. Change status to 'Delete' in the repos table, and facade-worker.py will do the rest the next time it runs.
  2. Delete entries from exclude where projects_id matches the project you're deleting.
  3. Set the project name to '(Queued for removal)' in projects.

Next time facade-worker.py runs, it will

Adding a repo

The repos table looks like this:

+-------------+------------------+------+-----+----------------------+----------------+
| Field       | Type             | Null | Key | Default              | Extra          |
+-------------+------------------+------+-----+----------------------+----------------+
| id          | int(10) unsigned | NO   | PRI | NULL                 | auto_increment |
| projects_id | int(10) unsigned | NO   |     | NULL                 |                |
| git         | varchar(256)     | NO   |     | NULL                 |                |
| path        | varchar(256)     | YES  |     | NULL                 |                |
| name        | varchar(256)     | YES  |     | NULL                 |                |
| added       | timestamp(6)     | NO   |     | CURRENT_TIMESTAMP(6) |                |
| status      | varchar(32)      | NO   |     | NULL                 |                |
+-------------+------------------+------+-----+----------------------+----------------+

To associate a repo with a project, add the following:

  • projects_id - Must match an id in the projects table
  • git - Must be the full url, e.g. git://... or https://...
  • status - Set this to 'New'

Don't touch id, path, name, or added.

Deleting a repository

The frontend can sometimes handle cleanup when removing a repo, so long as it hasn't been cloned yet. If it has been cloned, facade-worker.py will clear it the next time it runs. Here are the actions you should take to delete a repo, in order:

  1. If the repo's status is still '(New)' then there aren't any files on disk. You can just delete the entry from repos and move on.
  2. If the repo's status is anything other than '(New)' then there may be files to clean up. Change status to 'Delete' in the repos table, and facade-worker.py will do the rest the next time it runs.

Configuration

The configuration table looks like this:

+---------------+------------------+------+-----+----------------------+--------------------------------+
| Field         | Type             | Null | Key | Default              | Extra                          |
+---------------+------------------+------+-----+----------------------+--------------------------------+
| id            | int(10) unsigned | NO   | PRI | NULL                 | auto_increment                 |
| setting       | varchar(32)      | NO   |     | NULL                 |                                |
| value         | varchar(128)     | NO   |     | NULL                 |                                |
| last_modified | timestamp(6)     | NO   |     | CURRENT_TIMESTAMP(6) | on update CURRENT_TIMESTAMP(6) |
+---------------+------------------+------+-----+----------------------+--------------------------------

To add a new setting, add the following:

  • setting - The name of the setting
  • value - The value of the setting

Don't touch id or last_modified. Facade works by taking the last-modified version of a given setting, so to modify a setting, just write a new one. This improves auditability, and ensures you can roll back to a previous setting if you messed something up.

The initial values of all settings are set by utilities/setup.py, or by facade-worker.py during a database upgrade. Currently the settings are:

  • start_date - The earliest date to look for data in the git log.
  • repo_directory - Where cloned repos should be stored.
  • utility_status - Used by facade-worker.py when running. Don't touch this.
  • log_level - Logging verbosity. Options are: ('Error','Quiet','Info','Verbose','Debug')
  • report_date - Whether to use author_date or committer_date when caching reports. Options are: ('author','committer')
  • report_attribution - Whether to use author_email or committer_email when caching reports. Options are: ('author','committer')
  • working_author - Used by facade-worker.py when running. Don't touch this.
  • affiliations_processed - Used by facade-worker.py to determine which affiliations are new. Don't touch this.
  • aliases_processed - Used by facade-worker.py to determine which aliases are new. Don't touch this.
  • google_analytics - A central place to store a Google Analytics ID for use on a web-facing UI.
  • update_frequency - How often Facade will try to pull any given repo. This allows facade-worker.py to run frequently without thrashing the remotes. Options are: (4,12,24)
  • database_version - Used by facade-worker.py to determine if a database update is pending. Don't touch this. It is incremented when facade-worker.py processes a database upgrade.
  • results_visibility - Whether to show the Results tab on the web UI. Generating huge CSVs is time and bandwidth intensive, so you may want to disable it on a public-facing site.

Getting results

You have a few options to get results.

Dumping the cache tables

Dumping the caches will be the fastest way to get consolidated data, and will result in the smallest files. It will have the least extraneous data. For example, the contents of the project_annual_cache will be organized like this:

+-------------+----------------------+------+-----+---------+-------+
| Field       | Type                 | Null | Key | Default | Extra |
+-------------+----------------------+------+-----+---------+-------+
| projects_id | int(10) unsigned     | NO   | MUL | NULL    |       |
| email       | varchar(128)         | NO   |     | NULL    |       |
| affiliation | varchar(128)         | YES  |     | NULL    |       |
| year        | smallint(5) unsigned | NO   |     | NULL    |       |
| added       | bigint(20) unsigned  | NO   |     | NULL    |       |
| removed     | bigint(20) unsigned  | NO   |     | NULL    |       |
| whitespace  | bigint(20) unsigned  | NO   |     | NULL    |       |
| files       | bigint(20) unsigned  | NO   |     | NULL    |       |
| patches     | bigint(20) unsigned  | NO   |     | NULL    |       |
+-------------+----------------------+------+-----+---------+-------+

It should go without saying, but don't edit the cache tables directly. They get overwritten each time facade-worker.py finds new data.

Dumping the analysis_data table

You can also dump the contents of analysis_data, but it's a bit more complicated. A sample export statement is given at the bottom of the page, including JOIN statements that provide the name of the project, the git repo, and which honor any defined excludes.

Please note that exports can take a long time, and you may want to use LIMIT statements to grab chunks of data at a time. This is particularly true for things like PHP that have short default timeouts for long-running scripts.

Optional stuff to improve accuracy

Adding affiliations

The affiliations table looks like this:

+---------------+------------------+------+-----+----------------------+--------------------------------+
| Field         | Type             | Null | Key | Default              | Extra                          |
+---------------+------------------+------+-----+----------------------+--------------------------------+
| id            | int(10) unsigned | NO   | PRI | NULL                 | auto_increment                 |
| domain        | varchar(64)      | NO   | MUL | NULL                 |                                |
| affiliation   | varchar(64)      | NO   |     | NULL                 |                                |
| start_date    | date             | NO   |     | 1970-01-01           |                                |
| active        | tinyint(1)       | NO   |     | 1                    |                                |
| last_modified | timestamp(6)     | NO   |     | CURRENT_TIMESTAMP(6) | on update CURRENT_TIMESTAMP(6) |
+---------------+------------------+------+-----+----------------------+--------------------------------+

To add or modify an affiliation, add or change the following:

  • domain - Can be either a domain or an email address
  • affiliation - This is what will be displayed in the analysis data

You can also define start_date as the time when the affiliation became active - for example, when someone uses the same email address but changes employers, or when a company is bought and the domain changes hands. In this case, you'd have multiple entries in the affiliations file:

This will result in all matched contributions prior to 2011-01-06 being attributed to IBM, all contributions between 2011-01-06 and 2015-07-05 being attributed to The Linux Foundation, and everything after 2015-07-05 being attributed to Samsung.

Don't touch id, active, or last_modified. Simply changing an entry in this table will cause the analysis engine to detect something is different, and make the right updates to the analysis data.

Adding aliases

Aliases allow you to group multiple emails under a single user, for example when somebody changes companies and you want a single view of their contributions, or if somebody fat-fingers their typical email in a commit. The aliases table looks like this:

+---------------+------------------+------+-----+----------------------+--------------------------------+
| Field         | Type             | Null | Key | Default              | Extra                          |
+---------------+------------------+------+-----+----------------------+--------------------------------+
| id            | int(10) unsigned | NO   | PRI | NULL                 | auto_increment                 |
| canonical     | varchar(128)     | NO   | MUL | NULL                 |                                |
| alias         | varchar(128)     | NO   | MUL | NULL                 |                                |
| active        | tinyint(1)       | NO   |     | 1                    |                                |
| last_modified | timestamp(6)     | NO   |     | CURRENT_TIMESTAMP(6) | on update CURRENT_TIMESTAMP(6) |
+---------------+------------------+------+-----+----------------------+--------------------------------+

To add or modify an alias, add the following:

  • canonical - The real email address
  • alias - The alternate email address you wish to mask

Don't touch id, active, or last_modified. Simply changing an entry in this table will cause the analysis engine to detect something is different, and make the right updates to the analysis data.

Removing affiliations and aliases

This is a little less obvious. Because changes are autodetected based upon the last_modified value in this table, you can't just delete stuff. Instead, you need to set active to false. This causes the analysis engine to recognize that an affiliation or alias is no longer in use, and will deassociate that data in the analysis data.

Grouping people into teams

Adding tags

Tags are a way of creating persistent mappings between individuals, so that you can quickly and easily see a subset of contributors within an organization. For example, you would use this if you want to distinguish contributions made by your own team against those of your company as a whole, and don't want to manually track them down each time you export the results. The special_tags table looks like this:

+------------+------------------+------+-----+---------+----------------+
| Field      | Type             | Null | Key | Default | Extra          |
+------------+------------------+------+-----+---------+----------------+
| id         | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| email      | varchar(128)     | NO   | MUL | NULL    |                |
| start_date | date             | NO   |     | NULL    |                |
| end_date   | date             | YES  |     | NULL    |                |
| tag        | varchar(64)      | NO   |     | NULL    |                |
+------------+------------------+------+-----+---------+----------------+

To add a new tag, add the following:

  • email - The email of the person you want to add
  • start_date - The date at which the person should be considered on the team
  • tag - The tag you want to use to identify them

You an optionally add an end_date if the person leaves the team. A person can be covered by multiple tags simultaneously. Please note that tags are ephemeral and only used when data is exported. They are not stored with the analysis data, so they must be handled upon export. The best way to do this is determine, for each line of the exported data, if a tag matches the email address and dates and, if so, to write it.

Tags are not stored in a central place, so if you want to remove a tag just delete all of its instances in the special_tags table.

Excluding emails and domains from the analysis

Sometimes there are broad categories of data you know you don't want. For example, if you have a bot which is doing janitorial work, or if you have a large company dominating a project and only want to see stats for everybody else. This is handled by the exclude table:

+-------------+------------------+------+-----+---------+----------------+
| Field       | Type             | Null | Key | Default | Extra          |
+-------------+------------------+------+-----+---------+----------------+
| id          | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| projects_id | int(10) unsigned | NO   |     | NULL    |                |
| email       | varchar(128)     | YES  |     | NULL    |                |
| domain      | varchar(128)     | YES  |     | NULL    |                |
+-------------+------------------+------+-----+---------+----------------+

To add a new exclude rule, add the following:

  • projects_id - The project to which the rule should apply, from the id field in projects
  • Either an email or domain to be excluded, but you don't need both

Don't touch id.

The exclude table is used when caching results for display. It should also be used when exporting data. This is done using a complicated JOIN statement. Here is an example of an export for the project with id = 1. The exclude statement is marked with arrows:

SELECT p.name AS 'Project name',
        r.path AS 'Repo Path',
        r.name AS 'Repo Name',
        a.author_date AS 'Author Date',
        a.author_name AS 'Author Name',
        a.author_raw_email AS 'Author Raw Email',
        a.author_email AS 'Author Email',
        a.author_affiliation AS 'Author Affiliation',
        a.committer_date AS 'Committer Date',
        a.committer_name AS 'Committer Name',
        a.committer_raw_email AS 'Committer Raw Email',
        a.committer_email AS 'Committer Email',
        a.committer_affiliation AS 'Committer Affiliation',
        a.added AS 'LoC Added',
        a.removed AS 'LoC Removed',
        a.whitespace AS 'Whitespace changes',
        a.commit AS 'Commit',
        a.filename AS 'Filename'
        FROM projects p
        RIGHT JOIN repos r ON p.id = r.projects_id
        RIGHT JOIN analysis_data a ON r.id = a.repos_id
-->     LEFT JOIN exclude e ON (a.author_email = e.email
-->         AND (r.projects_id = e.projects_id
-->         OR e.projects_id = 0))
-->     OR (a.author_email LIKE CONCAT('%',e.domain)
-->         AND (r.projects_id = e.projects_id
-->         OR e.projects_id = 0))
        WHERE p.name != '(Queued for removal)' AND
        r.projects_id = 1
        AND e.email IS NULL
        AND e.domain IS NULL
        ORDER BY a.committer_date ASC