
Commit

Merge pull request #27 from OpenRailAssociation/sbom-generate-other-generators

Support other SBOM generators
mxmehl committed Sep 20, 2024
2 parents 9fd3953 + a91e53e commit 62015f4
Showing 9 changed files with 278 additions and 86 deletions.
72 changes: 54 additions & 18 deletions .github/workflows/selftest.yaml
@@ -11,41 +11,77 @@ on:
pull_request:

jobs:
# Generate SBOM using cdxgen, but with NPMJS package, not Docker container
sbom-gen:
# Generate SBOM using syft
sbom-gen-syft:
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v4
- run: mkdir -p ~/.local/bin
- name: Install syft
run: curl -sSfL https://raw.githubusercontent.com/anchore/syft/main/install.sh | sh -s -- -b ~/.local/bin
- name: Install compliance-assistant
uses: ./.github/actions/poetrybuild
- name: Generate SBOM with syft
run: poetry run compliance-assistant sbom generate -v -g syft -d . -o ${{ runner.temp }}/sbom-syft.json
- name: Store raw SBOM as artifact
uses: actions/upload-artifact@v4
with:
name: sbom-syft
path: ${{ runner.temp }}/sbom-syft.json

# Generate SBOM using cdxgen (npm package)
sbom-gen-cdxgen:
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v4
- name: Install cdxgen
run: npm install -g @cyclonedx/cdxgen
- name: Generate CycloneDX SBOM with cdxgen
run: cdxgen -r . -o ${{ runner.temp }}/sbom-raw.json
- name: Install compliance-assistant
uses: ./.github/actions/poetrybuild
- name: Generate SBOM with cdxgen
run: poetry run compliance-assistant sbom generate -v -g cdxgen -d . -o ${{ runner.temp }}/sbom-cdxgen.json
- name: Store raw SBOM as artifact
uses: actions/upload-artifact@v4
with:
name: sbom-raw
path: ${{ runner.temp }}/sbom-raw.json
name: sbom-cdxgen
path: ${{ runner.temp }}/sbom-cdxgen.json

# Enrich the generated SBOM
sbom-enrich:
runs-on: ubuntu-22.04
needs: sbom-gen
needs: [sbom-gen-syft, sbom-gen-cdxgen]
steps:
- uses: actions/checkout@v4
- uses: ./.github/actions/poetrybuild
# Download raw SBOM
- uses: actions/download-artifact@v4
# Download raw SBOMs
- name: Download Syft SBOM artifact
uses: actions/download-artifact@v4
with:
name: sbom-syft
path: ${{ runner.temp }}
- name: Download cdxgen SBOM artifact
uses: actions/download-artifact@v4
with:
name: sbom-raw
name: sbom-cdxgen
path: ${{ runner.temp }}
# Run compliance-assistant sbom-enrich
- name: Enrich SBOM
run: poetry run compliance-assistant sbom enrich -v -f ${{ runner.temp }}/sbom-raw.json -o ${{ runner.temp }}/sbom-enriched.json
# Show and upload enriched SBOM
- name: Print SBOM content
run: cat ${{ runner.temp }}/sbom-enriched.json
- name: Store enriched SBOM as artifact
- name: Enrich Syft SBOM
run: poetry run compliance-assistant sbom enrich -v -f ${{ runner.temp }}/sbom-syft.json -o ${{ runner.temp }}/sbom-syft-enriched.json
- name: Enrich cdxgen SBOM
run: poetry run compliance-assistant sbom enrich -v -f ${{ runner.temp }}/sbom-cdxgen.json -o ${{ runner.temp }}/sbom-cdxgen-enriched.json
# Show enriched SBOMs
- name: Print enriched Syft SBOM content
run: cat ${{ runner.temp }}/sbom-syft-enriched.json
- name: Print enriched cdxgen SBOM content
run: cat ${{ runner.temp }}/sbom-cdxgen-enriched.json
# Compare licensing
- name: Print licenses as found in Syft SBOM
run: poetry run compliance-assistant licensing list -f ${{ runner.temp }}/sbom-syft-enriched.json
- name: Print licenses as found in cdxgen SBOM
run: poetry run compliance-assistant licensing list -f ${{ runner.temp }}/sbom-cdxgen-enriched.json
# Store SBOMs as artifacts
- name: Store enriched SBOMs as artifact
uses: actions/upload-artifact@v4
with:
name: sbom-enriched
path: ${{ runner.temp }}/sbom-enriched.json
name: sboms-enriched
path: ${{ runner.temp }}/sbom-*-enriched.json
43 changes: 17 additions & 26 deletions README.md
@@ -31,13 +31,16 @@ SPDX-License-Identifier: Apache-2.0
- **License and Copyright Information Retrieval**: Fetch licensing and copyright details for a single package from ClearlyDefined.
- **License compliance support**: Extract and unify licenses from SBOM, suggest possible license outbound candidates

Some of these features are made possible by excellent programs such as [flict](https://github.com/vinland-technology/flict) and [cdxgen](https://github.com/CycloneDX/cdxgen).
Some of these features are made possible by excellent programs such as [flict](https://github.com/vinland-technology/flict), [cdxgen](https://github.com/CycloneDX/cdxgen) and [syft](https://github.com/anchore/syft/).

## Requirements

- Python 3.10+
- Internet connection for accessing ClearlyDefined services
- [Docker](https://www.docker.com/) for generating SBOMs
- At least one SBOM generator:
- [syft](https://github.com/anchore/syft/)
- [cdxgen](https://github.com/CycloneDX/cdxgen)
- [Docker](https://www.docker.com/) for generating SBOMs with the dockerized cdxgen

## Installation

@@ -108,10 +111,11 @@ For each command, you can get detailed options, e.g., `compliance-assistant sbom

### Examples

* Create an SBOM for the current directory: `compliance-assistant sbom generate -d .`
* Create an SBOM for the current directory using [syft](https://github.com/anchore/syft/): `compliance-assistant sbom generate -g syft -d . -o /tmp/my-sbom.json`
* Enrich an SBOM with ClearlyDefined data: `compliance-assistant sbom enrich -f /tmp/my-sbom.json -o /tmp/my-enriched-sbom.json`
* Extract certain data from an SBOM: `compliance-assistant sbom parse -f /tmp/my-enriched-sbom.json -e purl,copyright,name`
* Gather ClearlyDefined licensing/copyright information for one package: `compliance-assistant clearlydefined fetch -p pkg:pypi/[email protected]`
* Get all licenses found in the enriched SBOM: `compliance-assistant licensing list -f /tmp/my-enriched-sbom.json -o plain`
* Get license outbound candidate based on licenses from SBOM: `compliance-assistant licensing outbound -f /tmp/my-enriched-sbom.json`

### Run as GitHub workflow
@@ -126,23 +130,8 @@ on:
types: [published]

jobs:
# Generate raw SBOM using cdxgen, but with NPMJS package, not Docker container
sbom-gen:
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v4
- name: Install cdxgen
run: npm install -g @cyclonedx/cdxgen
- name: Generate CycloneDX SBOM with cdxgen
run: cdxgen -r . -o ${{ runner.temp }}/sbom-raw.json
- name: Store raw SBOM as artifact
uses: actions/upload-artifact@v4
with:
name: sbom-raw
path: ${{ runner.temp }}/sbom-raw.json

# Enrich the generated SBOM
sbom-enrich:
# Generate the SBOM with syft and enrich the generated SBOM
sbom-generate-and-enrich:
runs-on: ubuntu-22.04
needs: sbom-gen
steps:
@@ -154,12 +143,14 @@ jobs:
cache: "pip"
- name: Install compliance-assistant
run: pip install compliance-assistant
# Download raw SBOM
- uses: actions/download-artifact@v4
with:
name: sbom-raw
path: ${{ runner.temp }}
# Run compliance-assistant sbom-enrich
# Install syft
- run: mkdir -p ~/.local/bin
- name: Install syft
run: curl -sSfL https://raw.githubusercontent.com/anchore/syft/main/install.sh | sh -s -- -b ~/.local/bin
# Generate SBOM with syft via compliance-assistant
- name: Generate SBOM with syft
run: poetry run compliance-assistant sbom generate -g syft -d . -o ${{ runner.temp }}/sbom-raw.json
# Enrich SBOM with compliance-assistant
- name: Enrich SBOM
run: compliance-assistant sbom enrich -f ${{ runner.temp }}/sbom-raw.json -o ${{ runner.temp }}/sbom-enriched.json
# Upload enriched SBOM as artifact
88 changes: 60 additions & 28 deletions complassist/_clearlydefined.py
@@ -64,7 +64,11 @@ def purl_to_cd_coordinates(purl: str) -> str:
}
coordinates["provider"] = replacer(coordinates["type"], type_to_provider)

return "/".join([v for _, v in coordinates.items()])
coordinates_string = "/".join([v for _, v in coordinates.items()])

logging.debug("Converted '%s' to '%s'", purl, coordinates_string)

return coordinates_string


def _cdapi_call(
@@ -74,7 +78,7 @@ def _cdapi_call(
basepath: str = "definitions",
json_dict: dict | list | None = None,
**params: str,
) -> dict:
) -> dict | None:
"""
Makes a request to the ClearlyDefined API.
@@ -111,12 +115,19 @@ def _cdapi_call(
# Return JSON response if possible
try:
return result.json()
except JSONDecodeError:
except (JSONDecodeError, AttributeError):
logging.debug("JSON return is no valid JSON")
return {"result": result.text}
except AttributeError:
logging.warning("API call did not return a valid response. No ClearlyDefined returned")
return {"result": "error"}
if basepath != "harvest":
try:
error_msg = result.content.decode("UTF-8")
except: # pylint: disable=bare-except
error_msg = result.content
logging.warning(
"Unexpected JSON decoding error as result from %s: %s",
url,
error_msg,
)
return None


def _extract_license_copyright(cd_api_response: dict) -> tuple[str, str]:
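
Note on the `_cdapi_call` hunk above: the function now returns `None` instead of a placeholder dict when the response body cannot be decoded as JSON, so callers can test the result for truthiness. The following is a minimal sketch of that pattern, not the project's code. `FakeResponse` and `parse_response` are hypothetical names invented for illustration; the real function works on an HTTP response object and additionally skips the warning for the `harvest` base path.

```python
from __future__ import annotations

import json
import logging


class FakeResponse:
    """Hypothetical stand-in for an HTTP response object (illustration only)."""

    def __init__(self, content: bytes) -> None:
        self.content = content

    def json(self) -> dict:
        # Raises json.JSONDecodeError when the body is not valid JSON
        return json.loads(self.content.decode("utf-8"))


def parse_response(result, url: str) -> dict | None:
    """Return the decoded JSON body, or None if it cannot be decoded."""
    try:
        return result.json()
    except (json.JSONDecodeError, AttributeError):
        # AttributeError covers the case where `result` is None
        body = getattr(result, "content", b"")
        logging.warning("Unexpected JSON decoding error as result from %s: %r", url, body)
        return None
```

Returning `None` rather than a dict such as `{"result": "error"}` means a failed call can no longer be mistaken for real API data further down the call chain.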
@@ -204,53 +215,74 @@ def get_clearlydefined_license_and_copyright(coordinates: str) -> tuple[str, str
"""
api_return = _cdapi_call(coordinates, expand="-files")

declared_license, copyrights = _extract_license_copyright(api_return)
if api_return:
declared_license, copyrights = _extract_license_copyright(api_return)

# Declared license couldn't be extracted. Add to harvest
if not declared_license:
_handle_missing_license_and_request_harvest(coordinates)
# Declared license couldn't be extracted. Add to harvest
if not declared_license:
_handle_missing_license_and_request_harvest(coordinates)

return declared_license, copyrights
return declared_license, copyrights

# If no valid API result, return empty license and copyright
return "", ""


def get_clearlydefined_license_and_copyright_in_batches(
purls: list[str],
) -> dict[str, tuple[str, str]]:
"""
Retrieves the declared license for multiple purls from ClearlyDefined.
Retrieves the declared license and detected copyright for multiple Package
URLs from ClearlyDefined.
Queries the ClearlyDefined API to get the declared license for the provided
packages via Package URLs. If no license is found, it initiates a
harvest request.
Queries the ClearlyDefined API to retrieve both the declared license and the
detected copyright attributions for multiple packages specified via Package
URLs. If no declared license is found for a package, a harvest request is
initiated.
Args:
coordinates (str): The ClearlyDefined coordinates or Package URL for
which to retrieve the license.
purls (list[str]): A list of Package URLs (purls) for which to retrieve
the license and copyright information.
Returns:
tuple[str, str]: A tuple containing:
- The declared license as a string, or an empty string if not found.
- The detected copyright attributions as a single string, with each
attribution separated by a newline, or an empty string if not
attribution separated by a newline, or an empty string if none are
found.
Returns a dict of the provided purls and empty tuples if the
ClearlyDefined API did not return valid data.
"""
# Create connections between coordinates <-> purl
coordinates_purls = {purl_to_cd_coordinates(purl): purl for purl in purls}
# Request the CD API for the coordinates
api_return = _cdapi_call(
path="", method="POST", json_dict=list(coordinates_purls.keys()), expand="-files"
)

result: dict[str, tuple[str, str]] = {}
for pkg_coordinates, cd_data in api_return.items():
pkg_purl = coordinates_purls[pkg_coordinates]
declared_license, copyrights = _extract_license_copyright(cd_data)
if api_return:
result: dict[str, tuple[str, str]] = {}
for pkg_coordinates, cd_data in api_return.items():
# Fetch the corresponding PURL for the coordinates
pkg_purl = coordinates_purls[pkg_coordinates]

# Declared license couldn't be extracted. Add to harvest
if not declared_license:
_handle_missing_license_and_request_harvest(pkg_coordinates)
# Extract license and copyright data from the CD API return
declared_license, copyrights = _extract_license_copyright(cd_data)

# Declared license couldn't be extracted. Add to harvest
if not declared_license:
_handle_missing_license_and_request_harvest(pkg_coordinates)

result[pkg_purl] = (declared_license, copyrights)
result[pkg_purl] = (declared_license, copyrights)

return result
return result

logging.warning(
"No valid data from ClearlyDefined received for the following packages: %s",
", ".join(purls),
)
return {purl: ("", "") for purl in purls}


def print_clearlydefined_result(results: tuple[str, str]) -> None:
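
Note on the batch lookup changed in this file: when the API call yields no usable data, the function now returns an entry with empty strings for every requested purl instead of failing on `None.items()`. A simplified sketch of that fallback follows. The function name `licenses_in_batches`, the injected `api_call` parameter, and the `license`/`copyright` keys of the fake response are assumptions made for illustration; the real code converts purls to ClearlyDefined coordinates first and extracts the data with `_extract_license_copyright`.

```python
from __future__ import annotations

from typing import Callable


def licenses_in_batches(
    purls: list[str],
    api_call: Callable[[list[str]], dict | None],
) -> dict[str, tuple[str, str]]:
    """Map each purl to (license, copyright); empty strings if the API gave nothing."""
    api_return = api_call(purls)
    if api_return:
        return {
            purl: (data.get("license", ""), data.get("copyright", ""))
            for purl, data in api_return.items()
        }
    # No valid API data: every requested purl still gets an entry
    return {purl: ("", "") for purl in purls}
```

With this shape, code that later indexes the result by purl does not need a separate error path for a failed API request.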
1 change: 0 additions & 1 deletion complassist/_helpers.py
@@ -36,7 +36,6 @@ def replacer(string: str, replacement_dict: dict) -> str:
"""
if string in replacement_dict:
replacement = replacement_dict.get(string, "")
logging.debug("Replace '%s' by '%s'", string, replacement)
return replacement

return string
9 changes: 6 additions & 3 deletions complassist/_licensing.py
@@ -19,7 +19,7 @@
def _extract_license_expression_and_names_from_sbom(
sbom_path: str, flict_simplify: bool = False
) -> tuple[list[str], list[str]]:
"""Exract all SPDX expressions and license names from an SBOM"""
"""Extract all SPDX expressions and license names from an SBOM"""
lic_expressions = []
lic_names = []

@@ -32,11 +32,14 @@ def _extract_license_expression_and_names_from_sbom(
if lic_expression := entry.get("expression", ""):
lic_expressions.append(lic_expression)
# Use license name instead
else:
lic_dict: dict = entry.get("license", {})
elif lic_dict := entry.get("license", {}):
if lic_name := lic_dict.get("name", ""):
lic_names.append(lic_name)

# No license found. Warn user
if not licenses_short:
logging.info("No licensing data found for %s (%s)", item.get("name"), item.get("purl"))

# Make expressions and names unique, and sort them
expressions = sorted(list(set(lic_expressions)))
# If using flict, simplify these found licenses. Will reduce possible
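
Note on the `_licensing.py` change: replacing `else:` with `elif lic_dict := entry.get("license", {}):` means the name lookup only runs when a `license` object is actually present. The sketch below shows the same branching on CycloneDX-style component data held in memory. `collect_licenses` is a hypothetical helper, not the project's function, which reads the SBOM from a file and can optionally simplify the expressions with flict.

```python
import logging


def collect_licenses(components: list[dict]) -> tuple[list[str], list[str]]:
    """Return sorted, de-duplicated SPDX expressions and license names."""
    expressions: list[str] = []
    names: list[str] = []
    for item in components:
        licenses = item.get("licenses", [])
        for entry in licenses:
            # Prefer an SPDX expression if the entry has one
            if expression := entry.get("expression", ""):
                expressions.append(expression)
            # Otherwise fall back to the license name, if any
            elif lic_dict := entry.get("license", {}):
                if name := lic_dict.get("name", ""):
                    names.append(name)
        # No license entries at all: tell the user which component is affected
        if not licenses:
            logging.info(
                "No licensing data found for %s (%s)", item.get("name"), item.get("purl")
            )
    return sorted(set(expressions)), sorted(set(names))
```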
2 changes: 1 addition & 1 deletion complassist/_logging.py
@@ -16,7 +16,7 @@ def configure_logger(args) -> logging.Logger:
level=logging.INFO,
)
# Adapt logging level
if args.verbose:
if getattr(args, "verbose", False):
log.setLevel("DEBUG")
# Activate extreme logging for requests to also get POST data
if hasattr(args, "http_debug") and args.http_debug:
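
Note on the `_logging.py` change: `getattr(args, "verbose", False)` tolerates argument namespaces that have no `verbose` attribute, which `args.verbose` would reject with an `AttributeError`. Presumably this matters for subcommands that do not define the flag, though the commit does not say so. A small sketch of the behaviour, using a hypothetical `configure_level` helper and logger name:

```python
import argparse
import logging


def configure_level(args: argparse.Namespace) -> int:
    """Return DEBUG when args has a truthy `verbose`, else INFO."""
    log = logging.getLogger("sketch")
    log.setLevel(logging.INFO)
    # getattr with a default avoids AttributeError if `verbose` was never defined
    if getattr(args, "verbose", False):
        log.setLevel(logging.DEBUG)
    return log.level
```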