Portage CD



Portage CD is a secure continuous delivery pipeline that orchestrates building an application image and scanning it for security vulnerabilities. It removes the need to hand-configure a hardened, predefined security pipeline in a traditional CI/CD system. Portage CD can be statically compiled into a single binary and run on virtually any platform, in a CI/CD environment, or locally.

Getting Started

Install Prerequisites:

  • Container Engine
  • Docker or Podman CLI
  • Golang >= v1.22.0
  • Just (optional)

Prerequisite Tools (for running Portage locally)

  • Gitleaks
  • Semgrep
  • Syft
  • Grype
  • ClamAV (only the clamscan and freshclam cli utilities are needed)
  • ORAS

Compiling Portage CD

Running the just recipe puts the compiled binary in ./bin:

just build

Or compile manually:

git clone <this-repo> <target-dir>
cd <target-dir>
mkdir bin
go build -o bin/portage ./cmd/portage

Optionally, to embed version metadata (displayed when you run portage version), pass the following linker flags:

go build -ldflags="-X 'main.cliVersion=$(git describe --tags)' -X 'main.gitCommit=$(git rev-parse HEAD)' -X 'main.buildDate=$(date -u +%Y-%m-%dT%H:%M:%SZ)' -X 'main.gitDescription=$(git log -1 --pretty=%B)'" -o ./bin/ ./cmd/portage
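The -X flags above only work on package-level string variables. A minimal sketch of how the main package might declare them, assuming the variable names used in the ldflags command; the versionInfo helper and its output format are hypothetical, not the actual portage version output:

```go
package main

import "fmt"

// Empty by default; overwritten at link time with -ldflags "-X 'main.<name>=<value>'".
// -X can only set package-level string variables, hence these declarations.
var (
	cliVersion     string
	gitCommit      string
	buildDate      string
	gitDescription string
)

// versionInfo is a hypothetical formatter that substitutes "unknown" for any
// value that was not injected at build time.
func versionInfo() string {
	orUnknown := func(s string) string {
		if s == "" {
			return "unknown"
		}
		return s
	}
	return fmt.Sprintf("version: %s\ncommit: %s\nbuilt: %s\ndescription: %s",
		orUnknown(cliVersion), orUnknown(gitCommit), orUnknown(buildDate), orUnknown(gitDescription))
}

func main() {
	fmt.Println(versionInfo())
}
```

A plain go build leaves every field as "unknown", which makes a binary built without the flags easy to spot.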

Running A Pipeline

You can run the executable directly:

portage run debug

Configuring a Pipeline

Configuration Options:

  • Configuration via CLI flags
  • Environment Variables
  • Config File in JSON
  • Config File in YAML
  • Config File in TOML

Configuration Order-of-Precedence:

  1. CLI Flag
  2. Environment Variable
  3. Config File Value
  4. Default Value

Note: - means unset (left blank)

| Config Key | Environment Variable | Default Value | Description |
|---|---|---|---|
| codescan.enabled | PORTAGE_CODE_SCAN_ENABLED | 1 | Enable/Disable the code scan pipeline |
| codescan.gitleaksfilename | PORTAGE_CODE_SCAN_GITLEAKS_FILENAME | gitleaks-secrets-report.json | The filename for the gitleaks secret report - must contain 'gitleaks' |
| codescan.gitleakssrcdir | PORTAGE_CODE_SCAN_GITLEAKS_SRC_DIR | . | The target directory for the gitleaks scan |
| codescan.semgrepfilename | PORTAGE_CODE_SCAN_SEMGREP_FILENAME | semgrep-sast-report.json | The filename for the semgrep SAST report - must contain 'semgrep' |
| codescan.semgreprules | PORTAGE_CODE_SCAN_SEMGREP_RULES | p/default | Semgrep ruleset manual override |
| codescan.semgrepexperimental | PORTAGE_CODE_SCAN_SEMGREP_EXPERIMENTAL | false | Enable the use of the semgrep experimental CLI |
| deploy.enabled | PORTAGE_IMAGE_PUBLISH_ENABLED | 1 | Enable/Disable the deploy pipeline |
| deploy.gatecheckconfigfilename | PORTAGE_DEPLOY_GATECHECK_CONFIG_FILENAME | - | The filename for the gatecheck config |
| gatecheckbundlefilename | PORTAGE_GATECHECK_BUNDLE_FILENAME | artifacts/gatecheck-bundle.tar.gz | The filename for the gatecheck bundle, a validatable archive of security artifacts |
| imagebuild.args | PORTAGE_IMAGE_BUILD_ARGS | - | Comma-separated list of build-time variables |
| imagebuild.builddir | PORTAGE_IMAGE_BUILD_DIR | . | The build directory to use during an image build |
| imagebuild.cachefrom | PORTAGE_IMAGE_BUILD_CACHE_FROM | - | External cache sources (e.g., "user/app:cache", "type=local,src=path/to/dir") |
| imagebuild.cacheto | PORTAGE_IMAGE_BUILD_CACHE_TO | - | Cache export destinations (e.g., "user/app:cache", "type=local,dest=path/to/dir") |
| imagebuild.dockerfile | PORTAGE_IMAGE_BUILD_DOCKERFILE | Dockerfile | The Dockerfile/Containerfile to use during an image build |
| imagebuild.enabled | PORTAGE_IMAGE_BUILD_ENABLED | 1 | Enable/Disable the image build pipeline |
| imagebuild.platform | PORTAGE_IMAGE_BUILD_PLATFORM | - | The target platform for the build (e.g., linux/amd64) |
| imagebuild.squashlayers | PORTAGE_IMAGE_BUILD_SQUASH_LAYERS | 0 | Squash image layers - only supported with the Podman CLI |
| imagebuild.target | PORTAGE_IMAGE_BUILD_TARGET | - | The target build stage to build |
| imagepublish.bundletag | PORTAGE_IMAGE_PUBLISH_BUNDLE_TAG | - | The full image tag for the target gatecheck bundle image blob |
| imagepublish.enabled | PORTAGE_IMAGE_PUBLISH_ENABLED | 1 | Enable/Disable the image publish pipeline |
| imagescan.clamavfilename | PORTAGE_IMAGE_SCAN_CLAMAV_FILENAME | clamav-virus-report.txt | The filename for the clamscan virus report - must contain 'clamav' |
| imagescan.enabled | PORTAGE_IMAGE_SCAN_ENABLED | 1 | Enable/Disable the image scan pipeline |
| imagescan.grypeconfigfilename | PORTAGE_IMAGE_SCAN_GRYPE_CONFIG_FILENAME | - | The config filename for the grype vulnerability report |
| imagescan.grypefilename | PORTAGE_IMAGE_SCAN_GRYPE_FILENAME | grype-vulnerability-report-full.json | The filename for the grype vulnerability report - must contain 'grype' |
| imagescan.syftfilename | PORTAGE_IMAGE_SCAN_SYFT_FILENAME | syft-sbom-report.json | The filename for the syft SBOM report - must contain 'syft' |
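Several report filenames must contain the name of the tool that produced them. A minimal sketch of such a check, assuming a hypothetical validateReportFilename helper keyed by the config keys in the table; Portage's actual validation may be implemented differently:

```go
package main

import (
	"fmt"
	"strings"
)

// requiredSubstring maps report config keys to the tool name their filename
// must contain, following the "must contain" notes in the table.
var requiredSubstring = map[string]string{
	"codescan.gitleaksfilename": "gitleaks",
	"codescan.semgrepfilename":  "semgrep",
	"imagescan.clamavfilename":  "clamav",
	"imagescan.grypefilename":   "grype",
	"imagescan.syftfilename":    "syft",
}

// validateReportFilename (hypothetical) rejects a filename that lacks the
// required tool name. Keys without a naming rule are always accepted.
func validateReportFilename(key, filename string) error {
	want, ok := requiredSubstring[key]
	if !ok || strings.Contains(filename, want) {
		return nil
	}
	return fmt.Errorf("%s: filename %q must contain %q", key, filename, want)
}

func main() {
	fmt.Println(validateReportFilename("imagescan.grypefilename", "grype-vulnerability-report-full.json")) // <nil>
	fmt.Println(validateReportFilename("imagescan.syftfilename", "sbom.json"))
	// imagescan.syftfilename: filename "sbom.json" must contain "syft"
}
```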

Running in Docker

When portage runs in a Docker container, some pipelines need to run docker commands. For the docker CLI inside the portage container to reach the Docker daemon on the host machine, you must either mount /var/run/docker.sock into the portage container or configure remote access to the daemon with the DOCKER_HOST environment variable.

If you can't pull the Omnibus base image from Artifactory, you can build it manually from images/omnibus/Dockerfile.

Using /var/run/docker.sock

This approach assumes you have the docker daemon running on your host machine.

Example:

docker run -it --rm \
  `# Mount your Dockerfile and supporting files in the working directory: /app` \
  -v "$(pwd):/app:ro" \
  `# Mount docker.sock for use by the docker CLI running inside the container` \
  -v "/var/run/docker.sock:/var/run/docker.sock" \
  `# Run the portage container with the desired arguments` \
  portage run image-build

Using a Remote Daemon

For more information see the Docker CLI and Docker Daemon documentation pages.

Using Podman in Docker

In addition to Docker, images can also be built with Podman. When running Podman in Docker, you must either launch the container in privileged mode or run as the podman user:

docker run --user podman -it --rm \
  `# Mount your Dockerfile and supporting files in the working directory: /app` \
  -v "$(pwd):/app:ro" \
  `# Run the portage container with the desired arguments` \
  portage run image-build -i podman

If root access is needed, the easiest solution for using podman inside a docker container is to run the container in "privileged" mode:

docker run -it --rm \
  `# Mount your Dockerfile and supporting files in the working directory: /app` \
  -v "$(pwd):/app:ro" \
  `# Run the container in privileged mode so that podman is fully functional` \
  --privileged \
  `# Run the portage container with the desired arguments` \
  portage run image-build -i podman

Using Podman in Podman

To run the portage container using podman the process is quite similar, but there are a few additional security options required:

podman run --user podman  -it --rm \
  `# Mount your Dockerfile and supporting files in the working directory: /app` \
  -v "$(pwd):/app:ro" \
  `# Run the container with additional security options so that podman is fully functional` \
  --security-opt label=disable --device /dev/fuse \
  `# Run the portage container with the desired arguments` \
  portage run image-build -i podman

Getting Started

Required Tools

The following tools are required for building and running Portage:

Go

Portage CD is written in Go.

To install on a Mac, install using Homebrew:

brew install go

Optional: if you would like tools installed with go install to be available on the command line, add the following to your ~/.zshrc or ~/.zprofile file:

# Go
export GOPATH=$HOME/go
export PATH=$PATH:$GOPATH/bin

If you are new to Go, or would like a refresher, here are some recommended resources:

Optional Tools

The following are optional tools that may be installed to enhance the developer experience.

mdbook

mdbook is written in Rust and requires Rust to be installed as a prerequisite.

To install Rust on a Mac or other Unix-like OS:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

If you've installed rustup in the past, you can update your installation by running:

rustup update

Once you have installed Rust, the following command can be used to build and install mdbook:

cargo install mdbook

Once mdbook is installed, you can serve the book by running the following from the book's root directory (the one containing book.toml):

mdbook serve

just

just is "just" a command runner. It is a handy way to save and run project-specific commands.

To install just on a Mac using Homebrew:

brew install just

Alternatively, you can use the following command on Linux, macOS, or Windows to download the latest release; just replace <destination directory> with the directory where you'd like to put just:

curl --proto '=https' --tlsv1.2 -sSf https://just.systems/install.sh | bash -s -- --to <destination directory>

For example, to install just to ~/bin:

# create ~/bin
mkdir -p ~/bin

# download and extract just to ~/bin/just
curl --proto '=https' --tlsv1.2 -sSf https://just.systems/install.sh | bash -s -- --to ~/bin

# add `~/bin` to the paths that your shell searches for executables
# this line should be added to your shell's initialization file,
# e.g. `~/.bashrc` or `~/.zshrc`
export PATH="$PATH:$HOME/bin"

# just should now be executable
just --help

Portage CD CLI Configuration

The Portage CD CLI provides a set of commands to manage your Portage configuration. These commands allow you to initialize a configuration file, list template variables, render templates, and convert configuration files between formats.

This documentation provides a comprehensive overview of the configuration management capabilities available in the Portage CD CLI. For further assistance or more detailed examples, refer to the CLI's help command or the official documentation.

Configuring Using Environment Variables, CLI Arguments, or Configuration Files

Portage CD supports flexible configuration methods to suit various operational environments. You can configure it using environment variables, command-line (CLI) arguments, or configuration files in JSON, YAML, or TOML format, so you can choose the most convenient way to set up Portage for your deployment and development needs.

Configuration Precedence

The Portage CD uses Viper under the hood to manage its configurations, which follows a specific order of precedence when merging configuration options:

  1. Command-line Arguments: These override values specified through other methods.
  2. Environment Variables: They take precedence over configuration files.
  3. Configuration Files: Supports JSON, YAML, and TOML formats. The engine reads these files if specified and merges them into the existing configuration.
  4. Default Values: Predefined in the code.

Using Environment Variables

Environment variables are a convenient way to configure the application in environments where file access might be restricted or for overriding specific configurations without changing the configuration files.

To use environment variables:

  • Portage environment variables use the PORTAGE_ prefix (e.g., PORTAGE_IMAGE_BUILD_DOCKERFILE) to avoid conflicts with other applications.
  • Use the environment variable names that correspond to the configuration options you wish to set (see the configuration table).

Using CLI Arguments

CLI arguments provide a way to specify configuration values when running a command. They are useful for temporary overrides or when scripting actions. For each configuration option, there is usually a corresponding flag that can be passed to the command.

For example:

./portage run image-build --build-dir . --dockerfile custom.Dockerfile

Using Configuration Files

Configuration files offer a structured and human-readable way to manage your application settings. The Portage CD supports JSON, YAML, and TOML formats, allowing you to choose the one that best fits your preferences or existing infrastructure.

  • JSON: A lightweight data-interchange format.
  • YAML: A human-readable data serialization standard.
  • TOML: A minimal configuration file format that's easy to read due to its clear semantics.

To specify which configuration file to use, you can typically pass the file path as a CLI argument or set an environment variable pointing to the file.

Merging Configuration

Portage CD merges configuration from different sources in the order of precedence mentioned above. If the same configuration is specified in multiple places, the source with the highest precedence overrides the others. This mechanism allows for flexible configuration strategies, such as defining default values in a file and overriding them with environment variables or CLI arguments as needed.

Commands - Managing the configuration file

config init

Initializes the configuration file with default settings.

config vars

Lists supported built-in variables that can be used in templates.

config render

Renders a configuration template using the --file flag or STDIN and writes the output to STDOUT.

config convert

Converts a configuration file from one format to another.

Examples

Render Configuration Template

Rendering a configuration template from config.json.tmpl to JSON format:

$ cat config.json.tmpl | ./portage config render

Output:

{
  "image": {...},
  "artifacts": {...}
}

Convert Configuration Format

Attempting to convert the configuration without specifying required flags results in an error:

$ cat config.json.tmpl | ./portage config render  | ./portage config convert

Error Output:

Error: at least one of the flags in the group [file input] is required

Successful conversion from JSON to TOML format:

$ cat config.json.tmpl | ./portage config render  | ./portage config convert -i json -o toml

Output:

[image]
buildDir = '.'
...

Image Build

Command Parameters

Build Directory

| CLI Flag | Variable Name | Config Field Name |
|---|---|---|
| --build-dir | PORTAGE_BUILD_DIR | image.buildDir |

The directory from which to build the container (typically, but not always, the directory where the Dockerfile is located). This parameter is optional, expects a string value, and defaults to the current working directory.

Dockerfile

| CLI Flag | Variable Name | Config Field Name |
|---|---|---|
| --dockerfile | PORTAGE_BUILD_DOCKERFILE | image.buildDockerfile |

Build Args

| CLI Flag | Variable Name | Config Field Name |
|---|---|---|
| --build-arg | PORTAGE_BUILD_ARGS | image.buildArgs |

Defines build arguments that are passed to the actual container image build command. This parameter is optional, and expects a mapping of string keys to string values, the exact format of which depends on the medium by which it is specified.

CLI Flag

The --build-arg flag can be specified multiple times to specify different args. The key and value for each arg should be specified as a string in the format key=value.

Environment Variable

The PORTAGE_BUILD_ARGS environment variable must contain all the build arguments in a JSON formatted object (i.e. {"key":"value"}).

Configuration File

Similar to how build args are specified as an environment variable, build args in config files must be specified as a JSON formatted object. The following is an example YAML config file:

image:
  buildArgs: |-
    { "key": "value" }

Note that when specifying build args via the configuration file, special care must be taken to ensure that the case of the key is preserved. In the above example the value of buildArgs is a string, not a YAML object. When using a JSON config file this would need to be specified as follows:

{
	"image": {
		"buildArgs": "{ \"key\": \"value\" }"
	}
}

This is because the portage configuration file loader does not preserve the case of keys, and build args in Dockerfiles are case sensitive.

Tag

| CLI Flag | Variable Name | Config Field Name |
|---|---|---|
| --tag | PORTAGE_BUILD_TAG | image.buildTag |

Platform

| CLI Flag | Variable Name | Config Field Name |
|---|---|---|
| --platform | PORTAGE_BUILD_PLATFORM | image.buildPlatform |

Target

| CLI Flag | Variable Name | Config Field Name |
|---|---|---|
| --target | PORTAGE_BUILD_TARGET | image.buildTarget |

For multi-stage Dockerfiles this parameter specifies a named stage to build.

Cache To

| CLI Flag | Variable Name | Config Field Name |
|---|---|---|
| --cache-to | PORTAGE_BUILD_CACHE_TO | image.buildCacheTo |

Cache From

| CLI Flag | Variable Name | Config Field Name |
|---|---|---|
| --cache-from | PORTAGE_BUILD_CACHE_FROM | image.buildCacheFrom |

Squash Layers

| CLI Flag | Variable Name | Config Field Name |
|---|---|---|
| --squash-layers | PORTAGE_BUILD_SQUASH_LAYERS | image.buildSquashLayers |

Security Analysis

Overview

lorem ipsum blah blah text

Security Analysis Docs

Code-scan

Overview

lorem ipsum stuff

Using Code-scan On CLI

portage run code-scan [flags]

Flags

| Flags | Definition |
|---|---|
| --gitleaks-filename string | the output filename for the gitleaks vulnerability report |
| -h, --help | help for code-scan |
| --semgrep-experimental | use the osemgrep statically compiled binary |
| --semgrep-filename string | the output filename for the semgrep vulnerability report |
| --semgrep-rules string | the rules semgrep will use for the scan |

Code-scan Security Tools

Semgrep


Table of Contents

  1. Overview
  2. Configuration
  3. Rulesets
  4. Logging Semgrep with portage
  5. Handling False Positives & Problematic File(s)
  6. Official Semgrep Documentation & Resources

Overview

Semgrep is a static code analysis tool that provides a range of features for detecting and preventing security vulnerabilities and bugs in software. It is designed to help businesses improve their applications' security, increase reliability, and reduce the complexity and cost of performing code analysis. As applications become more complex and interconnected, it becomes increasingly difficult to identify and fix security vulnerabilities and bugs before they are exploited or cause problems in production. This can result in security breaches, data loss, and other issues that can damage a business's reputation and success.

Supported Languages

Apex · Bash · C · C++ · C# · Clojure · Dart · Dockerfile · Elixir · HTML · Go · Java · JavaScript · JSX · JSON · Julia · Jsonnet · Kotlin · Lisp · Lua · OCaml · PHP · Python · R · Ruby · Rust · Scala · Scheme · Solidity · Swift · Terraform · TypeScript · TSX · YAML · XML · Generic (ERB, Jinja, etc.)

Supported Package Managers

C# (NuGet) · Dart (Pub) · Go (Go modules, go mod) · Java (Gradle, Maven) · JavaScript/TypeScript (npm, Yarn, Yarn 2, Yarn 3, pnpm) · Kotlin (Gradle, Maven) · PHP (Composer) · Python (pip, pip-tools, Pipenv, Poetry) · Ruby (RubyGems) · Rust (Cargo) · Scala (Maven) · Swift (SwiftPM)

Configuration

Under the hood, Portage CD runs Semgrep with a fixed set of base flags, then builds on the output of one of the two commands below to improve Semgrep's usefulness as a security tool and its user experience.

By default, it runs this over your git repository:

semgrep ci --json --config [semgrep-rule-config-file]

or this when the --semgrep-experimental flag is set:

osemgrep ci --json --experimental --config [semgrep-rule-config-file]

Semgrep with portage Code-scan

From your git repository, run code-scan on the command line with any of the flags below:

  portage run code-scan [semgrep-flags]

Flags


Input Flag:

  --semgrep-rules string

A ruleset for Semgrep to use while scanning your code, given as a .yaml, .toml, or .json file (see Rulesets below). This can also be configured by setting the file path in the PORTAGE_CODE_SCAN_SEMGREP_RULES environment variable or in the codescan.semgreprules key of portage-config.yaml.


Output Flag:

  --semgrep-filename string    

The filename for the vulnerability report that Semgrep writes (see Logging Semgrep with portage below).


Toggle Osemgrep Flag:

  --semgrep-experimental

Enables Semgrep's experimental features, which are still in beta and have the potential to increase vulnerability detection. It also switches to osemgrep, a variant built upon Semgrep with OpenSSF Security Metrics in mind.


Env Variables

| Config Key | Environment Variable | Default Value | Description |
|---|---|---|---|
| codescan.semgrepfilename | PORTAGE_CODE_SCAN_SEMGREP_FILENAME | semgrep-sast-report.json | The filename for the semgrep SAST report - must contain 'semgrep' |
| codescan.semgreprules | PORTAGE_CODE_SCAN_SEMGREP_RULES | p/default | Semgrep ruleset manual override |
| codescan.semgrepexperimental | PORTAGE_CODE_SCAN_SEMGREP_EXPERIMENTAL | false | Enable the use of the semgrep experimental CLI |

Rulesets

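rules:
  - id: dangerously-setting-html
    languages:
      - javascript
    message: dangerouslySetInnerHTML usage! Don't allow XSS!
    # Matches any JSX element that sets dangerouslySetInnerHTML
    pattern: <$EL dangerouslySetInnerHTML={...} />
    severity: ERROR
    # Limit the rule to these files (Semgrep uses paths.include, not files)
    paths:
      include:
        - "*.jsx"
        - "*.js"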

Semgrep scans your code using rulesets supplied by the user, which determine what it looks for. Rulesets are files with the .yaml, .json, or .toml extension.

To identify vulnerabilities at a basic level Semgrep requires:

  • Language to target
  • Message to display on vulnerability detection
  • Pattern(s) to match
  • Severity Rating from lowest to highest:
    • INFO
    • WARNING
    • ERROR
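These required fields can be checked before a rule is handed to Semgrep. This sketch validates only the fields listed above plus id, using a hypothetical in-memory Rule type; real Semgrep rules accept many more keys, and pattern can be replaced by keys such as patterns or pattern-either:

```go
package main

import (
	"errors"
	"fmt"
)

// Rule is a hypothetical in-memory form holding only the basic fields.
type Rule struct {
	ID        string
	Languages []string
	Message   string
	Pattern   string
	Severity  string
}

// severities lists the three ratings above, from lowest to highest.
var severities = map[string]bool{"INFO": true, "WARNING": true, "ERROR": true}

// validate collects every problem instead of stopping at the first one;
// errors.Join returns nil when the slice is empty.
func (r Rule) validate() error {
	var errs []error
	if r.ID == "" {
		errs = append(errs, errors.New("id is required"))
	}
	if len(r.Languages) == 0 {
		errs = append(errs, errors.New("at least one language is required"))
	}
	if r.Message == "" {
		errs = append(errs, errors.New("message is required"))
	}
	if r.Pattern == "" {
		errs = append(errs, errors.New("pattern is required"))
	}
	if !severities[r.Severity] {
		errs = append(errs, fmt.Errorf("severity %q must be INFO, WARNING, or ERROR", r.Severity))
	}
	return errors.Join(errs...)
}

func main() {
	rule := Rule{ID: "dangerously-setting-html", Languages: []string{"javascript"}, Severity: "CRITICAL"}
	fmt.Println(rule.validate()) // lists the missing message and pattern and the bad severity
}
```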

There are also advanced options, some of which can even rewrite (fix) or exclude certain code snippets.

Many rules and rulesets have already been written by the community and, thanks to Semgrep's open-source nature, are freely available in the Semgrep Registry.

Or, if you're the type to blaze your own path, the Semgrep documentation explains how to write your own custom rules, including examples of advanced pattern-matching syntax.

You can test your own rules as you write them in the Semgrep Rule Playground.

Logging Semgrep with portage

Within portage, Semgrep writes its report to semgrep-sast-report.json by default. The file appears in the artifacts directory, provided portage has read and write permissions there. As covered above in Configuration, the --semgrep-filename flag sets a custom output filename for the report.

When Semgrep runs as part of code-scan, portage run code-scan -v prints verbose Semgrep output along with the output of the other code-scan tools.

semgrep-sast-report.json contains the matched rules and the code snippets flagged as potential vulnerabilities, along with the suggested fixed code for rules that define a fix key.

Portage uses Gatecheck to audit the Semgrep report once Semgrep has finished, checking the findings against OWASP (Open Worldwide Application Security Project) IDs. Portage collects errors from the code-scan tools on STDERR, audits the results with Gatecheck, and writes the audit to STDOUT. It also writes the report files to the artifacts/ directory in your working directory.

Example:

| Check ID | Owasp IDs | Severity | Impact | link |
|---|---|---|---|---|
| react-dangerouslysetinnerhtml | A07:2017 - Cross-Site Scripting (XSS), A03:2021 - Injection | ERROR | MEDIUM | |

Handling False Positives & Problematic File(s)

Semgrep is a rather simple tool: it searches your code for vulnerabilities based only on the rules given to it, so it is up to you to handle the false positives and problematic file(s) that result. There are many ways to do this; the more targeted ones increase the complexity of the base rule but also its power and specificity.

False Positives

You notice that Semgrep is screaming at you from the console in portage. You rage and rage as your terminal is just polluted with messages for a vulnerability you know is just a false positive.

Nosemgrep

Just add a nosemgrep comment on the line of the finding (for a block of code, the first line of the match, such as the function head) and boom, false positive gone. A bare nosemgrep suppresses every rule on that line, so as a best practice use // nosemgrep: rule-id-1, rule-id-2, ... to suppress only the rules that cause the false positive. Here's more info on nosemgrep.
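For example, a checksum used as a cache key might trip a weak-hash rule even though collision resistance doesn't matter there. In this sketch, example-weak-hash-rule is a hypothetical rule ID; use the check ID that Semgrep actually reports:

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// cacheKey derives a non-security cache key. Nothing depends on MD5's
// collision resistance here, so the weak-hash finding is a false positive
// and is suppressed for that single (hypothetical) rule only.
func cacheKey(data []byte) string {
	sum := md5.Sum(data) // nosemgrep: example-weak-hash-rule
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(cacheKey([]byte("abc"))) // 900150983cd24fb0d6963f7d28e17f72
}
```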

Taint Analysis

Of course, nosemgrep is something of a workaround and is best reserved for cases with only a few false positives. When you understand the root cause of a false positive, a better approach is to tune the rule's taint analysis. Semgrep provides boolean rule options prefixed with taint_assume_safe_ that tell taint analysis to treat certain kinds of values as safe. These false-positive options cover:

  • Boolean inputs
  • Numeric inputs
  • Index inputs
  • Function names
  • Propagation (must taint its initialization)

Taint analysis can also track variables that may lead to vulnerabilities, letting developers follow the flow of a potentially dangerous value through a large code base. You mark the source, where the value originates, and the sink, the potentially vulnerable function where the value ends up. If the value is transformed along the way, it is best to also declare propagators and sanitizers: propagators carry the taint from one value to another, and sanitizers remove it. Here's an example of such a rule with taints. If you'd like to know more, see the official documentation on Semgrep taint analysis.

Problematic File(s)

At a larger scale, if a whole file or directory causes false positives, or you simply don't need to scan it, you can exclude it with a .semgrepignore file or with path filters inside a rule.

Below are examples of both:

.semgrepignore

.semgrepignore works just like a .gitignore file: Semgrep skips everything it lists. Place this file in your root directory or in your working directory. The example below excludes the .gitignore and .env files, main_test.go, and the resources/ directory, and, when the file is placed at the root directory, any node_modules directory at any depth (denoted by **).

.gitignore
.env
main_test.go
resources/
**/node_modules/**

Rules with Certain Paths

Inside a rule, Semgrep offers two ways to skip or target files and directories: add a paths field, then its exclude and include subfields, each holding a list of files or directories given as strings.

Example of Rules with Path Specification and Taints
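rules:
  - id: eqeq-is-bad
    mode: taint
    languages:
      - go
    message: Untrusted input reaches an equality comparison
    severity: WARNING
    # source() and clean() are placeholder function names
    pattern-sources:
      - pattern: source(...)
    pattern-sinks:
      - pattern: $X == $Y
    pattern-sanitizers:
      - pattern: clean(...)
    paths:
      exclude:
        - "*_test.go"
        - "project/tests"
      include:
        - "project/server"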

Official Semgrep Documentation & Resources

Troubleshooting

Mac M1 Docker Container Execution Failure

If you are running on a Mac M1, and are getting an error similar to:

ERR execution failure error="input:1: container.from.withEnvVariable.withExec.stdout process \"echo sample output from debug container\" did not complete successfully: exit code: 1\n\nStdout:\n\nStderr:\n"

You may need to install colima.

To install colima on a Mac using Homebrew:

brew install colima

Start colima:

colima start --arch x86_64

Then run portage again.

Developer Guide

TODO: Project info, goals, etc.

Getting Started

Project Layout

TODO: Add the philosophy behind the project layout

Shell

The Shell package (pkg/shell) is a library of commands and utilities used in portage. The standard library way to execute shell commands is by using the os/exec package which has a lot of features and flexibility. In our case, we want to restrict the ability to arbitrarily execute shell commands by carefully selecting a sub-set of features for each command.

For example, if you look at the Syft CLI reference, you'll see dozens of commands and configuration options, all controlled by parsing flags from the command string. This is an opinionated security pipeline, so we don't need every feature Syft provides. The user shouldn't need to care that we're using Syft to generate an SBOM, which Grype then scans for vulnerabilities. The idea of Portage CD is that all of this is abstracted behind the Security Analysis pipeline.

In the Shell package, all necessary commands will be abstracted into a native Go object. Only the used features for the given command will be written into this package.

The shell.Executable wraps the exec.Cmd struct and adds some convenient methods for building a command.

syft version -o json

Here is how to execute that command with exec.Cmd from the standard library:

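// Requires the "os" and "os/exec" imports.
cmd := exec.Command("syft", "version", "-o", "json")
cmd.Stdout = os.Stdout // stream the command's stdout to the terminal
cmd.Stderr = os.Stderr // stream the command's stderr to the terminal
// Other fields such as Dir, Env, and Stdin can be set before running.
err := cmd.Run() // blocks until the command exits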

Running external commands involves more logic than this. Since portage is built around executing external binaries, the internal pkg/shell library abstracts much of the complexity involved in handling async patterns, possible interrupts, and parameters.

Commands can be represented as functions.

func SyftVersion(options ...OptionFunc) error {
	o := newOptions(options...)
	cmd := exec.Command("syft", "version")
	return run(cmd, o)
}

The OptionFunc variadic parameter allows the caller to modify the behavior of the command with an arbitrary number of OptionFunc(s).

newOptions generates the default Options structure and then applies all of the passed-in functions. The o variable can now be used to apply parameters to the command before execution.

Returning run(cmd, o) hands off the execution phase of the command to another function, which bootstraps a lot of useful functionality without needing to write supporting code for each new command.

For example, if you only want to output the command that would run without actually running it:

dryRun := true
SyftVersion(WithStdout(os.Stdout), WithDryRun(dryRun))

This would log the final output command without executing.

The motivation behind this architecture is to simplify the methods for all subcommands of an executable.

Implementing a new subcommand is trivial: write a new function that follows the same pattern:

func SyftHelp(options ...OptionFunc) error {
	o := newOptions(options...)
	cmd := exec.Command("syft", "--help")
	return run(cmd, o)
}

If we wanted to build an OptionFunc for version that optionally writes JSON instead of plain text, it would go in the pkg/shell/shell.go file.

Since there aren't many commands, they all share the same configuration object Options.

func WithJSONOutput(enabled bool) OptionFunc {
	return func(o *Options) {
		// Respect the caller's choice instead of always enabling JSON output.
		o.JSONOutput = enabled
	}
}

Now the version function can reference this field and change the shell command:

func SyftVersion(options ...OptionFunc) error {
	o := newOptions(options...)
	args := []string{"version"}
	if o.JSONOutput {
		args = append(args, "-o", "json") // syft version -o json
	}
	cmd := exec.Command("syft", args...)
	return run(cmd, o)
}

See pkg/shell/docker.go for a more complex example of a command with a lot of parameters.

Pipelines

Concepts

Concurrency

An AsyncTask is used to simplify concurrency by providing a few convenient methods.

  • StreamTo: allows the caller to block and read the stderr log while the command is running.
  • Close: closes the internal pipe writer, signaling to the pipe reader that it is done writing data.
  • Wait: blocks until Close() is called on the task, and can be called multiple times. Under the hood it uses a ctx.

The general idea is that a "task" can be used in a goroutine in the background until the command is complete. This strategy enables a bunch of jobs to be kicked off in goroutines and stream stderr output in any order.

The pattern used in the image-scan and code-scan pipelines uses function parameters to define task-to-task dependencies. For example, grype cannot run until syft has generated the SBOM, so the grypeJob function takes a syftTask parameter that lets it wait for the syft command to finish.

For situations where the output of one command can be piped into another, there is a stdoutBuf field on AsyncTask that can be used to temporarily store the output in memory until the command is complete.

Originally, io.Pipe was used here, but it made the command logic complicated and hard to read, even though it was technically more efficient than storing the output in memory.

AsyncTask also wraps the stderr output with a label and timing capability, so the user can see how long each task takes to complete.

Documentation

Too Long; Might Read (TL;MR)

A collection of thoughts around design decisions made in Portage CD, mostly ramblings that some people may or may not find useful.

Why CI/CD Flexible Configuration is Painful

In a traditional CI/CD environment, you would have to parse strings to build the exact command you want to execute.

Local Shell:

syft version

GitLab CI/CD configuration lets you declare the execution environment by providing an image name:

syft-version:
  stage: scan
  image: anchore/syft:latest
  script:
    - syft version

What typically happens is configuration creep. If you need to print the version information in JSON (one of the many command options), you have to define multiple jobs in GitLab that differ only in their script block, each one hidden behind an environment variable:

.syft:
  stage: scan
  image: anchore/syft:latest

syft-version:text:
  extends: .syft
  script:
    - syft version
  rules:
    - if: $SYFT_VERSION_JSON != "true"

syft-version:json:
  extends: .syft
  script:
    - syft version -o json
  rules:
    - if: $SYFT_VERSION_JSON == "true"

The complexity increases exponentially in a GitLab CI/CD file with each configuration option you wish to support.

Developer Notes

Tidy First

By: Kent Beck

"Tidy First?" suggests the following:

  • There isn’t a single way to do things, there are things that make sense in context, and you know your context
  • There are many distinct ways to tidy code, which make code easier to work with: guard clauses, removing dead code, normalizing symmetries, and so on
  • Tidying and logic changes are different types of work, and should be done in distinct pull requests
  • This speeds up pull request review, and on high-cohesion teams tidying commits shouldn’t require code review at all
  • Tidying should be done in small amounts, not large amounts
  • Tidying is usually best to do before changing application logic, to the extent that it reduces the cost of making the logical change
  • It’s also OK to tidy after your change, later when you have time, or even never (for code that doesn’t change much)
  • Coupling is really bad for maintainable code

Effective Go

link: Effective Go

Formatting in Go

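To format all Go source files in the module, run the following command from the repository root:

go fmt ./...

(go fmt . formats only the package in the current directory.)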

VSCode Setup

Install the Go extension for VSCode for Go language support and highlighting.

If you would like to automatically format on save in VSCode, use the following settings in VSCode:

  1. Press Command ⌘ + , to view the settings.
  2. Search for editor.formatOnSave and set it to true
  3. Search for editor.defaultFormatter and set it to Go (golang.go)