When to run CI: A bash script to save CI time

Nick Felker
Jan 13, 2022

Let’s say you manage an open source project. You have your source code and docs all stored in a repo. A volunteer comes across a typo in one of your Markdown files and creates a pull request.

Seems like an easy thing to merge, right? If you spent the time to build a CI/CD pipeline, this pull request now gets stuck in the queue. It may take longer than anyone wants, and may use computing resources just to verify what we already know works.

Transient issues in your CI environment may also raise false warnings, which don’t make sense for the commit. It just changes a Markdown file. Now this helpful change is stuck in a sort of purgatory, creating a poor experience for this helpful volunteer.

Alright, maybe these issues are rare or low priority. But I’ve created a script for CI servers that makes this easy to check and verify. I was inspired by a paper published in IEEE Transactions on Software Engineering.

The authors of that paper focus solely on Java projects when determining which commits can skip CI. I wanted my version to be thoroughly generic.

That’s why it was written in Bash. It’s an environment generic enough that any developer should be able to use it, whereas a Python script would be harder to justify adding to a Node project.

Usage is simple: one line to run the script and a few more to check its result.

The git-presubmit-linter project contains a few different rules and tools that I’ve developed during my time at Google to use in the CI environments for some projects I’ve worked on.

Further in this article, I’ll explain how I built the require-ci tool.

State Machine

The script receives the entirety of the git diff output.

That output is broken up into a predictable set of lines:

  • Announcing the filename being modified
  • Setting the position where the diff is located
  • The content being removed
  • The content being added
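
As a rough illustration of that structure, the sketch below labels each line of a small, made-up diff by its leading characters. It is not the tool's actual code: only TYPE_DIFF_HEAD is a name used in this article, and the other labels, the function name, and the sample diff are assumptions for the example.

```bash
#!/usr/bin/env bash
# Classify one line of `git diff` output by its leading characters.
# TYPE_DIFF_HEAD comes from the article; the other labels are hypothetical.
# Caveat: a removed line whose content starts with "-- " also begins with
# "--- " and would be mislabeled here; a real tool needs to track context.
classify_diff_line() {
  case "$1" in
    "diff --git "*)   echo "TYPE_DIFF_HEAD" ;;    # announces the file
    "--- "*|"+++ "*)  echo "TYPE_FILE_MARKER" ;;  # old/new path markers
    "@@ "*)           echo "TYPE_POSITION" ;;     # where the change sits
    "-"*)             echo "TYPE_REMOVED" ;;      # content being removed
    "+"*)             echo "TYPE_ADDED" ;;        # content being added
    *)                echo "TYPE_OTHER" ;;        # context and index lines
  esac
}

# Walk a small, made-up diff line by line and print each label.
while IFS= read -r line; do
  printf '%-17s %s\n' "$(classify_diff_line "$line")" "$line"
done <<'EOF'
diff --git a/README.md b/README.md
index 1111111..2222222 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,3 @@
 # My project
-A libary for things.
+A library for things.
EOF
```

Checking the more specific prefixes first matters: "--- " and "+++ " have to be tested before the single-character "-" and "+" patterns.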

This pattern then repeats for each further modification, which lets the script move through the output line by line and keep track of its current context. There are several states that we are going to be defining:

  • The header for a specific diff
  • The line is a modification of code
  • The line is a comment
  • The line is the start of a multiline comment
  • The line is in the middle of a multiline comment
  • The line is the end of a multiline comment

This is generally all of the states we need, though there may be some niche exceptions.

State diagram for script

Above is the state diagram I developed. Each line gets a classification as it is read, and that classification determines how the script acts upon the line.
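
To make the transitions concrete, here is a minimal sketch of such a state machine. It assumes C-style comments (`//` and `/* ... */`), and the function name and every state name are hypothetical; the real script may name and structure its states differently. When in doubt it answers TYPE_CODE, since wrongly skipping CI is worse than running it.

```bash
#!/usr/bin/env bash
# next_state PREVIOUS_STATE LINE_CONTENT
# Prints the state for a line, given the state of the line before it.
# Assumes C-style comments; all state names here are hypothetical.
next_state() {
  local prev="$1" line="$2" trimmed rest
  trimmed="${line#"${line%%[![:space:]]*}"}"   # drop leading whitespace

  case "$prev" in
    TYPE_COMMENT_START|TYPE_COMMENT_MIDDLE)
      # We are inside a multiline comment.
      case "$trimmed" in
        *"*/")
          rest="${trimmed%"*/"}"
          case "$rest" in
            *"*/"*) echo "TYPE_CODE" ;;          # closed earlier, more follows
            *)      echo "TYPE_COMMENT_END" ;;
          esac ;;
        *"*/"*) echo "TYPE_CODE" ;;              # code after the close
        *)      echo "TYPE_COMMENT_MIDDLE" ;;
      esac
      return ;;
  esac

  case "$trimmed" in
    "//"*) echo "TYPE_COMMENT" ;;
    "/*"*"*/")
      rest="${trimmed#"/*"}"; rest="${rest%"*/"}"
      case "$rest" in
        *"*/"*) echo "TYPE_CODE" ;;              # e.g. /* a */ x(); /* b */
        *)      echo "TYPE_COMMENT" ;;           # opens and closes on one line
      esac ;;
    "/*"*"*/"*) echo "TYPE_CODE" ;;              # comment followed by code
    "/*"*)      echo "TYPE_COMMENT_START" ;;
    *)          echo "TYPE_CODE" ;;
  esac
}

# Carry the state from one line to the next.
state="TYPE_CODE"
while IFS= read -r line; do
  state="$(next_state "$state" "$line")"
  printf '%-20s %s\n' "$state" "$line"
done <<'EOF'
/* Start of a
   long comment */
int x = 1;
// a note
EOF
```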

Iterating through lines

In the primary case, where the line is TYPE_DIFF_HEAD, I want to parse the filename and in particular its file extension. I want to know if this is a file extension that can quickly be ignored, such as a .md or .html file. Then I can just jump to the next file without needing to do a more thorough analysis.
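
One way to pull the filename and extension out of a diff header is with shell parameter expansion, as sketched below. The two function names are hypothetical, and the sketch assumes the usual `diff --git a/<path> b/<path>` header form; paths that themselves contain " b/" would need more careful handling.

```bash
#!/usr/bin/env bash
# Print the path from a header like "diff --git a/docs/x.md b/docs/x.md".
# Hypothetical helper: keeps everything after the last " b/".
diff_head_path() {
  local header="$1"
  echo "${header##* b/}"
}

# Print the extension of a path (without the dot), or nothing if the
# file name has no dot at all.
file_extension() {
  local name="${1##*/}"          # strip directories first
  case "$name" in
    *.*) echo "${name##*.}" ;;   # text after the last dot
    *)   echo "" ;;
  esac
}

path="$(diff_head_path 'diff --git a/docs/README.md b/docs/README.md')"
echo "path=$path extension=$(file_extension "$path")"
```

Stripping the directory part before looking for a dot avoids treating a dotted directory name, such as `v1.2/Makefile`, as a file extension.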

Defining file parse parameters

How do I know which files can be ignored? The start of the script defines a number of extensions that should be ignored and those that are source code. A .md can be ignored and a .java needs to be analyzed. But what will happen for file extensions I haven’t anticipated, like a .mp3? Those will raise a flag in the script that CI must proceed, just in case.
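
A sketch of that three-way decision might look like the following. The variable names, the function name, and the exact contents of the two lists are illustrative assumptions; only .md, .html, .java, and the .mp3 example come from the text above.

```bash
#!/usr/bin/env bash
# Illustrative lists; the real script defines its own.
IGNORED_EXTENSIONS="md html"
SOURCE_EXTENSIONS="java js py"

# Print "ignore", "analyze", or "unknown" for a file extension.
# "unknown" is the conservative case: CI must run.
classify_extension() {
  local ext="$1" candidate
  for candidate in $IGNORED_EXTENSIONS; do
    if [ "$ext" = "$candidate" ]; then echo "ignore"; return; fi
  done
  for candidate in $SOURCE_EXTENSIONS; do
    if [ "$ext" = "$candidate" ]; then echo "analyze"; return; fi
  done
  echo "unknown"
}

for ext in md java mp3; do
  echo ".$ext -> $(classify_extension "$ext")"
done
```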

I also need to define how comments are formatted in each of my source code languages.
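
As a sketch of what such a mapping could look like, the lookup below returns the single-line comment prefix for an extension. It uses a `case` statement rather than an associative array so it runs on older shells; the function names and the set of languages are assumptions, and multiline comment delimiters would need a similar lookup.

```bash
#!/usr/bin/env bash
# Print the single-line comment prefix for an extension, or nothing
# if the language is not known. The language list is illustrative.
line_comment_prefix() {
  case "$1" in
    java|js|c|cpp|go) echo "//" ;;
    py|sh|rb)         echo "#" ;;
    *)                echo "" ;;
  esac
}

# is_line_comment EXTENSION LINE_CONTENT
# Succeeds only when the line is a single-line comment in that language.
is_line_comment() {
  local prefix trimmed
  prefix="$(line_comment_prefix "$1")"
  [ -n "$prefix" ] || return 1               # unknown language: not a comment
  trimmed="${2#"${2%%[![:space:]]*}"}"       # drop leading whitespace
  case "$trimmed" in
    "#!"*)      return 1 ;;                  # treat shebang lines as code
    "$prefix"*) return 0 ;;
    *)          return 1 ;;
  esac
}

if is_line_comment java '    // explain the next line'; then
  echo "comment"
fi
```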

These maps are used in the second half of the line iterator, in the cases where the type is not TYPE_DIFF_HEAD. A grep is used to match any line added to or removed from the code. Then it checks that line against the state machine defined earlier.

If the line is code, and not a comment in any sense, then a flag is raised. Source code has been changed, and CI needs to be run.

At the end, the script simply returns CURRENT_RESULT, which is set to a non-zero value whenever a flag is raised. A non-zero CURRENT_RESULT therefore means that source code changed and CI must run. This makes it easy to combine with existing CI scripts and other tools.
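
The sketch below shows how a CI step might act on that result. `ci_required` is a deliberately simplified stand-in, not the real tool: it only looks at diff headers and treats anything other than a .md file as a source change. `run_ci_if_needed` and the branch name in the usage comment are likewise assumptions.

```bash
#!/usr/bin/env bash
# Simplified stand-in for the real check. Reads `git diff` output on
# stdin and returns non-zero if any changed file is not Markdown.
ci_required() {
  local CURRENT_RESULT=0 line path
  while IFS= read -r line; do
    case "$line" in
      "diff --git "*)
        path="${line##* b/}"
        case "$path" in
          *.md) echo "Skipping docs file: $path" ;;
          *)    echo "Source changed: $path"; CURRENT_RESULT=1 ;;
        esac ;;
    esac
  done
  return "$CURRENT_RESULT"
}

# Gate the expensive part of the pipeline on the exit status.
run_ci_if_needed() {
  if ci_required; then
    echo "No source changes; skipping CI"
  else
    echo "Running CI"
    # the project's usual build and test commands would go here
  fi
}

# Possible usage in a CI job (branch name is an assumption):
#   git diff origin/main...HEAD | run_ci_if_needed
printf '%s\n' 'diff --git a/README.md b/README.md' | run_ci_if_needed
```

Because a zero status means "nothing flagged", the `if` branch is the skip path and the `else` branch is where the normal pipeline runs.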

Anyone reading the logs gets a helpful set of messages listing exactly which source files changed. Reading them is optional, though: the script acts conservatively and only skips CI for changes that cannot affect your source code or the behavior of your project, namely documentation and code comments.

I’ve also added a bunch of unit tests to make sure that it runs exactly as expected.

This tool and many others can be viewed on GitHub with a bunch of documentation in the README for how to use it.

