Skip to content

003: Select the changes to record

Status

Accepted.

Based on: 002, 004

Context

We need to create some logic that reads from the git log to:

Proposals

Extract the last published changes

We can:

  • Add meaningful tags to the repository: If we create a tag last_weekly on the commit of the last published changed for the weekly feed, then we can add to the changes to publish only those posterior to that commit.
  • Save the last commit ids in a file in the repo: If we create a hidden .last_published.json with the last commit id of each feed type, we can add to the feeds only the changes that are posterior to those commits.
  • Save the last commit ids in the mkdocs.yaml file: We can save the last commit ids in the configuration section of the plugin in the mkdocs.yaml file.
  • Deduce the last published date from existent articles: We already have this information in the existent articles. All we would need to do is analyze the files in the on_files and deduce the last content publication.

Add meaningful tags to the repository

Using tags has the advantage that we could reuse the Repo object to extract the information, which we already use in the services that process the changes. The disadvantage is that we'll pollute the git repository with tags that may not be interesting to the user. In the digital garden case, it's not a problem, but it can be in the changelog one.

Save the last commit ids in a file in the repo

Storing the commit ids in a hidden json file has the advantage that we don't pollute the git repository, we can use the json export and import functionality of pydantic. The disadvantage is that we'll create an additional file in the repository. cruft uses this method.

Save the last commit ids in the mkdocs.yaml file

Storing the commit ids in the mkdocs.yaml file has the advantage that it doesn't pollute either the git repository nor the project directory. The disadvantage is that it pollutes the mkdocs configuration file.

Deduce the last published date from existent articles

Using the existent articles has these advantages:

  • It doesn't pollute either the git repository, nor the project directory nor the mkdocs configuration file.
  • We can fetch the last published date per feed before parsing the commits, so instead of analyzing the whole history, we can go from the newest till the desired dates, making the algorithm more efficient..
  • We don't overwrite the existent articles, so if the user makes changes on them, they are respected.

The disadvantage is that we need to code the extraction of that dates from the files collection, but it should not be difficult.

Assuming that we have the existent articles organized following 004, we can create a function that reads the files attribute from the on_files event and deduces the date of the last change of the last newsletter article for each feed.

Parse the commit messages

We can:

After reviewing the existent libraries, I've found that python-semantic-release exposes the parsed data in a friendlier way. The only drawback is that it uses NamedTuples for the objects that hold the parsed data, so if we want to change one, we need to create a new one from the old.

Using an existent library, if it's well maintained, is always better than writing your own.

Extract file context from the MkDocs nav

For each file that contains changes, we need to deduce the relative position in the nav against the rest of articles. This is necessary so we're able to group changes together by category, subcategory and file later in the newsletter creation.

We can use the Deepdiff to locate the file in the nav tree, and then extract the context information from the path.

Get the changes that need to be added to the newsletter articles

For a change to be added to each feed it needs to:

  • year: Be made before the first day of the year and after the last published change in the year feed.
  • month: Be made before the first day of the month and after the last published change in the month feed.
  • week: Be made before the last Monday and after the last published change in the week feed.
  • day: Be made before today and after the last published change in the day feed.

Decision

We will:

  • Run this phase in the on_files MkDocs event to be able to have the list of articles to process.
  • Deduce the last published date from existent articles as it's the cleanest solution in terms of repository pollution.
  • Extract the last published dates before we parse the commit messages.
  • Process only the commits that are posterior to those dates.
  • Use python-semantic-release library to parse the commit messages.
  • Extract file context from the MkDocs nav
  • Get the changes that need to be added to the newsletter articles with the different feeds.

Consequences

We don't have to write the parsing logic, it's maintained by the community and it needs few changes to fulfill our needs.

Advantages:

  • We'll know between builds which changes have been already published.
  • We'll reduce the amount of commits that we parse, thus making the process more efficient.
  • We wont overwrite manual user changes in the published articles.

Disadvantages:

  • We need to code and maintain the extraction of last published date from the list of repository files.