User:Robbat2:ChangeLog-Generation

= ChangeLog Generation =

Project to improve ChangeLog generation.

== What ==
egencache

== What's wrong ==
Very slow.

== What it does ==

 * 1) For each package, in portdir:
   * 1) Fork AsyncFunction
   * 2) Get timestamp of top commit, run:
   * 3) If older than , break
   * 4) Get the list of commits from start of history:
   * 5) For each commit:
     * 1) Get changed-files, run:
     * 2) Parse git output
     * 3) Build ChangeLog entry
   * 6) Combine ChangeLog entries in expected direction (forward or reverse)

Every fork out to git means reading all of the git data structures on disk, and while they will be cached, it's still a lot of work.

== Best case ==
This happens when there have been no changes between two runs.


 * 1) It still evaluates every single package in PortDB.
 * 2) Fork for each package.
 * 3) Call a single    (2-4 seconds per package).
 * 4)   (50ms per commit)
 * 5) Commits touching multiple package paths are evaluated MORE than once.
 * 6) This makes tree-wide and category-wide commits VERY expensive for changelogs.
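
To make the cost of repeated evaluation concrete, here is a rough sketch in Python. The package names, the sample history and the function names are hypothetical, and the 3-second per-package figure is just the midpoint of the 2-4 second range quoted above, not a measurement.

```python
from collections import defaultdict


def count_evaluations(commit_paths):
    """Return (per-package evaluations, unique commits).

    commit_paths maps a commit id to the set of package directories
    ("category/package") that the commit touched.
    """
    per_package = defaultdict(list)
    for commit, packages in commit_paths.items():
        for pkg in packages:
            per_package[pkg].append(commit)
    # The per-package approach walks each commit once per package it touches.
    evaluations = sum(len(commits) for commits in per_package.values())
    return evaluations, len(commit_paths)


def estimate_seconds(n_packages, n_commit_evals,
                     per_package_s=3.0, per_commit_s=0.05):
    """Rough runtime: fixed cost per package plus cost per commit evaluation."""
    return n_packages * per_package_s + n_commit_evals * per_commit_s


# Hypothetical history: c3 stands in for a category-wide commit.
history = {
    "c1": {"app-misc/foo"},
    "c2": {"app-misc/foo", "app-misc/bar"},
    "c3": {"app-misc/foo", "app-misc/bar", "dev-libs/baz"},
}
evaluations, unique = count_evaluations(history)
print(evaluations, unique)
print(estimate_seconds(3, evaluations))
```

Even in this tiny history, three commits cause six evaluations. The per-package term dominates when nothing has changed, and the per-commit term grows with every wide commit.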

== Data structures ==

 * For each commit:
   * metadata, message, files-changed
 * For each package:
   * List of commits
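
One possible shape for these two structures, as a minimal Python sketch. The names <code>Commit</code>, <code>CommitDB</code> and <code>package_of</code> are hypothetical, and mapping a path to its package by taking the first two path components is a simplifying assumption (it would also treat directories such as profiles/ as packages).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def package_of(path: str) -> Optional[str]:
    """Map 'category/package/file' to 'category/package' (simplified)."""
    parts = path.split("/")
    return "/".join(parts[:2]) if len(parts) >= 3 else None


@dataclass
class Commit:
    sha: str
    author: str          # metadata
    timestamp: int       # metadata: commit time, seconds since the epoch
    message: str
    files_changed: List[str] = field(default_factory=list)


@dataclass
class CommitDB:
    commits: Dict[str, Commit] = field(default_factory=dict)        # sha -> commit
    packages: Dict[str, List[str]] = field(default_factory=dict)    # package -> shas

    def add(self, commit: Commit) -> None:
        """Store the commit once and index it under every package it touches."""
        self.commits[commit.sha] = commit
        touched = {package_of(p) for p in commit.files_changed} - {None}
        for pkg in sorted(touched):
            self.packages.setdefault(pkg, []).append(commit.sha)


db = CommitDB()
db.add(Commit("abc123", "Alice", 100, "app-misc/foo: version bump",
              ["app-misc/foo/foo-2.ebuild", "app-misc/foo/Manifest",
               "eclass/example.eclass"]))
print(db.packages)
```

Each commit is stored once, and packages hold only commit hashes, so a tree-wide commit costs one entry per touched package rather than a full re-evaluation.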

== Process ==

 * 1) Data-collect phase:
   * 1) Run a SINGLE git command.
   * 2) It should be able to take the last-seen commit and output only NEW history.
   * 3) The output should make it easy to split commits.
   * 4) In the worst case, this will output the ENTIRE history; that could trivially be restricted (e.g. by date).
 * 2) Parsing phase:
   * 1) Split the output of the git command.
     * Responsibilities:
       * 1) Populate the commits database.
       * 2) Return a list of changed packages.
     * Two possible designs:
       * Pure single-threaded
         * For each commit, examine all of the data and split it out into the data structure.
       * Hybrid
         * The first part is single-threaded and just scans for the commit separator.
         * Pipe the text of each commit into a separate process that parses it and returns the data.
         * Pre-format the commit messages into the data structure (they can be re-formatted later, but it will save a lot of future compute).
 * 3) Output phase:
   * 1) Trivially multi-threaded from this point; it needs only reads from the commits data structure.
   * 2) For each changed package:
     * 1) If the most recent commit was a package deletion, delete the ChangeLog file on disk (or stash it elsewhere).
     * 2) Grab (commit-hash, changetime).
     * 3) Sort by changetime.
     * 4) Write the changelog by fetching commits in order.
     * 5) Optional:
       * Write the latest hash and changetime to the changelog, for ease of validation/processing.
       * Use it to append or prepend new entries.
       * Do not trust mtime on the changelog file.
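
The three phases could be sketched roughly as follows, using the pure single-threaded parsing design. This is an illustration under stated assumptions, not the egencache implementation: the function names, the choice of the ASCII record/unit separator bytes as delimiters, and the ChangeLog entry layout are all hypothetical. It assumes commit messages never contain those separator bytes, and it leaves out package-deletion handling (which would need file status information, e.g. from <code>--name-status</code>).

```python
import subprocess
import time
from collections import defaultdict

RS = "\x1e"  # record separator, emitted before every commit
US = "\x1f"  # unit separator, emitted between fields

# %H = commit hash, %ct = committer timestamp, %an = author name, %B = raw message
GIT_FORMAT = "%x1e%H%x1f%ct%x1f%an%x1f%B%x1f"


def collect(repo, last_seen=None):
    """Data-collect phase: one git invocation for the whole (new) history."""
    cmd = ["git", "-C", repo, "log", "--name-only", "--format=" + GIT_FORMAT]
    if last_seen:
        # Only commits reachable from HEAD but not from the last-seen commit.
        cmd.append(last_seen + "..HEAD")
    result = subprocess.run(cmd, check=True, stdout=subprocess.PIPE,
                            encoding="utf-8", errors="replace")
    return result.stdout


def package_of(path):
    """Simplified: 'category/package/file' -> 'category/package'."""
    parts = path.split("/")
    return "/".join(parts[:2]) if len(parts) >= 3 else None


def parse(output):
    """Parsing phase: fill the commits database, return changed packages."""
    commits = {}
    by_package = defaultdict(list)
    for record in output.split(RS):
        if not record.strip():
            continue
        sha, ctime, author, message, files_blob = record.split(US)
        files = [f for f in files_blob.split("\n") if f]
        commits[sha] = {"time": int(ctime), "author": author,
                        "message": message.strip(), "files": files}
        for pkg in sorted({package_of(f) for f in files} - {None}):
            by_package[pkg].append(sha)
    return commits, dict(by_package)


def render_changelog(pkg, commits, by_package, newest_first=True):
    """Output phase for one package: sort by changetime, emit entries."""
    shas = sorted(by_package[pkg], key=lambda s: commits[s]["time"],
                  reverse=newest_first)
    lines = ["# ChangeLog for " + pkg]
    for sha in shas:
        c = commits[sha]
        date = time.strftime("%Y-%m-%d", time.gmtime(c["time"]))
        summary = (c["message"].splitlines() or [""])[0]
        lines += ["", "%s; %s (%s):" % (date, c["author"], sha), "  " + summary]
    return "\n".join(lines) + "\n"


# Hand-written sample in the same layout, so the sketch runs without a repo.
sample = (
    RS + US.join(["aaa", "100", "Alice", "app-misc/foo: version bump\n", ""])
    + "\n\napp-misc/foo/foo-2.ebuild\napp-misc/foo/Manifest\n"
    + RS + US.join(["bbb", "50", "Bob", "tree-wide: fix metadata\n", ""])
    + "\n\napp-misc/foo/metadata.xml\ndev-libs/bar/metadata.xml\n"
)
commits, changed = parse(sample)
print(sorted(changed))
print(render_changelog("app-misc/foo", commits, changed))
```

Because <code>render_changelog</code> only reads from the parsed structures, calls for different packages are independent and could be handed to a worker pool. For the hybrid design, the loop body of <code>parse</code> would be the part moved into separate processes, with the split on the record separator staying single-threaded.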