It’s been a journey of trial and error, but migrating the CakePHP documentation from Sphinx/RST format to the more universally friendly GitHub Flavored Markdown (GFM) has proven that sometimes the best solution isn’t the one you start with.
The Initial Challenge: Bridging Two Markup Worlds
The project began with a casual comment from a core team member about how hard the RST documentation was to maintain. Intrigued, I decided to tackle the conversion myself, despite the intricacy of the source format.
The Source of Complexity
The documentation was written in reStructuredText (RST) and built with Sphinx, the Python-based documentation generator. RST is powerful: it relies heavily on directives (like `.. toctree::`) and specialized semantic roles (such as `:ref:`, `:method:`, `:class:`, and countless others) for deep, structured linking and advanced formatting. None of these features has a direct equivalent in GFM, so the core challenge was translating this semantic richness.
The False Start: Custom PHP with Claude Code
I started with a classic “how hard can it be?” attitude and chose to build a custom PHP parser, using Claude Code to speed up development.
The Problem of Nuances
The first results from the PHP script were blazingly fast, which was a huge motivator! However, while Claude Code quickly generated the logic for simple conversions (like headings and basic code blocks), the devil was in the details. My PHP approach proved ill-suited to the complex, nested parsing that RST requires:
- Complex Backtick Situations: Differentiating between inline code roles (such as `` :method:`Class::method()` ``) and regular backticks in prose was a constant source of errors.
- Nested Content: Parsing multi-line RST directives that contained complex elements, such as code blocks nested inside note directives, caused the regex-based PHP approach to break again and again.
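To see why the backtick cases are so easy to get wrong, consider a minimal regex-based inline tokenizer. This is an illustrative Python sketch, not the original PHP code. It distinguishes three forms on a single line: a role with a target (`` :php:meth:`Class::method()` ``), a double-backtick inline literal, and plain single-backtick interpreted text. The role name `php:meth` is used only as an example of a Sphinx PHP-domain role.

```python
from __future__ import annotations

import re

# Alternation order matters: the double-backtick literal must be tried before
# the single-backtick forms, or ``x`` would be misread as two empty spans.
INLINE = re.compile(
    r"``(?P<literal>.+?)``"                      # ``inline literal``
    r"|:(?P<role>[\w:.-]+):`(?P<target>[^`]+)`"  # :role:`target`
    r"|`(?P<default>[^`]+)`"                     # `interpreted text`
)


def tokenize(line: str) -> list[tuple[str, str, str]]:
    """Return (kind, role, text) tuples for each inline markup span in one line."""
    tokens = []
    for m in INLINE.finditer(line):
        if m.group("literal") is not None:
            tokens.append(("literal", "", m.group("literal")))
        elif m.group("role") is not None:
            tokens.append(("role", m.group("role"), m.group("target")))
        else:
            tokens.append(("default", "", m.group("default")))
    return tokens


if __name__ == "__main__":
    print(tokenize("Call :php:meth:`Class::method()` or ``$x``, see `Foo`."))
    # A role whose target wraps onto the next line is silently missed:
    print(tokenize("see :ref:`the"))  # → []
```

The last call shows the limitation described above: because each line is scanned in isolation, a role whose target wraps onto the next line produces no token at all, and patching that case with more regex tends to break another one.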
This was difficult to solve and incredibly frustrating: every adjustment seemed to break the conversion somewhere else. I could have kept iterating, but instead I decided to pivot and dig into Pandoc. Why reinvent the wheel? Building a bulletproof parser for a markup language as complex as RST is a huge undertaking; it's better to rely on a tool designed specifically for the task.
The Final Solution: Pandoc, Bash, and Lua Filters 🚀
Instead of continuing with my own parser, I built on Pandoc, the venerable document converter, combined with a custom scripting layer.
Phase 2: Bash Scripts + Pandoc + Custom Lua Filters = Success
This approach proved to be the key to success.
- Pandoc: Handled the bulk of the standard RST-to-Markdown conversion with impressive reliability, relying on its mature internal parsers and Abstract Syntax Tree (AST).
- Custom Lua Filters: This was the game-changer. Pandoc has built-in support for filters written in Lua, so I wrote small, targeted filters that intercepted the AST produced by Pandoc's RST parser, allowing me to precisely translate the remaining difficult elements:
  - Mapping the complex semantic roles (like `:ref:`, `:method:`, etc.) to standard link nodes that render as GFM links.
  - Cleaning up and correctly formatting custom directives that Pandoc couldn't natively handle.
- Bash Scripts: Provided the necessary orchestration, chaining the filters and Pandoc commands together to batch-process thousands of documentation files.
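The filters in this project were written in Lua. As a language-neutral illustration of the same idea, here is a minimal Python sketch that rewrites Pandoc's JSON AST (the format produced by `pandoc -t json` and read back by `pandoc -f json`). It assumes that Pandoc's RST reader, which does not know Sphinx-specific roles, emits them as inline `Code` elements carrying an `interpreted-text` class and a `role` attribute; check this against the JSON output of your Pandoc version. The `ROLE_URLS` table, the `/api/...` paths, and the `slug` helper are hypothetical, not taken from the repository.

```python
import re

# Hypothetical mapping from RST role names to URL prefixes in the new docs.
ROLE_URLS = {
    "php:class": "/api/class/",
    "php:meth": "/api/method/",
}


def slug(text):
    """Hypothetical URL scheme: 'Class::method()' -> 'class-method'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def role_to_link(node):
    """Turn an unhandled-role Code inline into a Link, or return None."""
    (ident, classes, attrs), text = node["c"]
    role = dict(attrs).get("role")
    if "interpreted-text" not in classes or role not in ROLE_URLS:
        return None
    url = ROLE_URLS[role] + slug(text)
    # Keep the original text as inline code inside the link label.
    label = [{"t": "Code", "c": [["", [], []], text]}]
    return {"t": "Link", "c": [[ident, [], []], label, [url, ""]]}


def walk(value):
    """Recursively rebuild the AST, replacing matching Code inlines."""
    if isinstance(value, list):
        return [walk(item) for item in value]
    if isinstance(value, dict):
        if value.get("t") == "Code":
            replacement = role_to_link(value)
            if replacement is not None:
                return replacement
        return {key: walk(item) for key, item in value.items()}
    return value
```

To use it as a filter, load the document with `json.load(sys.stdin)`, pass it through `walk`, write the result with `json.dump(..., sys.stdout)`, and place the script between two Pandoc calls, e.g. `pandoc -f rst -t json in.rst | python roles.py | pandoc -f json -t gfm`. A Lua filter avoids that round trip because Pandoc runs it in-process.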
This modular, dedicated toolchain solved the complexity issues that had plagued the custom PHP solution.
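The project used Bash for orchestration; to keep the examples in one language, here is a rough Python equivalent of that batch step. It walks a source tree, mirrors its directory layout, and runs Pandoc once per `.rst` file with the real `--from`, `--to`, `--wrap`, `--lua-filter`, and `-o` options. The directory names and the filter file `filters/roles.lua` used below are placeholders, not the repository's actual layout.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(src: Path, dest: Path, filters: list) -> list:
    """Assemble the pandoc invocation for one RST file."""
    cmd = ["pandoc", "--from=rst", "--to=gfm", "--wrap=none"]
    for lua_filter in filters:
        # Pandoc applies filters in the order given on the command line.
        cmd.append(f"--lua-filter={lua_filter}")
    cmd += ["-o", str(dest), str(src)]
    return cmd


def convert_tree(src_root: Path, out_root: Path, filters: list) -> int:
    """Convert every .rst file under src_root, mirroring the directory layout."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not on PATH")
    count = 0
    for src in sorted(src_root.rglob("*.rst")):
        dest = out_root / src.relative_to(src_root).with_suffix(".md")
        dest.parent.mkdir(parents=True, exist_ok=True)
        # check=True stops the batch on the first file Pandoc fails to convert.
        subprocess.run(build_command(src, dest, filters), check=True)
        count += 1
    return count
```

For example, `convert_tree(Path("docs/en"), Path("out/en"), [Path("filters/roles.lua")])` would convert one language tree and return the number of files processed. Keeping command construction in its own function makes the Pandoc invocation easy to inspect and test without running Pandoc.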
The Final Result: josbeir/cakephp-docs-md
The complete conversion, which now forms the basis of the new official documentation, is captured in the repository: josbeir/cakephp-docs-md.
What this project does:
This repository contains the full CakePHP documentation in GFM format, providing a clear blueprint for:
- Robust Migration: A showcase of how to successfully transition deeply semantic RST documentation to GFM.
- The Power of Pandoc/Lua: A practical example of using custom Lua filters within Pandoc to solve complex, niche migration problems that simple find-and-replace scripts cannot handle.
The official, migrated documentation, which is currently a work in progress, can be viewed here: https://newbook.cakephp.org/.
This project shows how combining the right tools, even after a false start, can lead to a clean, maintainable solution that makes the fantastic CakePHP documentation more accessible to the entire open-source community.
Have you tackled a format migration using Pandoc/Lua? Share your stories below! 👇