Considerations for Python API documentation

Russ Allbery

Abstract

SQuaRE is currently generating Python API documentation using Sphinx plus sphinx-automodapi and autodoc_pydantic. This mostly works, but exposes a few bugs and imperfections in the Sphinx extension stack. This tech note discusses the goals for API documentation, current shortcomings, and possible alternatives.

Requirements

SQuaRE generates API documentation for two types of packages: library packages we upload to PyPI, and some web applications where API documentation may be helpful for further development. Of those cases, API documentation for libraries is more important.

We have the following requirements for API documentation:

Documentation should be generated from docstrings, rather than having to be written separately.
Documentable objects inside a module should be discovered automatically. We should not have to update a list of objects to document when we add new classes, functions, constants, etc. (Apart from the tedium, it’s also very easy to forget to do.)
Not all modules should be forced to be included in the documentation. FastAPI route handlers are generally pointless to include in API documentation (they have no useful API), and SQLAlchemy schemas usually don’t have useful autogenerated documentation. We want to be able to pick and choose by module whether it is included in the API documentation.
The generated documentation should be reasonably easy to navigate: broken into separate pages in some reasonable way, with a summary of documented objects and links to the full documentation.
Pydantic models should be documented properly with their fields, since we use Pydantic heavily.

Current API documentation approach

Currently, API documentation is built using the following stack:

Either Napoleon or numpydoc (both Sphinx extensions) preprocess a numpy-style docstring into Sphinx roles. Originally we used numpydoc, but we have started switching to Napoleon.
sphinx-autodoc-typehints preprocesses the docstring and adds type information.
sphinx-automodapi creates pages for each API object, generates a summary page linking to all of them (its own version of the standard autosummary extension), and adds inheritance diagrams to that summary page (its version of the standard inheritance_diagram extension). The page for each class includes an extracted summary of attributes and methods at the top, with links to the more detailed documentation, that is generated directly by sphinx-automodapi. It has to know about the class to generate a correct summary, which is why the summary is missing for Pydantic models. sphinx-automodapi also decides whether to generate attribute and method documentation using the standard autodoc extension, which is why there is no documentation for methods and attributes of exceptions.
autodoc and autodoc_pydantic process the pages generated by sphinx-automodapi and generate the full documentation for API objects.
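
As a rough illustration, the stack above corresponds to a Sphinx conf.py along the following lines. This is a sketch rather than SQuaRE's actual configuration: the extension module names are the ones each project publishes, while the option values are assumptions chosen for illustration.

```python
# Sketch of a conf.py for the stack described above. This is not the
# actual SQuaRE configuration; the option values are assumptions.
extensions = [
    "sphinx.ext.autodoc",  # renders the documentation for each API object
    "sphinx.ext.napoleon",  # converts NumPy-style docstrings to Sphinx markup
    "sphinx_autodoc_typehints",  # adds type information from annotations
    "sphinx_automodapi.automodapi",  # per-object pages and module summaries
    "sphinxcontrib.autodoc_pydantic",  # field-aware output for Pydantic models
]

# Have Napoleon parse NumPy-style docstrings only.
napoleon_numpy_docstring = True
napoleon_google_docstring = False

# Directory, relative to the documentation root, for generated stub pages.
automodapi_toctreedirnm = "api"
```

Napoleon is listed before sphinx-autodoc-typehints because the latter's documentation asks for that load order when the two are combined.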

Current issues

The current approach using sphinx-automodapi has several known drawbacks.

Issues with API objects

sphinx-automodapi doesn’t document methods and attributes for exceptions. Since we use rich structured exceptions for FastAPI error generation and Slack error reporting, this means substantial API documentation is missing. It’s possible to work around this by overriding the template used by sphinx-automodapi for exception objects, although the result is not quite right. There is no way to generate an accurate attribute or method summary before the detailed documentation.
dataclasses are not recognized as such (ideally, they would receive special support similar to that for Pydantic models with autodoc_pydantic). The docstrings for their attributes are recognized and picked up by the attribute summary and documentation, but they’re not associated with the constructor parameters. The constructor should either be suppressed or should use the first lines of the attribute docstrings. This can be worked around by duplicating the attribute docstrings in a Parameters section of the class docstring, but this is tedious.
The core Sphinx autodoc extension supports an exclusion list with the :inherited-members: option to exclude members inherited from specific base classes. Unfortunately, sphinx-automodapi doesn’t support the exclusion list, only the presence or absence of :inherited-members:. This means there’s no way to exclude inherited members from base classes that are not useful to document.
sphinx-automodapi doesn’t fully integrate with autodoc_pydantic, which means there’s no way to pass :inherited-members:. One has to instead override the template used for Pydantic classes. It also doesn’t generate a summary for Pydantic models, which would be useful for models that have normal instance or class methods.
sphinx-automodapi generates stub files for every documented object, but doesn’t remove them when they become obsolete, resulting in confusing Sphinx errors. This can be worked around by deleting all of the stub files before recreating the documentation.
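
The stub-file workaround in the last item could be scripted roughly as follows. This is a minimal sketch: the helper name remove_stale_stubs is hypothetical, and it assumes the stub directory contains only generated files, since everything in it is deleted.

```python
import shutil
from pathlib import Path


def remove_stale_stubs(stub_dir: Path) -> bool:
    """Delete the generated stub directory so the next build recreates it.

    Assumes ``stub_dir`` holds only generated stub files. Returns True if
    a directory was removed and False if there was nothing to remove.
    """
    if not stub_dir.is_dir():
        return False
    shutil.rmtree(stub_dir)
    return True
```

A build script would call something like remove_stale_stubs(Path("docs/api")) before running sphinx-build, where docs/api is an assumed location for the stubs.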

Issues with type information

Sphinx normally can resolve bare class references in method types and corresponding docstrings (such as the docstring for the return value) using the imports of the module in which the class occurs. Unfortunately, this doesn’t work with :inherited-members:. When the documentation of the inherited member has bare class references that would be resolved by the local imports of the module in which the source is located, they are shown as unresolved in the generated documentation and create Sphinx warnings.
The return type of a function or method has to be reiterated in the docstring to document the return value. This is an artifact of using the NumPy documentation style. If we were using the native Sphinx documentation style, sphinx-autodoc-typehints would add the :rtype: field automatically and we would only need to write a :returns: field.
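
The difference in how return values are documented can be seen by writing the same function in both styles. The function names here are hypothetical and exist only for the comparison.

```python
def count_words_numpy(text: str) -> int:
    """Count the whitespace-separated words in a string.

    Parameters
    ----------
    text
        Text to count.

    Returns
    -------
    int
        Number of words found.
    """
    # The "int" line above repeats the annotation: the NumPy format
    # needs a type line before the description of the return value.
    return len(text.split())


def count_words_sphinx(text: str) -> int:
    """Count the whitespace-separated words in a string.

    :param text: Text to count.
    :returns: Number of words found.
    """
    # No return type appears in the docstring; it would be taken from
    # the annotation when the documentation is built.
    return len(text.split())
```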

Other issues with docstrings

Exceptions in our code bases often use inheritance heavily so that code can catch the parent exception. Each child exception has its own docstring to explain the specific meaning of that exception. Unfortunately, this breaks documentation of constructor arguments, since the docstring overrides the parent class docstring and thus its Parameters section. (It may be possible to work around this by documenting the parameters in a separate docstring for the __init__ method. We haven’t yet experimented with this.)
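
The following sketch shows the problem and the untested workaround mentioned above. All class names are hypothetical. Whether Sphinx would actually render the __init__ docstring for the child class depends on how autodoc is configured to combine class and constructor docstrings, which has not been verified here.

```python
from typing import Optional


class ReportableError(Exception):
    """Base class for errors that can be reported.

    Parameters
    ----------
    message
        Human-readable description of the error.
    user
        User associated with the failing request, if known.
    """

    def __init__(self, message: str, user: Optional[str] = None) -> None:
        super().__init__(message)
        self.user = user


class TokenExpiredError(ReportableError):
    """The presented token has expired."""

    # This docstring replaces the parent's class docstring, so the
    # Parameters section above is lost for this class.


class ReportableErrorAlt(Exception):
    """Base class for errors that can be reported."""

    def __init__(self, message: str, user: Optional[str] = None) -> None:
        """Create the exception.

        Parameters
        ----------
        message
            Human-readable description of the error.
        user
            User associated with the failing request, if known.
        """
        super().__init__(message)
        self.user = user


class TokenExpiredErrorAlt(ReportableErrorAlt):
    """The presented token has expired."""

    # The inherited __init__ keeps its own docstring, so the parameter
    # documentation is still attached to this class's constructor.
```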

Formatting issues

Instance and class methods are sorted together in generated method documentation. Good Python practice is to list the class methods first, followed by the instance methods.
Method parameters without detailed documentation (which happens most often with inherited constructors) show the parameter and type and then a trailing em-dash with nothing after it. Ideally, this dash should be suppressed if there is no documentation.
Every documented object gets its own separate HTML file. In some cases, such as classes, this probably makes sense, but it’s awkward for functions and undesirable for constants or type variables.
The generated module inheritance diagrams don’t support dark mode.

Support issues

sphinx-automodapi is documented as not really being supported or actively developed, only changed as required for astropy’s internal uses. Meanwhile, autodoc and autosummary have received considerable further development, but don’t approach the problem in quite the same way.

Possible improvements

We have briefly explored a few alternative approaches to fix some of these issues. None of these have yet been explored in depth or turned into a concrete plan.

Improve sphinx-automodapi

Some of these issues could be addressed in sphinx-automodapi with a moderate amount of work:

It’s reasonably straightforward to add support for exceptions with similar method and attribute documentation by adding a template for exceptions that’s roughly the same as classes. (However, exceptions inherit some default methods that we probably do not want to include in the API documentation, so this introduces additional problems due to sphinx-automodapi’s lack of support for exclusion lists for :inherited-members:.)
Similarly, it’s fairly easy to add Pydantic model support by adding a new template, although generating a reasonable summary of methods is harder since Pydantic models should normally include inherited methods but should not include methods inherited from pydantic.BaseModel.
Adding support for exclusion lists, similar to what the core autodoc extension supports, is presumably possible by copying the code from autodoc, although the existing code is more complicated.

However, this is somewhat unappealing given the largely unmaintained state of the module. Its documentation warns that pull requests may not be reviewed in a timely fashion, for instance.

This is also not where the effort and energy in Python API documentation is currently going. The autodoc and autosummary extensions are a core part of Sphinx and are gaining new features and getting more active attention. sphinx-automodapi now largely duplicates functionality provided by other extensions, and it’s not clear that both should continue to exist.

Switch to autodoc and autosummary

It’s appealing to adopt the Sphinx core extensions instead of using a third-party extension that isn’t well-maintained.

One possible configuration would be to change the top-level API page to use a single autosummary directive with the :members: option, whose content is a list of all of the modules that should be included in the documentation. This will recursively generate documentation for the members of every module, with summaries.

This mostly works, but an experiment with Gafaelfawr uncovered a few issues.

The page structure is by module rather than by documented object. Whether this is better or worse is somewhat debatable, since the per-object pages are also awkward for things like constants, but it means the top-level page contains only a list of modules rather than the summary of the contents of each module. The summary is instead at the start of each documented object, which makes it much less useful.
Any submodules of a module are automatically included, so there’s no way to document a module without also documenting the modules beneath it. The :members: option, when applied to a module, apparently includes all modules hierarchically beneath it, with no way to change this behavior. This most obviously affects the top-level module of the library or application, which cannot be included in the documentation without including every module in the library or application.
Using one autosummary directive at the top level means there’s no way to pass configuration down to specific modules or objects. Specifically, this means that there’s no way to selectively set :inherited-members:. This may not be a serious problem, although it means the exclusion list of parent classes has to be maintained globally.
The summary of modules that include Pydantic models is wrong. It only includes the members of that module that are not Pydantic models, presumably because autodoc_pydantic uses object types that autosummary doesn’t recognize.
Inheritance diagrams are not included by default the way that they are with sphinx-automodapi. This probably just requires configuring the core inheritance diagram extension.
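
A conf.py for this approach might look roughly like the sketch below. It is not the configuration used in the Gafaelfawr experiment. The base classes in the exclusion list are assumptions chosen to match the examples in this note, and the comma-separated form of inherited-members should be checked against the installed Sphinx version.

```python
# Sketch of a conf.py for the autodoc plus autosummary approach.
# The option values are illustrative assumptions.
extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.autosummary",
    "sphinx.ext.inheritance_diagram",  # core inheritance diagram extension
    "sphinx.ext.napoleon",
    "sphinx_autodoc_typehints",
    "sphinxcontrib.autodoc_pydantic",
]

# Generate stub pages for autosummary entries during the build.
autosummary_generate = True

# Defaults applied to every autodoc directive. With a single top-level
# autosummary directive, this is where the exclusion list has to live.
autodoc_default_options = {
    "members": True,
    "show-inheritance": True,
    # Document inherited members except those from these base classes
    # (assumed list).
    "inherited-members": "BaseModel, Exception",
}
```

Because these defaults are global, every documented class receives the same exclusion list, which is the maintenance cost described in the list above.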

autodoc and autosummary do not fix the problems with inherited member documentation not resolving Python symbols properly, or with not inheriting docstrings for inherited constructor arguments. They do work better with Pydantic, but don’t have any better support for dataclasses.

Rejected options

Changing the docstring format

Currently, the standard for all Rubin projects is to use the NumPy documentation format. For newer projects, we use the core Napoleon extension to format those docstrings for Sphinx consumption.

While this mostly works well, particularly combined with sphinx-autodoc-typehints, it does add significant complexity to the documentation rendering process. Sphinx has to analyze the mini-format used for the docstring, convert that to standard Sphinx reStructuredText markup, and then reprocess it with Sphinx. It also causes a few minor problems, such as having to repeat the return type in the description of the return value due to the requirements of the NumPy documentation format. There are also some Sphinx docstring fields that have no NumPy equivalent, such as :meta private: and :meta public:.

Switching to straight Sphinx markup in docstrings would require less complexity in the documentation stack and has more straightforward behavior. It would also avoid the rare but confusing bugs where the translation from the NumPy format to Sphinx fails or generates syntactically invalid reStructuredText. Also, while there is no true standardization across Python, there does seem to be slightly more use of native reStructuredText than the other docstring formats (Google and NumPy) outside of the numpy and astropy worlds.

However, none of these reasons seem compelling at this point. The NumPy documentation format is widely used in scientific Python, is arguably more human-readable and thus easier to understand when working directly on the source code, and (probably most importantly) is already universally used in the project. It seems unlikely at present that any transition to another format would be worth the required effort.

Open questions

This is a preliminary look at the problem and would benefit from further exploration. Here are some known open questions.

Can autodoc and autosummary be configured to generate a page layout that is as navigable as what we have today with sphinx-automodapi? Having one page per module instead of one page per object may be an improvement in some cases (constants), but modules that provide a lot of symbols would produce documentation that is hard to navigate unless the pages use internal headings that would enable an in-page outline.
How hard would it be to fix autosummary to properly summarize Pydantic models documented with autodoc_pydantic?
Could we get better results by generating, committing, and then maintaining the stub pages instead of regenerating them afresh on each documentation build?
Could we build on top of sphinx-autogen and customize it for our own purposes? For example, we could mark some parts of the API as being fully automatable and others as needing local customizations and therefore exempt from being replaced by new autogenerated pages.