Rationale for guidelines adoption
Why document your code (if that is not already obvious)?
Beyond enabling the sharing and reuse of your code, documenting it has the practical benefit of enabling reproducibility and verification, as well as possible extension and migration:
- “A critical barrier to reproducibility in many cases is that the computer code is [not] available.” (Peng, 2011): computational resources should facilitate the participation of all and the integration of any additional contribution,
- “Publish your code (even the small bits)” (Goodman et al., 2014): even if there is no guarantee of quality, it can still potentially contribute to new experiments and help develop/deploy more advanced in-house analysis products,
- “Freely provided working code - whatever its quality - improves programming and enables others to engage [with your research]” (Barnes, 2010): thanks to good documentation, any skilled person can modify the code to suit their needs, learn from its use and further contribute to its improvement.
Hence, good documentation is not only useful for users who run and (re)use your code; it also helps developers maintain, share, extend, and migrate this code.
As stated by Ince et al. (2011), “with some exceptions, anything less than the release of source programs is intolerable for results that depend on computation”. Ultimately, we believe that one should “provide public access to scripts, runs, and results” (Sandve et al., 2013): not only the outcomes of a given analysis, but also the processes, data and tools necessary to produce them should be open and shared. Overall, source code documentation supports these objectives.
Why adopt markdown for the documentation?
Lightweight markup languages such as markdown or AsciiDoc provide formats that are both processable by documentation generators and easily readable by human produsers (see also the comparison between languages).
| Language | Supported implementations | Output: XHTML | Output: DocBook | Output: ODF | Output: Doc | |
|---|---|---|---|---|---|---|
| AsciiDoc | Python, JavaScript, Ruby | Yes | Yes | Yes | Yes | Yes |
| markdown (and variants) | C, C#, Java, R, Python, JavaScript, Ruby, PHP, Perl, Haskell | Yes (HTML) | Yes | Yes | Yes | Yes |
| MediaWiki | PHP, Perl, Haskell | Yes | No | No | No | No |
| reStructuredText | Java, Python, Haskell | Yes (HTML,XML) | Yes | Yes | Yes | No |
For some languages, the literature may provide “consistent” examples of documentation, but these are often not generic enough and do not go beyond inline documentation (which targets the developer, not the user).
For instance, there is, to our knowledge, no documentation framework or built-in tool that is compatible with SAS (a tool like DocItOut has not been maintained since 2008).
Based on the results reported in the Wikipedia comparison of document markup languages, we preselected 4 markup languages that (i) are widely adopted and open source, (ii) enable HTML import/export (note though that Textile does not enable HTML import), and (iii) are supported (possibly through different documentation generators) by more than one programming language. In the end, markdown was adopted because:
- it is human-readable and easy to learn,
- it is common to many languages, and in particular, in view of future migration, to R through its Rmarkdown variant,
- it is supported by different documentation generators (see below).
Note that it is also important that the use of a specific documentation style (possibly associated with a given generator) does not alter the natural documentation of a language (intrinsic to the language itself). In many languages (like SAS or Stata), this is not an issue since the documentation is inserted as comments, as in the C language.
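As an illustration of the last point, here is a minimal sketch in C. The function `mean` and the layout of its comment are hypothetical and are not taken from any existing code base. The markdown-formatted documentation sits in an ordinary block comment, so the compiler ignores it and the source file remains plain C:

```c
#include <stddef.h>

/*
 * ### mean
 * Compute the arithmetic mean of an array of values.
 *
 * **Arguments**
 * - `values`: pointer to the first element,
 * - `n`: number of elements.
 *
 * **Returns**
 * The mean of the `n` values, or `0.0` when `n` is zero.
 */
double mean(const double *values, size_t n)
{
    /* An empty array has no mean; return 0.0 by convention (assumption). */
    if (n == 0) {
        return 0.0;
    }

    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sum += values[i];
    }
    return sum / (double)n;
}
```

The same comment text stays readable in a text editor and could also be extracted and rendered by a tool that understands markdown.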
Why use Doxygen as the documentation generator?
Documentation generators can be used to create portable documentation. Such tools (e.g. the well-known javadoc) generate software documentation from internal code comments.
generator | |||||||
Doc++ | Doxygen | HeaderDoc | Natural Docs | RoBODoc | Sphinx | ||
programming languages | C/C++ |
Yes | Yes | Yes | Yes (partial) | Yes | Yes |
C# |
Yes | Yes | |||||
Java |
Yes | Yes | Yes | Yes (partial) | Yes | ||
Python |
Yes | Yes | Yes (partial) | Yes | Yes | ||
JavaScript |
Yes | Yes | Yes | Yes | |||
IDL |
Yes | Yes | Yes | Yes | |||
PHP |
Yes | Yes | Yes (partial) | Yes | Yes | ||
Perl |
Yes | Yes | Yes | ||||
Ruby |
Yes | Yes (partial) | Yes | Yes | |||
SQL |
Yes | Yes (partial) | Yes | ||||
Visual Basic |
Yes (plugin) | Yes (partial) | Yes (plugin) | ||||
R |
|||||||
output types | HTML |
Yes | Yes | Yes | Yes | Yes | Yes |
XML |
Yes | Yes | Yes | ||||
DocBook |
Yes | Yes | |||||
man |
Yes | Yes | Yes | Yes | |||
RTF |
Yes | Yes | Yes | ||||
PDF/PS |
Yes | Yes | Yes | Yes | |||
LaTeX |
Yes | Yes | Yes | Yes |
Based on the results reported in the Wikipedia comparison of documentation generators, we preselected 6 documentation generators that (i) are open source, (ii) are multi-platform, i.e. run on Windows, Linux, Unix, Mac OS X and BSD operating systems (note though that HeaderDoc does not run directly on Windows), and (iii) support more than one language. Our final choice is Doxygen, in particular because it supports markdown.
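A minimal sketch of what this combination can look like in C, assuming Doxygen's Javadoc-style `/** ... */` comment blocks and its `@brief`, `@param` and `@return` commands. The function `clamp` is a hypothetical example, not part of any existing code. The bullet list, the emphasis and the backtick-quoted names inside the comment are plain markdown:

```c
/**
 * @brief Clamp a value to a closed interval.
 *
 * The result is computed as follows:
 * - if `x` is below `lo`, return `lo`;
 * - if `x` is above `hi`, return `hi`;
 * - otherwise return `x` *unchanged*.
 *
 * @param x  value to clamp
 * @param lo lower bound of the interval
 * @param hi upper bound of the interval (assumed to satisfy `hi >= lo`)
 * @return   the clamped value
 */
int clamp(int x, int lo, int hi)
{
    if (x < lo) {
        return lo;
    }
    if (x > hi) {
        return hi;
    }
    return x;
}
```

Because the documentation is an ordinary comment, the file compiles as usual whether or not a generator is ever run on it.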
- Guidelines and best practices from Write the Docs.
- Google documentation style reference and guidelines.
- A list of beautiful docs.
- Goodman A. et al. (2014): Ten simple rules for the care and feeding of scientific data, PLoS Computational Biology, 10(4):e1003542, doi:10.1371/journal.pcbi.1003542.
- Sandve G.K. et al. (2013): Ten simple rules for reproducible computational research, PLoS Computational Biology, 9(10):e1003285, doi:10.1371/journal.pcbi.1003285.
- Peng R.D. (2011): Reproducible research in computational science, Science, 334(6060):1226-1227, doi:10.1126/science.1213847.
- Ince D.C., Hatton L., and Graham-Cumming J. (2011): The case for open computer programs, Nature, 482:485-488, doi:10.1038/nature10836.
- Barnes N. (2010): Publish your computer code: it is good enough, Nature, 467:753, doi:10.1038/467753a.
- Wikipedia comparison of document markup languages.
- Wikipedia comparison of documentation generators.
- A beginner’s guide to writing documentation.
- Mastering cheatsheet and markdown quick reference.