Formatting automatically generated C programs

Introduction

One of the areas I work in is writing compilers that generate C code as an intermediate step (sometimes known as transpilers) for older languages such as Algol 60, Atlas Autocode, or IMP. I also write static binary translators for video games, which convert binaries from old 8-bit microcomputers into C source for recompilation, in order to modify the games or port them to more modern systems.

Sometimes the generated C code is just a step in creating a binary and remains unseen by the programmer. In other cases you want the generated C to be readable and maintainable, because the aim is to port the program to a modern system in C and to maintain the C translation from then on.

When this happens, it would be nice to reformat the generated C in a readable style, since machine-generated code can often look pretty ugly! Unfortunately the best known source code reformatter, indent, is not the best, so if you are in a similar position and looking for a good reformatter, let me show you the options.

If you're a 'TLDR' person and don't have the patience to read all this, my conclusion is that clang-format from the clang compiler package is the best simple choice currently available.

I will be including some samples of the outputs of each of these formatters on this page for comparison (when I get around to it ;-) ). Just to make them a little easier to read, they'll also be run through a 'c to html' filter. That filter does not reformat them at all; it only colourizes the output and ensures that characters such as '<' display correctly in any file hosted on this web server. You can remove the .html part from any of those links to see the raw C file.

The programs:

As well as actual formatting programs, you may need to look at utilities to remove carriage returns from source files and utilities to remove/insert tabs or convert tabs to the appropriate number of spaces. These will not be covered here. Similarly, programs exist to extract procedure headings from .c files to generate .h files, to create Makefiles, and to convert procedure definitions between K&R syntax and ANSI syntax. These also will not be covered here.

GNU indent

Indent is probably the best known of these among old-timey programmers and is almost ubiquitous: it dates back to about 1976 and has come as standard with most Unix systems since it was distributed with BSD from UC Berkeley. However, it has not been updated for some years and does not understand all the modern (C99 etc.) C constructs, or the unofficial language extensions supported by gcc (GNU cc), clang and others, which is rather ironic as it is now a GNU-maintained program. Since language translation to C frequently relies on language extensions (such as nested procedure definitions for the Algol languages), this can be a problem, especially in cases where the breakage throws off the formatting for the rest of the file.

I did submit a request a few years ago to update it to support nested procedures, but I doubt that the feature has been added yet: it is used by a tiny minority of C programmers, and I am always afraid it might be removed from GCC some day due to 'lack of interest'. (Clang does not support the same style of nested procedures, although I think clang-format might.)
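For anyone who has not met the extension, here is a minimal sketch of the kind of nested procedure that a translation from an Algol-family language tends to produce. The names sum_to and add are invented for illustration. The preprocessor guard and the plain-loop fallback are additions for this sketch only, so that the file still compiles with clang, which does not accept this syntax.

```c
#include <assert.h>

/* Sums the integers 1..n. */
int sum_to(int n)
{
    int total = 0;
    assert(n >= 0);
#if defined(__GNUC__) && !defined(__clang__) && !defined(__cplusplus)
    /* GCC extension: 'add' is defined inside sum_to and can update
       the enclosing function's local variable 'total' directly. */
    void add(int x) { total += x; }

    for (int i = 1; i <= n; i++)
        add(i);
#else
    /* Fallback for compilers without nested functions. */
    for (int i = 1; i <= n; i++)
        total += i;
#endif
    return total;
}
```

It is the function definition appearing inside another function body that a formatter has to cope with.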

You will need to tune the output of indent to your taste with a profile file named .indent.pro, which can be per-directory or global. Here are the contents of mine:
-nut -bad -dj -bap -ci1 -cli0.5 -di4 -nbc -nfc1 -i2 -npsl -nsc -br -brf -cdw -ce -sob -fca -l180 -ss --no-tabs

I can point you at two distinct sources of indent: version 2.2.9 (http://www.gtoal.com/src/c-utils/indent/indent-2.2.9/), which I found on the net, and http://www.gtoal.com/src/c-utils/indent/indent-acorn/ which is the version I used in the 1980s when I worked at Acorn Computers, and which I cleaned up considerably at the time as ANSI C was just coming into its own. I believe both of those will still compile OK on modern systems. The main repository for GNU indent may be https://www.gnu.org/software/indent/ although it appears there's a rival version called cindent at https://invisible-island.net/cindent/cindent.html

bcpp

I honestly don't know much about this program. I found a copy on my drive at home so I must have tried it once, but I don't appear to have gone on to use it after that so I'm guessing it was underwhelming. Source appears to be at https://invisible-island.net/bcpp/bcpp.html where it is described as:
bcpp indents C/C++ source programs, replacing tabs with spaces or the reverse. Unlike indent, it does (by design) not attempt to wrap long statements.
(Actually that sounds quite similar to my Algol 60 reformatter, so I can't criticize it on those grounds! Sometimes just reindenting is all you want...)

cb

cb is roughly contemporary with indent, but it is a much smaller and more lightweight program, which also means it does less. It's worth including in a project in source form as a backup formatting utility in case the user is on a system with no installed formatter at all. Source is online at http://www.gtoal.com/src/c-utils/cb.c
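One way a translator's driver program might fall back to a bundled cb is sketched below. This is only an illustration: the function names, the order of preference, and the ./cb path for a locally compiled copy are all assumptions, and it relies on system() running a POSIX shell that provides 'command -v'.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Returns nonzero if 'name' is found on the PATH.
   Assumes system() runs a POSIX shell that provides 'command -v'. */
static int on_path(const char *name)
{
    char cmd[256];
    int n;

    assert(name != NULL);
    n = snprintf(cmd, sizeof cmd, "command -v %s >/dev/null 2>&1", name);
    if (n < 0 || (size_t)n >= sizeof cmd)
        return 0;
    return system(cmd) == 0;
}

/* Picks the first installed formatter, else the bundled fallback.
   "./cb" is a hypothetical path to a copy of cb built from the
   source shipped with the project. */
const char *pick_formatter(void)
{
    static const char *const preferred[] = { "clang-format", "indent" };
    size_t i;

    for (i = 0; i < sizeof preferred / sizeof preferred[0]; i++)
        if (on_path(preferred[i]))
            return preferred[i];
    return "./cb";
}
```

The caller would still need to supply the right command-line options for whichever program comes back, since the three do not share an interface.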

I would not be in the slightest surprised if cb did not handle things like modern "//" C++ style comments.

astyle

“Artistic Style” — see https://astyle.sourceforge.net/

c-format

        Archive:  c-format.zip
        A program to format C programs under VMS
        (Compiled with VAX C v3.2, linked under VMS v5.1)
        (VMS file attributes saved---use UnZip v5.x+ to unzip)
          Length      Date    Time    Name
        ---------  ---------- -----   ----
              882  1992-12-21 19:31   aaareadme.txt
               98  1992-12-21 19:31   build.com
            18088  1992-12-21 19:31   c-format.c
             8192  1992-12-21 20:34   c-format.exe
             8082  1992-12-21 20:33   c-format.obj
        ---------                     -------
            35342                     5 files

Another short program in the style of cb.

crokus

crokus is more than just a formatter — it is a full parser of C and can output a reformatted source from the parse tree. This is potentially quite powerful but it does mean that it is unlikely to handle any C that contains parse errors or extensions not supported by crokus. There are undoubtedly several other C parsers out there which include a utility to regurgitate their parse tree as an indented source, but I haven't gone specifically looking for them — the formatters on this page (except for crokus perhaps) are primarily formatters, not parsers; the difference being that formatters make some effort to recover from syntactically incorrect input.
https://github.com/JC-LL/crokus/blob/master/README.md (My own IMP and Algol 60 (and Atlas Autocode now I think of it) formatters work on a similar principle.)
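To illustrate the distinction drawn above between a formatter and a parser, here is a minimal sketch of a reindenter that never parses anything: it only counts braces. It is not taken from any of the programs on this page; the names reindent and put are invented, and it deliberately ignores braces inside strings, character constants and comments. Because the depth is clamped at zero, a stray '}' costs nothing worse than some wrong indentation, where a parser would simply give up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Appends n bytes of s to dst at *pos, keeping dst NUL-terminated.
   Returns -1 if the result would not fit in cap bytes. */
static int put(char *dst, size_t cap, size_t *pos, const char *s, size_t n)
{
    if (*pos + n + 1 > cap)
        return -1;
    memcpy(dst + *pos, s, n);
    *pos += n;
    dst[*pos] = '\0';
    return 0;
}

/* Reindents src into dst by brace depth alone, 'width' spaces per level.
   Returns 0 on success, -1 if dst is too small. */
int reindent(const char *src, char *dst, size_t cap, int width)
{
    size_t pos = 0;
    int depth = 0;

    assert(src != NULL && dst != NULL && width >= 0);
    if (cap == 0)
        return -1;
    dst[0] = '\0';

    while (*src != '\0') {
        size_t len, i;
        int line_depth, k;

        /* Throw away whatever indentation the line already had. */
        while (*src == ' ' || *src == '\t')
            src++;
        len = strcspn(src, "\n");

        /* A line that starts with '}' closes a block, so outdent it. */
        line_depth = depth;
        if (len > 0 && src[0] == '}' && line_depth > 0)
            line_depth--;

        if (len > 0) {
            for (k = 0; k < line_depth * width; k++)
                if (put(dst, cap, &pos, " ", 1) != 0)
                    return -1;
            if (put(dst, cap, &pos, src, len) != 0)
                return -1;
        }

        /* Update the depth from every brace on the line, never below 0. */
        for (i = 0; i < len; i++) {
            if (src[i] == '{')
                depth++;
            else if (src[i] == '}' && depth > 0)
                depth--;
        }

        src += len;
        if (*src == '\n') {
            if (put(dst, cap, &pos, "\n", 1) != 0)
                return -1;
            src++;
        }
    }
    return 0;
}
```

Given the three lines "int f(){", "return 1;" and "}" with a width of 2, this indents the middle line by two spaces and leaves the other two at the margin.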

uncrustify

This is an interesting one. Although on its own it had some minor problems formatting C with gcc extensions, I was able to get it to format rather nicely by building a Frankenstein of a script that combined a couple of passes of uncrustify with programmed edits using the Ecce command-line editor. I'm not going to include my uncrustify config and associated script, because it is just too much effort for people not already familiar with Ecce to get involved with when the next formatter does a similar (indeed slightly better) job more easily.
https://github.com/uncrustify/uncrustify

clang-format

This is the winner. It just works, for everything.
On Debian-based systems such as Ubuntu it's easy to install with sudo apt-get install clang-format
Here's my ~/.clang-format file:
---
Language:        Cpp
# BasedOnStyle:  Google
AccessModifierOffset: -1
AlignAfterOpenBracket: Align
AlignConsecutiveAssignments: false
AlignConsecutiveDeclarations: false
AlignEscapedNewlines: Left
AlignOperands:   true
AlignTrailingComments: true
AllowAllParametersOfDeclarationOnNextLine: true
AllowShortBlocksOnASingleLine: false
AllowShortCaseLabelsOnASingleLine: false
AllowShortFunctionsOnASingleLine: All
AllowShortIfStatementsOnASingleLine: true
AllowShortLoopsOnASingleLine: true
AlwaysBreakAfterDefinitionReturnType: None
AlwaysBreakAfterReturnType: None
AlwaysBreakBeforeMultilineStrings: true
AlwaysBreakTemplateDeclarations: true
BinPackArguments: true
BinPackParameters: true
BraceWrapping:   
  AfterClass:      false
  AfterControlStatement: false
  AfterEnum:       false
  AfterFunction:   false
  AfterNamespace:  false
  AfterObjCDeclaration: false
  AfterStruct:     false
  AfterUnion:      false
  AfterExternBlock: false
  BeforeCatch:     false
  BeforeElse:      false
  IndentBraces:    false
  SplitEmptyFunction: true
  SplitEmptyRecord: true
  SplitEmptyNamespace: true
BreakBeforeBinaryOperators: None
BreakBeforeBraces: Attach
BreakBeforeInheritanceComma: false
BreakBeforeTernaryOperators: true
BreakConstructorInitializersBeforeComma: false
BreakConstructorInitializers: BeforeColon
BreakAfterJavaFieldAnnotations: false
BreakStringLiterals: true
# 80 was unreasonable.  0 might be better.
ColumnLimit:     120
CommentPragmas:  '^ IWYU pragma:'
CompactNamespaces: false
ConstructorInitializerAllOnOneLineOrOnePerLine: true
ConstructorInitializerIndentWidth: 4
ContinuationIndentWidth: 4
Cpp11BracedListStyle: true
DerivePointerAlignment: true
DisableFormat:   false
ExperimentalAutoDetectBinPacking: false
FixNamespaceComments: true
ForEachMacros:   
  - foreach
  - Q_FOREACH
  - BOOST_FOREACH
IncludeBlocks:   Preserve
IncludeCategories: 
  - Regex:           '^<ext/.*\.h>'
    Priority:        2
  - Regex:           '^<.*\.h>'
    Priority:        1
  - Regex:           '^<.*'
    Priority:        2
  - Regex:           '.*'
    Priority:        3
IncludeIsMainRegex: '([-_](test|unittest))?$'
IndentCaseLabels: true
IndentPPDirectives: None
IndentWidth:     2
IndentWrappedFunctionNames: false
JavaScriptQuotes: Leave
JavaScriptWrapImports: true
KeepEmptyLinesAtTheStartOfBlocks: false
MacroBlockBegin: ''
MacroBlockEnd:   ''
MaxEmptyLinesToKeep: 1
NamespaceIndentation: None
ObjCBlockIndentWidth: 2
ObjCSpaceAfterProperty: false
ObjCSpaceBeforeProtocolList: false
PenaltyBreakAssignment: 2
PenaltyBreakBeforeFirstCallParameter: 1
PenaltyBreakComment: 300
PenaltyBreakFirstLessLess: 120
PenaltyBreakString: 1000
PenaltyExcessCharacter: 1000000
PenaltyReturnTypeOnItsOwnLine: 200
PointerAlignment: Left
RawStringFormats: 
  - Delimiter:       pb
    Language:        TextProto
    BasedOnStyle:    google
ReflowComments:  false
# are you serious?  Sorting includes can break programs!
SortIncludes:    false
SortUsingDeclarations: true
SpaceAfterCStyleCast: false
SpaceAfterTemplateKeyword: true
SpaceBeforeAssignmentOperators: true
SpaceBeforeParens: ControlStatements
SpaceInEmptyParentheses: false
SpacesBeforeTrailingComments: 2
SpacesInAngles:  false
SpacesInContainerLiterals: true
SpacesInCStyleCastParentheses: false
SpacesInParentheses: false
SpacesInSquareBrackets: false
Standard:        Auto
TabWidth:        8
UseTab:          Never
...
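
To try a configuration like this one, a small and deliberately badly laid out file is handy. The sample below is invented for illustration (it is not real translator output) and imitates the label-and-goto style that machine-generated C often has. Assuming it is saved as sample.c somewhere beneath the directory holding the .clang-format file, clang-format -i sample.c should rewrite it in place, while clang-format --style=file sample.c prints the result to standard output instead.

```c
/* sample.c: deliberately ugly, in the style of machine-generated C. */
#include <assert.h>
/* Computes n! with the labels and gotos a naive translator might emit. */
int fact(int n){int r;assert(n>=0);r=1;
L1:if(n<=1)goto L2;   /* loop test */
r=r*n;n=n-1;goto L1;  /* loop body and back edge */
L2:return r;}
```

Formatting only moves whitespace around, so the function should behave identically before and after.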