CScan User's Guide

(a.d. 2004, by DoDi)

The CScan project implements a scanner and preprocessor for C source files, intended for use in cross compilers and other tools. The implementation has been tested with Windows.h and various GNU/gcc header files. This document describes the API and usage of CScan. Information on the general and particular operation of a C scanner and preprocessor can be found in the project description page.

The CScan project is provided as open source code. Feel free to use it in your own private or open source projects. Further copyright and license details will be specified when the project is ready for use.

Comments, suggestions, contributions and error reports are welcome (VBDis@aol.com).


The CScan API

This is a short and possibly incomplete overview of the CScan application interface. Please use it to find the corresponding declarations and descriptions in the supplied source code.

uScanC.pas (possibly to be renamed to uCScan.pas) exports the following:

function  ScanFile(const fn: string): TFile;
function  nextTokenC: eToken;
function  TokenText(const t: RPreToken): string;
function  TokenString(fFirst: boolean = False): string;

var
  ScanToken: RPreToken;
  ScanString: string; //for symbols, string literals...
  ScanSym:  TSymbol;  //for symbols
  fVerbose: boolean;  //log source lines?

uTokenC.pas defines the preprocessor token record:

type
  RPreToken = record
  eToken = (<enum>)
  eTokenAttrs = (<enum>)

uTables.pas exports:

type
  eSymbolKind = (<enum>) //subject to removal
  TSymbol = class(TDictEntry)
  TSymList = class(TDict)

var
  Symbols:  TSymList;
  StringTable:  TStringList;

uFiles.pas exports:

type
  TFile = class(TStringList)
  TFileList = class(THashList)

var
  fIncludeOnlyOnce: boolean;  //include files only once?
  Files:  TFileList;
  IncludePath:  TStringList;

function  AddIncludeDir(dir: string): integer;

uUI.pas contains the GUI interface:

type
  eLogKind = (<enum>)
  TLog = procedure(const msg: string; kind: eLogKind = lkProgress) of object;
var
  Log: TLog;


Head Start

Refer to the sample projects FlexScan (console) or WinScan (GUI) for details. A real parser for C declarations is implemented in uParseC.pas. Your parser code has to perform the following tasks:
  1. Create a stub file with your preferred settings, and #include the file to parse.
  2. Open the main source file with ScanFile(filename).
  3. Retrieve tokens with nextTokenC(), until it returns t_eof.
  4. Done!
In uParseC.pas you'll find a more illustrative sample parser (data type converter).
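
The steps above can be sketched as a small console program. This is only a minimal sketch, not the code of FlexScan: it uses just the declarations listed in the API overview, and the file name 'stub.c' is a hypothetical stub file that #includes the source to parse. Whether ScanFile signals a missing file through its result is not described here, so the result is ignored.

```pascal
program ScanDemo;  // hypothetical program name
{$APPTYPE CONSOLE}

uses
  uScanC,   // ScanFile, nextTokenC, TokenString (unit name as given above)
  uTokenC;  // eToken and its member t_eof

var
  count: integer;
begin
  // Step 2: open the main source file ('stub.c' is an assumed name).
  ScanFile('stub.c');

  // Step 3: retrieve tokens until the scanner reports end of file.
  count := 0;
  while nextTokenC <> t_eof do begin
    // fFirst = True for the very first token suppresses the leading space.
    Write(TokenString(count = 0));
    Inc(count);
  end;

  WriteLn;
  WriteLn(count, ' tokens');
end.
```

A real parser would inspect ScanToken, ScanSym and ScanString inside the loop instead of merely echoing the token text.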

Before I forget to mention this, a GUI application ($APPTYPE GUI) has to supply a callback method in the uUI.Log variable. This method will receive all diagnostic output from the scanner. See fScanLog for a possible implementation of such a method.
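
A callback of this kind might look as follows. This is a sketch under assumptions, not the fScanLog implementation: the unit name uMyLog and the class TLogCollector are hypothetical, only the parameter list is taken from the TLog declaration in uUI.pas.

```pascal
unit uMyLog;  // hypothetical unit name

interface

uses
  Classes,
  uUI;  // eLogKind, lkProgress, TLog, Log (as listed in the API overview)

type
  // Hypothetical helper class that collects all scanner messages.
  TLogCollector = class
  private
    FLines: TStringList;
  public
    constructor Create;
    destructor Destroy; override;
    // Same parameter list as uUI.TLog, so the method fits the Log variable.
    procedure Add(const msg: string; kind: eLogKind = lkProgress);
    property Lines: TStringList read FLines;
  end;

implementation

constructor TLogCollector.Create;
begin
  inherited Create;
  FLines := TStringList.Create;
end;

destructor TLogCollector.Destroy;
begin
  FLines.Free;
  inherited Destroy;
end;

procedure TLogCollector.Add(const msg: string; kind: eLogKind);
begin
  // 'kind' is ignored in this sketch; a real handler could filter by it.
  FLines.Add(msg);
end;

end.
```

Installing the handler then takes a single assignment before the first call to ScanFile, for example Log := Collector.Add with Collector being a TLogCollector instance. The instance has to stay alive as long as the scanner may call Log.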

AddIncludeDir(path)

Calling this function is no longer required; include directories can instead be specified with #pragma Include.

AddIncludeDir tells the preprocessor in which directories to search for #included header files. The directories are later searched in the order of the calls to AddIncludeDir.

Every path should end with a '\'; if it does not, a backslash is appended automatically. '/' can also be used as a directory separator, because all file names are internally normalized to lower case and '\' separators.
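
The normalization described here can be illustrated with a small stand-alone function. NormalizeDir is a hypothetical name and this is not the code from uFiles.pas, only a sketch of the stated rules: lower case, '\' separators, and a trailing backslash.

```pascal
program NormDirDemo;  // hypothetical program name
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: applies the three rules stated in the text.
function NormalizeDir(const dir: string): string;
var
  i: integer;
begin
  Result := LowerCase(dir);                 // rule 1: lower case
  for i := 1 to Length(Result) do
    if Result[i] = '/' then
      Result[i] := '\';                     // rule 2: '\' separators
  if (Result <> '') and (Result[Length(Result)] <> '\') then
    Result := Result + '\';                 // rule 3: trailing backslash
end;

begin
  WriteLn(NormalizeDir('C:/Include/GNU'));  // prints c:\include\gnu\
  WriteLn(NormalizeDir('d:\sdk\inc\'));     // prints d:\sdk\inc\
end.
```

With such a rule in place, 'C:/Include/GNU' and 'c:\include\gnu\' name the same search directory.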

ScanFile(filename)

Start the scanner for this C source file.

nextTokenC()

This function stores the next token record in the ScanToken, ScanSym and ScanString variables, and returns the token kind.

Token Record

The RPreToken record contains the fixed fields .kind and .attrs, with the token kind (enum eToken) and a set of token attributes (sTokenAttrs). The pc and len fields refer to the token text in the input file. The values in the remaining fields depend on the token kind. Please refer to uTokenC.pas for the actual values of these and all the other fields.

There exist several token classes, which you'll have to treat differently. The simple cases are:

You may wonder where the C keywords are. Keywords are not recognized by the scanner; they are returned as symbols instead. Your parser will typically have to implement its own token record or class type, together with a converter from the preprocessor token definition to your application-specific token type. In that conversion all the preprocessor symbols can be translated into application-specific identifiers, keywords, type names, and names for constants, variables or procedures. You may also have to rename identifiers that differ only in case, because C is a case-sensitive language and Pascal is not.
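
Such a converter could start from two small helpers like the following. Everything in this sketch is hypothetical and independent of the CScan units: IsCKeyword checks a symbol name against a deliberately short keyword table, and PascalName appends '_' to a name that clashes with an earlier one when case is ignored. It does not handle the corner case where the C source itself already contains the suffixed name.

```pascal
program KeywordDemo;  // hypothetical program name
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  // Only a few C keywords; a real table would list all of them.
  CKeywords: array[0..4] of string =
    ('int', 'char', 'struct', 'typedef', 'while');

var
  Seen: TStringList;  // Pascal-side names handed out so far

// Case-sensitive lookup, because in C 'while' and 'While' are different.
function IsCKeyword(const name: string): boolean;
var
  i: integer;
begin
  Result := False;
  for i := Low(CKeywords) to High(CKeywords) do
    if CKeywords[i] = name then begin
      Result := True;
      Exit;
    end;
end;

// Returns a name that stays unique when compared without regard to case.
function PascalName(const cName: string): string;
var
  i: integer;
begin
  Result := cName;
  i := Seen.IndexOf(Result);       // unsorted list: case-insensitive search
  while (i >= 0) and (Seen[i] <> Result) do begin
    Result := Result + '_';        // same letters, different case: rename
    i := Seen.IndexOf(Result);
  end;
  if i < 0 then
    Seen.Add(Result);              // first time this spelling is handed out
end;

begin
  Seen := TStringList.Create;
  try
    WriteLn(PascalName('foo'));    // prints foo
    WriteLn(PascalName('Foo'));    // prints Foo_
    WriteLn(PascalName('Foo'));    // prints Foo_ again
    if IsCKeyword('while') and not IsCKeyword('While') then
      WriteLn('keyword lookup is case sensitive');
  finally
    Seen.Free;
  end;
end.
```

In a real converter the name passed to these helpers would come from ScanString or ScanSym after nextTokenC has returned a symbol token.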

Some of the remaining token kinds are used internally by the scanner/preprocessor. You may have to recognize the following token kinds:

Some helper functions return a textual representation of a token:

function  TokenText(const t: RPreToken): string;

TokenText returns the token value as text, with string literals in the "internal" format, i.e. with possibly embedded control characters, and without quotes.

function  TokenString(fFirst: boolean = False): string;

TokenString is less useful in a parser than in a pretty printer. It returns strings in the original C format and quoting, and with a leading space if fFirst is False and the token was preceded by whitespace.

Scanner Flags

Various boolean variables or constants modify the handling of certain tokens. Please refer to the actual source code for these variables and their meaning.

The related declarations may be moved into the uTokenC unit in the next version...

Sample Applications

Two sample applications are provided, with a console (FlexScan) or graphical (WinScan) UI. You'll have to adapt the source file and search directory names to your system, in FlexScan.dpr or fScanLog.pas.

The wintest.c file contains some #defines which are required to process windows.h. These and other symbols normally are provided by the C compiler, but since CScan is not a compiler you may want to use your own standard root files. In a future CScan version it shall be possible to specify the search directories and other settings by a #pragma, then all adaptations can be made in the root file(s), without recompiling the applications. Then the root file can be used like an INI or Make file, with the option to #include further files with commonly used settings. Yes, I'm too lazy to implement command line argument handling myself ;-)

I hope that some last-minute changes don't cause trouble...

Outlook

The CScan project is still under development. Please consult the actual source code where its behaviour differs from the description above. The current preview version is supplied mainly for demonstration and debugging; please report all observed errors to VBDis@aol.com. The code was implemented and compiled with Delphi 4; please also report problems and possible solutions with FreePascal and other compilers.

The next project is the implementation of a data type converter, from C to Pascal/Delphi. This project has now become part of WinScan; the FlexScan project was not updated accordingly. During the development of this project some more missing features will be implemented in the scanner, making it a really usable library package. I plan to implement only a very rudimentary framework around the scanner myself, just sufficient to debug the scanner. Feel free to contribute your own more elaborate framework, with a handler for command line switches or options dialogs etc.

My final project will be a C to Pascal cross compiler, which may require the addition of some more features to the scanner. This cross compiler can be extended with back-ends for other languages or compilers, and possibly with front-ends for languages other than C. I don't plan to implement such extensions myself, but please contact me if you want to implement them, so that I can provide the appropriate documentation and assistance. I could imagine that the Delphi back-end can be extended into a back-end for Borland Pascal, FreePascal or VisualPascal, and I'm willing to coordinate all further development in these directions.

DoDi
(Dr. H.-P. Diettrich, February 2004)