/*
Take an image of a page, and apply shears in X and Y to straighten
lines in the page. The shearing is a "poor man's" approximation
to rotation. The assumption is that the skewed text will still be
acceptable to the accompanying OCR algorithm.
The algorithm is to sample a path across the page, the same as drawing a
line across the page with a DDA. The nearly-horizontal line is slid
vertically over the whole page, and this is repeated for various slopes
of the line. The angle at which the scanlines cross the highest
number of white pixels is presumably the one where the line managed
to squeeze between horizontal rows of type.
Since the total number of white vs black pixels on the page remains the
same, no matter what the scanning angle, the discriminating function
has to specifically weigh long runs of white more heavily, so the
score will probably be something like (white-black) or white^2 ...
maybe some sort of root-mean-square algorithm to be determined by
experiment. Whether raw black vs white data is sufficient or whether
we need to specifically count the lengths of runs of white pixels
will be determined by experiment.
The first hack will be crude and assume 1-bit images. Later versions
may use greyscale or colour if needed. Code should accept any input
format including high-res colour - we'll probably use the 'netpbm'
package so that we can work with one standard format and use their
large range of conversion tools (notably "anytopnm") to convert
any sort of graphics file on input. During development we will
probably want immediate graphical feedback but this may be too cumbersome
under the command-line unix I use, so I'll probably generate jpg's to
view on a web page by way of feedback.
Note: we could possibly also find horizontal lines (underlines, rules
etc) with the same algorithm, looking for black rather than white runs.
Perhaps the best approach is to look for both at once. In fact the RMS test
could be extra useful here as (white-black)^2 will be positive for
an imbalance of black or white pixels.
This is fast code. Time to analyse a page is about equivalent to the
time to load and save the graphics image. No attempt has been made yet
to do bit-level optimisation, and a faster line-drawing algorithm
should speed it up by a factor of say 4 or more.
Once the angle of skew is determined, we can rotate the page by using
essentially the same algorithm; moving across the page horizontally,
following the slope of the DDA. This will give an output page of the
same width as the input page. Actually to be more precise we should
also apply a shear along the other axis, which can be done with
the very same algorithm, just rotated 90deg. In other words, instead
of just moving the horizontal scanning line up the page, we move it
up and also across a couple of pixels to account for the vertical
skew too. The skew is simply the horizontal skew rotated 90deg. This
is a fast way to do rotation; the only problem being that the resulting
bitmap will be slightly stretched in X and Y, by a few pixels. For OCR
this is not an issue. However the code could be adapted to become a
general-purpose rotator by simply adding a post-processing phase where
the bitmap is scaled in X and Y by an appropriate factor. Code to
do this (rectilinearly) already exists elsewhere (eg Acorn had
Floyd-Steinberg dithered x/y re-scaling on the ARM years ago). It is
this latter stage which slows the process of rotation down, no matter
how it is implemented; fortunately for us we don't need it.
By carefully choosing the order of processing it should be possible
to update the bitmap in place, but that is probably overkill because
we need to write out an output file anyway and can do the scanning on
the fly as we write to the output.
Note: the complexity of this algorithm is primarily x*y, however
if we scan with slopes whose slant increases by only one pixel per
attempt, we end up with a complexity of x*y*y. A much more sensible
thing to do would be to quantise the scan into say 32 fixed angles,
so that a larger bitmap does not require more passes. The accuracy
of the algorithm shouldn't be overly affected. Perhaps having found
a local maximum with this crude filter, we could fill in the gaps
by doing a +/- scan in the gaps immediately adjacent to the rougher
estimate.
*/
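The scoring idea above can be sketched already, though which discriminator wins is still the experiment to run. `score_line` is a hypothetical helper (not part of the code below); squaring the imbalance makes the score symmetric, so both all-white gaps between lines of type and all-black rules score highly:

```c
/* Hypothetical scoring sketch: square the white/black imbalance of one
   scanline.  (white-black)^2 is large both for the white gaps between
   lines of type and for black rules/underlines, as noted above. */
long score_line(int white, int black)
{
    long d = (long)white - (long)black;
    return d * d;            /* symmetric in black and white */
}
```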
#include <stdio.h>
#include <stdlib.h>
void read_image(int *maxx, int *maxy)
{
/* Read a bitmap, return its size (stub: a hard-coded 8in x 12in page
at 400dpi). Sizes are counts, ie valid coords are 0:maxx-1, 0:maxy-1 */
*maxx = 8 * 400;
*maxy = 12 * 400;
}
int samplepixel(int x, int y)
{
/* get the value of a pixel. For now, apply thresholding and return 0/1 */
return(0);
}
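A minimal sketch of the thresholding step `samplepixel` will need, assuming netpbm-style greyscale samples; `threshold` and the mid-grey cutoff are placeholders, to be tuned once real input handling exists:

```c
/* Hypothetical 1-bit threshold: treat anything above mid-grey as
   white (1), everything else as black (0).  maxval is the
   netpbm-style maximum sample value, e.g. 255 for 8-bit greyscale. */
int threshold(int grey, int maxval)
{
    return grey > maxval / 2;
}
```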
/*
The code below is a DDA trivially adapted from a line-drawing algorithm;
I have faster algorithms than this - even an anti-aliased one - but
this is rock-solid reliable code that can be trusted during the debugging
phase (and also it appears to be fast enough already!)
*/
int countwhite(int x1,int y1,int x2, int y2)
{
/* calculate a DDA from (x1,y1) to (x2,y2). */
/* instead of plotting pixels, count the white ones */
int temp,dx,dy,x,y,x_sign,y_sign,flag;
int tot = 0;
dx=abs(x2-x1); // Delta of X
dy=abs(y2-y1); // Delta of Y
if (((dx >=dy) && (x1>x2)) || // Make sure that first coordinate
((dy>dx) && (y1>y2))) // is the one with least value
{
temp=x1;
x1=x2;
x2=temp;
temp=y1;
y1=y2;
y2=temp;
}
if ((y2-y1)<0) y_sign=-1; // The direction in which the Y coord travels
else y_sign=1;
if ((x2-x1)<0) x_sign=-1; // Same for X
else x_sign=1;
if (dx>=dy) // Which one of the deltas is the greatest one
{
for (x=x1,y=y1,flag=0;x<=x2;x++,flag+=dy) // From x1 to x2
{ // Also increase the
if (flag>=dx) // Increase/decrease // flag (displacement value)
{ // y!
flag-=dx;
y+=y_sign;
}
tot += samplepixel(x,y); // Sample the pixel instead of plotting it
}
}
else
{
/* This is the same as above, just with x as y and vice versa.*/
for (x=x1,y=y1,flag=0;y<=y2;y++,flag+=dx)
{
if (flag>=dy)
{
flag-=dy;
x+=x_sign;
}
tot += samplepixel(x,y);
}
}
return(tot);
}
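Once the totals are in, picking and applying the winning shear could look like the sketch below. `best_delta` and `shear_offset` are hypothetical helpers, not yet wired into main(); `total` is assumed to be indexed from -v to v as main() sets it up:

```c
/* Hypothetical: pick the shear whose scanlines scored highest.
   'total' must be indexed from -v to v, sampled every 'step'. */
int best_delta(int *total, int v, int step)
{
    int delta, best = -v;
    for (delta = -v; delta <= v; delta += step)
        if (total[delta] > total[best])
            best = delta;
    return best;
}

/* Hypothetical: vertical offset to apply at column x when shearing
   the page, the same fraction the DDA walks along (0,0)..(maxx-1,delta). */
int shear_offset(int x, int maxx, int delta)
{
    return x * delta / (maxx - 1);
}
```

Having found the coarse winner, a finer scan at +/- one pixel of shear in the gaps either side of it would fill in the quantisation, per the note at the top of the file.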
int main(int argc, char **argv)
{
int y, v, delta, maxx, maxy;
int step;
int **score;
int *total;
read_image(&maxx, &maxy); /* get width and height of image */
/* calculate the shear in vertical pixels for a few degrees either way */
/* sin alpha = opp/hyp; for small angles y ~= x * sin alpha */
v = (maxx / 16); /* crude small-angle shear: 1/16 ~= sin 3.6deg. is it enough? */
/* NOTE: some care is taken here to get the boundary conditions right;
be very careful when tweaking this code to avoid +1 errors */
score = malloc(((v*2)+1) * sizeof(int *));
total = malloc(((v*2)+1) * sizeof(int));
if (score == NULL || total == NULL) {
fprintf(stderr, "straighten: very low on ram\n"); exit(1);
}
score += v; /* adjust origin, so have -v:v */
total += v;
step = (2*v)/32; /* 32 fixed angles. */
if (step == 0) step = 1;
for (delta = -v; delta <= v; delta += step) { /* about 32 sampled shears */
score[delta] = malloc(maxy * sizeof(int));
if (score[delta] == NULL) {
fprintf(stderr, "straighten: low on ram\n"); exit(2);
}
total[delta] = 0;
for (y = v; y < maxy-v; y++) {
/* tweaking by +/-v is so we don't run off the page */
/* count white pixels along a line from (0,y) to (maxx-1,y+delta) */
total[delta] += score[delta][y] = countwhite(0,y, maxx-1,y+delta);
}
}
/* Choose the best score from -v to v and generate the image with this
skew. With a little care we can apply the skew either in situ or as we
write the output file, and avoid a second buffer. For now just print
the totals: */
for (delta = -v; delta <= v; delta += step) {
fprintf(stdout, "total[%d] = %d\n", delta, total[delta]);
}
exit(0);
}