
The Flesch Index: An Easily Programmable Readability Analysis Algorithm

JOHN TALBURT
University of Arkansas at Little Rock

This paper is an exposition of an algorithm for text analysis that can be of value to writers and documentalists. The simplicity of this algorithm allows it to be easily programmed on most computer systems. The author has successfully implemented this test as a function within a text editing system written in RPG II. Included in this paper is a sample program written for the VAX 11/780 in PL/I.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

© 1986 ACM 0-89791-186-5/86/0600/0114 $00.75

In 1949 Dr. Rudolph Flesch published a book titled "The Art of Readable Writing." In this book, he described a manual method of reading ease analysis. The method was to analyze text samples of about 100 words. Each sample is assigned a readability index based upon the average number of syllables per word and the average number of words per sentence. This index is designed so that most scores range from 0 to 100. Only college graduates are supposed to follow prose in the 0 - 30 range. Scores of 50 - 60 are high-school level and 90 - 100 should be readable by fourth graders.

Though crude, since it is designed simply to reward short words and sentences, the index is useful. It gives a basic, objective idea of how hard prose is to wade through. This test has been used by some state insurance commissions to enforce the readability of policies.

Flesch's algorithm was automated in the early 1970s by the Service Research Group of General Motors Corporation. The program, called GM-STAR (General Motors Simple Test Approach for Readability), was originally written in the BASIC language so that shop manuals could be made more readable. The key portion of the GM-STAR program is a very simple algorithm to count the number of syllables in a word. In general, the text analysis uses the following rules:

1. Periods, exclamation points, question marks, colons and semi-colons count as end-of-sentence marks.

2. Each group of continuous non-blank characters counts as a word.

3. Each vowel (a, e, i, o, u, y) in a word counts as one syllable subject to the following sub-rules:

   A. Ignore final -ES, -ED, -E (except for -LE).

   B. Words of three letters or less count as one syllable.

   C. Consecutive vowels count as one syllable.

Although there are many exceptions to these rules, it works in a remarkable number of cases.
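The three rules and the sub-rules A to C can be sketched in a few lines. The following is a minimal illustration in Python, not the paper's PL/I listing; the names `count_syllables` and `analyze` are hypothetical, and two details are assumptions: a sentence is counted whenever a word ends in a terminator, and every word is given at least one syllable.

```python
VOWELS = set("AEIOUY")
TERMINATORS = set(".!?:;")  # rule 1: end-of-sentence marks


def count_syllables(word: str) -> int:
    """Estimate the syllables in one word using rule 3 with sub-rules A-C."""
    # Keep letters only, in upper case (assumption: punctuation is not counted).
    w = "".join(ch for ch in word.upper() if ch.isalpha())
    if not w:
        return 0
    # Sub-rule B: words of three letters or less count as one syllable.
    if len(w) <= 3:
        return 1
    # Sub-rule A: ignore final -ES, -ED, and -E (except for -LE).
    if w.endswith(("ES", "ED")):
        w = w[:-2]
    elif w.endswith("E") and not w.endswith("LE"):
        w = w[:-1]
    # Sub-rule C: a run of consecutive vowels counts as one syllable.
    count = 0
    previous_was_vowel = False
    for ch in w:
        is_vowel = ch in VOWELS
        if is_vowel and not previous_was_vowel:
            count += 1
        previous_was_vowel = is_vowel
    # Assumption: a word never has fewer than one syllable.
    return max(count, 1)


def analyze(text: str):
    """Return (sentences, words, syllables) for a text sample."""
    sentences = words = syllables = 0
    for token in text.split():  # rule 2: non-blank groups are words
        words += 1
        syllables += count_syllables(token)
        # Assumption: a word ending in a terminator closes a sentence.
        if token[-1] in TERMINATORS:
            sentences += 1
    return sentences, words, syllables
```

For example, `count_syllables("MAGNIFIED")` drops the final -ED and then counts the vowel groups A, I, I. The sketch inherits the weaknesses the rules themselves have: a word whose final -ED is pronounced (such as "collected") is undercounted.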

The Flesch Index (F) for a given text sample is calculated from three statistics:

1. The total number of sentences (N),

2. The total number of words (W),

3. The total number of syllables (L),

according to the following formula:

F = 206.835 - 1.015 x (W/N) - 84.6 x (L/W).

The Grade Level Equivalent (G) of the Flesch Index is given by the following table:

If -50 ≤ F < 50, then G = (140 - F)/6.66

If 50 ≤ F < 60, then G = (93 - F)/3.33

If 60 ≤ F < 70, then G = (110 - F)/5.0

If 70 ≤ F, then G = (150 - F)/10.0
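The formula and the grade table translate directly into two small functions. This is a hedged Python sketch rather than the paper's PL/I code; the function names are hypothetical, and returning `None` for an index below -50, where the table gives no row, is an assumption.

```python
def flesch_index(sentences: int, words: int, syllables: int) -> float:
    """F = 206.835 - 1.015 x (W/N) - 84.6 x (L/W)."""
    if sentences <= 0 or words <= 0:
        raise ValueError("need at least one sentence and one word")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def grade_level(f: float):
    """Grade Level Equivalent G for a Flesch Index f, per the table."""
    if f < -50:
        # Assumption: the table has no row below -50, so report nothing.
        return None
    if f < 50:
        return (140 - f) / 6.66
    if f < 60:
        return (93 - f) / 3.33
    if f < 70:
        return (110 - f) / 5.0
    return (150 - f) / 10.0
```

Both averages appear with a negative sign, so longer sentences and longer words can only lower the index.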

A PL/I program that implements this algorithm is listed below along with sample output. For simplicity, the program assumes that all letters are in upper case. Processing text with both upper and lower case letters can be accomplished either by preprocessing the sample to translate all lower case letters to upper case, or by modifying the program to test for lower case letters as well as upper case.

There are a multitude of other refinements and amenities that can be added to the basic analysis. Among these are:

1. Noting which characters are considered sentence terminators.

2. Ignoring periods that are used in abbreviations rather than as sentence terminators.

3. Ignoring connecting hyphens in compound words.

4. Noting words which should probably be spelled out, such as numerals and dollar amounts.

5. Sharpening the syllable counting routine to detect exceptional character groups.
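Two of these refinements, translating lower case to upper case and ignoring periods that belong to abbreviations, can be handled in a preprocessing pass before the text reaches the analysis. The sketch below is an illustration in Python under stated assumptions: the function name `preprocess` and the abbreviation list are hypothetical, and a real list would have to be chosen for the documents at hand.

```python
# Hypothetical, deliberately short abbreviation list (an assumption).
ABBREVIATIONS = {"DR.", "MR.", "MRS.", "MS.", "ETC."}


def preprocess(text: str) -> str:
    """Upper-case the sample and strip periods from known abbreviations."""
    tokens = []
    for token in text.upper().split():
        if token in ABBREVIATIONS:
            # The period is part of the abbreviation, not a sentence end.
            token = token.rstrip(".")
        tokens.append(token)
    return " ".join(tokens)
```

With this pass in place, "Dr. Flesch wrote a book." reaches the sentence counter as "DR FLESCH WROTE A BOOK." and is counted as one sentence rather than two. An abbreviation that also ends a sentence would lose its terminator, so the approach trades one kind of error for a rarer one.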

$ ty outfile.dat

PROGRAM SOURCE                                                  2/27/85

GMSTAR: PROCEDURE OPTIONS(MAIN);
   /* THIS PROGRAM READS A USER CREATED TEXT FILE AND */
   /* COMPUTES THE FLESCH READABILITY INDEX.          */
   DCL TEXT_IN      FILE INPUT STREAM;
   DCL REPORT       FILE OUTPUT PRINT;
   /*** TEXT WORK VARIABLES ***/
   DCL LINE         CHAR(80);
   DCL WORD         CHAR(30);
   DCL TCHAR        CHAR(1);
   DCL SUFFIX       CHAR(2);
   DCL WORD_START   BINARY;
   DCL WORD_END     BINARY;
   DCL WORD_LEN     BINARY;
   DCL V_COUNT      BINARY;
   DCL (J,K,L)      BINARY;
   /*** FLAGS ***/
   DCL MORE_LINES   BIT(1) INIT('1'B);
   DCL MORE_WORDS   BIT(1);
   DCL SEARCH_FLAG  BIT(1);
   DCL VOWEL_FLAG   BIT(1);
   /*** STATISTICS ***/
   DCL FLESCH_INDEX FLOAT;
   DCL GRADE_LEVEL  FLOAT INIT(0);
   DCL AVG_SENT     FLOAT;
   DCL AVG_SYLL     FLOAT;
   DCL SENT_COUNT   BINARY;
   DCL WORD_COUNT   BINARY;
   DCL SYLL_COUNT   BINARY;
   /*** CHARACTER CONSTANTS ***/
   DCL VOWELS       CHAR(6) INIT('AEIOUY');
   DCL ...          CHAR(5) INIT ...

   [... listing incomplete ...]

      K = INDEX(TERMINATORS,TCHAR);
      IF K > 0 THEN DO;
         SENT_COUNT = SENT_COUNT + 1;
         WORD_END = WORD_END - 1;
      END;
      IF TCHAR = COMMA THEN WORD_END = WORD_END - 1;
   END TEST_TERM;

   /*** TEST_SUFFIX FINDS SUFFIXES -ED, ... ***/
   TEST_SUFFIX: PROCEDURE;
      SUFFIX = SUBSTR(LINE,WORD_END-1,2);
      TCHAR = SUBSTR(LINE,WORD_END,1);
      IF SUFFIX = 'ED' THEN WORD_END = WORD_END - 2;
      IF SUFFIX = 'ES' THEN WORD_END = WORD_END - 2;
      IF TCHAR = 'E' & SUFFIX ^= 'LE' THEN WORD_END = WORD_END - 1;
   END TEST_SUFFIX;

   /*** VOWEL COUNT EQUATES VOWELS AND ... ***/
   VOWEL_COUNT: PROCEDURE;
      K = WORD_START;
      VOWEL_FLAG = '0'B;
      V_COUNT = 0;
      DO WHILE (K

   [... listing incomplete ...]

      >= -50 & FLESCH_INDEX < 50 THEN
         GRADE_LEVEL = (140 - FLESCH_INDEX)/6.66;
      ELSE IF FLESCH_INDEX < 60 THEN
         GRADE_LEVEL = (93 - FLESCH_INDEX)/3.33;
      ELSE IF FLESCH_INDEX < 70 THEN
         GRADE_LEVEL = (110 - FLESCH_INDEX)/5.0;
      ELSE
         GRADE_LEVEL = (150 - FLESCH_INDEX)/10.0;
      PUT SKIP EDIT('GRADE LEVEL EQUIVALENT',GRADE_LEVEL) (R(FORM_2));
   END SUMMARY;
END GMSTAR;


$ ty report.dat

******************** SAMPLE TEXT ANALYSIS ********************

MOST PEOPLE WHO BANK WITH US ARE COURTEOUS AND FRIENDLY BUT OCCASIONALLY
A CUSTOMER MAY BE RUDE, OVERBEARING, EVEN UNKIND AND DISCOURTEOUS. WHEN
THIS SITUATION ARISES, WE CAN BEST HANDLE IT BY FIRST, KEEPING CALM. A
CUSTOMER MAY APPEAR TO BE ANGRY AT THE BANK BUT ODDS ARE HE IS ANGRY AT
SOMEONE ELSE, A MEMBER OF HIS FAMILY OR SOMEONE AT WORK OR THE CUSTOMER
MAY FEEL ILL OR SLIGHTLY UPTIGHT, SO KEEP CALM AND COLLECTED. TRY TO
FIND OUT WHAT THE PROBLEM IS. USUALLY THE PROBLEM IS VERY SMALL BUT HAS
BEEN MAGNIFIED OUT OF PROPORTION BY THE CUSTOMER. DO NOT ARGUE OR ENGAGE
IN A VERBAL BATTLE. REMEMBER, THE WORST THING WE CAN SAY TO AN IRATE
CUSTOMER IS YOU'RE WRONG. SINCERELY APOLOGIZE FOR ANY INCONVENIENCE
CAUSED. THAT'S RIGHT! APOLOGIZE EVEN IF THE CUSTOMER IS CLEARLY AT
FAULT. REMEMBER OUR JOB IS TO HELP PEOPLE EVEN THOSE WHO FEEL NEGATIVE
TOWARD US.

NUMBER OF SENTENCES               11
NUMBER OF WORDS                  155
NUMBER OF SYLLABLES              234
AVERAGE SENTENCE LENGTH         14.1
AVERAGE SYLLABLES PER WORD       1.5
FLESCH INDEX                    64.8
GRADE LEVEL EQUIVALENT           9.0
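As a check on this report, the index and grade level can be recomputed by hand from the three counts, using the paper's formula and the table row for an index between 60 and 70. Rounding to one decimal place is an assumption about how the report was printed.

```latex
\begin{align*}
F &= 206.835 - 1.015\,\frac{W}{N} - 84.6\,\frac{L}{W}
   = 206.835 - 1.015\cdot\frac{155}{11} - 84.6\cdot\frac{234}{155} \\
  &\approx 206.835 - 14.302 - 127.719 \approx 64.8, \\
G &= \frac{110 - F}{5.0} \approx \frac{110 - 64.8}{5.0} \approx 9.0 .
\end{align*}
```

Both values agree with the printed report, as do the two averages: 155/11 is about 14.1 words per sentence and 234/155 is about 1.5 syllables per word.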


REFERENCES

1. Flesch, Rudolph, "The Art of Readable Writing", Macmillan Publishing, 1949

2. General Motors Service Research Group, "GM-STAR" Computer Program, June 6, 1973

3. Consumer Reports, "A Bold Step Against too Much Fine Print", November, 1973