Posted to sci.lang on 1/18/93
Hello… I’m looking for references (journal titles, specific papers, etc.)
on programmatic analysis of text. I have written a program that does
some simple analysis of texts without knowing anything specific about
the words in the text, except that most adverbs (in English) end in ‘ly’,
that ‘a’, ‘an’, and ‘the’ are articles, and so on. It also counts sentence
clauses superficially by counting commas and semicolons, averages the number
of words per sentence, etc. My goal is to run texts through the
program and produce output that can be imported into a spreadsheet for
comparison with data from other texts.
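To make the description above concrete, here is a minimal sketch in Python of the kind of surface counting I mean. It is not the actual program; the function names (surface_stats, to_csv_row), the sentence-splitting rule, and the CSV column order are all assumptions made for illustration.

```python
import csv
import io
import re

ARTICLES = {"a", "an", "the"}

def surface_stats(text):
    """Count a few surface features without any dictionary of the words."""
    # Rough heuristic: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = re.findall(r"[A-Za-z']+", text)
    lower = [w.lower() for w in words]
    # Superficial clause estimate: every comma or semicolon is assumed
    # to introduce one additional clause.
    clauses = len(sentences) + text.count(",") + text.count(";")
    return {
        "sentences": len(sentences),
        "words": len(words),
        "articles": sum(w in ARTICLES for w in lower),
        # Words ending in 'ly' are only *probably* adverbs ("early", "family").
        "ly_words": sum(w.endswith("ly") and len(w) > 3 for w in lower),
        "clauses": clauses,
        "words_per_sentence": round(len(words) / len(sentences), 2) if sentences else 0.0,
    }

def to_csv_row(name, stats):
    """Format one text's counts as a CSV line for import into a spreadsheet."""
    buf = io.StringIO()
    csv.writer(buf).writerow([name] + list(stats.values()))
    return buf.getvalue().strip()

sample = ("The dog ran quickly, and the cat slept. "
          "A bird sang loudly; it was an early riser.")
stats = surface_stats(sample)
print(to_csv_row("sample", stats))
```

Note that "early" in the sample is counted as an ‘ly’ word even though it is an adjective there, which is exactly the sort of error a rule like this makes.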
I have a working program and it does produce results that *might* be
interesting but…
1) I don’t know how much of a difference between two texts is "significant",
and…
2) I would like to include more sophisticated tests, but I can’t think
of how to do this without first classifying all the words in the text.
One approach to classification might be to use an on-line list of common
English words that are already classified by part of speech (noun, verb,
adjective, etc.). If someone knows of such a dictionary and where I could
get it, that would be very helpful also.
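As a rough sketch of how such a list might be used (in Python), assume a plain-text file with one "word<TAB>class" entry per line. That file format, the tiny SAMPLE_LEXICON below, and the function names are all hypothetical; I do not know what format a real list would come in.

```python
import re
from collections import Counter

ARTICLES = {"a", "an", "the"}

# Hypothetical stand-in for a real word list, in "word<TAB>class" form.
SAMPLE_LEXICON = "dog\tnoun\ncat\tnoun\nran\tverb\nslept\tverb\nlazy\tadjective\n"

def load_lexicon(lines):
    """Build a word -> class mapping from 'word<TAB>class' lines."""
    lexicon = {}
    for line in lines:
        parts = line.strip().split("\t")
        if len(parts) == 2:
            lexicon[parts[0].lower()] = parts[1]
    return lexicon

def classify(text, lexicon):
    """Tally word classes, falling back to surface rules for unlisted words."""
    counts = Counter()
    for word in re.findall(r"[A-Za-z']+", text.lower()):
        if word in ARTICLES:
            counts["article"] += 1
        elif word in lexicon:
            counts[lexicon[word]] += 1
        elif word.endswith("ly"):
            # Not in the list: guess from the ending, and mark it as a guess.
            counts["adverb?"] += 1
        else:
            counts["unknown"] += 1
    return counts

lexicon = load_lexicon(SAMPLE_LEXICON.splitlines())
counts = classify("The lazy dog slept soundly and the cat ran.", lexicon)
print(dict(counts))
```

This stores a single class per word, so a word that can be both noun and verb ("run", "light") would be counted under whichever class the list happens to give; handling that properly would need context, which this sketch does not attempt.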
Commercial programs like RightWriter (which I have) contain various word
and "rules" dictionaries, but these are encoded, so I cannot read them
or adapt their rules to my purposes, which are somewhat different from
those of RightWriter… Un-encoded versions of these dictionaries
would be nice, as I could use them to build my own algorithms.
Thank you for your time…
—
matthew rapaport Philosopher/Programmer At Large KD6KVH
m…@netcom.com 70371….@compuserve.com








