Date: 2019-07-15 10:34

Part 1

In the first section of this assignment, you will write methods to perform basic text analysis on a document.

1. Count the number of words in the text. You can assume that words are separated by a space.

2. Count the number of sentences in the text. You can assume that a new sentence begins following a punctuation mark.

3. Find the 10 most frequently occurring words in the text and return the number of times they occur. This information should be written to a text file in an easy-to-read format.

4. Write a method that accepts a string as a parameter and returns true if the word is found in the text document and false otherwise.
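As a rough illustration of tasks 1, 2 and 4, the sketch below scans the text one character at a time instead of calling a method that does the whole job. The class and method names are hypothetical, and it assumes that a sentence ends at a run of '.', '!' or '?' and that a word is a run of letters; adjust these assumptions to match the assignment's expectations.

```java
// Sketch only: TextStats and its method names are hypothetical,
// not part of the assignment's starter code.
class TextStats {

    // Counts words, where a word is a run of non-whitespace characters.
    static int countWords(String text) {
        int count = 0;
        boolean inWord = false;
        for (int i = 0; i < text.length(); i++) {
            boolean isSpace = Character.isWhitespace(text.charAt(i));
            if (!isSpace && !inWord) {
                count++; // first character of a new word
            }
            inWord = !isSpace;
        }
        return count;
    }

    // Counts sentences by counting runs of '.', '!' or '?' (assumed terminators),
    // so "Wait... what?" counts as two sentences rather than four.
    static int countSentences(String text) {
        int count = 0;
        boolean inTerminator = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean isTerminator = (c == '.' || c == '!' || c == '?');
            if (isTerminator && !inTerminator) {
                count++;
            }
            inTerminator = isTerminator;
        }
        return count;
    }

    // Returns true if target occurs as a whole word (ignoring case).
    // Any non-letter character is treated as a word boundary.
    static boolean containsWord(String text, String target) {
        int start = -1; // index where the current word began, or -1
        for (int i = 0; i <= text.length(); i++) {
            boolean isLetter = i < text.length() && Character.isLetter(text.charAt(i));
            if (isLetter && start < 0) {
                start = i;
            } else if (!isLetter && start >= 0) {
                if (text.substring(start, i).equalsIgnoreCase(target)) {
                    return true;
                }
                start = -1;
            }
        }
        return false;
    }
}
```

For example, `TextStats.countWords("The little dog was wonderful!")` counts five words, and `TextStats.containsWord` finds "dog" in that sentence but not "cat".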

Here is a text file with a portion of President Barack Obama's acceptance speech. You can test your programs using this text file: PoliticalSpeech.txt

Here is the code that you can use to read in a text file from your computer. You will need to add the path in the appropriate place to identify where the text file is saved on your own computer.
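The provided starter code is not reproduced here. As a stand-in, here is a minimal sketch of one common way to read a whole text file into a single string. The class name, method name and the choice of UTF-8 are assumptions, not the assignment's own code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Sketch only: ReadTextFile is a hypothetical name, not the provided starter file.
class ReadTextFile {

    // Reads the file at 'path' line by line and joins the lines with spaces,
    // so that a line break still separates two words.
    static String readFile(String path) throws IOException {
        StringBuilder contents = new StringBuilder();
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(path), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                contents.append(line).append(' ');
            }
        }
        return contents.toString();
    }
}
```

A call such as `ReadTextFile.readFile("PoliticalSpeech.txt")` would then return the text, provided the path points to where the file is saved on your own computer.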

You may not use any of the built-in Java methods or libraries that may entirely solve any of the above problems. You can use built-in methods such as "length()" for finding the length of a string.
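To illustrate working within this restriction, the sketch below tallies word frequencies with two parallel arrays rather than a library collection, then uses a partial selection sort to bring the most frequent words to the front. All names are hypothetical, and it assumes that words are runs of letters and that case should be ignored.

```java
import java.io.IOException;
import java.io.PrintWriter;

// Sketch only: TopWords is a hypothetical helper, not part of the starter code.
class TopWords {
    // Parallel arrays: words[i] occurred counts[i] times; 'size' slots are in use.
    String[] words = new String[16];
    int[] counts = new int[16];
    int size = 0;

    // Records one occurrence of 'word', growing the arrays when they are full.
    void add(String word) {
        for (int i = 0; i < size; i++) {
            if (words[i].equals(word)) {
                counts[i]++;
                return;
            }
        }
        if (size == words.length) {
            String[] biggerWords = new String[size * 2];
            int[] biggerCounts = new int[size * 2];
            for (int i = 0; i < size; i++) {
                biggerWords[i] = words[i];
                biggerCounts[i] = counts[i];
            }
            words = biggerWords;
            counts = biggerCounts;
        }
        words[size] = word;
        counts[size] = 1;
        size++;
    }

    // Splits 'text' into runs of letters and records each one in lower case.
    void tally(String text) {
        int start = -1;
        for (int i = 0; i <= text.length(); i++) {
            boolean isLetter = i < text.length() && Character.isLetter(text.charAt(i));
            if (isLetter && start < 0) {
                start = i;
            } else if (!isLetter && start >= 0) {
                add(text.substring(start, i).toLowerCase());
                start = -1;
            }
        }
    }

    // Partial selection sort: moves the n most frequent words to the front.
    void sortTop(int n) {
        for (int i = 0; i < n && i < size; i++) {
            int best = i;
            for (int j = i + 1; j < size; j++) {
                if (counts[j] > counts[best]) {
                    best = j;
                }
            }
            String w = words[i]; words[i] = words[best]; words[best] = w;
            int c = counts[i]; counts[i] = counts[best]; counts[best] = c;
        }
    }

    // Writes the top n words to 'path', one "rank. word : count" entry per line.
    void writeTop(int n, String path) throws IOException {
        sortTop(n);
        try (PrintWriter out = new PrintWriter(path)) {
            for (int i = 0; i < n && i < size; i++) {
                out.println((i + 1) + ". " + words[i] + " : " + counts[i]);
            }
        }
    }
}
```

The linear search in `add` is slow for large texts, but it keeps the example free of library collections; whether a map is permitted under the rule above is a question for the instructor.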


Part 2

This section of the assignment is based on work that I completed as part of my Master of Science thesis at the Queen’s School of Computing under the supervision of Dr. David Skillicorn. If you’re interested in the background of this problem, here is the abstract of my thesis.

Authors of persuasive documents strive to influence the attitudes of their readers in such a manner that they are encouraged to respond in a desired way. For instance, companies produce advertisements with the goal of persuading consumers to purchase those companies' products or services. We designed a system to transform the language used in documents to either increase or decrease the evaluation, activity or potency (EPA) level of the documents. We validated the effectiveness of our system by transforming the language used in several persuasive documents and asking participants, by means of a survey, to identify whether the original version or the transformed version of the text was more positive, passive or potent. The purpose of the survey was to determine if participants are able to detect the transformations produced by our system. Participants were able to detect many of the transformations. Our survey revealed that the degree to which participants are able to detect the transformations is related to the EPA dimension of the transformation, as well as whether the transformation increased or decreased the EPA level of the text. There was no significant correlation between the gender of participants and the ability of participants to detect the transformations.

Charles Osgood rated the evaluation, potency and activity (EPA) levels of the 1000 most frequently used English words. Evaluation loads highest on the adjective pair ‘good-bad’. The ‘strong-weak’ adjective pair defines the potency factor. The ‘active-passive’ adjective pair defines the activity factor. Osgood found that the EPA dimensions were universal across cultures.

The EPA values for the 1000 most frequently used English words are published in this article:

Heise, David. (1965). Semantic differential profiles for 1,000 most frequent English words. Psychological Monographs: General and Applied. 79. 10.1037/h0093884.

Please feel free to read the article, if you wish to learn more about how these values were determined.

I have transferred the values to a “.csv” file to make it easier for you to work with the data.

Click on this link to download the CSV file.

The provided data-set contains 4 columns.

The first column is an English word.

The second column is the evaluation rating of the word.

The third column is the activity rating of the word.

The last column is the potency rating of the word.

Here is starting code that you can use to read in the CSV file and store the data in an array: ReadWordList.java

Since the values in the first column of the CSV file are Strings and the values in the remaining three columns of the CSV file are numbers, I am reading in the values from the CSV file and storing them in an array of type Object. This helps to get around the fact that we can't store Strings in an array of type Float and vice versa.

Reminder about how to work with multidimensional arrays.

The ReadWordList.java file creates a multidimensional array: Object[][] wordSet = new Object[786][4];

The array has 786 rows and 4 columns.

The element in the tenth row and the third column would be assigned and accessed by:

wordSet[9][2] = 1.5;

System.out.println("wordSet[9][2] is " + wordSet[9][2]);
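Because the array holds values of type Object, each value has to be cast back to its real type before it can be used as a String or in arithmetic. The sketch below shows one way to do that; the class name, the helper `ratingAt` and the word "example" are made up for illustration, and it assumes the ratings were stored as a numeric wrapper type such as Float or Double (check how ReadWordList.java actually stores them).

```java
// Sketch only: WordSetDemo and ratingAt are hypothetical names.
class WordSetDemo {

    // Reads a rating out of the Object array as a double.
    // Casting to Number works whether the value was stored as a Float or a Double.
    static double ratingAt(Object[][] wordSet, int row, int column) {
        return ((Number) wordSet[row][column]).doubleValue();
    }

    public static void main(String[] args) {
        Object[][] wordSet = new Object[786][4];

        wordSet[9][0] = "example"; // made-up word, for illustration only
        wordSet[9][2] = 1.5;       // the double is autoboxed to a Double

        String word = (String) wordSet[9][0];      // cast back to String
        double activity = ratingAt(wordSet, 9, 2); // cast back to a number

        System.out.println(word + " has activity rating " + activity);
    }
}
```

Without the cast, an expression such as `wordSet[9][2] + 1.0` does not compile, because the compiler only knows that the element is an Object.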

Here is sample code that demonstrates how to determine the part-of-speech of individual words: TagText.java

The TagText.java file includes a method called wordIsNoun(String currentWord) that returns true if currentWord is a noun and false otherwise. You will find it helpful to write a similar method to determine if a word is an adjective or an adverb. Recall that the POS tags for a noun are "NN", "NNS", "NNP" and "NNPS". The method that you write to determine whether a word is an adjective will be almost the same as the method for determining whether a word is a noun, except that you will need to replace the noun POS tags with the tags for adjectives. The same logic follows for writing a method to determine whether a word is an adverb. The list of POS tags can be found in the Description section of this assignment.
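The tag comparison itself can be sketched independently of the tagger. The example below assumes the tagger reports Penn Treebank tags, in which adjectives are "JJ", "JJR" and "JJS" and adverbs are "RB", "RBR" and "RBS"; verify these against the assignment's own tag list. The class and method names are hypothetical, and obtaining the tag for a word is left to the approach shown in TagText.java.

```java
// Sketch only: PosTags is a hypothetical helper. It classifies a tag string;
// it does not call the tagger itself.
class PosTags {

    // Returns true if 'tag' equals any of the candidate tags.
    private static boolean isOneOf(String tag, String... candidates) {
        for (String candidate : candidates) {
            if (candidate.equals(tag)) {
                return true;
            }
        }
        return false;
    }

    static boolean isNounTag(String tag) {
        return isOneOf(tag, "NN", "NNS", "NNP", "NNPS");
    }

    // Assumed Penn Treebank adjective tags: base, comparative, superlative.
    static boolean isAdjectiveTag(String tag) {
        return isOneOf(tag, "JJ", "JJR", "JJS");
    }

    // Assumed Penn Treebank adverb tags: base, comparative, superlative.
    // "WRB" (wh-adverbs such as "when") is deliberately left out here.
    static boolean isAdverbTag(String tag) {
        return isOneOf(tag, "RB", "RBR", "RBS");
    }
}
```

A wordIsAdjective(String currentWord) method could then tag the word the same way wordIsNoun does and pass the resulting tag to `PosTags.isAdjectiveTag`.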

The SplittingStrings.java file provides a demonstration of how to separate the individual words in a string and how to loop through each of the words in the string.

Write a program that will calculate the EPA values for a given textual document. Your program should calculate the evaluation, activity and potency levels of the text by only considering the adjectives and adverbs found in the text. That is, we will be ignoring nouns and other parts of speech, even if they are included on the Osgood wordlist. You can use the part-of-speech tagging system (Stanford POS) that you worked with in Part 1 of this assignment to determine the part of speech of each word in the text document.

Here’s an overview of the logic that your program will follow:

For each word in the text file:

  - If the word is an adjective or an adverb:

      - If the word is included on the Osgood CSV wordlist:

          - add the evaluation score of the word to the “evaluation” sum.

          - add the activity score of the word to the “activity” sum.

          - add the potency score of the word to the “potency” sum.

Once the end of the text file has been reached, return the three sums and print out a message to the console as follows. Assume that to calculate the values in the example output below, we are working with the string "The little dog was wonderful!". There is a political speech text file at the bottom of this page that you can use to test your program.

Evaluation Score: 2.82

Activity Score: 0.28

Potency Score: -4.09

Example:

Assume we are working with a text file that contains the following sentence: The little dog was wonderful!

Steps:

Create a variable called evaluationSum. This variable will store the running sum of evaluation scores.

Create a variable called activitySum. This variable will store the running sum of activity scores.

Create a variable called potencySum. This variable will store the running sum of potency scores.

Our program would loop through the set of words in the text file one-at-a-time.

1. "The" is not an adjective or an adverb; therefore, it is skipped.

2. "little" is an adjective.

  - Loop through the first (0th) column of the wordSet array to see if the word "little" exists in the CSV file. It does! It's found at row 391 in the wordSet array.

      - add the value at wordSet[391][1] to the evaluationSum

      - add the value at wordSet[391][2] to the activitySum

      - add the value at wordSet[391][3] to the potencySum

3. Repeat this process for each adjective and adverb in the text file.
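The steps above can be sketched as follows. This is an illustration under stated assumptions, not a reference solution: the class and method names are hypothetical, the part-of-speech check is passed in as a predicate so that the sketch does not depend on the tagger, and the ratings in `main` are made-up numbers rather than values from the CSV file.

```java
import java.util.function.Predicate;

// Sketch only: EpaCalculator is a hypothetical name.
class EpaCalculator {

    // Returns the row whose first column equals 'word' (ignoring case), or -1.
    static int findRow(Object[][] wordSet, String word) {
        for (int row = 0; row < wordSet.length; row++) {
            if (word.equalsIgnoreCase((String) wordSet[row][0])) {
                return row;
            }
        }
        return -1;
    }

    // Returns {evaluationSum, activitySum, potencySum} for the given text.
    // 'isAdjectiveOrAdverb' stands in for the POS-tagger check.
    static double[] score(String text, Object[][] wordSet,
                          Predicate<String> isAdjectiveOrAdverb) {
        double[] sums = new double[3];
        for (String token : text.split("\\s+")) {
            // Strip punctuation such as the '!' in "wonderful!".
            String word = token.replaceAll("[^A-Za-z']", "");
            if (word.isEmpty() || !isAdjectiveOrAdverb.test(word)) {
                continue; // skip nouns, verbs, and so on
            }
            int row = findRow(wordSet, word);
            if (row < 0) {
                continue; // not on the word list: ignore it
            }
            for (int k = 0; k < 3; k++) {
                // Columns 1..3 hold evaluation, activity and potency.
                sums[k] += ((Number) wordSet[row][k + 1]).doubleValue();
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        // Made-up ratings for illustration; NOT the values from the CSV file.
        Object[][] wordSet = {
            {"little", 0.5, 0.25, -1.0},
            {"wonderful", 2.0, 0.5, 1.0},
        };
        // Stub predicate standing in for the tagger.
        Predicate<String> stub = w -> w.equals("little") || w.equals("wonderful");

        double[] sums = score("The little dog was wonderful!", wordSet, stub);
        System.out.println("Evaluation Score: " + sums[0]);
        System.out.println("Activity Score: " + sums[1]);
        System.out.println("Potency Score: " + sums[2]);
    }
}
```

In a real solution the stub predicate would be replaced by a call that tags the word and checks for adjective or adverb tags, and `wordSet` would be the array filled by ReadWordList.java.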

Here is a text file with a portion of President Barack Obama's acceptance speech. Test your program by computing the evaluation, activity and potency score of this text: PoliticalSpeech.txt

The reason for considering only adjectives and adverbs is that these words carry the most emotional power. If a word from the text document is not included in the EPA word-list of the 1000 most frequently used English words, you can simply ignore the word and not include it in the EPA calculation.

Part 3

In this part of the assignment, you will code a game that I invented called WordMania. This is a game that is played between two players – let’s call these players the “reader” and the “selector”.

Here’s how it works:

There is a story in which some of the words are replaced by part-of-speech placeholders. For example, the story may be presented as follows:

Once upon a time, there was a %noun% and a %noun%. It was a %adjective% day when the %noun% %verb% to Kingston.

The reader will ask the selector to provide words for each of the part-of-speech “blanks”. In this case, the selector will be asked to provide 3 nouns, an adjective and a verb. The reader will then substitute each of the part-of-speech placeholders with the actual word chosen by the selector. When all the part-of-speech placeholders are replaced, the stories usually turn out to be quite humorous!

Reader: Please provide a noun

Selector: apple

Reader: Please provide a noun

Selector: boat

Reader: Please provide an adjective

Selector: spooky

Reader: Please provide a noun

Selector: umbrella

Reader: Please provide a verb

Selector: ran

Reader: Okay – once upon a time, there was an apple and a boat. It was a spooky day when the umbrella ran to Kingston.

Keep in mind that when the selector is choosing the words that will be used when replacing the placeholders, the selector has NOT seen the text. The user is presented with the text after they have provided words to replace all of the part-of-speech placeholders. The program that you are writing will simulate the READER. The user of your program will be the SELECTOR.

Here is a sample story that you can use: My Day at the Zoo

Additional Notes:

Your program will simulate the “reader”. That is, your program will prompt the user to provide a word for each of the part-of-speech placeholders in the story.

Your program will parse the text file by examining each word. If the current word is a part-of-speech placeholder, your program will prompt the user to enter a word. You can assume that all part-of-speech placeholders will be of the form: %POS%. For example, %noun% indicates that this specific part-of-speech placeholder should be replaced by a noun.
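A minimal sketch of recognising a %POS% placeholder in a single word of the story is shown below. The names are hypothetical, and it assumes that a '%' character only ever appears in the story as part of a placeholder pair, possibly with punctuation attached (as in "%noun%.").

```java
// Sketch only: Placeholders is a hypothetical helper class.
class Placeholders {

    // Returns the part of speech named by a token such as "%noun%" or "%noun%.",
    // or null if the token does not contain a %POS% placeholder.
    static String placeholderType(String token) {
        int first = token.indexOf('%');
        int last = token.lastIndexOf('%');
        if (first < 0 || last - first < 2) {
            return null; // no '%', a single '%', or an empty "%%"
        }
        return token.substring(first + 1, last);
    }

    // Replaces the placeholder in 'token' with 'replacement',
    // keeping any punctuation before or after it.
    // Only call this when placeholderType(token) is not null.
    static String fill(String token, String replacement) {
        int first = token.indexOf('%');
        int last = token.lastIndexOf('%');
        return token.substring(0, first) + replacement + token.substring(last + 1);
    }
}
```

For instance, `Placeholders.placeholderType("%noun%.")` yields "noun", and `Placeholders.fill("%noun%.", "apple")` yields "apple." with the full stop preserved.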

You will need to verify that the user enters a word that is of the correct part-of-speech using the part-of-speech tagger that we used in Part 2. For instance, if the part-of-speech placeholder is %noun% and the user enters the word “huge”, you will print out a message such as “Incorrect Part of Speech Supplied” or some other meaningful message and prompt the user to re-enter the word. If you’re familiar with exceptions, you may use exceptions, but this is not required. A simple console message will do for now.
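The re-prompting behaviour can be sketched as a simple loop. In this illustration the part-of-speech check is passed in as a hypothetical `BiPredicate` taking the word and the expected part of speech, which a real solution would back with the tagger; the class and method names are also assumptions.

```java
import java.util.Scanner;
import java.util.function.BiPredicate;

// Sketch only: WordPrompt is a hypothetical helper class.
class WordPrompt {

    // Keeps asking until the user supplies a word that passes the check.
    // 'hasPartOfSpeech' receives (word, pos) and stands in for the tagger.
    // Scanner.nextLine() throws NoSuchElementException if the input runs out.
    static String promptForWord(String pos, Scanner in,
                                BiPredicate<String, String> hasPartOfSpeech) {
        // Use "an" before a vowel, as in "Please provide an adjective".
        boolean startsWithVowel =
            "aeiou".indexOf(Character.toLowerCase(pos.charAt(0))) >= 0;
        String article = startsWithVowel ? "an" : "a";

        while (true) {
            System.out.println("Please provide " + article + " " + pos);
            String word = in.nextLine().trim();
            if (hasPartOfSpeech.test(word, pos)) {
                return word;
            }
            System.out.println("Incorrect Part of Speech Supplied");
        }
    }
}
```

A program would typically create one `new Scanner(System.in)` and reuse it for every placeholder, calling `promptForWord` once for each %POS% token it encounters.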

