Regular Expressions

Extracting Text from a Larger File

4 points

 

Introduction

This exercise does not use a regular expression, but it is in the spirit of one. Your program reads in a list of words (from cummingsPoemWords.txt) that occur in a poem by the American poet e.e. cummings. It also reads in a larger file (cummingsPoemIsInThere.txt) that contains many other words, with the words of the poem embedded in it in the order in which they occur in the poem. Your job is to write a program that locates the words that belong to the poem and then prints the poem.

 

Read in the words from both files. In the loop that reads cummingsPoemIsInThere.txt, set up an inner loop that steps through each word and checks whether it occurs in cummingsPoemWords.txt. If it does, append the word to a cumulative String. When the loops terminate, print the poem to the screen.
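
The steps above can be sketched roughly as follows. This is only one possible outline, not the required solution. The class name PoemExtractorSketch and the methods toWords and extractPoem are hypothetical names chosen for illustration. The sketch also assumes that both files sit in the working directory, that the words in cummingsPoemWords.txt are separated by whitespace or line breaks, and that a word counts as a match only if it is spelled identically (including case). A StringBuilder stands in for the cumulative String.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name; the assignment itself asks you to work in regex3.java.
class PoemExtractorSketch {

    // Collects every whitespace-separated word found in the given lines.
    // Assumption: the word-list file separates words by whitespace or line breaks.
    static List<String> toWords(List<String> lines) {
        List<String> words = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }

    // Outer loop: each line of the large file. Inner loop: each word on that line.
    // A word is kept, in its original order, if it also appears in poemWords.
    static String extractPoem(List<String> poemWords, List<String> bigFileLines) {
        StringBuilder poem = new StringBuilder(); // plays the role of the cumulative String
        for (String line : bigFileLines) {
            for (String word : line.split(" ")) {
                word = word.trim();
                if (!word.isEmpty() && poemWords.contains(word)) {
                    poem.append(word).append(' ');
                }
            }
        }
        return poem.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        List<String> poemWords =
                toWords(Files.readAllLines(Paths.get("cummingsPoemWords.txt")));
        List<String> bigFileLines =
                Files.readAllLines(Paths.get("cummingsPoemIsInThere.txt"));
        System.out.println(extractPoem(poemWords, bigFileLines));
    }
}
```

Keeping the matching logic in its own method, separate from the file reading, makes it easy to try the loops on a few hand-written lines before running them on the real files.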

 

Some possibly useful Java syntax:
- After reading a line from an input file, you can break the line into separate words and store them in an array of Strings: String[] words = line.split(" "); does this, where " " indicates that a split occurs wherever a space is encountered.
- The String method trim() returns a copy of a String with any leading and trailing whitespace removed: word = word.trim();
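
As a small illustration of the two methods just listed, the sketch below splits a sample line and trims each piece. The class name SplitTrimSketch, the method wordsOf, and the sample sentence are made up for this example. Note that splitting on a single space produces empty Strings wherever two spaces occur in a row or the line starts with a space, so the sketch skips them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, shown only to demonstrate split() and trim().
class SplitTrimSketch {

    // Splits a line on single spaces and returns the trimmed, non-empty words.
    static List<String> wordsOf(String line) {
        List<String> result = new ArrayList<>();
        String[] words = line.split(" ");
        for (String word : words) {
            word = word.trim();      // drop any leading or trailing whitespace
            if (!word.isEmpty()) {   // skip the empty pieces left by extra spaces
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [the, quick, brown, fox]
        System.out.println(wordsOf("  the quick  brown fox "));
    }
}
```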

 

To Get Started

Download cummingsPoemWords.txt, which lists the words that occur in the poem in alphabetical order. Also download cummingsPoemIsInThere.txt, which has a lot of irrelevant words but also has the poem in it. Finally, download regex3.java, where you can write the program.

 

Resources

Regexr website
https://regexr.com

 

Video:
https://www.youtube.com/watch?v=rhzKDrUiJVk

 

Regular Expressions tutorial (Oracle)
https://docs.oracle.com/javase/tutorial/essential/regex/

 

Pattern class in Java API:
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/regex/Pattern.html

 

Matcher class in Java API:
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/regex/Matcher.html