Regular Expressions

Positive LookAhead

4 points

 

Introduction

You can use a regular expression to match text only when it is immediately followed by some other pattern, without including that pattern in the match. This is called a "positive lookahead." In this program, you'll read in a large number of words, looking for the words that end with 'the', like 'Novemberthe'. You'll want to identify these words and print them to an output file. Positive lookahead is discussed at 14:30 in the video linked below.

 

The syntax for a positive lookahead is: (?=INF) where INF is what you're looking for at the end of the word. The 'INF' at the end of the word will not be included in the match; what is matched is what precedes INF. The leftmost thing you want to match is the beginning of the word, and for that you use the boundary matcher \\b. Then you have the word. Then, following the word, you may have a period and you may have a comma. You can use the syntax \\.? for the period. The two backslashes indicate that you are looking for an actual '.', not trying to use '.' as a special character (in a Java string literal, \\ produces the single backslash that the regex engine sees). The ? matches the preceding element if it occurs either 0 or 1 times, so it works for an element that may or may not be present.
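As a rough illustration of the syntax described above, here is a minimal sketch in Java. The class name LookaheadDemo, the method findWords, and the sample sentence are made up for this example and are not part of regex8.java. The pattern is also just one possible choice: it uses \\b after 'the' inside the lookahead to mark the end of the word, rather than spelling out the optional period and comma.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not the assignment's regex8.java.
class LookaheadDemo {
    // \\b      start of a word
    // \\w+     one or more word characters (the part that is kept)
    // (?=the\\b)  positive lookahead: the kept part must be followed by
    //             "the" and then a word boundary; "the" is not consumed
    static final Pattern ENDS_WITH_THE = Pattern.compile("\\b\\w+(?=the\\b)");

    // Returns every match in the text, i.e. each qualifying word minus its trailing "the".
    static List<String> findWords(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = ENDS_WITH_THE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Made-up sample text for illustration only.
        String sample = "other Novemberthe, the rainthe. weather";
        System.out.println(findWords(sample)); // prints [November, rain]
    }
}
```

Note that 'other' and 'weather' are skipped because 'the' is not at the end of the word, and the lone word 'the' is skipped because \\w+ needs at least one character before it.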

 

Regexr: Feel free to experiment with any of these regular expression ideas at https://regexr.com, which is also linked below.

 

So your task is to use a positive lookahead to find those words that end with 'the'. Print these words to an output file, and a well-known quote will appear.
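One way the overall task might be structured is sketched below. This is only an outline under several assumptions: the class name LookaheadFileSketch, the method extract, and the output file name output.txt are hypothetical, the pattern is one possible choice rather than the required one, and the provided regex8.java may organize its file handling differently.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the real starting point is the provided regex8.java.
class LookaheadFileSketch {
    // Assumed pattern: a word followed by "the" at the end of the word.
    private static final Pattern ENDS_WITH_THE = Pattern.compile("\\b\\w+(?=the\\b)");

    // Reads every line of the input file and writes each match,
    // separated by spaces, to the output file.
    static void extract(Path input, Path output) throws IOException {
        try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(output))) {
            for (String line : Files.readAllLines(input)) {
                Matcher m = ENDS_WITH_THE.matcher(line);
                while (m.find()) {
                    out.print(m.group() + " ");
                }
            }
            out.println();
        }
    }

    public static void main(String[] args) throws IOException {
        Path input = Paths.get("LookAhead.txt");   // input file named in the assignment
        Path output = Paths.get("output.txt");     // assumed output file name
        if (!Files.exists(input)) {
            System.err.println("Input file not found: " + input);
            return;
        }
        extract(input, output);
    }
}
```

Because the lookahead leaves 'the' out of each match, the words written to the output file read as ordinary text.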

 

To Get Started

Download LookAhead.txt, the input file. Also download regex8.java, which is the program that you'll modify.

 

Resources

Regexr website
https://regexr.com

 

Video:
https://www.youtube.com/watch?v=rhzKDrUiJVk

 

Regular Expressions tutorial (Oracle)
https://docs.oracle.com/javase/tutorial/essential/regex/

 

Pattern class in Java API:
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/regex/Pattern.html

 

Matcher class in Java API:
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/regex/Matcher.html