{"id":444,"date":"2014-07-04T14:40:08","date_gmt":"2014-07-04T18:40:08","guid":{"rendered":"http:\/\/josephpcohen.com\/cs210-summer2014\/?p=444"},"modified":"2014-07-04T14:40:08","modified_gmt":"2014-07-04T18:40:08","slug":"hw7","status":"publish","type":"post","link":"https:\/\/josephpcohen.com\/teaching\/cs210\/hw7\/","title":{"rendered":"HW7 : Word Count"},"content":{"rendered":"<p>&nbsp;<\/p>\n<p>Due: Monday 7\/14\/14 @5:30pm via users.cs.umb.edu:cs210\/hw7<\/p>\n<p><strong>Purpose:<\/strong><br \/>\nGain experience using a Map (aka Symbol Table aka Association List)<\/p>\n<p>1. Write a program WordCount that takes a file as an argument (as args[0])<\/p>\n<p>2. Read the file and gather statistics to be printed (using Java HashMap or TreeMap)<\/p>\n<p>3. Find the total number of unique words. Add them to a Map and then count the size of the keyset.<\/p>\n<p>4. Store a count for each word and increment it every time you see that word. Hint: get() the count and then put() an updated count. Use this count to find the most and least common words.<\/p>\n<p>5. Keep track of the length of words using another Map. Use the length as a key and increment the number of words of that length. Sort the keyset for the map and then get the values when iterating over the sorted keyset. You can use new ArrayList(map.keySet()) to get an ArrayList of the keys. You can also use Collections.sort to sort the ArrayList.<\/p>\n<p>6. Write a memo.txt describing any problems you had during this assignment and what you learned. <\/p>\n<p>Notes:<br \/>\nMake sure you ignore case (Hello and hello are the same word).<br \/>\nMake sure you only count words. Words are composed of letters and &#8216;. (Edited to remove &#8211; from being in a valid word)<\/p>\n<p>Sample input files:<br \/>\nThe War in the Air by H. G. 
Wells &#8211; warair.txt<br \/>\nThe Art of War by Sun Tzu &#8211; artofwar.txt<br \/>\nRFC-2616 Hypertext Transfer Protocol (HTTP\/1.1) &#8211; rfc2616.txt<\/p>\n<p>The files needed are here: <a href=\"https:\/\/github.com\/ieee8023\/cs210-summer2014\" title=\"https:\/\/github.com\/ieee8023\/cs210-summer2014\" target=\"_blank\">https:\/\/github.com\/ieee8023\/cs210-summer2014<\/a><\/p>\n<p>Sample HashMap Usage:<\/p>\n<pre>\/\/ requires: import java.util.*;\n\n\/\/ make a new map from words to counts\nMap&lt;String, Integer&gt; map = new HashMap&lt;String, Integer&gt;();\n\n\/\/ sample string to count\nString word = \"hello\";\n\n\/\/ get the existing count, or null if the word is not in the map yet\nInteger count = map.get(word);\n\n\/\/ if null then initialize\nif (count == null)\n\tcount = 0;\n\n\/\/ add one to count\ncount++;\n\n\/\/ replace or add value\nmap.put(word, count);\n\n\/\/ get all the keys\nSet&lt;String&gt; keys = map.keySet();\n\n\/\/ make the set into a list\nList&lt;String&gt; keylist = new ArrayList&lt;String&gt;(keys);\n\n\/\/ get all the values\nCollection&lt;Integer&gt; values = map.values();\n\n\/\/ make the collection into a list\nList&lt;Integer&gt; valuelist = new ArrayList&lt;Integer&gt;(values);\n<\/pre>\n<p>To configure your scanner to produce tokens that contain only letters, use the following code:<\/p>\n<pre>Scanner in = new Scanner(file);\nin.useDelimiter(\"[ \\t\\n\\r\\f,.()\/_?!-;:&amp;%@\\\"]+\");\n<\/pre>\n<p>A sample run of the program:<\/p>\n<pre>$ java WordCount artofwar.txt \n==============\nTotal unique words: 2183\nMost common word \"the\" used 697 times\nLeast common word \"omens\" used 1 time(s)\n==============\nLength 1: 288\nLength 2: 2161\nLength 3: 2243\nLength 4: 1736\nLength 5: 1135\nLength 6: 944\nLength 7: 784\nLength 8: 538\nLength 9: 357\nLength 10: 222\nLength 11: 118\nLength 12: 61\nLength 13: 29\nLength 14: 8\nLength 15: 5\nLength 19: 1\n==============\n<\/pre>\n<p>and<\/p>\n<pre>$ java WordCount warair.txt \n==============\nTotal unique words: 10072\nMost common word \"the\" used 6503 times\nLeast common word \"encased\" used 1 time(s)\n==============\nLength 1: 4507\nLength 2: 15753\nLength 3: 23732\nLength 4: 17943\nLength 5: 
10903\nLength 6: 8211\nLength 7: 6768\nLength 8: 4752\nLength 9: 3238\nLength 10: 2157\nLength 11: 914\nLength 12: 507\nLength 13: 232\nLength 14: 106\nLength 15: 38\nLength 16: 5\nLength 17: 3\n==============\n<\/pre>\n<p><strong>Grading (total 10 points):<\/strong><\/p>\n<p>Turn in the following files: WordCount.java, memo.txt<\/p>\n<p>2 points: Total unique words<br \/>\n2 points: Most common word<br \/>\n2 points: Least common word<br \/>\n2 points: Distribution of word lengths<br \/>\n2 points: memo.txt, easy to grade.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>&nbsp; Due: Monday 7\/14\/14 @5:30pm via users.cs.umb.edu:cs210\/hw7 Purpose: Gain experience using a Map (aka Symbol Table aka Association List) 1. Write a program WordCount that takes a file as an argument (as args[0]) 2. Read the file and gather statistics to be printed (using java HashMap or TreeMap) 3. Find the total unique words. Add&#8230;  <a href=\"https:\/\/josephpcohen.com\/teaching\/cs210\/hw7\/\" class=\"more-link\" title=\"Read HW7 : Word Count\">Read more 
&raquo;<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2,1],"tags":[],"_links":{"self":[{"href":"https:\/\/josephpcohen.com\/teaching\/cs210\/wp-json\/wp\/v2\/posts\/444"}],"collection":[{"href":"https:\/\/josephpcohen.com\/teaching\/cs210\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/josephpcohen.com\/teaching\/cs210\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/josephpcohen.com\/teaching\/cs210\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/josephpcohen.com\/teaching\/cs210\/wp-json\/wp\/v2\/comments?post=444"}],"version-history":[{"count":0,"href":"https:\/\/josephpcohen.com\/teaching\/cs210\/wp-json\/wp\/v2\/posts\/444\/revisions"}],"wp:attachment":[{"href":"https:\/\/josephpcohen.com\/teaching\/cs210\/wp-json\/wp\/v2\/media?parent=444"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/josephpcohen.com\/teaching\/cs210\/wp-json\/wp\/v2\/categories?post=444"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/josephpcohen.com\/teaching\/cs210\/wp-json\/wp\/v2\/tags?post=444"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}