Search for text contained in a text file and remove it from another text file in Java


I have a text file that is the output of a Java program which finds the frequency of people's names mentioned in multiple documents and writes them to a file (peoplenames.txt) like this:

    article1location\article1 name1:countofname1# name2:countofname2# name3:countofname3# ...
    article2location\article2 name1:countofname1# name2:countofname2# name3:countofname3# ...
    article3location\article3 name1:countofname1# name2:countofname2# name3:countofname3# ...

The names correspond to the people identified in each article, along with the frequency with which they appear in that article; there are 90,000 articles. I also have a text file (titles.lst) that contains a list of 40 different titles and abbreviations (such as Mr., Mrs., President, Sir, etc.). I want to use the list in this file to search for and remove these titles from peoplenames.txt. I am not sure how to go about this in Java, as I am new to Java, or whether I need to modify the original Java code that produced peoplenames.txt to accommodate the title removal.

My program is identifying a person such as Mr John Smith as different from John Smith, so removing the titles will give me a more accurate count of the names mentioned in the articles.

Thanks in advance for your help.

You can use regular expressions to remove all instances of the titles:

    public class Test {

        public static void main(String[] args) throws Exception {
            String s = "mr tom , ms jane";
            // \b is a word boundary, so "mr" is only removed as a whole word
            s = s.replaceAll("\\bmr\\b|\\bms\\b", "");
            System.out.println(s);
        }
    }

For the sake of explaining the comments, here is how the regular expression can be built from a list of titles:

    public static void main(String[] args) throws Exception {
        // The titles are taken from the command line; at least one is expected.
        String[] titles = args;
        String regex = "\\b" + titles[0] + "\\b";
        // Append each remaining title as another alternative.
        for (int i = 1; i < titles.length; i++) {
            regex += "|\\b" + titles[i] + "\\b";
        }

        String s = "mr tom , ms jane";
        s = s.replaceAll(regex, "");
        System.out.println(s);
    }

You can also call replaceAll repeatedly, once per title, rather than building a single regular expression. I don't know which is quicker; I would hazard a guess that it depends on the Java implementation.

    public static void main(String[] args) throws Exception {
        String[] titles = args;
        String s = "mr tom , ms jane";
        // Remove one title per pass, starting from the first one.
        for (int i = 0; i < titles.length; i++) {
            s = s.replaceAll("\\b" + titles[i] + "\\b", "");
        }
        System.out.println(s);
    }
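The snippets above work on a hard-coded string. To apply the same idea to the files from the question, something like the following sketch might work. It is not a definitive implementation: the class name `TitleStripper`, its method names, and the idea of writing to a separate output file are all assumptions made for illustration, and it assumes titles.lst holds one title per line. Each title is passed through `Pattern.quote` so that the `.` in an entry such as `mr.` is treated literally, and longer titles are tried first so that `mr.` wins over `mr`. Lookarounds are used instead of `\b` because `\b` does not match between a trailing `.` and a space.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper class; none of these names come from the original post.
class TitleStripper {

    // Builds one case-insensitive pattern matching any of the given titles.
    public static Pattern buildTitlePattern(List<String> titles) {
        String alternation = titles.stream()
                .map(String::trim)
                .filter(t -> !t.isEmpty())
                // Longest first, so "mr." is tried before "mr".
                .sorted((a, b) -> Integer.compare(b.length(), a.length()))
                // Quote each title so characters such as "." are literal.
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        if (alternation.isEmpty()) {
            return Pattern.compile("(?!)"); // never matches anything
        }
        // The title must not touch a word character on either side;
        // whitespace following the title is removed along with it.
        return Pattern.compile("(?<!\\w)(?:" + alternation + ")(?!\\w)\\s*",
                Pattern.CASE_INSENSITIVE);
    }

    // Removes every title occurrence from a single line of text.
    public static String stripTitles(Pattern titlePattern, String line) {
        return titlePattern.matcher(line).replaceAll("");
    }

    // Streams the input file line by line, which keeps memory use low
    // even when the file covers many articles.
    public static void stripFile(Path titlesFile, Path input, Path output)
            throws IOException {
        Pattern titlePattern = buildTitlePattern(Files.readAllLines(titlesFile));
        try (BufferedReader in = Files.newBufferedReader(input);
             BufferedWriter out = Files.newBufferedWriter(output)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(stripTitles(titlePattern, line));
                out.newLine();
            }
        }
    }

    public static void main(String[] args) {
        Pattern p = buildTitlePattern(Arrays.asList("mr", "mr.", "ms", "president"));
        System.out.println(stripTitles(p, "Mr. John Smith:3# President Obama:2#"));
    }
}
```

A call such as `TitleStripper.stripFile(Paths.get("titles.lst"), Paths.get("peoplenames.txt"), Paths.get("peoplenames_clean.txt"))` would then process the whole file, where the output file name is made up for this example. Note that this only removes the titles; if an article ends up listing the same name twice after stripping, the two counts would still need to be added together in a separate step.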
