Mister Muv

C# - Remove duplicate lines from a text file

Dec 11 2008 4:04 PM

Hi,

I am trying to remove duplicate lines from a text file. To make things difficult, the duplicates are not identical lines: they carry different timestamps but share a unique reference number. Some references are repeated on as many as 10 lines, whereas others appear on only 2.

Here are some examples of duplicate lines, in the format <timestamp>,<reference>,<error message>:

08:47:22,95847170050,Problem inputting data.
08:48:28,96672540040,More problems inputting data.
08:49:29,95847170050,Problem inputting data.
08:55:28,106622510040,Extra issues inputting data.
08:56:35,95847170050,Problem inputting data.
08:57:35,106622510040,Extra issues inputting data.
09:02:35,96672540040,More problems inputting data.
09:03:41,96672540040,More problems inputting data.
09:04:41,106622510040,Extra issues inputting data.

I want to delete all the duplicates but KEEP the most recent line for each reference.

I am new to C#. I originally wrote a Java program to do this but was told to rewrite it in C#.

 

To assist, here is the Java code.

/*
The contents of the text file are read into an ArrayList (allData).
Unique reference values are then extracted from allData into a second ArrayList (references).
*/
 
static DateFormat df = new SimpleDateFormat("HH:mm:ss");
 
...
 
ArrayList latest = getLatestEntries(allData, references);
 
 
private static ArrayList getLatestEntries(ArrayList allData, ArrayList references) {
        // For each reference, save the latest entry.
        ArrayList list = new ArrayList();
        for(int i = 0; i < references.size(); i++) {
               String ref = references.get(i).toString();
               Date date = null;
               int maxValIndex = -1; // index in allData of the latest entry seen for ref
               for(int j = 0; j < allData.size(); j++) {
                       // Split once: fields[0] is the timestamp, fields[1] the reference.
                       String[] fields = allData.get(j).toString().split(",");
                       if(fields.length < 2 || !fields[1].equals(ref)) {
                               continue;
                       }
                       Date nextDate = parse(fields[0]);
                       if(nextDate == null) {
                               continue; // parse() returns null for an unparseable timestamp
                       }
                       if(date == null || nextDate.compareTo(date) > 0) {
                               date = nextDate;
                               maxValIndex = j;
                       }
               }
               // Add an entry only if a matching line with a valid timestamp was found.
               if(maxValIndex >= 0) {
                       list.add(allData.get(maxValIndex));
               }
        }
        return list;
} // getLatestEntries
 
private static Date parse(String s) {
        try {
               return df.parse(s);
        } catch(ParseException e) {
               System.out.println("parse error: " + e.getMessage());
               return null;
        }
} //parse
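
The same "keep the latest line per reference" logic can also be done in a single pass over the file, without first building the list of references. The following is only a minimal sketch under some assumptions: the class and method names (LatestEntries, latestPerReference) are made up for illustration, it uses java.time.LocalTime (Java 8 or later) instead of Date and SimpleDateFormat, and it assumes every timestamp is a well-formed HH:mm:ss value within a single day.

```java
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, named here for illustration only.
class LatestEntries {

    // Returns one line per reference (second field): the one with the
    // latest timestamp (first field). References keep the order in which
    // they first appear in the input.
    static List<String> latestPerReference(List<String> lines) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String line : lines) {
            String[] fields = line.split(",", 3);
            if (fields.length < 3) {
                continue; // skip lines that are not timestamp,reference,message
            }
            String ref = fields[1];
            String kept = latest.get(ref);
            // LocalTime.parse throws DateTimeParseException on a malformed
            // timestamp; this sketch assumes the timestamps are valid.
            if (kept == null
                    || LocalTime.parse(fields[0])
                            .isAfter(LocalTime.parse(kept.split(",", 3)[0]))) {
                latest.put(ref, line);
            }
        }
        return new ArrayList<>(latest.values());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "08:47:22,95847170050,Problem inputting data.",
                "08:48:28,96672540040,More problems inputting data.",
                "08:49:29,95847170050,Problem inputting data.",
                "08:55:28,106622510040,Extra issues inputting data.",
                "08:56:35,95847170050,Problem inputting data.",
                "08:57:35,106622510040,Extra issues inputting data.",
                "09:02:35,96672540040,More problems inputting data.",
                "09:03:41,96672540040,More problems inputting data.",
                "09:04:41,106622510040,Extra issues inputting data.");
        for (String line : latestPerReference(lines)) {
            System.out.println(line);
        }
        // Prints:
        // 08:56:35,95847170050,Problem inputting data.
        // 09:03:41,96672540040,More problems inputting data.
        // 09:04:41,106622510040,Extra issues inputting data.
    }
}
```

The LinkedHashMap keeps one entry per reference, so the file is scanned once rather than once per reference. When two lines for the same reference have the same timestamp, the earlier line is kept, which matches the `> 0` comparison in getLatestEntries.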

 

I know the code will be more or less similar, apart from some capitalisation changes and replacing System.out.println with Console.WriteLine, but I am struggling with the Date to DateTime conversion.

 

Can someone help?


Thank you in advance.

 


Answers (1)