Comparing .txt files and getting the difference [duplicate] - c#

This question already has answers here:
Get the differences of two files
(3 answers)
Closed 6 years ago.
I'm trying to compare two text files, a.txt and b.txt, and get the difference between the two.
a.txt is the result from yesterday.
b.txt is the current result.
The tricky part is that I want to find out what is missing in "b.txt" compared to "a.txt". Something new might have been added to "b.txt", and these new entries need to be excluded.
The two files are not ordered, so what is at index 1 in 'a.txt' can be at index 2 in 'b.txt'. I'm comparing strings like "mano - mathias rønnow nørtoft".
Everything I have tried ends up displaying the new entries as well.
What I've tried:
string[] File1Lines = File.ReadAllLines(path);
string[] File2Lines = File.ReadAllLines(newPath);
List<string> NewLines = new List<string>();
for (int lineNo = 0; lineNo<File1Lines.Length; lineNo++)
{
if (!String.IsNullOrEmpty(File1Lines[lineNo]) && !String.IsNullOrEmpty(File2Lines[lineNo]))
{
if(String.Compare(File1Lines[lineNo], File2Lines[lineNo]) != 0)
NewLines.Add(File2Lines[lineNo]) ;
}
else if (!String.IsNullOrEmpty(File1Lines[lineNo]))
{
}
else
{
NewLines.Add(File2Lines[lineNo]);
}
}
if (NewLines.Count > 0)
{
File.WriteAllLines(resultpath, NewLines);
}
This just gives me the two files merged. I hope I've explained myself clearly.
I also tried this. Why does it not work? It displays nothing.
List<string> a = File.ReadAllLines(path).ToList();
List<string> b = File.ReadAllLines(newPath).ToList();
List<string> copy = new List<string>(a);
foreach (string s in copy)
{
if (b.Contains(s))
{
a.Remove(s);
}
else
{
continue;
}
}
myWriter.WriteLine(a);

That really depends on how accurate and how fast you want the diff to be.
An easy implementation is to read all lines of both A and B and, for each line in A, if B contains that line, remove it once from both A and B. What's left in A are the lines that are in A but not in B, and vice versa.
Note that this method does not take ordering into consideration, so
Log 1
C
B
A
and
Log 2
A
B
C
are considered identical.
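List<string> A = File.ReadAllLines(path).ToList();
List<string> B = File.ReadAllLines(newPath).ToList();
// Iterate over a copy, because A is modified inside the loop.
List<string> aCopy = new List<string>(A);
foreach (string s in aCopy)
{
    if (B.Contains(s))
    {
        A.Remove(s);
        B.Remove(s);
    }
}
// What remains in A is missing from B.
// What remains in B is missing from A.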
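If only the lines missing from the newer file are needed, a set lookup is a possible alternative to repeated List.Contains calls. The following is a minimal sketch, not the answer's method; the class name LineDiff and the method name MissingFrom are made up for illustration. It treats the newer file as a set, so a line that occurs twice in the old file and once in the new one counts as present.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LineDiff
{
    // Returns the lines of 'oldLines' that do not occur anywhere in 'newLines'.
    // Ordering is irrelevant, and lines that exist only in 'newLines' are ignored.
    public static List<string> MissingFrom(IEnumerable<string> oldLines, IEnumerable<string> newLines)
    {
        // A HashSet gives constant-time lookups instead of scanning a list per line.
        var current = new HashSet<string>(newLines, StringComparer.Ordinal);
        return oldLines.Where(line => !current.Contains(line)).ToList();
    }
}
```

With the variables from the question, this could be called as File.WriteAllLines(resultpath, LineDiff.MissingFrom(File.ReadAllLines(path), File.ReadAllLines(newPath))).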

You can join the two files, sort the result, and remove the lines that appear in both with a regex:
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;
class Program
{
    static void Main()
    {
        // Tag every line of a.txt with a trailing "a".
        string strFile4xf = File.ReadAllText(@"a.txt");
        strFile4xf = Regex.Replace(strFile4xf, @"(.*?)\r", "$1a\r");
        File.WriteAllText(@"a1.txt", strFile4xf);
        // Tag every line of b.txt with a trailing "b".
        string strFile4xe = File.ReadAllText(@"b.txt");
        strFile4xe = Regex.Replace(strFile4xe, @"(.*?)\r", "$1b\r");
        File.WriteAllText(@"b1.txt", strFile4xe);
        // Join the two tagged files and sort the lines.
        string s4 = File.ReadAllText(@"a1.txt");
        string s2 = File.ReadAllText(@"b1.txt");
        string sn = string.Concat(s4, s2);
        File.WriteAllText(@"join.txt", sn);
        var contents = File.ReadAllLines("join.txt");
        Array.Sort(contents);
        File.WriteAllLines("join.txt", contents);
        // Remove each "a" line that is directly followed by the same "b" line.
        string strFile4x = File.ReadAllText(@"join.txt");
        strFile4x = Regex.Replace(strFile4x, @"\n(.*?)a\r\n\1b\r", "");
        File.WriteAllText(@"removeequal.txt", strFile4x);
        var contents2 = File.ReadAllLines("removeequal.txt");
        Array.Sort(contents2);
        File.WriteAllLines("removeequal.txt", contents2);
        // Strip the blank lines left behind.
        string strFile4x2 = File.ReadAllText(@"removeequal.txt");
        strFile4x2 = Regex.Replace(strFile4x2, @"\n\r", "");
        File.WriteAllText(@"blanklines.txt", strFile4x2);
    }
}
The pattern \n(.*?)a\r\n\1b\r matches a line tagged "a" that is immediately followed by the same line tagged "b", which is where identical lines end up once the joined file is sorted.

Related

C# Replace a part of line and Write to new txt

Here is the code I wrote to replace some values in a txt file.
I want to replace the value 0 with 3 on lines that do not start with "#".
using (StreamWriter sr = new StreamWriter(@"D:\Testing\Ticket_post_test.txt"))
foreach(string line in File.ReadLines(@"D:\Testing\Ticket_post.txt"))
{
string[] getFromLine = line.Split(' ');
if (getFromLine[0].Equals("#") == false)
{
if (getFromLine[10].Equals("0") == true) ;
(getFromLine[10]).Replace("0", "3");
}
sr.WriteLine(line);
}
I'm stuck on how to replace the 0 with 3 at index 10 of the split line and write the result to a new txt file.
The txt file is shown below; the value to replace is the field right after "#0":
# start time = 2021-12-03-15-14-55
# end time = 2021-12-03-15-15-41
# name = SYSTEM
bot 10 pad 11 d 4 e 6 t #0 0 2021-12-03-15-14-55 # - 2021-12-03-15-15-41
bot 11 pad 12 d 5 e 7 t #0 0 2021-12-03-15-14-55 # - 2021-12-03-15-15-41
bot 12 pad 13 d 6 e 8 t #0 1 2021-12-03-15-14-55 # - 2021-12-03-15-15-41
and more
Your code makes some erroneous assumptions, which I will correct here:
When you split a string using .Split or .Substring, you are not creating a little window/little windows into the original string. You are producing altogether new strings.
When you use .Replace, you are creating a new string with the altered values, not modifying the original in-place. See this question for more info on that.
This means that:
Your replace is a no-op (it does nothing of any meaning).
Your WriteLine is just writing the original line value back to the file without your changes.
We need to both fix your replace, and create the updated string to write to the file. As we are checking the value of getFromLine[10], we don't need .Replace at all, we can just set a new value:
using (StreamWriter sr = new StreamWriter(@"D:\Testing\Ticket_post_test.txt"))
{
    foreach (string line in File.ReadLines(@"D:\Testing\Ticket_post.txt"))
    {
        string[] getFromLine = line.Split(' ');
        if (getFromLine[0] != "#" && getFromLine[10] == "0")
        {
            getFromLine[10] = "3";
        }
        sr.WriteLine(String.Join(" ", getFromLine));
    }
}
This isn't especially efficient, but it should get the job done. You could potentially modify it like this to avoid creating a new string when no changes have been made:
using (StreamWriter sr = new StreamWriter(@"D:\Testing\Ticket_post_test.txt"))
{
    foreach (string line in File.ReadLines(@"D:\Testing\Ticket_post.txt"))
    {
        string[] getFromLine = line.Split(' ');
        if (getFromLine[0] != "#" && getFromLine[10] == "0")
        {
            getFromLine[10] = "3";
            sr.WriteLine(String.Join(" ", getFromLine));
        }
        else
        {
            // WriteLine, not Write, so unchanged lines keep their line break.
            sr.WriteLine(line);
        }
    }
}
Note that you should probably also check the length of the array (i.e. if (getFromLine.Length >= 11 && getFromLine[0] != "#" && getFromLine[10] == "0")) so that you don't get an IndexOutOfRangeException if you reach a line of the file that has fewer spaces than you expect (e.g. a blank line).
P.S. I've not tested this, so I've assumed that the rest of your logic is sound.
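The length check mentioned above can be folded into a small helper that handles one line at a time. This is only a sketch under the same assumptions as the answer (fields separated by single spaces, the value at index 10); the names TicketLine and Fix are invented for illustration.

```csharp
using System;

public static class TicketLine
{
    // Returns the line with field 10 changed from "0" to "3".
    // Comment lines (first field "#") and lines with fewer than 11 fields are returned unchanged.
    public static string Fix(string line)
    {
        string[] fields = line.Split(' ');
        if (fields.Length >= 11 && fields[0] != "#" && fields[10] == "0")
        {
            fields[10] = "3";
            return string.Join(" ", fields);
        }
        return line;
    }
}
```

Inside the loop, the body then reduces to sr.WriteLine(TicketLine.Fix(line)).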
I think this simple example will suit your problem.
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
namespace Stack5
{
/// <summary>
/// Interaction logic for MainWindow.xaml
/// </summary>
public partial class MainWindow
{
public MainWindow()
{
InitializeComponent();
ReadAndWriteTextFile();
}
private static void ReadAndWriteTextFile()
{
// Read text file
// (?<string> ...) = name it with "string" name
// ^(?!#) = select line not begin with "#"
// (\S+\s){10} = bypass 10 group strings include a visible characters (\S) and a space (\s)
string pattern = @"(?<one>^(?!#)(\S+\s){10})0(?<two>.+)";
string substitution = @"${one}3${two}";
Regex regex = new Regex(pattern);
// Change the path to your path
List<string> lines = File.ReadLines(@".\settings.txt").ToList();
for (int i = 0; i < lines.Count(); i++)
{
// Check its value
Debug.WriteLine(lines[i]);
var match = Regex.Match(lines[i], pattern);
if (match.Success)
{
string r = regex.Replace(lines[i], substitution);
// Check its value
Debug.WriteLine("Change the line {0} to: {1}",i+1,r);
lines[i] = r;
}
}
// Write text file
File.WriteAllLines(@".\settings.txt",lines);
}
}
}
I tested it in my local machine, hope this helps you!
Ask me if you need, good to you good to me : D

C# How can I assign one identifier to multiple substrings without using If statements?

I created a program that removes homonyms from two strings. Then it compares the two strings to see if they are equal. It works well, but I am having a problem.
I have a huge array of Homonyms. This isn't the full list:
string[] Homonyms = new string[] {
"to", "too", "two", "for", "four", "theyre", "there", "their", "see", "sea", "by", "buy", "bye", "past", "passed",
"witch", "which", "whose", "whos", "hole", "whole", "right", "write", "serial", "cereal", "principle", "principal",
These two strings need to be equal:
s1 = "There are three seas";
s2 = "Their are 3 sees";
These two strings should not be equal:
s1 = "The bee is by the sea";
s2 = "The sea is by the bee";
I could create hundreds of If statements like this:
if (word == "sea" || word == "see")
word = "ID123"; // identify which homonym
Do you know how I can do this easily without hundreds of If statements?
As suggested in the comments, the fact that two words both appear in the homonym array doesn't mean that they are homonyms of each other.
With that in mind, what we need is a container that groups the words that are homonyms of one another.
I have created an exhaustive list of homonyms in a text file; grab it here.
The text file can be added to your solution.
That takes care of the huge array of homonyms.
Each line in the file contains the homonyms that refer to each other (separated by a '/'). Now we can traverse the file and store each line in an object:
public class Mapping
{
public char Sort { get; set; }
public List<string> Homonyms { get; set; }
}
For example, there/they’re/their will be stored in the same object.
The logic is simple, and it assumes that the two strings (s1 and s2) have the same number of words.
We analyse one word at a time from both strings (wordString1 and wordString2). If the words are the same, all is good.
If the words are not the same, we search the list of mappings to see whether one contains wordString1. If so, we check whether the same object contains wordString2.
If it does, wordString1 and wordString2 are homonyms and we continue the analysis. Otherwise we stop, as the strings s1 and s2 are clearly not the same.
Here you can see both examples provided. Note that 3 and three are not considered homonyms in the English dictionary, therefore I have removed them. Feel free to add your own numeric mapping if necessary.
Here is the complete code:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
namespace ConsoleApp2
{
public class Program
{
public class Mapping
{
public char Sort { get; set; }
public List<string> Homonyms { get; set; }
}
static void Main()
{
bool areTheSame = true;
var string1 = "There are seas";
var string2 = "Their are sees";
//var string1 = "The bee is by the sea";
//var string2 = "The sea is by the bee";
var s1 = string1.Split(' ').Select(x => x.ToLower()).ToArray();
var s2 = string2.Split(' ').Select(x => x.ToLower()).ToArray();
List<string> homonyms = File.ReadAllLines(@"C:\Users\Alex\source\repos\ConsoleApp2\ConsoleApp2\TextFile1.txt").ToList();
List<Mapping> mapping = new List<Mapping>();
foreach (string item in homonyms)
{
var g = item.Split('/');
Mapping element = new Mapping();
element.Sort = g[0].ToUpper()[0];
element.Homonyms = new List<string>();
element.Homonyms.AddRange(g.Select(x => x.ToLower()).ToList());
mapping.Add(element);
}
Console.WriteLine("Analising...'{0}' and '{1}'", string1,string2);
for (int i = 0; i < s1.Count(); i++)
{
string wordString1 = s1[i];
string wordString2 = s2[i];
Console.WriteLine("Word '{0}' and word '{1}'", wordString1, wordString2);
if (wordString1 != wordString2)
{
//check whether they are Homonyms
var sort = wordString1.ToUpper()[0];
var potentiallHomonym = mapping.Where(item => item.Sort == sort && item.Homonyms.Contains(wordString1)).ToList().FirstOrDefault();
if (potentiallHomonym != null && potentiallHomonym.Homonyms.Contains(wordString2))
{
Console.WriteLine("Those words are Homonyms, enter to continue analising.");
Console.ReadLine();
}
else
{
// Either wordString1 has no homonym group, or wordString2 is not in it.
areTheSame = false;
Console.WriteLine("Those words are not Homonyms.");
Console.ReadLine();
break;
}
}
else
{
Console.WriteLine("Those words are the same");
Console.ReadLine();
}
}
if (areTheSame)
{
Console.WriteLine("The strings are the same");
}
else
{
Console.WriteLine("The strings are not the same");
}
Console.ReadLine();
}
}
}
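The original question asked for one identifier per homonym group without a chain of if statements. One possible way to do that, sketched here as an alternative to the Mapping list above, is a Dictionary from each word to the index of its group. The names HomonymComparer, BuildLookup and AreEquivalent are hypothetical, and the sketch assumes each word belongs to at most one group (if a word is listed twice, the later group wins).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HomonymComparer
{
    // Builds a lookup from each word to the index of the group it belongs to.
    public static Dictionary<string, int> BuildLookup(IEnumerable<string[]> groups)
    {
        var lookup = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        int id = 0;
        foreach (string[] group in groups)
        {
            foreach (string word in group)
                lookup[word] = id;
            id++;
        }
        return lookup;
    }

    // Replaces a known homonym with a group token such as "ID0"; other words are lower-cased.
    private static string Normalize(string word, Dictionary<string, int> lookup)
    {
        return lookup.TryGetValue(word, out int id) ? "ID" + id : word.ToLowerInvariant();
    }

    // Two sentences are equivalent when their normalized words match position by position.
    public static bool AreEquivalent(string s1, string s2, Dictionary<string, int> lookup)
    {
        var a = s1.Split(' ').Select(w => Normalize(w, lookup));
        var b = s2.Split(' ').Select(w => Normalize(w, lookup));
        return a.SequenceEqual(b);
    }
}
```

Because SequenceEqual also compares lengths, sentences with a different number of words are reported as not equivalent instead of throwing. The groups could be read from the same text file with File.ReadLines(...).Select(line => line.Split('/')).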

Splitting string into an array on 16th char [duplicate]

This question already has answers here:
Splitting a string into chunks of a certain size
(39 answers)
Split string after certain character count
(4 answers)
Closed 8 years ago.
I have a text file with various 16 char strings both appended to one another and on separate lines. I've done this
FileInfo f = new FileInfo("d:\\test.txt");
string FilePath = ("d:\\test.txt");
string FileText = new System.IO.StreamReader(FilePath).ReadToEnd().Replace("\r\n", "");
CharCount = FileText.Length;
To remove all of the new lines and create one massively appended string. I need now to split this massive string into an array. I need to split it up on the consecutive 16th char until the end. Can anyone guide me in the right direction? I've taken a look at various methods in String such as Split and in StreamReader but am confused as to what the best way to go about it would be. I'm sure it's simple but I can't figure it out.
Thank you.
Adapting the answer from here:
You could try something like so:
string longstr = "thisisaverylongstringveryveryveryveryverythisisaverylongstringveryveryveryveryvery";
IEnumerable<string> splitString = Regex.Split(longstr, "(.{16})").Where(s => s != String.Empty);
foreach (string str in splitString)
{
System.Console.WriteLine(str);
}
Yields:
thisisaverylongs
tringveryveryver
yveryverythisisa
verylongstringve
ryveryveryveryve
ry
One possible solution could look like this (extracted as extension method and made dynamic, in case different token size is needed and no hard-coded dependencies):
public static class ProjectExtensions
{
public static String[] Chunkify(this String input, int chunkSize = 16)
{
// result
var output = new List<String>();
// temp helper
var chunk = String.Empty;
long counter = 0;
// tokenize to 16 chars
input.ToCharArray().ToList().ForEach(ch =>
{
counter++;
chunk += ch;
if ((counter % chunkSize) == 0)
{
output.Add(chunk);
chunk = String.Empty;
}
});
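// add the rest, if any (avoids an empty trailing entry when the length is an exact multiple of chunkSize)
if (chunk.Length > 0)
{
output.Add(chunk);
}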
return output.ToArray();
}
}
The standard usage (16 chars) looks like this:
// 3 inputs x 16 characters and 1 x 10 characters
var input = @"1234567890ABCDEF1234567890ABCDEF1234567890ABCDEF1234567890";
foreach (var chunk in input.Chunkify())
{
Console.WriteLine(chunk);
}
The output is:
1234567890ABCDEF
1234567890ABCDEF
1234567890ABCDEF
1234567890
Usage with different token size:
foreach (var chunk in input.Chunkify(13))
{
Console.WriteLine(chunk);
}
and the corresponding output:
1234567890ABC
DEF1234567890
ABCDEF1234567
890ABCDEF1234
567890
It is not a fancy solution (and could probably be optimised for speed), but it works and it is easy to understand and implement.
Create a list to hold your tokens. Then get subsequent substrings of length 16 and add them to the list.
List<string> tokens = new List<string>();
for(int i=0; i+16<=FileText.Length; i+=16) {
tokens.Add(FileText.Substring(i,16));
}
As mentioned in the comments, this ignores the last token if it has fewer than 16 characters. If you want it anyway you can write:
List<string> tokens = new List<string>();
for(int i=0; i<FileText.Length; i+=16) {
int len = Math.Min(16, FileText.Length-i);
tokens.Add(FileText.Substring(i,len));
}
Please try this method. I haven't tried it, but I used it once.
int CharCount = FileText.Length;
// Round up so a shorter final chunk gets its own slot and no slot is left null.
int arrayhold = (CharCount + 15) / 16;
int count = 0;
string[] array = new string[arrayhold];
for (int i = 0; i < FileText.Length; i += 16)
{
    int currentleft = FileText.Length - (16 * count);
    if (currentleft >= 16)
    {
        array[count] = FileText.Substring(i, 16);
    }
    else
    {
        // Fewer than 16 characters remain: take whatever is left.
        array[count] = FileText.Substring(i, currentleft);
    }
    count++;
}
This is the new code, and it provides true leftover handling. Tested on ideone.
Hope it works
Try this one:
var testArray = "sdfsdfjshdfalkjsdfhalsdkfjhalsdkfjhasldkjfhasldkfjhasdflkjhasdlfkjhasdlfkjhasdlfkjhasldfkjhalsjfdkhklahjsf";
var i = 0;
var query = from s in testArray
let num = i++
group s by num / 16 into g
select new {Value = new string(g.ToArray())};
var result = query.Select(x => x.Value).ToList();
result is List containing all the 16 char strings.
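On newer runtimes the grouping can be delegated to the framework. The sketch below assumes .NET 6 or later, where Enumerable.Chunk is available; it is not part of any of the answers above, and the names StringChunks and Split are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StringChunks
{
    // Splits 'text' into pieces of 'size' characters; the last piece may be shorter.
    public static List<string> Split(string text, int size = 16)
    {
        // Chunk yields char[] blocks of at most 'size' elements (assumes .NET 6+).
        return text.Chunk(size).Select(chars => new string(chars)).ToList();
    }
}
```

An empty input produces an empty list, so no special handling of leftovers is needed.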

How to edit specific lines in a text document? [duplicate]

This question already has answers here:
Edit a specific Line of a Text File in C#
(6 answers)
Closed 8 years ago.
Could anyone tell me how to edit a specific line in a text document?
For instance, lets say that my document contains two phone numbers:
"0889367882
0887343160"
I want to delete the second number and write a new phone number, how can I do that?
I am printing the text in the document, but i don't know how to choose which line to edit
and how to do that.
string path = @"C:\Users\...\text1.txt";
string[] lines = File.ReadAllLines(path);
int i = 0;
foreach (var line in lines)
{
i++;
Console.WriteLine("{0}. {1}", i, line);
}
Thanks!
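Simply use string.Replace on the matching line.
Like this:
// Index into the array; the loop variable of a foreach cannot be assigned to.
for (int i = 0; i < lines.Length; i++)
{
if (lines[i].Contains("0887343160"))
lines[i] = lines[i].Replace("0887343160", "0889367882");
}
and after replacing, write all lines back to the file.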
A better version would be to iterate over the lines of the file rather than loading them all into memory, so an iterator works best here.
We call MoveNext() on the iterator and write the current line to the output file after applying the necessary replace logic.
StreamWriter wtr = new StreamWriter("out.txt");
var e = File.ReadLines(path).GetEnumerator();
int lineno = 12; //arbitrary
int counter = 0;
string line = string.Empty;
while(e.MoveNext())
{
counter++;
if(counter == lineno)
line = replaceLogic(e.Current);
else
line = e.Current;
wtr.WriteLine(line);
}
wtr.Close();
Solution 1: if you want to remove the Line based on user input String (matches with one of the line from file) you can try this.
string path = @"C:\Data.txt";
string[] lines = File.ReadAllLines(path);
String strRemove = "8971820518";
List<String> lst = new List<String>();
for(int i=0;i<lines.Length;i++)
{
if (!lines[i].Equals(strRemove)) //if string is part of line use Contains()
{
lst.Add(lines[i]);
}
}
File.WriteAllLines(path,lst.ToArray());
Solution 2: if you want to remove the Line based on user input LineNO (matched with exact line no in file) you can try this
string path = @"C:\Data.txt";
string[] lines = File.ReadAllLines(path);
int iRemoveLineNo = 6;
List<String> lst = new List<String>();
for(int i=0;i<lines.Length;i++)
{
if (iRemoveLineNo-1!=i)
{
lst.Add(lines[i]);
}
}
File.WriteAllLines(path,lst.ToArray());
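The two solutions above remove a line; the question also asked how to write a new value in its place. A minimal sketch of that, assuming the file is small enough to load fully and that line numbers are 1-based as in the question's output (the names LineEditor and ReplaceLine are hypothetical):

```csharp
using System;
using System.IO;

public static class LineEditor
{
    // Replaces the 1-based line 'lineNumber' of the file at 'path' with 'newText'.
    // Returns false and leaves the file untouched when the file has no such line.
    public static bool ReplaceLine(string path, int lineNumber, string newText)
    {
        string[] lines = File.ReadAllLines(path);
        if (lineNumber < 1 || lineNumber > lines.Length)
            return false;
        lines[lineNumber - 1] = newText;
        File.WriteAllLines(path, lines);
        return true;
    }
}
```

For the example in the question, LineEditor.ReplaceLine(path, 2, newNumber) would overwrite the second phone number, where newNumber stands for whatever value should be written.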

Getting parts of a string and combine them in C#?

I have a string like this: C:\Projects\test\whatever\files\media\10\00\00\80\test.jpg
Now, what I want to do is dynamically combine the last 4 numbers, which in this case gives 10000080. My idea was to split the string and combine the parts in some way. Is there an easier way? I can't rely on a fixed array index, because the path can be longer or shorter.
Is there a nice way to do that?
Thanks :)
A compact way using string.Join and Regex.Split.
string text = @"C:\Projects\test\whatever\files\media\10\00\00\80\test.jpg";
string newString = string.Join(null, Regex.Split(text, @"[^\d]")); //10000080
Use String.Split
String toSplit = @"C:\Projects\test\whatever\files\media\10\00\00\80\test.jpg";
String[] parts = toSplit.Split('\\');
String result = String.Empty;
// Walk from the 5th-last part to the 2nd-last part, skipping the file name.
for (int i = 5; i > 1; i--)
{
result += parts[parts.Length - i];
}
// Gives the result 10000080
You can rely on array index if the last part always is the filename.
since the last part is always
array_name[array_name.length - 1]
the 4 parts before that can be found by
array_name[array_name.length - 2]
array_name[array_name.length - 3]
etc
If you always want to combine the last four numbers, split the string (using \ as the separator), count back from the last part, and take the four parts that precede the file name.
If you want to take all the digits, just scan the string from start to finish and copy just the digits to a new string.
string input = @"C:\Projects\test\whatever\files\media\10\00\00\80\test.jpg";
string[] parts = input.Split(new char[] {'\\'});
IEnumerable<string> reversed = parts.Reverse();
IEnumerable<string> selected = reversed.Skip(1).Take(4).Reverse();
string result = string.Concat(selected);
The idea is to extract the parts, reverse them so that the last 4 (excluding the file name) can be taken, reverse again to restore the initial order, and then concatenate.
Using LINQ:
string path = @"C:\Projects\test\whatever\files\media\10\00\00\80\test.jpg";
var parts = Path.GetDirectoryName(path).Split('\\');
string numbersPart = parts.Skip(parts.Count() - 4)
.Aggregate((acc, next) => acc + next);
Result: "10000080"
var r = new Regex(@"[^\d+]");
var match = r
.Split(@"C:\Projects\test\whatever\files\media\10\00\00\80\test.jpg")
.Aggregate((i, j) => i + j);
return match.ToString();
To find the number you can use a regex:
(([0-9]{2})\\){4}
Concatenate all captures of the inner group ([0-9]{2}) to get the number you are looking for.
This finds the number at any position in the given string.
Sample Code:
using System;
using System.Text;
using System.Text.RegularExpressions;
static class TestClass {
    static void Main(string[] args) {
        string[] tests = { @"C:\Projects\test\whatever\files\media\10\00\00\80\test.jpg",
                           @"C:\Projects\test\whatever\files\media\10\00\00\80\some\foldertest.jpg",
                           @"C:\10\00\00\80\test.jpg",
                           @"C:\10\00\00\80\test.jpg"};
        foreach (string test in tests) {
            int number = ExtractNumber(test);
            Console.WriteLine(number);
        }
        Console.ReadLine();
    }
    static int ExtractNumber(string path) {
        Match match = Regex.Match(path, @"(([0-9]{2})\\){4}");
        if (!match.Success) {
            throw new Exception("The string does not contain the defined Number");
        }
        //get second group that is where the number is
        Group @group = match.Groups[2];
        //now concat all captures
        StringBuilder builder = new StringBuilder();
        foreach (Capture capture in @group.Captures) {
            builder.Append(capture.Value);
        }
        //parse the combined digits as an int and off we go!
        return int.Parse(builder.ToString());
    }
}
