C# using regex to replace value only after = sign

OK, I have a text file that contains:
books_book1 = 1
books_book2 = 2
books_book3 = 3
I would like to keep "books_book1 = " and replace only the value after the = sign.
So far I have:
string text = File.ReadAllText("settings.txt");
text = Regex.Replace(text, ".*books_book1*.", "books_book1 = a",
RegexOptions.Multiline);
File.WriteAllText("settings.txt", text);
text = Regex.Replace(text, ".*books_book2*.", "books_book2 = b",
RegexOptions.Multiline);
File.WriteAllText("settings.txt", text);
text = Regex.Replace(text, ".*books_book3*.", "books_book3 = c",
RegexOptions.Multiline);
File.WriteAllText("settings.txt", text);
This results in:
books_book1 = a=1
The output file should be:
books_book1 = a
books_book2 = b
books_book3 = c
Thanks much in advance...

In a comment I stated:
"I would personally just go for recreating the file if it is that simple. Presumably you load all the values from the file into an object of some kind initially, so just use that to recreate the file with the new values. Much easier than messing with regular expressions: it's simpler, easier to test, easier to see what is going on, and easier to change if you ever need to."
Having looked at this again, I think it is even more true.
From what you said in comments: "when the program loads it reads the values from this text file, then the user has an option to change the values and save it back to the file". Presumably this means you need to know which of the books_book1, books_book2, etc. lines you are replacing, so you know which of the user-supplied values to put in. This is fine (though a little unwieldy) with three items, but if you increase that number you'll need to update your code for every new item. That is never a good thing, and it quickly produces horrendous-looking code that is prone to bugs.
If you have your new settings in some kind of data structure (e.g. a dictionary), then, as I say, recreating the file from scratch is probably easiest. See for example this small, self-contained code snippet:
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

//Set up our sample Dictionary
Dictionary<string, string> settings = new Dictionary<string, string>();
settings.Add("books_book1", "a");
settings.Add("books_book2", "b");
settings.Add("books_book3", "c");
//Write the values to file via an intermediate StringBuilder.
StringBuilder sb = new StringBuilder();
foreach (var item in settings)
{
    sb.AppendLine(String.Format("{0} = {1}", item.Key, item.Value));
}
File.WriteAllText("settings.txt", sb.ToString());
This has the obvious advantages of being simpler and of not needing any code changes when you add more settings: they just go into the dictionary.
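To cover the full load, edit, save cycle described in the question, here is a minimal sketch that parses the existing file into a dictionary and writes it back. `SettingsFile`, `Parse` and `Format` are hypothetical names, and the sketch assumes one `key = value` pair per line with no comments, sections or quoting.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: round-trips simple "key = value" settings text.
public static class SettingsFile
{
    // Parses lines like "books_book1 = 1" into a dictionary.
    // Blank lines and lines without '=' are skipped (assumption: no comments).
    public static Dictionary<string, string> Parse(string text)
    {
        var settings = new Dictionary<string, string>();
        foreach (var line in text.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries))
        {
            int eq = line.IndexOf('=');
            if (eq < 0) continue;
            settings[line.Substring(0, eq).Trim()] = line.Substring(eq + 1).Trim();
        }
        return settings;
    }

    // Rebuilds the file contents, one "key = value" pair per line.
    public static string Format(IDictionary<string, string> settings)
    {
        return string.Join("\n", settings.Select(kv => kv.Key + " = " + kv.Value));
    }
}
```

Usage would look like `var s = SettingsFile.Parse(File.ReadAllText("settings.txt")); s["books_book1"] = "a"; File.WriteAllText("settings.txt", SettingsFile.Format(s));`. Note that `Dictionary<TKey, TValue>` does not promise an enumeration order, so if the line order in the file matters, keep the keys in a separate list or use a `SortedDictionary`.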

I don't think this is the best way to solve the problem, but to make the RegEx do what you want you can do the following:
var findFilter = @"(.*books_book1\s*=\s)(.+)";
var replaceFilter = "${1}a";
text = Regex.Replace(text, findFilter, replaceFilter, RegexOptions.Multiline);
File.WriteAllText("settings.txt", text);
...
The parentheses in the regex create capturing groups. The first group captures everything up to and including the whitespace after the =, and ${1} in the replacement string reinserts that captured text, followed by the new value. Also, because the key must be followed by optional whitespace (\s) and then =, the pattern won't match books_book111, for example. I'm sure there are other edge cases you'll need to deal with. The result is:
books_book1 = a
...
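For reference, here is a self-contained sketch of the same capture-and-reinsert idea as a reusable method. `SettingReplacer.ReplaceValue` is a hypothetical name. The sketch assumes one `key = value` pair per line, escapes the key with `Regex.Escape`, only allows spaces or tabs around the `=` so an empty value cannot swallow the next line, and uses a `MatchEvaluator` instead of a `"${1}..."` string so a new value containing `$` is inserted literally.

```csharp
using System.Text.RegularExpressions;

public static class SettingReplacer
{
    // Replaces the value after "=" for one key, keeping "key = " intact.
    public static string ReplaceValue(string text, string key, string newValue)
    {
        // ^ matches at each line start because of RegexOptions.Multiline.
        // [^\r\n]* stops before "\r" or "\n", so line endings are preserved.
        string pattern = @"^([ \t]*" + Regex.Escape(key) + @"[ \t]*=[ \t]*)[^\r\n]*";

        // Group 1 ("key = ") is reinserted, followed by the literal new value.
        return Regex.Replace(text, pattern, m => m.Groups[1].Value + newValue, RegexOptions.Multiline);
    }
}
```

Because the key must be followed directly by optional spaces and `=`, replacing `books_book1` leaves a `books_book11` line untouched.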

Here's the start to a more generic approach:
This regular expression captures the numeric value after the = sign (group 2), allowing for any number of digits and any amount of whitespace, and DoReplace turns 1 into a, 2 into b, and so on:
text = Regex.Replace(text, @"(books_book\d+\s*=\s*)(\d+)", DoReplace);
// ...
string DoReplace(Match m)
{
    // 96 is one less than the character code of 'a', so 1 -> 'a', 2 -> 'b', ...
    return m.Groups[1].Value + Convert.ToChar(int.Parse(m.Groups[2].Value) + 96);
}
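A complete, runnable version of this evaluator approach might look like the sketch below. `DigitToLetter.MapValues` is a hypothetical name, and the mapping assumes the stored values are 1 to 26; anything else produces a non-letter character.

```csharp
using System.Text.RegularExpressions;

public static class DigitToLetter
{
    // Group 1: "books_bookN = " (kept as-is), group 2: the numeric value.
    static readonly Regex Setting = new Regex(@"(books_book\d+\s*=\s*)(\d+)");

    public static string MapValues(string text)
    {
        // Each match is passed to DoReplace, whose return value replaces it.
        return Setting.Replace(text, DoReplace);
    }

    // Maps 1 -> 'a', 2 -> 'b', ... by offsetting from the character 'a'.
    static string DoReplace(Match m)
    {
        int value = int.Parse(m.Groups[2].Value);
        return m.Groups[1].Value + (char)('a' + value - 1);
    }
}
```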

How about something like this (no error checking):
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace TestRegex
{
    class Program
    {
        static void Main( string[] args )
        {
            var path = @"settings.txt";
            var pattern = @"(^\s*books_book\d+\s*=\s*)(\d+)(\s*)$";
            var options = RegexOptions.IgnoreCase | RegexOptions.Multiline;
            var contents = Regex.Replace( File.ReadAllText( path ), pattern, MyMatchEvaluator, options );
            File.WriteAllText( path, contents );
        }
        // Code point of "a"; incremented after each match, so the values become a, b, c, ...
        static int x = char.ConvertToUtf32( "a", 0 );
        static string MyMatchEvaluator( Match m )
        {
            var x1 = m.Groups[ 1 ].Value;
            var x2 = char.ConvertFromUtf32( x++ );
            var x3 = m.Groups[ 3 ].Value;
            var result = x1 + x2 + x3;
            return result;
        }
    }
}
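The evaluator above assigns a, b, c in file order through a static counter. If the new values come from the user, as described in the comments, a variation is to look each key up in a dictionary inside the evaluator. This is a sketch under the assumption that keys are word characters (`\w+`) and each line holds one `key = value` pair; `SettingsUpdater.Apply` is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SettingsUpdater
{
    // Group 1: "key = " prefix, group 2: the key itself, group 3: the old value.
    static readonly Regex Line =
        new Regex(@"^([ \t]*(\w+)[ \t]*=[ \t]*)([^\r\n]*)", RegexOptions.Multiline);

    // Replaces each value whose key appears in newValues; other lines are returned unchanged.
    public static string Apply(string text, IDictionary<string, string> newValues)
    {
        return Line.Replace(text, m =>
        {
            string replacement;
            return newValues.TryGetValue(m.Groups[2].Value, out replacement)
                ? m.Groups[1].Value + replacement
                : m.Value;
        });
    }
}
```

With this design, adding a new setting only means adding an entry to the dictionary, which addresses the maintenance concern raised in the first answer.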

Related

Optimize an iteration of IEnumerable<string> [duplicate]

For a long time I have appended strings the following way. For example, if I want to get all the employee names separated by some symbol (in the example below I opted for the pipe symbol):
string final=string.Empty;
foreach(Employee emp in EmployeeList)
{
    final+=emp.Name+"|"; // if I want to separate them by pipe symbol
}
At the end I take a substring to remove the last pipe symbol, as it is not required:
final=final.Substring(0,final.Length-1);
Is there a more efficient way of doing this?
I don't want to append the pipe symbol after the last item and then take a substring again.
Use string.Join() and a LINQ projection with Select() instead:
finalString = string.Join("|", EmployeeList.Select( x=> x.Name));
Three reasons why this approach is better:
1. It is much more concise and readable: it expresses intent, not how you want to achieve your goal (in your case, concatenating strings in a loop). Using a simple projection with LINQ also helps here.
2. It is optimized by the framework for performance: in most cases string.Join() will use a StringBuilder internally, so you are not creating multiple strings that are then unreferenced and must be garbage collected. Also see: Do not concatenate strings inside loops.
3. You don't have to worry about special cases: string.Join() automatically handles the "last item" after which you do not want another separator. Again, this simplifies your code and makes it less error prone.
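A self-contained sketch of the string.Join approach follows. The `Employee` class here is a hypothetical stand-in for the question's type, and `NameJoiner.JoinNames` is an illustrative name.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's Employee type.
public class Employee
{
    public string Name { get; set; }
}

public static class NameJoiner
{
    // string.Join puts the separator only between items, so there is no
    // trailing "|" to strip, and an empty list simply yields "".
    public static string JoinNames(IEnumerable<Employee> employees, string separator = "|")
    {
        return string.Join(separator, employees.Select(e => e.Name));
    }
}
```

For three employees named Ann, Bob and Cy this returns `Ann|Bob|Cy`.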
I like using the Aggregate function in LINQ, such as:
string[] words = { "one", "two", "three" };
var res = words.Aggregate((current, next) => current + ", " + next);
You should join your strings.
Example (borrowed from MSDN):
using System;
class Sample {
    public static void Main() {
        String[] val = {"apple", "orange", "grape", "pear"};
        String sep = ", ";
        String result;
        Console.WriteLine("sep = '{0}'", sep);
        Console.WriteLine("val[] = {{'{0}' '{1}' '{2}' '{3}'}}", val[0], val[1], val[2], val[3]);
        result = String.Join(sep, val, 1, 2);
        Console.WriteLine("String.Join(sep, val, 1, 2) = '{0}'", result);
    }
}
For building up like this, a StringBuilder is probably a better choice.
For your final pipe issue, simply leave the last append outside of the loop (this assumes EmployeeList is a List<Employee>):
int size = EmployeeList.Count;
if (size > 0)
{
    for (int i = 0; i < size - 1; i++)
    {
        final += EmployeeList[i].Name + "|";
    }
    final += EmployeeList[size - 1].Name;
}
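Since this answer recommends a StringBuilder, here is a minimal sketch of that route which also avoids special-casing the last item: the separator is written before every item except the first. `PipeJoiner.Join` is a hypothetical helper. A bool flag is used rather than checking `sb.Length`, because an empty first name would otherwise cause the next item to lose its separator.

```csharp
using System.Collections.Generic;
using System.Text;

public static class PipeJoiner
{
    // Builds "a|b|c" in one StringBuilder, with no trailing separator to trim.
    public static string Join(IEnumerable<string> items, char separator = '|')
    {
        var sb = new StringBuilder();
        bool first = true;
        foreach (string item in items)
        {
            // Separator goes *before* each item except the first one.
            if (!first) sb.Append(separator);
            sb.Append(item);
            first = false;
        }
        return sb.ToString();
    }
}
```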

When I try to escape a double quote in C# I get multiple instead of just one

For the purposes of my program, I need to read in data from a CSV file, process it, and then create a new batch of CSV files based on the data I read in. I am using the CsvHelper library if that makes a difference.
The program works perfectly, except for one part. I need my output CSV files to have each field wrapped in double quotes. Normally I would just use the escape mechanism for C#, but I ran into a bizarre issue that I can't seem to work around. Whenever I try to escape a " mark, I end up getting more than just the one.
Desired output example:
date_start,date_end,current_year
"2017-08-01","2018-06-20","2017"
"2017-08-01","2018-06-20","2017"
Observed output:
date_start,date_end,current_year
"""2017-08-01""","""2018-06-20""","""2017"""
"""2017-08-01""","""2018-06-20""","""2017"""
It puts in a pair of double quotes when I am just looking for one. I've looked around online and tried several different approaches to try and get just the one double quote, but I can't seem to make it work. Here is the relevant code and the different methods I have tried:
HashSet<CustomDataWrapper> hs = new HashSet<CustomDataWrapper>();
foreach (CustomDataType record in records) {
    string quote = "\"";
    String method1 = (String)(record.date_start);
    String method2 = quote + record.date_start + quote;
    String method3 = "\"" + record.date_start + "\"";
    String method4 = "\u0022" + record.date_start + "\u0022";
    hs.Add(data);
}
csvWriter.WriteRecords(hs);
I am honestly not sure what's going on here, as I've tried adding the quote to just one side of the string and got this output:
date_start,date_end,current_year
"2017-08-01""","2018-06-20""","2017"""
"2017-08-01""","2018-06-20""","2017"""
Any help would be greatly appreciated. Thank you!
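Stop adding quotes to the fields. Instead, create a configuration object, set QuoteAllFields to true, and pass this object to the CsvWriter as below (QuoteAllFields belongs to older CsvHelper versions; newer releases configure quoting differently, so check the documentation for the version you use):
var config = new CsvHelper.Configuration.CsvConfiguration();
config.QuoteAllFields = true;
using (var csvWriter = new CsvHelper.CsvWriter(streamWriter, config))
{
    csvWriter.WriteRecords(records);
}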
Hope this works.
Here's what's happening. Let's start with this code:
var records = new []
{
    new { H1 = "Foo", H2 = "Bar" },
    new { H1 = "Qaz", H2 = "Waz" }
};
using (var tw = new System.IO.StringWriter())
{
    using (var cw = new CsvHelper.CsvWriter(tw))
    {
        cw.WriteRecords(records);
        var output = tw.ToString();
        Console.WriteLine(output);
    }
}
If I run that I get this output:
H1,H2
Foo,Bar
Qaz,Waz
If I insert one double quote into one field of the source data, like so:
var records = new []
{
    new { H1 = "Foo", H2 = "Bar" },
    new { H1 = "Qaz\"", H2 = "Waz" }
};
...then the output becomes:
H1,H2
Foo,Bar
"Qaz""",Waz
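The library knows that, to output a double quote as part of the data, it must wrap the entire field in double quotes and escape the double quote within the field, turning a single " into "". That is exactly what happens to the quotes you add yourself: they are treated as data, so "2017-08-01" with your own quotes around it becomes """2017-08-01""". Don't add the quotes yourself; let the library handle the quoting.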
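To make the mechanism concrete, here is a minimal sketch of the quote-every-field rule described above: double any embedded quote, then wrap the field. It is not CsvHelper's actual code; `CsvQuoting.QuoteField` is a hypothetical helper and assumes a non-null field. Feeding it a value that already carries quotes reproduces the patterns seen in the question.

```csharp
public static class CsvQuoting
{
    // RFC 4180 style quoting: each embedded " becomes "", then the whole
    // field is wrapped in a pair of double quotes.
    public static string QuoteField(string field)
    {
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }
}
```

`QuoteField("2017")` gives `"2017"`, while `QuoteField("\"2017\"")` (a value you already wrapped yourself) gives `"""2017"""`, and a value with a quote on one side only, `2017"`, gives `"2017"""`, matching both observed outputs.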

Finding longest word in string

OK, so I know that questions like this have been asked a lot on here, but I can't seem to make the solutions work.
I am trying to take a string from a file and find the longest word in that string.
Simples.
I think the issue comes down to whether I am calling my methods on a string[] or a char[]; currently stringOfWords returns a char[].
I am trying to then order by descending length and get the first value but am getting an ArgumentNullException on the OrderByDescending method.
Any input much appreciated.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Text;
using System.Threading.Tasks;
namespace TextExercises
{
    class Program
    {
        static void Main(string[] args)
        {
            var fileText = File.ReadAllText(@"C:\Users\RichardsPC\Documents\TestText.txt");
            var stringOfWords = fileText.ToArray();
            Console.WriteLine("Text in file: " + fileText);
            Console.WriteLine("Words in text: " + fileText.Split(' ').Length);
            // This is where I am trying to solve the problem
            var finalValue = stringOfWords.OrderByDescending(n => n.length).First();
            Console.WriteLine("Largest word is: " + finalValue);
        }
    }
}
Don't split the string; use a Regex
If you care about performance, you don't want to split the string. To split, the method has to traverse the entire string, create new strings for the items it finds, and put them into an array, which already costs more than N steps; ordering the result then adds at least another O(n log n) steps.
You can use a Regex for this, which will be more efficient because it only iterates over the string once:
// (\w+) without a trailing \s also catches the last word, which has no whitespace after it
var regex = new Regex(@"(\w+)", RegexOptions.Compiled);
var match = regex.Match(fileText);
var currentLargestString = "";
while (match.Success)
{
    if (match.Groups[1].Value.Length > currentLargestString.Length)
    {
        currentLargestString = match.Groups[1].Value;
    }
    match = match.NextMatch();
}
The nice thing about this is that you don't need to break the string up all at once to do the analysis, and if you need to load the file incrementally, it's a fairly easy change to keep the current longest word in an object and run the match against multiple strings.
If you're set on using an array, don't sort it; just iterate over it
You don't need to sort when you're only looking for the largest item: sorting is O(n log n) in most cases, while iterating over the list once is O(n).
var largest = "";
foreach (var item in strArr)
{
    if (item.Length > largest.Length)
        largest = item;
}
Method ToArray() in this case returns char[] which is an array of individual characters. But instead you need an array of individual words. You can get it like this:
string[] stringOfWords = fileText.Split(' ');
And you have a typo in your lambda expression (uppercase L):
n => n.Length
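Combining both fixes (split into a string[], then compare Length), here is a small sketch. Splitting on ' ' alone leaves empty entries for double spaces and keeps line breaks attached to words, so this version splits on several whitespace and punctuation characters (an assumed delimiter set) with `StringSplitOptions.RemoveEmptyEntries`, and picks the longest word in a single pass with `Aggregate` instead of sorting. `LongestWord.Find` is a hypothetical name.

```csharp
using System;
using System.Linq;

public static class LongestWord
{
    // Assumed delimiters: whitespace plus common punctuation.
    static readonly char[] Delimiters = { ' ', '\t', '\r', '\n', ',', '.', '!', '?' };

    public static string Find(string text)
    {
        string[] words = text.Split(Delimiters, StringSplitOptions.RemoveEmptyEntries);
        // One pass: keep the current best; ">" keeps the first word on ties.
        return words.Aggregate("", (best, word) => word.Length > best.Length ? word : best);
    }
}
```

With the seed `""`, an empty file simply yields an empty string instead of throwing, as `First()` would on an empty sequence.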
Try this:
var fileText = File.ReadAllText(@"C:\Users\RichardsPC\Documents\TestText.txt");
var words = fileText.Split(' ');
var finalValue = words.OrderByDescending(n => n.Length).First();
Console.WriteLine("Longest word: " + finalValue);
As suggested in the other answer, you need to split your string.
string[] stringOfWords = fileText.Split(new Char [] {',' , ' ' });
//all is well, now let's loop over it and see which is the biggest
int biggest = 0;
int biggestIndex = 0;
for (int i = 0; i < stringOfWords.Length; i++) {
    if (biggest < stringOfWords[i].Length) {
        biggest = stringOfWords[i].Length;
        biggestIndex = i;
    }
}
string longestWord = stringOfWords[biggestIndex];
Here we split the string on spaces (' ') or commas (you can add as many delimiters as you like), so each word gets its own slot in the array.
From there, we're iterating over the array. If we encounter a word that's longer than the current longest word, we update it.

How can I efficiently process a delimited text file?

I'm simply trying to run File.ReadAllLines against a specific file and split every line on |. I have to use regex on this one.
The code below doesn't work, but you'll see what I'm trying to do:
string[] contents = File.ReadAllLines(filename);
string[] splitlines = Regex.Split(contents, '|');
foreach (string split in splitlines)
{
    //Regex line = content.Split('|');
    //content.Split('|');
    string prefix = prefix = Regex.Match(line, @"(\S+)(\d+)").Groups[0].Value;
    File.AppendAllText(workingdirform2 + "configuration.txt", prefix+"\r\n");
}
It's not entirely clear to me what you are trying to do, but there are a number of errors in your code. I have tried to guess what you are doing, but if this isn't what you want, please explain what you do want preferably with some examples:
string inputFilename = "input.txt";
string outputFilename = "output.txt";
using (StreamWriter streamWriter = File.AppendText(outputFilename))
{
    using (StreamReader streamReader = File.OpenText(inputFilename))
    {
        while (true)
        {
            string line = streamReader.ReadLine();
            if (line == null)
            {
                break;
            }
            string[] splitlines = line.Split('|');
            foreach (string split in splitlines)
            {
                Match match = Regex.Match(split, @"\S+\d+");
                if (match.Success)
                {
                    string prefix = match.Groups[0].Value;
                    streamWriter.WriteLine(prefix);
                }
                else
                {
                    // Handle match failed...
                }
            }
        }
    }
}
Key points:
You seem to want to perform an operation on each line, so you need to iterate over the lines.
Use the simple string.Split method if you want to split on a single character. Regex.Split doesn't accept a character and "|" has a special meaning in regular expressions so it wouldn't have worked anyway unless you escaped it.
You were opening and closing the output file multiple times. You should open it just once and keep it open until you have finished writing to it. The using keyword is useful here.
Use WriteLine instead of appending "\r\n".
If the input file is large, use a StreamReader instead of ReadAllLines.
If the match fails, Regex.Match returns an unsuccessful Match whose Value is an empty string, so your program would silently write blank lines. You should check match.Success before using the match and, if it is false, handle the error appropriately (skip the line, report a warning, throw an exception with an appropriate message, etc.)
You aren't actually using groups 1 and 2 in the regular expression, so you can remove the parentheses to save the regular expression engine from having to store results that you won't use anyway.
You should pass the original string to Regex.Split and not an array.
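The points above can be combined into a small sketch that reads from any TextReader (a StreamReader for a file, or a StringReader for testing), splits each line with string.Split('|'), and checks match.Success. `PrefixExtractor.Extract` is a hypothetical name, and skipping fields that don't match is just one of the error-handling options listed.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class PrefixExtractor
{
    // Same pattern as the answer, without the unused capturing groups.
    static readonly Regex Token = new Regex(@"\S+\d+");

    // Reads line by line and yields the match for each '|'-separated field.
    public static IEnumerable<string> Extract(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            foreach (string field in line.Split('|'))
            {
                Match match = Token.Match(field);
                if (match.Success)          // non-matching fields are skipped
                    yield return match.Value;
            }
        }
    }
}
```

Usage with files might be `using (var reader = File.OpenText("input.txt")) using (var writer = File.AppendText("output.txt")) foreach (var p in PrefixExtractor.Extract(reader)) writer.WriteLine(p);`, which keeps the output file open once for the whole run.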
It looks like you are using line instead of split when setting the prefix. Without knowing more about your code I can't tell whether it's right, but in any case it sticks out as the error (it shouldn't build either).
This is really inefficient on at least two levels :)
Regex.Split takes a string, not an array of strings.
I would recommend calling Regex.Split on each item of contents individually, then looping over the results of that call. This would mean nested for loops.
string[] contents = File.ReadAllLines(filename);
foreach (string line in contents)
{
    // "|" is a regex metacharacter, so it must be escaped
    string[] splitlines = Regex.Split(line, @"\|");
    foreach (string splitline in splitlines)
    {
        string prefix = Regex.Match(splitline, @"(\S+)(\d+)").Groups[0].Value;
        File.AppendAllText(workingdirform2 + "configuration.txt", prefix+"\r\n");
    }
}
This, of course, isn't the most efficient way to go about it.
A more efficient way might be to split on a regular expression instead. I think this works:
string[] splitlines = Regex.Split(File.ReadAllText(filename), @"\r?\n|\|");
I have to assume, based on the limited feedback, that this is what you're looking for:
string inputFile = filename;
string outputFile = Path.Combine( workingdirform2, "configuration.txt" );
using ( StreamReader inputFileStream = File.OpenText( inputFile ) )
{
    using ( StreamWriter outputFileStream = File.AppendText( outputFile ) )
    {
        // Iterate over the file contents to extract the prefix
        string currentLine;
        while ( ( currentLine = inputFileStream.ReadLine() ) != null )
        {
            // Notice the updated Regex: yours is a bit broken
            string prefix = Regex.Match( currentLine, @"^(\S+?)\d+" ).Groups[1].Value;
            outputFileStream.WriteLine( prefix );
        }
    }
}
This would take a file full of:
Text1231|abc|abc
Text1232|abc|abc
Text1233|abc|abc
Text1234|abc|abc
and place:
Text
Text
Text
Text
into a new file.
I hope this, at least, gets you on the right path. My crystal ball is getting hazy.. haaazzzy..
Probably one of the best ways to process text files in C# is to use FileHelpers. Give it a look. It allows you to strongly type your import data.
