Replace text in file regardless of the end of the line

I've been working on a tool to modify a text file to change graphics settings for a game. A few examples of the settings are as follows:
sg.ShadowQuality=0
ResolutionSizeX=1440
ResolutionSizeY=1080
bUseVSync=False
I want to be able to find sg.ShadowQuality=(rest of line, regardless of what is after this text), and replace it. This is so that a user can set this to say, 10 then 1 without having to check for 10 and 1 etc.
Basically, I'm trying to find out how to find and replace a string in a text file without knowing how the string ends.
My current code looks like:
FileInfo GameUserSettings = new FileInfo(@SD + GUSDirectory);
GameUserSettings.IsReadOnly = false;
string text = File.ReadAllText(SD + GUSDirectory);
text = text.Replace("sg.ShadowQuality=0", "sg.ShadowQuality=" + Shadows.Value.ToString());
File.WriteAllText(SD + GUSDirectory, text);
text = text.Replace("sg.ShadowQuality=1", "sg.ShadowQuality=" + Shadows.Value.ToString());
File.WriteAllText(SD + GUSDirectory, text);
SD + GUSDirectory is the location of the text file.
The file must have read-only turned off to be edited; otherwise the game can revert the settings, hence the need for this. (It is set back to read-only after any change; that part is just not included in the code shown here.)

You can keep your current approach if you use a regular expression to match the whole line:
FileInfo gameUserSettings = new FileInfo(Path.Combine(@SD, GUSDirectory)); // name local variables in camelCase; use Path.Combine to combine paths
gameUserSettings.IsReadOnly = false;
string text = File.ReadAllText(gameUserSettings.FullName); //use the fileinfo you just made rather than make the path again
text = Regex.Replace(text, "^sg[.]ShadowQuality=.*$", $"sg.ShadowQuality={Shadows.Value}", RegexOptions.Multiline); //note switch to interpolated strings
File.WriteAllText(gameUserSettings.FullName, text);
That regex is a Multiline one (so ^ and $ have altered meanings):
^sg[.]ShadowQuality=.*$
start of line ^ (not start of input)
followed by sg
followed by period . (in a character class it loses its "any character" meaning)
followed by ShadowQuality=
followed by any number of any character(.*)
followed by end of line $ (not end of input)
The vital bit is "any number of any character", which copes with the value in the file being 1, 2, 7, hello and so on.
The replacement is:
$"sg.ShadowQuality={Shadows.Value}"
This is an interpolated string: a neater way of writing strings that mix constant content (hardcoded characters) and variable content. When a $tring contains a {, that "breaks out" of the string and back into normal C# code, so you can write an expression whose value is included in the string. If Shadows.Value is, for example, a decimal? of 1.23, it will become 1.23.
You can format data too: $"to one dp is {Shadows.Value:F1}" would produce "to one dp is 1.2". The 1.23 is formatted to 1 decimal place by the F1, just like calling Shadows.Value.ToString("F1") would.
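One caveat worth knowing with the pattern above: in .NET, . matches every character except \n, and $ in Multiline mode matches just before \n. On a file with Windows (\r\n) line endings the .* therefore also swallows the \r, and the rewritten line ends with a bare \n. A minimal sketch that sidesteps this by matching [^\r\n]* instead is shown here; SettingsEditor and SetValue are hypothetical names for illustration, not part of the original tool:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original tool.
public static class SettingsEditor
{
    // Replaces whatever follows "key=" on that line, leaving the line ending untouched.
    public static string SetValue(string text, string key, string newValue)
    {
        // Regex.Escape takes care of the '.' in keys such as sg.ShadowQuality.
        // [^\r\n]* stops before either line-ending character, so \r\n survives.
        string pattern = "^" + Regex.Escape(key) + @"=[^\r\n]*";

        // A MatchEvaluator is used so a '$' in newValue is not read as a substitution.
        return Regex.Replace(text, pattern, _ => key + "=" + newValue, RegexOptions.Multiline);
    }
}
```

It could be called as text = SettingsEditor.SetValue(text, "sg.ShadowQuality", Shadows.Value.ToString()); and reused for the other keys such as ResolutionSizeX.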

Related

How to write Regex pattern to extract output from SOX info in C#?

I have the following string (output from sox --info command):
Input File : 'C:\Users\source\repos\dotnetcore\audio\1000.wav'
Channels : 1
Sample Rate : 44100
Precision : 16-bit
Duration : 00:05:11.64 = 13743363 samples = 23373.1 CDDA sectors
File Size : 27.5M
Bit Rate : 706k
Sample Encoding: 16-bit Signed Integer PCM
I need to extract the file path (without the single quote), channels, sample rate etc.
I have a method where I pass in the whole string (the output) and the property I want to extract. Like this:
private static string Extract(string inputStr, string property)
{
    string pattern = string.Format(@"\s+{0}\s+: '?(.*)\r\n", property);
    Match result = Regex.Match(inputStr, pattern);
    if (result.Success)
    {
        return result.Groups[1].Value;
    }
    return string.Empty;
}
This almost returns what I need, except for the trailing single quote in the Input File value. How do I exclude that in the pattern?
Extract(output, "Input File") //returns C:\Users\source\repos\dotnetcore\audio\1000.wav' --> How to remove the last single quote
Extract(output, "Channels") //returns 1 --> Good
Extract(output, "Sample Rate") // returns 44100 --> Good
I have tried these patterns also
\s+Input File\s+: '?(.*)'? //Still returns with the last single quote
\s+Input File\s+: '?(.*)'+ //This works for Input File but doesn't work for other properties

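Edit: Updated after the original author’s comment; I had not inspected all the lines closely enough (Sample Encoding has no space before its colon, so that whitespace has to be optional):
\s+{0}\s*: '?([^\r\n']*)'?
Original:
\s+{0}\s+: '?([^\r\n']*)'?
Your pattern fails because * is greedy: (.*) keeps pulling in as many characters as it can, including the closing quote. Because '? allows zero or one quote, it is satisfied by matching nothing and does not stop the * from taking the quote. Excluding the quote and the line breaks from the captured characters with [^\r\n'] avoids this.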
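As a rough sketch, the corrected pattern could be dropped into the Extract method like this. It is a variant rather than the exact pattern above: it anchors on the start of a line with ^ and RegexOptions.Multiline instead of requiring leading whitespace, and it escapes the property name. The class name SoxInfo is hypothetical, and the sketch assumes values never contain a single quote of their own:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical container class for illustration.
public static class SoxInfo
{
    public static string Extract(string input, string property)
    {
        // ^\s*         start of a line, optional indentation
        // \s*:\s*      the colon, with or without spaces around it
        // '?           an optional opening quote
        // ([^\r\n']*)  the value: everything up to a quote or the end of the line
        string pattern = @"^\s*" + Regex.Escape(property) + @"\s*:\s*'?([^\r\n']*)";
        Match match = Regex.Match(input, pattern, RegexOptions.Multiline);
        return match.Success ? match.Groups[1].Value.TrimEnd() : string.Empty;
    }
}
```

Because the character class stops at the first quote or line break, the closing quote never ends up in the captured group, so no trailing '? is needed.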

Try the following expression: \s*(?<name>[^:]+?)\s*:\s*(?<value>('[^']+')|.+)
See demo: https://regex101.com/r/w7b2oO/1
A couple of differences:
(?<name>...) gives the capture group a name so that you can reference it by name instead of index
(?<value>('[^']+')|.+) captures a value that is either enclosed in single quotes ('[^']+') or (|) not quoted (.+)
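To illustrate how named groups might be used to read every property in one pass, here is a small sketch. It uses a simplified variant of the expression above (the value group takes the rest of the line and the quotes are trimmed afterwards), and SoxInfoParser and Parse are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration.
public static class SoxInfoParser
{
    // name:  everything before the first colon on the line
    // value: the rest of the line (line-break characters excluded)
    private static readonly Regex Line = new Regex(
        @"^[ \t]*(?<name>[^:\r\n]+?)[ \t]*:[ \t]*(?<value>[^\r\n]*)",
        RegexOptions.Multiline);

    public static Dictionary<string, string> Parse(string input)
    {
        var result = new Dictionary<string, string>();
        foreach (Match m in Line.Matches(input))
        {
            // Groups are looked up by name; surrounding spaces and single quotes are trimmed.
            result[m.Groups["name"].Value] = m.Groups["value"].Value.Trim().Trim('\'');
        }
        return result;
    }
}
```

Only the first colon on a line separates name from value, so values such as the Duration timestamp or the C:\ path keep their own colons.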

Character ä is represented in different Char Codes in the same string

I have a file that was uploaded via the web with the name "Schränke Wintsch.pdf".
The file name is saved in an XML file like so:
<File>Schra?nke Wintsch.pdf</File>
If I debug this in C# and manually add an ä, then it is saved correctly.
<File>Schra?nke Wintsch-ä.pdf</File>
OK, I know it is an encoding problem.
But why is the same ä character represented with different char codes (example in image 2)?

XML declares the encoding used within the document in its header. It will look something like this: <?xml version="1.0" encoding="ISO-8859-9" ?>.
If you append to the string, make sure to use the same encoding to avoid a mismatch.
Test round-tripping the character through that encoding's bytes and see if that helps.
var en = Encoding.GetEncoding("ISO-8859-9");
string roundTripped = en.GetString(en.GetBytes("ä")); // GetBytes is an instance method, so call it on en

The original XML that you have uses the Unicode 'COMBINING DIAERESIS' code point (int value 776, U+0308), so two characters are used to represent ä.
(Note how the combining character has been displayed as ? in the <File>Schra?nke Wintsch.pdf</File> image in your post.)
The 776 code says to put the double dots above the previous character (an a).
However, where you typed in the ä, it has been stored as the single Unicode character with code 228 (U+00E4).
The question you need to answer is: Why is the original source XML using the "Combining Diaeresis" character rather than the more usual ä? (Without knowing the origin of the XML file, we cannot answer that question.)
Incidentally, you can "normalise" those sorts of characters by using string.Normalize(), as demonstrated by the following program:
using System;
namespace Demo
{
    static class Program
    {
        static void Main()
        {
            char[] a = {(char)97, (char)776};
            string s = new string(a);
            Console.WriteLine(s + " -> " + s.Length); // Prints a¨ -> 2
            var t = s.Normalize();
            Console.WriteLine(t + " -> " + t.Length); // Prints ä -> 1
        }
    }
}
Note how the length of s is 2, but the length of t is only 1 (and it contains the single character ä).
So you might be able to improve things by using string.Normalize() to normalise these unexpected characters.
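Applied to the file name from the question, this might look like the following sketch, which normalizes the name to Form C (the default used by Normalize()) before it is written to the XML. FileNameNormalizer and ToComposed are hypothetical names for illustration:

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration.
public static class FileNameNormalizer
{
    // Returns the name in Normalization Form C, where "a" followed by
    // U+0308 (combining diaeresis) becomes the single character U+00E4.
    public static string ToComposed(string name)
    {
        return name.IsNormalized(NormalizationForm.FormC)
            ? name
            : name.Normalize(NormalizationForm.FormC);
    }
}
```

Calling it on the uploaded name before saving means both ä characters in a name such as "Schränke Wintsch-ä.pdf" end up with the same code.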

string.Normalize() is the working solution for the string "Schränke Wintsch-ä.pdf". It is now correctly saved as Schränke Wintsch-ä.pdf.

Replace content in first set of quotes found in string c#

I am working on a project that involves having to manipulate a bat file based on certain user produced parameters. The bat files themselves are created manually, with a static format. A basic example of a bat file would be:
cd \some\predefined\bat
start run_some_script "user_generated_argument" [other pre-defined arguments]
The "user_generated_argument" bit of the bat file is manipulated in C# by the following code:
string bat_text = File.ReadAllText(bat_path);
Regex regex = new Regex("(.*?)\".*\"(.*)");
string new_argument = "A new argument";
string new_bat = regex.Replace(bat_text , "$1\"" + new_argument + "\"$2", 1);
And that would produce the following:
cd \some\predefined\bat
start run_some_script "A new argument" [other pre-defined arguments]
which is the expected output.
However, a problem arises when one of the other pre-defined arguments after the first quoted argument is also in quotes. When that is the case, the second quoted argument seems to disappear. For example, if the bat file looks like:
cd \some\predefined\bat
start run_some_script "user_generated_argument" "a_predefined_quoted_argument" [other pre-defined arguments]
Running the same C# code from above would produce the following:
cd \some\predefined\bat
start run_some_script "A new argument" [other pre-defined arguments]
The "a_predefined_quoted_argument" would no longer be in the string.
I may be doing this completely wrong. How would I make the predefined quoted argument not disappear?

The problem is that your expression
\".*\"
is eager (greedy): it takes everything between the first quote and the last quote it finds. To make it lazy (reluctant), put a ? after the *,
like so (I used VB, which escapes a double quote by doubling it):
Dim batfile As String = "cd \some\predefined\bat" & vbCrLf & "start run_some_script ""user_generated_argument"" ""a_predefined_quoted_argument"" [other pre-defined arguments]"
Dim regex As Regex = New Regex("(.*?)"".*?""(.*)")
Dim new_argument As String = "A new argument"
Dim new_bat As String = regex.Replace(batfile, "$1""" + new_argument + """$2", 1)
It will now take everything between the first quote and the next quote.
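For reference, a C# sketch of the same idea follows. Instead of the lazy .*? it uses a negated character class, [^"]*, which likewise cannot run past the next quote; that is a swapped-in but equivalent technique for this input. BatEditor and ReplaceFirstQuoted are hypothetical names:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration.
public static class BatEditor
{
    // One quoted argument: a quote, any run of non-quote characters, a closing quote.
    private static readonly Regex QuotedArgument = new Regex("\"[^\"]*\"");

    public static string ReplaceFirstQuoted(string batText, string newArgument)
    {
        // The count of 1 limits the replacement to the first quoted argument.
        // A MatchEvaluator is used so a '$' in newArgument is taken literally.
        return QuotedArgument.Replace(batText, _ => "\"" + newArgument + "\"", 1);
    }
}
```

Since only the quoted span itself is matched, the text before and after it does not need to be captured and re-inserted with $1 and $2.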

Instead of using Regex, you could also read the lines with File.ReadAllLines(), take the desired line, split it with string.Split() and replace the arguments that way. Note that splitting on spaces also splits a quoted argument that itself contains spaces, so this only suits arguments without embedded spaces.
Something like:
string[] lines = File.ReadAllLines(fileName);
string commandLine = lines.Where(d => d.StartsWith("start")).Single(); // requires using System.Linq
string[] arguments = commandLine.Split(' ');
foreach (var argument in arguments)
{
    if (argument.StartsWith("\""))
    {
        // do your stuff and reassemble
    }
}

Multiline C# interpolated string literal

C# 6 brings compiler support for interpolated string literals with syntax:
var person = new { Name = "Bob" };
string s = $"Hello, {person.Name}.";
This is great for short strings, but if you want to produce a longer string must it be specified on a single line?
With other kinds of strings you can:
var multi1 = string.Format(@"Height: {0}
Width: {1}
Background: {2}",
    height,
    width,
    background);
Or:
var multi2 = string.Format(
    "Height: {1}{0}" +
    "Width: {2}{0}" +
    "Background: {3}",
    Environment.NewLine,
    height,
    width,
    background);
I can't find a way to achieve this with string interpolation without having it all on one line:
var multi3 = $"Height: {height}{Environment.NewLine}Width: {width}{Environment.NewLine}Background: {background}";
I realise that in this case you could use \r\n in place of Environment.NewLine (less portable), or pull it out to a local, but there will be cases where you can't reduce it below one line without losing semantic strength.
Is it simply the case that string interpolation should not be used for long strings?
Should we just use StringBuilder for longer strings?
var multi4 = new StringBuilder()
    .AppendFormat("Width: {0}", width).AppendLine()
    .AppendFormat("Height: {0}", height).AppendLine()
    .AppendFormat("Background: {0}", background).AppendLine()
    .ToString();
Or is there something more elegant?

You can combine $ and @ together to get a multiline interpolated string literal:
string s =
$@"Height: {height}
Width: {width}
Background: {background}";
Source: Long string interpolation lines in C#6 (Thanks to @Ric for finding the thread!)

I'd probably use a combination:
var builder = new StringBuilder()
    .AppendLine($"Width: {width}")
    .AppendLine($"Height: {height}")
    .AppendLine($"Background: {background}");

Personally, I just add another interpolated string using string concatenation
For example
var multi = $"Height : {height}{Environment.NewLine}" +
$"Width : {width}{Environment.NewLine}" +
$"Background : {background}";
I find that is easier to format and read.
This will have additional overhead compared to using $@"", but that will only be noticeable in the most performance-critical applications. In-memory string operations are extremely cheap compared to data I/O. Reading a single variable from the db will take hundreds of times longer in most cases.

Since C# 11 you can do it like this (file multiline.cs):
using System;
public class Program
{
    public static void Main()
    {
        var multiLineStr =
            $$"""
            {
                "Color" : "Blue",
                "Thickness" : {{1 + 1}}
            }
            """;
        Console.WriteLine(multiLineStr);
    }
}
Now the string variable multiLineStr contains:
{
    "Color" : "Blue",
    "Thickness" : 2
}
Explanation:
The string is now delimited by """, and interpolation is delimited by {{ and }} because two consecutive $ were specified (you can add more $ or " if needed, but for the quotes you must use the same number for opening and closing).
The indentation of the line holding the closing quotes matters: the whitespace in front of the closing """ is stripped from the start of every content line, so each content line must begin with that same whitespace. If the lines are indented with a different mix of tabs and/or spaces, the compiler reports an error.
If required, you can have more than 3 quotes, e.g. """"". Please note that the number of opening and closing quotes must match (in this case, 5 opening and 5 closing double quotes are required to enclose a string).
With this new syntax, the @ prefix is not needed because you no longer need to escape double quotes inside the string.
This simplifies declaring multi-line strings a lot.
You can find the full documentation at Microsoft.
Note: You can try it out in LinqPad 7 (in Preferences -> Query tab, make sure that "Enable C#/F# preview features" is enabled), in Visual Studio or on the command line. DotNetFiddle does not support the new syntax (yet).
To try it out on the command line, use the batch file CompileCS you can find in the link and invoke it like: compilecs /run multiline.cs (provided you have installed a recent version of Roslyn).

How can I determine the index of the same set of characters between two strings that are of different lengths?

I apologize up front for the title, I'm not sure how to word the question.
I am trying to find the index for a similar character or set of characters in two different, but similar strings.
String A: I <color=red><b>really</b></color> don't like spiders!
String B: I really don't like spiders!
The relevant text is the same, however A has some formatting while B does not. I got B by taking A and running a regex to find and replace all <contents> with an empty string.
Now let's say I have selected the character at index 9 in B, which is the letter d in the word don't. How can I then determine that in string A the letter d in don't, which is at index 35 (if I counted correctly), also needs to be selected?
Edit: Possibly important information, these tags are for the rich text within Unity. Very similar to HTML in almost all regards.

As I already suggested in the comments, you should write your own parser for this format that keeps the formatting as metadata next to the text. For example, you could keep a simple list of string parts where each part represents consecutive text with the same formatting.
You could start with something simplistic as this:
import re
def parse (string):
    it = iter([None] + re.split('(<[^>]+>)', string))
    parsed = []
    curFormat = {}
    for fmt, text in zip(it, it):
        if fmt is None:
            curFormat = {}
        elif fmt.startswith('</'):
            fmt = fmt[2:-1]
            del curFormat[fmt]
        else:
            fmt = fmt[1:-1]
            if '=' in fmt:
                name, value = fmt.split('=', 1)
                curFormat[name] = value
            else:
                curFormat[fmt] = True
        if text != '':
            parsed.append((text, list(curFormat.items())))
    return parsed
For your text, this will give you the following result:
>>> text = "I <color=red><b>really</b></color> don't like spiders!"
>>> parsed = parse(text)
>>> parsed
[('I ', []), ('really', [('color', 'red'), ('b', True)]), (" don't like spiders!", [])]
As you can see, you get pairs of text, with a list of formatting information for that particular part of text. If you then want to get the underlying text, you can just iterate the first list elements:
>>> ''.join(t for t, fmt in parsed)
"I really don't like spiders!"
And on top of that, you can also create your own indexing method (note that this one is really crude):
def index (parsed, start, length):
    output = ''
    for t, fmt in parsed:
        if start < 0:
            output += t
        elif start > len(t):
            start -= len(t)
        else:
            output += t[start:]
            start = -1
        if len(output) > length:
            return output[:length]
    return output
>>> index(parsed, 4, 5)
'ally '
>>> index(parsed, 7, 6)
"y don'"
Finally, you can put all of this inside a custom type that implements the iterator protocol and the sequence protocol, so you can use it like a normal string.
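If only the index mapping from the question is needed, a lighter alternative to a full parser might be to walk the rich string once and skip the tags while counting visible characters. The sketch below is in C#, since the strings come from Unity. It is an assumption-laden illustration rather than the approach described above: it treats every <...> span as a tag (the same assumption as the question's regex), and RichTextIndex and ToRichIndex are hypothetical names:

```csharp
using System;

// Hypothetical helper for illustration.
public static class RichTextIndex
{
    // Maps an index in the tag-stripped text to the index of the same
    // character in the rich text. Returns -1 if plainIndex is out of range.
    public static int ToRichIndex(string rich, int plainIndex)
    {
        int plain = 0; // index the current character would have in the stripped text
        for (int i = 0; i < rich.Length; i++)
        {
            if (rich[i] == '<')
            {
                int close = rich.IndexOf('>', i);
                if (close >= 0)
                {
                    i = close; // jump to the end of the tag; the loop's i++ moves past it
                    continue;
                }
            }
            if (plain == plainIndex)
            {
                return i;
            }
            plain++;
        }
        return -1;
    }
}
```

For the strings in the question, ToRichIndex(a, 9) returns 35, which agrees with the count given there.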
