Issue with Regex Replace? [duplicate]

I am using Regex to replace all the strings in a template. Everything works fine until there is a value I want to replace, which is $0.00. I can't seem to properly replace the $0 as replacement text. The output I am getting is "Project Cost: [[ProjectCost]].00". Any idea why?
Here is an example of the code with some simplified variables.
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;
using Newtonsoft.Json.Linq;
using System;
using System.Collections.Generic;
using System.Security;
using System.Text.RegularExpressions;
namespace Export.Services
{
public class CommonExportService
{
private Dictionary<string, string> _formTokens;
public CommonExportService() {
_formTokens = new Dictionary<string, string> { { "EstimatedOneTimeProjectCost", "$0.00" } };
}
private string GetReplacementText(string replacementText)
{
replacementText = "Project Cost: [[EstimatedOneTimeProjectCost]]";
//replacement text = "Project Cost: [[ProjectCost]]"
foreach (var token in _formTokens)
{
var val = token.Value;
var key = token.Key;
//work around for now
//if (val.Equals("$0.00")) {
// val = "0.00";
//}
var reg = new Regex(Regex.Escape("[[" + key + "]]"));
if (reg.IsMatch(replacementText))
replacementText = reg.Replace(replacementText, SecurityElement.Escape(val ?? string.Empty));
else {
}
}
return replacementText;
//$0.00 does not replace, something is happening with the $0 before the decimal
//the output becomes Project Cost: [[EstimatedOneTimeProjectCost]].00
//The output is correct for these
//0.00 replaces correctly
//$.00 replaces correctly
//0 replaces correctly
//00 replaces correctly
//$ replaces correctly
}
}
}

Since your replacement string is built dynamically, you need to take care of the $ char in it. When $ is followed by 0, the $0 is a backreference to the whole match, so the whole match is inserted as a result of the replacement. That is why "$0.00" produces the original token followed by ".00".
You just need to dollar-escape the $ in the replacement string before passing it to Replace:
replacementText = reg.Replace(replacementText, SecurityElement.Escape(val ?? string.Empty).Replace("$", "$$"));
Then the replacement pattern will contain $$0, and that will "translate" into a literal $0 string.
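The dollar-escaping step can be sketched as a small helper. This is a minimal sketch, not the asker's actual service: the class name TemplateFiller and the method ReplaceToken are hypothetical, and only Regex.Escape, Regex.Replace and string.Replace are real framework calls.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the original post).
public static class TemplateFiller
{
    // Replaces every [[key]] in the template with value, treating value as literal text.
    public static string ReplaceToken(string template, string key, string value)
    {
        // The token is searched for literally, so escape regex metacharacters in it.
        string pattern = Regex.Escape("[[" + key + "]]");

        // Doubling each '$' stops the regex engine from reading "$0" as
        // "insert the whole match" inside the replacement string.
        string literalReplacement = (value ?? string.Empty).Replace("$", "$$");

        return Regex.Replace(template, pattern, literalReplacement);
    }
}
```

Because the search text here is a fixed literal, calling string.Replace on the template with the "[[key]]" token would also work and avoids substitution syntax altogether.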

Related

Remove characters from List<string> in between separators (from text file)

Fast way to replace text in text file.
From this: somename#somedomain.com:hello_world
To This: somename:hello_world
It needs to be FAST and support multiple lines of text file.
I tried splitting the string into three parts, but it seems slow. Example in the code below.
public static void Conversion()
{
List<string> list = File.ReadAllLines("ETU/Tut.txt").ToList();
Console.WriteLine("Please wait, converting in progress !");
foreach (string combination in list)
{
if (combination.Contains("#"))
{
write: try
{
using (StreamWriter sw = new
StreamWriter("ETU/UPCombination.txt", true))
{
sw.WriteLine(combination.Split('#', ':')[0] + ":"
+ combination.Split('#', ':')[2]);
}
}
catch
{
goto write;
}
}
else
{
Console.WriteLine("At least one line doesn't contain #");
}
}
}
So I need a fast way to convert every line in a text file from
somename#somedomain.com:hello_world
To: somename:hello_world
and then save the result to a different text file.
Remember, the domain part always changes!

Most likely not the fastest, but it is pretty fast with an expression similar to,
#[^:]+
and replace that with an empty string.
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = @"#[^:]+";
string substitution = @"";
string input = @"somename#somedomain.com:hello_world1
somename#some_other_domain.com:hello_world2";
RegexOptions options = RegexOptions.Multiline;
Regex regex = new Regex(pattern, options);
string result = regex.Replace(input, substitution);
}
}
If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com, where you can also see how it matches against some sample inputs.
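If speed matters more than brevity, a regex-free variant is worth measuring. It is a sketch under assumptions: LineConverter, StripDomain and ConvertFile are hypothetical names, and lines without a '#' are copied unchanged rather than reported as in the question. It opens the output file once instead of once per line, and streams the input with File.ReadLines.

```csharp
using System;
using System.IO;

// Hypothetical helper (not from the original posts).
public static class LineConverter
{
    // Removes everything from the first '#' up to (not including) the next ':'.
    public static string StripDomain(string line)
    {
        int hash = line.IndexOf('#');
        if (hash < 0) return line;            // no '#': leave the line untouched

        int colon = line.IndexOf(':', hash);
        if (colon < 0) return line;           // no ':' after '#': leave the line untouched

        return line.Substring(0, hash) + line.Substring(colon);
    }

    // Streams the input file line by line and writes through a single StreamWriter.
    public static void ConvertFile(string inputPath, string outputPath)
    {
        using (var writer = new StreamWriter(outputPath))
        {
            foreach (string line in File.ReadLines(inputPath))
            {
                writer.WriteLine(StripDomain(line));
            }
        }
    }
}
```

With the paths from the question, the call would be LineConverter.ConvertFile("ETU/Tut.txt", "ETU/UPCombination.txt").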

How to extract name and version from string

I have many filenames such as:
libgcc1-5.2.0-r0.70413e92.rbt.xar
python3-sqlite3-3.4.3-r1.0.f25d9e76.rbt.xar
u-boot-signed-pad.bin-v2015.10+gitAUTOINC+1b6aee73e6-r0.02df1c57.rbt.xar
I need to reliably extract the name, version and "rbt" or "norbt" from this. What is the best way? I am trying regex, something like:
(?<fileName>.*?)-(?<version>.+).(rbt|norbt).xar
The issue is that the file name and the version can both contain multiple hyphens and dots. I am not sure if there is an answer, but I have two questions:
What is the best strategy to extract values such as these?
How would I be able to figure out which version is greater?
Expected output is:
libgcc1, 5.2.0-r0.70413e92, rbt
python3-sqlite3, 3.4.3-r1.0.f25d9e76, rbt
u-boot-signed-pad.bin, v2015.10+gitAUTOINC+1b6aee73e6-r0.02df1c57, rbt

This will give you what you want without using Regex:
var fileNames = new List<string>(){
"libgcc1-5.2.0-r0.70413e92.rbt.xar",
"python3-sqlite3-3.4.3-r1.0.f25d9e76.rbt.xar",
"u-boot-signed-pad.bin-v2015.10+gitAUTOINC+1b6aee73e6-r0.02df1c57.rbt.xar"
};
foreach(var file in fileNames){
var spl = file.Split('-');
string name = string.Join("-",spl.Take(spl.Length-2));
string versionRbt = string.Join("-",spl.Skip(spl.Length-2));
string rbtNorbt = versionRbt.IndexOf("norbt") > 0 ? "norbt" : "rbt";
string version = versionRbt.Replace($".{rbtNorbt}.xar","");
Console.WriteLine($"name={name};version={version};rbt={rbtNorbt}");
}
Output:
name=libgcc1;version=5.2.0-r0.70413e92;rbt=rbt
name=python3-sqlite3;version=3.4.3-r1.0.f25d9e76;rbt=rbt
name=u-boot-signed-pad.bin;version=v2015.10+gitAUTOINC+1b6aee73e6-r0.02df1c57;rbt=rbt
Edit:
Or using Regex:
var m = Regex.Match(file, @"^(?<fileName>.*)-(?<version>.+-.+)\.(rbt|norbt)\.xar$");
string name = m.Groups["fileName"].Value;
string version = m.Groups["version"].Value;
string rbtNorbt = m.Groups[1].Value;
The output will be the same. Both approaches assume that "version" contains exactly one -.

I tested the following code and it works with Regex. I used the RightToLeft option.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace ConsoleApplication107
{
class Program
{
static void Main(string[] args)
{
string[] inputs = {
"libgcc1-5.2.0-r0.70413e92.rbt.xar",
"python3-sqlite3-3.4.3-r1.0.f25d9e76.rbt.xar",
"u-boot-signed-pad.bin-v2015.10+gitAUTOINC+1b6aee73e6-r0.02df1c57.rbt.xar"
};
string pattern = @"(?'prefix'.+)-(?'middle'[^-][\w+\.]+-[\w+\.]+)\.(?'extension'[^\.]+)\.xar";
foreach (string input in inputs)
{
Match match = Regex.Match(input, pattern, RegexOptions.RightToLeft);
Console.WriteLine("prefix : '{0}', middle : '{1}', extension : '{2}'",
match.Groups["prefix"].Value,
match.Groups["middle"].Value,
match.Groups["extension"].Value
);
}
Console.ReadLine();
}
}
}

.net Regex Search and string.replace

My XML file is around 7 MB. I have to remove some invalid characters from some of the nodes. There are many nodes, such as "title", "country" and so on.
There are 31,000 matches for the "title" node and the method takes more than 35 minutes, which is not acceptable for my project requirements. How can I optimise this?
Method call
fileText = RemoveInvalidCharacters(fileText, "title", @"(&#[xX]?[A-Fa-f\d]+;)|[^\w\s\/\;\&\.#-]", "$1");
Method definition
private static string RemoveInvalidCharacters(string fileText, string nodeName, string regexPattern, string regexReplacement)
{
foreach (Match match in Regex.Matches(fileText, @"<" + nodeName + ">(.*)</" + nodeName + ">"))
{
var oldValue = match.Groups[0].Value;
var newValue = "<" + nodeName + ">" + Regex.Replace(match.Groups[1].Value, regexPattern, regexReplacement) +
"</" + nodeName + ">";
fileText = fileText.Replace(oldValue, newValue);
}
return fileText;
}

Instead of using Regex to parse the Xml Document, you can use the tools in the System.Xml.Linq namespace to handle the parsing for you, which is inherently much faster and easier to use.
Here's an example program that builds a structure with 35,000 nodes. I've kept your regex string to check for the bad characters, but I've specified it as a Compiled regex, which should yield better performance, although admittedly not a huge increase when I compared the two.
This example uses Descendants, which gets references to all of the elements with the name you specify in the parameter, within the element it is called on (in this case, we've started from the root element). Those results are filtered by the ContainsBadCharacters method.
For the sake of simplicity I haven't made the foreach loops DRY, but it's probably worth doing so.
On my machine, this runs in less than a second, but timings will vary based on machine performance and occurrences of bad characters.
using System;
using System.IO;
using System.Linq;
using System.Reflection;
using System.Text;
using System.Text.RegularExpressions;
using System.Xml.Linq;
namespace ConsoleApplication2
{
class Program
{
static Regex r = new Regex(@"(&#[xX]?[A-Fa-f\d]+;)|[^\w\s\/\;\&\.#-]", RegexOptions.Compiled);
static void Main(string[] args)
{
System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
var xmls = new StringBuilder("<Nodes>");
for(int i = 0;i<35000;i++)
{
xmls.Append(@"<Node>
<Title>Lorem~~~~</Title>
<Country>Ipsum!</Country>
</Node>");
}
xmls.Append("</Nodes>");
var doc = XDocument.Parse(xmls.ToString());
sw.Start();
foreach(var element in doc.Descendants("Title").Where(ContainsBadCharacters))
{
element.Value = r.Replace(element.Value, "$1");
}
foreach (var element in doc.Descendants("Country").Where(ContainsBadCharacters))
{
element.Value = r.Replace(element.Value, "$1");
}
sw.Stop();
var saveFile = new FileInfo(Path.Combine(Assembly.GetExecutingAssembly().Location.Substring(0,
Assembly.GetExecutingAssembly().Location.LastIndexOf(@"\")), "test.txt"));
// doc.Save creates the file if needed; calling FileInfo.Create() first would leave an open handle on it.
doc.Save(saveFile.FullName);
Console.WriteLine(sw.Elapsed);
Console.Read();
}
static bool ContainsBadCharacters(XElement item)
{
return r.IsMatch(item.Value);
}
}
}
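The two near-identical foreach loops above could be folded into one helper, roughly as follows. This is only a sketch: XmlCleaner and Clean are hypothetical names, and the regex is copied from the answer unchanged. The ToList() call takes a snapshot of the matching elements before any of them is modified.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper (not from the original answer).
public static class XmlCleaner
{
    // Same pattern as in the answer: keep character references, drop other disallowed characters.
    static readonly Regex BadChars =
        new Regex(@"(&#[xX]?[A-Fa-f\d]+;)|[^\w\s\/\;\&\.#-]", RegexOptions.Compiled);

    // Cleans the text of every descendant element whose name is listed.
    public static void Clean(XDocument doc, params string[] elementNames)
    {
        foreach (string name in elementNames)
        {
            // Snapshot the matches so the tree is not modified while it is being enumerated.
            var dirty = doc.Descendants(name)
                           .Where(e => BadChars.IsMatch(e.Value))
                           .ToList();

            foreach (XElement element in dirty)
            {
                element.Value = BadChars.Replace(element.Value, "$1");
            }
        }
    }
}
```

For the sample document in the answer, the call would be XmlCleaner.Clean(doc, "Title", "Country").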

Delimit a string by character unless within quotation marks C#

I need to delimit text by a single character, a comma. But I only want to use that comma as a delimiter if it is not enclosed in quotation marks.
An example:
Method,value1,value2
Would contain three values: Method, value1 and value2
But:
Method,"value1,value2"
Would contain two values: Method and "value1,value2"
I'm not really sure how to go about this as when splitting a string I would use:
String.Split(',');
But that would split on ALL commas. Is this possible without getting overly complicated and having to manually check every character of the string?
Thanks in advance

Copied from my comment: Use an available CSV parser such as Microsoft.VisualBasic.FileIO.TextFieldParser.
As requested, here is an example for the TextFieldParser:
var allLineFields = new List<string[]>();
string sampleText = "Method,\"value1,value2\"";
var reader = new System.IO.StringReader(sampleText);
using (var parser = new Microsoft.VisualBasic.FileIO.TextFieldParser(reader))
{
parser.Delimiters = new string[] { "," };
parser.HasFieldsEnclosedInQuotes = true; // <--- !!!
string[] fields;
while ((fields = parser.ReadFields()) != null)
{
allLineFields.Add(fields);
}
}
The list now contains a single string[] with two strings. I have used a StringReader because this sample uses a string; if the source is a file, use a StreamReader (for example via File.OpenText).

You can try Regex.Split() to split the data up using the pattern
",|(\"[^\"]*\")"
This will split by commas and by characters within quotes.
Code Sample:
using System;
using System.Linq;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
string data = "Method,\"value1,value2\",Method2";
string[] pieces = Regex.Split(data, ",|(\"[^\"]*\")").Where(exp => !String.IsNullOrEmpty(exp)).ToArray();
foreach (string piece in pieces)
{
Console.WriteLine(piece);
}
}
}
Results:
Method
"value1,value2"
Method2

C# using regex to replace value only after = sign

ok I have a text file that contains:
books_book1 = 1
books_book2 = 2
books_book3 = 3
I would like to retain "books_book1 = " and replace only the value after the = sign.
so far I have:
string text = File.ReadAllText("settings.txt");
text = Regex.Replace(text, ".*books_book1*.", "books_book1 = a",
RegexOptions.Multiline);
File.WriteAllText("settings.txt", text);
text = Regex.Replace(text, ".*books_book2*.", "books_book2 = b",
RegexOptions.Multiline);
File.WriteAllText("settings.txt", text);
text = Regex.Replace(text, ".*books_book3*.", "books_book3 = c",
RegexOptions.Multiline);
File.WriteAllText("settings.txt", text);
this results in:
books_book1 = a=1
output to file should be:
books_book1 = a
books_book2 = b
books_book3 = c
Thanks much in advance...

In a comment I stated:
"I would personally just go for recreating the file if it is that simple. Presumably you load all the values from the file into an object of some kind initially, so just use that to recreate the file with the new values. Much easier than messing with regular expressions: it's simpler and easier to test and see what is going on, and easier to change if you ever need to."
I think having looked at this again it is even more true.
From what you said in comments: "when the program loads it reads the values from this text file, then the user has an option to change the values and save it back to the file". Presumably this means that you need to actually know which of the books1, books2, etc. lines you are replacing so you know which of the user supplied values to put in. This is fine (though a little unwieldy) with three items but if you increase that number then you'll need to update your code for every new item. This is never a good thing and will quickly produce some very horrendous looking code liable to get bugs in.
If you have your new settings in some kind of data structure (eg a dictionary) then as I say recreating the file from scratch is probably easiest. See for example this small fully contained code snippet:
//Set up our sample Dictionary
Dictionary<string, string> settings = new Dictionary<string,string>();
settings.Add("books_book1","a");
settings.Add("books_book2","b");
settings.Add("books_book3","c");
//Write the values to file via an intermediate stringbuilder.
StringBuilder sb = new StringBuilder();
foreach (var item in settings)
{
sb.AppendLine(String.Format("{0} = {1}", item.Key, item.Value));
}
File.WriteAllText("settings.txt", sb.ToString());
This has obvious advantages of being simpler and that if you add more settings then they will just go into the dictionary and you don't need to change the code.

I don't think this is the best way to solve the problem, but to make the RegEx do what you want you can do the following:
var findFilter = @"(.*books_book1\s*=\s)(.+)";
var replaceFilter = "${1}a";
text = Regex.Replace(text, findFilter, replaceFilter, RegexOptions.Multiline);
File.WriteAllText("settings.txt", text);
....
The first parenthesized part of the regex is capturing group 1, and ${1} in the replacement inserts the text that group matched, so the "books_book1 = " prefix is kept while the second group (the old value) is dropped. Also, you'll notice I used \s for whitespace after the key so you don't match book111, for example. I'm sure there are other edge cases you'll need to deal with.
books_book1 = a
...
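The same idea can be generalized to any key, as in the sketch below. It rests on assumptions: SettingsEditor and SetValue are hypothetical names, and each setting is assumed to sit on its own line as "key = value". The new value has its $ characters doubled so it is inserted literally rather than read as a substitution.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the original answers).
public static class SettingsEditor
{
    // Replaces the value after "key =" on the matching line and keeps the "key = " prefix.
    public static string SetValue(string text, string key, string newValue)
    {
        // Group 1 captures the prefix; [ \t]* is used instead of \s* so the match
        // never runs across a line break. [^\r\n]* consumes the old value.
        string pattern = @"^([ \t]*" + Regex.Escape(key) + @"[ \t]*=[ \t]*)[^\r\n]*";

        // ${1} re-inserts the captured prefix; '$' in the new value is doubled to keep it literal.
        string replacement = "${1}" + newValue.Replace("$", "$$");

        return Regex.Replace(text, pattern, replacement, RegexOptions.Multiline);
    }
}
```

Because the pattern requires "=" right after the key and optional spaces, a key such as books_book1 does not match a line that starts with books_book11.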

Here's the start to a more generic approach:
This regular expression captures the last digit, taking care to account for variability in digit and whitespace length.
text = Regex.Replace(text, @"(books_book\d+\s*=\s*)(\d+)", DoReplace);
// ...
string DoReplace(Match m)
{
return m.Groups[1].Value + Convert.ToChar(int.Parse(m.Groups[2].Value) + 96);
}

How about something like this (no error checking):
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace TestRegex
{
class Program
{
static void Main( string[] args )
{
var path = @"settings.txt";
var pattern = @"(^\s*books_book\d+\s*=\s*)(\d+)(\s*)$";
var options = RegexOptions.IgnoreCase | RegexOptions.Multiline;
var contents = Regex.Replace( File.ReadAllText( path ), pattern, MyMatchEvaluator, options );
File.WriteAllText( path, contents );
}
static int x = char.ConvertToUtf32( "a", 0 );
static string MyMatchEvaluator( Match m )
{
var x1 = m.Groups[ 1 ].Value;
var x2 = char.ConvertFromUtf32( x++ );
var x3 = m.Groups[ 3 ].Value;
var result = x1 + x2 + x3;
return result;
}
}
}
