Realtime Search and Replace - c#

I have a file containing lots of numbers, and I want to scale them down to build a new file. First, I read all the text using File.ReadAllText, then I split each line that contains numbers on commas or spaces and extract the numbers. After the scan, I replace all occurrences of each number found with its reduced value. The problem is that this method is error-prone, since some numbers get replaced more than once.
Here's the code I'm using:
List<float> oPaths = new List<float>();
List<float> nPaths = new List<float>();
var far = File.ReadAllText("paths.js");
foreach(var s in far.Split('\n'))
{
//if it starts with this that means there are some numbers
if (s.StartsWith("\t\tpath:"))
{
var paths = s.Substring(10).Split(new[]{',', ' '});
foreach(var n in paths)
{
float di;
if(float.TryParse(n, out di))
{
if(oPaths.Contains(di)) break;
oPaths.Add(di);
nPaths.Add(di * 3/4);
}
}
}
}
//second iteration to replace old numbers with new ones
var ns = far;
for (int i = 0; i < oPaths.Count; i++)
{
var od = oPaths[i].ToString();
var nd = nPaths[i].ToString();
ns = ns.Replace(od, nd);
}
File.WriteAllText("npaths.js", ns);
As you can see, the above method is flawed because it does not replace the numbers in a single pass: a value produced by one replacement can be matched and replaced again by a later one. Maybe my head is full, but I'm just lost on how to go about this. Any ideas?
Thanks.

I think a regex can help here
string text = File.ReadAllText(file);
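string newtext = Regex.Replace(text, @"\b(([0-9]+)?\.)?[0-9]+\b", m =>
{
float f;
// Leave the match untouched if it cannot be parsed as a number
if (!float.TryParse(m.Value, NumberStyles.Float, CultureInfo.InvariantCulture, out f)) return m.Value;
return (f * 3.0f / 4).ToString(CultureInfo.InvariantCulture);
});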
File.WriteAllText(file, newtext);

Just after typing the question I realized the answer was to iterate character by character and replace accordingly. Here's the code I used to get this to work:
string nfar = "";
var far = File.ReadAllText("paths.js");
bool neg = false;
string ccc = "";
for(int i = 0; i < far.Length; i++)
{
char c = far[i];
if (Char.IsDigit(c) || c == '.')
{
ccc += c;
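// also treat the end of the text as the end of a number, so far[i + 1] is never read out of range
if (i + 1 == far.Length || far[i + 1] == ' ' || far[i + 1] == ',')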
{
ccc = neg ? "-" + ccc : ccc;
float di;
if (float.TryParse(ccc, out di))
{
nfar += (di*0.75f).ToString();
ccc = "";
neg = false;
}
}
}
else if (c == '-')
{
neg = true;
}
else
{
nfar += c;
}
}
File.WriteAllText("nfile.js", nfar);
Comments and/or optimization suggestions are welcome.
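One possible suggestion, as a minimal sketch rather than a definitive fix: a single Regex.Replace pass visits each number exactly once, so a replaced value can never be matched again, and it does not depend on which character follows the number. The class name PathScaler and the method ScaleNumbers are hypothetical names chosen for illustration, and the pattern assumes plain decimal numbers with an optional leading minus sign.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

static class PathScaler
{
    // Hypothetical helper: scales every numeric literal in the text by `factor` in one pass.
    public static string ScaleNumbers(string text, float factor)
    {
        // Optional minus sign, digits, optional fractional part.
        return Regex.Replace(text, @"-?[0-9]+(\.[0-9]+)?", m =>
        {
            float value;
            if (!float.TryParse(m.Value, NumberStyles.Float, CultureInfo.InvariantCulture, out value))
                return m.Value; // leave anything unparsable untouched

            // Invariant culture keeps '.' as the decimal separator regardless of locale.
            return (value * factor).ToString(CultureInfo.InvariantCulture);
        });
    }
}
```

For example, PathScaler.ScaleNumbers("path: 8, 6 -4 100", 0.75f) gives "path: 6, 4.5 -3 75". With sequential string replacement, the 6 produced from 8 would have been replaced a second time.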

Related

Delete part of string value

I want to randomly mix two strings into one using foreach, but I don't know how to delete the part of the string I have already used in the loop, like:
string s = "idontknow";
string sNew = "";
foreach(char ss in s){
s = s + ss;
ss.Delete(s); //don't exist
}
Here is the full code for what I'm trying to do:
do
{
if (state == 0)
{
for (int i = 0; random.Next(1, 5) > variable.Length; i++)
{
foreach (char ch in variable)
{
fullString = fullString + ch;
}
}
state++;
}
else if (state == 1)
{
for (int i = 0; random.Next(1, 5) > numbers.Length; i++)
{
foreach (char n in numbers)
{
fullString = fullString + n;
}
}
state--;
}
} while (variable.Length != 0 && numbers.Length != 0);
In your first code snippet, s = s + ss appends each used char back onto s (not onto sNew), and char has no Delete method, so s only ever grows. Reassigning s inside the loop does not change what foreach iterates over, because it enumerates the string instance it started with.
Regarding your requirement to shuffle two strings together, this code sample might do the job:
public static string ShuffleStrings(string s1, string s2){
List<char> charPool = new();
foreach (char c in s1) {
charPool.Add(c);
}
foreach (char c in s2) {
charPool.Add(c);
}
Random rand = new();
char[] output = new char[charPool.Count];
for(int i = 0; i < output.Length; i++) {
int randomIndex = rand.Next(0, charPool.Count);
output[i] = charPool[randomIndex];
charPool.RemoveAt(randomIndex);
}
return new string(output);
}
In case you just want to shuffle one string into another string, just use an empty string as the first or second parameter.
Example:
string shuffled = ShuffleStrings("TEST", "string");
Console.WriteLine(shuffled);
// Output:
// EgsTtSnrTi
There are possibly other solutions, which are much shorter, but I think this code is pretty easy to read and understand.
Concerning performance, the code above should work for both small and large strings.
Since strings are immutable, each modifying operation on a string, e.g. "te" + "st" or "test".Replace("t", ""), allocates and creates a new string in memory, which is pretty bad at large scale.
For that very reason, I initialized a char array, which is then filled.
Alternatively, you can use:
using System.Text;
StringBuilder sb = new();
// append each randomly picked char
sb.Append(c);
// Create a string from appended chars
sb.ToString();
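As one of the shorter alternatives mentioned above, here is a minimal sketch of an in-place Fisher-Yates shuffle over a char array. It swaps elements instead of removing them from a list, so no intermediate strings or list shifts are needed. The names StringShuffler and Shuffle are hypothetical, and passing the Random instance in is just a design choice for this sketch.

```csharp
using System;

static class StringShuffler
{
    // Hypothetical helper: returns the characters of s1 and s2 in random order.
    public static string Shuffle(string s1, string s2, Random rand)
    {
        char[] chars = (s1 + s2).ToCharArray();

        // Walk backwards and swap each position with a random position at or before it.
        for (int i = chars.Length - 1; i > 0; i--)
        {
            int j = rand.Next(i + 1); // 0 <= j <= i
            (chars[i], chars[j]) = (chars[j], chars[i]);
        }
        return new string(chars);
    }
}
```

Usage: string shuffled = StringShuffler.Shuffle("TEST", "string", new Random());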
And if your question was just how to remove the first char of a string:
string myStr = "Test";
foreach (char c in myStr) {
// do with c whatever you want
myStr = myStr[1..]; // assign a substring excluding the first char (start at index 1)
Console.WriteLine($"c = {c}; myStr = {myStr}");
}
// Output:
// c = T; myStr = est
// c = e; myStr = st
// c = s; myStr = t
// c = t; myStr =

C# Remove characters from string between /**/

The string will be something like this:
for(I=1;I<10;I++) /*Else*/ {A = "for"; B = 'c'; break;} // while(a < 10)
I would like to remove from this string anything that is between /* and */, between "", between '', and anything after //.
Here is an example:
input:
for(I=1;I<10;I++) /*Else*/ {A = "for"; B = 'c'; break;} // while(a < 10)
output:
FOR(i=1;i<10;i++) /**/ {a = ""; b = ''; BREAK;} //
I know that I have to go through characters in string with:
for (int i = 0; i < input.Length; i++)
{
// search for /**/ ?
}
but I don't know how to remove those characters and put the remaining characters into a new string.
string sentence = "for(I=1;I<10;I++) /*Else*/ {A = \"for\"; B = 'c'; break;} // while(a < 10)";
//how can I remove these characters from string so it will look something like this?
string shortSentence = "FOR(i=1;i<10;i++) /**/ {a = \"\"; b = ''; BREAK;} //";
Try this. The idea is to first remove the stuff at the end, the // comments. Then we search for /* and the next */, remove the text between them (we are left with /**/), save the index of that marker and remove it as well. Once there are no more /* strings, we reinsert the /**/ markers into the positions we removed them from.
NOTE: This won't work if you have something like this /* comment /* more comment */ and some other part */. But this also isn't a valid comment in C#, which is the language you tagged here.
static string s = "for(int i = 0; i < 10; i++){ var myVar = \"Test\"; /*commented out code*/ Console.WriteLine(\"stuff\"); /* more comments here*/}//my comments here \n var myOverVar = /*more stuff removed*/ true; //some more comments";
static void Main(string[] args)
{
//s.IndexOf()
var result = new List<string>();
var lines = s.Split(new[] {"\n"}, StringSplitOptions.RemoveEmptyEntries); //use this if you have multiple code lines, separated by new lines.
foreach (var line in lines)
{
var listOfPositions = new List<int>();
var l = line;
//chop off everything after comments
var indexOfLineComment = l.IndexOf("//");
if (indexOfLineComment != -1) // -1 means the line has no // comment
l = l.Remove(indexOfLineComment + 2); // 2 because // is two characters long
var openBraceIndex = l.IndexOf("/*");
while (openBraceIndex != -1) //-1 indicates that we didn't find /*
{
var closingBraceIndex = l.IndexOf("*/", openBraceIndex + 2); // only look after the opening /*
if (closingBraceIndex == -1)
{
break; //you didn't specify how to the case when an error in syntax was made, but handle it here
}
l = l.Remove(openBraceIndex + 2, closingBraceIndex - openBraceIndex-2);
var ind = l.IndexOf("/**/");
listOfPositions.Insert(0, ind);
l = l.Remove(ind, 4);
openBraceIndex = l.IndexOf("/*");
}
foreach (var i in listOfPositions)
if (l.Length <= i)
l = l + "/**/";
else
l = l.Insert(i, "/**/");
result.Add(l);
}
// print the cleaned lines
foreach (var cleaned in result)
Console.WriteLine(cleaned);
}
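As an alternative to removing and reinserting markers, the character-by-character loop the question started with can be sketched as a single pass that copies everything except the contents of /* */, "" and '' and the text after //. This is only a sketch under stated assumptions: the input is a single line, escape sequences such as \" are not handled, and the case changes shown in the question's expected output are not applied. CommentStripper and Strip are hypothetical names.

```csharp
using System;
using System.Text;

static class CommentStripper
{
    // Hypothetical helper: keeps the delimiters but drops what is between them.
    public static string Strip(string input)
    {
        var sb = new StringBuilder();
        int i = 0;
        while (i < input.Length)
        {
            char c = input[i];
            bool hasNext = i + 1 < input.Length;

            if (c == '/' && hasNext && input[i + 1] == '/')
            {
                sb.Append("//"); // keep the marker, drop the rest of the line
                break;
            }
            if (c == '/' && hasNext && input[i + 1] == '*')
            {
                int end = input.IndexOf("*/", i + 2, StringComparison.Ordinal);
                sb.Append("/**/");
                if (end < 0) break; // unterminated comment: drop the remainder
                i = end + 2;        // continue after the closing */
                continue;
            }
            if (c == '"' || c == '\'')
            {
                int end = input.IndexOf(c, i + 1);
                sb.Append(c).Append(c); // emit an empty literal
                if (end < 0) break;     // unterminated literal: drop the remainder
                i = end + 1;
                continue;
            }
            sb.Append(c);
            i++;
        }
        return sb.ToString();
    }
}
```

For the input from the question, Strip returns for(I=1;I<10;I++) /**/ {A = ""; B = ''; break;} // with the original casing preserved.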

string.Format to display only first n (2) digits in a number

I have a mouse move coordinate,
For example:
s = string.Format("{0:D4},{1:D4}", nx, ny);
the result s is "0337,0022"
The question is: how do I show only the first two digits of each number?
I would like to get:
s is "03,00"
Here is another example:
s = "0471,0306"
I want it to be:
s = "04,03"
and when the coordinate is negative, for example:
s = "-0471,0306"
I want it to be:
s = "-04,03"
s =string.Format("{0},{1}",
string.Format("{0:D4}", nx).Substring(0,2),
string.Format("{0:D4}", ny).Substring(0,2));
Just split the string on the comma and then sub-string the first two characters of each portion, like this:
string result = String.Empty;
string s = String.Format("{0:D4},{1:D4}", nx, ny);
string[] values = s.Split(',');
int counter = 0;
foreach (string val in values)
{
StringBuilder sb = new StringBuilder();
int digitsCount = 0;
// Loop through each character in string and only keep digits or minus sign
foreach (char theChar in val)
{
if (theChar == '-')
{
sb.Append(theChar);
}
if (Char.IsDigit(theChar))
{
sb.Append(theChar);
digitsCount += 1;
}
if (digitsCount == 2)
{
break;
}
}
result += sb.ToString();
if (counter < values.Length - 1)
{
result += ",";
}
counter += 1;
}
Note: This will work for any amount of comma separated values you have in your s string.
Assuming that nx and ny are integers
s = nx.ToString("D4").Substring(0, 2) // leftmost digits of nx
+ ","
+ ny.ToString("D4").Substring(0, 2); // leftmost digits of ny
"D4" ensures the string is at least four characters long, which is enough for the Substring boundaries.
I'd do it this way:
Func<int, string> f = n => n.ToString("D4").Substring(0, 2);
var s = string.Format("{0},{1}", f(nx), f(ny));
Check the number before you use Substring.
var s1 = nx.ToString();
var s2 = ny.ToString();
// Checks if the number is long enough (Length avoids needing System.Linq for Count())
string c1 = (s1.Length > 2) ? s1.Substring(0, 2) : s1;
string c2 = (s2.Length > 2) ? s2.Substring(0, 2) : s2;
Console.WriteLine("{0},{1}",c1,c2);
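Note that for a negative value such as -471, ToString("D4") produces "-0471", so taking the first two characters gives "-0" rather than the "-04" asked for in the question. A minimal sketch that handles the sign separately could look like this; CoordinateFormatter, FirstTwoDigits and Format are hypothetical names, and the sketch assumes nx and ny are ints.

```csharp
using System;

static class CoordinateFormatter
{
    // Hypothetical helper: first two digits of the zero-padded magnitude, with the sign kept in front.
    public static string FirstTwoDigits(int n)
    {
        // Cast to long so that Math.Abs cannot overflow for int.MinValue.
        string digits = Math.Abs((long)n).ToString("D4").Substring(0, 2);
        return n < 0 ? "-" + digits : digits;
    }

    public static string Format(int nx, int ny)
    {
        return FirstTwoDigits(nx) + "," + FirstTwoDigits(ny);
    }
}
```

With this, CoordinateFormatter.Format(-471, 306) gives "-04,03" and CoordinateFormatter.Format(337, 22) gives "03,00".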

How to get [x] and [x+1] into a string then skip to [x+2]

Input : 1 ; a ; 2; b; 3;c;4;d;5;e;6;f
output I'm getting : 1a ; a2;b3;c4;d5;e6
output I want: 1a ; 2b ; 3c; 4d ; 5e; 6f
I know this is a simple thing but I just can't seem to get my damn head around it....
Here's my code:
for (int x = 0; x < coll.Count; x++)
{
if (x == 0)
{
line.Append(coll[x].ToString());
line.AppendLine(coll[x + 1].ToString());
}
else
{
if (x % 2 == 0)
{
}
else
{
try
{
line.Append(coll[x].ToString());
line.AppendLine(coll[x + 1].ToString());
x++;
textBox1.Text = line.ToString();
}
catch { }
}
}
}
If you want to keep the code the way it is (I'm assuming something will go in the empty conditional), then you just need to change if (x % 2 == 0) to if (x % 2 != 0) (or equivalently if (x % 2 == 1)). Your code currently appends to the line when x = 0, then at 1, 3, ..., i.e. at the odd-numbered indices, whereas you need to append at the even-numbered indices.
(Unfortunately I can't edit your question, but if you just stick four spaces in front of the line starting with for then it should be formatted correctly.)
If your list looks like [1,a,2,b,3,c,4,d,5,e,6,f], try this:
String line = "";
// step through the list two items at a time
for (int i = 0; i + 1 < list.Count; i += 2) {
line += list[i].ToString() + list[i + 1].ToString();
}
textBox1.Text = line;
EDIT
And if you want semicolons between the pairs:
String line = "";
for (int i = 0; i + 1 < list.Count; i += 2) {
line += list[i].ToString() + list[i + 1].ToString();
// no separator after the last pair
if (i + 3 < list.Count) {
line += ";";
}
}
textBox1.Text = line;
One line code with LINQ (but not so efficient):
string[] source = { "1", "a", "2", "b", "3", "c" };
var result = source.Zip(source.Skip(1), (s1, s2) => s1 + s2)
.Where((s, i) => i % 2 == 0);
string[] arrayResult = result.ToArray();
string stringResultWithSeperator = string.Join(";", result);
Another LINQy solution:
string input = "1;a;2;b;3;c;4;d;5;e;6;f";
var split = input.Split(';');
string rejoined = String.Join(";", Pairs(split));
Where Pairs is
IEnumerable<string> Pairs(IEnumerable<string> strings)
{
if (strings.Take(1).Count() == 0)
{
return new string[]{};
}
return new [] {String.Join("", strings.Take(2))}.Concat(Pairs(strings.Skip(2)));
}
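On newer runtimes the pairing can also be sketched with Enumerable.Chunk, which splits a sequence into arrays of a fixed size. This assumes .NET 6 or later (where Chunk is available); PairJoiner and JoinPairs are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PairJoiner
{
    // Hypothetical helper: joins neighbours two at a time, e.g. 1,a,2,b -> "1a;2b".
    public static string JoinPairs(IEnumerable<string> items, string separator)
    {
        // Chunk(2) yields arrays of up to two items; a trailing single item forms its own chunk.
        return string.Join(separator, items.Chunk(2).Select(pair => string.Concat(pair)));
    }
}
```

For example, PairJoiner.JoinPairs("1;a;2;b;3;c".Split(';'), ";") gives "1a;2b;3c".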

C#, regular expressions : how to parse comma-separated values, where some values might be quoted strings themselves containing commas

In C#, using the Regex class, how does one parse comma-separated values, where some values might be quoted strings themselves containing commas?
using System ;
using System.Text.RegularExpressions ;
class Example
{
public static void Main ( )
{
string myString = "cat,dog,\"0 = OFF, 1 = ON\",lion,tiger,'R = red, G = green, B = blue',bear" ;
Console.WriteLine ( "\nmyString is ...\n\t" + myString + "\n" ) ;
Regex regex = new Regex ( "(?<=,(\"|\')).*?(?=(\"|\'),)|(^.*?(?=,))|((?<=,).*?(?=,))|((?<=,).*?$)" ) ;
Match match = regex.Match ( myString ) ;
int j = 0 ;
while ( match.Success )
{
Console.WriteLine ( j++ + " \t" + match ) ;
match = match.NextMatch() ;
}
}
}
Output (in part) appears as follows:
0 cat
1 dog
2 "0 = OFF
3 1 = ON"
4 lion
5 tiger
6 'R = red
7 G = green
8 B = blue'
9 bear
However, desired output is:
0 cat
1 dog
2 0 = OFF, 1 = ON
3 lion
4 tiger
5 R = red, G = green, B = blue
6 bear
Try with this Regex:
"[^"\r\n]*"|'[^'\r\n]*'|[^,\r\n]*
Regex regexObj = new Regex(@"""[^""\r\n]*""|'[^'\r\n]*'|[^,\r\n]*");
Match matchResults = regexObj.Match(input);
while (matchResults.Success)
{
Console.WriteLine(matchResults.Value);
matchResults = matchResults.NextMatch();
}
Outputs:
cat
dog
"0 = OFF, 1 = ON"
lion
tiger
'R = red, G = green, B = blue'
bear
Note: This regex solution will work for your case; however, I recommend using a specialized library like FileHelpers.
Why not heed the advice from the experts and Don't roll your own CSV parser.
Your first thought is, "I need to handle commas inside of quotes."
Your next thought will be, "Oh, crap, I need to handle quotes inside of quotes. Escaped quotes. Double quotes. Single quotes..."
It's a road to madness. Don't write your own. Find a library with an extensive unit test coverage that hits all the hard parts and has gone through hell for you. For .NET, use the free and open source FileHelpers library.
It's not a regex, but I've used Microsoft.VisualBasic.FileIO.TextFieldParser to accomplish this for CSV files. Yes, it might feel a little strange adding a reference to Microsoft.VisualBasic in a C# app, maybe even a little dirty, but hey, it works.
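A minimal sketch of the TextFieldParser approach, for a single line held in a string, might look like the following. The wrapper names TextFieldParserDemo and ParseLine are hypothetical. As far as I know, the parser only treats double quotes as field enclosures, so the single-quoted value from the question would still be split at its commas.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

static class TextFieldParserDemo
{
    // Hypothetical wrapper: parses one comma-delimited line into its fields.
    public static string[] ParseLine(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true; // commas inside "..." stay in the field

            // ReadFields returns null when there is no data left.
            return parser.ReadFields() ?? new string[0];
        }
    }
}
```

Usage: string[] fields = TextFieldParserDemo.ParseLine("cat,\"0 = OFF, 1 = ON\",lion"); which should yield three fields, the second being 0 = OFF, 1 = ON without the quotes.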
Ah, RegEx. Now you have two problems. ;)
I'd use a tokenizer/parser, since it is quite straightforward, and more importantly, much easier to read for later maintenance.
This works, for example:
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Text;
class Program
{
static void Main(string[] args)
{
string myString = "cat,dog,\"0 = OFF, 1 = ON\",lion,tiger,'R = red, G = green, B = blue',bear";
Console.WriteLine("\nmyString is ...\n\t" + myString + "\n");
CsvParser parser = new CsvParser(myString);
Int32 lineNumber = 0;
foreach (string s in parser)
{
Console.WriteLine(lineNumber++ + ": " + s); // increment so each field gets its own index
}
Console.ReadKey();
}
}
internal enum TokenType
{
Comma,
Quote,
Value
}
internal class Token
{
public Token(TokenType type, string value)
{
Value = value;
Type = type;
}
public String Value { get; private set; }
public TokenType Type { get; private set; }
}
internal class StreamTokenizer : IEnumerable<Token>
{
private TextReader _reader;
public StreamTokenizer(TextReader reader)
{
_reader = reader;
}
public IEnumerator<Token> GetEnumerator()
{
String line;
StringBuilder value = new StringBuilder();
while ((line = _reader.ReadLine()) != null)
{
foreach (Char c in line)
{
switch (c)
{
case '\'':
case '"':
if (value.Length > 0)
{
yield return new Token(TokenType.Value, value.ToString());
value.Length = 0;
}
yield return new Token(TokenType.Quote, c.ToString());
break;
case ',':
if (value.Length > 0)
{
yield return new Token(TokenType.Value, value.ToString());
value.Length = 0;
}
yield return new Token(TokenType.Comma, c.ToString());
break;
default:
value.Append(c);
break;
}
}
// Thanks, dpan
if (value.Length > 0)
{
yield return new Token(TokenType.Value, value.ToString());
}
}
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
internal class CsvParser : IEnumerable<String>
{
private StreamTokenizer _tokenizer;
public CsvParser(Stream data)
{
_tokenizer = new StreamTokenizer(new StreamReader(data));
}
public CsvParser(String data)
{
_tokenizer = new StreamTokenizer(new StringReader(data));
}
public IEnumerator<string> GetEnumerator()
{
Boolean inQuote = false;
StringBuilder result = new StringBuilder();
foreach (Token token in _tokenizer)
{
switch (token.Type)
{
case TokenType.Comma:
if (inQuote)
{
result.Append(token.Value);
}
else
{
yield return result.ToString();
result.Length = 0;
}
break;
case TokenType.Quote:
// Toggle quote state
inQuote = !inQuote;
break;
case TokenType.Value:
result.Append(token.Value);
break;
default:
throw new InvalidOperationException("Unknown token type: " + token.Type);
}
}
if (result.Length > 0)
{
yield return result.ToString();
}
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
Just adding the solution I worked on this morning.
var regex = new Regex("(?<=^|,)(\"(?:[^\"]|\"\")*\"|[^,]*)");
foreach (Match m in regex.Matches("<-- input line -->"))
{
var s = m.Value;
}
As you can see, you need to call regex.Matches() per line. It will then return a MatchCollection with the same number of items you have as columns. The Value property of each match is, obviously, the parsed value.
This is still a work in progress, but it happily parses CSV strings like:
2,3.03,"Hello, my name is ""Joshua""",A,B,C,,,D
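The matches returned by this regex still carry their surrounding quotes and doubled inner quotes. A minimal sketch of turning them into plain field values, assuming the same pattern as above and RFC 4180 style escaping (a quote inside a quoted field is written as two quotes), could be the following. RegexCsv and SplitLine are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class RegexCsv
{
    // Same pattern as above: a quoted field with doubled quotes, or a run of non-comma characters.
    private static readonly Regex FieldRegex =
        new Regex("(?<=^|,)(\"(?:[^\"]|\"\")*\"|[^,]*)");

    // Hypothetical helper: splits one CSV line and unquotes the quoted fields.
    public static List<string> SplitLine(string line)
    {
        var fields = new List<string>();
        foreach (Match m in FieldRegex.Matches(line))
        {
            string value = m.Value;
            if (value.Length >= 2 && value[0] == '"' && value[value.Length - 1] == '"')
            {
                // Drop the enclosing quotes and collapse "" into ".
                value = value.Substring(1, value.Length - 2).Replace("\"\"", "\"");
            }
            fields.Add(value);
        }
        return fields;
    }
}
```

For the sample line above this yields nine fields, the third being Hello, my name is "Joshua" and the seventh and eighth being empty strings.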
CSV is not regular. Unless your regex language has sufficient power to handle the stateful nature of csv parsing (unlikely, the MS one does not) then any pure regex solution is a list of bugs waiting to happen as you hit a new input source that isn't quite handled by the last regex.
CSV reading is not that complex to write as a state machine since the grammar is simple but even so you must consider: quoted quotes, commas within quotes, new lines within quotes, empty fields.
As such you should probably just use someone else's CSV parser. I recommend CSVReader for .Net
Function:
private List<string> ParseDelimitedString (string arguments, char delim = ',')
{
bool inQuotes = false;
bool inNonQuotes = false; //used to trim leading WhiteSpace
List<string> strings = new List<string>();
StringBuilder sb = new StringBuilder();
foreach (char c in arguments)
{
if (c == '\'' || c == '"')
{
if (!inQuotes)
inQuotes = true;
else
inQuotes = false;
}else if (c == delim)
{
if (!inQuotes)
{
strings.Add(sb.Replace("'", string.Empty).Replace("\"", string.Empty).ToString());
sb.Remove(0, sb.Length);
inNonQuotes = false;
}
else
{
sb.Append(c);
}
}
// skip only leading whitespace outside quotes; everything else belongs to the field
else if (!char.IsWhiteSpace(c) || inQuotes || inNonQuotes)
{
inNonQuotes = true;
sb.Append(c);
}
}
strings.Add(sb.Replace("'", string.Empty).Replace("\"", string.Empty).ToString());
return strings;
}
Usage
string myString = "cat,dog,\"0 = OFF, 1 = ON\",lion,tiger,'R = red, G = green, B = blue',bear, text";
List<string> strings = ParseDelimitedString(myString);
foreach( string s in strings )
Console.WriteLine( s );
Output:
cat
dog
0 = OFF, 1 = ON
lion
tiger
R = red, G = green, B = blue
bear
text
I found a few bugs in that version, for example, a non-quoted string that has a single quote in the value.
And I agree that you should use the FileHelpers library when you can; however, that library requires you to know what your data will look like... I need a generic parser.
So I've updated the code to the following and thought I'd share...
static public List<string> ParseDelimitedString(string value, char delimiter)
{
bool inQuotes = false;
bool inNonQuotes = false;
bool secondQuote = false;
char curQuote = '\0';
List<string> results = new List<string>();
StringBuilder sb = new StringBuilder();
foreach (char c in value)
{
if (inNonQuotes)
{
// then quotes are just characters
if (c == delimiter)
{
results.Add(sb.ToString());
sb.Remove(0, sb.Length);
inNonQuotes = false;
}
else
{
sb.Append(c);
}
}
else if (inQuotes)
{
// then quotes need to be double escaped
if ((c == '\'' && c == curQuote) || (c == '"' && c == curQuote))
{
if (secondQuote)
{
secondQuote = false;
sb.Append(c);
}
else
secondQuote = true;
}
else if (secondQuote && c == delimiter)
{
results.Add(sb.ToString());
sb.Remove(0, sb.Length);
inQuotes = false;
}
else if (!secondQuote)
{
sb.Append(c);
}
else
{
// bad,as,"user entered something like"this,poorly escaped,value
// just ignore until second delimiter found
}
}
else
{
// not yet parsing a field
if (c == '\'' || c == '"')
{
curQuote = c;
inQuotes = true;
inNonQuotes = false;
secondQuote = false;
}
else if (c == delimiter)
{
// blank field
inQuotes = false;
inNonQuotes = false;
results.Add(string.Empty);
}
else
{
inQuotes = false;
inNonQuotes = true;
sb.Append(c);
}
}
}
if (inQuotes || inNonQuotes)
results.Add(sb.ToString());
return results;
}
Since the question "Regex to parse csv with nested quotes" redirects here and this one is much more generic, and since a regex is not really the proper way to solve this problem (I have had many issues with catastrophic backtracking, see http://www.regular-expressions.info/catastrophic.html),
here is a simple parser implementation in Python as well:
def csv_to_array(string):
    stack = []
    match = []
    matches = []
    for c in string:
        # do we have a double quote?
        if c == "\"":
            # is it a closing match?
            if len(stack) > 0 and stack[-1] == c:
                stack.pop()
            else:
                stack.append(c)
        elif (c == "," and len(stack) == 0) or (c == "\n"):
            matches.append("".join(match))
            match = []
        else:
            match.append(c)
    # flush the last field when the input does not end with a newline
    if match:
        matches.append("".join(match))
    return matches
