How can I delete spaces from a text file and replace them with a semicolon? - C#

I have this data in a test text file:
behzad  razzaqi  xezerlooot   abrizii     ast
I want to replace each run of spaces with a single semicolon character, so I wrote this C# code:
string[] allLines = File.ReadAllLines(@"d:\test.txt");
using (StreamWriter sw = new StreamWriter(@"d:\test.txt"))
{
foreach (string line in allLines)
{
if (!string.IsNullOrEmpty(line) && line.Length > 1)
{
sw.WriteLine(line.Replace(" ", ";"));
}
}
}
MessageBox.Show("ok");
This produces:
behzad;;razzaqi;;xezerlooot;;;abrizii;;;;;ast
But I want only one semicolon for each run of spaces. How can I solve that?

Regex is an option:
string[] allLines = File.ReadAllLines(@"d:\test.txt");
using (StreamWriter sw = new StreamWriter(@"d:\test.txt"))
{
foreach (string line in allLines)
{
if (!string.IsNullOrEmpty(line) && line.Length > 1)
{
sw.WriteLine(Regex.Replace(line, @"\s+", ";"));
}
}
}
MessageBox.Show("ok");

Use this code:
string[] allLines = File.ReadAllLines(@"d:\test.txt");
using (StreamWriter sw = new StreamWriter(@"d:\test.txt"))
{
foreach (string line in allLines)
{
// The char[] overload of Split is available on every .NET version
string[] words = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
string joined = String.Join(";", words);
sw.WriteLine(joined);
}
}

You need to use a regular expression:
(\s\s+)
Usage:
var input = "behzad  razzaqi  xezerlooot   abrizii     ast";
// Matches runs of two or more whitespace characters; use \s+ to also replace single spaces
var pattern = @"(\s\s+)";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(input, ";");

You can do that with a regular expression.
using System.Text.RegularExpressions;
and:
string pattern = "\\s+";
string replacement = ";";
Regex rgx = new Regex(pattern);
sw.WriteLine(rgx.Replace(line, replacement));
This regular expression matches any series of one or more whitespace characters and replaces the entire series with a single semicolon.

You can try this:
Regex r = new Regex(@"\s+");
string result = r.Replace("YourString", ";");
\s matches any whitespace character, and + means one or more occurrences.
For more information on regular expressions, see http://www.w3schools.com/jsref/jsref_obj_regexp.asp

You should check the string length after the replacement, not before ;-).
const string file = @"d:\test.txt";
var result = File.ReadAllLines(file).Select(line => Regex.Replace(line, @"\s+", ";"));
File.WriteAllLines(file, result.Where(line => line.Length > 1));
...and don't forget that for the input " hello " (with a leading and a trailing space) you will get ;hello;.
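The leading and trailing semicolons can be avoided by trimming each line before collapsing the whitespace. This is a minimal sketch, assuming a hypothetical helper named `CollapseToSemicolons` that is not part of the original answers:

```csharp
using System;
using System.Text.RegularExpressions;

static class WhitespaceDemo
{
    // Hypothetical helper: trims the line, then replaces every run of
    // whitespace with a single semicolon.
    public static string CollapseToSemicolons(string line)
    {
        return Regex.Replace(line.Trim(), @"\s+", ";");
    }

    static void Main()
    {
        Console.WriteLine(CollapseToSemicolons("  behzad  razzaqi   ast "));
        // prints: behzad;razzaqi;ast
    }
}
```

Trimming first means the regex never sees whitespace at either end of the line, so no separator is emitted there.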

Related

C# Regular expression To replace all matches in the string

I have a text file and I want to replace all matches in each line. I defined a pattern, loop through the text file, and then write the result to another file. Unfortunately, my pattern only replaces the first occurrence of the word. What am I doing wrong?
Content of text file:
"testebook kok o testebook\ntestbbb1232 joj ds testbbb1232"
using System.Text.RegularExpressions;
string filePath = "test.txt";
string fileNewPath = "test1.txt";
string ma = @"^test[0-9a-zA-Z]+";
string newString = string.Empty;
using(StreamReader sr = new(filePath)){
string line = sr.ReadLine();
while (line != null){
while(Regex.IsMatch(line, ma) != false){
line = Regex.Replace(line, ma, "");
}
newString += line + "\n";
line = sr.ReadLine();
}
}
using(StreamWriter sw = new(fileNewPath)){
sw.WriteLine(newString);
}
Your code is correct, but your regex pattern is not.
You should write this:
string ma = @"test[0-9a-zA-Z]+";
The "^" anchor has been removed from the pattern.
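To see why the anchor matters: `^` only matches at the start of the input (or of each line when RegexOptions.Multiline is set), so an anchored pattern can remove at most the leading occurrence. The sketch below uses the sample line from the question; the class and method names are illustrative only:

```csharp
using System;
using System.Text.RegularExpressions;

static class AnchorDemo
{
    // Anchored: can only match a word at the very start of the line.
    public static string RemoveAnchored(string line)
    {
        return Regex.Replace(line, @"^test[0-9a-zA-Z]+", "");
    }

    // Unanchored: Regex.Replace removes every occurrence in one call.
    public static string RemoveAll(string line)
    {
        return Regex.Replace(line, @"test[0-9a-zA-Z]+", "");
    }

    static void Main()
    {
        string line = "testebook kok o testebook";
        Console.WriteLine("[" + RemoveAnchored(line) + "]"); // [ kok o testebook]
        Console.WriteLine("[" + RemoveAll(line) + "]");      // [ kok o ]
    }
}
```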
So I modified my pattern to remove the start-of-line anchor, and everything now works as desired:
using System.Text.RegularExpressions;
string filePath = "test.txt";
string fileNewPath = "test1.txt";
MatchesFinder test = new(filePath, fileNewPath);
test.RunTheProcess();
class MatchesFinder{
private string filePath;
private string fileNewPath;
private string ma = @"test[a-zA-Z0-9]+";
public MatchesFinder(string filePath,string fileNewPath){
this.filePath = filePath;
this.fileNewPath = fileNewPath;
}
public void RunTheProcess(){
string newString = string.Empty;
using(StreamReader sr = new(filePath)){
string line = sr.ReadLine();
while (line != null){
while(Regex.IsMatch(line, ma) != false){
line = Regex.Replace(line, ma, string.Empty);
}
newString += line.TrimStart() + "\n";
line = sr.ReadLine();
}
}
using(StreamWriter sw = new(fileNewPath)){
sw.WriteLine(newString);
}
}
}
I think you don't need to check IsMatch separately; just calling Regex.Replace should yield the same result.
Also, newString += line.TrimStart() + "\n"; means you're copying all the lines you've already checked every time you append a new line. I'd either write directly to the output stream or at least use a StringBuilder if you really want to have the full file in memory for some reason.
Something like this:
using var sw = new StreamWriter(fileNewPath);
using var sr = new StreamReader(filePath);
var line = sr.ReadLine();
while (line != null){
line = Regex.Replace(line, ma, string.Empty);
sw.WriteLine(line.TrimStart());
line = sr.ReadLine();
}

Removing special characters from a string with RegEx

I am reading a text file that contains words, numbers and special characters. I want to remove certain special characters, such as: [](),'
I have this code, but it is not working!
using (var reader = new StreamReader ("C://Users//HP//Documents//result2.txt")) {
string line = reader.ReadToEnd ();
Regex rgx = new Regex ("[^[]()',]");
string res = rgx.Replace (line, "");
Message1.text = res;
}
What am I missing? Thanks.
Some of the characters in your Regex, specifically [ ] ( ) ^, hold special meaning in Regex and in order to use them literally they must be escaped.
Use the following properly escaped Regex:
Regex rgx = new Regex (@"[\^\[\]\(\)',]");
Note that it is necessary to use the @ verbatim string, because we don't want to escape these characters from the string, only from the Regex.
Alternatively, double escape the backslashes:
Regex rgx = new Regex ("[\\^\\[\\]\\(\\)',]");
But that's less readable in this case.
You could skip Regex and just maintain a list of characters you want to remove and then replace the old fashioned way:
string[] specialCharsToRemove = new [] { "[", "]", "(", ")", "'", "," };
using (var reader = new StreamReader ("C://Users//HP//Documents//result2.txt"))
{
string line = reader.ReadToEnd();
foreach(string s in specialCharsToRemove)
{
line = line.Replace(s, string.Empty);
}
Message1.text = line;
}
Ideally this would be in its own method, something like this:
private static string RemoveCharacters(string input, string[] specialCharactersToRemove)
{
foreach(string s in specialCharactersToRemove)
{
input = input.Replace(s, string.Empty);
}
return input;
}
I made a fiddle here
Replace them one at a time with String.Replace:
using (var reader = new StreamReader ("C://Users//HP//Documents//result2.txt"))
{
string line = reader.ReadToEnd ();
string res = line.Replace("[", "");
res = res.Replace("]", "");
res = res.Replace("(", "");
res = res.Replace(")", "");
res = res.Replace("'", "");
res = res.Replace(",", "");
Message1.text = res;
}
I agree with avoiding regex for this, but I would not use string.Replace multiple times, either.
Consider implementing a Replace or Remove method that accepts an array of characters to replace, and scan the input string only once. For example:
var builder = new StringBuilder();
foreach (char ch in input)
{
if (!chars.Contains(ch))
{
builder.Append(ch);
}
}
return builder.ToString();
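The fragment above can be wrapped into a complete method. This is only a sketch: the name `RemoveChars` and the use of a `HashSet<char>` for the lookup are assumptions, not something the answer specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class CharFilter
{
    // Hypothetical helper: copies every character of input that is not in
    // charsToRemove, scanning the input exactly once.
    public static string RemoveChars(string input, IEnumerable<char> charsToRemove)
    {
        var remove = new HashSet<char>(charsToRemove);
        var builder = new StringBuilder(input.Length);
        foreach (char ch in input)
        {
            if (!remove.Contains(ch))
            {
                builder.Append(ch);
            }
        }
        return builder.ToString();
    }

    static void Main()
    {
        // A string is an IEnumerable<char>, so the characters can be passed as text.
        Console.WriteLine(RemoveChars("[a,b] ('c')", "[](),'"));
        // prints: ab c
    }
}
```

The set lookup keeps the cost per character roughly constant, whereas chained Replace calls rescan and reallocate the string once per removed character.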

Search for some phrases in a text file using Regex C#

The task:
Write a program, which counts the phrases in a text file. Any sequence of characters could be given as phrase for counting, even sequences containing separators. For instance in the text "I am a student in Sofia" the phrases "s", "stu", "a" and "I am" are found respectively 2, 1, 3 and 1 times.
I know the solution with string.IndexOf or with LINQ or with some type of algorithm like Aho-Corasick. I want to do same thing with Regex.
This is what I've done so far:
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;
namespace CountThePhrasesInATextFile
{
class Program
{
static void Main(string[] args)
{
string input = ReadInput("file.txt");
input.ToLower();
List<string> phrases = new List<string>();
using (StreamReader reader = new StreamReader("words.txt"))
{
string line = reader.ReadLine();
while (line != null)
{
phrases.Add(line.Trim());
line = reader.ReadLine();
}
}
foreach (string phrase in phrases)
{
Regex regex = new Regex(String.Format(".*" + phrase.ToLower() + ".*"));
int mathes = regex.Matches(input).Count;
Console.WriteLine(phrase + " ----> " + mathes);
}
}
private static string ReadInput(string fileName)
{
string output;
using (StreamReader reader = new StreamReader(fileName))
{
output = reader.ReadToEnd();
}
return output;
}
}
}
I know my regular expression is incorrect but I don't know what to change.
The output:
Word ----> 2
S ----> 2
MissingWord ----> 0
DS ----> 2
aa ----> 0
The correct output:
Word --> 9
S --> 13
MissingWord --> 0
DS --> 2
aa --> 3
file.txt contains:
Word? We have few words: first word, second word, third word.
Some passwords: PASSWORD123, #PaSsWoRd!456, AAaA, !PASSWORD
words.txt contains:
Word
S
MissingWord
DS
aa
You need to post the file.txt contents first, otherwise it's difficult to verify if the regex is working correctly or not.
That being said, check out the Regex answer here:
Finding ALL positions of a substring in a large string in C#
and see if that helps with your code in the mean time.
edit:
So there's a simple solution, add "(?=(" and "))" to each of your phrases. This is a lookahead assertion in regex. The following code handles what you want.
foreach (string phrase in phrases) {
string MatchPhrase = "(?=(" + phrase.ToLower() + "))";
int mathes = Regex.Matches(input, MatchPhrase).Count;
Console.WriteLine(phrase + " ----> " + mathes);
}
You also had an issue with
input.ToLower();
which should be instead
input = input.ToLower();
as strings in C# are immutable. In total, your code should be:
static void Main(string[] args) {
string input = ReadInput("file.txt");
input = input.ToLower();
List<string> phrases = new List<string>();
using (StreamReader reader = new StreamReader("words.txt")) {
string line = reader.ReadLine();
while (line != null) {
phrases.Add(line.Trim());
line = reader.ReadLine();
}
}
foreach (string phrase in phrases) {
string MatchPhrase = "(?=(" + phrase.ToLower() + "))";
int mathes = Regex.Matches(input, MatchPhrase).Count;
Console.WriteLine(phrase + " ----> " + mathes);
}
Thread.Sleep(50000);
}
private static string ReadInput(string fileName) {
string output;
using (StreamReader reader = new StreamReader(fileName)) {
output = reader.ReadToEnd();
}
return output;
}
Here is what happened. I am going to use Word as an example.
The regex you built for "word" is ".*word.*". It tells the regex engine to match anything that starts with any characters, contains "word", and ends with any characters.
For your input, it matched
Word? We have few words: first word, second word, third word.
which starts with "Word? We have few words: first" and ends with ", second word, third word."
Then the second line starts with "Some pass", contains "word", and ends with ": PASSWORD123, #PaSsWoRd!456, AAaA, !PASSWORD".
So the count is 2.
The regex you want is simple: the string "word" is sufficient.
Update:
To ignore case, try the pattern "(?i)word".
And for the multiple matches within AAaA, try "(?i)(?<=a)a".
?<= is a zero-width positive lookbehind assertion.
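Zero-width assertions make overlapping counts possible because the match itself consumes no characters. The sketch below uses a lookahead rather than a lookbehind and passes the phrase through Regex.Escape so that phrases containing regex metacharacters are treated literally; the helper name `CountOverlapping` is an assumption for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

static class PhraseCounter
{
    // Hypothetical helper: counts every position where phrase starts,
    // including overlapping occurrences, ignoring case.
    public static int CountOverlapping(string text, string phrase)
    {
        string pattern = "(?=" + Regex.Escape(phrase) + ")";
        return Regex.Matches(text, pattern, RegexOptions.IgnoreCase).Count;
    }

    static void Main()
    {
        Console.WriteLine(CountOverlapping("AAaA", "aa"));                    // 3
        Console.WriteLine(CountOverlapping("I am a student in Sofia", "s"));  // 2
    }
}
```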
Try this code:
string input = File.ReadAllText("file.txt");
foreach (string word in File.ReadLines("words.txt"))
{
var regex = new Regex(word, RegexOptions.IgnoreCase);
int startat = 0;
int count = 0;
Match match = regex.Match(input, startat);
while (match.Success)
{
count++;
startat = match.Index + 1;
match = regex.Match(input, startat);
}
Console.WriteLine(word + "\t" + count);
}
To correctly find all overlapping substrings like "aa", I had to use the Match overload with the startat parameter.
Note the RegexOptions.IgnoreCase parameter.
A shorter but less clear version:
Match match;
while ((match = regex.Match(input, startat)).Success)
{
count++;
startat = match.Index + 1;
}

Get character after certain character from a String

I need to get the character that follows each occurrence of a certain character in a string. Please consider my input string and the expected resulting characters.
Sample String
*This is a string *with more than *one blocks *of values.
Resultant string
Twoo
I have done this
string[] SubIndex = aut.TagValue.Split('*');
string SubInd = "";
foreach (var a in SubIndex)
{
SubInd = SubInd + a.Substring(0,1);
}
Any help to this will be appreciated.
Thanks
LINQ solution:
var str = "*This is a string *with more than *one blocks *of values.";
var chars = str.Split(new char[] {'*'}, StringSplitOptions.RemoveEmptyEntries)
.Select(x => x.First());
var output = String.Join("", chars);
string s = "*This is a string *with more than *one blocks *of values.";
string[] splitted = s.Split(new char[] { '*' }, StringSplitOptions.RemoveEmptyEntries);
string result = "";
foreach (string split in splitted)
result += split[0];
Console.WriteLine(result);
The code below should work:
var s = "*This is a string *with more than *one blocks *of values.";
int i = 0;
while ((i = s.IndexOf('*', i)) != -1)
{
// Print out the next char, if there is one
if (i + 1 < s.Length)
Console.WriteLine(s[i + 1]);
// Move past the current '*'.
i++;
}
String.Join("",input.Split(new char[]{'*'},StringSplitOptions.RemoveEmptyEntries)
.Select(x=>x.First())
);
string strRegex = @"(?<=\*).";
Regex myRegex = new Regex(strRegex, RegexOptions.Multiline | RegexOptions.Singleline);
string strTargetString = "*This is a string *with more than *one blocks *of values.";
StringBuilder sb = new StringBuilder();
foreach (Match myMatch in myRegex.Matches(strTargetString))
{
if (myMatch.Success) sb.Append(myMatch.Value);
}
string result = sb.ToString();
Please see below:
char[] s3 = "*This is a string *with more than *one blocks *of values.".ToCharArray();
StringBuilder s4 = new StringBuilder();
for (int i = 0; i < s3.Length - 1; i++)
{
if (s3[i] == '*')
s4.Append(s3[i+1]);
}
Console.WriteLine(s4.ToString());

How to remove empty lines from a formatted string

How can I remove empty lines in a string in C#?
I am generating some text files in C# (Windows Forms) and for some reason there are some empty lines. How can I remove them after the string is generated (using StringBuilder and TextWriter)?
Example text file:
THIS IS A LINE
THIS IS ANOTHER LINE AFTER SOME EMPTY LINES!
If you also want to remove lines that only contain whitespace, use
resultString = Regex.Replace(subjectString, @"^\s+$[\r\n]*", string.Empty, RegexOptions.Multiline);
^\s+$ will remove everything from the first blank line to the last (in a contiguous block of empty lines), including lines that only contain tabs or spaces.
[\r\n]* will then remove the last CRLF (or just LF which is important because the .NET regex engine matches the $ between a \r and a \n, funnily enough).
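A small check of this pattern, assuming the input uses CRLF line endings as Windows text files usually do. With LF-only input a completely empty line leaves nothing for `\s+` to consume before `$`, so that case may need a different pattern. The class and method names here are illustrative:

```csharp
using System;
using System.Text.RegularExpressions;

static class BlankLineDemo
{
    // Applies the pattern from the answer above; assumes CRLF line endings.
    public static string RemoveBlankLines(string text)
    {
        return Regex.Replace(text, @"^\s+$[\r\n]*", string.Empty, RegexOptions.Multiline);
    }

    static void Main()
    {
        // One empty line and one line containing only spaces between A and B.
        string text = "A\r\n\r\n  \r\nB";
        Console.WriteLine(RemoveBlankLines(text) == "A\r\nB"); // True
    }
}
```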
Tim Pietzcker - it is not working for me. I have to change a little bit, but thanks!
Ehhh, C# Regex... I had to change it again, but this is working well:
private string RemoveEmptyLines(string lines)
{
return Regex.Replace(lines, @"^\s*$\n|\r", string.Empty, RegexOptions.Multiline).TrimEnd();
}
Example:
http://regex101.com/r/vE5mP1/2
You could try String.Replace("\n\n", "\n");
Try this:
Regex.Replace(subjectString, @"^\r?\n?$", "", RegexOptions.Multiline);
private string remove_space(string st)
{
String final = "";
char[] b = new char[] { '\r', '\n' };
String[] lines = st.Split(b, StringSplitOptions.RemoveEmptyEntries);
foreach (String s in lines)
{
if (!String.IsNullOrWhiteSpace(s))
{
final += s;
final += Environment.NewLine;
}
}
return final;
}
private static string RemoveEmptyLines(string text)
{
var lines = text.Split(new[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);
var sb = new StringBuilder(text.Length);
foreach (var line in lines)
{
sb.AppendLine(line);
}
return sb.ToString();
}
None of the methods mentioned here helped me all the way, but I found a workaround:
Split the text into lines, i.e. a collection of strings (with or without empty strings; also Trim() each string).
Join these lines back into a multiline string.
public static IEnumerable<string> SplitToLines(this string inputText, bool removeEmptyLines = true)
{
if (inputText == null)
{
yield break;
}
using (StringReader reader = new StringReader(inputText))
{
string line;
while ((line = reader.ReadLine()) != null)
{
// Skip blank lines only when the caller asked for it
if (removeEmptyLines && string.IsNullOrWhiteSpace(line))
continue;
yield return line.Trim();
}
}
}
public static string ToMultilineText(this string text)
{
var lines = text.SplitToLines();
return string.Join(Environment.NewLine, lines);
}
Based on Evgeny Sobolev's code, I wrote this extension method, which also trims the last (obsolete) line break using TrimEnd(TrimNewLineChars):
public static class StringExtensions
{
private static readonly char[] TrimNewLineChars = Environment.NewLine.ToCharArray();
public static string RemoveEmptyLines(this string str)
{
if (str == null)
{
return null;
}
var lines = str.Split(TrimNewLineChars, StringSplitOptions.RemoveEmptyEntries);
var stringBuilder = new StringBuilder(str.Length);
foreach (var line in lines)
{
stringBuilder.AppendLine(line);
}
return stringBuilder.ToString().TrimEnd(TrimNewLineChars);
}
}
I found a simple answer to this problem:
YourradTextBox.Lines = YourradTextBox.Lines.Where(p => p.Length > 0).ToArray();
Adapted from Marco Minerva [MCPD] at Delete Lines from multiline textbox if it's contain certain string - C#
I tried the previous answers, but some of the regex-based ones do not work correctly.
If you use a regex to find the empty lines, you can't use the same match for deleting, because it would also erase the line breaks of lines that are not empty.
You have to use regex groups for this replacement.
Some of the other non-regex answers here can have performance issues.
private string remove_empty_lines(string text) {
StringBuilder text_sb = new StringBuilder(text);
Regex rg_spaces = new Regex(@"(\r\n|\r|\n)([\s]+\r\n|[\s]+\r|[\s]+\n)");
Match m = rg_spaces.Match(text_sb.ToString());
while (m.Success) {
text_sb = text_sb.Replace(m.Groups[2].Value, "");
m = rg_spaces.Match(text_sb.ToString());
}
return text_sb.ToString().Trim();
}
This pattern works perfectly to remove empty lines and lines with only spaces and/or tabs.
s = Regex.Replace(s, @"^\s*(\r\n|\Z)", "", RegexOptions.Multiline);
