Delete a specific string from a file in C# - c#

Title says it all, I have a file called test.txt with these contents:
Hello from th [BACK]e
This i [BACK]s line two.
Here, [BACK] is just a visible representation of a backspace, so i [BACK]s means is: the backspace deletes the space that was typed after the i.
So basically, at the click of a button, I should be able to open this file and remove every occurrence of [BACK] together with the one character before it (the "-1"), because [BACK] stands for a backspace that deletes the character preceding it.
EDIT:
This time i replaced [BACK] with [SDRWUE49CDKAS]. Just to make it a unique string. I also tested on another file. This time a .html with following contents:
Alpha, Brav [SDRWUE49CDKAS]o, Charlie, Dr [SDRWUE49CDKAS][SDRWUE49CDKAS]elta, Echo.
// ^^Implementing "backspace" ^^Here doing it double because we made a mistake in spelling "Delta"
//These sentences should be Alpha, Bravo, Charlie, Delta, Echo
Did some experimenting and tested it out with this code:
string s = File.ReadAllText(path2html);
string line = "";
string contents = File.ReadAllText(path2html);
if (contents.Contains("[SDRWUE49CDKAS]"))
{
System.IO.StreamWriter sw = new System.IO.StreamWriter(path2html);
s = s.Remove(s.LastIndexOf(line + "[SDRWUE49CDKAS]") - 2, 15);
sw.WriteLine(s);
sw.Close();
}
The edited code above will give me an output of actually deleting [SDRWUE49CDKAS] but not exactly how i wanted it to be:
Alpha, BraS]o, Charlie, DS]elta, Echo.
This caused some confusion during testing. I also had to run this code 3 times, because the file contains 3 x [SDRWUE49CDKAS], so a loop would be better. I checked a bunch of similar problems on the web but couldn't find a working solution. I'm trying out this one too, but it uses a StreamReader and a StreamWriter at the same time. Or maybe I should make a copy of the original and work with a temp file?

var s = "i [BACK]s";
s = s.Remove(s.IndexOf("[BACK]")-1, 1 + "[BACK]".Length);
result is "is".
Explanation:
Find the start index of [BACK]
Go back one position to start at the char before it
Remove from there that extra char plus the marker chars
But there are several issues:
This assumes that the search string is not at the start of the string (there must be a char to remove)
Plus it only removes the first occurrence, so you will have to repeat this until all are removed
And it doesn't handle "surrogate pairs"
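The first two issues listed above can be addressed with a single pass over the text. The following is only a sketch under stated assumptions: the class and method names (BackspaceDemo, ApplyBackspaces) are made up for illustration, the marker is matched with ordinal comparison, and surrogate pairs are still not handled (a backspace removes one UTF-16 char).

```csharp
using System;
using System.Text;

static class BackspaceDemo
{
    // Copies text into a buffer; each marker deletes the last buffered char (if any).
    public static string ApplyBackspaces(string text, string marker)
    {
        if (string.IsNullOrEmpty(marker))
            throw new ArgumentException("marker must not be empty", nameof(marker));

        var sb = new StringBuilder(text.Length);
        int i = 0;
        while (i < text.Length)
        {
            // Does the marker start at position i?
            if (string.CompareOrdinal(text, i, marker, 0, marker.Length) == 0)
            {
                if (sb.Length > 0) sb.Length--;   // nothing to delete at the very start
                i += marker.Length;
            }
            else
            {
                sb.Append(text[i]);
                i++;
            }
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(ApplyBackspaces("i [BACK]s", "[BACK]"));      // prints is
        Console.WriteLine(ApplyBackspaces("Dr [BACK][BACK]elta", "[BACK]")); // prints Delta
    }
}
```

Because consecutive markers each delete one more buffered char, the "Dr [..][..]elta" case from the question comes out as Delta without running the code several times. For the file itself, File.ReadAllText followed by File.WriteAllText on the result would avoid having a reader and a writer open at the same time.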

Use Regex. You may have multiple spaces, or no space if it is at the beginning of the line
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string input =
                "Hello from th [BACK]e\n" +
                "This i [BACK]s line two.\n";
            // Any run of whitespace followed by the literal marker [BACK]
            string pattern = @"\s*\[BACK\]";
            // output: "Hello from the\nThis is line two.\n"
            string output = Regex.Replace(input, pattern, "");
        }
    }
}

Related

Character ä is represented in different Char Codes in the same string

I have a file uploaded on the web named "Schränke Wintsch.pdf".
The file name is saved in an XML file like so:
<File>Schra?nke Wintsch.pdf</File>
If I debug this in C# and manually add an ä, then it is saved correctly.
<File>Schra?nke Wintsch-ä.pdf</File>
OK I know it is an Encoding Problem.
But why is the same ä character represented with different char codes(example on Img 2)?
XML declares the encoding used within the document in its header. It will look something like this: <?xml version="1.0" encoding="ISO-8859-9" ?>.
If you append to the string, make sure to use the same encoding to avoid a mismatch.
Test appending the char bytes and see if that helps.
var en = Encoding.GetEncoding("ISO-8859-9");
en.GetString(en.GetBytes("ä")); // GetBytes is an instance method, so call it on en
The original XML that you have is using the Unicode 'COMBINING DIAERESIS' code (int value 776) to use two characters to represent ä.
(Note how the combining character has been displayed as ? in the <File>Schra?nke Wintsch.pdf</File> image in your post.)
The 776 code says to put the double-dots above the previous character (an a).
However, where you typed in the ä it has been stored as the Unicode character with code 228.
The question you need to answer is: Why is the original source XML using the "Combining Diaeresis" character rather than the more usual ä? (Without knowing the origin of the XML file, we cannot answer that question.)
Incidentally, you can "normalise" those sorts of characters by using string.Normalize(), as demonstrated by the following program:
using System;
namespace Demo
{
    static class Program
    {
        static void Main()
        {
            char[] a = {(char)97, (char)776};
            string s = new string(a);
            Console.WriteLine(s + " -> " + s.Length); // Prints a¨ -> 2
            var t = s.Normalize();
            Console.WriteLine(t + " -> " + t.Length); // Prints ä -> 1
        }
    }
}
Note how the length of s is 2, but the length of t is only 1 (and it contains the single character ä).
So you might be able to improve things by using string.Normalize() to normalise these unexpected characters.
string.Normalize() is the working solution for the string "Schränke Wintsch-ä.pdf", so it is now correctly saved as Schränke Wintsch-ä.pdf.
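To see the two representations side by side, it can help to print the char codes before and after normalizing. This is only an illustrative sketch: the names NormalizeDemo and Codes are made up, and the decomposed file name is written with an explicit \u0308 escape on the assumption that this is what the uploaded name contained.

```csharp
using System;
using System.Linq;
using System.Text;

static class NormalizeDemo
{
    // Lists the UTF-16 code units of a string as decimal numbers.
    public static string Codes(string s) => string.Join(" ", s.Select(c => (int)c));

    static void Main()
    {
        string decomposed = "Schra\u0308nke";   // 'a' (97) followed by U+0308 (776)
        string composed = decomposed.Normalize(NormalizationForm.FormC);

        Console.WriteLine(Codes(decomposed)); // ... 97 776 ...
        Console.WriteLine(Codes(composed));   // ... 228 ...
        Console.WriteLine(composed == "Schr\u00E4nke"); // prints True
    }
}
```

FormC is also what the parameterless Normalize() uses, so the explicit argument only makes the intent visible.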

How do I find a variable set of 5 numbers qualified by surrounding underscores?

I am pulling file names into a variable (@[User::FileName]) and attempting to extract the work order number (always 5 digits with underscores on both sides) from that string. For example, a file name would look like "ABC_2017_DEF_9_12_GHI_35132_S5160.csv", and I want the result to be "35132". I have found examples of how to do it, such as SUBSTRING(FileName,1,FINDSTRING(FileName,"_",1) - 1), but the underscore will not always be in the same location.
Is it possible to do this in the expression builder?
Answer:
public void Main()
{
    string strFilename = Dts.Variables["User::FileName"].Value.ToString();
    var RegexObj = new Regex(@"_([\d]{5})_");
    var match = RegexObj.Match(strFilename);
    if (match.Success)
    {
        Dts.Variables["User::WorkOrder"].Value = match.Groups[1].Value;
    }
    Dts.TaskResult = (int)ScriptResults.Success;
}
First of all, the example you have provided, ABC_2017_DEF_9_12_GHI_35132_S5160.csv, contains 4 numbers located between underscores:
2017 , 9 , 12 , 35132
I don't know whether the file name may contain more than one 5-digit number, so in my answer I will assume that the number you want to return is the last occurrence of a 5-digit number.
Solution
You have to use the following regular expression:
(?<=_)[0-9]{5}(?=_)
DEMO
Or, as @MartinSmith suggested (in a comment), you can use the following RegEx:
_([\d]{5})_
Implementing RegEx in SSIS
First add another variable (e.g. @[User::FileNumber])
Add a Script Task and choose the @[User::Filename] variable as ReadOnlyVariable, and @[User::FileNumber] as ReadWriteVariable
Inside the script task use the following code:
using System.Text.RegularExpressions;
public void Main()
{
    string strFilename = Dts.Variables["User::Filename"].Value.ToString();
    // The lookarounds keep the underscores out of the match itself
    var objRegEx = new Regex(@"(?<=_)[0-9]{5}(?=_)");
    var mc = objRegEx.Matches(strFilename);
    if (mc.Count > 0)
    {
        //The last match contains the value needed
        Dts.Variables["User::FileNumber"].Value = mc[mc.Count - 1].Value;
    }
    Dts.TaskResult = (int)ScriptResults.Success;
}
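Outside SSIS, the "last 5-digit number between underscores" rule can be tried in a plain console program. This is a sketch only: WorkOrderDemo and LastWorkOrder are hypothetical names, and returning null when nothing matches is an assumption about what the caller wants.

```csharp
using System;
using System.Text.RegularExpressions;

static class WorkOrderDemo
{
    // Returns the last run of exactly five digits enclosed by underscores, or null.
    public static string LastWorkOrder(string fileName)
    {
        MatchCollection matches = Regex.Matches(fileName, @"(?<=_)[0-9]{5}(?=_)");
        return matches.Count > 0 ? matches[matches.Count - 1].Value : null;
    }

    static void Main()
    {
        Console.WriteLine(LastWorkOrder("ABC_2017_DEF_9_12_GHI_35132_S5160.csv")); // prints 35132
    }
}
```

Using lookarounds instead of literal underscores in the match means two adjacent candidates such as _12345_67890_ can share the middle underscore, so the second one is still found.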
Do the other pieces mean something?
Anyway, you can use a script task and the Split function.
Pass in @fileName as read-only and @WO as read-write.
string fn = Dts.Variables["fileName"].Value.ToString();
string[] parts = fn.Split('_');
//Assuming it's always the 7th part
// You could extract the other parts as well.
Dts.Variables["WO"].Value = parts[6];
I would do this with a Script Transformation (or Script Task if this is not in a DataFlow) and use a Regex.

Remove a specific part of a string

I want to go through a text file and remove specific parts of the string.
In this case I want to remove the path:
PERFORMER "Charles Lloyd"
TITLE "Mirror"
FILE "Charles Lloyd\[2010] Mirror\01. I Fall In Love Too Easily.wav" WAVE
TRACK 01 AUDIO
FILE "Charles Lloyd\[2010] Mirror\02. Go Down Moses.wav" WAVE
to
PERFORMER "Charles Lloyd"
TITLE "Mirror"
FILE "01. I Fall In Love Too Easily.wav" WAVE //here are the changes
TRACK 01 AUDIO
FILE "02. Go Down Moses.wav" WAVE //here are the changes
I tried out things like: (given the string s which contains the whole text)
s = s.Remove(s.IndexOf("FILE") + 5, (s.IndexOf("\\") + 1) - s.IndexOf("FILE") - 5);
and repeat this function to remove the part between "FILE " " and the following backslash
It removes the part correctly, but I would have to manually adjust the number of times it has to run this function (run once for every backslash per line). But this algorithm lacks flexibility and I don't know how to make it approach the next line that starts with "FILE" and begin the procedure again...
If all your text is in one string variable, you could first split it, then do the replacement on each line and then join everything again (assuming your text is in the variable lines):
var strings = lines.Split(new[] { Environment.NewLine }, StringSplitOptions.None);
var replacedStrings = new List<string>();
foreach (var s in strings)
{
    string replaced;
    if (s.StartsWith("FILE \""))
    {
        // Everything after the leading FILE " (6 characters)
        var sWithoutFile = s.Substring(6);
        // Keep FILE " and only the part after the last backslash
        replaced = s.Substring(0, 6) +
            sWithoutFile.Substring(sWithoutFile.LastIndexOf("\\") + 1);
    }
    else
    {
        replaced = s;
    }
    replacedStrings.Add(replaced);
}
var result = string.Join(Environment.NewLine, replacedStrings);
What about regular expressions?
using System;
using System.Text.RegularExpressions;
class RemovePaths
{
    static void Main()
    {
        string input = @"
PERFORMER ""Charles Lloyd""
TITLE ""Mirror""
FILE ""Charles Lloyd\[2010] Mirror\01. I Fall In Love Too Easily.wav"" WAVE
TRACK 01 AUDIO
FILE ""Charles Lloyd\[2010] Mirror\02. Go Down Moses.wav"" WAVE";
        string test = @"
PERFORMER ""Charles Lloyd""
TITLE ""Mirror""
FILE ""01. I Fall In Love Too Easily.wav"" WAVE
TRACK 01 AUDIO
FILE ""02. Go Down Moses.wav"" WAVE";
        Regex rgx = new Regex(@"(?<=\"").*\\(?=.+\"")");
        string result = rgx.Replace(input, "");
        Console.WriteLine(result == test ? "Pass" : "Fail");
    }
}
Breakdown of the RegEx...
(?<=\"") <--- must start with a double-quote but be excluded using (?<=...)
.*\\ <--- match any text up to and including the last "\" on the line. note: . matches anything except a newline
(?=.+\"") <--- skip at least one character(.+) and it must end with a double-quote(\").
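A variation on the same idea is to anchor the pattern to lines that start with FILE ", so that quoted text on other lines can never be touched. This is a sketch rather than a drop-in replacement: CueDemo and StripPaths are made-up names, and it assumes every path uses backslashes and sits inside the first pair of quotes on the line.

```csharp
using System;
using System.Text.RegularExpressions;

static class CueDemo
{
    // Strips the directory part from every FILE "..." line of a cue sheet.
    public static string StripPaths(string cueText)
    {
        // ^(FILE ")    line start (Multiline), keep the FILE " prefix as group 1
        // [^"\r\n]*\\  everything up to the last backslash before the closing quote
        return Regex.Replace(cueText, @"^(FILE "")[^""\r\n]*\\", "$1", RegexOptions.Multiline);
    }

    static void Main()
    {
        string cue =
            "PERFORMER \"Charles Lloyd\"\n" +
            "FILE \"Charles Lloyd\\[2010] Mirror\\02. Go Down Moses.wav\" WAVE\n" +
            "TRACK 01 AUDIO";
        Console.WriteLine(StripPaths(cue));
    }
}
```

Lines without a backslash inside the quotes simply do not match and are left unchanged, so the replacement needs no loop and no per-line bookkeeping.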
Assuming that your line always starts with FILE " and ends with " WAVE, you can use the System.IO.Path.GetFileName() function to achieve this:
if (str.StartsWith("FILE "))
{
    // Substring(6, Length - 12) drops the leading FILE " and the trailing " WAVE
    string strResult = "FILE \"" + System.IO.Path.GetFileName(str.Substring(6, str.Length - 12)) + "\" WAVE";
}
Example:
FILE "Charles Lloyd\[2010] Mirror\01. I Fall In Love Too Easily.wav" WAVE
Result:
FILE "01. I Fall In Love Too Easily.wav" WAVE
You can read more about this Function in this MSDN article
Split the string on the \ character and store the last element of the array back into the string.
For example, something like this:
string[] array = file.Split('\\');
file = array[array.Length - 1];

grouping adjacent similar substrings

I am writing a program in which I want to group the adjacent substrings, e.g ABCABCBC can be compressed as 2ABC1BC or 1ABCA2BC.
Among all the possible options I want to find the resultant string with the minimum length.
Here is the code I have written so far, but it is not doing the job. Kindly help me in this regard.
using System;
using System.Collections.Generic;
using System.Linq;
namespace EightPrgram
{
class Program
{
static void Main(string[] args)
{
string input;
Console.WriteLine("Please enter the set of operations: ");
input = Console.ReadLine();
char[] array = input.ToCharArray();
List<string> list = new List<string>();
string temp = "";
string firstTemp = "";
foreach (var x in array)
{
if (temp.Contains(x))
{
firstTemp = temp;
if (list.Contains(firstTemp))
{
list.Add(firstTemp);
}
temp = "";
list.Add(firstTemp);
}
else
{
temp += x;
}
}
/*foreach (var item in list)
{
Console.WriteLine(item);
}*/
Console.ReadLine();
}
}
}
You can do this with recursion. I cannot give you a C# solution, since I do not have a C# compiler here, but the general idea together with a python solution should do the trick, too.
So you have an input string ABCABCBC. And you want to transform this into an advanced variant of run length encoding (let's called it advanced RLE).
My idea consists of a general first idea onto which I then apply recursion:
The overall target is to find the shortest representation of the string using advanced RLE, let's create a function shortest_repr(string).
You can divide the string into a prefix and a suffix and then check if the prefix can be found at the beginning of the suffix. For your input example this would be:
(A, BCABCBC)
(AB, CABCBC)
(ABC, ABCBC)
(ABCA, BCBC)
...
This input can be put into a function shorten_prefix, which checks how often the suffix starts with the prefix. E.g. for the prefix ABC and the suffix ABCBC, the prefix occurs only once at the beginning of the suffix, making a total of 2 ABC following each other, so we can compact this prefix / suffix combination to the output (2ABC, BC).
This function shorten_prefix will be used on each of the above tuples in a loop.
After using the function shorten_prefix one time, there still is a suffix for most of the string combinations. E.g. in the output (2ABC, BC), there still is the string BC as suffix. So, we need to find the shortest representation for this remaining suffix. We already have a function for this called shortest_repr, so let's just call it on the remaining suffix.
This image displays how this recursion works (I only expanded one of the node after the 3rd level, but in fact all of the orange circles would go through recursion):
We start at the top with a call of shortest_repr to the string ABABB (I selected a shorter sample for the image). Then, we split this string at all possible split positions and get a list of prefix / suffix pairs in the second row. On each of the elements of this list we first call the prefix/suffix optimization (shorten_prefix) and retrieve a shortened prefix/suffix combination, which already has the run-length numbers in the prefix (third row). Now, on each of the suffix, we call our recursion function shortest_repr.
I did not display the upward-direction of the recursion. When a suffix is the empty string, we pass an empty string into shortest_repr. Of course, the shortest representation of the empty string is the empty string, so we can return the empty string immediately.
When the result of the call to shortest_repr was received inside our loop, we just select the shortest string inside the loop and return this.
This is some quickly hacked code that does the trick (in the code, shorten_prefix is named shorten_beginning and shortest_repr is named find_shortest_repr):
def shorten_beginning(beginning, ending):
    # Count how often `beginning` repeats at the start of `ending`
    count = 1
    while ending.startswith(beginning):
        count += 1
        ending = ending[len(beginning):]
    return str(count) + beginning, ending

def find_shortest_repr(string):
    possible_variants = []
    if not string:
        return ''
    # Try every split position and recurse on the remaining suffix
    for i in range(1, len(string) + 1):
        beginning = string[:i]
        ending = string[i:]
        shortened, new_ending = shorten_beginning(beginning, ending)
        shortest_ending = find_shortest_repr(new_ending)
        possible_variants.append(shortened + shortest_ending)
    # Pick the variant with the smallest length
    return min([(len(x), x) for x in possible_variants])[1]

print(find_shortest_repr('ABCABCBC'))
print(find_shortest_repr('ABCABCABCABCBC'))
print(find_shortest_repr('ABCABCBCBCBCBCBC'))
Open issues
I think this approach has the same problem as the recursive Levenshtein distance calculation: it calculates the same suffixes multiple times. So, it would be a nice exercise to try to implement this with dynamic programming.
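One way to avoid recomputing the same suffixes is to cache the result per input string (memoization, the top-down form of dynamic programming). The following C# sketch mirrors the Python logic above; the names AdvancedRle, ShortestRepr and Memo are made up for illustration, and ties between equally short encodings are broken by ordinal order to match what the Python min() does.

```csharp
using System;
using System.Collections.Generic;

static class AdvancedRle
{
    // Cache of already solved inputs: string -> its shortest encoding.
    static readonly Dictionary<string, string> Memo = new Dictionary<string, string>();

    public static string ShortestRepr(string s)
    {
        if (s.Length == 0) return "";
        if (Memo.TryGetValue(s, out string cached)) return cached;

        string best = null;
        for (int len = 1; len <= s.Length; len++)
        {
            // Count how many times the prefix s[0..len) repeats back to back.
            int count = 1;
            int pos = len;
            while (pos + len <= s.Length && string.CompareOrdinal(s, 0, s, pos, len) == 0)
            {
                count++;
                pos += len;
            }

            string candidate = count + s.Substring(0, len) + ShortestRepr(s.Substring(pos));

            // Prefer the shorter string; break ties by ordinal order.
            if (best == null
                || candidate.Length < best.Length
                || (candidate.Length == best.Length && string.CompareOrdinal(candidate, best) < 0))
            {
                best = candidate;
            }
        }

        Memo[s] = best;
        return best;
    }

    static void Main()
    {
        Console.WriteLine(ShortestRepr("ABCABCBC")); // prints 2ABC1BC
    }
}
```

Every recursive call is made on a suffix of the original input, so the cache holds at most one entry per suffix and each suffix is solved only once.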
If this is not a school assignment or a performance critical part of the code, RegEx might be enough. Note that the match is greedy, so the result is not guaranteed to be the shortest possible one (e.g. AAAAAAB becomes 2AAA1B rather than 6A1B):
string input = "ABCABCBC";
var re = new Regex(@"(.+)\1+|(.+)", RegexOptions.Compiled); // RegexOptions.Compiled is optional if you use it more than once
string output = re.Replace(input,
    m => (m.Length / m.Result("$1$2").Length) + m.Result("$1$2")); // "2ABC1BC" (case sensitive by default)

need to do an ascii shift cypher in c#

I am trying to shift the characters in a string by 20 to match a file format I used in basic. I am using the following code in c#
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
namespace test_controls
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
public string text3;
private void button1_Click(object sender, EventArgs e)
{
// This is a test conversion of my pdw save routine from basic to c#
int pos = 0;
string text = label1.Text;
int t = text.Length; // get the length of the text
while (pos < t + 1) ;
string s = text.Substring(pos, 1); // get subsstring 1 character at a time
byte[] ASCIIValues = Encoding.ASCII.GetBytes(text); // convert that character to ascii
foreach (byte b in ASCIIValues)
{
int temp = b;
temp = temp + 20; // add 20 to the ascii value
char text2 = Convert.ToChar(temp); // convert the acii back into a char
text3 =""+ text2.ToString(); // add the char to the final string
}
label1.Text = text3; // rewrite the new string to replace the old one for the label1.text
}
}
}
The problem is it just does nothing and doesn't respond and I have to tell windows to close the unresponsive program. To be clear, I'm using winforms in c# to make a shift cypher. All of this code I'm using I found in various answers and pieced it together. in Vb or any other basic, I would just get the ascii value of each character in the string then do the math and convert it back using the chr$ command.
Any help would be greatly appreciated.
You have two problems. As pointed out in the comments, the following line is an infinite loop:
while (pos < t + 1) ;
Even without the loop, though, your shift algorithm is incorrect. The following line will also lead to incorrect results:
temp = temp + 20;
Consider as counterexamples the following cases:
G maps to [
ASCII z = 122. 122 + 20 = 142, which isn't even a valid ASCII character.
Uppercase Z would map to lowercase n
You could come up with other similar cases.
Incidentally, you could also rewrite this line as temp += 20.
Finally, this line is incorrect:
text3 =""+ text2.ToString();
You're not appending the new text to text3, you're replacing it every time you do an iteration, so text3 will always contain the last character encoded (rather than the entire string). Keep in mind, too, that building a C# string like this (especially a long one) is inefficient as strings are immutable objects in C#. If the string in question might be long you want to consider using a StringBuilder for this purpose.
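Putting these points together, the loop could look roughly like the sketch below. It assumes that the original BASIC format really is a plain "add 20 to every character code" with no wrap-around, as the question describes; ShiftDemo and Shift are hypothetical names, and the WinForms label handling is left out.

```csharp
using System;
using System.Text;

static class ShiftDemo
{
    // Adds `shift` to every character code; no wrap-around is applied.
    public static string Shift(string text, int shift)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            sb.Append((char)(c + shift)); // append, so earlier characters are kept
        }
        return sb.ToString();
    }

    static void Main()
    {
        string encoded = Shift("GZ", 20);
        Console.WriteLine(encoded);             // prints [n
        Console.WriteLine(Shift(encoded, -20)); // prints GZ
    }
}
```

In the button handler this would be used as label1.Text = Shift(label1.Text, 20). Working on char values directly avoids Encoding.ASCII, which would turn shifted codes above 127 into '?' when converting back.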
