C#: Getting Substring between two different Delimiters

I have problems splitting this line. I want to get each string between "#VAR;" and "#ENDVAR;". So in the end, the output should be:
Variable=Speed;Value=Fast;
Variable=Fabricator;Value=Freescale;Op==;
Later I will separate each substring using ";" as a delimiter, but I guess that won't be that hard. This is what a line looks like:
#VAR;Variable=Speed;Value=Fast;Op==;#ENDVAR;#VAR;Variable=Fabricator;Value=Freescale;Op==;#ENDVAR;
I tried some split options, but most of the time I just get an empty string. I also tried a regex, but either the regex was wrong or it wasn't suitable for my string. It's probably wrong: at school we learnt regex differently from how it's used in C#, so I was confused while implementing it.
Regex.Match(t, @"/#VAR([a-z=a-z]*)/#ENDVAR")
Edit:
One small question: I am iterating over many lines like the one in the question, and I use NoIdea's code on each line to get it into shape. The next step is to write the result to a text file. To print an array I have to loop over it, but in every iteration, when I get a new line, I overwrite the array with the currently split string. I've put the rest of my code in the question; it would be nice if someone could help me.
string[] w;
foreach (EA.Element theObjects in myPackageObject.Elements)
{
    theObjects.Type = "Object";
    foreach (EA.Element theElements in PackageHW.Elements)
    {
        if (theObjects.ClassfierID == theElements.ElementID)
        {
            t = theObjects.RunState;
            w = t.Replace("#ENDVAR;", "#VAR;").Replace("#VAR;", ";").Split(new string[] { ";" }, StringSplitOptions.RemoveEmptyEntries);
            foreach (string s in w)
            {
                tw2.WriteLine(s);
            }
        }
    }
}
I'm pretty sure the part with the foreach loop is wrong. I need something that prints each split t. Thanks in advance.

You can do it without regex using:
str.Replace("#ENDVAR;", "#VAR;")
.Split(new string[] { "#VAR;" }, StringSplitOptions.RemoveEmptyEntries);
If you also want to split each block into its individual entries in the same step, you can do:
str.Replace("#ENDVAR;", "#VAR;")
.Replace("#VAR;", ";")
.Split(new string[] { ";" }, StringSplitOptions.RemoveEmptyEntries);

You can use a look ahead assertion here.
#VAR;(.*?)(?=#ENDVAR)
If your string never contains whitespace between #VAR; and #ENDVAR;, you could use the pattern below instead; it will not match empty blocks.
#VAR;([^\s]+)(?=#ENDVAR)

Answer using raw string manipulation.
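IEnumerable<string> StuffFoundInside(string biggerString)
{
    const string open = "#VAR;";
    const string close = "#ENDVAR;";
    int startingIndex = 0;
    while (true)
    {
        int openDelimiterIndex = biggerString.IndexOf(open, startingIndex, StringComparison.Ordinal);
        if (openDelimiterIndex == -1) yield break;
        // The content starts right after the opening delimiter.
        int contentStart = openDelimiterIndex + open.Length;
        int closeDelimiterIndex = biggerString.IndexOf(close, contentStart, StringComparison.Ordinal);
        if (closeDelimiterIndex == -1) yield break;
        yield return biggerString.Substring(contentStart, closeDelimiterIndex - contentStart);
        // Continue searching after the closing delimiter so the loop makes progress.
        startingIndex = closeDelimiterIndex + close.Length;
    }
}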
Building a list, adding each item to it and returning the list might be faster, depending on how the calling code uses the result. The iterator version lets the caller stop early, but it carries the overhead of the compiler-generated iterator state machine.

Use this regex:
(?i)#VAR;(.+?)#ENDVAR;
Group 1 in each match will be your line content.
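As a rough illustration of reading group 1 from every match, here is a minimal sketch. The class name VarBlocks and the method name Extract are made up for this example; only Regex.Matches and Match.Groups come from the framework.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class VarBlocks
{
    // Hypothetical helper: returns the text captured between each
    // #VAR; ... #ENDVAR; pair (group 1 of every match).
    public static List<string> Extract(string line)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(line, @"(?i)#VAR;(.+?)#ENDVAR;"))
        {
            results.Add(m.Groups[1].Value);
        }
        return results;
    }

    public static void Main()
    {
        string line = "#VAR;Variable=Speed;Value=Fast;Op==;#ENDVAR;#VAR;Variable=Fabricator;Value=Freescale;Op==;#ENDVAR;";
        foreach (string block in Extract(line))
        {
            Console.WriteLine(block);
        }
        // Prints:
        // Variable=Speed;Value=Fast;Op==;
        // Variable=Fabricator;Value=Freescale;Op==;
    }
}
```

Because the quantifier is lazy (.+?), each match stops at the nearest #ENDVAR; instead of running to the last one on the line.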

(If you don't like regexes)
Code:
var s = "#VAR;Variable=Speed;Value=Fast;Op==;#ENDVAR;#VAR;Variable=Fabricator;Value=Freescale;Op==;#ENDVAR;";
var tokens = s.Split(new String [] {"#ENDVAR;#VAR;"}, StringSplitOptions.None);
foreach (var t in tokens)
{
var st = t.Replace("#VAR;", "").Replace("#ENDVAR;", "");
Console.WriteLine(st);
}
Output:
Variable=Speed;Value=Fast;Op==;
Variable=Fabricator;Value=Freescale;Op==;

Regex.Split works well but yields empty entries that have to be removed as shown here:
string[] result = Regex.Split(input, @"#\w+;")
.Where(s => s != "")
.ToArray();

I tried some split-options, but most of the time I just get an empty string.
In this case the requirements seem to be simpler than you're stating. Simply splitting and using linq will do your whole operation in one statement:
string test = "#VAR;Variable=Speed;Value=Fast;Op==;#ENDVAR;#VAR;Variable=Fabricator;Value=Freescale;Op==;#ENDVAR;";
List<List<string>> strings = (from s in test.Split(new string[]{"#VAR;",";#ENDVAR;"},StringSplitOptions.RemoveEmptyEntries)
let s1 = s.Split(new char[]{';'},StringSplitOptions.RemoveEmptyEntries).ToList<string>()
select (s1)).ToList<List<string>>();
The output is:
?strings[0]
Count = 3
[0]: "Variable=Speed"
[1]: "Value=Fast"
[2]: "Op=="
?strings[1]
Count = 3
[0]: "Variable=Fabricator"
[1]: "Value=Freescale"
[2]: "Op=="
To write the data to a file something like this will work:
foreach (List<string> s in strings)
{
System.IO.File.AppendAllLines("textfile1.txt", s);
}

Related

Need to refer to second to the last element of array of partial filenames

I need to find the distinct values of a part of each filename in an array of filenames. I'd like to do it in one line.
So, I have filenames like these:
string[] filenames = {"aaa_ab12345.txt", "bbb_ab12345.txt", "aaa_ac12345.txt", "bbb_ac12345"};
and I need to find the distinct values of the ab12345 part.
I currently have something like this:
string[] filenames_partial_distinct = Array.ConvertAll(
filenames,
file => System.IO.Path.GetFileNameWithoutExtension(file)
.Split(new[] {"_","."}, StringSplitOptions.RemoveEmptyEntries)[1]
)
.Distinct()
.ToArray();
Now I'm getting filenames of the form aaa_bbb_ab12345.txt. So, instead of referring to the second part of the filename, I need to refer to the second to last.
So, how do I refer to an arbitrary element based on length of array in one line, if it's a result of Split method? Something along lines of:
Array.ConvertAll(filenames, file=>file.Split(separator)[this.Length-2]).Distinct().ToArray();
In other words, if a string method results in an array of strings, how do I immediately select element based on the length of array:
String.Split()[third from end, fifth from end, etc.];
If you use GetFileNameWithoutExtension there will be no extension and therefore splitting by '_' will do it. Then you can take the last part with .Last().
string[] filenames_partial_distinct = Array.ConvertAll(
filenames,
file => Path.GetFileNameWithoutExtension(file).Split('_').Last()
)
.Distinct()
.ToArray();
With the input
string[] filenames = { "aaa_ab12345.txt", "bbb_ab12345.txt",
"aaa_ac12345.txt", "bbb_ac12345", "aaa_bbb_ab12345.txt" };
You get the result
{ "ab12345", "ac12345" }
The StringSplitOptions.RemoveEmptyEntries is only required if there are filenames ending with _ (before the extension).
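For the more general question of picking an element counted from the end of a Split result in a single expression, one option is a small extension method. The sketch below is only an illustration: SplitExtensions and FromEnd are hypothetical names, not framework APIs. On C# 8 or later, the built-in index-from-end syntax (for example parts[^2]) does the same job on arrays without a helper.

```csharp
using System;
using System.Linq;

public static class SplitExtensions
{
    // Hypothetical helper: returns the element `fromEnd` positions from the end
    // (1 = last, 2 = second to last), or null when there are too few parts.
    public static string FromEnd(this string[] parts, int fromEnd)
    {
        return fromEnd >= 1 && fromEnd <= parts.Length
            ? parts[parts.Length - fromEnd]
            : null;
    }
}

public static class Demo
{
    public static void Main()
    {
        string[] filenames = { "aaa_ab12345.txt", "aaa_bbb_ab12345.txt", "bbb_ac12345.txt" };

        // Split on both '_' and '.', so the extension is the last part
        // and the wanted token is the second to last.
        string[] distinct = filenames
            .Select(f => f.Split('_', '.').FromEnd(2))
            .Distinct()
            .ToArray();

        Console.WriteLine(string.Join(", ", distinct)); // ab12345, ac12345
    }
}
```

Returning null for short arrays is a design choice for this sketch; throwing or filtering those entries out may suit real data better.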
Seems you're looking for something like this:
string[] arr = filenames.Select(n => n.Substring(n.IndexOf("_") + 1, 7)).Distinct().ToArray();
I usually defer problems like this to regex. They are very powerful. This approach also gives you the opportunity to detect unexpected cases and handle them appropriately.
Here is a crude example, assuming I understood your requirements:
using System;
using System.Linq;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
string MyMatcher(string filename)
{
// this pattern may need work depending on what you need - it says
// extract that pattern between the "()" which is 2 characters and
// 4 digits, exactly; and can be found in `Groups[1]`.
Regex r = new Regex(@".*_(\w{2}\d{4}).*", RegexOptions.IgnoreCase);
Match m = r.Match(filename);
return m.Success
? m.Groups[1].ToString()
: null; // what should happen here?
}
string[] filenames =
{
"aaa_ab12345.txt",
"bbb_ab12345.txt",
"aaa_ac12345.txt",
"bbb_ac12345",
"aaa_bbb_ab12345.txt",
"ae12345.txt" // MyMatcher() return null for this - what should you do if this happens?
};
var results = filenames
.Select(MyMatcher)
.Distinct();
foreach (var result in results)
{
Console.WriteLine(result);
}
}
}
Gives:
ab1234
ac1234
This can be refined further, such as pre-compiled regex patterns, encapsulation in a class, etc.

C# How to delete every character after something on every line

So I've been trying to figure out how to delete characters after a certain point on each line, for example I have a list like:
dskfokes=dasfn3rewk
dsanfiwen=434efsde
damkw4343=o3rm3i
dmfkim303rk2=0439wefksd
32i32j9esfj=42393jdsf
How would I go about deleting everything on each line after '='?
Use string.IndexOf() to get the index of the char you want to remove, then string.Remove() to do the removing.
string str = "dskfokes=dasfn3rewk";
str = str.Remove(str.IndexOf('='));
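Note that IndexOf returns -1 when a line has no '=', and Remove then throws. A minimal sketch of a guarded version follows; LineTrimmer, TrimAfterEquals and the keepSeparator flag are names invented for this example. The flag covers both readings of "everything after '='": dropping the '=' as well, or keeping it.

```csharp
using System;

public static class LineTrimmer
{
    // Hypothetical helper: cuts a line at the first '='.
    // keepSeparator = true keeps the '=' itself; false drops it as well.
    // Lines without '=' are returned unchanged.
    public static string TrimAfterEquals(string line, bool keepSeparator)
    {
        int index = line.IndexOf('=');
        if (index < 0)
        {
            return line;
        }
        return line.Substring(0, keepSeparator ? index + 1 : index);
    }

    public static void Main()
    {
        Console.WriteLine(TrimAfterEquals("dskfokes=dasfn3rewk", false)); // dskfokes
        Console.WriteLine(TrimAfterEquals("dskfokes=dasfn3rewk", true));  // dskfokes=
        Console.WriteLine(TrimAfterEquals("no separator here", false));   // no separator here
    }
}
```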
One approach is to use LINQ:
var strings = new string[] {
    "dskfokes=dasfn3rewk"
,   "dsanfiwen=434efsde"
,   "damkw4343=o3rm3i"
,   "dmfkim303rk2=0439wefksd"
,   "32i32j9esfj=42393jdsf"
};
var res = strings.Select(s => s.Split('=')[0]).ToArray();
This splits each string on =, and drops everything after the first '=' character if it is there.
You may also do it using String.Split() like
string[] allWords = new string[] {
"dskfokes=dasfn3rewk",
"dsanfiwen=434efsde",
"damkw4343=o3rm3i",
"dmfkim303rk2=0439wefksd",
"32i32j9esfj=42393jdsf"
};
foreach(string s in allWords)
{
string[] urstring = s.Split('=');
Console.WriteLine(urstring[0]);
}
I suggest using a combination of the string operations Substring and IndexOf to achieve this. Consider this example:
List<string> inputLine = new List<string>(){ "dskfokes=dasfn3rewk",
"dsanfiwen=434efsde",
"damkw4343=o3rm3i",
"dmfkim303rk2=0439wefksd",
"32i32j9esfj=42393jdsf"};
List<string> outputLines = inputLine.Select(x =>
x.IndexOf('=') == -1 ? x :
x.Substring(0, x.IndexOf('='))).ToList();
Note: IndexOf returns -1 if the specified character is not found; in that case Substring throws an exception, since -1 is not a valid length. So we have to check that the character exists within the string before proceeding. You can also use a simple foreach like this:
List<string> outputLines=new List<string>();
int currentCharIndex=-1;
foreach (string line in inputLine)
{
currentCharIndex = line.IndexOf('=');
if(currentCharIndex ==-1)
outputLines.Add(line);
else
outputLines.Add(line.Substring(0,currentCharIndex));
}

comma not being removed as expected from string

I have an MVC app which I need to store information into the database. I get a string value e.g. as
string a = "a,b,c";
I then split the string by removing the commas as
string[] b = a.Split(',');
Now before I save to database I have to add the comma back in and this is where I'm kind of stuck. I can add the comma however one gets added to the end of the string too which I don't want. If I do TrimEnd(',') it removes every comma. Can someone tell me where I'm going wrong please. I'm adding the comma back as:
foreach(var items in b)
{
Console.WriteLine(string.Format("{0},", items));
}
Please note I have to split the comma first due to some validation which needs to be carried out before saving to DB
The expected result should be for example
a,b,c
Instead I get
a,b,c,
Update - The below is the code I'm using In my MVC app after Bruno Garcia answer
string[] checkBoxValues = Request.Form["location"].Split(',');
foreach(var items in checkBoxValues)
{
if (!items.Contains("false"))
{
UsersDto.Location += string.Join(",", items);
}
}
Try:
string.Join(",", b);
This will add a ',' in between each item of your array
Based on the code you posted this is what I think you need
UsersDto.Location = string.Join(
",",
Request.Form["location"]
.Split(',')
.Where(item => !item.Contains("false")));
That will split the values in Request.Form["location"] on comma. Then filter out items that contain "false" as a substring, and finally join them back together with a comma.
So a string like "abc,def,blahfalseblah,xyz" would become "abc,def,xyz".
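If only entries that are exactly "false" should be dropped, rather than anything containing that substring, the filter can compare whole entries instead. This is a sketch under that assumption; LocationFilter and JoinWithoutFalse are hypothetical names, and a plain string stands in for Request.Form["location"].

```csharp
using System;
using System.Linq;

public static class LocationFilter
{
    // Hypothetical helper: drops entries that are exactly "false"
    // (ignoring case and surrounding spaces) and joins the rest with commas.
    // string.Join only puts the separator between items, so no trailing comma appears.
    public static string JoinWithoutFalse(string raw)
    {
        return string.Join(",",
            raw.Split(',')
               .Where(item => !string.Equals(item.Trim(), "false", StringComparison.OrdinalIgnoreCase)));
    }

    public static void Main()
    {
        Console.WriteLine(JoinWithoutFalse("abc,false,blahfalseblah,xyz")); // abc,blahfalseblah,xyz
    }
}
```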
You can just use String.Join then?
var result = String.Join(",", b); // a,b,c
Full document: https://msdn.microsoft.com/en-us/library/57a79xd0(v=vs.110).aspx
You can also do it like this:
string[] checkBoxValues = Request.Form["location"].Split(',');
string s = "";
foreach (var items in checkBoxValues)
{
if (!items.Contains("false"))
{
s = s + string.Format("{0},", items);
}
}
UsersDto.Location = s.TrimEnd(',');

string.Split ignores Null values between delimiters

I'm trying to convert some data into SQL statements using StreamReader and StreamWriter.
My problem is that when I split lines in which there is nothing between two delimiters, not even a space, the empty values get ignored and I get an IndexOutOfRange error,
because my temparray only goes up to temparray[3], but it should go up to something like temparray[6].
How can I split and keep the null values, or replace them with a simple char, so that I don't get an IndexOutOfRange error when I create my SQL statements?
foreach (string a in values)
{
int temp = 1;
String[] temparray = a.Split(';');
streamWriter.WriteLine("Insert into table Firma values({0},'{1}','{2}')", temp, temparray[1], temparray[4]);
temp++;
}
First of all, this is asking for trouble (SQL injection). You should at the very least escape the values parsed from the string.
And you seem to be mistaken, as String.Split does exactly what you want by default: "x;y;;z".Split(';') returns a four-element array {"x", "y", "", "z"}. You can achieve the described behavior by using StringSplitOptions.RemoveEmptyEntries: "x;y;;z".Split(new[] {';'}, StringSplitOptions.RemoveEmptyEntries) returns a three-element array {"x", "y", "z"}. Which is what you do not want, as you say.
Either way, "Überarbeitung der SAV Seite;b.i.b.;;;;PB;".Split(';') returns a seven-element array here for me, so check your inputs and implementation…
If you print out your string, I'm pretty sure it will not be what you expect it to be.
static void Main(string[] args)
{
var result = "Überarbeitung der SAV Seite;b.i.b.;;;;PB;".Split(';');
foreach (var part in result)
{
Console.WriteLine(" --> " + part);
}
Console.ReadLine();
}
This works great. It will not ignore the empty values. It will print
--> Überarbeitung der SAV Seite
--> b.i.b.
-->
-->
-->
--> PB
-->
including the empty values.
Greetings to bib Paderborn :)
If you're using SQL Server, you can return empty strings instead of null by using the ISNULL function in your query.
For example:
SELECT ISNULL(PR_Arbeitstitel, '') FROM Table
Why don't you iterate over your temparray to build up a string of param values?
This is by no means perfect, but should point you in the right direction:
int temp = 1; // declared outside the loop so the counter is not reset for every line
foreach (string a in values)
{
    String[] temparray = a.Split(';');
    var stringBuilder = new StringBuilder();
    foreach (var s in temparray)
        stringBuilder.Append("'" + s + "',");
    var insertStatement = string.Format("Insert into table Firma values({0}, {1})", temp, stringBuilder.ToString().TrimEnd(','));
    temp++;
}
Why would you have 2 delimiters in a row with nothing in between them? Do you not control the input?
In that case you could control it by inserting a so-called sentinel value, such as "IGNOREMEPLEASE":
String[] temparray = a.Replace(";;", ";IGNOREMEPLEASE;").Split(';');
Then the rest of your code knows that IGNOREMEPLEASE means there was an empty field, and it should be ignored. Note that a single Replace pass does not mark every gap in a run of three or more delimiters, because the matches cannot overlap.
That being said, be very careful about what you send to a database, and scrub incoming data that you use to build SQL statements with, to get rid of anything dangerous.
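Since the statements here are written to a file as text, one minimal precaution is to double any single quotes inside the values before wrapping them in quotes. The sketch below assumes that scenario; SqlText and Quote are hypothetical names, and the table layout is made up for the example. When statements are executed directly against a database, parameterized commands are generally the safer route.

```csharp
using System;

public static class SqlText
{
    // Hypothetical helper: wraps a value in single quotes and doubles any
    // embedded single quotes, the standard escape inside a SQL string literal.
    public static string Quote(string value)
    {
        return "'" + value.Replace("'", "''") + "'";
    }

    public static void Main()
    {
        string[] parts = "7;O'Brien;;;Manager".Split(';');
        Console.WriteLine("INSERT INTO Firma VALUES({0},{1},{2})", 1, Quote(parts[1]), Quote(parts[4]));
        // Prints: INSERT INTO Firma VALUES(1,'O''Brien','Manager')
    }
}
```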
I don't see your issue occurring. The following outputs a string.Empty for string[2] and has all 5 elements
string[] str = "0,1,,3,4".Split(new char[] { ',' });
foreach (string s in str)
{
Debug.Print(s);
}
output
0
1
(empty line)
3
4
I've tried to reproduce with your example of string "Überarbeitung der SAV Seite;b.i.b.;;;;PB;" but everything was fine. I've got 7 items in array.
You can try to use
string s = "Überarbeitung der SAV Seite;b.i.b.;;;;PB;";
var result = s.Split(new[] { ';' }, StringSplitOptions.None);
To be sure that StringSplitOptions.RemoveEmptyEntries is not enabled.
Perhaps use the ?? operator:
streamWriter.WriteLine(
    "Insert into table Firma values({0},'{1}','{2}')",
    temp,
    temparray[1] ?? "x",
    temparray[4] ?? "x");
Note that ?? only substitutes for null, and Split produces empty strings rather than nulls, so an empty field still comes through as "". This is also only safe if you know for sure that your input has at least 5 tokens after splitting. If you can't guarantee this, you'll need to wrap it in a conditional:
if (temparray.Length < 5)
{
// handle invalid input
}
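The length check and the fallback value can be combined in one small helper. This is only a sketch: Fields and FieldOrDefault are hypothetical names, and treating an empty field the same as a missing one is an assumption about what the SQL output should contain.

```csharp
using System;

public static class Fields
{
    // Hypothetical helper: returns parts[index] when it exists and is non-empty,
    // otherwise the fallback. Covers both short arrays and empty fields.
    public static string FieldOrDefault(string[] parts, int index, string fallback)
    {
        bool present = index >= 0 && index < parts.Length && parts[index].Length > 0;
        return present ? parts[index] : fallback;
    }

    public static void Main()
    {
        string[] parts = "Title;b.i.b.;;;;PB;".Split(';');
        Console.WriteLine(FieldOrDefault(parts, 1, "x"));  // b.i.b.
        Console.WriteLine(FieldOrDefault(parts, 4, "x"));  // x (empty field)
        Console.WriteLine(FieldOrDefault(parts, 10, "x")); // x (index past the end)
    }
}
```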

parse lines using linq to txt

var t1 = from line in File.ReadAllLines(@"alkahf.txt")
let item = line.Split(new string[] {". "}, StringSplitOptions.RemoveEmptyEntries)
let verse = line.Split(new string[] { "\n. " }, StringSplitOptions.RemoveEmptyEntries)
select new
{
Index = item,
Text = verse
};
I'm having problems with the above code; I'm unsure how to parse the lines properly.
The format of the file is shown below. I would also like to ignore any empty lines;
StringSplitOptions.RemoveEmptyEntries doesn't work for some reason.
1. This is text it might have numbers
2. I skipped a line
In the LINQ part, you are inside a single line, so you might want to exclude the empty lines first:
from line in File.ReadAllLines(@"alkahf.txt")
where !string.IsNullOrEmpty(line)
You then do two splits - one on newline, which is odd (since that won't be there, since we know we are reading lines). I expect you mean something like:
let parts = line.Split('.')
where parts.Length == 2
select new {
Index = parts[0],
Text = parts[1]
};
?
Also, note that ReadAllLines is a buffered operation; if you want true streaming, you might want something like:
public static IEnumerable<string> ReadLines(string path) {
using(var reader = File.OpenText(path)) {
string line;
while((line = reader.ReadLine()) != null) {
yield return line;
}
}
}
which is not buffering (you don't load the entire file at once). Just change the first line to:
from line in ReadLines(@"alkahf.txt")
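On .NET 4.0 and later, the framework's own File.ReadLines also enumerates a file lazily, so a hand-written iterator may not be needed. A minimal sketch follows, with VerseReader and NonEmptyLines as made-up names and a temporary file standing in for alkahf.txt:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class VerseReader
{
    // Hypothetical helper: lazily yields the non-blank lines of a file.
    // File.ReadLines reads line by line instead of loading the whole file.
    public static IEnumerable<string> NonEmptyLines(string path)
    {
        return File.ReadLines(path).Where(line => !string.IsNullOrWhiteSpace(line));
    }

    public static void Main()
    {
        // A temporary file stands in for the real input here.
        string path = Path.GetTempFileName();
        File.WriteAllLines(path, new[] { "1. This is text it might have numbers", "", "2. I skipped a line" });

        foreach (string line in NonEmptyLines(path))
        {
            Console.WriteLine(line);
        }

        File.Delete(path);
    }
}
```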
Thanks to Marc's answer I fixed my issue. Sorry for the late response I'm working on this as a personal project.
The code is like so
var t1 = from line in StreamReaderExtension.ReadLinesFromFile(@"alkahf.txt")
let parts = line.Split(new string[]{". "},
StringSplitOptions.RemoveEmptyEntries)
where !string.IsNullOrEmpty(line)
&& int.Parse(parts[0].ToString()).ToString() != ""
select new
{
Index = parts[0],
Text = parts[1]
};
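The int.Parse call checks that the index part is an integer. Note that int.Parse throws a FormatException when it meets a non-integer rather than skipping the line, so if you're using this code it's a good idea to handle that case explicitly, for example by setting a flag, so that bad lines are neither fatal nor unnoticed.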
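One way to handle non-integer indexes without an exception is int.TryParse. The sketch below is an illustration only; VerseParser and TryParseVerse are hypothetical names. It splits on the first ". " only, so any further ". " inside the verse text is left intact.

```csharp
using System;

public static class VerseParser
{
    // Hypothetical helper: tries to parse a line such as "12. some text"
    // into its index and text. Returns false for blank or malformed lines.
    public static bool TryParseVerse(string line, out int index, out string text)
    {
        index = 0;
        text = null;
        if (string.IsNullOrWhiteSpace(line))
        {
            return false;
        }

        // Split into at most two parts, at the first ". " only.
        string[] parts = line.Split(new[] { ". " }, 2, StringSplitOptions.None);
        if (parts.Length != 2 || !int.TryParse(parts[0], out index))
        {
            return false;
        }

        text = parts[1];
        return true;
    }

    public static void Main()
    {
        string[] lines = { "1. This is text. It might have numbers", "", "oops. not a verse" };
        foreach (string line in lines)
        {
            if (TryParseVerse(line, out int index, out string text))
            {
                Console.WriteLine(index + " -> " + text);
            }
            else
            {
                Console.WriteLine("skipped: \"" + line + "\"");
            }
        }
    }
}
```

The else branch is where a flag or a log entry could go, so that skipped lines stay visible.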
