parse lines using linq to txt - c#

var t1 = from line in File.ReadAllLines(@"alkahf.txt")
let item = line.Split(new string[] {". "}, StringSplitOptions.RemoveEmptyEntries)
let verse = line.Split(new string[] { "\n. " }, StringSplitOptions.RemoveEmptyEntries)
select new
{
Index = item,
Text = verse
};
I'm having problems with the code above; I'm unsure how to parse the lines properly.
I would also like to ignore any empty lines, but StringSplitOptions.RemoveEmptyEntries doesn't work for some reason.
The format of the file is like so:
1. This is text it might have numbers
2. I skipped a line

In the LINQ part, you are inside a single line, so you might want to exclude the empty lines first:
from line in File.ReadAllLines(@"alkahf.txt")
where !string.IsNullOrEmpty(line)
You then do two splits, one of them on a newline, which is odd: there won't be any newlines, since each line has already been read separately. I expect you mean something like:
let parts = line.Split(new[] { '.' }, 2) // split at the first '.' only, so the text itself may contain periods
where parts.Length == 2
select new {
Index = parts[0],
Text = parts[1]
};
?
Also, note that ReadAllLines is a buffered operation; if you want true streaming, you might want something like:
public static IEnumerable<string> ReadLines(string path) {
using(var reader = File.OpenText(path)) {
string line;
while((line = reader.ReadLine()) != null) {
yield return line;
}
}
}
which is not buffering (you don't load the entire file at once). Just change the first line to:
from line in ReadLines(@"alkahf.txt")
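If you want the streaming helper to work with any source rather than only a file path (a StringReader is handy for trying the query out), the same iterator can take a TextReader. This is a minimal sketch; LineSource and NonEmptyLines are hypothetical names. On .NET 4 and later, File.ReadLines already streams a file lazily in the same way, so for the plain file case you may not need a helper at all.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LineSource
{
    // Yields one line at a time; the caller owns (and disposes) the reader.
    public static IEnumerable<string> ReadLines(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }

    // The empty-line filter from above, run against any TextReader.
    // ToList() materializes the result before the reader can be disposed.
    public static List<string> NonEmptyLines(TextReader reader)
    {
        return (from line in ReadLines(reader)
                where !string.IsNullOrEmpty(line)
                select line).ToList();
    }
}
```

With a real file, open the reader in a using block, e.g. using (var reader = File.OpenText(path)) { var lines = LineSource.NonEmptyLines(reader); }, and materialize the query before the block ends.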

Thanks to Marc's answer I fixed my issue. Sorry for the late response I'm working on this as a personal project.
The code is like so
var t1 = from line in StreamReaderExtension.ReadLinesFromFile(@"alkahf.txt")
let parts = line.Split(new string[]{". "},
StringSplitOptions.RemoveEmptyEntries)
where !string.IsNullOrEmpty(line)
&& int.Parse(parts[0].ToString()).ToString() != ""
select new
{
Index = parts[0],
Text = parts[1]
};
The int.Parse call makes sure the index is an integer: it throws a FormatException when it isn't. If you're using this code, consider catching that, or switching to int.TryParse and setting a flag, so that a non-integer line is noticed without stopping the whole query.
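A sketch of the int.TryParse version, assuming every valid line looks like "1. Some text". It splits only at the first ". " so the verse text may contain that sequence too, skips blank lines, and records lines without an integer index in a rejected list (the "flag") instead of throwing. Verse and VerseParser are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class Verse
{
    public int Index { get; set; }
    public string Text { get; set; } = "";
}

public static class VerseParser
{
    // Parses lines such as "1. This is text"; blank lines are skipped and
    // lines without a valid integer index are added to 'rejected'.
    public static List<Verse> Parse(IEnumerable<string> lines, List<string> rejected)
    {
        var verses = new List<Verse>();
        foreach (var line in lines.Where(l => !string.IsNullOrWhiteSpace(l)))
        {
            // Split at the first ". " only (count = 2).
            string[] parts = line.Split(new[] { ". " }, 2, StringSplitOptions.None);
            if (parts.Length == 2 && int.TryParse(parts[0].Trim(), out int index))
            {
                verses.Add(new Verse { Index = index, Text = parts[1] });
            }
            else
            {
                rejected.Add(line); // the "flag": keep the bad line for reporting
            }
        }
        return verses;
    }
}
```

After parsing, a non-empty rejected list tells you that some input did not match the expected format.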

Related

Read specific values out of a text-file and put them in a list

I have a text file with many lines; each line looks like this:
"string string double double", with a space between each value. I'd like to read out the first string and the last double of every line and put these two values into an existing list. This is my code so far, but it doesn't really work.
private void bOpen_Click(object sender, RoutedEventArgs e)
{
bool exists = File.Exists(@"C:\Users\p2\Desktop\Liste.txt");
if (exists == true)
{
StringBuilder sb = new StringBuilder();
using (StreamReader sr = new StreamReader(@"C:\Users\p2\Desktop\Liste.txt"))
{
Vgl comp = new Vgl();
comp.name = Abzahlungsdarlehenrechner.zgName;
comp.gErg = Abzahlungsdarlehenrechner.zgErg;
GlobaleDaten.VglDaten.Add(comp);
int i = 0;
string line = File.ReadLines(@"Liste.txt").Skip(0).Take(1).First();
while ((line = sr.ReadLine()) != null)
{
sb.Append((line));
listBox.Items.Add(line);
GlobaleDaten.VglDaten.Add(comp);
i++;
}
}
}
I have already read this, but it didn't help: How do I read specific value[...]
You can try Linq:
var source = File
.ReadLines(@"C:\Users\p2\Desktop\Liste.txt")
.Select(line => line.Split(' '))
.Select(items => new Vgl() {
name = items[0],
gErg = double.Parse(items[3])
});
// If you want to add into existing list
GlobaleDaten.VglDaten.AddRange(source);
// If you want to create a new list
//List<Vgl> list = source.ToList();
how about
List<Vgl> Result = File.ReadLines(@"C:\Users\p2\Desktop\Liste.txt")
.Select(x => new Vgl()
{
name = x.Split(' ').First(),
gErg = decimal.Parse(x.Split(' ').Last(), NumberStyles.Currency) // Currency also allows a decimal point; assumes gErg is declared as decimal (see below)
})
.ToList();
I would avoid storing money in double values because this can lead to rounding issues; use decimal instead. Examples here: Is a double really unsuitable for money?
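One caveat when parsing amounts with decimal.Parse: NumberStyles.AllowCurrencySymbol on its own does not allow a decimal point, so a value such as "12.50" is rejected. NumberStyles.Currency combines the currency symbol with the decimal point, thousands separators, sign and surrounding whitespace. A small sketch comparing the two; MoneyParsing is a hypothetical name, and the invariant culture is only used to make the example deterministic (the question doesn't say which culture the file uses).

```csharp
using System.Globalization;

public static class MoneyParsing
{
    // True (with the parsed amount) if 'text' is a valid currency amount in 'culture'.
    // NumberStyles.Currency = currency symbol + decimal point + thousands + sign + whitespace.
    public static bool TryParseAmount(string text, CultureInfo culture, out decimal amount)
    {
        return decimal.TryParse(text, NumberStyles.Currency, culture, out amount);
    }
}
```

For a file written with a ',' decimal separator you would pass the matching culture (for example CultureInfo.GetCultureInfo("de-DE")) instead of the invariant one.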
You can use:
string[] splitBySpace = line.Split(' ');
string first = splitBySpace.ElementAt(0);
decimal last = Convert.ToDecimal(splitBySpace.ElementAt(splitBySpace.Length - 1));
Edit: to handle a currency symbol:
string[] splitBySpace = line.Split(' ');
string pattern = @"[^0-9\.\,]+";
string first = splitBySpace.ElementAt(0);
// Strip every character that is not a digit, '.' or ',' (e.g. a leading or trailing currency symbol)
string last = Regex.Replace(splitBySpace.ElementAt(splitBySpace.Length - 1), pattern, "");
decimal lastDecimal;
bool success = decimal.TryParse(last, out lastDecimal);
I agree with @Dmitry and fubo; if you are looking for alternatives, you could try this:
var source = File
.ReadLines(@"C:\Users\p2\Desktop\Liste.txt")
.Select(line =>
{
var splits = line.Split(' ');
return new Vgl()
{
name = splits[0],
gErg = double.Parse(splits[3])
};
});
Use string.Split with a space as the delimiter to turn the line into an array with one element per value. Then just access the first and last array elements. Of course, if you aren't absolutely certain that each line contains exactly 4 values, you may want to check the length of the array to ensure there are at least 4 values.
Reference on using Split:
https://msdn.microsoft.com/en-us/library/ms228388.aspx
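A sketch of that length check, assuming the "string string double double" layout from the question and a '.' decimal separator (hence CultureInfo.InvariantCulture; a file written with ',' as the decimal separator would need a culture such as de-DE). The Vgl class here is a hypothetical stand-in with only the two members used in the question; lines that are too short or whose last value is not a number are skipped instead of throwing.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical stand-in for the question's Vgl class.
public class Vgl
{
    public string name = "";
    public double gErg;
}

public static class ListeParser
{
    public static List<Vgl> Parse(IEnumerable<string> lines)
    {
        var result = new List<Vgl>();
        foreach (string line in lines)
        {
            // RemoveEmptyEntries tolerates repeated spaces between values.
            string[] values = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            if (values.Length < 4)
                continue; // not "string string double double": skip the line

            // Take the first string and the last number.
            if (double.TryParse(values[values.Length - 1], NumberStyles.Float,
                                CultureInfo.InvariantCulture, out double last))
            {
                result.Add(new Vgl { name = values[0], gErg = last });
            }
        }
        return result;
    }
}
```

In the question's code this would be used as GlobaleDaten.VglDaten.AddRange(ListeParser.Parse(File.ReadLines(path))).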
Read the whole file as a string.
Split the string in a foreach loop using \r\n as a row separator. Add each row to a list of strings.
Iterate through that list and split again each record in another loop using space as field separator and put them into another list of strings.
Now you have the four fields of each row. Just use the First and Last methods to get the first word and the last number.
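The steps above can be sketched as follows. Splitting on both "\r\n" and "\n" is an addition so that files with Unix line endings also work; WholeTextParser and FirstAndLast are hypothetical names, and the pairs are returned as tuples rather than Vgl objects to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WholeTextParser
{
    // text: the entire file, e.g. from File.ReadAllText(path).
    public static List<(string First, string Last)> FirstAndLast(string text)
    {
        // Step 1: split the whole string into rows.
        var rows = new List<string>();
        foreach (string row in text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries))
        {
            rows.Add(row);
        }

        // Step 2: split each row into fields on spaces, then take the first and last field.
        var result = new List<(string First, string Last)>();
        foreach (string row in rows)
        {
            List<string> fields = row.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).ToList();
            if (fields.Count > 0)
                result.Add((fields.First(), fields.Last()));
        }
        return result;
    }
}
```

Compared with File.ReadLines, this reads the whole file into memory first, which is fine for small files.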

C#: Getting Substring between two different Delimiters

I have problems splitting this line. I want to get each string between "#VAR;" and "#ENDVAR;". So in the end, the output should be:
Variable=Speed;Value=Fast;
Variable=Fabricator;Value=Freescale;Op==;
Later I will separate each substring using ";" as a delimiter, but I guess that won't be that hard. This is what a line looks like:
#VAR;Variable=Speed;Value=Fast;Op==;#ENDVAR;#VAR;Variable=Fabricator;Value=Freescale;Op==;#ENDVAR;
I tried some Split options, but most of the time I just get an empty string. I also tried a regex, but either the regex was wrong or it wasn't suitable for my string. Probably it's wrong: at school we learnt regexes differently from how they are used in C#, so I was confused while implementing it.
Regex.Match(t, @"/#VAR([a-z=a-z]*)/#ENDVAR")
Edit:
One small question: I am iterating over many lines like the one in the question. I use NoIdea's code on each line to get it into shape. The next step would be to write it to a text file. To print an array I have to loop over it, but in every iteration, when I get a new line, I overwrite the array with the current split string. I put the rest of my code in the question; it would be nice if someone could help me.
string[] w ;
foreach (EA.Element theObjects in myPackageObject.Elements)
{
theObjects.Type = "Object";
foreach (EA.Element theElements in PackageHW.Elements)
{
if (theObjects.ClassfierID == theElements.ElementID)
{
t = theObjects.RunState;
w = t.Replace("#ENDVAR;", "#VAR;").Replace("#VAR;", ";").Split(new string[] { ";" }, StringSplitOptions.RemoveEmptyEntries);
foreach (string s in w)
{
tw2.WriteLine(s);
}
}
}
}
I'm pretty sure the part with the foreach loop is wrong. I need something to print each split t. Thanks in advance.
you can do it without regex using
str.Replace("#ENDVAR;", "#VAR;")
.Split(new string[] { "#VAR;" }, StringSplitOptions.RemoveEmptyEntries);
and if you want the individual fields directly (without the grouping per block), you can do:
str.Replace("#ENDVAR;", "#VAR;")
.Replace("#VAR;", ";")
.Split(new string[] { ";" }, StringSplitOptions.RemoveEmptyEntries);
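A quick check of the two variants on the sample line from the question: the first keeps each #VAR; block together, the second returns the individual fields. ReplaceSplitDemo is a hypothetical wrapper around the two expressions.

```csharp
using System;

public static class ReplaceSplitDemo
{
    // One entry per #VAR;...#ENDVAR; block, e.g. "Variable=Speed;Value=Fast;Op==;".
    public static string[] Blocks(string str)
    {
        return str.Replace("#ENDVAR;", "#VAR;")
                  .Split(new string[] { "#VAR;" }, StringSplitOptions.RemoveEmptyEntries);
    }

    // One entry per field across all blocks, e.g. "Variable=Speed", "Value=Fast", "Op==".
    public static string[] Fields(string str)
    {
        return str.Replace("#ENDVAR;", "#VAR;")
                  .Replace("#VAR;", ";")
                  .Split(new string[] { ";" }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

Note that Fields loses the grouping: you can no longer tell which Value belongs to which Variable. Calling Blocks and then Split(';') on each block keeps that association.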
You can use a look ahead assertion here.
#VAR;(.*?)(?=#ENDVAR)
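If your string never contains whitespace between #VAR; and #ENDVAR;, you could use the line below instead; since it requires at least one character, it never produces an empty match. Keep the quantifier lazy (+?): a greedy [^\s]+ runs on to the last #ENDVAR in the line and returns all the blocks as a single match.
#VAR;([^\s]+?)(?=#ENDVAR)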
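In C#, the lookahead pattern can be used with Regex.Matches, reading group 1 of each match. A minimal sketch using the first (.*?) pattern; LookaheadDemo is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    public static List<string> Blocks(string input)
    {
        var result = new List<string>();
        // (.*?) is lazy, so each match stops at the nearest following #ENDVAR;
        // the lookahead checks for #ENDVAR without including it in the match.
        foreach (Match m in Regex.Matches(input, @"#VAR;(.*?)(?=#ENDVAR)"))
        {
            result.Add(m.Groups[1].Value);
        }
        return result;
    }
}
```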
Answer using raw string manipulation.
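// Requires: using System; using System.Collections.Generic;
IEnumerable<string> StuffFoundInside(string biggerString)
{
const string open = "#VAR;";
const string close = "#ENDVAR;";
int searchFrom = 0;
while (true)
{
int openDelimiterIndex = biggerString.IndexOf(open, searchFrom, StringComparison.Ordinal);
if (openDelimiterIndex == -1)
yield break;
// The content starts right after "#VAR;"
int contentStart = openDelimiterIndex + open.Length;
int closeDelimiterIndex = biggerString.IndexOf(close, contentStart, StringComparison.Ordinal);
if (closeDelimiterIndex == -1)
yield break;
yield return biggerString.Substring(contentStart, closeDelimiterIndex - contentStart);
// Resume the search after "#ENDVAR;" so the loop always makes progress
searchFrom = closeDelimiterIndex + close.Length;
}
}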
Building a List<string>, adding each item to it and returning the list might be faster, depending on how the calling code consumes the result. This iterator version lets the caller stop early, but it carries the overhead of the compiler-generated state machine.
Use this regex:
(?i)#VAR;(.+?)#ENDVAR;
Group 1 in each match will be your line content.
(If you don't like regexes)
Code:
var s = "#VAR;Variable=Speed;Value=Fast;Op==;#ENDVAR;#VAR;Variable=Fabricator;Value=Freescale;Op==;#ENDVAR;";
var tokens = s.Split(new String [] {"#ENDVAR;#VAR;"}, StringSplitOptions.None);
foreach (var t in tokens)
{
var st = t.Replace("#VAR;", "").Replace("#ENDVAR;", "");
Console.WriteLine(st);
}
Output:
Variable=Speed;Value=Fast;Op==;
Variable=Fabricator;Value=Freescale;Op==;
Regex.Split works well but yields empty entries that have to be removed as shown here:
string[] result = Regex.Split(input, @"#\w+;")
.Where(s => s != "")
.ToArray();
I tried some split-options, but most of the time I just get an empty string.
In this case the requirements seem to be simpler than you're stating. Simply splitting and using linq will do your whole operation in one statement:
string test = "#VAR;Variable=Speed;Value=Fast;Op==;#ENDVAR;#VAR;Variable=Fabricator;Value=Freescale;Op==;#ENDVAR;";
List<List<string>> strings = (from s in test.Split(new string[]{"#VAR;",";#ENDVAR;"},StringSplitOptions.RemoveEmptyEntries)
let s1 = s.Split(new char[]{';'},StringSplitOptions.RemoveEmptyEntries).ToList<string>()
select (s1)).ToList<List<string>>();
The output is:
?strings[0]
Count = 3
[0]: "Variable=Speed"
[1]: "Value=Fast"
[2]: "Op=="
?strings[1]
Count = 3
[0]: "Variable=Fabricator"
[1]: "Value=Freescale"
[2]: "Op=="
To write the data to a file something like this will work:
foreach (List<string> s in strings)
{
System.IO.File.AppendAllLines("textfile1.txt", s);
}

reading a CSV issue

I am trying to read a CSV file.
The following is a sample:
"0734306547 ","9780734306548 ","Jane Eyre Pink PP ","Bronte Charlotte ","FRONT LIST",20/03/2013 0:00:00,0,"PAPERBACK","Y","Pen"
Here is the code I am using to read the CSV:
public void readCSV()
{
StreamReader reader = new StreamReader(File.OpenRead(@"C:\abc\21-08-2013\PNZdatafeed.csv"),Encoding.ASCII);
List<string> ISBN = new List<String>();
while (!reader.EndOfStream)
{
string line = reader.ReadLine();
if (!String.IsNullOrWhiteSpace(line))
{
string[] values = line.Split(',');
if (values[9] == "Pen")
{
ISBN.Add(values[1]);
}
}
}
MessageBox.Show(ISBN.Count().ToString());
}
I am not able to compare the values in if (values[9] == "Pen") because when I debug the code, it says values[9] is "\"Pen\"" (the quote characters are part of the string).
How do I get rid of these special characters?
The problem here is that you're splitting the line at every comma and leaving each piece as it is, quotes included. For example, if this is the line you're reading in:
"A","B","C"
and you split it at commas, you'll get "A", "B" and "C" as your data, with the quote characters still included. According to your description, you don't want quotes around the data.
To throw away quotes around a string:
Check if the leftmost character is ".
If so, check if the rightmost character is ".
If so, remove the leftmost and rightmost characters.
In C#:
if (data.Length >= 2 && data.StartsWith("\"") && data.EndsWith("\""))
{
data = data.Substring(1, data.Length - 2); // drop the first and last character
}
At this point you might have a few questions (I'm not sure how much experience you have). If any of these apply to you, feel free to ask them, and I'll explain further.
What does "\"" mean?
How do I extract the leftmost/rightmost character of a string?
How do I extract the middle of a string?
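Stripping quotes after a plain Split(',') breaks as soon as a quoted field contains a comma. A minimal hand-written sketch that splits on commas outside quotes and drops the surrounding quotes, assuming RFC 4180-style "" escapes and no line breaks inside fields; CsvLine is a hypothetical name, and for real feeds a dedicated parser such as TextFieldParser is the more robust choice.

```csharp
using System.Collections.Generic;
using System.Text;

public static class CsvLine
{
    // Splits one CSV line on commas that are not inside double quotes.
    // Surrounding quotes are removed and "" inside a quoted field becomes ".
    public static string[] Split(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        current.Append('"'); // escaped quote
                        i++;
                    }
                    else
                    {
                        inQuotes = false; // closing quote, not copied
                    }
                }
                else
                {
                    current.Append(c);
                }
            }
            else if (c == '"')
            {
                inQuotes = true; // opening quote, not copied
            }
            else if (c == ',')
            {
                fields.Add(current.ToString()); // field boundary
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }
        fields.Add(current.ToString()); // last field
        return fields.ToArray();
    }
}
```

In the question's loop, string[] values = CsvLine.Split(line); makes values[9] == "Pen" work as intended.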

Searching for text in a .txt

What would be the best way to search a text file that looks like this?
efee|| Nbr| Address| Name |Phone|City|State|Zip abc
||455|gsgd |first last|gsg |fef |jk |0393 gjgj||jfj|ddg
|first last|fht |ree |hn |th ...more lines...
I started by reading in the file and all its contents with a StreamReader.
I was thinking of counting the "|" characters and grabbing the text between the 5th and 6th one using Substring, but I'm not sure how to count the "|". If someone has a better idea, I'm open to it.
I tried something like this:
StreamReader file = new StreamReader(@"...");
string line;
int num=0;
while ((line = file.ReadLine()) != null)
{
for (int i = 1; i <= 6; i++)
{
if (line.Contains("|"))
{
num++;
}
}
int start = line.IndexOf("|");
int end = line.IndexOf("|");
string result = line.Substring(start, end - start - 1);
}
I believe the text I want is always between the 5th and 6th "|".
You can do it like this:
var res = File
.ReadLines(#"FileName.txt")
.Select(line => line.Split(new[]{'|'}, StringSplitOptions.None)[5])
.ToList();
This produces a List<string> from the file, where each string is the part of the corresponding line taken from between the fifth and the sixth '|' separator.
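Lines with fewer than five '|' separators (a blank line, or a record wrapped onto a second line as in the sample) would make the [5] index throw an IndexOutOfRangeException. A sketch that skips such lines, assuming the wanted value is always the sixth field; PipeFields and SixthFields are hypothetical names, it takes lines rather than a path so it can be tried on in-memory data, and the Trim call (not in the answer above) removes the padding spaces.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PipeFields
{
    // Returns the text after the 5th '|' (up to the 6th, if present) of every line that has one.
    public static List<string> SixthFields(IEnumerable<string> lines)
    {
        return lines
            .Select(line => line.Split('|'))
            .Where(fields => fields.Length > 5)  // at least five separators
            .Select(fields => fields[5].Trim())  // drop padding spaces
            .ToList();
    }
}
```

With a file: var res = PipeFields.SixthFields(File.ReadLines(@"FileName.txt"));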
For a delimited file you should use a parser - there is one in the Microsoft.VisualBasic.FileIO namespace - the TextFieldParser class, though you could also look at third-party libraries like the popular FileHelpers.
A simpler approach would be to use string.Split on the | character and get the value at the corresponding index of the returned string[]; however, if any of the fields are escaped and can validly contain | internally, this will fail.
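A minimal sketch of the TextFieldParser route, assuming '|' is the only delimiter and that fields which contain '|' are wrapped in double quotes. TextFieldParser is in the Microsoft.VisualBasic.FileIO namespace (Microsoft.VisualBasic assembly; on .NET Core 3.0 and later it ships with the runtime). PipeFileReader is a hypothetical name, and it takes a TextReader so it can be fed either a file or a string.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class PipeFileReader
{
    // Reads every record from a '|'-delimited source and returns its fields.
    public static List<string[]> ReadRecords(TextReader source)
    {
        var records = new List<string[]>();
        using (var parser = new TextFieldParser(source))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters("|");
            parser.HasFieldsEnclosedInQuotes = true; // "a|b" stays one field
            parser.TrimWhiteSpace = true;            // strip padding around fields
            while (!parser.EndOfData)
            {
                records.Add(parser.ReadFields());
            }
        }
        return records;
    }
}
```

The value between the 5th and 6th separator is then record[5] for any record with at least six fields.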
You could split each line into an array:
while ((line = file.ReadLine()) != null)
{
var values = line.Split('|');
}
This should work
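string txt = File.ReadAllText("file.txt");
// Multiline makes ^ match at each line start: skip five "field|" groups, capture the sixth field
foreach (Match m in Regex.Matches(txt, @"^(?:[^|\r\n]*\|){5}([^|\r\n]*)", RegexOptions.Multiline))
Console.WriteLine(m.Groups[1].Value);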

Linebreak in WinForms Textbox

I'm having some trouble dealing with linebreaks here. Basically, the user puts in some data to a textbox that is then saved directly to the database.
When I display the data in my program, I only want to display the first line of the textbox, so I tried
Regex newLine = new Regex("/[\r\n]+/");
String[] lines = newLine.Split(row[data.textColumn.ColumnName].ToString());
SomeLabel.Text = lines[0];
But it displays all the lines one after another, so if the user enters
a
b
c
The label displays
abc
How can I get this to work so it only displays the first line?
(I have added this as another answer because this answer is rather large and I think it will make this thread clearer. Please leave a comment if I should make it one answer.)
I have made this extension method, which often comes in handy:
public static IEnumerable<string> Lines(this string data)
{
using (var sr = new StringReader(data))
{
string line;
while ((line = sr.ReadLine()) != null)
yield return line;
}
}
And you can get the first line with:
var line = data.Lines().First();
This should be a lot faster than .Split when only a subset of the lines is used.
var text = row[data.textColumn.ColumnName].ToString();
And one of these (both work with Unix and Windows line separators). The first is fastest because it does not split every line when you're only using the first.
int end = text.IndexOfAny(new[] { '\r', '\n' }); // index of the first line break of either kind
string line;
if (end != -1)
line = text.Substring(0, end);
else
line = text;
or
var lines = text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);
var line = lines[0];
(See also a few extension methods I have posted here: How can I convert a string with newlines in it to separate lines?)
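For completeness, the regex approach from the question also works once the slashes are removed: "/[\r\n]+/" is JavaScript-style delimiter syntax, and in .NET the slashes become part of the pattern, so it only matches a '/', then line breaks, then another '/'. Since nothing matches, Split returns the whole text as lines[0]. A sketch with a plain .NET pattern; FirstLineDemo is a hypothetical name.

```csharp
using System.Text.RegularExpressions;

public static class FirstLineDemo
{
    // Returns the text before the first line break (\r\n, \n or \r).
    public static string FirstLine(string text)
    {
        // No surrounding slashes: .NET patterns are plain strings.
        // \r\n is listed first so a Windows line break counts as one separator.
        return Regex.Split(text, @"\r\n|\r|\n")[0];
    }
}
```

In the question's code: SomeLabel.Text = FirstLineDemo.FirstLine(row[data.textColumn.ColumnName].ToString());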
