I love LINQ statements for the expressive syntax and other convenient features. However, I find it very troublesome to debug them sometimes. Specifically, when I run a LINQ statement on a collection and one of the elements in the collection causes an exception, how can I figure out what the problem input was and where the problem came from?
Imagine I have a text file with 1000 real numbers:
0.46578
12.314213
1.444876
...
I am reading this into a string array and loading it into a more specific data structure:
var file_contents = File.ReadAllLines("myfile.txt");
var data = file_contents.Select(s => double.Parse(s));
Now, for this particular input, I didn't bother to look at it carefully and it turns out the 876th line contains (line numbers shown):
875 5.56786450
876 Error: Could not calculate value.
877 0.0316213
This happened for whatever reason (perhaps the file was generated by a script that malfunctioned). My LINQ method chain will of course throw an exception. The problem is, how do I figure out which element of the list caused the exception, and what its value was?
To clarify, if instead I used a for-loop:
var data = new List<double>();
foreach(var row in file_contents)
{
var d = double.Parse(row);
data.Add(d);
}
Then the exception would highlight the line that calls double.Parse, and I would be able to mouse over row to easily see what the problem input was.
I can, of course, use Resharper to convert my LINQ statements into for-loops, and then debug them, but is there a better way?
Put a conditional breakpoint on the lambda, where the condition is s.StartsWith("5.56"). You just need to put your cursor on the lambda and press F9 (assuming you're using Visual Studio).
var data = file_contents.Select(s => {
try
{
return double.Parse(s);
}
catch
{
throw; //breakpoint?
}
});
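Building on the try/catch idea above, one option that does not rely on a debugger is to carry the element's index through the projection and rethrow with that context attached. This is only a minimal sketch: the class and method names (ParseHelpers, ParseWithContext) are hypothetical, and it assumes the numbers use the invariant culture's decimal point.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class ParseHelpers
{
    // Hypothetical helper: parses each line lazily and, on failure, rethrows
    // with the zero-based index and the raw text of the offending element.
    public static IEnumerable<double> ParseWithContext(IEnumerable<string> lines)
    {
        return lines.Select((s, i) =>
        {
            try
            {
                return double.Parse(s, CultureInfo.InvariantCulture);
            }
            catch (FormatException ex)
            {
                // Keep the original exception as InnerException.
                throw new FormatException(
                    $"Element {i} could not be parsed: '{s}'", ex);
            }
        });
    }
}
```

The exception message then names the element directly, and the original FormatException is still available as the inner exception.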
Disclaimer: I work for OzCode
LINQ debugging is hard, borderline impossible, using Visual Studio. I suggest you try using OzCode.
This is what your code looks like when debugging (the exception is on the 6th item).
You can tell which item caused the exception by investigating the items that were passed to the Select clause. Since the last one triggered the exception, it's easy to find the offending value.
If you're interested you can try OzCode's LINQ debugging - we've just started an EAP
I would just use TryParse, personally.
var data = new List<string>
{
"0.46578",
"12.314213",
"Error: Could not calculate value.",
"1.444876",
};
double d;
var good = data.Where(s => Double.TryParse(s, out d)).Select(Double.Parse);
var bad = data.Where(s => !Double.TryParse(s, out d)).Select(x => new
{
key = data.IndexOf(x),
value = x
}).ToDictionary(x => x.key, x => x.value);
textBox1.AppendTextAddNewLine("Good Data:");
WriteDataToTextBox(good);
textBox1.AppendTextAddNewLine(String.Format("{0}{0}Bad Data:", Environment.NewLine));
WriteDataToTextBox(bad);
The AppendTextAddNewLine is simply an extension method I wrote for my little proof of concept test program
public static void AppendTextAddNewLine(this TextBox textBox, string textToAppend)
{
textBox.AppendText(textToAppend + Environment.NewLine);
}
Edit
WriteDataToTextBox is a generic method that writes an IEnumerable<T> out to the text box.
void WriteDataToTextBox<T>(IEnumerable<T> data )
{
foreach (var row in data)
{
textBox1.AppendTextAddNewLine(row.ToString());
}
}
Forgot to put the output here so I figure I should do that. It shows the index of the bad data and the data itself that caused the problem.
Good Data:
0.46578
12.314213
1.444876
Bad Data:
[2, Error: Could not calculate value.]
I'm not sure why you don't like the foreach loop here. LINQ uses it internally anyway, and as you've already realized there are pros and cons to using LINQ, and debugging is one of the cons.
I would probably mix LINQ with foreach and end up with the following:
// read all lines from file //
var file_contents = File.ReadAllLines("myfile.txt");
// set initial data list length to number of lines for better performance
var data = new List<double>(file_contents.Length);
// list for incorrect line numbers
var incorrectRows = new List<int>();
foreach (var x in file_contents.Select((s, i) => new {s, i}))
{
// x.s - line string
// x.i - line number
double value;
if (double.TryParse(x.s, out value))
data.Add(value); // add value, which was OK
else
incorrectRows.Add(x.i); // add index of incorrect value
}
That will prevent the exception entirely and will give you the (zero-based) line numbers of all incorrect values. It also iterates over file_contents just once, and every value is parsed only once.
Related
I am getting a compiler error: cannot convert from 'System.Collections.Generic.List<string>' to 'System.Xml.Linq.XName'.
I was originally getting "'XAttribute' does not contain a definition for 'Trim' and no accessible extension method 'Trim' ...etc." but I think I figured out that my quotes were in the wrong place.
What am I doing wrong?
public static List<Phrase> LoadPhrasesFromXMLFile(string file)
{
try
{
XDocument xdocument = XDocument.Load(file);
char[] trim = new char[3] { '\'', '"', ' ' };
return xdocument.Descendants("Phrase").Select((Func<XElement, Phrase>)(x => new Phrase()
{
eventname = (string)x.Attribute("Event".Trim(trim)),
priority = int.Parse((string)x.Attribute("Priority".Trim(trim))),
words = x.Descendants("Word").Select((Func<XElement, Word>)(y =>
{
Word word1 = new Word
{
preferred_text = (string)y.Attribute("Primaries".Trim(trim).ToLower())
};
List<string> stringList = (string)y.Attribute("Secondaries") == null || string.IsNullOrWhiteSpace((string)y.Attribute("Secondaries"))
? new List<string>()
Fails at this line:
: (List<string>)(IEnumerable<string>)(string)y.Attribute("Secondaries".Trim(trim).Replace(" ", "").ToLower().Split(',').ToList());
Cont code:
Word word2 = word1;
word2.Ssecondaries = stringList;
return word1;
})).ToList<Word>()
})).ToList<Phrase>();
}
Error catching:
catch (Exception ex)
{
Sup.Logger("Encountered an exception reading '" + file + "'. It was: " + ex.ToString(), false, true);
}
return (List<Phrase>)null;
}
Welcome to StackOverflow!
First off, consider the first comment in terms of cleaning up some of the general style. The code is very hard to read as well as the question having multiple split code blocks.
The problematic line's syntax error is solved by changing the expression to the following:
y.Attribute("Secondaries").Value.Trim(trim).Replace(" ", "").ToLower().Split(',').ToList()
You don't need to do any casting as ToList() will already make it a List.
That is the end of the exact compiler issue.
In terms of how to make cleaner code, consider making the helper functions:
// move 'trim' into an accessible memory location
private static string SanitizeInput(string input)
{
return input.Trim(trim).Replace(" ", "").ToLower();
}
// Having a function like this will change your solution code from the line above to:
SanitizeInput(y.Attribute("Secondaries").Value).Split(',').ToList();
// This line is much easier to read as you can tell that the XML parsing is happening, being cleaned, and then manipulated.
Another thing to consider is whether Word.Ssecondaries (it looks like you might have a typo in your property name?) can be declared as IEnumerable<string>. It's dangerous to store it as a List, because any code could then modify Word.Ssecondaries. If you don't intend to change it, IEnumerable will be much safer.
If you find IEnumerable satisfies your needs, you can remove the .ToList() in your problematic line. That avoids allocating a new chunk of memory for the list and lets LINQ evaluate the query lazily.
I don't know why I'm getting System.IndexOutOfRangeException: 'Index was outside the bounds of the array.' with this code:
IEnumerable<char> query = "Text result";
string illegals = "abcet";
for (int i = 0; i < illegals.Length; i++)
{
query = query.Where(c => c != illegals[i]);
}
foreach (var item in query)
{
Console.Write(item);
}
Please can someone explain what's wrong with my code.
The problem is that your lambda expression is capturing the variable i, but the delegate isn't being executed until after the loop. By the time the expression c != illegals[i] is executed, i is illegals.Length, because that's the final value of i. It's important to understand that lambda expressions capture variables, rather than "the values of those variables at the point of the lambda expression being converted into a delegate".
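The capture behaviour described above can be seen in isolation with a few delegates that read the loop variable after the loop has ended. This is a small illustrative sketch; the class and method names are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CaptureDemo
{
    // Every lambda captures the same variable i (not its value at the time
    // the lambda was created), so all of them observe its final value.
    public static List<int> ReadAfterLoop()
    {
        var readers = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            readers.Add(() => i);
        }
        // The loop has finished, so i is 3 for every delegate.
        return readers.Select(read => read()).ToList(); // 3, 3, 3
    }
}
```

In the question's code the same thing happens with illegals[i]: by the time the query runs, i equals illegals.Length.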
Here are five ways of fixing your code:
Option 1: local copy of i
Copy the value of i into a local variable within the loop, so that each iteration of the loop captures a new variable in the lambda expression. That new variable isn't changed by the rest of the execution of the loop.
for (int i = 0; i < illegals.Length; i++)
{
int copy = i;
query = query.Where(c => c != illegals[copy]);
}
Option 2: extract illegals[i] outside the lambda expression
Extract the value of illegals[i] in the loop (outside the lambda expression) and use that value in the lambda expression. Again, the changing value of i doesn't affect the variable.
for (int i = 0; i < illegals.Length; i++)
{
char illegal = illegals[i];
query = query.Where(c => c != illegal);
}
Option 3: use a foreach loop
This option only works properly with C# 5 and later compilers, as the meaning of foreach changed (for the better) in C# 5.
foreach (char illegal in illegals)
{
query = query.Where(c => c != illegal);
}
Option 4: use Except once
LINQ provides a method to perform set exclusion: Except. This is not quite the same as the earlier options though, as you'll only get a single copy of any particular character in your output. So if e wasn't in illegals, you'd get a result of "Tex resul" with the above options, but "Tex rsul" using Except. Still, it's worth knowing about:
// Replace the loop entirely with this
query = query.Except(illegals);
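To make the difference between filtering and set exclusion concrete, here is a small side-by-side sketch using the example from the paragraph above (illegals without the letter e). The class and method names are hypothetical.

```csharp
using System;
using System.Linq;

public static class ExceptDemo
{
    // Filtering keeps every character that is not illegal, duplicates included.
    public static string WithWhere(string text, string illegals)
    {
        return new string(text.Where(c => illegals.IndexOf(c) < 0).ToArray());
    }

    // Except is a set operation: each remaining character appears only once.
    public static string WithExcept(string text, string illegals)
    {
        return new string(text.Except(illegals).ToArray());
    }
}
```

With the input "Text result" and illegals "abct", WithWhere gives "Tex resul" while WithExcept gives "Tex rsul".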
Option 5: Use Contains once
You can call Where once, with a lambda expression that calls Contains:
// Replace the loop entirely with this
query = query.Where(c => !illegals.Contains(c));
This happens because, although your for loop seems at first glance to be correctly bounded, each iteration captures the index in the closure that is passed to Where. One of the most useful properties of closures is that they capture by reference, enabling all sorts of powerful and sophisticated techniques. However, in this case it means that, by the time the query is executed in the ensuing foreach loop, the index has been incremented to the length of the string, one past the last valid index.
The most straightforward fix is to create a loop-scoped copy of the current value of the loop control variable and refer to that copy in your closure, instead of referring directly to the loop control variable.
Ex:
for (int i = 0; i < illegals.Length; i++)
{
var index = i;
query = query.Where(c => c != illegals[index]);
}
However, as has been noted by others, there are better ways to write this that avoid the problem entirely, and they also have the virtue of raising the level of abstraction.
For example, you can use System.Linq.Enumerable.Except
var legals = query.Except(illegals);
I wrote a static method which reads scientific numbers (X, Y) from a text file and puts them into a list of lists. But I don't know why the next value from the file overrides all the other values.
The check for != "100" is there because 100 is the first value in the text file and is only a property for my program.
static List<List<double>> DownloadData(string path1)
{
List<List<double>> lista = new List<List<double>>();
List<double> doubelowa = new List<double>();
doubelowa.Clear();
string line = null;
try
{
using (TextReader sr = File.OpenText(path1))
{
while ((line = sr.ReadLine()) != null)
{
doubelowa.Clear();
if (line != "100")
{
var d = line.Split().Select(f => double.Parse(f, System.Globalization.NumberStyles.Float, CultureInfo.InvariantCulture));
doubelowa.AddRange(d);
lista.Add(doubelowa);
}
}
}
}
finally
{
}
return lista;
}
I wrote this method a while ago and it worked great. But now that I have written more and more code, I don't know what changed. I have tried to fix it, but...
Here is a screenshot of the locals:
https://onedrive.live.com/redir?resid=DF3242C9A565ECD1!4549&authkey=!AEDu90t1iNQj4MY&v=3&ithint=photo%2cpng
For some reason doubelowa.Clear() clears the values of the list lista. Why?
That's because you are adding the same object over and over. If you want different Lists to be stored, you need to use a new List on every iteration:
if (line != "100")
{
var d = line.Split().Select(f => double.Parse(f, System.Globalization.NumberStyles.Float, CultureInfo.InvariantCulture));
lista.Add(new List<double>(d));
}
If you add doubelowa, you are just adding the same reference over and over (and you are overwriting its contents on every iteration).
After your edit with the screenshot
Just in case the answer was not clear to you... when you add doubelowa to lista, you are just adding the same list every time.
So lista just keeps having the same object in every element:
lista[0] points to doubelowa
lista[1] points to doubelowa
lista[2] points to doubelowa
etc.
So if you clear doubelowa at any point, all elements of lista will point to the same, empty list. The solution, as I wrote above, is to have each element be a different list, not doubelowa, which can be achieved with the code I wrote (and you can disregard doubelowa completely since it's not needed anymore).
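The aliasing described above can be reproduced with a few lines that have nothing to do with file reading. This is a minimal sketch with made-up names, intended only to show that the outer list stores references rather than copies.

```csharp
using System;
using System.Collections.Generic;

public static class SharedListDemo
{
    // Adds the same inner list twice; the outer list stores two references
    // to one object, so both elements always show the latest contents.
    public static List<List<double>> AddSameInstance()
    {
        var outer = new List<List<double>>();
        var row = new List<double>();
        foreach (double value in new[] { 1.0, 2.0 })
        {
            row.Clear();      // also empties what outer[0] refers to
            row.Add(value);
            outer.Add(row);   // adds a reference, not a copy
        }
        return outer;
    }
}
```

After the call, outer[0] and outer[1] are the same object and both contain only 2.0, which matches what the locals window shows in the question.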
What I believe is happening is that you only ever make one List object, which you add to lista over and over again. That way, changing doubelowa changes them all, because they are all actually the same object. To correct this, try replacing doubelowa.Clear(); with doubelowa = new List<double>();.
Jcl, thank you for your explanation. I fixed it with a lucky try, as you can see below, but without you I would not have thought about references.
Does this happen because every cell holds a reference to the same list, and the garbage collector does not clean it up when the while loop completes? If you don't mind, please explain this to me in detail.
if (line != "100")
{
List<double> doubelowa = new List<double>();
doubelowa.AddRange(line.Split().Select(f => double.Parse(f, System.Globalization.NumberStyles.Float, CultureInfo.InvariantCulture)));
lista.Add(doubelowa);
}
You can also create a new list with the ToList method of IEnumerable after splitting the line.
You can make your code more concise by using LINQ methods.
List<List<double>> r = File.ReadAllLines(fileName).SkipWhile(line => line == "100")
.Select(line =>
line.Split().Select(f => double.Parse(f, System.Globalization.NumberStyles.Float, CultureInfo.InvariantCulture)).ToList()).ToList();
Basically I use Entity Framework to query a huge database. I want to return a string list then log it to a text file.
List<string> logFilePathFileName = new List<string>();
var query = from c in DBContext.MyTable where condition = something select c;
foreach (var result in query)
{
filePath = result.FilePath;
fileName = result.FileName;
string temp = filePath + "." + fileName;
logFilePathFileName.Add(temp);
if(logFilePathFileName.Count %1000 ==0)
Console.WriteLine(temp+"."+logFilePathFileName.Count);
}
However I got an exception when logFilePathFileName.Count=397000.
The exception is:
Exception of type 'System.OutOfMemoryException' was thrown.
A first chance exception of type 'System.OutOfMemoryException'
occurred in System.Data.Entity.dll
UPDATE:
What I want is to use a different query, say select top 1000, and add those results to the list, but I don't know what to do after the first 1000.
Most probably it's not about RAM as such, so adding RAM, or even compiling and running your code on a 64-bit machine, will not help in this case.
I think it's related to the fact that a single .NET collection is limited to a maximum of 2 GB of memory (on both 32-bit and 64-bit).
To resolve this, split your list into much smaller chunks and most probably your problem will be gone.
Just one possible solution:
foreach (var result in query)
{
....
if(logFilePathFileName.Count %1000 ==0) {
Console.WriteLine(temp+"."+logFilePathFileName.Count);
//WRITE SOMEWHERE YOU NEED
logFilePathFileName = new List<string>(); // RESET LIST!
}
}
EDIT
If you want to fragment a query, you can use Skip(...) and Take(...)
Just an explanatory example:
var first1000 = query.Skip(0).Take(1000);
var second1000 = query.Skip(1000).Take(1000);
...
and so on..
Naturally put it in your iteration and parametrize it based on bounds of data you know or need.
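The Skip/Take idea above could be wrapped in a loop along these lines. This is a sketch under assumptions: the helper name InPages is hypothetical, and the query passed in is assumed to be ordered already, since LINQ to Entities requires an OrderBy before Skip.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Paging
{
    // Hypothetical helper: runs the query one page at a time and stops
    // at the first empty page. The query is assumed to be ordered already.
    public static IEnumerable<List<T>> InPages<T>(IQueryable<T> orderedQuery, int pageSize)
    {
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));

        for (int skip = 0; ; skip += pageSize)
        {
            // Each page is materialized separately, so only pageSize
            // results are held in this list at any one time.
            List<T> page = orderedQuery.Skip(skip).Take(pageSize).ToList();
            if (page.Count == 0) yield break;
            yield return page;
        }
    }
}
```

Each page can then be written out and dropped before the next one is fetched.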
Why are you collecting the data in a List<string> if all you need to do is write it to a text file?
You might as well just:
Open the text file;
Iterate over the records, appending each string to the text file (without storing the strings in memory);
Flush and close the text file.
You will need far less memory than now, because you won't be keeping all those strings unnecessarily in memory.
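The three steps above might look roughly like this. It is a minimal sketch: LogWriter and WriteAll are hypothetical names, and the entries are taken as a plain IEnumerable<string> rather than the Entity Framework query from the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LogWriter
{
    // Writes each entry as it is enumerated; nothing is accumulated in a list.
    // Returns the number of lines written.
    public static int WriteAll(IEnumerable<string> entries, string path)
    {
        int written = 0;
        using (var writer = new StreamWriter(path))
        {
            foreach (string entry in entries)
            {
                writer.WriteLine(entry);
                written++;
            }
        } // disposing the writer flushes and closes the file
        return written;
    }
}
```

Because the entries are consumed one at a time, the memory needed for the strings no longer grows with the number of records.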
You probably need to set some vmargs for memory!
Also... look into writing it straight to your file and not holding it in a List
What Roy Dictus says sounds the best way.
Also you can try to add a limit to your query. So your database result won't be so large.
For info on:
Limiting query size with entity framework
You shouldn't read all records from the database into a list. It requires a lot of memory. You can combine reading records and writing them to the file. For example, read 1000 records from the database into a list, save (append) them to the text file, clear the used memory (list.Clear()) and continue with new records.
From several other topics on StackOverflow I read that the Entity Framework is not designed to handle bulk data like that. The EF will cache/track all data in the context and will cause the exception in cases of huge bulks of data. Options are to use SQL directly or split up your records in smaller sets.
I used to use the garbage-collected ArrayList in VS C++, similar to the garbage-collected List that you used. It works fine with small and intermediate data sets, but with big data the same 'System.OutOfMemoryException' was thrown.
As the size of these garbage-collected collections cannot exceed 2 GB, they become inefficient with big data, so I built my own linked list, which gives the same functionality: dynamic growth and access by index. Basically, it is a normal linked list class with a dynamic array inside to provide access by index. It duplicates the space, but you may delete the linked list after updating the array if you do not need it, keeping only the dynamic array. This would solve the problem. See the code:
struct LinkedNode
{
long data;
LinkedNode* next;
};
class LinkedList
{
public:
LinkedList();
~LinkedList();
LinkedNode* head;
long Count;
long * Data;
void add(long data);
void update();
//long get(long index);
};
LinkedList::LinkedList(){
this->Count = 0;
this->head = NULL;
this->Data = NULL; // initialise so the destructor and update() can use it safely
}
LinkedList::~LinkedList(){
LinkedNode * temp;
while(head){
temp= this->head ;
head = head->next;
delete temp;
}
delete [] Data; // delete[] on NULL is a no-op
Data=NULL;
}
void LinkedList::add (long data){
LinkedNode * node = new LinkedNode();
node->data = data;
node->next = this->head;
this->head = node;
this->Count++;
}
void LinkedList::update(){
delete [] this->Data; // release the array from any previous update
this->Data= new long[this->Count];
long i = 0;
LinkedNode * node =this->head;
while(node){
this->Data[i]=node->data;
node = node->next;
i++;
}
}
If you use this, please refer to my work https://www.liebertpub.com/doi/10.1089/big.2018.0064
I need for example the number of list-items, that are NOT "".
ATM, I solve it like this:
public int getRealCount()
{
List<string> all = new List<string>(originList);
int maxall = all.Count;
try
{
for (int i = 0; i < maxall; i++)
{
all.Remove("");
}
}
catch { }
return all.Count;
}
No question, performance is pretty bad. I'm lucky it's just a 10-items-list, but on a phone you should avoid such code.
So my question is, how can I improve this code?
One idea was: there could already be a method for that. The second idea would be to fill all with only the items that are not "".
How should I solve this?
Thanks
Sounds like you want:
return originList.Count(x => x != "");
There's no need to create a copy of the collection at all. Note that you'll need using System.Linq; in your using directives at the start of your source code.
(Note that you should not have empty catch blocks like that - it's a terrible idea to suppress exceptions in that way. Only catch exceptions when you either want to really handle them or when you want to rethrow them wrapped as another type. If you must ignore an exception, you should at least log it somewhere.)
If performance is your concern, then you should keep a collection that is only for these items.
If performance is not a big deal, I would suggest you use a LINQ query on your collection. The cool thing about LINQ is that query operators such as Where are delayed until you need the results, although Count itself runs immediately.
int nonEmptyItemCount = originList.Count(str => !string.IsNullOrEmpty(str));
You could also do
int nonEmptyItemCount = originList.Count(str => str != "");
You should use LINQ. Install ReSharper, it'll generate it for you.
Also, don't create an int maxall = all.Count and then use it in your for loop.
For mobile apps you shouldn't use unnecessary memory so just use all.Count in the for loop.
You're calling all.Remove("") once for every item in the list all, and each call removes only the first matching item. Why not remove them all in one call? You're not using i at all in your code...
Why not:
public int getRealCount()
{
List<string> all = new List<string>(originList);
// RemoveAll deletes the matching items in place, so all.Count is already the answer.
all.RemoveAll(delegate(string s)
{
return s == "";
});
return all.Count;
}
Update:
Fixed the issue I had. This is without lambda's.