I need to find a way to read information out of a very big CSV file in Unity. The file has approx. 15000*4000 entries, is almost 200MB, and could grow even larger.
Just using ReadAllLines on the file kind of works, but as soon as I try to do any operation on the result, it crashes. Here is the code I am using; it just counts all non-zero values, and that already crashes. It's okay if the code needs loading time, but it shouldn't crash. I assume it's because I hold everything in memory and therefore flood my RAM? Any ideas how to fix this so that it won't crash?
private void readCSV()
{
string[] lines = File.ReadAllLines("Assets/Datasets/testCsv.csv");
foreach (string line in lines)
{
List<string> values = new List<string>();
values = line.Split(',').ToList();
int i = 0;
foreach (string val in values)
{
if (val != "0")
{
i++;
}
}
}
}
As I already stated in your other question, you should rather go with a streamed solution so that you don't load the entire file into memory at all.
Also, both file I/O and string.Split are slow, especially for so many entries! Rather use a background thread / async Task for this!
The next possible future issue: in your case 15000*4000 entries means a total of 60000000 cells, which is still fine. However, the maximum value of int is 2147483647, so if your file grows further the counter might overflow and behave unexpectedly => rather use e.g. uint or directly ulong to avoid that issue.
private async Task<ulong> CountNonZeroEntries()
{
ulong count = 0;
// Using a stream reader you can load the content into memory one line at a time
using(var sr = new StreamReader("Assets/Datasets/testCsv.csv"))
{
while(true)
{
var line = await sr.ReadLineAsync();
if(line == null) break;
var values = line.Split(',');
foreach(var v in values)
{
if(v != "0") count++;
}
}
}
return count;
}
And then of course you would need to wait for the result e.g. using
// If you declare Start as async Unity automatically calls it asynchronously
private async void Start()
{
var count = await CountNonZeroEntries();
Debug.Log($"{count} cells are != \"0\".");
}
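Since string.Split is named above as one of the slow parts, here is a minimal sketch of the same count without Split: it scans each line character by character, so no per-cell strings or arrays are allocated. The class and method names are hypothetical, and it assumes a plain CSV without quoted fields. It takes a TextReader, so it works with a StreamReader as well as a StringReader.

```csharp
using System;
using System.IO;

public static class CsvCounter
{
    // Counts cells whose text is not exactly "0", without calling string.Split.
    // Assumption: comma-separated values, no quoted fields.
    public static ulong CountNonZeroCells(TextReader reader)
    {
        ulong count = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            int start = 0;
            for (int i = 0; i <= line.Length; i++)
            {
                // A cell ends at a comma or at the end of the line.
                if (i == line.Length || line[i] == ',')
                {
                    bool isZero = (i - start == 1) && line[start] == '0';
                    if (!isZero) count++;
                    start = i + 1;
                }
            }
        }
        return count;
    }
}
```

It could be called from a background task, e.g. with a StreamReader opened on the CSV file, in the same way as the method above.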
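The same can be done using Linq, which is a bit easier to write in my eyes:
using System.Linq;
...
private Task<long> CountNonZeroEntries()
{
// SelectMany flattens the cells of all lines into one sequence (Select would yield one string[] per line).
// LongCount returns a long, and Task.Run keeps the work off the main thread.
return Task.Run(() => File.ReadLines("Assets/Datasets/testCsv.csv").SelectMany(line => line.Split(',')).LongCount(v => v != "0"));
}
File.ReadLines also doesn't load the entire content at once; it returns a lazy enumerable, so Linq queries process the lines one by one.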
How can I ban a variable from a list without removing it from that list by adding the variable to a list of "banned" variables?
I wish to be able to type in a string. That string is compared to the file names in a folder. If there is a match, the file is read. If I type the same string again, the file should not be read again. Therefore I want to have a list of "banned" strings that is checked while typing, to avoid the file being read again.
I have tried a few ways but I'm not getting there. Below is an example of my last attempt.
What would be the best way?
public class test
{
string scl= "test3";
List <string> lsf,lso;
void Start ()
{
lsf=//file names
new List<string>();
lso=//files open
new List<string>();
lsf.Add("test0");
lsf.Add("test1");
lsf.Add("test2");
lsf.Add("test3");
lsf.Add("test4");
lso.Add("idhtk49fngo");//random string
}
void Update ()
{
if
(
Input.GetKeyDown("a")
)
{
for
(
int i=0;
i<lsf.Count;
i++
)
{
if(lsf[i]==scl)
{
Debug.Log
(i+" is read");
for
(
int j=0;
j<lso.Count;
j++
)
{
//how can i avoid reading
//lsf[3] here the second time
//"a" is pressed (by having "test3"
//added to a "ban" list (lso) )
if(scl!=lso[j])
{
lso.Add(lsf[i]);
}
}
}
}
}
}
Michael’s answer is the way to go here but it can be improved using the more appropriate collection available to keep track of opened files; if you want uniqueness use a set, not a list:
// Must be static because TryFirstRead is static.
private static readonly HashSet<string> openedFiles = new HashSet<string>();
public static bool TryFirstRead(
string path,
out string result)
{
if (openedFiles.Add(path))
{
result = File.ReadAllText(path);
return true;
}
result = null;
return false;
}
Also, I’d avoid throwing vexing exceptions. Give the consumer a friendly way to know if the file was read or not, don’t make them end up having to use exceptions as a flow control mechanism.
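To make the set-based idea concrete without touching the file system, here is a small sketch of the "ban list" on its own. It relies on HashSet<string>.Add returning false when the value is already present; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class OpenedTracker
{
    // Names that have already been handled (the "banned" list).
    private static readonly HashSet<string> opened = new HashSet<string>();

    // Returns true only the first time a given name is seen.
    public static bool MarkOpened(string name)
    {
        return opened.Add(name);
    }
}
```

Calling OpenedTracker.MarkOpened("test3") returns true on the first call and false on every later call, so the caller can simply skip the read when it gets false.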
I'm not sure I understood the question, but if you want to replace a value with one from another list:
You can use the list index to replace the entry (and to build a new list from the values you replaced).
string[] list1 = {"hi", "hello", "World"};
string[] list2 = {"bye", "goodbye", "World"};
list1[1] = list2[1]; // list1 is now {"hi", "goodbye", "World"}
I would suggest the following:
public static List<string> openedFiles = new List<string>();
public static string ReadFileAndAddToOpenedList(string path)
{
if (openedFiles.Contains(path))
throw new Exception("File already opened");
// Instead of throwing an exception you could, for example, just log this or do something else, like:
// Console.WriteLine("File already opened");
else
{
openedFiles.Add(path);
return File.ReadAllText(path);
}
}
The idea is: on every file read, add the file to the list, so that every time you try to read a file you can check whether it was already read (or opened). If it was, throw an exception (or do something else). Otherwise read the file.
You could instead of making it a string list use your own class
public class MyFile
{
public string Name;
public bool isOpen;
public MyFile(string name)
{
Name = name;
isOpen = false;
}
}
List<MyFile> lsf = new List<MyFile>()
{
new MyFile("test0"),
new MyFile("test1"),
new MyFile("test2"),
new MyFile("test3"),
new MyFile("test4")
};
Then, when you read the file, set isOpen to true
lsf[someIndex].isOpen = true;
and later you can check this
// E.g. skip in a loop
if(lsf[someIndex].isOpen) continue;
You could then also use Linq in order to get a list of only the unread files:
var unreadFiles = lsf.Where(file => !file.isOpen).Select(file => file.Name);
public class QuoteGenerator
{
public static randomQuote()
{
string t = "Quotes.txt";
List<string> Quotes = new List<string>();
using (StreamReader quoteReader = new StreamReader(t))
{
string line = "";
while ((line = quoteReader.ReadLine()) != null)
{
Quotes.Add(line);
}
}
string[] response = Quotes.ToArray();
string[] shuffle = Classes.RandomStringArrayTool.RandomizeStrings(response);
return (shuffle[0]);
}
}
Here's what's working and I thought my StreamReader code above would work the same:
public string randomQuote()
{
string[] response = new string[] {"The fortune you seek, is in another bot"
, "Someone has Googled you recently"
, "This fortune no good. Try another"
, "404 fortune not found"};
string[] shuffle = Classes.RandomStringArrayTool.RandomizeStrings(response);
return shuffle[0];
}
I need to return the first quote line from the StreamReader method. How come the code I put together doesn't seem to work? I've thought about hard-coding the quotes, but maybe it's a better idea to save them in a text file. I guess I don't understand how StreamReader works. Can anyone please explain? I've only been coding since July. Thank you!
Assuming your Quotes.txt file is in the bin directory, the StreamReader code works fine. The only obvious problem is that you are not specifying a return type for the randomQuote method:
public static string randomQuote()
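For illustration, here is a minimal sketch of reading all lines from a reader and returning one at random. It is not the asker's RandomizeStrings helper; the class and method names are hypothetical, and System.Random stands in for the shuffle. It accepts any TextReader, so a StreamReader opened on Quotes.txt would work the same way.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class QuotePicker
{
    private static readonly Random rng = new Random();

    // Reads every non-empty line and returns one of them at random.
    public static string PickRandomLine(TextReader reader)
    {
        var lines = new List<string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length > 0) lines.Add(line);
        }
        if (lines.Count == 0)
        {
            throw new InvalidOperationException("No quotes found.");
        }
        return lines[rng.Next(lines.Count)];
    }
}
```

Throwing on an empty file is a design choice for this sketch; returning a default quote would work just as well.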
I want to get from this
"../lib/../data/myFile.xml"
to this
"../data/myFile.xml"
I guess I could do it by manipulating the string, searching for "../" and canceling them out with the preceding folders but I was looking for an already existing C# solution.
I tried instantiating a Uri from this string and converting it back with ToString(). That didn't help; it leaves the string unchanged.
You can always try to use:
Path.GetFullPath("../lib/../data/myFile.xml")
It behaves as you want with absolute paths but you might end up with strange behaviors with relative paths since it always bases itself from the current working directory. For instance:
Path.GetFullPath("/lib/../data/myFile.xml") // C:\data\myFile.xml
Path.GetFullPath("../lib/../data/myFile.xml") // C:\Program Files (x86)\data\myFile.xml
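If the result has to stay relative, one possible workaround is to resolve the path against some base directory and then express it relative to that same base again. The sketch below is not part of the answer above. It assumes Path.GetRelativePath, which exists in .NET Core 2.0+ / .NET Standard 2.1 but not in older .NET Framework versions; the class name, method name, and base directory are hypothetical.

```csharp
using System;
using System.IO;

public static class RelativePathNormaliser
{
    // Collapses "folder/.." pairs while keeping the path relative to baseDir.
    // Assumption: baseDir is deep enough that leading ".." segments do not
    // climb above the file system root (extra ".." would be dropped there).
    public static string Normalise(string relativePath, string baseDir)
    {
        string full = Path.GetFullPath(Path.Combine(baseDir, relativePath));
        string relative = Path.GetRelativePath(baseDir, full);
        // Use forward slashes regardless of platform.
        return relative.Replace(Path.DirectorySeparatorChar, '/');
    }
}
```

The base directory does not need to exist on disk, since both calls only manipulate strings.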
Sounds like you may either need to parse/rebuild the path yourself, or use some kind of well constructed regular expression to do this for you.
Taking the parse/rebuild route, you could do something like:
public static string NormalisePath(string path)
{
var components = path.Split(new Char[] {'/'});
var retval = new Stack<string>();
foreach (var bit in components)
{
if (bit == "..")
{
if (retval.Any())
{
var popped = retval.Pop();
if (popped == "..")
{
retval.Push(popped);
retval.Push(bit);
}
}
else
{
retval.Push(bit);
}
}
else
{
retval.Push(bit);
}
}
var final = retval.ToList();
final.Reverse();
return string.Join("/", final.ToArray());
}
(and yes, you'd probably want better variable names/commenting/etc.)
You can use a regular expression to do this:
public static string NormalisePath(string path)
{
return new Regex(@"\.{2}/.*/(?=\.\.)").Replace(path, "");
}
What's the best way to import a CSV file into a strongly-typed data structure?
Microsoft's TextFieldParser is stable and follows RFC 4180 for CSV files. Don't be put off by the Microsoft.VisualBasic namespace; it's a standard component in the .NET Framework, just add a reference to the global Microsoft.VisualBasic assembly.
If you're compiling for Windows (as opposed to Mono) and don't anticipate having to parse "broken" (non-RFC-compliant) CSV files, then this would be the obvious choice, as it's free, unrestricted, stable, and actively supported, most of which cannot be said for FileHelpers.
See also: How to: Read From Comma-Delimited Text Files in Visual Basic for a VB code example.
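For a C# counterpart, here is a minimal sketch of reading rows with TextFieldParser. The wrapper class and method names are hypothetical; the parser members used (TextFieldType, SetDelimiters, HasFieldsEnclosedInQuotes, EndOfData, ReadFields) are part of Microsoft.VisualBasic.FileIO. Mapping the string arrays to a strongly-typed class is left to the caller.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class TextFieldParserDemo
{
    // Reads every record from the reader as an array of field strings.
    public static List<string[]> ReadRows(TextReader reader)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            // Lets quoted fields contain the delimiter, e.g. "Smith, Bill".
            parser.HasFieldsEnclosedInQuotes = true;
            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

Passing a StreamReader opened on the CSV file works the same way as the in-memory reader used for testing.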
Use an OleDB connection.
String sConnectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\\InputDirectory\\;Extended Properties='text;HDR=Yes;FMT=Delimited'";
OleDbConnection objConn = new OleDbConnection(sConnectionString);
objConn.Open();
DataTable dt = new DataTable();
OleDbCommand objCmdSelect = new OleDbCommand("SELECT * FROM file.csv", objConn);
OleDbDataAdapter objAdapter1 = new OleDbDataAdapter();
objAdapter1.SelectCommand = objCmdSelect;
objAdapter1.Fill(dt);
objConn.Close();
If you're expecting fairly complex scenarios for CSV parsing, don't even think of rolling your own parser. There are a lot of excellent tools out there, like FileHelpers, or even ones from CodeProject.
The point is this is a fairly common problem and you could bet that a lot of software developers have already thought about and solved this problem.
I agree with @NotMyself. FileHelpers is well tested and handles all kinds of edge cases that you'll eventually have to deal with if you do it yourself. Take a look at what FileHelpers does and only write your own if you're absolutely sure that either (1) you will never need to handle the edge cases FileHelpers does, or (2) you love writing this kind of stuff and are going to be overjoyed when you have to parse stuff like this:
1,"Bill","Smith","Supervisor", "No Comment"
2 , 'Drake,' , 'O'Malley',"Janitor,
Oops, I'm not quoted and I'm on a new line!
Brian gives a nice solution for converting it to a strongly typed collection.
Most of the CSV parsing methods given don't take into account escaping fields or some of the other subtleties of CSV files (like trimming fields). Here is the code I personally use. It's a bit rough around the edges and has pretty much no error reporting.
public static IList<IList<string>> Parse(string content)
{
IList<IList<string>> records = new List<IList<string>>();
StringReader stringReader = new StringReader(content);
bool inQoutedString = false;
IList<string> record = new List<string>();
StringBuilder fieldBuilder = new StringBuilder();
while (stringReader.Peek() != -1)
{
char readChar = (char)stringReader.Read();
if (readChar == '\n' || (readChar == '\r' && stringReader.Peek() == '\n'))
{
// If it's a \r\n combo consume the \n part and throw it away.
if (readChar == '\r')
{
stringReader.Read();
}
if (inQoutedString)
{
if (readChar == '\r')
{
fieldBuilder.Append('\r');
}
fieldBuilder.Append('\n');
}
else
{
record.Add(fieldBuilder.ToString().TrimEnd());
fieldBuilder = new StringBuilder();
records.Add(record);
record = new List<string>();
inQoutedString = false;
}
}
else if (fieldBuilder.Length == 0 && !inQoutedString)
{
if (char.IsWhiteSpace(readChar))
{
// Ignore leading whitespace
}
else if (readChar == '"')
{
inQoutedString = true;
}
else if (readChar == ',')
{
record.Add(fieldBuilder.ToString().TrimEnd());
fieldBuilder = new StringBuilder();
}
else
{
fieldBuilder.Append(readChar);
}
}
else if (readChar == ',')
{
if (inQoutedString)
{
fieldBuilder.Append(',');
}
else
{
record.Add(fieldBuilder.ToString().TrimEnd());
fieldBuilder = new StringBuilder();
}
}
else if (readChar == '"')
{
if (inQoutedString)
{
if (stringReader.Peek() == '"')
{
stringReader.Read();
fieldBuilder.Append('"');
}
else
{
inQoutedString = false;
}
}
else
{
fieldBuilder.Append(readChar);
}
}
else
{
fieldBuilder.Append(readChar);
}
}
record.Add(fieldBuilder.ToString().TrimEnd());
records.Add(record);
return records;
}
Note that this doesn't handle the edge case of fields that are not delimited by double quotes but merely have a quoted string inside them. See this post for a somewhat better explanation as well as links to some proper libraries.
I was bored so I modified some stuff I wrote. It tries to encapsulate the parsing in an OO manner while cutting down on the number of iterations through the file; it only iterates once, in the top foreach.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
// usage:
// note this wont run as getting streams is not Implemented
// but will get you started
CSVFileParser fileParser = new CSVFileParser();
// TO Do: configure fileparser
PersonParser personParser = new PersonParser(fileParser);
List<Person> persons = new List<Person>();
// if the file is large and there is a good way to limit
// without having to reparse the whole file you can use a
// linq query if you desire
foreach (Person person in personParser.GetPersons())
{
persons.Add(person);
}
// now we have a list of Person objects
}
}
public abstract class CSVParser
{
protected String[] deliniators = { "," };
protected internal IEnumerable<String[]> GetRecords()
{
Stream stream = GetStream();
StreamReader reader = new StreamReader(stream);
String[] aRecord;
while (!reader.EndOfStream)
{
aRecord = reader.ReadLine().Split(deliniators,
StringSplitOptions.None);
yield return aRecord;
}
}
protected abstract Stream GetStream();
}
public class CSVFileParser : CSVParser
{
// to do: add logic to get a stream from a file
protected override Stream GetStream()
{
throw new NotImplementedException();
}
}
public class CSVWebParser : CSVParser
{
// to do: add logic to get a stream from a web request
protected override Stream GetStream()
{
throw new NotImplementedException();
}
}
public class Person
{
public String Name { get; set; }
public String Address { get; set; }
public DateTime DOB { get; set; }
}
public class PersonParser
{
public PersonParser(CSVParser parser)
{
this.Parser = parser;
}
public CSVParser Parser { get; set; }
public IEnumerable<Person> GetPersons()
{
foreach (String[] record in this.Parser.GetRecords())
{
yield return new Person()
{
Name = record[0],
Address = record[1],
DOB = DateTime.Parse(record[2]),
};
}
}
}
}
There are two articles on CodeProject that provide code for a solution, one that uses StreamReader and one that imports CSV data using the Microsoft Text Driver.
A good simple way to do it is to open the file, and read each line into an array, linked list, data-structure-of-your-choice. Be careful about handling the first line though.
This may be over your head, but there seems to be a direct way to access them as well using a connection string.
Why not try using Python instead of C# or VB? It has a nice CSV module to import that does all the heavy lifting for you.
I had to use a CSV parser in .NET for a project this summer and settled on the Microsoft Jet Text Driver. You specify a folder using a connection string, then query a file using a SQL Select statement. You can specify strong types using a schema.ini file. I didn't do this at first, but then I was getting bad results where the type of the data wasn't immediately apparent, such as IP numbers or an entry like "XYQ 3.9 SP1".
One limitation I ran into is that it cannot handle column names above 64 characters; it truncates. This shouldn't be a problem, except I was dealing with very poorly designed input data. It returns an ADO.NET DataSet.
This was the best solution I found. I would be wary of rolling my own CSV parser, since I would probably miss some of the edge cases, and I didn't find any other free CSV parsing packages for .NET out there.
EDIT: Also, there can only be one schema.ini file per directory, so I dynamically appended to it to strongly type the needed columns. It will only strongly-type the columns specified, and infer for any unspecified field. I really appreciated this, as I was dealing with importing a fluid 70+ column CSV and didn't want to specify each column, only the misbehaving ones.
I typed in some code. The result in the datagridviewer looked good. It parses a single line of text to an arraylist of objects.
enum quotestatus
{
none,
firstquote,
secondquote
}
public static System.Collections.ArrayList Parse(string line,string delimiter)
{
System.Collections.ArrayList ar = new System.Collections.ArrayList();
StringBuilder field = new StringBuilder();
quotestatus status = quotestatus.none;
foreach (char ch in line.ToCharArray())
{
string chOmsch = "char";
if (ch == Convert.ToChar(delimiter))
{
if (status== quotestatus.firstquote)
{
chOmsch = "char";
}
else
{
chOmsch = "delimiter";
}
}
if (ch == Convert.ToChar(34))
{
chOmsch = "quotes";
if (status == quotestatus.firstquote)
{
status = quotestatus.secondquote;
}
if (status == quotestatus.none )
{
status = quotestatus.firstquote;
}
}
switch (chOmsch)
{
case "char":
field.Append(ch);
break;
case "delimiter":
ar.Add(field.ToString());
field.Clear();
break;
case "quotes":
if (status==quotestatus.firstquote)
{
field.Clear();
}
if (status== quotestatus.secondquote)
{
status =quotestatus.none;
}
break;
}
}
if (field.Length != 0)
{
ar.Add(field.ToString());
}
return ar;
}
If you can guarantee that there are no commas in the data, then the simplest way would probably be to use String.Split.
For example:
String[] values = myString.Split(',');
myObject.StringField = values[0];
myObject.IntField = Int32.Parse(values[1]);
There may be libraries you could use to help, but that's probably as simple as you can get. Just make sure you can't have commas in the data, otherwise you will need to parse it better.
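A slightly more defensive variant of the same idea is sketched below: it checks the field count and uses int.TryParse, so a malformed line is reported instead of throwing. The Record class and the TryParseLine method are hypothetical names, and the "no commas inside fields" assumption still applies.

```csharp
using System;

public class Record
{
    public string StringField;
    public int IntField;
}

public static class LineParser
{
    // Returns false (and a null record) if the line has too few fields
    // or the second field is not a valid integer.
    public static bool TryParseLine(string line, out Record record)
    {
        record = null;
        string[] values = line.Split(',');
        if (values.Length < 2) return false;

        int number;
        if (!int.TryParse(values[1].Trim(), out number)) return false;

        record = new Record { StringField = values[0].Trim(), IntField = number };
        return true;
    }
}
```

Returning a bool mirrors the TryParse pattern of the framework, so callers can skip or log bad lines without exception handling.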