Custom functionality similar to SqlDataReader.Read() - c#

I would like to create functionality that works similarly to SqlDataReader.Read().
I'm reading a flat file from .txt/.csv and returning it as a DataTable to my class handling business logic. That class iterates through the rows of the DataTable, transforms the data, and writes it into a structured database. I use this structure for multiple import sources.
Large files, though, are really, really slow: it takes me 2 h to get through 30 MB of data, and I would like to get this down to 30 min. One step in this direction is to not read the entire file into a DataTable, but to handle it line by line and keep memory from getting clogged.
Something like this would be ideal (pseudocode):
FlatFileReader ffr = new FlatFileReader(); // set FlatFileParameters
while (ffr.ReadRow(out DataTable parsedFlatFileRow))
{
    //...Business Logic for handling the parsedFlatFileRow
}
How can I implement a method that works like .ReadRow(out DataTable parsedFlatFileRow)?
Is this the right direction?
foreach (var parsedFileLine in ff.LazyReading())
{
    // Business Logic
}
...
class FlatFileWrapper
{
    public IEnumerable<string> LazyReading()
    {
        string parsedFileLine;
        while ((parsedFileLine = fileReader.ReadLine()) != null)
        {
            yield return parsedFileLine;
        }
    }
}

As Tim already mentioned, File.ReadLines is what you need:
"When you use ReadLines, you can start enumerating the collection of
strings before the whole collection is returned"
You can create a parser that uses that method, something like this:
// Object you want to create from the file lines.
public class Foo
{
    // add properties here....
}

// The parser's only responsibility is to create the objects.
public class FooParser
{
    public IEnumerable<Foo> ParseFile(string filename)
    {
        if (!File.Exists(filename))
            throw new FileNotFoundException("Could not find file to parse", filename);

        foreach (string line in File.ReadLines(filename))
        {
            Foo foo = CreateFoo(line);
            yield return foo;
        }
    }

    private Foo CreateFoo(string line)
    {
        // parse line/create instance of Foo here
        return new Foo {
            // ......
        };
    }
}
Using the code:
var parser = new FooParser();
foreach (Foo foo in parser.ParseFile(filename))
{
    //...Business Logic for handling each Foo
}

You can use File.ReadLines, which works similarly to a StreamReader:
foreach (string line in File.ReadLines(path))
{
    //...Business Logic for handling the line
}
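
For completeness, here is a minimal sketch of how File.ReadLines could be bridged back to the DataTable-based business logic from the question. The comma delimiter and the idea that the caller supplies a DataTable with a matching schema are assumptions for illustration:
// Requires System.Data, System.IO and System.Collections.Generic.
public static IEnumerable<DataRow> ReadRows(string path, DataTable schema)
{
    foreach (string line in File.ReadLines(path))
    {
        DataRow row = schema.NewRow();
        row.ItemArray = line.Split(','); // naive split; no quoted-field handling
        yield return row;                // one row at a time, nothing is buffered
    }
}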

Related

Use variable value to check if particular instance exists, if not - create it

I have a list of images like this:
public List<Image> imageList = new List<Image>();
I also have a picture class in order to collect and manipulate data about the images in the list:
public class Pic {
    // properties and stuff
}
And then I have a function that takes an integer as an argument. That integer corresponds to an image in the image list. What I want to do in the function is to check if an instance of the Pic class has been created for that particular image. If not, I want to create it, using the value of the variable passed into the function. The following code obviously doesn't work, but it shows what I want:
public void doStuffWithImage(int picNumber) {
    // Check if instance called pic + picNumber exists
    if (pic + picNumber.toString() == null) {
        // Create an instance
        Pic pic + picNumber.toString() = new Pic();
    }
}
Suggestions on how to accomplish this?
It seems like you're trying to create individual variables pic1, pic2, etc. You'd be better off using a dictionary:
Dictionary<int, Pic> pics = new Dictionary<int, Pic>();

public void doStuffWithImage(int picNumber) {
    // Check if instance called pic + picNumber exists
    if (!pics.ContainsKey(picNumber)) {
        // Create an instance
        pics[picNumber] = new Pic();
    }
}
You need to create a "registry" of known Pics. Dictionary<int,Pic> would be a good collection to hold this registry. You need to store the registry itself somewhere, perhaps in the "factory" object that registers your pictures.
class PicFactory {
    private readonly IDictionary<int,Pic> knownPics = new Dictionary<int,Pic>();

    public Pic GetOrCreate(int id) {
        Pic res;
        if (knownPics.TryGetValue(id, out res)) {
            return res;
        }
        res = new Pic(id.ToString()); // This assumes that Pic(string) exists
        knownPics.Add(id, res);
        return res;
    }
}
This way of implementing a registry may be too primitive for your purposes; for example, if you need your registry to be concurrent, you would need to set up some sort of locking scheme to protect the knownPics dictionary. The class that accesses pictures would need to create an instance of PicFactory so that it could access pictures through the GetOrCreate(id) call.
If you are using .NET 4.0 or later you can use the Lazy<T> type, which:
Provides support for lazy initialization.
This means the object is constructed not at the moment of declaration, but when it is first accessed.
So you can basically declare them like:
List<Lazy<Pic>> ....
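For example, a minimal sketch, assuming Pic has a parameterless constructor (adapt the factory lambda to your real constructor):
var pics = new List<Lazy<Pic>>();
for (int i = 0; i < imageList.Count; i++)
{
    pics.Add(new Lazy<Pic>(() => new Pic())); // nothing is constructed yet
}
Pic first = pics[0].Value; // the Pic is created here, on first access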
See Lazy<T> and the Lazy Loading pattern in general; this is a common optimization technique, as it defers what can add up to a lot of startup work to micro-delays during runtime.
Be wary about making sure the micro-delays are worth it, and I advise providing methods that can force loading.
If you're grabbing from a list, preface with an .Any or .Contains check, and since you're looking up by name like that, consider using a Dictionary instead.

Keeping track of user customizations in C#

Good evening; I have an application that has a drop-down list. This drop-down list is meant to be a list of commonly visited websites, which can be altered by the user.
My question is how I can store these values in such a manner that the users can change them.
Example: I, as the user, decide I want Google to be my first website and YouTube to be my second.
I have considered making a "settings" file, but is it practical to put 20+ websites into a settings file and then load them at startup? A local database may be overkill for this simple need.
Please point me in the right direction.
Given you have already excluded a database (probably for the right reasons, as it may be overkill for a small app), I'd recommend writing the data to a local file, but not plain text: preferably serialized as either XML or JSON.
This approach has at least two benefits:
More complex data can be stored in the future. For example, while order can be implicit, it can be made explicit, and additional data, like the last time a URL was used, can be added.
Structured data is easier to validate against random corruption. If it were a plain text file, it would be much harder to ensure its integrity.
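As a rough sketch of the JSON variant, assuming the Json.NET (Newtonsoft.Json) package is referenced; the Site class and the path variable are illustrative:
public class Site
{
    public string Url { get; set; }
    public int Order { get; set; }
}

// Save:
var sites = new List<Site> { new Site { Url = "http://www.google.com", Order = 1 } };
File.WriteAllText(path, JsonConvert.SerializeObject(sites, Formatting.Indented));

// Load:
var sitesBack = JsonConvert.DeserializeObject<List<Site>>(File.ReadAllText(path));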
The best option would be to use the power of a serializer and deserializer in C#, which lets you work with the file in an object-oriented way. At the same time you don't need to worry about the mechanics of storing to files, etc.
Here is the sample code I quickly wrote for you.
using System;
using System.IO;
using System.Collections;
using System.Xml.Serialization;

namespace ConsoleApplication3
{
    public class UrlSerializer
    {
        private static void Write(string filename)
        {
            URLCollection urls = new URLCollection();
            urls.Add(new Url { Address = "http://www.google.com", Order = 1 });
            urls.Add(new Url { Address = "http://www.yahoo.com", Order = 2 });
            XmlSerializer x = new XmlSerializer(typeof(URLCollection));
            using (TextWriter writer = new StreamWriter(filename))
            {
                x.Serialize(writer, urls);
            }
        }

        private static URLCollection Read(string filename)
        {
            var x = new XmlSerializer(typeof(URLCollection));
            using (TextReader reader = new StreamReader(filename))
            {
                return (URLCollection)x.Deserialize(reader);
            }
        }
    }

    // Url type inferred from the usage above.
    public class Url
    {
        public string Address { get; set; }
        public int Order { get; set; }
    }

    public class URLCollection : ICollection
    {
        public string CollectionName;
        private ArrayList _urls = new ArrayList();

        public Url this[int index]
        {
            get { return (Url)_urls[index]; }
        }

        public void CopyTo(Array a, int index)
        {
            _urls.CopyTo(a, index);
        }

        public int Count
        {
            get { return _urls.Count; }
        }

        public object SyncRoot
        {
            get { return this; }
        }

        public bool IsSynchronized
        {
            get { return false; }
        }

        public IEnumerator GetEnumerator()
        {
            return _urls.GetEnumerator();
        }

        public void Add(Url url)
        {
            if (url == null) throw new ArgumentNullException("url");
            _urls.Add(url);
        }
    }
}
You clearly need some sort of persistence, for which there are a few options:
Local database - as you have noted, total overkill; you are just storing a list, not relational data.
Simple text file - pretty easy, but maybe not the most "professional" way. Using XML serialization to this file would allow for complex data types.
Settings file - are these preferences really settings? If they are, then this makes sense.
The Registry - great for settings you don't want your users to ever manually mess with; probably not the best option for a significant amount of data, though.
I would go with number 2. It doesn't sound like you need any fancy encoding or security, so just store everything in a text file. *.ini files tend to meet this description, but you can use any extension you want. A settings file doesn't seem like the right place for this scenario.

How to read data from multiple (multi language) resource files?

I am trying out the multi-language features in an application. I have created the resource files GlobalTexts.en-EN.resx and GlobalTexts.fr-FR.resx, and a class that sets the culture and returns the texts, like this (I will not go into all the details, just show the structure):
public class Multilanguage
{
    ...
    _res_man_global = new ResourceManager("GlobalResources.Resources.GlobalTexts", System.Reflection.Assembly.GetExecutingAssembly());
    ...
    public virtual string GetText(string _key)
    {
        return _res_man_global.GetString(_key, _culture);
    }
}
...
Multilanguage _translations = new Multilanguage();
...
someText = _translations.GetText(_someKey);
...
This works just fine.
Now, I would like to use this application in another solution that basically extends it (more windows, etc.), which also has resource files ExtendedTexts.en-En.resx and ExtendedTexts.fr-FR.resx, and a new class like:
public class ExtendedMultilanguage : Multilanguage
{
    ...
    _res_man_local = new ResourceManager("ExtendedResources.Resources.ExtendedTexts", System.Reflection.Assembly.GetExecutingAssembly());
    ...
    public override string GetText(string _key)
    {
        string _result;
        try
        {
            _result = _res_man_local.GetString(_key, _culture);
        }
        catch (Exception ex)
        {
            _result = base.GetText(_key);
        }
        return _result;
    }
}
...
ExtendedMultilanguage _translations = new ExtendedMultilanguage();
...
someText = _translations.GetText(_someKey);
...
the idea being that if the key is not found in ExtendedTexts, the method calls the base class, which looks into GlobalTexts. I did this in order to use the call GetText(wantedKey) everywhere in the code without having to care about the location of the resource (I do not want to include the translations from the extensions in the GlobalTexts files); it is just the class used that differs from project to project.
The problem I am facing is that the try/catch is very slow when exceptions are raised: I wait seconds for one window to populate. I tested with a direct call and it works much faster, but then I need to care all the time about where the resource is located...
The question is: is there an alternative way of doing this (having resources spread over various files and a single method that returns the desired resource without throwing an error)?
In the end I took a workaround solution and loaded all the content of the resource files into dictionaries. This way I can use ContainsKey to see whether a key exists or not.
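For reference, a minimal sketch of that dictionary workaround (the names are illustrative, and filling the cache via ResourceManager.GetResourceSet is an assumption about how the resources could be enumerated):
private readonly Dictionary<string, string> _texts = new Dictionary<string, string>();

// Load GlobalTexts first, then ExtendedTexts, so extension keys win.
private void LoadInto(ResourceManager manager, CultureInfo culture)
{
    ResourceSet set = manager.GetResourceSet(culture, true, true);
    if (set == null) return;
    foreach (DictionaryEntry entry in set)
    {
        _texts[(string)entry.Key] = entry.Value as string;
    }
}

public string GetText(string key)
{
    string result;
    return _texts.TryGetValue(key, out result) ? result : key; // no exception on a miss
}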

Adding an AsParallel() call causes my code to break when writing a file

I'm building a console application that has to process a bunch of documents.
To keep it simple, the process is:
for each year between X and Y, query the DB to get a list of document references to process
for each of these references, process a local file
The process method is, I think, independent and should be parallelizable as soon as the input args differ:
private static bool ProcessDocument(
    DocumentsDataset.DocumentsRow d,
    string langCode
)
{
    try
    {
        var htmFileName = d.UniqueDocRef.Trim() + langCode + ".htm";
        var htmFullPath = Path.Combine(@"x:\path", htmFileName);
        var missingHtmlFile = !File.Exists(htmFullPath);
        if (!missingHtmlFile)
        {
            var html = File.ReadAllText(htmFullPath);
            // ProcessHtml is quite long: it uses a regex search for a list of references,
            // which are other documents, then sends the result to a custom WS
            ProcessHtml(ref html);
            File.WriteAllText(htmFullPath, html);
        }
        return true;
    }
    catch (Exception exc)
    {
        Trace.TraceError("{0,8}Fail processing {1} : {2}", "[FATAL]", d.UniqueDocRef, exc.ToString());
        return false;
    }
}
In order to enumerate my documents, I have this method:
private static IEnumerable<DocumentsDataset.DocumentsRow> EnumerateDocuments()
{
    return Enumerable.Range(1990, 2020 - 1990).AsParallel().SelectMany(year => {
        return Document.FindAll((short)year).Documents;
    });
}
Document is a business class that wraps the retrieval of documents. The output of this method is a typed dataset (I'm returning the Documents table). The method expects a year, and I'm sure a document can't be returned by more than one year (year is actually part of the key).
Note the use of AsParallel() here, but I never had an issue with this one.
Now, my main method is :
var documents = EnumerateDocuments();
var result = documents.Select(d => {
    bool success = true;
    foreach (var langCode in new string[] { "-e", "-f" })
    {
        success &= ProcessDocument(d, langCode);
    }
    return new {
        d.UniqueDocRef,
        success
    };
});

using (var sw = File.CreateText("summary.csv"))
{
    sw.WriteLine("Level;UniqueDocRef");
    foreach (var item in result)
    {
        string level;
        if (!item.success) level = "[ERROR]";
        else level = "[OK]";
        sw.WriteLine(
            "{0};{1}",
            level,
            item.UniqueDocRef
        );
        //sw.WriteLine(item);
    }
}
This method works as expected under this form. However, if I replace
var documents = EnumerateDocuments();
by
var documents = EnumerateDocuments().AsParallel();
It stops working, and I don't understand why.
The error appears exactly here (in my process method):
File.WriteAllText(htmFullPath, html);
It tells me that the file is already open in another program.
I don't understand what can cause my program not to work as expected. As my documents variable is an IEnumerable returning unique values, why is my process method breaking?
Thanks for any advice.
[Edit] Code for retrieving documents:
/// <summary>
/// Get all documents in data store
/// </summary>
public static DocumentsDS FindAll(short? year)
{
    Database db = DatabaseFactory.CreateDatabase(connStringName); // MS Entlib
    DbCommand cm = db.GetStoredProcCommand("Document_Select");
    if (year.HasValue) db.AddInParameter(cm, "Year", DbType.Int16, year.Value);
    string[] tableNames = { "Documents", "Years" };
    DocumentsDS ds = new DocumentsDS();
    db.LoadDataSet(cm, ds, tableNames);
    return ds;
}
[Edit2] Possible source of my issue, thanks to mquander. If I write:
var test = EnumerateDocuments().AsParallel().Select(d => d.UniqueDocRef);
var testGr = test.GroupBy(d => d).Select(d => new { d.Key, Count = d.Count() }).Where(c => c.Count > 1);
var testLst = testGr.ToList();
Console.WriteLine(testLst.Where(x => x.Count == 1).Count());
Console.WriteLine(testLst.Where(x => x.Count > 1).Count());
I get this result:
0
1758
Removing the AsParallel returns the same output.
Conclusion: my EnumerateDocuments has something wrong and returns each document twice. I have to dig in here, I think; my source enumeration is probably the cause.
I suggest you have each task put the file data into a global queue and have a dedicated thread take writing requests from the queue and do the actual writing.
Anyway, the performance of writing in parallel to a single disk is much worse than writing sequentially, because the disk needs to spin to seek the next writing location, so you are just bouncing the disk around between seeks. It's better to do the writes sequentially.
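A minimal sketch of that queue, using BlockingCollection from System.Collections.Concurrent (.NET 4.0+); the (path, content) pair is an illustrative message format:
// Producers call writeQueue.Add(...) instead of File.WriteAllText,
// then writeQueue.CompleteAdding() once all documents are processed.
var writeQueue = new BlockingCollection<KeyValuePair<string, string>>();

// The dedicated writer: the only code that touches the disk.
var writerTask = Task.Factory.StartNew(() =>
{
    foreach (var request in writeQueue.GetConsumingEnumerable())
    {
        File.WriteAllText(request.Key, request.Value); // Key = path, Value = content
    }
});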
Is Document.FindAll((short)year).Documents threadsafe? Because the difference between the first and the second version is that in the second (broken) version, this call is running multiple times concurrently. That could plausibly be the cause of the issue.
Sounds like you're trying to write to the same file. Only one thread/program can write to a file at a given time, so you can't use Parallel.
If you're reading from the same file, then you need to open the file with only read permissions so as not to put a write lock on it.
The simplest way to fix the issue is to place a lock around your File.WriteAllText, assuming the writing is fast and it's worth parallelizing the rest of the code.
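For example, a minimal sketch (the _writeLock field is a name introduced here for illustration; it must be shared by all threads):
private static readonly object _writeLock = new object();

// inside ProcessDocument, instead of the bare call:
lock (_writeLock)
{
    File.WriteAllText(htmFullPath, html);
}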

Import CSV file to strongly typed data structure in .Net [closed]

What's the best way to import a CSV file into a strongly-typed data structure?
Microsoft's TextFieldParser is stable and follows RFC 4180 for CSV files. Don't be put off by the Microsoft.VisualBasic namespace; it's a standard component in the .NET Framework, just add a reference to the global Microsoft.VisualBasic assembly.
If you're compiling for Windows (as opposed to Mono) and don't anticipate having to parse "broken" (non-RFC-compliant) CSV files, then this would be the obvious choice, as it's free, unrestricted, stable, and actively supported, most of which cannot be said for FileHelpers.
See also: How to: Read From Comma-Delimited Text Files in Visual Basic for a VB code example.
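A minimal usage sketch (the path is illustrative; TextFieldParser lives in the Microsoft.VisualBasic.FileIO namespace):
using (var parser = new Microsoft.VisualBasic.FileIO.TextFieldParser(@"C:\data\input.csv"))
{
    parser.TextFieldType = Microsoft.VisualBasic.FileIO.FieldType.Delimited;
    parser.SetDelimiters(",");
    parser.HasFieldsEnclosedInQuotes = true;
    while (!parser.EndOfData)
    {
        string[] fields = parser.ReadFields(); // RFC 4180-aware splitting
        // map fields onto your strongly typed object here
    }
}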
Use an OleDb connection.
string sConnectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\\InputDirectory\\;Extended Properties='text;HDR=Yes;FMT=Delimited'";
OleDbConnection objConn = new OleDbConnection(sConnectionString);
objConn.Open();

DataTable dt = new DataTable();
OleDbCommand objCmdSelect = new OleDbCommand("SELECT * FROM [file.csv]", objConn);
OleDbDataAdapter objAdapter1 = new OleDbDataAdapter();
objAdapter1.SelectCommand = objCmdSelect;
objAdapter1.Fill(dt);
objConn.Close();
If you're expecting fairly complex scenarios for CSV parsing, don't even think of rolling your own parser. There are a lot of excellent tools out there, like FileHelpers, or even ones from CodeProject.
The point is that this is a fairly common problem, and you can bet that a lot of software developers have already thought about and solved it.
I agree with #NotMyself. FileHelpers is well tested and handles all kinds of edge cases that you'll eventually have to deal with if you do it yourself. Take a look at what FileHelpers does and only write your own if you're absolutely sure that either (1) you will never need to handle the edge cases FileHelpers does, or (2) you love writing this kind of stuff and are going to be overjoyed when you have to parse stuff like this:
1,"Bill","Smith","Supervisor", "No Comment"
2 , 'Drake,' , 'O'Malley',"Janitor,
Oops, I'm not quoted and I'm on a new line!
Brian gives a nice solution for converting it to a strongly typed collection.
Most of the CSV parsing methods given here don't take into account escaped fields or some of the other subtleties of CSV files (like trimming fields). Here is the code I personally use. It's a bit rough around the edges and has pretty much no error reporting.
public static IList<IList<string>> Parse(string content)
{
    IList<IList<string>> records = new List<IList<string>>();
    StringReader stringReader = new StringReader(content);
    bool inQuotedString = false;
    IList<string> record = new List<string>();
    StringBuilder fieldBuilder = new StringBuilder();
    while (stringReader.Peek() != -1)
    {
        char readChar = (char)stringReader.Read();
        if (readChar == '\n' || (readChar == '\r' && stringReader.Peek() == '\n'))
        {
            // If it's a \r\n combo consume the \n part and throw it away.
            if (readChar == '\r')
            {
                stringReader.Read();
            }
            if (inQuotedString)
            {
                if (readChar == '\r')
                {
                    fieldBuilder.Append('\r');
                }
                fieldBuilder.Append('\n');
            }
            else
            {
                record.Add(fieldBuilder.ToString().TrimEnd());
                fieldBuilder = new StringBuilder();
                records.Add(record);
                record = new List<string>();
                inQuotedString = false;
            }
        }
        else if (fieldBuilder.Length == 0 && !inQuotedString)
        {
            if (char.IsWhiteSpace(readChar))
            {
                // Ignore leading whitespace
            }
            else if (readChar == '"')
            {
                inQuotedString = true;
            }
            else if (readChar == ',')
            {
                record.Add(fieldBuilder.ToString().TrimEnd());
                fieldBuilder = new StringBuilder();
            }
            else
            {
                fieldBuilder.Append(readChar);
            }
        }
        else if (readChar == ',')
        {
            if (inQuotedString)
            {
                fieldBuilder.Append(',');
            }
            else
            {
                record.Add(fieldBuilder.ToString().TrimEnd());
                fieldBuilder = new StringBuilder();
            }
        }
        else if (readChar == '"')
        {
            if (inQuotedString)
            {
                if (stringReader.Peek() == '"')
                {
                    stringReader.Read();
                    fieldBuilder.Append('"');
                }
                else
                {
                    inQuotedString = false;
                }
            }
            else
            {
                fieldBuilder.Append(readChar);
            }
        }
        else
        {
            fieldBuilder.Append(readChar);
        }
    }
    record.Add(fieldBuilder.ToString().TrimEnd());
    records.Add(record);
    return records;
}
Note that this doesn't handle the edge case of fields not being delimited by double quotes but merely having a quoted string inside. See this post for a bit of a better explanation as well as some links to some proper libraries.
I was bored, so I modified some stuff I wrote. It tries to encapsulate the parsing in an OO manner while cutting down on the number of iterations through the file; it only iterates once, at the top foreach.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // usage:
            // note this won't run as getting streams is not implemented,
            // but it will get you started
            CSVFileParser fileParser = new CSVFileParser();
            // To do: configure fileParser
            PersonParser personParser = new PersonParser(fileParser);
            List<Person> persons = new List<Person>();
            // if the file is large and there is a good way to limit
            // without having to reparse the whole file you can use a
            // linq query if you desire
            foreach (Person person in personParser.GetPersons())
            {
                persons.Add(person);
            }
            // now we have a list of Person objects
        }
    }

    public abstract class CSVParser
    {
        protected string[] delimiters = { "," };

        protected internal IEnumerable<string[]> GetRecords()
        {
            Stream stream = GetStream();
            StreamReader reader = new StreamReader(stream);
            string[] aRecord;
            while (!reader.EndOfStream)
            {
                aRecord = reader.ReadLine().Split(delimiters,
                    StringSplitOptions.None);
                yield return aRecord;
            }
        }

        protected abstract Stream GetStream();
    }

    public class CSVFileParser : CSVParser
    {
        // to do: add logic to get a stream from a file
        protected override Stream GetStream()
        {
            throw new NotImplementedException();
        }
    }

    public class CSVWebParser : CSVParser
    {
        // to do: add logic to get a stream from a web request
        protected override Stream GetStream()
        {
            throw new NotImplementedException();
        }
    }

    public class Person
    {
        public string Name { get; set; }
        public string Address { get; set; }
        public DateTime DOB { get; set; }
    }

    public class PersonParser
    {
        public PersonParser(CSVParser parser)
        {
            this.Parser = parser;
        }

        public CSVParser Parser { get; set; }

        public IEnumerable<Person> GetPersons()
        {
            foreach (string[] record in this.Parser.GetRecords())
            {
                yield return new Person()
                {
                    Name = record[0],
                    Address = record[1],
                    DOB = DateTime.Parse(record[2]),
                };
            }
        }
    }
}
There are two articles on CodeProject that provide code for a solution, one that uses StreamReader and one that imports CSV data using the Microsoft Text Driver.
A good simple way to do it is to open the file, and read each line into an array, linked list, data-structure-of-your-choice. Be careful about handling the first line though.
This may be over your head, but there seems to be a direct way to access them as well using a connection string.
Why not try using Python instead of C# or VB? It has a nice CSV module to import that does all the heavy lifting for you.
I had to use a CSV parser in .NET for a project this summer and settled on the Microsoft Jet Text Driver. You specify a folder using a connection string, then query a file using a SQL Select statement. You can specify strong types using a schema.ini file. I didn't do this at first, but then I was getting bad results where the type of the data wasn't immediately apparent, such as IP numbers or an entry like "XYQ 3.9 SP1".
One limitation I ran into is that it cannot handle column names above 64 characters; it truncates. This shouldn't be a problem, except I was dealing with very poorly designed input data. It returns an ADO.NET DataSet.
This was the best solution I found. I would be wary of rolling my own CSV parser, since I would probably miss some of the edge cases, and I didn't find any other free CSV parsing packages for .NET out there.
EDIT: Also, there can only be one schema.ini file per directory, so I dynamically appended to it to strongly type the needed columns. It will only strongly type the columns specified, and it infers the type of any unspecified field. I really appreciated this, as I was dealing with importing a fluid 70+ column CSV and didn't want to specify each column, only the misbehaving ones.
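For reference, a schema.ini entry looks something like this; it sits in the same directory as the CSV file, and the file and column names here are illustrative:
[input.csv]
ColNameHeader=True
Format=CSVDelimited
Col1=IpAddress Text
Col2=Version Text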
I typed in some code. The result in the data grid viewer looked good. It parses a single line of text into an ArrayList of objects.
enum quotestatus
{
    none,
    firstquote,
    secondquote
}

public static System.Collections.ArrayList Parse(string line, string delimiter)
{
    System.Collections.ArrayList ar = new System.Collections.ArrayList();
    StringBuilder field = new StringBuilder();
    quotestatus status = quotestatus.none;
    foreach (char ch in line.ToCharArray())
    {
        string chOmsch = "char";
        if (ch == Convert.ToChar(delimiter))
        {
            if (status == quotestatus.firstquote)
            {
                chOmsch = "char";
            }
            else
            {
                chOmsch = "delimiter";
            }
        }
        if (ch == Convert.ToChar(34))
        {
            chOmsch = "quotes";
            if (status == quotestatus.firstquote)
            {
                status = quotestatus.secondquote;
            }
            if (status == quotestatus.none)
            {
                status = quotestatus.firstquote;
            }
        }
        switch (chOmsch)
        {
            case "char":
                field.Append(ch);
                break;
            case "delimiter":
                ar.Add(field.ToString());
                field.Clear();
                break;
            case "quotes":
                if (status == quotestatus.firstquote)
                {
                    field.Clear();
                }
                if (status == quotestatus.secondquote)
                {
                    status = quotestatus.none;
                }
                break;
        }
    }
    if (field.Length != 0)
    {
        ar.Add(field.ToString());
    }
    return ar;
}
If you can guarantee that there are no commas in the data, then the simplest way would probably be to use String.Split.
For example:
String[] values = myString.Split(',');
myObject.StringField = values[0];
myObject.IntField = Int32.Parse(values[1]);
There may be libraries you could use to help, but that's probably as simple as you can get. Just make sure you can't have commas in the data, otherwise you will need to parse it better.
