I am trying to use a CSV parser which I found on the net in my project. The problem is I am getting a null reference exception when I try to convert the string to a Tag and my collection does not get populated. Can anyone assist? Thanks
CSV Parser
private static IEnumerable<string[]> parseCSV(string path)
{
List<string[]> parsedData = new List<string[]>();
try
{
using (StreamReader readFile = new StreamReader(path))
{
string line;
string[] row;
while ((line = readFile.ReadLine()) != null)
{
row = line.Split(',');
parsedData.Add(row);
}
}
}
catch (Exception e)
{
System.Windows.MessageBox.Show(e.Message);
}
return parsedData;
}
Tag Class
public class Tag
{
public Tag(string name, int weight)
{
Name = name;
Weight = weight;
}
public string Name { get; set; }
public int Weight { get; set; }
public static IEnumerable<Tag> CreateTags(IEnumerable<string> words)
{
Dictionary<string, int> tags = new Dictionary<string, int>();
foreach (string word in words)
{
int count = 1;
if (tags.ContainsKey(word))
{
count = tags[word] + 1;
}
tags[word] = count;
}
return tags.Select(kvp => new Tag(kvp.Key, kvp.Value));
}
}
Validate all method arguments before you use them!
It breaks on this line: foreach (string word in words)
Remember that foreach loops work by calling GetEnumerator on the collection iterated over. That is, your foreach loop causes a call to words.GetEnumerator, and this call fails if words is null.
Therefore, validate your argument words by adding a guard at the very start of your CreateTags method:
if (words == null)
{
throw new ArgumentNullException("words");
}
This will help you find the location in your code where null is passed into CreateTags, and you can then continue fixing the calling code.
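To make the foreach explanation above concrete, here is a minimal sketch of roughly what the compiler expands a foreach loop into, with the guard in front of it. The class and method names (ForeachDemo, CountWords) are hypothetical and only serve the illustration; they are not part of the original code.

```csharp
using System;
using System.Collections.Generic;

public static class ForeachDemo
{
    // Hypothetical helper: counts the non-null items in a sequence.
    public static int CountWords(IEnumerable<string> words)
    {
        // The guard reports the offending parameter by name
        // instead of failing later with a NullReferenceException.
        if (words == null)
        {
            throw new ArgumentNullException("words");
        }

        int count = 0;
        // Roughly what "foreach (string word in words)" expands to.
        // Without the guard, a null 'words' would fail on GetEnumerator().
        using (IEnumerator<string> e = words.GetEnumerator())
        {
            while (e.MoveNext())
            {
                string word = e.Current;
                if (word != null)
                {
                    count++;
                }
            }
        }
        return count;
    }
}
```

Calling ForeachDemo.CountWords(null) then throws an ArgumentNullException whose ParamName is "words", which points directly at the caller that passed null.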
Suggestion: Avoid null whenever possible.
As a very general rule, try to avoid using null values whenever possible. For example, when your code is dealing with sets and collections of items, you could make sure that it also works correctly with empty collections. In a second step, make sure that you never use null to represent an empty collection; instead, use e.g. LINQ's Enumerable.Empty<TItem>() generator to create an empty collection.
One place where you could start doing this is in the CreateTags method by ensuring that no matter what the inputs are, that method will always return a valid, non-null (but possibly empty) collection:
if (words == null)
{
return Enumerable.Empty<Tag>(); // You could do without LINQ by writing:
// return new Tag[] { };
}
Every method should run sanity checks on the arguments it accepts to ensure the arguments are valid input parameters. I would probably do something like
public static IEnumerable<Tag> CreateTags(IEnumerable<string> words)
{
if (words == null)
{
// either throw a new ArgumentNullException or
return null; // or: return Enumerable.Empty<Tag>();
}
Dictionary<string, int> tags = new Dictionary<string, int>();
foreach (string word in words)
{
int count = 1;
if (tags.ContainsKey(word))
{
count = tags[word] + 1;
}
tags[word] = count;
}
return tags.Select(kvp => new Tag(kvp.Key, kvp.Value));
}
As to why your "words" param is null, it would be helpful to see the CSV file you are trying to parse in.
Hope this helps!
I am trying to read input from the console, create objects of my own datatype Word, and add them to a List. However, every time I add a new one, all previous ones get replaced.
The Loop:
while (!QuitInput(wordInput.ToLower()))
{
...handle invalid input...
else
{
try
{
ReadWordFromConsole(languages[0], languages[1], wordInput);
}
catch (Exception ex)
{
Console.WriteLine($"Error: {ex.Message}");
}
}
wordInput = Console.ReadLine().Trim(' ', ';');
}
The Method:
private static void ReadWordFromConsole(string language1, string language2, string input)
{
var splitInput = input.Split(';');
for (int i = 0; i < splitInput.Length; i++)
splitInput[i] = splitInput[i].Trim(' ');
if (splitInput.Length < 2)
{
if (!input.Contains(';'))
throw new ArgumentException("Separate with ';'");
throw new ArgumentException("Invalid input. 'h' for help.");
}
var translationList = new List<string>();
for (int i = 1; i < splitInput.Length; i++)
translationList.Add(splitInput[i]);
var word = new Word(language1, language2, splitInput[0], translationList);
_loadedWords.Add(word);
}
Word class:
private static string _language;
public Word(string language, string translationLanguage, string text, List<string> translations)
{
Language = language;
TranslationLanguage = translationLanguage;
Text = text;
Translations = translations;
}
public string Language
{
get
{
return _language;
}
set
{
if (string.IsNullOrEmpty(value))
throw new ArgumentException("Language cannot be empty");
_language = value;
}
}
...
The global list declared in the same class as ReadWordFromConsole:
private static List<Word> _loadedWords = new List<Word>();
When researching I discovered some posts saying that you cannot use the same instance of the object in the loop. But am I not creating a new one every time ReadWordFromConsole is called?
What would I have to change in order for it to work and not replace previous words?
With a static field as backing store of a property
private static string _language;
even if it is an instance property, you are effectively having just a single location where all your Word instances store/get their Language.
Solution: just remove that static.
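A minimal sketch of the difference, using two hypothetical classes (SharedWord and OwnWord) that are not part of the original code: one stores its text in a static field, the other in an instance field.

```csharp
using System;

public class SharedWord
{
    // static: a single storage location shared by all SharedWord instances
    private static string _text;
    public SharedWord(string text) { _text = text; }
    public string Text { get { return _text; } }
}

public class OwnWord
{
    // instance field: each OwnWord gets its own storage location
    private string _text;
    public OwnWord(string text) { _text = text; }
    public string Text { get { return _text; } }
}

public static class BackingFieldDemo
{
    public static string SharedTexts()
    {
        var a = new SharedWord("cat");
        var b = new SharedWord("dog"); // overwrites the value 'a' sees
        return a.Text + "," + b.Text;
    }

    public static string OwnTexts()
    {
        var a = new OwnWord("cat");
        var b = new OwnWord("dog");
        return a.Text + "," + b.Text;
    }
}
```

BackingFieldDemo.SharedTexts() returns "dog,dog", which is the "all previous ones get replaced" symptom, while BackingFieldDemo.OwnTexts() returns "cat,dog".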
I have data in tab-separated values (TSV) text files that I want to read and (eventually) store in database tables. With the TSV files, each line contains one record, but in one file the record can have 2 fields, in another file 4 fields, etc. I wrote working code to handle the 2-field records, but I thought this might be a good case for a generic method (or two) rather than writing new methods for each kind of record. However, I have not been able to code this because of 2 problems: I can't create a new object for holding the record data, and I don't know how to use reflection to generically fill the instance variables of my objects.
I looked at several other similar posts, including Datatable to object by using reflection and linq
Below is the code that works (this is in Windows, if that matters) and also the code that doesn't work.
public class TSVFile
{
public class TSVRec
{
public string item1;
public string item2;
}
private string fileName = "";
public TSVFile(string _fileName)
{
fileName = _fileName;
}
public TSVRec GetTSVRec(string Line)
{
TSVRec rec = new TSVRec();
try
{
string[] fields = Line.Split(new char[1] { '\t' });
rec.item1 = fields[0];
rec.item2 = fields[1];
}
catch (Exception ex)
{
System.Windows.Forms.MessageBox.Show("Bad import data on line: " +
Line + "\n" + ex.Message, "Error",
System.Windows.Forms.MessageBoxButtons.OK,
System.Windows.Forms.MessageBoxIcon.Error);
}
return rec;
}
public List<TSVRec> ImportTSVRec()
{
List<TSVRec> loadedData = new List<TSVRec>();
using (StreamReader sr = File.OpenText(fileName))
{
string Line = null;
while ((Line = sr.ReadLine()) != null)
{
loadedData.Add(GetTSVRec(Line));
}
}
return loadedData;
}
// *** Attempted generic methods ***
public T GetRec<T>(string Line)
{
T rec = new T(); // compile error!
Type t = typeof(T);
FieldInfo[] instanceVars = t.GetFields();
string[] fields = Line.Split(new char[1] { '\t' });
for (int i = 0; i < instanceVars.Length - 1; i++)
{
rec. ??? = fields[i]; // how do I finish this line???
}
return rec;
}
public List<T> Import<T>(Type t)
{
List<T> loadedData = new List<T>();
using (StreamReader sr = File.OpenText(fileName))
{
string Line = null;
while ((Line = sr.ReadLine()) != null)
{
loadedData.Add(GetRec<T>(Line));
}
}
return loadedData;
}
}
I saw the line
T rec = new T();
in the above-mentioned post, but it doesn't work for me...
I would appreciate any suggestions for how to make this work, if possible. I want to learn more about using reflection with generics, so I don't only want to understand how, but also why.
I wish @EdPlunkett had posted his suggestion as an answer, rather than a comment, so I could mark it as the answer...
To summarize: to do what I want to do, there is no need for "Assigning instance variables obtained through reflection in generic method". In fact, I can have a generic solution without using a generic method:
public class GenRec
{
public List<string> items = new List<string>();
}
public GenRec GetRec(string Line)
{
GenRec rec = new GenRec();
try
{
string[] fields = Line.Split(new char[1] { '\t' });
for (int i = 0; i < fields.Length; i++)
rec.items.Add(fields[i]);
}
catch (Exception ex)
{
System.Windows.Forms.MessageBox.Show("Bad import data on line: " + Line + "\n" + ex.Message, "Error",
System.Windows.Forms.MessageBoxButtons.OK,
System.Windows.Forms.MessageBoxIcon.Error);
}
return rec;
}
public List<GenRec> Import()
{
List<GenRec> loadedData = new List<GenRec>();
using (StreamReader sr = File.OpenText(fileName))
{
string Line = null;
while ((Line = sr.ReadLine()) != null)
loadedData.Add(GetRec(Line));
}
return loadedData;
}
I just tested this, and it works like a charm!
Of course, this isn't helping me to learn how to write generic methods or use reflection, but I'll take it...
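For readers who still want the generic, reflection-based version, here is a hedged sketch of how the attempted GetRec<T> could be completed. Two things are needed: a `where T : new()` constraint, which is what allows `new T()` to compile, and FieldInfo.SetValue to assign each field. The sketch assumes every public field is a string; the TwoFieldRec and TsvReflection names are hypothetical. Reflection does not guarantee the order of fields, so sorting by MetadataToken is used here as a common workaround rather than a documented contract. Note also that the loop covers all fields, whereas the `Length - 1` bound in the question would skip the last one.

```csharp
using System;
using System.Reflection;

// Hypothetical record type with one public string field per column.
public class TwoFieldRec
{
    public string item1;
    public string item2;
}

public static class TsvReflection
{
    // The new() constraint tells the compiler that T has a public
    // parameterless constructor, so "new T()" is allowed.
    public static T GetRec<T>(string line) where T : new()
    {
        // Box once so that SetValue also works if T is a struct.
        object rec = new T();

        FieldInfo[] instanceVars =
            typeof(T).GetFields(BindingFlags.Public | BindingFlags.Instance);

        // Assumption: metadata tokens follow declaration order (common, not guaranteed).
        Array.Sort(instanceVars, (a, b) => a.MetadataToken.CompareTo(b.MetadataToken));

        string[] fields = line.Split('\t');
        int n = Math.Min(instanceVars.Length, fields.Length);
        for (int i = 0; i < n; i++)
        {
            // Assumption: every public field is of type string.
            instanceVars[i].SetValue(rec, fields[i]);
        }
        return (T)rec;
    }
}
```

With this in place, TsvReflection.GetRec<TwoFieldRec>("a\tb") fills item1 and item2 from the two columns. Fields of other types would need a conversion step (for example Convert.ChangeType) before SetValue.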
I have a method where I'm reading a textfile.
I have to get the words in the textfile which start with "ART".
I have a foreach loop which loops through the method.
class ProductsList
{
public static void Main()
{
String path = @"D:\ProductsProjects\products.txt";
GetProducts(path, s => s.StartsWith("ART"));
//foreach (String product in GetProducts(path, s => s.StartsWith("ART")))
//Console.Write("{0}; ", product);
}
My method looks like this:
public static String GetProducts(String path, Func<String, bool> lambda)
{
try {
using (StreamReader sr = new StreamReader(path)){
string[] products= sr.ReadToEnd().Split(' ');
// need to get all the products starting with ART
foreach (string s in products){
return s;
}
}
}
catch (IOException ioe){
Console.WriteLine(ioe.Message);
}
return "";
}
}
I'm having problems with the lambda in the method. I'm new to working with lambdas and I don't really know how to apply the lambda in the method.
I'm sorry if I can't really explain myself that well.
Just apply it in the loop:
foreach (string s in products.Where(lambda))
Update:
You should also change your method so that it returns a sequence of products rather than a single one:
public static IEnumerable<string> GetProducts(String path, Func<String, bool> lambda)
{
using (StreamReader sr = new StreamReader(path))
{
string[] products = sr.ReadToEnd().Split(' ');
// need to get all the products starting with ART
foreach (string s in products.Where(lambda))
{
yield return s;
}
}
}
Your code is wrong in that it only ever returns one string, whereas you want to return multiple strings. If the list of products is large, reading it all at once could also take a while. I'd recommend doing it this way:
public static IEnumerable<string> GetProducts(string path, Func<string, bool> matcher)
{
using(var stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.None))
{
using(var reader = new StreamReader(stream))
{
string line;
// ReadLine returns null at end of file, so an empty file yields nothing
while ((line = reader.ReadLine()) != null)
{
if (matcher(line)) yield return line;
}
}
}
}
Then using it is as simple as:
foreach(var product in GetProducts("abc.txt", s => s.StartsWith("ART")))
{
Console.WriteLine("This is a matching product: {0}", product);
}
This code has the benefit of returning all of the lines that match the predicate (the lambda), as well as doing so using an iterator block, which means it doesn't actually read the next line until you ask for it.
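The deferred reading can be observed with a small sketch. It uses a TextReader instead of a file path so that it runs without a file on disk, and it adds a counter (LinesRead) purely for demonstration; the LazyLines class and both member names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LazyLines
{
    // Demonstration only: counts how many lines have actually been read.
    public static int LinesRead;

    public static IEnumerable<string> Matching(TextReader reader, Func<string, bool> matcher)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            LinesRead++;
            // Execution pauses here until the caller asks for the next item.
            if (matcher(line)) yield return line;
        }
    }
}
```

Calling LazyLines.Matching(...) on its own reads nothing, so LinesRead stays at 0. Asking for only the first match, for example with Enumerable.First, reads just as many lines as are needed to find it.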
I have the following code which takes a CSV and writes to a console:
using (CsvReader csv = new CsvReader(
new StreamReader("data.csv"), true))
{
// missing fields will not throw an exception,
// but will instead be treated as if there was a null value
csv.MissingFieldAction = MissingFieldAction.ReplaceByNull;
// to replace by "" instead, then use the following action:
//csv.MissingFieldAction = MissingFieldAction.ReplaceByEmpty;
int fieldCount = csv.FieldCount;
string[] headers = csv.GetFieldHeaders();
while (csv.ReadNextRecord())
{
for (int i = 0; i < fieldCount; i++)
Console.Write(string.Format("{0} = {1};",
headers[i],
csv[i] == null ? "MISSING" : csv[i]));
Console.WriteLine();
}
}
The CSV file has 7 headers for which I have 7 columns in my SQL table.
What is the best way to take each csv[i] and write to a row for each column and then move to the next row?
I tried to add the csv[i] to a string array but that didn't work.
I also tried the following:
SqlCommand sql = new SqlCommand("INSERT INTO table1 [" + csv[i] + "]", mysqlconnectionstring);
sql.ExecuteNonQuery();
My table (table1) is like this:
name address city zipcode phone fax device
Your problem is simple, but I will take it one step further and show you a better way to approach the issue.
When you have a problem to solve, always break it down into parts and put each part in its own method. For example, in your case:
1 - read from the file
2 - create a sql query
3 - run the query
You can even add validation of the file (imagine your file does not have 7 fields in one or more lines). The example below should only be used if your file never exceeds around 500 lines; beyond that, you should consider a SQL statement that loads the file directly into the database, known as a bulk insert.
1 - read from file:
I would use a List<string[]> to hold the line entries and I always use StreamReader to read from text files.
using (StreamReader sr = File.OpenText(this.CsvPath))
{
while ((line = sr.ReadLine()) != null)
{
splittedLine = line.Split(new string[] { this.Separator }, StringSplitOptions.None);
if (iLine == 0 && this.HasHeader)
// header line
this.Header = splittedLine;
else
this.Lines.Add(splittedLine);
iLine++;
}
}
2 - generate the sql
foreach (var line in this.Lines)
{
// wrap every value in single quotes; values containing a quote would still need escaping
string entries = "'" + string.Join("','", line) + "'";
this.Query.Add(string.Format(this.LineTemplate, entries));
}
3 - run the query
// the second argument must be an open SqlConnection object
SqlCommand sql = new SqlCommand(string.Join("", this.Query), mysqlconnectionstring);
sql.ExecuteNonQuery();
Having some fun, I ended up writing a full solution (originally linked for download). It needs more tweaks, but I will leave those for others. The solution is written in C# with VS 2013.
The ExtractCsvIntoSql class is as follows:
public class ExtractCsvIntoSql
{
private string CsvPath, Separator;
private bool HasHeader;
private List<string[]> Lines;
private List<string> Query;
/// <summary>
/// Header content of the CSV File
/// </summary>
public string[] Header { get; private set; }
/// <summary>
/// Template to be used in each INSERT Query statement
/// </summary>
public string LineTemplate { get; set; }
public ExtractCsvIntoSql(string csvPath, string separator, bool hasHeader = false)
{
this.CsvPath = csvPath;
this.Separator = separator;
this.HasHeader = hasHeader;
this.Lines = new List<string[]>();
this.Query = new List<string>(); // must be initialized before GenerateQuery adds to it
// you can also set this
this.LineTemplate = "INSERT INTO [table1] VALUES ({0});";
}
/// <summary>
/// Generates the SQL Query
/// </summary>
/// <returns></returns>
public List<string> Generate()
{
if(this.CsvPath == null)
throw new ArgumentException("CSV Path can't be empty");
// extract csv into object
Extract();
// generate sql query
GenerateQuery();
return this.Query;
}
private void Extract()
{
string line;
string[] splittedLine;
int iLine = 0;
try
{
using (StreamReader sr = File.OpenText(this.CsvPath))
{
while ((line = sr.ReadLine()) != null)
{
splittedLine = line.Split(new string[] { this.Separator }, StringSplitOptions.None);
if (iLine == 0 && this.HasHeader)
// header line
this.Header = splittedLine;
else
this.Lines.Add(splittedLine);
iLine++;
}
}
}
catch (Exception ex)
{
if(ex.InnerException != null)
while (ex.InnerException != null)
ex = ex.InnerException;
throw ex;
}
// Lines will have all rows and each row, the column entry
}
private void GenerateQuery()
{
foreach (var line in this.Lines)
{
// wrap every value in single quotes; values containing a quote would still need escaping
string entries = "'" + string.Join("','", line) + "'";
this.Query.Add(string.Format(this.LineTemplate, entries));
}
}
}
and you can run it as:
class Program
{
static void Main(string[] args)
{
string file = Ask("What is the CSV file path? (full path)");
string separator = Ask("What is the current separator? (; or ,)");
var extract = new ExtractCsvIntoSql(file, separator);
var sql = extract.Generate();
Output(sql);
}
private static void Output(IEnumerable<string> sql)
{
foreach(var query in sql)
Console.WriteLine(query);
Console.WriteLine("*******************************************");
Console.Write("END ");
Console.ReadLine();
}
private static string Ask(string question)
{
Console.WriteLine("*******************************************");
Console.WriteLine(question);
Console.Write("= ");
return Console.ReadLine();
}
}
Usually I like to be a bit more generic, so I'll try to explain a very basic flow I use from time to time:
I don't like the hard-coded approach: even if your code works, it will be dedicated to one specific type. I prefer simple reflection, first to work out which DTO it is and then to work out which repository I should use to manipulate it:
For example:
public class ImportProvider
{
private readonly string _path;
private readonly ObjectResolver _objectResolver;
public ImportProvider(string path)
{
_path = path;
_objectResolver = new ObjectResolver();
}
public void Import()
{
var filePaths = Directory.GetFiles(_path, "*.csv");
foreach (var filePath in filePaths)
{
var fileName = Path.GetFileName(filePath);
var className = fileName.Remove(fileName.Length-4);
using (var reader = new CsvFileReader(filePath))
{
var row = new CsvRow();
var repository = (DaoBase)_objectResolver.Resolve("DAL.Repository", className + "Dao");
while (reader.ReadRow(row))
{
var dtoInstance = (DtoBase)_objectResolver.Resolve("DAL.DTO", className + "Dto");
dtoInstance.FillInstance(row.ToArray());
repository.Save(dtoInstance);
}
}
}
}
}
Above is a very basic class responsible for importing the data. Regardless of how this piece of code parses CSV files (CsvFileReader), the important part is that a "CsvRow" is a simple List.
Below is the implementation of the ObjectResolver:
public class ObjectResolver
{
private readonly Assembly _myDal;
public ObjectResolver()
{
_myDal = Assembly.Load("DAL");
}
public object Resolve(string nameSpace, string name)
{
var myLoadClass = _myDal.GetType(nameSpace + "." + name);
return Activator.CreateInstance(myLoadClass);
}
}
The idea is to simply follow a naming convention: in my case, a "Dto" suffix for the instances created through reflection and a "Dao" suffix for the DAO responsible for them. The full name of the Dto or the Dao can be taken from the CSV file name or from the header (as you wish).
The next step is filling the Dto. Each Dto implements the following simple abstract class:
public abstract class DtoBase
{
public abstract void FillInstance(params string[] parameters);
}
Since each Dto "knows" its structure (just as you knew how to create an appropriate table in the database), it can easily implement the FillInstance method. Here is a simple Dto example:
public class ProductDto : DtoBase
{
public int ProductId { get; set; }
public double Weight { get; set; }
public int FamilyId { get; set; }
public override void FillInstance(params string[] parameters)
{
ProductId = int.Parse(parameters[0]);
Weight = double.Parse(parameters[1]);
FamilyId = int.Parse(parameters[2]);
}
}
After you have your Dto filled with data, you should find the appropriate Dao to handle it, which basically happens through reflection in this line of the Import() method:
var repository = (DaoBase)_objectResolver.Resolve("DAL.Repository", className + "Dao");
In my case the Dao implements an abstract base class, but that is not very relevant to your problem; your DaoBase can be a simple abstract class with a single Save() method.
This way you have a dedicated Dao to CRUD your Dtos: each Dao simply knows how to save its own Dto. Below is the ProductDao corresponding to the ProductDto:
public class ProductDao : DaoBase
{
private const string InsertProductQuery = @"SET foreign_key_checks = 0;
Insert into product (productID, weight, familyID)
VALUES (@productId, @weight, @familyId);
SET foreign_key_checks = 1;";
public override void Save(DtoBase dto)
{
var productToSave = dto as ProductDto;
var saveproductCommand = GetDbCommand(InsertProductQuery);
if (productToSave != null)
{
saveproductCommand.Parameters.Add(CreateParameter("@productId", productToSave.ProductId));
saveproductCommand.Parameters.Add(CreateParameter("@weight", productToSave.Weight));
saveproductCommand.Parameters.Add(CreateParameter("@familyId", productToSave.FamilyId));
ExecuteNonQuery(ref saveproductCommand);
}
}
}
Please ignore the CreateParameter() method, since it is an abstraction from the base class; you can just use CreateSqlParameter, CreateDataParameter, etc.
Just note that this is a really naive implementation; you can easily remodel it, depending on your needs.
From my first impression of your question, I guess you may have a huge number of records (more than several lakhs). If so, I would consider SQL bulk copy as an option. If there are fewer records, go ahead with single-record inserts. Your INSERT does not work because you are not providing values for all the columns of the table, and the statement also has a syntax error.
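To illustrate the syntax point, a well-formed INSERT names the columns and supplies one value per column. The sketch below builds such a statement with parameter placeholders instead of concatenated values. InsertBuilder and its Build method are hypothetical helpers written for this illustration, and the table and column names passed in are assumed to come from trusted code rather than from user input.

```csharp
using System;
using System.Linq;

public static class InsertBuilder
{
    // Hypothetical helper: builds
    // "INSERT INTO [t] ([a], [b]) VALUES (@p0, @p1);" for the given columns.
    public static string Build(string table, string[] columns)
    {
        if (columns == null || columns.Length == 0)
        {
            throw new ArgumentException("At least one column is required", "columns");
        }

        // One bracketed column name and one placeholder per column.
        string columnList = string.Join(", ", columns.Select(c => "[" + c + "]"));
        string paramList = string.Join(", ", columns.Select((c, i) => "@p" + i));

        return "INSERT INTO [" + table + "] (" + columnList + ") VALUES (" + paramList + ");";
    }
}
```

For the table in the question, the columns would be name, address, city, zipcode, phone, fax and device. The resulting text can be used as the command text of a SqlCommand created with an open SqlConnection, with each csv[i] bound through command.Parameters.AddWithValue("@p" + i, csv[i]) before calling ExecuteNonQuery for each record.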
I am new to C# and to programming in general. I am trying to read the contents of a txt file and load them to an arraylist. I can't figure out what condition to use in my while loop.
void LoadArrayList()
{
TextReader tr;
tr = File.OpenText("C:\\Users\\Maattt\\Documents\\Visual Studio 2010\\Projects\\actor\\actors.txt");
string Actor;
while (ActorArrayList != null)
{
Actor = tr.ReadLine();
if (Actor == null)
{
break;
}
ActorArrayList.Add(Actor);
}
}
void LoadArrayList()
{
TextReader tr;
tr = File.OpenText("C:\\Users\\Maattt\\Documents\\Visual Studio 2010\\Projects\\actor\\actors.txt");
string Actor;
Actor = tr.ReadLine();
while (Actor != null)
{
ActorArrayList.Add(Actor);
Actor = tr.ReadLine();
}
}
You can do it with just 2 lines of code
string[] Actor = File.ReadAllLines("C:\\Users\\Maattt\\Documents\\Visual Studio 2010\\Projects\\actor\\actors.txt");
ArrayList list = new ArrayList(Actor);
This is how it should be
void LoadArrayList()
{
string[] lines = System.IO.File.ReadAllLines(@"C:\Users\Maattt\Documents\Visual Studio 2010\Projects\actor\actors.txt");
// Add every line of the file to the list.
foreach (string Actor in lines)
{
ActorArrayList.Add(Actor);
}
}
Just rearrange it like this:
Actor = tr.ReadLine();
while (Actor != null)
{
ActorArrayList.Add(Actor);
Actor = tr.ReadLine();
}
If you look at the documentation for the TextReader.ReadLine method, you'll see that it returns either a string, or null if there are no more lines. So, what you can do is loop and check null against the results of the ReadLine method.
while(tr.ReadLine() != null)
{
// We know there are more items to read
}
With the above, though, you're not capturing the result of ReadLine. So you need to declare a string to capture the result and to use inside the while loop:
string line;
while((line = tr.ReadLine()) != null)
{
ActorArrayList.Add(line);
}
Also, I would suggest using a generic list, such as List<T> instead of the non-generic ArrayList. Using something like List<T> gives you more type safety and reduces the possibility of invalid assignments or casts.
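A minimal sketch of the same loop with List<string>, assuming the actors are plain strings. The ActorLoader class and its Load method are hypothetical names, and the method takes a TextReader so it can be tried without a file on disk.

```csharp
using System.Collections.Generic;
using System.IO;

public static class ActorLoader
{
    // Reads every line from the reader into a strongly typed list.
    public static List<string> Load(TextReader reader)
    {
        var actors = new List<string>();
        string line;
        // ReadLine returns null once there are no more lines.
        while ((line = reader.ReadLine()) != null)
        {
            actors.Add(line);
        }
        return actors;
    }
}
```

With a file, this could be called as `using (TextReader tr = File.OpenText(path)) { var actors = ActorLoader.Load(tr); }`, where the using block also closes the reader, something the original LoadArrayList never does.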