I have a function that reads a CSV file and returns a list of objects whose parameters depend on the content of the CSV. Right now it works if I hardcode one object type, but I would like to return different object types.
public static List<CSVObject> ImportCsvIntoObject(string csvFile, string delimiter)
{
List<CSVObject> list = new List<CSVObject>();
using (TextFieldParser csvReader = new TextFieldParser(csvFile))
{
csvReader.SetDelimiters(new String[] { delimiter });
csvReader.HasFieldsEnclosedInQuotes = true;
//Parse the file and creates a list of CSVObject
//example with a csv file with 3 columns
while (!csvReader.EndOfData)
{
string[] fieldData = csvReader.ReadFields();
string parameter1 = fieldData[0];
string parameter2 = fieldData[1];
string parameter3 = fieldData[2];
CSVObject example = new CSVObject(parameter1, parameter2, parameter3);
list.Add(example);
}
}
return list;
}
The following solution works, but I'm not sure whether there is a better way to do this.
public static List<Object> ImportCsvIntoList(string csvFile, string delimiter, Type type)
{
List<Object> list = new List<Object>();
using (TextFieldParser csvReader = new TextFieldParser(csvFile))
{
csvReader.SetDelimiters(new String[] { delimiter });
csvReader.HasFieldsEnclosedInQuotes = true;
while (!csvReader.EndOfData)
{
string[] fieldData = csvReader.ReadFields();
string parameter1 = fieldData[0];
string parameter2 = fieldData[1];
string parameter3 = fieldData[2];
var example = Activator.CreateInstance(type, parameter1, parameter2, parameter3);
list.Add(example);
}
}
return list;
}
Furthermore, right now it only works with a hardcoded number of parameters. Unfortunately, my objects all have a different number of parameters. How can I call Activator.CreateInstance with a varying number of parameters?
This is my first question, so sorry if it isn't written properly; suggestions for improvement are more than welcome.
The following might work for you, using generics and delegates:
public static List<T> ImportCsvIntoObject<T>(string csvFile, string delimiter, Func<List<string>, T> createObject)
{
List<T> list = new List<T>();
using (TextFieldParser csvReader = new TextFieldParser(csvFile))
{
csvReader.SetDelimiters(new String[] { delimiter });
csvReader.HasFieldsEnclosedInQuotes = true;
//Parse the file and create a list of T
while (!csvReader.EndOfData)
{
string[] fieldData = csvReader.ReadFields();
//The delegate decides how a row of fields becomes a T
T example = createObject(fieldData.ToList());
list.Add(example);
}
}
return list;
}
And you would call it like this:
List<CSVObject> objectList = ImportCsvIntoObject("csvData", ",", list => new CSVObject(list[0], list[1], list[2]));
Activator.CreateInstance() can take an array of arguments, so you don't need to know how many there are before runtime. As you read your CSV, you build an array matching the number of parameters needed for that particular object (fortunately, your fieldData array already does this).
So it could look like this:
string[] fieldData = csvReader.ReadFields();
var example = Activator.CreateInstance(type, fieldData);
list.Add(example);
This works because Activator.CreateInstance declares its constructor arguments with the params keyword.
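As a minimal sketch of the idea above, the following builds an object from a row of fields without knowing the field count at compile time. The Person class and the CsvActivator helper are hypothetical names used only for illustration; the fields are copied into an object[] first so that they are unambiguously passed as the argument array.

```csharp
using System;

// Hypothetical type standing in for one of the CSV-backed classes.
public class Person
{
    public string FirstName { get; }
    public string LastName { get; }

    public Person(string firstName, string lastName)
    {
        FirstName = firstName;
        LastName = lastName;
    }
}

public static class CsvActivator
{
    // Builds an instance of 'type' from one CSV row, one field per constructor argument.
    public static object CreateFromFields(Type type, string[] fieldData)
    {
        // Copy into an object[] so the fields are passed as the args array itself.
        object[] args = Array.ConvertAll<string, object>(fieldData, f => f);
        return Activator.CreateInstance(type, args);
    }
}
```

Note that every argument is a string here, so the target type needs a public constructor taking exactly that many string parameters; otherwise Activator.CreateInstance throws a MissingMethodException.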
I've got several text files which should be tab delimited, but actually are delimited by an arbitrary number of spaces. I want to parse the rows from the text file into a DataTable (the first row of the text file has headers for property names). This got me thinking about building an extensible, easy way to parse text files. Here's my current working solution:
string filePath = @"C:\path\lowbirthweight.txt";
//regex to remove multiple spaces
Regex regex = new Regex(@"[ ]{2,}", RegexOptions.Compiled);
DataTable table = new DataTable();
var reader = ReadTextFile(filePath);
//headers in first row
var headers = reader.First();
//skip headers for data
var data = reader.Skip(1).ToArray();
//remove arbitrary spacing between column headers and table data
headers = regex.Replace(headers, @" ");
for (int i = 0; i < data.Length; i++)
{
data[i] = regex.Replace(data[i], @" ");
}
//make ready the DataTable, split resultant space-delimited string into array for column names
foreach (string columnName in headers.Split(' '))
{
table.Columns.Add(new DataColumn() { ColumnName = columnName });
}
foreach (var record in data)
{
//split into array for row values
table.Rows.Add(record.Split(' '));
}
//test prints correctly to the console
Console.WriteLine(table.Rows[0][2]);
}
static IEnumerable<string> ReadTextFile(string fileName)
{
using (var reader = new StreamReader(fileName))
{
while (!reader.EndOfStream)
{
yield return reader.ReadLine();
}
}
}
In my project I've already received several large (gig +) text files that are not in the format in which they are purported to be. So I can see myself having to write methods such as these with some regularity, albeit with a different regular expression each time. Is there a way to do something like
data = data.SmartRegex(x => x.AllowOneSpace), where I can use a regular expression to iterate over the collection of strings?
Is something like the following on the right track?
public static class SmartRegex
{
public static Expression AllowOneSpace(this List<string> data)
{
//no idea how to return an expression from a method
}
}
I'm not overly concerned with performance; I would just like to see how something like this works.
You should consult with your data source and find out why your data is bad.
As for the API design that you are trying to implement:
public class RegexCollection
{
private readonly Regex _allowOneSpace = new Regex(" ");
public Regex AllowOneSpace { get { return _allowOneSpace; } }
}
public static class RegexExtensions
{
public static IEnumerable<string[]> SmartRegex(
this IEnumerable<string> collection,
Func<RegexCollection, Regex> selector
)
{
var regexCollection = new RegexCollection();
var regex = selector(regexCollection);
return collection.Select(l => regex.Split(l));
}
}
Usage:
var items = new List<string> { "Hello world", "Goodbye world" };
var results = items.SmartRegex(x => x.AllowOneSpace);
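For the files described in the question, where columns are separated by an arbitrary number of spaces, a simpler variant is to split directly on runs of whitespace instead of first collapsing them. This is only a sketch under that assumption; WhitespaceSplitExtensions and SplitOnWhitespace are illustrative names, not part of the answer above.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WhitespaceSplitExtensions
{
    // One or more whitespace characters (spaces or tabs) count as a single delimiter.
    private static readonly Regex Whitespace = new Regex(@"\s+", RegexOptions.Compiled);

    // Lazily splits each line into fields; Trim avoids empty first/last fields.
    public static IEnumerable<string[]> SplitOnWhitespace(this IEnumerable<string> lines)
    {
        return lines.Select(line => Whitespace.Split(line.Trim()));
    }
}
```

Because the result is evaluated lazily, it can be chained onto a line-by-line reader such as ReadTextFile without loading a large file into memory. It does split values that themselves contain spaces, so it only fits data where fields never do.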
Take a look and let me know what the hell I'm derping on ;)
[HttpPost]
public ActionResult Upload(HttpPostedFileBase File)
{
HttpPostedFileBase csvFile = Request.Files["adGroupCSV"];
byte[] buffer = new byte[csvFile.ContentLength];
csvFile.InputStream.Read(buffer, 0, csvFile.ContentLength);
string csvString = System.Text.Encoding.UTF8.GetString(buffer);
string[] lines = Regex.Split(csvString, "\r");
List<string[]> csv = new List<string[]>();
foreach (string line in lines)
{
csv.Add(line.Split(','));
}
string json = new System.Web.Script.Serialization.JavaScriptSerializer().Serialize(csv);
ViewData["CSV"] = json;
return View(ViewData);
}
This is how it is coming across:
json = "[[\"Col1\",\"Col2\",\"Col3\",\"Col4\",\"Col5\",\"Col6\"],[\"test\",\"test\",\"test\",\"test\",\"http://www.test.com/\",\"test/\"],[\"test\",\"test\",\"test\",\"test\",\"http://www.test.com...
This is how I want it:
{"Col1":"test","Col2":"test","Col3":"test","Col4":"test","Col5":"http://www.test.com/","Col6":"test/"}
Here is an example of the CSV
Col1,Col2,Col3,Col4,Col5,Col6
Test1,Test1,Test1,test1,test.test,test/
Test2,Test2,Test2,test2,test.test,test/
Test3,Test3,Test3,test3,test.test,test/
Test4,Test4,Test4,test4,test.test,test/
You need a dictionary. Just replace
List<string[]> csv = new List<string[]>();
foreach (string line in lines)
{
csv.Add(line.Split(','));
}
with
var csv = lines.Select(l => l.Split(',')
.Select((s,i)=>new {s,i})
.ToDictionary(x=>"Col" + (x.i+1), x=>x.s));
It should work...
EDIT
var lines = csvString.Split(new char[] { '\n', '\r' }, StringSplitOptions.RemoveEmptyEntries);
var cols = lines[0].Split(',');
var csv = lines.Skip(1)
.Select(l => l.Split(',')
.Select((s, i) => new {s,i})
.ToDictionary(x=>cols[x.i],x=>x.s));
var json = new JavaScriptSerializer().Serialize(csv);
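The header-keyed version above can be sketched as a standalone helper that stops before serialization, which makes the row shape easy to inspect. CsvRows and ToDictionaries are hypothetical names, and the sketch assumes plain CSV with no quoted fields or embedded commas.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CsvRows
{
    // Turns CSV text with a header row into one dictionary per data row.
    // Assumes no quoted fields, no embedded commas, and unique header names.
    public static List<Dictionary<string, string>> ToDictionaries(string csvText)
    {
        string[] lines = csvText.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        if (lines.Length == 0) return new List<Dictionary<string, string>>();

        string[] headers = lines[0].Split(',');
        return lines.Skip(1)
            .Select(line => line.Split(','))
            // Zip pairs each header with its value and stops at the shorter sequence,
            // so a row with missing fields does not throw.
            .Select(fields => headers
                .Zip(fields, (header, value) => new { header, value })
                .ToDictionary(p => p.header, p => p.value))
            .ToList();
    }
}
```

Passing the resulting list to the serializer should give a JSON array with one {"Col1":...} object per CSV row, which matches the shape asked for in the question.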
It looks like you have an array of string arrays, when what you want is one object per row with all your columns as properties on it.
Instead of building up your
List<string[]> csv = new List<string[]>();
you could define a class that matches your JSON and fill a list of those, like this:
public class UploadedFileObject
{
public string Col1 { get; set; }
public string Col2 { get; set; }
public string Col3 { get; set; }
public string Col4 { get; set; }
public string Col5 { get; set; }
public string Col6 { get; set; }
}
[HttpPost] public ActionResult Upload(HttpPostedFileBase File)
{
HttpPostedFileBase csvFile = Request.Files["adGroupCSV"];
byte[] buffer = new byte[csvFile.ContentLength];
csvFile.InputStream.Read(buffer, 0, csvFile.ContentLength);
string csvString = System.Text.Encoding.UTF8.GetString(buffer);
string[] lines = Regex.Split(csvString, "\r");
List<UploadedFileObject> returnObject = new List<UploadedFileObject>();
foreach (string line in lines)
{
String[] lineParts = line.Split(',');
UploadedFileObject lineObject = new UploadedFileObject();
lineObject.Col1 = lineParts[0];
lineObject.Col2 = lineParts[1];
lineObject.Col3 = lineParts[2];
lineObject.Col4 = lineParts[3];
lineObject.Col5 = lineParts[4];
lineObject.Col6 = lineParts[5];
returnObject.Add(lineObject);
}
string json = new System.Web.Script.Serialization.JavaScriptSerializer().Serialize(returnObject);
ViewData["CSV"] = json;
return View(ViewData);
}
In line with the previous answer, perhaps you could use ServiceStack to deserialize the CSV into an array of objects, then re-serialize it using ServiceStack's JSON serializer?
http://www.servicestack.net/docs/text-serializers/json-csv-jsv-serializers
You need to ensure you are using the format that the JavaScriptSerializer expects you to.
Since you are giving it a List<string[]> it is formatting it as:
"[[list1Item1,list1Item2...],[list2Item1,list2Item2...]]"
Which would correlate in your file as row1 -> list1 etc.
However what you want is the first item from the first list, lining up with the first item in the second list and so on.
I don't know about the exact workings of JavaScriptSerializer, but you could try giving it a Dictionary instead of a List<string[]>, assuming you only had one line of data (two lines total).
This would involve caching the first two lines and doing the following:
for (int i = 0; i < first.Length; i++)
{
dict.Add(first[i],second[i]);
}
I have a text file that looks like this:
John,Gauthier,blue,May
Henry,Ford,Red,June
James,Bond,Orange,December
I want to split it into a two-dimensional string array so I can separate each line and then each word. For example:
mystring[0][0] = "John"
mystring[1][3] = "June"
mystring[2][2] = "Orange"
Here's what I have right now:
string[] words = new string [100];
System.IO.StreamReader myfile = new System.IO.StreamReader("c:\\myfile.csv");
while (myfile.Peek() != -1)
{
i++;
words = myfile.ReadLine().Split(',');
}
I'm stuck. I'm able to split it into a one-dimensional string array but not into a two-dimensional one. I guess I need to split it twice, first on '\n' and then on ',', and then put the two together.
This is actually a one-liner:
File.ReadLines("myfilename.txt").Select(s=>s.Split(',')).ToArray()
Since this is a beginner question, here's what's going on:
File.ReadLines(filename) returns a collection of all lines in your text file
.Select is an extension method that takes a function
s=>s.Split(',') is the function, it splits the string s by all commas and returns an array of strings.
.ToArray() takes the collection of string arrays created by .Select and makes an array out of that, so you get array of arrays.
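To make the shape of the result concrete, here is a small sketch of the same Select/Split/ToArray chain applied to in-memory lines instead of a file. JaggedCsv and Parse are illustrative names only.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class JaggedCsv
{
    // Splits every line on commas; the result is a jagged array (string[][]),
    // i.e. an array of rows where each row is its own string[].
    public static string[][] Parse(IEnumerable<string> lines)
    {
        return lines.Select(line => line.Split(',')).ToArray();
    }
}
```

With the sample data from the question, Parse(File.ReadLines(path)) can be indexed as result[row][column]. The result is a jagged array rather than a rectangular string[,], so rows may have different lengths.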
Try this
var str = File.ReadAllText("myfile.csv");
var arr = str.Split(new string[] {Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries);
var multi = arr.Select(x => x.Split(',')).ToArray();
Try:
var First = new string [100];
var Sec = new string [100];
int i = 0;
System.IO.StreamReader myfile = new System.IO.StreamReader("c:\\myfile.csv");
while (myfile.Peek() != -1)
{
var buff = myfile.ReadLine().Split(',');
First[i] = buff[0];
Sec[i] = buff[1];
i++;
}
Another idea: use an XML serializer to serialize your whole object. Two extension methods for this:
public static void SaveAsXML(this Object A, string FileName)
{
var serializer = new XmlSerializer(A.GetType());
using (var textWriter = new StreamWriter(FileName))
{
serializer.Serialize(textWriter, A);
}
}
// Returns the loaded object; assigning to the 'this' parameter would not reach the caller
public static T LoadFromXML<T>(this T A, string FileName)
{
if (!File.Exists(FileName))
{
return A;
}
using (TextReader textReader = new StreamReader(FileName))
{
XmlSerializer deserializer = new XmlSerializer(A.GetType());
return (T)deserializer.Deserialize(textReader);
}
}
Add these to any static class and call:
YourSaveClassWhitchContainsYourArray.SaveAsXML("Datastore.xml");
or
YourSaveClassWhitchContainsYourArray = YourSaveClassWhitchContainsYourArray.LoadFromXML("Datastore.xml");
Answer Summary:
Solved this problem using Jon Skeet's answer below. Here is the finished code:
public static CSVData CreateCSVData(List<RegDataDisplay> rList,
string[] selectors)
{
CSVData csv = new CSVData(); // Create the CSVData object
foreach(string selector in selectors)
{
// Get the PropertyInfo for the property whose name
// is the value of selector
var property = typeof(RegDataDisplay).GetProperty(selector);
// Use LINQ to get a list of the values for the specified property
// of each RegDataDisplay object in the supplied list.
var values = rList.Select(row => property.GetValue(row, null)
.ToString());
// Create a new list with the property name to use as a header
List<string> templs = new List<string>(){selector};
// Add the returned values after the header
templs.AddRange(values);
// Add this list as a column for the CSVData object.
csv.Columns.Add(templs);
}
return csv;
}
Question
I am building my SQL query dynamically from user input, and then exporting the results to a CSV file. I have a class called RegDataDisplay which has a property for each of the possible columns returned by my query. I can tell what columns are being selected but in my CSV creator I need to be able to only output those specific columns.
In the example below, all of the data I have retrieved is in rList, and the names of the properties I need are in selectors. So I want to iterate through the list and then add only the properties I need to my CSV data.
public static CSVData CreateCSVData(List<RegDataDisplay> rList, string[] selectors)
{
CSVData csv = new CSVData();
for(int i = 0; i < selectors.Length; i++)
{
csv.Columns.Add(new List<string>(){selectors[i]});
}
// So now I have the headers for the CSV columns,
// I need the specific properties only which is where I'm stuck
for(int i = 0; i < selectors.Length; i++)
{
for(int j = 0; j < rList.Count; j++)
{
// If it was javascript I would do something like this
csv.Columns[i].Add(rList[j][selectors[i]]);
}
}
}
Thanks
EDIT: On the right track now but I'm coming up against an error "Object does not match target type".
public static CSVData CreateCSVData()
{
// I've created a test method with test data
string[] selectors = new string[] { "Firstname", "Lastname" };
List<RegDataDisplay> rList = new List<RegDataDisplay>();
RegDataDisplay rd = new RegDataDisplay();
rd.Firstname = "first";
rd.Lastname = "last";
rList.Add(rd);
CSVData csv = new CSVData();
foreach(string selector in selectors)
{
var property = typeof(RegDataDisplay).GetProperty(selector);
var values = rList.Select(row => property.GetValue(rList, null).ToString())
.ToList(); // Error throws here
csv.Columns.Add(values);
}
return csv;
}
Assuming you're on .NET 3.5 or higher, it sounds like you may want something like:
public static CSVData CreateCSVData(List<RegDataDisplay> rList,
string[] selectors)
{
CSVData csv = new CSVData();
foreach (string selector in selectors)
{
var prop = typeof(RegDataDisplay).GetProperty(selector);
var values = rList.Select(row => (string) prop.GetValue(row, null))
.ToList();
csv.Columns.Add(values);
}
return csv;
}
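The "Object does not match target type" error in the question's edit most likely comes from passing rList, rather than row, as the target of GetValue: the property belongs to RegDataDisplay, not to the list. The sketch below shows the per-row lookup in isolation. Row is a hypothetical stand-in for RegDataDisplay, and ColumnReader is an illustrative name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical stand-in for RegDataDisplay with two string properties.
public class Row
{
    public string Firstname { get; set; }
    public string Lastname { get; set; }
}

public static class ColumnReader
{
    // Returns the named property's value for every row, as strings.
    public static List<string> ReadColumn(List<Row> rows, string propertyName)
    {
        PropertyInfo property = typeof(Row).GetProperty(propertyName);
        if (property == null)
            throw new ArgumentException("No such property: " + propertyName);

        // Pass each row (not the list) as the target of GetValue.
        // Convert.ToString turns a null property value into an empty string.
        return rows.Select(row => Convert.ToString(property.GetValue(row, null))).ToList();
    }
}
```

Checking for a null PropertyInfo up front gives a clearer error than the NullReferenceException that would otherwise occur when a selector does not match any property name.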
Possible Duplicate:
Remove duplicates from a List<T> in C#
I have a list like the one below (a very large e-mail list):
Source list:
item 0 : jumper@yahoo.com|32432
item 1 : goodzila@yahoo.com|32432|test23
item 2 : alibaba@yahoo.com|32432|test65
item 3 : blabla@yahoo.com|32432|test32
The important part of each item is the e-mail address; the other parts (separated by pipes) are not important, but I want to keep them in the final list.
As I said, my list is too big, and I think it's not advisable to use another list.
How can I remove duplicate e-mails (the entire item) from that list without using LINQ?
My code is below:
private void WorkOnFile(UploadedFile file, string filePath)
{
File.SetAttributes(filePath, FileAttributes.Archive);
FileSecurity fSecurity = File.GetAccessControl(filePath);
fSecurity.AddAccessRule(new FileSystemAccessRule(@"Everyone",
FileSystemRights.FullControl,
AccessControlType.Allow));
File.SetAccessControl(filePath, fSecurity);
string[] lines = File.ReadAllLines(filePath);
List<string> list_lines = new List<string>(lines);
var new_lines = list_lines.Select(line => string.Join("|", line.Split(new string[] { " " }, StringSplitOptions.RemoveEmptyEntries)));
List<string> new_list_lines = new List<string>(new_lines);
int Duplicate_Count = 0;
RemoveDuplicates(ref new_list_lines, ref Duplicate_Count);
File.WriteAllLines(filePath, new_list_lines.ToArray());
}
private void RemoveDuplicates(ref List<string> list_lines, ref int Duplicate_Count)
{
char[] splitter = { '|' };
list_lines.ForEach(delegate(string line)
{
// ??
});
}
EDIT:
Some duplicate e-mail addresses in that list have different trailing parts. What can I do about them? For example:
goodzila@yahoo.com|32432|test23
and
goodzila@yahoo.com|asdsa|324234
Thanks in advance.
Say you have a list of possible duplicates:
List<string> emailList ....
Then the unique list is the set built from that list:
HashSet<string> unique = new HashSet<string>(emailList);
private void RemoveDuplicates(ref List<string> list_lines, ref int Duplicate_Count)
{
Duplicate_Count = 0;
List<string> list_lines2 = new List<string>();
HashSet<string> hash = new HashSet<string>();
foreach (string line in list_lines)
{
string[] split = line.Split('|');
string firstPart = split.Length > 0 ? split[0] : string.Empty;
if (hash.Add(firstPart))
{
list_lines2.Add(line);
}
else
{
Duplicate_Count++;
}
}
list_lines = list_lines2;
}
The easiest thing to do is to iterate through the lines in the file and add them to a HashSet. A HashSet won't insert duplicate entries, and it won't throw an exception for them either, so at the end you'll have a unique list of items.
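Since the lines in the question differ after the e-mail address, the set has to be keyed on the address alone rather than on the whole line. Here is a minimal sketch of that, without LINQ. EmailDedup and KeepFirstPerEmail are illustrative names, and treating addresses that differ only by case as duplicates is an assumption, not something stated in the question.

```csharp
using System;
using System.Collections.Generic;

public static class EmailDedup
{
    // Keeps the first line seen for each e-mail address (the text before the first '|').
    public static List<string> KeepFirstPerEmail(IEnumerable<string> lines, out int duplicateCount)
    {
        duplicateCount = 0;
        // Assumption: addresses that differ only by letter case are the same address.
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var result = new List<string>();

        foreach (string line in lines)
        {
            int pipe = line.IndexOf('|');
            string email = pipe >= 0 ? line.Substring(0, pipe) : line;

            // HashSet.Add returns false when the key is already present.
            if (seen.Add(email))
                result.Add(line);
            else
                duplicateCount++;
        }
        return result;
    }
}
```

This keeps the first occurrence of each address and drops later ones, whatever their other parts contain; a different rule (for example, keeping the last occurrence) would need a dictionary keyed on the address instead.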
1 - Get rid of your pipe-separated string (create a DTO class corresponding to the data it represents).
2 - Which rule do you want to apply to choose between two objects with the same id?
Or maybe this code can be useful for you :)
It uses the same method as the one in @xanatos's answer:
string[] lines = File.ReadAllLines(filePath);
Dictionary<string, string> items = new Dictionary<string, string>();
foreach (var line in lines)
{
//the e-mail address is the first pipe-separated part
var key = line.Split('|')[0];
if (!items.ContainsKey(key))
items.Add(key, line);
}
List<string> list_lines = new List<string>(items.Values);
First, I suggest you load the file via a stream.
Then, create a type that represents your rows and load them into a HashSet (for performance reasons).
Look (I've removed some of your code to keep it simple):
public struct LineType
{
public string Email { get; set; }
public string Others { get; set; }
public override bool Equals(object obj)
{
return obj is LineType && this.Email.Equals(((LineType)obj).Email);
}
// Must agree with Equals so that the HashSet can detect duplicates
public override int GetHashCode()
{
return this.Email.GetHashCode();
}
}
private static void WorkOnFile(string filePath)
{
HashSet<LineType> hashSet = new HashSet<LineType>();
using (StreamReader stream = File.OpenText(filePath))
{
while (true)
{
string line = stream.ReadLine();
if (line == null)
break;
string new_line = string.Join("|", line.Split(new string[] { " " }, StringSplitOptions.RemoveEmptyEntries));
LineType lineType = new LineType()
{
// The e-mail address is the first pipe-separated part
Email = new_line.Split('|')[0],
Others = new_line
};
// Add ignores items that are already in the set
hashSet.Add(lineType);
}
}
}