C#: how to write JSON with enumerated identifiers

I am writing code to convert Excel to JSON (so far it works).
But I have a problem: I need to number each line that I am writing after the word Match_ (i.e. Match_1, Match_2, Match_3).
If you look towards the end of the code, I tried putting a for loop there, but then it mangles every occurrence instead of numbering them one by one.
How can I use the Replace method so that each occurrence of Match_ gets its corresponding number?
IP is another string I am adding to the value; ignore it.
row[0] is the text taken as-is from the Excel row.
Match_ is not a variable, it is literal text; I could just as well write Oded_ there, and then it would write Oded_ = (IP string) + (Excel text from row[0]).
Match_ is text I am trying to replace within the output, as I cannot use a for loop inside the LINQ query.
using (var conn = new OleDbConnection(connectionString))
{
    conn.Open();
    var cmd = conn.CreateCommand();
    cmd.CommandText = $"SELECT * FROM [{sheetName}$]";
    using (var rdr = cmd.ExecuteReader())
    {
        if (rdr != null)
        {
            // LINQ query - when executed will create anonymous objects for each row
            var query = rdr.Cast<DbDataRecord>().Select(row => new
            {
                Match_ = IP + row[0]
            });
            // Generates JSON from the LINQ query
            var json = JsonConvert.SerializeObject(query);
            // Write the file to the destination path
            for (int i = 1; i < 200; i++)
            {
                json = json.Replace("Match_", "Match_" + i);
            }
            File.WriteAllText(destinationPath, json);
        }
    }
}

So, after it is assigned, query is an IEnumerable<> of your anonymous type that will contain 0 to many rows. Those rows are not actually evaluated yet. The important thing to remember is that you are creating an anonymous type, not an ad-hoc object, so all rows of your result must have the same shape; you can't vary the property name from one row to the next.
There are many ways to achieve what you want, but possibly the most expedient is to include the iterator index in your Select, then return a JObject, something like this:
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
...
var query = rdr.Cast<DbDataRecord>().Select((row, i) =>
{
    var result = new JObject();
    // Select's index is zero-based, so add 1 to start the keys at match_1
    result.Add($"match_{i + 1}", IP + row[0]);
    return result;
});
Then you won't have to do any error-prone and costly string manipulation on your JSON; it will already be formatted correctly.
Here is a full working example of this in action:
using System;
using System.Linq;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public class Program
{
    public static void Main()
    {
        var query = Enumerable
            .Range(1, 5)
            .Select((n, i) =>
            {
                var result = new JObject();
                // i is zero-based; add 1 so the keys run match_1 .. match_5
                result.Add($"match_{i + 1}", n);
                return result;
            });
        Console.WriteLine(
            JsonConvert.SerializeObject(
                query,
                Formatting.Indented));
    }
}
It is possible to do this with the more modern System.Text.Json but you'll have to embed the work in a writer.
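For instance, here is a minimal sketch with Utf8JsonWriter; the rows array is a hypothetical stand-in for the concatenated IP + row[0] values:
using System.IO;
using System.Text.Json;

// Hypothetical input standing in for the data-reader results
var rows = new[] { "10.0.0.1 foo", "10.0.0.2 bar" };
using (var stream = File.Create(destinationPath))
using (var writer = new Utf8JsonWriter(stream, new JsonWriterOptions { Indented = true }))
{
    writer.WriteStartArray();
    for (int i = 0; i < rows.Length; i++)
    {
        writer.WriteStartObject();
        writer.WriteString($"match_{i + 1}", rows[i]); // one numbered key per row
        writer.WriteEndObject();
    }
    writer.WriteEndArray();
}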

Try regex.
class Program
{
    int i = 0;

    static void Main(string[] args)
    {
        string json = "match_ abc match_ def match_ hijmatch_";
        string pattern = "match_";
        Program p = new Program();
        MatchEvaluator myEvaluator = new MatchEvaluator(p.ReplaceCC);
        Regex r = new Regex(pattern);
        string output = r.Replace(json, myEvaluator);
        Console.WriteLine(output); // match_1 abc match_2 def match_3 hijmatch_4
    }

    // Replace each regex match with the match text plus the number of the occurrence.
    public string ReplaceCC(Match m)
    {
        i++;
        return m.Value + i.ToString();
    }
}
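The same idea reads more compactly with a lambda as the MatchEvaluator and a captured counter; a small sketch (Regex.Replace invokes the evaluator once per match, left to right):
int i = 0;
string output = Regex.Replace(json, "match_", m => m.Value + (++i));
// "match_ abc match_ def ..." becomes "match_1 abc match_2 def ..."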

Related

Serialize dynamic list to CSV without header in Servicestack.Text

I'm trying to generate a csv file using CsvSerializer.SerializeToCsv(data), but I want to omit the headers.
I read this question, but this is not working as I'm using a list of dynamic objects.
I've tried:
IEnumerable<dynamic> data = ...;
CsvConfig<object>.OmitHeaders = true;
string csvFile = CsvSerializer.SerializeToCsv(data);
And
IEnumerable<dynamic> data = ...;
CsvConfig<dynamic>.OmitHeaders = true;
string csvFile = CsvSerializer.SerializeToCsv(data);
Both options are serializing the csvFile with headers, which I don't need.
Since I didn't find a way with a library, I opted to do this manually. Something like this worked for me:
var parsedData = new List<string>();
// Parse data into comma-separated lines
parsedData.AddRange(data.Select(d =>
{
    var dProperties = (IDictionary<string, object>)d;
    var valuesFixed = dProperties.Values.Select(v => v.ToString().ToRFC4180String());
    return string.Join(",", valuesFixed);
}));
var file = string.Join("\r\n", parsedData);
Where ToRFC4180String is just an extension method that handles special characters based on the RFC 4180 standard.
public static string ToRFC4180String(this string value)
{
    if (value.Contains("\""))
        value = value.Replace("\"", "\"\"");

    if (value.Contains("\"")
        || value.Contains("\n")
        || value.Contains("\r")
        || value.Contains(","))
        return $"\"{value}\"";

    return value;
}
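A quick usage sketch (assumption: the dynamic rows are dictionary-backed, e.g. ExpandoObject, which is what the IDictionary cast above relies on):
dynamic row = new System.Dynamic.ExpandoObject();
row.Name = "Tom \"T\" Smith";
row.City = "Oslo, Norway";
IEnumerable<dynamic> data = new object[] { row };
// The inner quote is doubled and both values get wrapped in quotes:
// "Tom ""T"" Smith","Oslo, Norway"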

What is the best way to parse my file into my sql tables

I have a data file that is not a straightforward flat file, and I need to put it into SQL tables using C#. I am new to C# and not really sure how to go about this or which features of C# I should use, e.g. StreamReader, LINQ, anything else, or a combination.
I have tried a basic StreamReader and LINQ.
I have tried the below but am unsure how to cut the data down to what I need.
IEnumerable<string> strCSV = File.ReadLines(FilePath);
var results = from str in strCSV
              let n = str.Split(',')
              where !n[0].EndsWith("SYSWARN")
              select str;
List<string> lst = results.ToList();
Data file strings below (two rows of data):
2019:01:09:00:00:35:GMT: subject=BMRA.SYSTEM.FUELINST, message={TP=2019:01:09:00:00:00:GMT,SD=2019:01:08:00:00:00:GMT,SP=48,TS=2019:01:08:23:55:00:GMT,FT=INTIRL,FG=-441}
2019:01:09:00:00:35:GMT: subject=BMRA.SYSTEM.FUELINST, message={TP=2019:01:09:00:00:00:GMT,SD=2019:01:08:00:00:00:GMT,SP=48,TS=2019:01:08:23:55:00:GMT,FT=INTNED,FG=949}
I need it to look like the data below so it becomes comma-delimited, and then I would like to cut the data even further to get the specific records that I need.
2019:01:09:00:00:35:GMT: subject=BMRA.SYSTEM.FUELINST,TP=2019:01:09:00:00:00:GMT,SD=2019:01:08:00:00:00:GMT,SP=48,TS=2019:01:08:23:55:00:GMT,FT=INTIRL,FG=-441
2019:01:09:00:00:35:GMT: subject=BMRA.SYSTEM.FUELINST,TP=2019:01:09:00:00:00:GMT,SD=2019:01:08:00:00:00:GMT,SP=48,TS=2019:01:08:23:55:00:GMT,FT=INTNED,FG=949
Try regex:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Text.RegularExpressions;
using System.Globalization;

namespace ConsoleApplication125
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.txt";

        static void Main(string[] args)
        {
            string pattern1 = @"(?'time'.*):GMT: subject=(?'subject'[^,]+), message=\{(?'message'[^}]+)";
            StreamReader reader = new StreamReader(FILENAME);
            string line = "";
            while ((line = reader.ReadLine()) != null)
            {
                Match match = Regex.Match(line, pattern1);
                DateTime time = DateTime.ParseExact(match.Groups["time"].Value, "yyyy:MM:dd:HH:mm:ss", CultureInfo.InvariantCulture);
                string subject = match.Groups["subject"].Value;
                string message = match.Groups["message"].Value;
                // Split the message body into key/value pairs
                string pattern2 = @"(?'key'[^=]+)=(?'value'[^,]+),?";
                MatchCollection matches = Regex.Matches(message, pattern2);
                Dictionary<string, string> dict = matches.Cast<Match>()
                    .GroupBy(x => x.Groups["key"].Value, y => y.Groups["value"].Value)
                    .ToDictionary(x => x.Key, y => y.FirstOrDefault());
            }
        }
    }
}
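The dict then makes it easy to "cut the data even further", as the question asks. A hypothetical fragment that would sit at the end of the while loop body, keeping only the FT=INTIRL records:
// Hypothetical filter: keep only rows whose FT field is INTIRL
if (dict.TryGetValue("FT", out string ft) && ft == "INTIRL")
{
    Console.WriteLine($"{subject},{message}");
}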

Exporting MongoDB Documents to CSV in C#

I want to export a CSV table from the items of an IMongoCollection from MongoDB.Driver using C#.
How would I be able to do this efficiently? I was thinking of retrieving the documents from the collection and either converting them to a JSON-like format or using a StringBuilder to create the CSV file, using an array of PropertyInfo to access the fields of the retrieved object.
Can someone give an example of how I would be able to do this?
Seems like the obvious way is to get all the header data somehow (see further below), then iterate through the collection and build the strings by hand (which people don't encourage), writing to the file in batches if your collection is quite large.
HashSet<string> fields = new HashSet<string>();
// Populate fields with all unique field names; see below for examples how.
var result = database.GetCollection<BsonDocument>(collection).Find(new BsonDocument());
var csv = new StringBuilder();
string headerLine = string.Join(",", fields);
csv.AppendLine(headerLine);
foreach (var element in result.ToListAsync().Result)
{
    string line = null;
    foreach (var field in fields)
    {
        BsonValue value;
        if (field.Contains("."))
        {
            // GetNestedField is a helper (not shown) for dotted paths like "address.street"
            value = GetNestedField(element, field);
        }
        else
        {
            value = element.GetElement(field).Value;
        }
        // Example: deserialize to string
        switch (value.BsonType)
        {
            case BsonType.ObjectId:
            case BsonType.String:
                line = line + value.ToString();
                break;
            case BsonType.Int32:
                line = line + value.AsInt32.ToString();
                break;
        }
        line = line + ",";
    }
    csv.AppendLine(line);
}
File.WriteAllText("D:\\temp.csv", csv.ToString());
In the case of your own objects you'd have to use your own deserializer.
HOWEVER I'd recommend using the mongoexport tool if you can.
You could simply run the exe from your application, feeding in arguments as required. Keep in mind though, that it requires explicit fields.
ProcessStartInfo startInfo = new ProcessStartInfo();
startInfo.FileName = @"C:\mongodb\bin\mongoexport.exe";
startInfo.Arguments = @"-d testDB -c testCollection --type csv --fields name,address.street,address.zipCode --out .\output.csv";
startInfo.UseShellExecute = false;

Process exportProcess = new Process();
exportProcess.StartInfo = startInfo;
exportProcess.Start();
exportProcess.WaitForExit();
More on mongoexport such as paging, additional queries and field file:
https://docs.mongodb.com/manual/reference/program/mongoexport/
Getting Unique Field Names
In order to find ALL field names you could do this a number of ways. Using BsonDocument as a generic data example.
Recursively traverse through your IMongoCollection results. This is going to have to be through the entire collection, so performance may not be great.
Example:
HashSet<string> fields = new HashSet<string>();
var result = database.GetCollection<BsonDocument>(collection).Find(new BsonDocument());
foreach (var element in result.ToListAsync().Result)
{
    ProcessTree(fields, element, "");
}
private void ProcessTree(HashSet<string> fields, BsonDocument tree, string parentField)
{
    foreach (var field in tree)
    {
        string fieldName = field.Name;
        if (parentField != "")
        {
            fieldName = parentField + "." + fieldName;
        }
        if (field.Value.IsBsonDocument)
        {
            ProcessTree(fields, field.Value.ToBsonDocument(), fieldName);
        }
        else
        {
            fields.Add(fieldName);
        }
    }
}
Perform a MapReduce operation to return all fields. Scanning nested fields becomes more complex with this method, however. See this.
Example:
string map = @"function() {
    for (var key in this) { emit(key, null); }
}";
string reduce = @"function(key, stuff) { return null; }";
string finalize = @"function(key, value){
    return key;
}";
MapReduceOptions<BsonDocument, BsonValue> options = new MapReduceOptions<BsonDocument, BsonValue>();
options.Finalize = new BsonJavaScript(finalize);
var results = database.GetCollection<BsonDocument>(collection).MapReduceAsync(
    new BsonJavaScript(map),
    new BsonJavaScript(reduce),
    options).Result.ToListAsync().Result;
foreach (BsonValue result in results.Select(item => item["_id"]))
{
    Debug.WriteLine(result.AsString);
}
Perform an Aggregation operation. You'd need to unwind as many times as required to get all nested fields.
Example:
string[] pipeline = new string[3];
pipeline[0] = "{ '$project':{ 'arrayofkeyvalue':{ '$objectToArray':'$$ROOT'}}}";
pipeline[1] = "{ '$unwind':'$arrayofkeyvalue'}";
pipeline[2] = "{ '$group':{'_id':null,'fieldKeys':{'$addToSet':'$arrayofkeyvalue.k'}}}";
var stages = pipeline.Select(s => BsonDocument.Parse(s)).ToList();
var result = await database.GetCollection<BsonDocument>(collection).AggregateAsync<BsonDocument>(stages);
foreach (BsonValue fieldName in result.Single().GetElement("fieldKeys").Value.AsBsonArray)
{
    Debug.WriteLine(fieldName.AsString);
}
None of these are perfect, and I couldn't tell you which is most efficient, but hopefully something here helps.

Split string that includes multiline substrings into substrings [duplicate]

I'm writing a simple import application and need to read a CSV file, show result in a DataGrid and show corrupted lines of the CSV file in another grid. For example, show the lines that are shorter than 5 values in another grid. I'm trying to do that like this:
StreamReader sr = new StreamReader(FilePath);
importingData = new Account();
string line;
string[] row = new string[5];
while ((line = sr.ReadLine()) != null)
{
    row = line.Split(',');
    importingData.Add(new Transaction
    {
        Date = DateTime.Parse(row[0]),
        Reference = row[1],
        Description = row[2],
        Amount = decimal.Parse(row[3]),
        Category = (Category)Enum.Parse(typeof(Category), row[4])
    });
}
but it's very difficult to operate on arrays in this case. Is there a better way to split the values?
Don't reinvent the wheel. Take advantage of what's already in the .NET BCL:
add a reference to Microsoft.VisualBasic (yes, it says VisualBasic, but it works in C# just as well; remember that in the end it is all just IL)
use the Microsoft.VisualBasic.FileIO.TextFieldParser class to parse the CSV file
Here is the sample code:
using (TextFieldParser parser = new TextFieldParser(@"c:\temp\test.csv"))
{
    parser.TextFieldType = FieldType.Delimited;
    parser.SetDelimiters(",");
    while (!parser.EndOfData)
    {
        // Processing row
        string[] fields = parser.ReadFields();
        foreach (string field in fields)
        {
            // TODO: Process field
        }
    }
}
It works great for me in my C# projects.
Here are some more links and information:
MSDN: Read From Comma-Delimited Text Files in Visual Basic
MSDN: TextFieldParser Class
I recommend CsvHelper from NuGet.
PS: Regarding the other, more upvoted answers, I'm sorry, but adding a reference to Microsoft.VisualBasic is:
Ugly
Not cross-platform, because it's not available in .NET Core/.NET 5 (and Mono never had very good support for Visual Basic, so it may be buggy).
My experience is that there are many different CSV formats, especially in how they handle escaping of quotes and delimiters within a field.
These are the variants I have run into:
quotes are quoted and doubled (excel) i.e. 15" -> field1,"15""",field3
quotes are not changed unless the field is quoted for some other reason. i.e. 15" -> field1,15",fields3
quotes are escaped with \. i.e. 15" -> field1,"15\"",field3
quotes are not changed at all (this is not always possible to parse correctly)
delimiter is quoted (excel). i.e. a,b -> field1,"a,b",field3
delimiter is escaped with \. i.e. a,b -> field1,a\,b,field3
I have tried many of the existing CSV parsers, but there is not a single one that can handle the variants I have run into. It is also difficult to find out from the documentation which escaping variants a parser supports.
In my projects I now use either the VB TextFieldParser or a custom splitter.
Sometimes using a library is cool when you do not want to reinvent the wheel, but in this case the same job can be done with fewer lines of code that are easier to read compared to using a library.
Here is a different approach, which I find very easy to use.
In this example, I use StreamReader to read the file,
a Regex to detect the delimiter in each line,
and an array to collect the columns from index 0 to n.
using (StreamReader reader = new StreamReader(fileName))
{
    // Pattern: split on commas that sit outside quoted fields
    Regex CSVParser = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        // Separate columns into an array
        string[] X = CSVParser.Split(line);
        /* Do something with X */
    }
}
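To see what that pattern does: the lookahead only allows a split on a comma that is followed by an even number of quote characters, i.e. a comma outside any quoted field. A quick check (assuming balanced quotes on each line):
var csvParser = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");
string[] parts = csvParser.Split("field1,\"a,b\",field3");
// parts => field1 | "a,b" | field3  (the quoted comma is preserved)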
CSV can get complicated real fast.
Use something robust and well-tested:
FileHelpers:
www.filehelpers.net
The FileHelpers are a free and easy to use .NET library to import/export data from fixed length or delimited records in files, strings or streams.
Another one for this list: Cinchoo ETL, an open source library to read and write CSV files.
For the sample CSV file below:
Id, Name
1, Tom
2, Mark
You can quickly load it using the library as below:
using (var reader = new ChoCSVReader("test.csv").WithFirstLineHeader())
{
    foreach (dynamic item in reader)
    {
        Console.WriteLine(item.Id);
        Console.WriteLine(item.Name);
    }
}
If you have a POCO class matching the CSV file:
public class Employee
{
    public int Id { get; set; }
    public string Name { get; set; }
}
You can use it to load the CSV file as below
using (var reader = new ChoCSVReader<Employee>("test.csv").WithFirstLineHeader())
{
    foreach (var item in reader)
    {
        Console.WriteLine(item.Id);
        Console.WriteLine(item.Name);
    }
}
Please check out the articles at CodeProject on how to use it.
Disclaimer: I'm the author of this library
I use this here:
http://www.codeproject.com/KB/database/GenericParser.aspx
Last time I was looking for something like this I found it as an answer to this question.
private static DataTable ConvertCSVtoDataTable(string strFilePath)
{
    DataTable dt = new DataTable();
    using (StreamReader sr = new StreamReader(strFilePath))
    {
        string[] headers = sr.ReadLine().Split(',');
        foreach (string header in headers)
        {
            dt.Columns.Add(header);
        }
        while (!sr.EndOfStream)
        {
            string[] rows = sr.ReadLine().Split(',');
            DataRow dr = dt.NewRow();
            for (int i = 0; i < headers.Length; i++)
            {
                dr[i] = rows[i];
            }
            dt.Rows.Add(dr);
        }
    }
    return dt;
}
private static void WriteToDb(DataTable dt)
{
    string connectionString =
        "Data Source=localhost;" +
        "Initial Catalog=Northwind;" +
        "Integrated Security=SSPI;";
    using (SqlConnection con = new SqlConnection(connectionString))
    {
        using (SqlCommand cmd = new SqlCommand("spInsertTest", con))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            // Sample hardcoded values; in practice these would come from dt
            cmd.Parameters.Add("@policyID", SqlDbType.Int).Value = 12;
            cmd.Parameters.Add("@statecode", SqlDbType.VarChar).Value = "blagh2";
            cmd.Parameters.Add("@county", SqlDbType.VarChar).Value = "blagh3";
            con.Open();
            cmd.ExecuteNonQuery();
        }
    }
}
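Note that WriteToDb as posted inserts fixed sample values and never reads dt. A sketch of wiring the two together (an assumption: the DataTable's first three columns line up with the stored procedure's parameters):
// Assumes con is already open
foreach (DataRow row in dt.Rows)
{
    using (var cmd = new SqlCommand("spInsertTest", con) { CommandType = CommandType.StoredProcedure })
    {
        cmd.Parameters.Add("@policyID", SqlDbType.Int).Value = int.Parse((string)row[0]);
        cmd.Parameters.Add("@statecode", SqlDbType.VarChar).Value = (string)row[1];
        cmd.Parameters.Add("@county", SqlDbType.VarChar).Value = (string)row[2];
        cmd.ExecuteNonQuery();
    }
}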
Here's a solution I coded up today for a situation where I needed to parse a CSV without relying on external libraries. I haven't tested performance for large files since it wasn't relevant to my particular use case, but I'd expect it to perform reasonably well for most situations.
static List<List<string>> ParseCsv(string csv)
{
    var parsedCsv = new List<List<string>>();
    var row = new List<string>();
    string field = "";
    bool inQuotedField = false;
    for (int i = 0; i < csv.Length; i++)
    {
        char current = csv[i];
        char next = i == csv.Length - 1 ? ' ' : csv[i + 1];
        // Ordinary character, or any character other than a quote inside a quoted field:
        // just append it to the current field text
        if ((current != '"' && current != ',' && current != '\r' && current != '\n') || (current != '"' && inQuotedField))
        {
            field += current;
        }
        else if (current == '"')
        {
            if (inQuotedField && next == '"')
            { // a doubled quote escapes a quote within a quoted field
                i++; // skip the escaping quote
                field += current;
            }
            else
            { // opening or closing quote; a closed field is committed by the ',' or newline that follows
                inQuotedField = !inQuotedField;
            }
        }
        else if (current == ',')
        {
            row.Add(field);
            field = "";
        }
        else if (current == '\n')
        {
            row.Add(field);
            parsedCsv.Add(new List<string>(row));
            field = "";
            row.Clear();
        }
        // A bare '\r' outside a quoted field is ignored (handles CRLF line endings)
    }
    // Commit the final row if the input does not end with a newline
    if (field.Length > 0 || row.Count > 0)
    {
        row.Add(field);
        parsedCsv.Add(row);
    }
    return parsedCsv;
}
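A usage sketch (assuming a data.csv next to the executable):
var rows = ParseCsv(File.ReadAllText("data.csv"));
foreach (var row in rows)
    Console.WriteLine(string.Join(" | ", row));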
First of all, you need to understand what CSV is and how to write it.
Every subsequent line (\r\n) is the next "table" row.
"Table" cells are separated by some delimiter symbol; the most commonly used symbols are \t and ,.
Every cell may itself contain this delimiter symbol (in which case the cell must start and end with the quote symbol).
Every cell may contain \r\n symbols (in which case the cell must start and end with the quote symbol).
The easiest way for C#/Visual Basic to work with CSV files is to use the standard Microsoft.VisualBasic library. You just need to add the needed reference and the following using directive to your class:
using Microsoft.VisualBasic.FileIO;
Yes, you can use it in C#, don't worry. This library can read relatively big files and supports all of the needed rules, so you will be able to work with all CSV files.
Some time ago I wrote a simple class for CSV read/write based on this library. Using this simple class, you can work with CSV like with a two-dimensional array.
You can find my class by the following link:
https://github.com/ukushu/DataExporter
A simple example of use:
Csv csv = new Csv("\t"); // delimiter symbol
csv.FileOpen("c:\\file1.csv");
var row1Cell6Value = csv.Rows[0][5];
csv.AddRow("asdf", "asdffffff", "5");
csv.FileSave("c:\\file2.csv");
To complete the previous answers: one may need a collection of objects from a CSV file, either parsed by the TextFieldParser or by the string.Split method, with each line then converted to an object via reflection. You obviously first need to define a class that matches the lines of the CSV file.
I used the simple CSV serializer from Michael Kropat found here: Generic class to CSV (all properties)
and reused his methods to get the fields and properties of the desired class.
I deserialize my CSV file with the following method:
public static IEnumerable<T> ReadCsvFileTextFieldParser<T>(string fileFullPath, string delimiter = ";") where T : new()
{
    if (!File.Exists(fileFullPath))
    {
        return null;
    }

    var list = new List<T>();
    var csvFields = GetAllFieldOfClass<T>();
    var fieldDict = new Dictionary<int, MemberInfo>();

    using (TextFieldParser parser = new TextFieldParser(fileFullPath))
    {
        parser.SetDelimiters(delimiter);
        bool headerParsed = false;
        while (!parser.EndOfData)
        {
            // Processing row
            string[] rowFields = parser.ReadFields();
            if (!headerParsed)
            {
                // The first row shall be the header!
                for (int i = 0; i < rowFields.Length; i++)
                {
                    var csvField = csvFields.Where(f => f.Name == rowFields[i]).FirstOrDefault();
                    if (csvField != null)
                    {
                        fieldDict.Add(i, csvField);
                    }
                }
                headerParsed = true;
            }
            else
            {
                T newObj = new T();
                for (int i = 0; i < rowFields.Length; i++)
                {
                    var csvField = fieldDict[i];
                    var record = rowFields[i];
                    if (csvField is FieldInfo)
                    {
                        ((FieldInfo)csvField).SetValue(newObj, record);
                    }
                    else if (csvField is PropertyInfo)
                    {
                        var pi = (PropertyInfo)csvField;
                        pi.SetValue(newObj, Convert.ChangeType(record, pi.PropertyType), null);
                    }
                    else
                    {
                        throw new Exception("Unhandled case.");
                    }
                }
                if (newObj != null)
                {
                    list.Add(newObj);
                }
            }
        }
    }
    return list;
}

public static IEnumerable<MemberInfo> GetAllFieldOfClass<T>()
{
    return
        from mi in typeof(T).GetMembers(BindingFlags.Public | BindingFlags.Instance | BindingFlags.Static)
        where new[] { MemberTypes.Field, MemberTypes.Property }.Contains(mi.MemberType)
        let orderAttr = (ColumnOrderAttribute)Attribute.GetCustomAttribute(mi, typeof(ColumnOrderAttribute))
        orderby orderAttr == null ? int.MaxValue : orderAttr.Order, mi.Name
        select mi;
}
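Usage might look like this, with a hypothetical Person class whose member names match the CSV header (Convert.ChangeType handles simple conversions such as string to int):
public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

// people.csv starts with the header line "Name;Age"
var people = ReadCsvFileTextFieldParser<Person>(@"c:\temp\people.csv");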
I'd highly suggest using CsvHelper.
Here's a quick example:
public class csvExampleClass
{
    public string Id { get; set; }
    public string Firstname { get; set; }
    public string Lastname { get; set; }
}

var items = DeserializeCsvFile<csvExampleClass>(csvText);

public static List<T> DeserializeCsvFile<T>(string text)
{
    CsvReader csv = new CsvReader(new StringReader(text));
    csv.Configuration.Delimiter = ",";
    csv.Configuration.HeaderValidated = null;
    csv.Configuration.MissingFieldFound = null;
    // GetRecords<T>() is lazy; materialize it before the reader goes away
    return csv.GetRecords<T>().ToList();
}
Full documentation can be found at: https://joshclose.github.io/CsvHelper
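Note that the snippet above targets an older CsvHelper API; recent versions require a culture in the constructor. A hypothetical variant of the method, roughly:
using CsvHelper;
using System.Globalization;
using System.IO;
using System.Linq;

public static List<T> DeserializeCsv<T>(string text)
{
    using (var csv = new CsvReader(new StringReader(text), CultureInfo.InvariantCulture))
    {
        return csv.GetRecords<T>().ToList();
    }
}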

Regular Expression with Lambda Expression

I've got several text files which should be tab delimited, but actually are delimited by an arbitrary number of spaces. I want to parse the rows from the text file into a DataTable (the first row of the text file has headers for property names). This got me thinking about building an extensible, easy way to parse text files. Here's my current working solution:
string filePath = @"C:\path\lowbirthweight.txt";
// Regex to remove multiple spaces
Regex regex = new Regex(@"[ ]{2,}", RegexOptions.Compiled);
DataTable table = new DataTable();
var reader = ReadTextFile(filePath);
// Headers in first row
var headers = reader.First();
// Skip headers for data
var data = reader.Skip(1).ToArray();
// Remove arbitrary spacing between column headers and table data
headers = regex.Replace(headers, @" ");
for (int i = 0; i < data.Length; i++)
{
    data[i] = regex.Replace(data[i], @" ");
}
// Make ready the DataTable; split the resultant space-delimited string into an array for column names
foreach (string columnName in headers.Split(' '))
{
    table.Columns.Add(new DataColumn() { ColumnName = columnName });
}
foreach (var record in data)
{
    // Split into an array for row values
    table.Rows.Add(record.Split(' '));
}
// Test prints correctly to the console
Console.WriteLine(table.Rows[0][2]);

static IEnumerable<string> ReadTextFile(string fileName)
{
    using (var reader = new StreamReader(fileName))
    {
        while (!reader.EndOfStream)
        {
            yield return reader.ReadLine();
        }
    }
}
In my project I've already received several large (gig+) text files that are not in the format they purport to be, so I can see myself having to write methods like these with some regularity, albeit with a different regular expression each time. Is there a way to do something like
data = data.SmartRegex(x => x.AllowOneSpace), where I can use a regular expression to iterate over the collection of strings?
Is something like the following on the right track?
public static class SmartRegex
{
    public static Expression AllowOneSpace(this List<string> data)
    {
        // no idea how to return an expression from a method
    }
}
I'm not overly concerned with performance; I'd just like to see how something like this works.
You should consult with your data source and find out why your data is bad.
As for the API design that you are trying to implement:
public class RegexCollection
{
    private readonly Regex _allowOneSpace = new Regex(" ");
    public Regex AllowOneSpace { get { return _allowOneSpace; } }
}

public static class RegexExtensions
{
    public static IEnumerable<string[]> SmartRegex(
        this IEnumerable<string> collection,
        Func<RegexCollection, Regex> selector)
    {
        var regexCollection = new RegexCollection();
        var regex = selector(regexCollection);
        return collection.Select(l => regex.Split(l));
    }
}
Usage:
var items = new List<string> { "Hello world", "Goodbye world" };
var results = items.SmartRegex(x => x.AllowOneSpace);
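SmartRegex returns a deferred IEnumerable<string[]>; enumerating results executes the split:
foreach (var parts in results)
    Console.WriteLine(string.Join(" | ", parts));
// Hello | world
// Goodbye | world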
