Hey. I have a JavaScript file that I'm downloading from the web, and it consists of several large JavaScript arrays. Since I'm a .NET developer, I'd like this data to be accessible from C#, so I'm wondering whether there are any CodePlex contributions or other methods I could use to turn the JavaScript arrays into C# arrays that I can work with from my C# code.
like:
var roomarray = new Array(194);
var modulearray = new Array(2055);
var progarray = new Array(160);
var staffarray = new Array(3040);
var studsetarray = new Array(3221);
function PopulateFilter(strZoneOrDept, cbxFilter) {
var deptarray = new Array(111);
for (var i=0; i<deptarray.length; i++) {
deptarray[i] = new Array(1);
}
deptarray[0][0] = "a/MPG - Master of Public Governance";
deptarray[0][1] = "a/MPG - Master of Public Governance";
deptarray[1][0] = "a/MBA_Flex MBA 1";
deptarray[1][1] = "a/MBA_Flex MBA 1";
deptarray[2][0] = "a/MBA_Flex MBA 2";
deptarray[2][1] = "a/MBA_Flex MBA 2";
deptarray[3][0] = "a/cand.oecon";
deptarray[3][1] = "a/cand.oecon";
and so forth
This is what I'm thinking after looking over the suggestions:
Retrieve the JavaScript file in my C# code by making an HTTP request for it.
Paste it together with some code I wrote myself.
From C#, execute a self-made JavaScript function that turns the JavaScript array into JSON (with help from json.org/json2.js) and outputs it to a new file.
Retrieve the new file in C# and parse the JSON with the DataContractJsonSerializer, hopefully ending up with a C# array.
Does that sound doable to you guys?
I'm not in front of a computer with C# right now, so I'm not able to fully try this.
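For the last step, something like this is what I have in mind (untested since I can't try it right now; the string[][] shape and the file name are just my guesses at what the generated JSON would look like, and it needs System.Runtime.Serialization.Json):
// Assumes the JSON file holds a top-level array of two-element string arrays,
// e.g. [["a/MPG - ...", "a/MPG - ..."], ...].
using (var stream = File.OpenRead("deptarray.json"))
{
    var serializer = new DataContractJsonSerializer(typeof(string[][]));
    string[][] deptArray = (string[][])serializer.ReadObject(stream);
}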
What you're going to need to do, @Jakob, is the following:
Write a parser that will download the file and store it in memory.
For each section that you want to "parse" into a C# array (for example zonearray), you need to set up bounds for where to begin and end searching the file. Example: we know that zonearray starts being populated two lines after zonearray[i] = new Array(1); and ends at zonearray.sort().
With these bounds you can then zip through each line in between and parse it into a C# array. This is simple enough that I think you can figure it out. Remember that you'll also need to keep track of the sub-index.
Repeat steps 2-3 for each array you want to parse (zonearray, roomarray, etc.).
If you can't quite figure out how to code the bounds or how to parse the line and dump them into arrays, I might be able to write something tomorrow (even though it's a holiday here in Canada).
EDIT: It should be noted that you can't use an off-the-shelf JSON parser for this; you have to write your own. It's not really that difficult to do, you just need to break it into small steps (first figure out how to zip through each line and find the right "bounds").
HTH
EDIT: I just spent ~20 minutes writing this up for you. It should parse the file and load each array into a List<string[]>. I've heavily commented it so you can see what's going on. If you have any questions, don't hesitate to ask. Cheers!
private class SearchBound
{
public string ArrayName { get; set; }
public int SubArrayLength { get; set; }
public string StartBound { get; set; }
public int StartOffset { get; set; }
public string EndBound { get; set; }
}
public static void Main(string[] args)
{
//
// NOTE: I used FireFox to determine the encoding that was used.
//
List<string> lines = new List<string>();
// Step 1 - Download the file and dump all the lines of the file to the list.
var request = WebRequest.Create("http://skema.ku.dk/life1011/js/filter.js");
using (var response = request.GetResponse())
using(var stream = response.GetResponseStream())
using(var reader = new StreamReader(stream, Encoding.GetEncoding("ISO-8859-1")))
{
string line = null;
while ((line = reader.ReadLine()) != null)
{
lines.Add(line.Trim());
}
Console.WriteLine("Download Complete.");
}
var deptArrayBounds = new SearchBound
{
ArrayName = "deptarray", // The name of the JS array.
SubArrayLength = 2, // In the JS, the sub array is defined as "new Array(X)" and should always be X+1 here.
StartBound = "deptarray[i] = new Array(1);",// The line that should *start* searching for the array values.
StartOffset = 1, // The StartBound + some number line to start searching the array values.
// For example: the next line might be a '}' so we'd want to skip that line.
EndBound = "deptarray.sort();" // The line to stop searching.
};
var zoneArrayBounds = new SearchBound
{
ArrayName = "zonearray",
SubArrayLength = 2,
StartBound = "zonearray[i] = new Array(1);",
StartOffset = 1,
EndBound = "zonearray.sort();"
};
var staffArrayBounds = new SearchBound
{
ArrayName = "staffarray",
SubArrayLength = 3,
StartBound = "staffarray[i] = new Array(2);",
StartOffset = 1,
EndBound = "staffarray.sort();"
};
List<string[]> deptArray = GetArrayValues(lines, deptArrayBounds);
List<string[]> zoneArray = GetArrayValues(lines, zoneArrayBounds);
List<string[]> staffArray = GetArrayValues(lines, staffArrayBounds);
// ... and so on ...
// You can then use deptArray, zoneArray etc where you want...
Console.WriteLine("Depts: " + deptArray.Count);
Console.WriteLine("Zones: " + zoneArray.Count);
Console.WriteLine("Staff: " + staffArray.Count);
Console.ReadKey();
}
private static List<string[]> GetArrayValues(List<string> lines, SearchBound bound)
{
List<string[]> values = new List<string[]>();
// Get the enumerator for the lines.
var enumerator = lines.GetEnumerator();
string line = null;
// Step 1 - Find the starting bound line.
while (enumerator.MoveNext() && (line = enumerator.Current) != bound.StartBound)
{
// Continue looping until we've found the start bound.
}
// Step 2 - Skip to the right offset (maybe skip a line that has a '}' ).
for (int i = 0; i <= bound.StartOffset; i++)
{
enumerator.MoveNext();
}
// Step 3 - Read each line of the array.
while ((line = enumerator.Current) != bound.EndBound)
{
string[] subArray = new string[bound.SubArrayLength];
// Read each sub-array value.
for (int i = 0; i < bound.SubArrayLength; i++)
{
// Matches everything that is between an equal sign then the value
// wrapped in quotes ending with a semi-colon.
var m = Regex.Matches(line, "^(.* = \")(.*)(\";)$");
// Get the matched value.
subArray[i] = m[0].Groups[2].Value;
// Move to the next sub-item if not the last sub-item.
if (i < bound.SubArrayLength - 1)
{
enumerator.MoveNext();
line = enumerator.Current;
}
}
// Add the sub-array to the list of values.
values.Add(subArray);
// Move to the next line.
if (!enumerator.MoveNext())
{
break;
}
}
return values;
}
If I understand your question right, you are asking whether you can execute JavaScript code from C#, and then pass the result (which in your example would be a JavaScript Array object) into C# code.
The answer is: of course it's theoretically possible, but you would need an actual JavaScript interpreter to execute the JavaScript. You'll have to find one or write your own. Given that JavaScript is a full-blown programming language, and writing an interpreter for such a large, full-featured language is quite an undertaking, I suspect that you won't find a complete ready-made solution, nor will you be able to write one unless your dedication exceeds that of all other die-hard C#-and-JavaScript fans worldwide.
However, with a bit of trickery, you might be able to coerce an existing JavaScript interpreter to do what you want. For obvious reasons, all browsers have such an interpreter, including Internet Explorer, which you can access using the WinForms WebBrowser control. Thus, you could try the following:
Have your C# code generate an HTML file containing the JavaScript you downloaded plus some JavaScript that turns it into JSON (you appear to have already found something that does this) and outputs it in the browser.
Open that HTML file in the WebBrowser control, have it execute the JavaScript, and then read the contents of the website back, now that it contains the result of the executed JavaScript.
Turn the JSON into a C# array using DataContractJsonSerializer as you suggested.
This is a pretty roundabout way to do it, but it is the best I can think of.
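A rough, untested sketch of the second step (it assumes the generated page writes the JSON into a <div id="json"> element; it needs using System.IO and System.Windows.Forms, plus a message loop so DocumentCompleted can fire):
[STAThread]
static void Main()
{
    var browser = new WebBrowser { ScriptErrorsSuppressed = true };
    browser.DocumentCompleted += (sender, e) =>
    {
        // Read the JSON that the page's JavaScript wrote into the <div id="json"> element.
        string json = browser.Document.GetElementById("json").InnerText;
        File.WriteAllText("arrays.json", json);
        Application.ExitThread();
    };
    browser.Navigate(Path.GetFullPath("generated.html")); // the HTML file produced in the first step
    Application.Run(); // keep a message loop alive so the browser can finish loading
}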
I have to wonder, though, why you are retrieving a JavaScript file from the web in the first place. What generates this JavaScript file? Whatever generates it, surely could generate some properly readable stuff instead (e.g. an XML file)? If it is not generated but written by humans, then why is it written in JavaScript instead of XML, CSV, or some other data format? Hopefully with these thoughts you might be able to find a solution that doesn’t require JavaScript trickery like the above.
The easiest solution is to just execute the JavaScript function that builds the array. Include a function that turns it into JSON (http://www.json.org/js.html). After that, make an XMLHttpRequest (AJAX) to the server, and on the server extract the JSON into a custom class.
Using jQuery, here's an example of the needed JavaScript:
var myJSONText = JSON.stringify(deptarray);
(function($){
$.ajax({
type: "POST",
url: "some.aspx",
data: myJSONText,
success: function(msg){
alert( "Data Saved: " + msg );
}
});
})(jQuery);
Now you only need some code to turn the JSON string into a C# array.
EDIT:
After looking around a bit, I found Json.NET: http://json.codeplex.com/
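With Json.NET, that last step could look roughly like this (the string[][] shape assumes deptarray was stringified as an array of two-element string arrays; adjust it to whatever JSON.stringify actually produces):
string[][] deptArray = JsonConvert.DeserializeObject<string[][]>(jsonText);
string name = deptArray[0][0]; // e.g. "a/MPG - Master of Public Governance"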
There are also a lot of similar questions on Stack Overflow asking the same thing.
Related
I have a load of data files in numpy .npz format written from python.
I want to read them directly into C# for a few reasons.
The data files contain a number of 1D arrays of different types - some will be byte arrays, and others double arrays.
Can anyone give me some advice on how to achieve this? Or otherwise what I might be doing wrong below?
I have tried using Accord.NET's NpzFormat but can't figure out how to make it work. I think that's probably because you have to give it a type to return, and because the arrays are of different types it fails.
Here is a link to it:
http://accord-framework.net/docs/html/M_Accord_IO_NpzFormat_Load__1.htm
I am struggling with the syntax here, unsure of what to use as "T". The closest I have got is the following, but there doesn't seem to be any data in the result. Accord.IO has no example code.
public static void LoadNPZ(string zip_file, string npz_file)
{
byte[] ret = new byte[0];
using (ZipArchive zip = ZipFile.OpenRead(zip_file))
{
foreach (ZipArchiveEntry entry in zip.Entries)
{
if (entry.Name == npz_file + ".npz")
{
Stream fs = entry.Open();
ret = new byte[fs.Length];
fs.Read(ret, 0, (int)fs.Length);
}
}
}
if (ret.Length==0)
{
return;
}
var ret2 = NpzFormat.Load<object[]>(ret);
};
You can use the NumSharp library.
Let's say you have this data created in Python.
import numpy as np
arr = np.array([1,2,3,4])
single = arr.astype(np.single)
double = arr.astype(np.double)
np.savez('single.npz', data=single)
np.savez('double.npz', data=double)
The C# code to read them is below.
using NumSharp;
var singleContent = np.Load_Npz<float[]>("single.npz"); // type is NpzDictionary
var singleArray = singleContent["data.npy"]; // type is float[]
var doubleContent = np.Load_Npz<double[]>("double.npz"); // type is NpzDictionary
var doubleArray = doubleContent["data.npy"]; // type is double[]
If you don't specify a name for your array, the default name is arr_0, and the C# code would look like this.
var singleArray = singleContent["arr_0.npy"];
var doubleArray = doubleContent["arr_0.npy"];
Note that NumSharp has the following limitation.
The size of each dimension must be smaller than 2,147,483,591 bytes. Example: for integer (4 bytes), each dimension must have less than 536,870,898 elements.
If you are using .NET Framework, the maximum array size is 2GB (all dimensions considered). On 64-bit platform this limit can be avoided by enabling the gcAllowVeryLargeObjects flag.
More information can be found in this answer and this blog post (Disclaimer: I'm the author of both of them).
I work with C# and Python quite a bit, and my recommendation is to create a COM server:
http://timgolden.me.uk/pywin32-docs/html/com/win32com/HTML/QuickStartServerCom.html
Then in Python you could simply have something like:
import numpy as np
class NPtoCSharp:
    _reg_clsid_ = "{7CC9F362-486D-11D1-BB48-0000E838A65F}"
    _public_methods_ = ['load_file']
    _public_attrs_ = ['arr', 'the_file']
    _reg_desc_ = "Python NPZ Loader"
    _reg_progid_ = "NPtoCSharp"

    def __init__(self):
        self.arr = None
        self.the_file = None

    def load_file(self):
        self.arr = np.load(self.the_file)
        return self.arr
Then in C#
public void init_python()
{
    // 'dynamic' lets us set the COM object's attributes and call its methods late-bound.
    Type NPtoCSharp = Type.GetTypeFromProgID("NPtoCSharp");
    dynamic NPtoCSharpInst = Activator.CreateInstance(NPtoCSharp);
    NPtoCSharpInst.the_file = "myfile.npz";
    NPtoCSharpInst.load_file();
}
Not complete but I hope you get the idea.
A little background. I am new to using C# in a professional setting. My experience is mainly in SQL. I have a file that I need to parse through to pull out certain pieces of information. I can figure out how to parse through each line, but have gotten stuck on searching for specific pieces of information. I am not interested in someone finishing this code for me. Instead, I am interested in pointers on where I can go from here.
Here is an example of the code I have written.
class Program
{
private static Dictionary<string, List<string>> _arrayLists = new Dictionary<string, List<string>>();
static void Main(string[] args)
{
string filePath = "c:\\test.txt";
StreamReader reader = new StreamReader(filePath);
string line;
while (null !=(line = reader.ReadLine()))
{
if (line.ToLower().Contains("disconnected"))
{
// needs to continue on search for Disconnected or Subscribed
}
else
{
if (line.ToLower().Contains("subscribed"))
{
// program needs to continue reading file
// looking for and assigning values to
// dvd, cls, jhd, dxv, hft
// records start at Subscribed and end at ;
}
}
}
}
}
A little bit of explanation of the file: I basically need to pull the data that exists between the word Subscribed and the first ; I come to. Specifically, I need to take values such as dvd = 234 and assign them to variables of the same name in the code. Not every record will have the same variables.
Here is an example of the text file that I need to parse through.
test information
annoying information
Subscribed more annoying info
more annoying info
dvd = 234,
cls = 453,
jhd = 567,
more annoying info
more annoying info
dxv = 456,
hft = 876;
more annoying info
test information
annoying information
Subscribed more annoying info
more annoying info
dvd = 234,
cls = 455,
more annoying info
more annoying info
dxv = 456,
hft = 876,
jjd = 768;
more annoying info
test information
annoying information
Disconnected more annoying info
more annoying info
more annoying info
Edit
My apologies on the vague question. I have to learn how to ask better questions.
My thought process was to make sure the program associates all the details between Subscribed and the ; as one record. The part I am confused about is reading the lines. In my head I see the loop reading the Subscribed line, then going into a method, reading the next line and assigning the value, and so on until it hits the ;. Once that is done, I am trying to figure out how to tell the program to exit that method but continue reading from the line right after the semicolon. Perhaps I am over-thinking this.
I will take the advice I have been give and see what I can come up with to solve this. Thank you.
From your question as it stands, it is not clear what specific problem you are struggling with. I'd suggest you edit your question to describe the specific challenges you'd like to overcome. Currently your problem statement is "have gotten stuck on searching for specific pieces of information", which is as unspecific as it can get.
Having said that I'll try to help you.
First, you will never get into an if like that:
line.ToLower().Contains("Disconnected")
Here you convert all the characters to lower case, and then you try to find a substring with a capital "D" in it. The expression above will (almost) always evaluate to false.
Secondly, in order for your application to do what you want to do it needs to track the current parsing state. I'm going to ignore the "Disconnected" bit now, as you have not shown what significance it has.
I'll be assuming that you are trying to find everything between Subscribed and the first semicolon in the file. I'll also make a couple of other assumptions about what a value line can look like, which I won't list here. These may be wrong, but this is my best guess given the information you've provided.
Your program will start in a "looking for subscription" state. You have already set up the read loop, which is good. In this loop you read lines of the file until you find one that contains the word Subscribed.
Once you find such a line, your parser needs to move to a "parsing subscription" state. In this state, when you read lines you look for lines like jjd = 768, perhaps with a semicolon at the end. You can check whether a line matches a pattern by using regular expressions.
Regular expressions can also divide a match into capturing groups, so that you can extract the name (jjd) and the value (768) separately. The presence or absence of the semicolon could be another regex group.
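For example, a minimal sketch of such a pattern (untested; the group names are just for illustration, and it needs using System.Text.RegularExpressions):
var entryPattern = new Regex(@"^\s*(?<name>\w+)\s*=\s*(?<value>\d+)\s*(?<end>[,;])\s*$");
Match m = entryPattern.Match("hft = 876;");
if (m.Success)
{
    string name = m.Groups["name"].Value;             // "hft"
    string value = m.Groups["value"].Value;           // "876"
    bool endOfRecord = m.Groups["end"].Value == ";";  // semicolon means the record is finished
}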
Note that RegEx is not the only way to handle this, but this is the first that comes to mind.
You then keep matching the lines against your regex and extracting names and values until you come across the semicolon, at which point you switch back to the "looking for subscription" state.
You use the current state, to decide how to process the next read line.
You continue until the end of the file.
Generally you want to read up on parsing.
Hope this helps.
As with all code solutions to problems, there are many possible ways to achieve what you are looking for; some will work better than others. Below is one way that could help point you in the right direction.
You can check if the string starts with a keyword or value such as "dvd" (see MSDN String.StartsWith).
If it does then you can split the string into an array of parts (see MSDN String.Split).
You can then get the values of each part from the string array using the index of the value you want.
Do what you need to with the value retrieved.
Continue checking each line for your key business rules (i.e. the semicolon that ends the section). Maybe you could check the last character of the string (see String.EndsWith). A minimal sketch of these steps follows.
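Something like this, very roughly (untested; the field names come from your sample file and the parsing is deliberately naive):
if (line.StartsWith("dvd"))
{
    string[] parts = line.Split('=');                 // "dvd = 234," -> ["dvd ", " 234,"]
    string value = parts[1].Trim().TrimEnd(',', ';');
    int dvd = int.Parse(value);                        // 234
}
bool recordEnded = line.TrimEnd().EndsWith(";");       // the semicolon closes the current record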
When processing text files containing semi-structured data, state variables can simplify the algorithm. In the code below, a boolean state variable isInRecord is used to track when a line is in a record.
using System;
using System.Collections.Generic;
using System.IO;
namespace ConsoleApplication19
{
public class Program
{
private readonly static String _testData = @"
test information
annoying information
Subscribed more annoying info
more annoying info
dvd = 234,
cls = 453,
jhd = 567,
more annoying info
more annoying info
dxv = 456,
hft = 876;
more annoying info
test information
annoying information
Subscribed more annoying info
more annoying info
dvd = 234,
cls = 455,
more annoying info
more annoying info
dxv = 456,
hft = 876,
jjd = 768;
more annoying info
test information
annoying information
Disconnected more annoying info
more annoying info
more annoying info";
public static void Main(String[] args)
{
/* Create a temporary file containing the test data. */
var testFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.ApplicationData), Path.GetRandomFileName());
File.WriteAllText(testFile, _testData);
try
{
var p = new Program();
var records = p.GetRecords(testFile);
foreach (var kvp in records)
{
Console.WriteLine("Record #" + kvp.Key);
foreach (var entry in kvp.Value)
{
Console.WriteLine(" " + entry);
}
}
}
finally
{
File.Delete(testFile);
}
}
private Dictionary<String, List<String>> GetRecords(String path)
{
var results = new Dictionary<String, List<String>>();
var recordNumber = 0;
var isInRecord = false;
using (var reader = new StreamReader(path))
{
String line;
while ((line = reader.ReadLine()) != null)
{
line = line.Trim();
if (line.StartsWith("Disconnected"))
{
// needs to continue on search for Disconnected or Subscribed
isInRecord = false;
}
else if (line.StartsWith("Subscribed"))
{
// program needs to continue reading file
// looking for and assigning values to
// dvd, cls, jhd, dxv, hft
// records start at Subscribed and end at ;
isInRecord = true;
recordNumber++;
}
else if (isInRecord)
{
// Check if the line has a general format of "something = something".
var parts = line.Split("=".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
if (parts.Length != 2)
continue;
// Update the relevant dictionary key, or add a new key.
List<String> entries;
if (results.TryGetValue(recordNumber.ToString(), out entries))
entries.Add(line);
else
results.Add(recordNumber.ToString(), new List<String>() { line });
// Determine if the isInRecord state variable should be toggled.
var lastCharacter = line[line.Length - 1];
if (lastCharacter == ';')
isInRecord = false;
}
}
}
return results;
}
}
}
I have a text file, which I am trying to insert a line of code into. Using my linked-lists I believe I can avoid having to take all the data out, sort it, and then make it into a new text file.
What I did was come up with the code below. I set my bools, but it is still not working. I went through the debugger, and what seems to be happening is that it goes through the entire list (which is about 10,000 lines), never finds anything to be true, and so never inserts my record.
Why or what is wrong with this code?
List<string> lines = new List<string>(File.ReadAllLines("Students.txt"));
using (StreamReader inFile = new StreamReader("Students.txt", true))
{
string newLastName = "'Constant";
string newRecord = "(LIST (LIST 'Constant 'Malachi 'D ) '1234567890 'mdcant#mail.usi.edu 4.000000 )";
string line;
string lastName;
bool insertionPointFound = false;
for (int i = 0; i < lines.Count && !insertionPointFound; i++)
{
line = lines[i];
if (line.StartsWith("(LIST (LIST "))
{
values = line.Split(" ".ToCharArray());
lastName = values[2];
if (newLastName.CompareTo(lastName) < 0)
{
lines.Insert(i, newRecord);
insertionPointFound = true;
}
}
}
if (!insertionPointFound)
{
lines.Add(newRecord);
}
You're just reading the file into memory and not committing it anywhere.
I'm afraid that you're going to have to load and completely re-write the entire file. Files support appending, but they don't support insertions.
You can write to a file the same way that you read from it:
string[] lines;
// instantiate and build `lines`
File.WriteAllLines("path", lines);
WriteAllLines also takes an IEnumerable<string>, so you can pass a List<string> in there if you want.
One more issue: it appears as though you're reading your file twice, once with ReadAllLines and again with your StreamReader.
There are at least four possible errors.
The opening of the StreamReader is not required; you have already read all the lines. (Well, not really an error, but...)
The check for StartsWith can be fooled if your lines start with blank space, and you will miss the insertion point. (Adding a Trim removes any problem here.)
In the CompareTo line you check for < 0 but you should check for == 0. CompareTo returns 0 if the strings are equivalent, however.....
To check whether two strings are equal you should avoid using CompareTo, as explained in the MSDN link above, and use string.Equals instead.
List<string> lines = new List<string>(File.ReadAllLines("Students.txt"));
string newLastName = "'Constant";
string newRecord = "(LIST (LIST 'Constant 'Malachi 'D ) '1234567890 'mdcant#mail.usi.edu 4.000000 )";
string line;
string lastName;
bool insertionPointFound = false;
for (int i = 0; i < lines.Count && !insertionPointFound; i++)
{
line = lines[i].Trim();
if (line.StartsWith("(LIST (LIST "))
{
string[] values = line.Split(" ".ToCharArray());
lastName = values[2];
if (newLastName.Equals(lastName))
{
lines.Insert(i, newRecord);
insertionPointFound = true;
}
}
}
if (!insertionPointFound)
lines.Add(newRecord);
I don't list the missing write back to the file as an error; I hope that you have just omitted that part of the code. Otherwise it is a very simple problem.
(However, I think that the way in which CompareTo is used is probably the main reason for your problem.)
EDIT: Looking at your comment below, it seems that the answer from Sam I Am is the right one for you. Of course you need to write back the modified array of lines. All the changes are made to an in-memory array of lines, and nothing is written back to a file if you don't have code that writes one. However, you don't need a new file:
File.WriteAllLines("Students.txt", lines);
First let me start by thanking you all for being part of this site. I have already gained so much helpful information from it, including some basic parsing of text files into arrays, but I now want to go a step further.
I have a text file that looks some thing like this
Start Section 1 - foods
apple
bannana
pear
pineapple
orange
end section 1
Start section 2 - animals
dog
cat
horse
cow
end section 2
What I want to do is, using a single read of the file, copy the data from section 1 into an array called "foods" and section 2 into an array called "animals".
Now, I can get it to work by using a new loop for each section, closing and reopening the file each time, looping until I find the section I want, and creating the array.
But I was thinking there must be a way to read each section into a separate array in one go, saving time.
So my current code is:
List<string> typel = new List<string>();
using (StreamReader reader = new StreamReader("types.txt")) // opens file using streamreader
{
string line; // reads line by line into the variable "line"
while ((line = reader.ReadLine()) != null) // loops until it reaches the end of the file
{
typel.Add(line); // adds the line to the list variable "typel"
}
}
Console.WriteLine(typel[1]); // test to see if the list is being populated
string[] type = typel.ToArray(); //converts the list to a true array
Console.WriteLine(type.Length); // returns the number of elements of the array created.
which is for a simple text file with no sections just list of values, using list seemed a good way to deal with unknown lengths of arrays.
I was also wondering how to deal with the first value.
For example, if I do
while ((line = reader.ReadLine()) != "Start Section 1 - foods")
{
}
while ((line = reader.ReadLine()) != "end Section 1")
{
foods.Add(line);
}
...
....
I end up with "Start Section 1 - foods" as one of the array elements. I can remove it with code, but is there an easy way to avoid this so that only the list items get populated?
Cheers and once again thanks for all the help. Its great to be getting back in to programming after many many years.
Aaron
Reading the lines is not the issue; see System.IO.File.ReadAllLines(fileName) and its siblings.
What you need is a (very simple) interpreter:
// totally untested
Dictionary<string, List<string>> sections = new Dictionary<string, List<string>>();
List<string> section = null;
foreach(string line in GetLines())
{
if (IsSectionStart(line))
{
string name = GetSectionName(line);
section = new List<string>();
sections.Add(name, section);
}
else if (IsSectionEnd(line))
{
section = null; // invite exception when we're lost
}
else
{
section.Add(line);
}
}
...
List<string> foods = sections ["foods"];
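Untested sketches of what those helper methods might look like for the file format above (File.ReadLines needs .NET 4; use File.ReadAllLines on older versions):
private static IEnumerable<string> GetLines()
{
    // Skip blank lines so the interpreter only sees section markers and items.
    return File.ReadLines("types.txt").Where(l => l.Trim().Length > 0);
}

private static bool IsSectionStart(string line)
{
    return line.TrimStart().StartsWith("Start Section", StringComparison.OrdinalIgnoreCase);
}

private static bool IsSectionEnd(string line)
{
    return line.TrimStart().StartsWith("end section", StringComparison.OrdinalIgnoreCase);
}

private static string GetSectionName(string line)
{
    // "Start Section 1 - foods" -> "foods"
    int dash = line.IndexOf('-');
    return dash >= 0 ? line.Substring(dash + 1).Trim() : line.Trim();
}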
Look for pointers for start and end. This is where you start putting things into arrays, lists, etc.
Here is a stab at making it very flexible:
class Program
{
private static Dictionary<string, List<string>> _arrayLists = new Dictionary<string, List<string>>();
static void Main(string[] args)
{
string filePath = "c:\\logs\\arrays.txt";
StreamReader reader = new StreamReader(filePath);
string line;
string category = "";
while (null != (line = reader.ReadLine()))
{
if (line.ToLower().Contains("start"))
{
string[] splitHeader = line.Split("-".ToCharArray());
category = splitHeader[1].Trim();
}
else
{
if (!_arrayLists.ContainsKey(category))
{
List<string> stringList = new List<string>();
_arrayLists.Add(category, stringList);
}
if((!line.ToLower().Contains("end")&&(line.Trim().Length > 0)))
{
_arrayLists[category].Add(line.Trim());
}
}
}
//testing
foreach(var keyValue in _arrayLists)
{
Console.WriteLine("Category: {0}",keyValue.Key);
foreach(var value in keyValue.Value)
{
Console.WriteLine("{0}".PadLeft(5, ' '), value);
}
}
Console.Read();
}
}
To add to the other answers, if you don't want to parse the text file yourself, you could always use a quick and dirty regular expression if you're comfortable with them:
var regex = new Regex(@"Start Section \d+ - (?<section>\w+)\r\n(?<list>[\w\s]+)End Section", RegexOptions.IgnoreCase);
var data = new Dictionary<string, List<string>>();
foreach (Match match in regex.Matches(File.ReadAllText("types.txt")))
{
string section = match.Groups["section"].Value;
string[] items = match.Groups["list"].Value.Split(new string[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);
data.Add(section, new List<string>(items));
}
// data["animals"] now contains a list of "dog", "cat", "horse", and "cow"
In response to the comment:
but "list" sounds so simple and basic
(like i am going shopping), array has
much nicer ring to it ;) But I will
look in to them maybe a bit more, I
got the impression from my research
that arrays are more efficent code?
It's not about whether a list vs. array is "basic" or "has a nicer ring", it's about the purpose of the code. In your case, you're iterating a file line-by-line and adding items to a collection of an unknown size beforehand - which is one problem a list was designed to solve. Of course you could peek through the file and determine the exact size, but is doing that worth the extra "efficiency" you get from using an array, and is iterating the file twice going to take longer than using a list in the first place? You don't know unless you profile your code and conclude that specific portion is a bottleneck... which I'll say, will almost never be the case.
Uhmmm, like this?
// read every line of the file into the string array allLines, one element per line
string[] allLines = File.ReadAllLines("types.txt");
//getting the index of allLines that contains "Start Section 1" and "end section 1"
int[] getIndexes = new int[] { Array.FindIndex(allLines, start => start.Contains("Start Section 1")), Array.FindIndex(allLines, start => start.Contains("end section 1")) };
//create list to get indexes of the list(apple,banana, pear, etc...)
List<int> indexOfList = new List<int>();
//get index of the list(apple,banana, pear,etc...)
for (int i = getIndexes[0]; i < getIndexes[1]; i++)
{
indexOfList.Add(i);
}
//remove the index of the element or line "Start Section 1"
indexOfList.RemoveAt(0);
//final list
string[] foodList = new string[]{ allLines[indexOfList[0]], allLines[indexOfList[1]], and so on...};
Then you can call them, or edit them and then save.
//call them
Console.WriteLine(foodList[0] + "\n" + foodList[1] + ...)
//edit the list
allLines[indexOfList[0]] = "chicken"; //from apple to chicken
allLines[indexOfList[1]] = "egg"; //from banana to egg
//save lines
File.WriteAllLines("types.txt", allLines);
I am trying to import a file with multiple record definitions in it. Each one can also have a header record, so I thought I would define a definition interface like so.
public interface IRecordDefinition<T>
{
bool Matches(string row);
T MapRow(string row);
bool AreRecordsNested { get; }
GenericLoadClass ToGenericLoad(T input);
}
I then created a concrete implementation for a class.
public class TestDefinition : IRecordDefinition<Test>
{
public bool Matches(string row)
{
return row.Split('\t')[0] == "1";
}
public Test MapColumns(string[] columns)
{
return new Test {val = columns[0].parseDate("ddmmYYYY")};
}
public bool AreRecordsNested
{
get { return true; }
}
public GenericLoadClass ToGenericLoad(Test input)
{
return new GenericLoadClass {Value = input.val};
}
}
However for each File Definition I need to store a list of the record definitions so I can then loop through each line in the file and process it accordingly.
Firstly, am I on the right track, or is there a better way to do it?
I would split this process into two pieces.
First, a specific process to split the file with multiple types into multiple files. If the files are fixed width, I have had a lot of luck with regular expressions. For example, assume the following is a text file with three different record types.
TE20110223 A 1
RE20110223 BB 2
CE20110223 CCC 3
You can see there is a pattern here, hopefully the person who decided to put all the record types in the same file gave you a way to identify those types. In the case above you would define three regular expressions.
string pattern1 = @"^TE(?<DATE>[0-9]{8})(?<NEXT1>.{2})(?<NEXT2>.{2})";
string pattern2 = @"^RE(?<DATE>[0-9]{8})(?<NEXT1>.{3})(?<NEXT2>.{2})";
string pattern3 = @"^CE(?<DATE>[0-9]{8})(?<NEXT1>.{4})(?<NEXT2>.{2})";
Regex Regex1 = new Regex(pattern1);
Regex Regex2 = new Regex(pattern2);
Regex Regex3 = new Regex(pattern3);
StringBuilder FirstStringBuilder = new StringBuilder();
StringBuilder SecondStringBuilder = new StringBuilder();
StringBuilder ThirdStringBuilder = new StringBuilder();
string Line = "";
Match LineMatch;
FileInfo myFile = new FileInfo("yourFile.txt");
using (StreamReader s = new StreamReader(myFile.FullName))
{
while (s.Peek() != -1)
{
Line = s.ReadLine();
LineMatch = Regex1.Match(Line);
if (LineMatch.Success)
{
//Write this line to a new file
}
LineMatch = Regex2.Match(Line);
if (LineMatch.Success)
{
//Write this line to a new file
}
LineMatch = Regex3.Match(Line);
if (LineMatch.Success)
{
//Write this line to a new file
}
}
}
Next, take the split files and run them through a generic process, that you most likely already have, to import them. This works well because when the process inevitably fails, you can narrow it to the single record type that is failing and not impact all the record types. Archive the main text file along with the split files and your life will be much easier as well.
Dealing with these kinds of transmitted files is hard, because someone else controls them and you never know when they are going to change. Logging the original file as well as a receipt of the import is very important and shouldn't be overlooked either. You can make that as simple or as complex as you want, but I tend to write a receipt to a DB, copy the primary key from that table into a foreign key in the table I import the data into, and then never change that data. I like to keep an untouched copy of the import on the file system as well as on the DB server, because there are inevitable conversion/transformation issues that you will need to track down.
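Very roughly, the receipt idea looks like this (the table, column, and variable names are made up for illustration):
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    int receiptId;
    using (var cmd = new SqlCommand(
        "INSERT INTO ImportReceipt (FileName, ImportedOn) VALUES (@file, GETDATE()); " +
        "SELECT CAST(SCOPE_IDENTITY() AS int);", conn))
    {
        cmd.Parameters.AddWithValue("@file", "yourFile.txt");
        receiptId = (int)cmd.ExecuteScalar();
    }
    // ...insert each parsed record with ImportReceiptId = receiptId as the foreign key...
}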
Hope this helps, because this is not a trivial task. I think you are on the right track, but instead of processing/importing each line separately, write them to separate files first. I am assuming this is financial data, which is one of the reasons I think provability at every step is important.
I think the FileHelpers library solves a number of your problems:
Strong types
Delimited
Fixed-width
Record-by-Record operations
I'm sure you could consolidate this into a type hierarchy that could tie in custom binary formats as well.
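A rough sketch of FileHelpers' attribute-based mapping (the record layout, delimiter, and field names here are invented for illustration):
using FileHelpers;

[DelimitedRecord("\t")]
public class TestRecord
{
    public string RecordType;

    [FieldConverter(ConverterKind.Date, "ddMMyyyy")]
    public DateTime Val;
}

// Reading the whole file into strongly typed records:
var engine = new FileHelperEngine<TestRecord>();
TestRecord[] records = engine.ReadFile("input.txt");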
Have you looked at something using Linq? This is a quick example of Linq to Text and Linq to Csv.
I think it would be much simpler to use "yield return" and IEnumerable to get what you want working. This way you could probably get away with only having 1 method on your interface.
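A rough sketch of that idea, reusing the shapes from your question (untested; it assumes MapRow and ToGenericLoad behave as declared in your interface):
public static IEnumerable<GenericLoadClass> ReadRecords(string path, IRecordDefinition<Test> definition)
{
    foreach (string row in File.ReadLines(path))
    {
        // Only rows that this definition recognises are mapped and yielded lazily.
        if (definition.Matches(row))
            yield return definition.ToGenericLoad(definition.MapRow(row));
    }
}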