I have a file that I am reading in, splitting up into different lists, and outputting into a RichTextBox, to then be read into three different ListBoxes. I currently have all of this working, but I have come across something I do not know how to fix or work around.
My code is below, and I cannot understand why it stops working properly when it gets to the Match twoRegex = Regex.Match(...) section of the code.
CODE:
private void SortDataLines()
{
    try
    {
        // Reads the lines in the file to format.
        var fileReader = File.OpenText(openGCFile.FileName);
        // Creates a list for the lines to be stored in.
        var placementUserDefinedList = new List<string>();
        // Reads the first line and does nothing with it.
        fileReader.ReadLine();
        // Adds each line in the file to the list.
        while (true)
        {
            var line = fileReader.ReadLine();
            if (line == null)
                break;
            placementUserDefinedList.Add(line);
        }
        // Creates new lists to hold certain matches for each list.
        var oneResult = new List<string>();
        var twoResult = new List<string>();
        var mainResult = new List<string>();
        foreach (var userLine in placementUserDefinedList)
            mainResult.Add(string.Join(" ", userLine));
        foreach (var oneLine in mainResult)
        {
            // PLACEMENT ONE Regex
            Match oneRegex = Regex.Match(oneLine, @"^.+(RES|0402|0201|0603|0805|1206|1306|1608|3216|2551"
                + @"|1913|1313|2513|5125|2525|5619|3813|1508|6431|2512|1505|2208|1005|1010|2010|0505|0705"
                + @"|1020|1812|2225|5764|4532|1210|0816|0363|SOT)");
            if (oneRegex.Success)
                oneResult.Add(string.Join(" ", oneLine));
        }
        //
        // THIS IS THE SECTION THAT FAILS..
        //
        foreach (var twoLine in mainResult)
        {
            // PLACEMENT TWO Regex
            Match twoRegex = Regex.Match(twoLine, @"^.+(BGA|SOP8|QSOP|TQSOP|SOIC16|SOIC12|SOIC8|SO8|SO08"
                + @"CQFP|LCC|LGA|OSCCC|PLCC|QFN|QFP|SOJ|SON");
            if (twoRegex.Success)
                twoResult.Add(string.Join(" ", twoLine));
        }
        // Removes the matched values from both of the Regex used above.
        List<string> userResult = mainResult.Except(oneResult).ToList();
        userResult = userResult.Except(twoResult).ToList();
        // Prints the proper values into the assigned RichTextBoxes.
        foreach (var line in userResult)
            userDefinedRichTextBox.AppendText(line + "\n");
        foreach (var line in oneResult)
            placementOneRichTextBox.AppendText(line + "\n");
        foreach (var line in twoResult)
            placementTwoRichTextBox.AppendText(line + "\n");
    }
    // Catches an exception if the file was not opened.
    catch (Exception)
    {
        MessageBox.Show("Could not match any regex values.", "Regular Expression Match Error",
            MessageBoxButtons.OK, MessageBoxIcon.Warning);
    }
}
QUESTIONS:
Does anyone understand why I am unable to get any matches from the second regex?
With that, is there a way to fix it?
Suggestions please! :)
Haven't you missed the pipe character in your second regex between the two lines?
Match twoRegex = Regex.Match(twoLine, @"^.+(BGA|SOP8|QSOP|TQSOP|SOIC16|SOIC12|SOIC8|SO8|SO08"
    + @"|CQFP|LCC|LGA|OSCCC|PLCC|QFN|QFP|SOJ|SON)");
Related
I'm building a simple dictionary from a .reg file (an export from Windows Regedit). The .reg file contains a key in square brackets, followed by zero or more lines of text, followed by a blank line. This code will create the dictionary that I need:
var a = File.ReadLines("test.reg");
var dict = new Dictionary<String, List<String>>();
foreach (var key in a) {
if (key.StartsWith("[HKEY")) {
var iter = a.GetEnumerator();
var value = new List<String>();
do {
iter.MoveNext();
value.Add(iter.Current);
} while (String.IsNullOrWhiteSpace(iter.Current) == false);
dict.Add(key, value);
}
}
I feel like there is a cleaner (prettier?) way to do this in a single LINQ statement (using a group by), but it's unclear to me how to implement gathering the value items into a list. I suspect I could do the same GetEnumerator in a let statement, but it seems like there should be a way to implement this without resorting to an explicit iterator.
Sample data:
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.msu]
#="Microsoft.System.Update.1"
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.MTS]
#="WMP11.AssocFile.M2TS"
"Content Type"="video/vnd.dlna.mpeg-tts"
"PerceivedType"="video"
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.MTS\OpenWithProgIds]
"WMP11.AssocFile.M2TS"=hex(0):
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.MTS\ShellEx]
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.MTS\ShellEx\{BB2E617C-0920-11D1-9A0B-00C04FC2D6C1}]
#="{9DBD2C50-62AD-11D0-B806-00C04FD706EC}"
Update
I'm sorry, I need to be more specific. The files I am looking at are around ~300 MB, so I took the approach I did to keep the memory footprint down. I'd prefer an approach that doesn't require pulling the entire file into memory.
You can always use Regex:
var dict = new Dictionary<String, List<String>>();
var a = File.ReadAllText(@"test.reg");
var results = Regex.Matches(a, "(\\[[^\\]]+\\])([^\\[]+)\r\n\r\n", RegexOptions.Singleline);
foreach (Match item in results)
{
    dict.Add(
        item.Groups[1].Value,
        item.Groups[2].Value.Split(new[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries).ToList()
    );
}
I whipped this out real quick. You might be able to improve the regex pattern.
Instead of using GetEnumerator, you can take advantage of the TakeWhile and Skip methods to break your list into smaller lists (each sublist represents one key and its values):
var registryLines = File.ReadLines("test.reg");
Dictionary<string, List<string>> resultKeys = new Dictionary<string, List<string>>();
while (registryLines.Count() > 0)
{
// Take the key and values into a single list
var keyValues = registryLines.TakeWhile(x => !String.IsNullOrWhiteSpace(x)).ToList();
// Adds a new entry to the dictionary using the first value as key and the rest of the list as value
if (keyValues != null && keyValues.Count > 0)
resultKeys.Add(keyValues[0], keyValues.Skip(1).ToList());
// Jumps to the next registry (+1 to skip the blank line)
registryLines = registryLines.Skip(keyValues.Count + 1);
}
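One caveat with this approach: File.ReadLines is lazily evaluated, so every Count(), TakeWhile, and Skip re-enumerates the sequence and re-opens the file from the start. A sketch of an alternative (the helper name SplitOnBlankLines is mine, not from the answer) that walks the sequence exactly once using yield return:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class RegChunks
{
    // Yields one block per registry key: the bracketed key line followed by its value lines.
    static IEnumerable<List<string>> SplitOnBlankLines(IEnumerable<string> lines)
    {
        var chunk = new List<string>();
        foreach (var line in lines)
        {
            if (string.IsNullOrWhiteSpace(line))
            {
                if (chunk.Count > 0)
                {
                    yield return chunk;
                    chunk = new List<string>();
                }
            }
            else
            {
                chunk.Add(line);
            }
        }
        if (chunk.Count > 0)
            yield return chunk; // flush the last block if the file has no trailing blank line
    }

    static void Main()
    {
        var dict = SplitOnBlankLines(File.ReadLines("test.reg"))
            .ToDictionary(c => c[0], c => c.Skip(1).ToList());
        Console.WriteLine(dict.Count);
    }
}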
EDIT based on your update
Update: I'm sorry, I need to be more specific. The files I am looking at are around ~300 MB, so I took the approach I did to keep the memory footprint down. I'd prefer an approach that doesn't require pulling the entire file into memory.
Well, if you can't read the whole file into memory, it makes little sense to ask for a LINQ solution. Here is a sample of how you can do it reading line by line (still no need for GetEnumerator):
Dictionary<string, List<string>> resultKeys = new Dictionary<string, List<string>>();
using (StreamReader reader = File.OpenText("test.reg"))
{
    List<string> keyAndValues = new List<string>();
    while (!reader.EndOfStream)
    {
        string line = reader.ReadLine();
        // Adds key and values to a list until it finds a blank line
        if (!string.IsNullOrWhiteSpace(line))
            keyAndValues.Add(line);
        else
        {
            // Adds a new entry to the dictionary using the first value as key and the rest of the list as value
            if (keyAndValues != null && keyAndValues.Count > 0)
                resultKeys.Add(keyAndValues[0], keyAndValues.Skip(1).ToList());
            // Starts a new key collection
            keyAndValues = new List<string>();
        }
    }
}
I think you can use code like this, if you can afford the memory:
var lines = File.ReadAllText(fileName);
var result =
    Regex.Matches(lines, @"\[(?<key>HKEY[^]]+)\]\s+(?<value>[^[]+)")
        .OfType<Match>()
        .ToDictionary(k => k.Groups["key"].Value, v => v.Groups["value"].ToString().Trim('\n', '\r', ' '));
This will take 24.173 seconds for a file with more than 4 million lines - Size:~550MB - by using 1.2 GB memory.
Edit:
The better way is to use File.ReadLines, as it is lazy (it streams the file line by line instead of loading it all at once):
var lines = File.ReadLines(fileName);
var keyRegex = new Regex(@"\[(?<key>HKEY[^]]+)\]");
var currentKey = string.Empty;
var currentValue = string.Empty;
var result = new Dictionary<string, string>();
foreach (var line in lines)
{
    var match = keyRegex.Match(line);
    if (match.Length > 0)
    {
        if (!string.IsNullOrEmpty(currentKey))
        {
            result.Add(currentKey, currentValue);
            currentValue = string.Empty;
        }
        currentKey = match.Groups["key"].ToString();
    }
    else
    {
        currentValue += line;
    }
}
// Adds the final key, which the loop above never reaches
if (!string.IsNullOrEmpty(currentKey))
    result.Add(currentKey, currentValue);
This will take 17093 milliseconds for a file with 795180 lines.
I'm trying to add a new line after each group name inside of a foreach loop. However, it never adds the new line; everything is printed on a single line.
string[] groups = client.GetGroups(username.TrimEnd());
StringBuilder groupNames = new StringBuilder();
foreach (string groupName in groups)
{
    groupNames.Append(string.Format(groupName, Environment.NewLine));
}
Label1.Text = groupNames.ToString();
After reading a few questions posted here on SO, I have tried many different solutions, such as:
{
    groupNames.Append(groupName);
    groupNames.AppendLine();
}
Label1.Text = groupNames.ToString();
Also tried:
{
    groupNames.Append(groupName);
    groupNames.Append(System.Environment.NewLine);
}
Label1.Text = groupNames.ToString();
However, if in any of those solutions I add:
groupNames.Append("|");
//or
groupNames.Append(",");
it will work. The only thing that is not working is the newline.
One thing to note is that I'm grabbing the user's group names from Active Directory, and when the group names are returned they contain \ in the name. I also tried removing the \ before adding the new line; that didn't work either.
groupNames.Append(groupName);
groupNames.Replace(@"\", " ");
groupNames.Append(System.Environment.NewLine);
Any suggestions?
In HTML a new line is not \r\n but <br>, so you need to add <br> after each element; a simple string.Join should work fine:
var result = string.Join("<br>", groups);
I am reading a file in my WinForms app and saving it in a list. I have a button "Remove", and on clicking it an item (each list item is a line from the file) is removed from the list. When I write this list back to a file, the items I have removed are replaced by blank lines.
I don't want these blank lines in my file. Can anyone please tell me how to remove them?
I have tried using list.Remove(item) to remove the item from the list.
Here is what I have tried:
ListView.CheckedListViewItemCollection chkditems = listView1.CheckedItems;
Regex regex1 = new Regex(".*\"(?<vm_name>.*)\".*:.*{.*\"vmx_path\".*:.*r?\"(?<vmx_path>.*)\",.*\"vm_base\".*:.*r?\"(?<vm_base>.*)\".*");
List<string> list_to_items = new List<string>();
foreach (ListViewItem chkitem in chkditems)
{
    foreach (string line in list)
    {
        Match match1 = regex1.Match(line);
        if (match1.Success)
        {
            if (match1.Groups["vm_name"].Value == chkitem.Text)
            {
                list_to_items.Add(line);
            }
        }
    }
    listView1.Items.Remove(chkitem);
}
foreach (string tormv in index)
{
    list.Remove(tormv);
}
For sample data for the list, you can consider it to contain any text.
Should be a comment, but with code, it's difficult...
I don't know what you're doing, but if you run this code, you'll see that the assumptions of your question are incorrect:
var list = new List<string>{"1","2","3"};
Console.WriteLine(string.Join(", ", list)); // 1, 2, 3
list.Remove("2");
Console.WriteLine(string.Join(", ", list)); // 1, 3
I have a text file that contains some comma-separated values, and it looks like this:
3,23500,R,5998,20.38,06/12/2013 01:44:17
2,23500,P,5983,20.234,06/12/2013 01:44:17
3,23501,R,5998,20.38,06/12/2013 01:44:18
2,23501,P,5983,20.235,06/12/2013 01:44:18
3,23502,R,6000,20.4,06/12/2013 01:44:19
2,23502,P,5983,20.236,06/12/2013 01:44:19
3,23503,R,5999,20.39,06/12/2013 01:44:20
2,23503,P,5983,20.236,06/12/2013 01:44:20
My task is to extract the lines that start with the same number into separate files. E.g., in the above case some lines start with 2 and some with 3; there can be more cases, like 4, and so on.
What would be the best and fastest approach to do this? The files that I am working with are quite big, sometimes on the order of gigabytes.
I split each line and stored the first value (the number I am looking for) in an array, then removed the duplicate values from the array. It works, but it is very slow!
This is my own code:
private void buttonBeginProcess_Click(object sender, EventArgs e)
{
    var file = File.ReadAllLines(_fileName);
    var nodeId = new List<int>();
    foreach (var line in file)
    {
        nodeId.Add(int.Parse(line.Split(',')[0]));
    }
    // Unique numbers
    nodeId = nodeId.Distinct().ToList();
}
var lines = File.ReadLines(myFilePath);
var lineGroups = lines
    .Where(line => line.Contains(","))
    .Select(line => new { key = line.Split(',')[0], line })
    .GroupBy(x => x.key);

foreach (var lineGroup in lineGroups)
{
    var key = lineGroup.Key;
    var keySpecificLines = lineGroup.Select(x => x.line);
    //save keySpecificLines to file
}
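Filling in the "save keySpecificLines to file" step, a minimal sketch that continues the loop above (the group_<key>.txt naming is my assumption, not part of the answer; it needs System.IO and System.Linq):

foreach (var lineGroup in lineGroups)
{
    var key = lineGroup.Key;
    var keySpecificLines = lineGroup.Select(x => x.line);
    // Each distinct leading number gets its own output file.
    File.WriteAllLines($"group_{key}.txt", keySpecificLines);
}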
You could try using StreamReader / StreamWriter to process each file one line at a time:
var writers = new Dictionary<string, StreamWriter>();
using (StreamReader sr = new StreamReader(pathToFile))
{
    while (sr.Peek() >= 0)
    {
        var line = sr.ReadLine();
        var key = line.Split(new[] { ',' }, 2)[0];
        if (!writers.ContainsKey(key))
        {
            writers[key] = new StreamWriter(GetPathToOutput(key));
        }
        writers[key].WriteLine(line);
    }
}
foreach (StreamWriter sw in writers.Values)
{
    sw.Dispose();
}
With this method, you ensure that your code never has to consume the entire input file, so it shouldn't matter how large your input files are. Of course the downside is it would have to keep an arbitrary number of files open throughout the process.
I have a basic C# console application that reads a text file (CSV format) line by line and puts the data into a Hashtable. The first CSV item in the line is the key (an ID number) and the rest of the line is the value. However, I've discovered that my import file has a few duplicate keys that it shouldn't have. When I try to import the file, the application errors out because you can't have duplicate keys in a Hashtable. I want my program to be able to handle this error, though. When I run into a duplicate key, I would like to put that key into an ArrayList and continue importing the rest of the data into the Hashtable. How can I do this in C#?
Here is my code:
private static Hashtable importFile(Hashtable myHashtable, String myFileName)
{
    StreamReader sr = new StreamReader(myFileName);
    CSVReader csvReader = new CSVReader();
    ArrayList tempArray = new ArrayList();
    int count = 0;
    while (!sr.EndOfStream)
    {
        String temp = sr.ReadLine();
        if (temp.StartsWith(" "))
        {
            ServMissing.Add(temp);
        }
        else
        {
            tempArray = csvReader.CSVParser(temp);
            Boolean first = true;
            String key = "";
            String value = "";
            foreach (String x in tempArray)
            {
                if (first)
                {
                    key = x;
                    first = false;
                }
                else
                {
                    value += x + ",";
                }
            }
            myHashtable.Add(key, value);
        }
        count++;
    }
    Console.WriteLine("Import Count: " + count);
    return myHashtable;
}
if (myHashtable.ContainsKey(key))
    duplicates.Add(key);
else
    myHashtable.Add(key, value);
A better solution is to call ContainsKey to check whether the key exists before adding it to the hash table. Throwing an exception on this kind of error is a performance hit and doesn't improve the program flow.
ContainsKey has a constant O(1) overhead for every item, while catching an Exception incurs a performance hit on JUST the duplicate items.
In most situations, I'd say check for the key, but in this case, it's better to catch the exception.
Here is a solution which avoids multiple hits in the secondary list with a small overhead to all insertions:
Dictionary<string, List<string>> dict = new Dictionary<string, List<string>>();
// Insert item
if (!dict.ContainsKey(key))
    dict[key] = new List<string>();
dict[key].Add(value);
You can wrap the dictionary in a type that hides this or put it in a method or even extension method on dictionary.
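A sketch of that extension-method idea (the name AddToList is mine; any equivalent name works):

using System.Collections.Generic;

static class DictionaryExtensions
{
    // Appends a value to the list stored under key, creating the list on first use.
    public static void AddToList<TKey, TValue>(this Dictionary<TKey, List<TValue>> dict, TKey key, TValue value)
    {
        if (!dict.TryGetValue(key, out List<TValue> list))
        {
            list = new List<TValue>();
            dict[key] = list;
        }
        list.Add(value);
    }
}

// Usage:
// var dict = new Dictionary<string, List<string>>();
// dict.AddToList("42", "a,b,c");
// dict.AddToList("42", "d,e,f"); // a duplicate key just grows the list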
If you have more than 4 (for example) CSV values, it might be worth building the value variable with a StringBuilder as well, since repeated string concatenation is slow; see the sketch below.
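For instance, the value-building loop in the question could be replaced with string.Join, or with a StringBuilder if the manual loop must stay (a sketch; tempArray comes from the question's CSVReader and is assumed to hold strings):

using System.Collections;
using System.Linq;
using System.Text;

class ValueBuilding
{
    // Builds the CSV "value" part from everything after the key, as the question's loop does.
    static string BuildValue(ArrayList tempArray)
    {
        // One-call version, no trailing comma:
        // return string.Join(",", tempArray.Cast<string>().Skip(1));

        // StringBuilder version, keeping the question's trailing comma:
        var sb = new StringBuilder();
        foreach (string x in tempArray.Cast<string>().Skip(1))
            sb.Append(x).Append(',');
        return sb.ToString();
    }
}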
Hmm, 1.7 Million lines? I hesitate to offer this for that kind of load.
Here's one way to do this using LINQ.
CSVReader csvReader = new CSVReader();
List<string> source = new List<string>();
using (StreamReader sr = new StreamReader(myFileName))
{
    while (!sr.EndOfStream)
    {
        source.Add(sr.ReadLine());
    }
}

List<string> ServMissing =
    source
    .Where(s => s.StartsWith(" "))
    .ToList();
//--------------------------------------------------
List<IGrouping<string, string>> groupedSource =
    (
        from s in source
        where !s.StartsWith(" ")
        let parsed = csvReader.CSVParser(s)
        where parsed.Any()
        let first = parsed.First()
        let rest = String.Join(",", parsed.Skip(1).ToArray())
        select new { first, rest }
    )
    .GroupBy(x => x.first, x => x.rest) //GroupBy(keySelector, elementSelector)
    .ToList();
//--------------------------------------------------
List<string> myExtras = new List<string>();
foreach (IGrouping<string, string> g in groupedSource)
{
    myHashTable.Add(g.Key, g.First());
    if (g.Skip(1).Any())
    {
        myExtras.Add(g.Key);
    }
}
Thank you all.
I ended up using the ContainsKey() method. It takes maybe 30 secs longer, which is fine for my purposes. I'm loading about 1.7 million lines and the program takes about 7 mins total to load up two files, compare them, and write out a few files. It only takes about 2 secs to do the compare and write out the files.