Fastest way of reading txt files in C#

I'm working on a project and I'm a little bit confused. My teacher gave me some txt files (from his site: wt40.txt, wt50.txt, wt100.txt).
Every file has a similar structure:
26 24 79 46 32 35 73 74 14 67 86 46 78 40 29 94 64 27 90 55
35 52 36 69 85 95 14 78 37 86 44 28 39 12 30 68 70 9 49 50
1 10 9 10 10 4 3 2 10 3 7 3 1 3 10 4 7 7 4 7
5 3 5 4 9 5 2 8 10 4 7 4 9 5 7 7 5 10 1 3
Every number takes up 6 chars, padded with leading spaces instead of leading zeros.
Every line contains 20 numbers.
File wt40.txt should be read as: first two lines to first List, next two lines to next List and third pair of lines to the third list. Next lines again should be put in pairs to those Lists.
In C++ I'm doing it in this simple way:
for(int ins=0; ins<125; ins++) //125 instances in file
{
for(int i=0; i<N; i++) file>>tasks[i].p; //N elements at two first lines
for(int i=0; i<N; i++) file>>tasks[i].w;
for(int i=0; i<N; i++) file>>tasks[i].d;
tasks[i].putToLists();
}
But when I'm writing this in C# I have to open StreamReader, read every line, split it by regexp, cast them to int and add to lists. That's a lot of loops.
I cannot read every 6 chars and add them in three loops because those text files have messed up end of lines chars - sometimes it's just '\n' sometimes something more.
Isn't there any more simple way?

There is essentially a 20 by n table of 6 digit(character) numbers with leading spaces.
26 24 79 46 32 35 73 74 14 67 86 46 78 40 29 94 64 27 90 55
35 52 36 69 85 95 14 78 37 86 44 28 39 12 30 68 70 9 49 50
1 10 9 10 10 4 3 2 10 3 7 3 1 3 10 4 7 7 4 7
5 3 5 4 9 5 2 8 10 4 7 4 9 5 7 7 5 10 1 3
I don't understand the last sentence:
File wt40.txt should be read as: first two lines to first List, next
two lines to next List and third pair of lines to the third list. Next
lines again should be put in pairs to those Lists.
Say you want to get the first 6 rows and create 3 lists, each with 2 rows. You could do something like the following.
It is eager in that it reads everything into memory and then does its work.
// Requires: using System.Collections.Generic; using System.IO; using System.Linq; using System.Text;
const int maxNumberDigitLength = 6;
const int numbersPerRow = 20;
const int rowLengthInChars = maxNumberDigitLength * numbersPerRow;
const int numberOfRowsToRead = 6;
const int totalNumberOfCharsToRead = rowLengthInChars * numberOfRowsToRead;
char[] buffer = new char[totalNumberOfCharsToRead];
using (StreamReader reader = new StreamReader("wt40.txt"))
{
    // Caveat: every char is counted, so the slices below only line up
    // if the rows are not separated by line-break chars.
    int numberOfCharsRead = reader.Read(buffer, 0, totalNumberOfCharsToRead);
}
// put them in your lists, two rows each
IEnumerable<char> l1 = buffer.Take(rowLengthInChars * 2);
IEnumerable<char> l2 = buffer.Skip(rowLengthInChars * 2).Take(rowLengthInChars * 2);
IEnumerable<char> l3 = buffer.Skip(rowLengthInChars * 4).Take(rowLengthInChars * 2);
// Get the list of strings from the list of chars using non LINQ method.
List<string> list1 = new List<string>();
StringBuilder sb = new StringBuilder();
foreach (char c in l1)
{
    sb.Append(c);
    if (sb.Length == maxNumberDigitLength)
    {
        // one complete 6-char field collected
        list1.Add(sb.ToString());
        sb.Clear();
    }
}
// LINQ method (an alternative to the loop above)
string s = string.Concat(l1);
List<string> list1Linq = Enumerable
    .Range(0, s.Length / maxNumberDigitLength)
    .Select(i => s.Substring(i * maxNumberDigitLength, maxNumberDigitLength))
    .ToList();
// Parse to ints using LINQ projection (list2 and list3 are built from l2 and l3 in the same way as list1)
List<int> numbers1 = list1.Select(int.Parse).ToList();
List<int> numbers2 = list2.Select(int.Parse).ToList();
List<int> numbers3 = list3.Select(int.Parse).ToList();

Isn't there any more simple way?
Don't know if it's simpler but there is only one loop and a bit of LINQ:
// Requires: using System.Collections.Generic; using System.IO; using System.Linq; using System.Text.RegularExpressions;
List<List<int>> lists = new List<List<int>>();
using (StreamReader reader = new StreamReader("wt40.txt"))
{
    string line;
    int count = 0;
    while ((line = reader.ReadLine()) != null)
    {
        List<int> currentList =
            Regex.Split(line, "\\s")
                .Where(s => !string.IsNullOrWhiteSpace(s))
                .Select(int.Parse).ToList();
        if (currentList.Count > 0) // skip empty lines
        {
            if (count % 2 == 0) // first line of a pair starts a new list
            {
                lists.Add(currentList);
            }
            else // second line of a pair is appended to the list just started
            {
                lists[count / 2].AddRange(currentList);
            }
            count++; // count only non-empty lines so that the pairs stay aligned
        }
    }
}
In total you end up with 375 lists each containing 40 numbers (at least for wt40.txt input).
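For something closer to the C++ `file >> value` loop from the question, one option is to ignore lines altogether and treat the file as a stream of whitespace-separated tokens. The sketch below is only an illustration: the class and method names (`TokenReader`, `ParseAllInts`, `Chunk`) are made up for this example, and it assumes the file contains nothing but integers and whitespace.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class TokenReader
{
    // Splitting with a null separator array splits on any whitespace, so '\n'
    // versus "\r\n" line endings and the space padding do not matter.
    public static int[] ParseAllInts(string text)
    {
        return text
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Select(token => int.Parse(token))
            .ToArray();
    }

    // Cuts the flat sequence into consecutive chunks of chunkSize numbers.
    // A trailing incomplete chunk is dropped.
    public static List<int[]> Chunk(int[] numbers, int chunkSize)
    {
        var chunks = new List<int[]>();
        for (int offset = 0; offset + chunkSize <= numbers.Length; offset += chunkSize)
        {
            chunks.Add(numbers.Skip(offset).Take(chunkSize).ToArray());
        }
        return chunks;
    }
}
```

With `int[] all = TokenReader.ParseAllInts(File.ReadAllText("wt40.txt"));` (which needs `using System.IO;`) and `TokenReader.Chunk(all, 40)`, chunks 0, 1 and 2 would correspond to the p, w and d values of the first instance, chunks 3 to 5 to the second instance, and so on.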

Related

Auto-adjust characters within a specific string length

I have a string in form of:
1 name 25 11 45 66
I need to replace 11 with -55.88 and 45 with 99.67,
but I don't want to break the sequence of spaces.
Present string:
1 name 25 11 45 66
Expected result:
1 name 25 -55.88 99.67 66
The number of white spaces between 25 and 11, between 11 and 45, and between 45 and 66 is 10 in each case.
At present, when I split the string and replace a value with another value, the sequence of spaces is shifted towards the left, for example:
1 name 25 -55.88 99.67 66
Replace the target repeatedly using string.Replace.
public static void Main(string[] args)
{
string inputString = "1 name 25 11 45 66";
string replacedString = inputString.Replace("11", "-55.88").Replace("45", "99.67");
Console.WriteLine(replacedString);
}
Outputting:
1 name 25 -55.88 99.67 66
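Note that plain string.Replace makes the line longer when the new value is longer, so the following columns move. If the goal is to keep every later value at its original start column, one possibility is to shorten the run of spaces after the replaced value by the same amount. The sketch below is an assumption-laden illustration: `ColumnPreservingReplace.ReplaceToken` is a hypothetical helper, and it assumes columns are separated by runs of plain spaces (the exact spacing of the example did not survive in the post).

```csharp
using System;
using System.Text.RegularExpressions;

static class ColumnPreservingReplace
{
    public static string ReplaceToken(string input, string oldToken, string newToken)
    {
        // Match the whole token (not a substring of a longer one) plus the spaces after it.
        string pattern = @"(?<!\S)" + Regex.Escape(oldToken) + @"(?!\S)( *)";
        return Regex.Replace(input, pattern, match =>
        {
            if (match.Groups[1].Length == 0)
            {
                return newToken; // no spaces follow the token, so there is nothing to re-pad
            }
            // Keep the overall width, but always leave at least one separating space.
            int width = Math.Max(match.Length, newToken.Length + 1);
            return newToken.PadRight(width);
        });
    }
}
```

Calling `ReplaceToken(ReplaceToken(line, "11", "-55.88"), "45", "99.67")` keeps the total line length unchanged as long as each new value still fits into the old value plus its trailing spaces.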

Uniformly distributing hash of given properties

I am trying to distribute a set of items across a number of buckets. I am looking for the following properties:
Bucket assignment needs to be deterministic. In different runs the same input should end up in the same bucket.
Distribution of data between buckets should be uniform.
This should work for a fairly small number of inputs (e.g. if I want to distribute 50 inputs across 25 buckets, ideally each bucket will have 2 items).
First try was to generate md5 from input data and form bucket from first bytes of md5. I am not too satisfied with uniformity. It works well when input is large but not so well for small input. E.g. distributing 100 items across 64 buckets:
List<string> l = new List<string>();
for (int i = 0; i < 100; i++)
{
l.Add(string.Format("data{0}.txt", i));
}
int[] buckets = new int[64];
var md5 = MD5.Create();
foreach (string str in l)
{
{
byte[] hash = md5.ComputeHash(Encoding.Default.GetBytes(str));
uint bucket = BitConverter.ToUInt32(hash, 0) % 64;
buckets[bucket % 64]++;
}
}
Any suggestions what could I do to achieve higher uniformity? Thanks.
Leaving aside the efficiency of using MD5 for this purpose (see the discussion here and in the marked duplicate of that question), basically the answer is that what you have is what a uniform distribution really looks like.
That might seem counter-intuitive, but it's easily demonstrable either mathematically or by experiment.
As a kind of motivating example, consider the task of choosing exactly 64 numbers in the range 0-63. The odds that you will get one per bucket are very close to 0. There are 64^64 possible sequences, of which 64! contain all 64 numbers. The odds of getting one of these sequences are about one in 3.1×10^26. In fact, the odds of getting a sequence in which no element appears three times are less than one in a thousand (it's about .000658). So it's almost certain that a random uniform sample of 64 numbers in the range 0-63 will have some triplets, and it's pretty likely that there will be some quadruplet. If the sample is 100 numbers, those probabilities just get even bigger.
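The first figure can be checked with a few lines of code. The ratio of all sequences to the permutations is too large to compute with integer types, so this sketch (the name `PermutationOdds` is made up for the example) sums logarithms instead.

```csharp
using System;

static class PermutationOdds
{
    // Returns log10(n^n / n!), i.e. the exponent of the "one in ..." odds that
    // n uniform draws from n buckets hit every bucket exactly once.
    public static double Log10OneInOdds(int n)
    {
        double log10 = 0.0;
        for (int k = 1; k <= n; k++)
        {
            // n^n / n! is the product of n/k for k = 1..n
            log10 += Math.Log10((double)n / k);
        }
        return log10;
    }
}
```

`PermutationOdds.Log10OneInOdds(64)` comes out at roughly 26.49, and 10 raised to that power is about 3.1×10^26, which agrees with the figure quoted above.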
But the maths are not so easy to compute in general, so here I chose to illustrate by experiment :-), using random.org, which is a pretty reliable source of random numbers. I asked it for 100 numbers in the range 0-63, and counted them (using bash, so my "graph" is not as pretty as yours). Here are two runs:
First run:
Random numbers:
44 17 50 11 16 4 24 29 12 36
27 32 12 63 4 30 19 60 28 39
22 40 19 16 23 2 46 31 52 41
13 2 42 17 29 39 43 9 20 50
45 40 38 33 17 45 28 6 48 12
56 26 34 33 35 40 28 44 22 10
50 55 49 43 63 62 22 50 15 52
48 54 53 26 4 53 13 56 42 60
49 30 14 55 29 62 15 13 35 40
22 38 37 36 10 36 5 41 43 53
Counts:
                      X                 X         X
    X       XX   X    X     XX      X   X  X      X  X
  X X     X XX XXX X  X   X XXX  X XX XXXXXXXX  XXX XX XX   X XX
  X XXX  XXXXXXXXX XX XXX XXXXXXXXXXXXXXXXXXXXX XXX XXXXX   X XX
----------------------------------------------------------------
          1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4 5 5 5 5 5 6 6
0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2
Second run:
Random numbers:
41 31 16 40 1 51 17 41 27 46
24 14 21 33 25 43 4 36 1 14
40 22 11 22 30 19 23 63 39 61
8 55 40 6 21 13 55 13 3 52
17 52 53 53 7 21 47 13 45 57
25 27 30 48 38 55 55 22 61 11
11 28 45 63 43 0 41 51 15 2
33 2 46 14 35 41 5 2 11 37
28 56 15 7 18 12 57 36 59 51
42 5 46 32 10 8 0 46 12 9
Counts:
           X                             X    X        X
  X        X XX       XX                 XX    X    X   X
XXX  X XX  XXXXX X   XX  X XX X  X  X   XX X XX    XXX X X   X X
XXXXXXXXXXXXXXXXXXXX XXXXX XX XXXX XXXXXXXXX XXXX  XXX XXX X X X
----------------------------------------------------------------
          1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4 5 5 5 5 5 6 6
0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2
You could try this with your favourite random number generator, playing around with the size of the distribution. You'll get the same sort of shape.
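A minimal sketch of such an experiment in C#, using System.Random in place of random.org (so the numbers are pseudo-random, and the class name `BucketExperiment` is just an illustration):

```csharp
using System;
using System.Linq;

static class BucketExperiment
{
    // Draws `samples` uniform numbers in [0, bucketCount) and counts the hits per bucket.
    public static int[] CountBuckets(int samples, int bucketCount, int seed)
    {
        var rng = new Random(seed);
        int[] counts = new int[bucketCount];
        for (int i = 0; i < samples; i++)
        {
            counts[rng.Next(bucketCount)]++;
        }
        return counts;
    }

    // Prints one line per bucket, e.g. " 12 XXX", as a sideways version of the graphs above.
    public static void PrintHistogram(int[] counts)
    {
        for (int bucket = 0; bucket < counts.Length; bucket++)
        {
            Console.WriteLine("{0,3} {1}", bucket, new string('X', counts[bucket]));
        }
        Console.WriteLine("empty buckets: {0}, fullest bucket: {1}",
            counts.Count(c => c == 0), counts.Max());
    }
}
```

Running `BucketExperiment.PrintHistogram(BucketExperiment.CountBuckets(100, 64, seed))` with a few different seeds makes it easy to compare the spread with the MD5-based buckets from the question.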

C# input matrix from text file crash [duplicate]

here is my source code at the moment..
CODE:
static void InputValues()
{
int row, col;
string[] words;
matrixName = fileIn.ReadLine();
words = fileIn.ReadLine().Split(' ');
dimenOne = int.Parse(words[0]);
dimenTwo = int.Parse(words[1]);
matrix = new int[dimenOne+1, dimenTwo+1];
for (row = 1; row <= dimenOne; row++)
{
words = fileIn.ReadLine().Split(' ');
for (col = 1; col <= dimenTwo; col++)
{
matrix[row, col] = int.Parse(words[col-1]);
}
}
}
My program crashes at matrix[row, col] = int.Parse(words[col-1]); right after it reads in the first value, 45.
There are 3 spaces between values in the text file, which is posted below. How do I populate the 2-D array without crashing?
TXT FILE
3
Matrix One
5 7
45 38 5 56 18 34 4
87 56 23 41 75 87 97
45 97 86 7 6 8 85
67 6 79 65 41 37 4
7 76 57 68 8 78 2
Matrix Two
6 8
45 38 5 56 18 34 4 30
87 56 23 41 75 87 97 49
45 97 86 7 6 8 85 77
67 6 79 65 41 37 4 53
7 76 57 68 8 78 2 14
21 18 46 99 17 3 11 73
Matrix Three
6 6
45 38 5 56 18 34
87 56 23 41 75 87
45 97 86 7 6 8
67 6 79 65 41 37
7 76 57 68 8 78
21 18 46 99 17 3
Either test if you can convert the value to an integer (using TryParse) or better use a regular expression to parse the input string. Your problem is that the split function returns more results than you expect (can easily be seen if you set a breakpoint after words = filein....)
If you have a variable number of spaces in your lines, you should eliminate them.
words = fileIn.ReadLine()
.Split(' ')
.Where(x => !string.IsNullOrWhiteSpace(x))
.ToArray();
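An equivalent way to drop the empty entries is to let Split do it, via StringSplitOptions.RemoveEmptyEntries. The helper below is only a sketch (the name `RowParser.ParseRow` is made up), and it assumes every remaining token is a valid integer.

```csharp
using System;
using System.Linq;

static class RowParser
{
    // Splits one line of the matrix on spaces/tabs, discarding the empty
    // strings that runs of several separators would otherwise produce.
    public static int[] ParseRow(string line)
    {
        return line
            .Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(token => int.Parse(token))
            .ToArray();
    }
}
```

In the question's loop, `words = fileIn.ReadLine().Split(' ')` followed by int.Parse on each element could then be replaced by a single call such as `int[] values = RowParser.ParseRow(fileIn.ReadLine());`.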

C# How to only select strings based on a common set of characters?

I have certain numbers placed in lines in a file. The only lines I am interested in are the lines that contain the set of characters "4 2 0" in this order, as in the example below:
.....
128 2 2 0 24 49 50 46
129 4 2 0 26 51 36 54 53
130 4 2 0 26 51 41 52 56
....
Here I would discard the line that starts by 128, and keep the two others. What is the best way to do this for the whole file(knowing that lines with such a set of characters are not necessarily at the same spot)? Thank you for your help...
The following should do the trick:
string str = @"128 2 2 0 24 49 50 46
129 4 2 0 26 51 36 54 53
130 4 2 0 26 51 41 52 56";
string[] strSplitted = str.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
List<string> result = strSplitted.ToList();
foreach (var item in strSplitted)
{
if (!item.Contains("4 2 0"))
{
result.Remove(item);
}
}
The "result" variable will have the right results.
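One caveat with a plain Contains("4 2 0") is that it also matches a line such as "131 14 2 0 ..." and that it depends on the exact number of spaces between the values. A possible alternative, sketched below under the assumption that the values are whitespace-separated, compares whole tokens instead; `LineFilter` and its methods are hypothetical names for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LineFilter
{
    // True if the tokens in `wanted` occur in the line as consecutive whole tokens.
    public static bool ContainsTokens(string line, string[] wanted)
    {
        string[] tokens = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        for (int start = 0; start + wanted.Length <= tokens.Length; start++)
        {
            if (tokens.Skip(start).Take(wanted.Length).SequenceEqual(wanted))
            {
                return true;
            }
        }
        return false;
    }

    // Keeps only the matching lines, in their original order.
    public static List<string> KeepMatching(IEnumerable<string> lines, params string[] wanted)
    {
        return lines.Where(line => ContainsTokens(line, wanted)).ToList();
    }
}
```

For a whole file this could be combined with File.ReadLines, for example `LineFilter.KeepMatching(File.ReadLines(path), "4", "2", "0")`, where `path` stands for the input file.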

Why a .Union() changes the order of the items? [duplicate]

Possible Duplicate:
.Union() changes the order of the items?
According to this question (without scenario/example; you should remove it, I can't) this is my problem :
I've noticed that if I do a Union, then an Intersect between collections of Attachment[], the order of the items "can" change.
This is my code :
GalleryDataClassesDataContext db = new GalleryDataClassesDataContext();
List<Attachment> Allegati = db.ExecuteQuery<Attachment>("EXEC SelectAttachmentsByKey @Key={0}, @IDCliente={1}", new object[] { "", "47" }).ToList();
List<Attachment> AllegatiPerCategorie = new List<Attachment>();
AllegatiPerCategorie = AllegatiPerCategorie.Union(db.AttachmentAttachmentCategories.Where(aac => aac.IDAttachmentCategory == 72).OrderBy(p => p.Ordine == null ? 1 : 0).ThenBy(p => p.Ordine).Select(aac => aac.Attachment)).ToList();
Allegati = Allegati.Intersect(AllegatiPerCategorie).ToList();
count = 0;
foreach (Attachment a in AllegatiPerCategorie)
{
Response.Write(count.ToString() + " - " + a.IDAttachment + "<br />");
count++;
}
Response.Write("<br />### FILTERED ###<br /><br />");
count = 0;
foreach (Attachment a in Allegati)
{
Response.Write(count.ToString() + " - " + a.IDAttachment + "<br />");
count++;
}
And the output is :
0 - 6769
1 - 6792
2 - 6771
3 - 6699
4 - 6632
5 - 6774
6 - 6595
7 - 6602
8 - 6641
9 - 6643
10 - 6764
11 - 6634
12 - 6642
13 - 6660
14 - 6640
15 - 6665
16 - 6673
17 - 6767
18 - 6772
19 - 6766
20 - 6763
21 - 6768
22 - 6644
23 - 6635
24 - 6633
25 - 6793
26 - 6677
27 - 6608
28 - 6610
29 - 6558
30 - 6563
31 - 6631
32 - 6604
33 - 6606
34 - 6607
35 - 6596
36 - 6597
37 - 6598
38 - 6599
39 - 6600
40 - 6471
41 - 6470
42 - 6469
43 - 6601
44 - 6603
45 - 6663
46 - 6664
47 - 6645
48 - 6637
49 - 6638
50 - 6609
51 - 6611
52 - 6612
53 - 6613
54 - 6614
55 - 6615
56 - 6616
57 - 6617
58 - 6618
59 - 6619
60 - 6620
61 - 6622
62 - 6567
63 - 6568
64 - 6569
65 - 6570
66 - 6571
67 - 6572
68 - 6573
69 - 6575
70 - 6576
71 - 6577
72 - 6579
73 - 6580
74 - 6581
75 - 6582
76 - 6583
77 - 6584
78 - 6585
79 - 6586
80 - 6587
81 - 6588
82 - 6589
83 - 6590
84 - 6591
85 - 6592
86 - 6593
87 - 6594
88 - 6765
### FILTERED ###
0 - 6769
1 - 6792
2 - 6771
3 - 6699
4 - 6774
5 - 6595
6 - 6602
7 - 6634
8 - 6642
9 - 6640
10 - 6660
11 - 6665
12 - 6673
13 - 6772
14 - 6766
15 - 6768
16 - 6644
17 - 6635
18 - 6633
19 - 6793
20 - 6677
Notice, for example, the order of the values 6660 and 6640 in the AllegatiPerCategorie list: 6660 comes before 6640 (at positions 13 and 14).
Now look at the order of the same values in Allegati: 6640 comes before 6660 (at positions 9 and 10).
Why does this happen? How can I fix it? Thank you
MSDN states:
When the object returned by this method is enumerated, Union enumerates first and second in that order and yields each element that has not already been yielded.
Here is a short example to demonstrate the behavior:
new int[] {1}.Union(new int[] {1, 2, 3}) // returns: 1,2,3
new int[] {2}.Union(new int[] {1, 2, 3}) // returns: 2,1,3
new int[] {3}.Union(new int[] {1, 2, 3}) // returns: 3,1,2
new int[] {1,3,5}.Union(new int[] {2, 4}) // returns: 1,3,5,2,4
Union:
Produces the set union of two sequences by using the default equality comparer.
A set by definition contains no duplicates and has no inherent sorting.
Also:
This method excludes duplicates from the return set. This is different behavior to the Concat<TSource> method, which returns all the elements in the input sequences including duplicates.
And:
When the object returned by this method is enumerated, Union enumerates first and second in that order and yields each element that has not already been yielded.
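If the goal is to keep the order of one particular list (AllegatiPerCategorie in the question) while dropping everything that is not in the other list, one way that does not depend on how Union or Intersect order their results is to filter the ordered list against a HashSet. This is only a sketch with a made-up helper name (`OrderedFilter.KeepOrder`); it uses the default equality comparer and, unlike Intersect, does not remove duplicates from the ordered list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class OrderedFilter
{
    // Returns the items of `ordered` that also occur in `allowed`,
    // in exactly the order in which they appear in `ordered`.
    public static List<T> KeepOrder<T>(IEnumerable<T> ordered, IEnumerable<T> allowed)
    {
        var allowedSet = new HashSet<T>(allowed);
        return ordered.Where(item => allowedSet.Contains(item)).ToList();
    }
}
```

Applied to the question, this would be something like `Allegati = OrderedFilter.KeepOrder(AllegatiPerCategorie, Allegati);`, so that the filtered result follows the ordering of AllegatiPerCategorie.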
