Override StreamReader's ReadLine method - c#

I'm trying to override a StreamReader's ReadLine method, but having difficulty doing so due to inability to access some private variables. Is this possible, or should I just write my own StreamReader class?

Assuming you want your custom StreamReader to be usable anywhere that a TextReader can be used, there are typically two options:
1. Inherit from StreamReader and override the functions that you want to have work differently. In your case this would be StreamReader.ReadLine.
2. Inherit from TextReader and implement the reader functionality completely to your requirements.
NB: For option 2 above, you can maintain an internal reference to a StreamReader instance and delegate all the functions to the internal instance, except for the piece of functionality that you want to replace. In my view, this is just an implementation detail of option 2 rather than a 3rd option.
Based on your question, I assume you have tried option 1 and found that overriding StreamReader.ReadLine is rather difficult because you cannot access the internals of the class. Fortunately, for StreamReader you can achieve this without access to its internal implementation.
Here is a simple example:
Disclaimer: The ReadLine() implementation is for demonstration purposes and is not intended to be a robust or complete implementation.
class CustomStreamReader : StreamReader
{
public CustomStreamReader(Stream stream)
: base(stream)
{
}
public override string ReadLine()
{
int c;
c = Read();
if (c == -1)
{
return null;
}
StringBuilder sb = new StringBuilder();
do
{
char ch = (char)c;
if (ch == ',')
{
return sb.ToString();
}
else
{
sb.Append(ch);
}
} while ((c = Read()) != -1);
return sb.ToString();
}
}
You will notice that I simply used the StreamReader.Read() method to read the characters from the stream. While definitely less performant than working directly with the internal buffers, the Read() method does use the internal buffering, so it should still yield pretty good performance, but that should be tested to confirm.
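The same idea can be generalized so the delimiter is not hard-coded. The following is only a sketch under stated assumptions: the class name DelimitedStreamReader and its delimiter parameter are hypothetical and not part of the answer above, and like the example above it is not meant to be a robust implementation.

```csharp
using System.IO;
using System.Text;

// Hypothetical variant of CustomStreamReader: the delimiter is passed in
// instead of being fixed to ','.
public class DelimitedStreamReader : StreamReader
{
    private readonly char _delimiter;

    public DelimitedStreamReader(Stream stream, char delimiter)
        : base(stream)
    {
        _delimiter = delimiter;
    }

    // Returns the text up to (not including) the next delimiter,
    // or null once the end of the stream has been reached.
    public override string ReadLine()
    {
        int c = Read();
        if (c == -1)
        {
            return null;
        }

        var sb = new StringBuilder();
        while (c != -1 && c != _delimiter)
        {
            sb.Append((char)c);
            c = Read();
        }
        return sb.ToString();
    }
}
```

Because it still derives from StreamReader, it can be passed to any code that expects a TextReader, and a loop such as while ((token = reader.ReadLine()) != null) ends at the end of the stream.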
For fun, here is an example of option 2. I used the encapsulated StreamReader to reduce the amount of code. This is not tested at all.
class EncapsulatedReader : TextReader
{
private StreamReader _reader;
public EncapsulatedReader(Stream stream)
{
_reader = new StreamReader(stream);
}
public Stream BaseStream
{
get
{
return _reader.BaseStream;
}
}
public override string ReadLine()
{
int c;
c = Read();
if (c == -1)
{
return null;
}
StringBuilder sb = new StringBuilder();
do
{
char ch = (char)c;
if (ch == ',')
{
return sb.ToString();
}
else
{
sb.Append(ch);
}
} while ((c = Read()) != -1);
return sb.ToString();
}
protected override void Dispose(bool disposing)
{
if (disposing)
{
_reader.Close();
}
base.Dispose(disposing);
}
public override int Peek()
{
return _reader.Peek();
}
public override int Read()
{
return _reader.Read();
}
public override int Read(char[] buffer, int index, int count)
{
return _reader.Read(buffer, index, count);
}
public override int ReadBlock(char[] buffer, int index, int count)
{
return _reader.ReadBlock(buffer, index, count);
}
public override string ReadToEnd()
{
return _reader.ReadToEnd();
}
public override void Close()
{
_reader.Close();
base.Close();
}
}

This class can help you:
public class MyStreamReader : System.IO.StreamReader
{
public MyStreamReader(string path)
: base(path)
{
}
public override string ReadLine()
{
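var result = new System.Text.StringBuilder();
int c = base.Read();
// Read() returns -1 at the end of the stream; return null so callers can stop looping
if (c == -1)
{
return null;
}
// Read() already yields decoded characters, so no byte-to-string conversion is needed
while (c != -1 && c != ',')
{
result.Append((char)c);
c = base.Read();
}
return result.ToString();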
}
}

Try this. I wrote it because I have some very large '|'-delimited files that contain \r\n inside some of the columns, and I needed to use \r\n as the end-of-line delimiter. I was trying to import the files using SSIS packages, but corrupted data in the files made that impossible. The file was over 5 GB, so it was too large to open and fix manually. After reading through many forums to understand how streams work, I came up with a solution that reads each character in a file and emits a line based on the definitions I added. It is meant for use in a command-line application, complete with help :). I hope this helps other people; I haven't found a solution quite like it anywhere else, although the ideas were inspired by this forum and others. It does not fix the files, it only splits them. Please be aware that this is still a work in progress :).
class Program
{
static long _fileposition = 0;
static void Main(string[] args)
{
// Check information passed in
if (args.Any())
{
if (args[0] == "/?")
{
var message = "Splits a file into smaller pieces";
message += "\n";
message += "\n";
message += "SplitFile [sourceFileName] [destinationFileName] [RowBatchAmount] [FirstRowHasHeader]";
message += "\n";
message += "\n";
message += " [sourceFileName] (STRING) required";
message += "\n";
message += " [destinationFileName] (STRING) will default to the same location as the sourceFileName";
message += "\n";
message += " [RowBatchAmount] (INT) will create files that have this many rows";
message += "\n";
message += " [FirstRowHasHeader] (True/False) Will Add Header Row to each new file";
Console.WriteLine(message);
}
else
{
string sourceFileName = args[0];
string destFileLocation = args.Count() >= 2 ? args[1] : sourceFileName.Substring(0, sourceFileName.LastIndexOf("\\"));
int RowCount = args.Count() >= 3 ? int.Parse(args[2]) : 500000;
bool FirstRowHasHeader = true;
FirstRowHasHeader = args.Count() != 4 || bool.Parse(args[3]);
// Create Directory If Needed
if (!Directory.Exists(destFileLocation))
{
Directory.CreateDirectory(destFileLocation);
}
string line = "";
int linecount = 0;
int FileNum = 1;
string newFileName = Path.Combine(destFileLocation, Path.GetFileNameWithoutExtension(sourceFileName));
newFileName += FileNum + Path.GetExtension(sourceFileName);
// Always add Header Line
string HeaderLine = GetLine(sourceFileName, _fileposition);
int HeaderCount = HeaderLine.Split('|').Count();
do
{
// Add Header Line
if ((linecount == 0 & FirstRowHasHeader) | (_fileposition == 1 & !FirstRowHasHeader))
{
using (FileStream NewFile = new FileStream(newFileName, FileMode.Append))
{
System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding();
Byte[] bytes = encoding.GetBytes(HeaderLine);
int length = encoding.GetByteCount(HeaderLine);
NewFile.Write(bytes, 0, length);
}
}
//Evaluate Line
line = GetLine(sourceFileName, _fileposition, HeaderCount);
if (line == null) continue;
// Create File if it doesn't exist and write to it
using (FileStream NewFile = new FileStream(newFileName, FileMode.Append))
{
System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding();
Byte[] bytes = encoding.GetBytes(line);
int length = encoding.GetByteCount(line);
NewFile.Write(bytes, 0, length);
}
//Add to the line count
linecount++;
//Create new FileName if needed
if (linecount == RowCount)
{
FileNum++;
// Create a new sub File, and read into it
newFileName = Path.Combine(destFileLocation, Path.GetFileNameWithoutExtension(sourceFileName));
newFileName += FileNum + Path.GetExtension(sourceFileName);
linecount = 0;
}
} while (line != null);
}
}
else
{
Console.WriteLine("You must provide sourcefile!");
Console.WriteLine("use /? for help");
}
}
static string GetLine(string sourceFileName, long position, int NumberOfColumns = 0)
{
byte[] buffer = new byte[65536];
var builder = new StringBuilder();
var finishedline = false;
using (Stream source = File.OpenRead(sourceFileName))
{
source.Position = position;
var crlf = "\r\n";
var lf = "\n";
var length = source.Length;
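// NOTE: the text between the loop header and the column-count check was lost from the
// original post; the lines down to the "if" below are reconstructed from the surviving fragments.
while (source.Position < length)
{
using (StreamReader NewLine = new StreamReader(source))
{
while (NewLine.Peek() >= 0 & finishedline == false & _fileposition < length)
{
char c = (char)NewLine.Read();
_fileposition++;
switch (c)
{
case '\r':
{
if ((char)NewLine.Peek() == '\n')
{
if ((builder.ToString().Split('|').Count() >= NumberOfColumns) | NumberOfColumns == 0)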
{
// Remove all Control Line Feeds before the end of the line.
builder = builder.Replace(crlf, lf);
// Add Final Control Line Feed
var x = (char)NewLine.Read();
builder.Append(x);
finishedline = true;
_fileposition++;
continue;
}
}
break;
}
default:
builder.Append(c);
break;
}
}
}
break;
}
}
return (builder.ToString() == "" ? null: builder.ToString());
}
}
References: http://social.msdn.microsoft.com/forums/en-US/csharpgeneral/thread/b0d4cba1-471a-4260-94c1-fddd4244fa23/
this one helped me the most: https://stackoverflow.com/a/668003/1582188
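The core rule described above (a \r\n only ends a row once enough '|' columns have been seen) can also be sketched as a single pass over a TextReader, without reopening the file for every line. This is a minimal sketch under assumptions, not the author's implementation: the names RecordReader and ReadRecord are hypothetical, and unlike the program above it leaves embedded line breaks in the returned text unchanged.

```csharp
using System.IO;
using System.Text;

public static class RecordReader
{
    // Reads one logical record from 'reader'. A "\r\n" only ends the record once
    // the text read so far holds at least 'columns' fields separated by 'separator';
    // earlier line breaks are kept as data. With columns = 0 the first "\r\n" ends the record.
    // Returns null when nothing is left to read.
    public static string ReadRecord(TextReader reader, int columns, char separator = '|')
    {
        var sb = new StringBuilder();
        int fields = 1;
        int c;
        while ((c = reader.Read()) != -1)
        {
            if (c == separator)
            {
                fields++;
            }
            if (c == '\r' && reader.Peek() == '\n' && fields >= columns)
            {
                reader.Read(); // consume the '\n' that completes the terminator
                return sb.ToString();
            }
            sb.Append((char)c);
        }
        return sb.Length > 0 ? sb.ToString() : null;
    }
}
```

A caller would typically read the header with columns = 0, count its '|' separators, and then pass that column count for every following record.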

Related

Get Difference Between Two Strings in Terms of Remove and Insert Actions

So I have a text box and on the text changed event I have the old text and the new text, and want to get the difference between them. In this case, I want to be able to recreate the new text with the old text using one remove function and one insert function. That is possible because there are a few possibilities of the change that was in the text box:
Text was only removed (one character or more using selection) - ABCD -> AD
Text was only added (one character or more using paste) - ABCD -> ABXXCD
Text was removed and added (by selecting text and entering text in the same action) - ABCD -> AXD
So I want to have these functions:
Sequence GetRemovedCharacters(string oldText, string newText)
{
}
Sequence GetAddedCharacters(string oldText, string newText)
{
}
My Sequence class:
public class Sequence
{
private int start;
private int end;
public Sequence(int start, int end)
{
StartIndex = start; EndIndex = end;
}
public int StartIndex { get { return start; } set { start = value; Length = end - start + 1; } }
public int EndIndex { get { return end; } set { end = value; Length = end - start + 1; } }
public int Length { get; private set; }
public override string ToString()
{
return "(" + StartIndex + ", " + EndIndex + ")";
}
public static bool operator ==(Sequence a, Sequence b)
{
if(IsNull(a) && IsNull(b))
return true;
else if(IsNull(a) || IsNull(b))
return false;
else
return a.StartIndex == b.StartIndex && a.EndIndex == b.EndIndex;
}
public override bool Equals(object obj)
{
return base.Equals(obj);
}
public static bool operator !=(Sequence a, Sequence b)
{
if(IsNull(a) && IsNull(b))
return false;
else if(IsNull(a) || IsNull(b))
return true;
else
return a.StartIndex != b.StartIndex || a.EndIndex != b.EndIndex;
}
public override int GetHashCode()
{
return base.GetHashCode();
}
static bool IsNull(Sequence sequence)
{
// ReferenceEquals avoids calling the overloaded == operator and needs no exception handling
return ReferenceEquals(sequence, null);
}
}
Extra Explanation: I want to know which characters were removed and which characters were added to the text in order to get the new text so I can recreate this. Let's say I have ABCD -> AXD. 'B' and 'C' would be the characters that were removed and 'X' would be the character that was added. So the output from the GetRemovedCharacters function would be (1, 2) and the output from the GetAddedCharacters function would be (1, 1). The output from the GetRemovedCharacters function refers to indexes in the old text and the output from the GetAddedCharacters function refers to indexes in the old text after removing the removed characters.
EDIT: I've thought of a few directions:
This code I created* returns the sequence that was affected: if characters were removed, it returns the sequence of the characters that were removed in the old text; if characters were added, it returns the sequence of the characters that were added in the new text. It does not return the right value (and I am not sure myself what that value should be) when text is both removed and added.
Maybe the SelectionStart property in the text box could help - the position of the caret after the text was changed.
*
private static Sequence GetChangeSequence(string oldText, string newText)
{
if(newText.Length > oldText.Length)
{
for(int i = 0; i < newText.Length; i++)
if(i == oldText.Length || newText[i] != oldText[i])
return new Sequence(i, i + (newText.Length - oldText.Length) - 1);
return null;
}
else if(newText.Length < oldText.Length)
{
for(int i = 0; i < oldText.Length; i++)
if(i == newText.Length || oldText[i] != newText[i])
return new Sequence(i, i + (oldText.Length - newText.Length) - 1);
return null;
}
else
return null;
}
Thanks.
A simple string comparison won't do the job, since you are asking for an algorithm that supports added and removed chars at the same time, which is not easy to achieve in a few lines of code. I'd suggest using a library instead of writing your own comparison algorithm.
Have a look at this project for example.
I quickly threw this together to give you an idea of what I did to solve your question. It doesn't use your classes but it does find an index so it's customizable for you.
There are also obvious limitations to this as it is just bare bones.
This method will spot out changes made to the original string by comparing it to the changed string
// Find the changes made to a string
string StringDiff (string originalString, string changedString)
{
string diffString = "";
// Iterate over the original string
for (int i = 0; i < originalString.Length; i++)
{
// Get the character to search with
char diffChar = originalString[i];
// If found char in the changed string
if (FindInString(diffChar, changedString, out int index))
{
// Remove from the changed string at the index as we don't want to match to this char again
changedString = changedString.Remove(index, 1);
}
// If not found then this is a difference
else
{
// Add to diff string
diffString += diffChar;
}
}
return diffString;
}
This method will return true at the first matching occurrence (an obvious limitation but this is more to give you an idea)
// Find char at first occurrence in string
bool FindInString (char c, string search, out int index)
{
index = -1;
// Iterate over search string
for (int i = 0; i < search.Length; i++)
{
// If found then return true with index
if (c == search[i])
{
index = i;
return true;
}
}
return false;
}
This is a simple helper method to show you an example
void SplitStrings(string oldStr, string newStr)
{
Console.WriteLine($"Old : {oldStr}, New: {newStr}");
Console.WriteLine("Removed - " + StringDiff(oldStr, newStr));
Console.WriteLine("Added - " + StringDiff(newStr, oldStr));
}
I've done it.
static void Main(string[] args)
{
while(true)
{
Console.WriteLine("Enter the Old Text");
string oldText = Console.ReadLine();
Console.WriteLine("Enter the New Text");
string newText = Console.ReadLine();
Console.WriteLine("Enter the Caret Position");
int caretPos = int.Parse(Console.ReadLine());
Sequence removed = GetRemovedCharacters(oldText, newText, caretPos);
if(removed != null)
oldText = oldText.Remove(removed.StartIndex, removed.Length);
Sequence added = GetAddedCharacters(oldText, newText, caretPos);
if(added != null)
oldText = oldText.Insert(added.StartIndex, newText.Substring(added.StartIndex, added.Length));
Console.WriteLine("Worked: " + (oldText == newText).ToString());
Console.ReadKey();
Console.Clear();
}
}
static Sequence GetRemovedCharacters(string oldText, string newText, int caretPosition)
{
int startIndex = GetStartIndex(oldText, newText);
if(startIndex != -1)
{
Sequence sequence = new Sequence(startIndex, caretPosition + (oldText.Length - newText.Length) - 1);
if(SequenceValid(sequence))
return sequence;
}
return null;
}
static Sequence GetAddedCharacters(string oldText, string newText, int caretPosition)
{
int startIndex = GetStartIndex(oldText, newText);
if(startIndex != -1)
{
Sequence sequence = new Sequence(GetStartIndex(oldText, newText), caretPosition - 1);
if(SequenceValid(sequence))
return sequence;
}
return null;
}
static int GetStartIndex(string oldText, string newText)
{
for(int i = 0; i < Math.Max(oldText.Length, newText.Length); i++)
if(i >= oldText.Length || i >= newText.Length || oldText[i] != newText[i])
return i;
return -1;
}
static bool SequenceValid(Sequence sequence)
{
return sequence.StartIndex >= 0 && sequence.EndIndex >= 0 && sequence.EndIndex >= sequence.StartIndex;
}
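Since the question guarantees a single contiguous edit, another way to get both ranges without the caret position is to strip the common prefix and the common suffix of the two strings. The following is a sketch under that assumption only; the names TextDiff and SingleEdit are hypothetical, and it returns a tuple rather than the Sequence class from the question.

```csharp
using System;

public static class TextDiff
{
    // Returns (Start, RemovedLength, AddedLength): removing RemovedLength characters
    // from oldText at Start and inserting newText.Substring(Start, AddedLength)
    // at the same index produces newText.
    public static (int Start, int RemovedLength, int AddedLength) SingleEdit(string oldText, string newText)
    {
        int maxCommon = Math.Min(oldText.Length, newText.Length);

        // Length of the common prefix.
        int prefix = 0;
        while (prefix < maxCommon && oldText[prefix] == newText[prefix])
        {
            prefix++;
        }

        // Length of the common suffix, never overlapping the prefix.
        int suffix = 0;
        while (suffix < maxCommon - prefix
               && oldText[oldText.Length - 1 - suffix] == newText[newText.Length - 1 - suffix])
        {
            suffix++;
        }

        return (prefix, oldText.Length - prefix - suffix, newText.Length - prefix - suffix);
    }
}
```

For ABCD -> AXD this yields start 1, two removed characters and one added character, which corresponds to (1, 2) and (1, 1) in the question's notation. When the edit is ambiguous (for example AAB -> AB), the reported position may differ from where the user actually typed, although applying it still reproduces the new text; the caret position is needed to tell such cases apart.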

How to convert Quoted-Print String

I'm working with French strings in .NET.
Decoding a mail body, I receive "Chasn=C3=A9 sur illet".
I would like to get "Chasné sur illet",
and I haven't found any solution after 2 days of web searching.
C# or VB.NET.
Can anyone help me?
Thanks
Or the easiest of all, just use the QuotedPrintableDecoder from my MimeKit library:
static string DecodeQuotedPrintable (string input, string charset)
{
var decoder = new QuotedPrintableDecoder ();
var buffer = Encoding.ASCII.GetBytes (input);
var output = new byte[decoder.EstimateOutputLength (buffer.Length)];
int used = decoder.Decode (buffer, 0, buffer.Length, output);
var encoding = Encoding.GetEncoding (charset);
return encoding.GetString (output, 0, used);
}
Note that the other answers above assume the decoded content will be ASCII or UTF-8, but that isn't necessarily the case. You'll need to get the charset parameter from the Content-Type header of the MIME part that you are decoding.
Of course... if you don't know how to get that info, you could simply use my awesome MailKit library to get the MIME part from IMAP and have it do all of this work for you.
This is UTF8 encoding.
Using this post:
http://www.dpit.co.uk/decoding-quoted-printable-email-in-c/
Here is the code (don't forget to accept the answer if helped):
using System;
using System.Text;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine(DecodeQuotedPrintable("Chasn=C3=A9 sur illet"));
Console.ReadKey();
}
static string DecodeQuotedPrintable(string input)
{
var occurences = new Regex(@"(=[0-9A-Z][0-9A-Z])+", RegexOptions.Multiline);
var matches = occurences.Matches(input);
foreach (Match m in matches)
{
byte[] bytes = new byte[m.Value.Length / 3];
for (int i = 0; i < bytes.Length; i++)
{
string hex = m.Value.Substring(i * 3 + 1, 2);
int iHex = Convert.ToInt32(hex, 16);
bytes[i] = Convert.ToByte(iHex);
}
input = input.Replace(m.Value, Encoding.UTF8.GetString(bytes));
}
return input.Replace("=\r\n", "");
}
}
}
From : https://stackoverflow.com/a/36803911/6403521
My solution :
[TestMethod]
public void TestMethod1()
{
Assert.AreEqual("La Bouichère", quotedprintable("La Bouich=C3=A8re", "utf-8"));
Assert.AreEqual("Chasné sur illet", quotedprintable("Chasn=C3=A9 sur illet", "utf-8"));
Assert.AreEqual("é è", quotedprintable("=C3=A9 =C3=A8", "utf-8"));
}
private string quotedprintable(string pStrIn, string encoding)
{
String strOut = pStrIn.Replace("=\r\n", "");
// Find the first =
int position = strOut.IndexOf("=");
while (position != -1)
{
// String before the =
string leftpart = strOut.Substring(0, position);
// get the QuotedPrintable String in a ArrayList
System.Collections.ArrayList hex = new System.Collections.ArrayList();
// The first Part
hex.Add(strOut.Substring(1 + position, 2));
// Look for the next parts
while (position + 3 < strOut.Length && strOut.Substring(position + 3, 1) == "=")
{
position = position + 3;
hex.Add(strOut.Substring(1 + position, 2));
}
// In the hex Array, we have two items
// Convert using the GetEncoding Function
byte[] bytes = new byte[hex.Count];
for (int i = 0; i < hex.Count; i++)
{
bytes[i] = System.Convert.ToByte(new string(((string)hex[i]).ToCharArray()), 16);
}
string equivalent = System.Text.Encoding.GetEncoding(encoding).GetString(bytes);
// Part of the original String after the last QP Symbol
string rightpart = strOut.Substring(position + 3);
// Re build the String
strOut = leftpart + equivalent + rightpart;
// find the new QP Position
position = leftpart.Length + equivalent.Length;
if (rightpart.Length == 0)
{
position = -1;
}
else
{
position = strOut.IndexOf("=", position + 1);
}
}
return strOut;
}
We had an issue with this method: it is VERY slow.
The following improved performance A LOT:
public static string FromMailTransferEncoding(this string messageText, Encoding enc, string transferEncoding)
{
if (string.IsNullOrEmpty(transferEncoding))
return messageText;
if ("quoted-printable".Equals(transferEncoding.ToLower()))
{
StringBuilder sb = new StringBuilder();
string delimitorRegEx = @"=[\r][\n]";
string[] parts = Regex.Split(messageText, delimitorRegEx);
foreach (string part in parts)
{
string subPart = part;
Regex occurences = new Regex(@"(=[0-9A-Z][0-9A-Z])+", RegexOptions.Multiline);
MatchCollection matches = occurences.Matches(subPart);
foreach (Match m in matches)
{
byte[] bytes = new byte[m.Value.Length / 3];
for (int i = 0; i < bytes.Length; i++)
{
string hex = m.Value.Substring(i * 3 + 1, 2);
int iHex = Convert.ToInt32(hex, 16);
bytes[i] = Convert.ToByte(iHex);
}
subPart = occurences.Replace(subPart, enc.GetString(bytes), 1);
}
sb.Append(subPart);
}
return sb.ToString();
}
return messageText;
}
static string ConverFromHex(string source)
{
string target = string.Empty;
int startPos = source.IndexOf('=', 0);
int prevStartPos = 0;
while (startPos >= 0)
{
// concat with substring from source
target += source.Substring(prevStartPos, startPos - prevStartPos);
// next offset
startPos++;
// update prev pos
prevStartPos = startPos;
// get substring
string hexString = source.Substring(startPos, 2);
// get int equiv
int hexNum = 0;
if (int.TryParse(hexString, System.Globalization.NumberStyles.AllowHexSpecifier, System.Globalization.CultureInfo.InvariantCulture, out hexNum))
{
// add to target string
target += (char)hexNum;
// add hex length
prevStartPos += 2;
}
// next occurence
startPos = source.IndexOf('=', startPos);
}
// add rest of source
target += source.Substring(prevStartPos);
return target;
}
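A multi-byte character such as é arrives as two escapes (=C3=A9), so the decoded bytes need to be collected first and only then converted with the right encoding. The following is a minimal sketch of that approach, assuming the input contains only ASCII characters (as quoted-printable text should); the names QuotedPrintable and Decode are hypothetical and this is not a complete RFC 2045 decoder.

```csharp
using System;
using System.IO;
using System.Text;

public static class QuotedPrintable
{
    // Decodes "=XX" escapes and soft line breaks into bytes, then converts
    // the whole byte sequence to a string using 'encoding'.
    public static string Decode(string input, Encoding encoding)
    {
        using (var bytes = new MemoryStream())
        {
            for (int i = 0; i < input.Length; i++)
            {
                char ch = input[i];
                if (ch == '=')
                {
                    // Soft line break: "=" followed by CRLF (or a bare LF) is dropped.
                    if (i + 2 < input.Length && input[i + 1] == '\r' && input[i + 2] == '\n')
                    {
                        i += 2;
                        continue;
                    }
                    if (i + 1 < input.Length && input[i + 1] == '\n')
                    {
                        i += 1;
                        continue;
                    }
                    // Escaped byte: "=" followed by two hex digits.
                    if (i + 2 < input.Length && Uri.IsHexDigit(input[i + 1]) && Uri.IsHexDigit(input[i + 2]))
                    {
                        bytes.WriteByte(Convert.ToByte(input.Substring(i + 1, 2), 16));
                        i += 2;
                        continue;
                    }
                }
                // Anything else is copied through (assumed to be ASCII).
                bytes.WriteByte((byte)ch);
            }
            return encoding.GetString(bytes.ToArray());
        }
    }
}
```

Calling QuotedPrintable.Decode("Chasn=C3=A9 sur illet", Encoding.UTF8) gives "Chasné sur illet"; for other mails the encoding should come from the charset of the MIME part, as noted above.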

Replace the start of line in a file quickly

I have an initial file containing lines such as:
34 964:0.049759 1123:0.0031 2507:0.015979
32,48 524:0.061167 833:0.030133 1123:0.002549
34,52 534:0.07349 698:0.141667 1123:0.004403
106 389:0.013396 417:0.016276 534:0.023859
The first part of a line is the class number. A line can have several classes.
For each class, I create a new file.
For instance for class 34 the resulting file will be :
+1 964:0.049759 1123:0.0031 2507:0.015979
-1 524:0.061167 833:0.030133 1123:0.002549
+1 534:0.07349 698:0.141667 1123:0.004403
-1 389:0.013396 417:0.016276 534:0.023859
For class 106 the resulting file will be :
-1 964:0.049759 1123:0.0031 2507:0.015979
-1 524:0.061167 833:0.030133 1123:0.002549
-1 534:0.07349 698:0.141667 1123:0.004403
+1 389:0.013396 417:0.016276 534:0.023859
The problem is that I have 13 files to write for each of 200 classes.
I already ran a less optimized version of my code and it took several hours.
With my code below it takes 1 hour to generate the 2600 files.
Is there a faster way to perform such a replacement? Is regex a viable option?
Below is my implementation (works in LINQPad with this data file)
static void Main()
{
const string filePath = @"C:\data.txt";
const string generatedFilesFolderPath = @"C:\";
const string fileName = "data";
using (new TimeIt("Whole process"))
{
var fileLines = File.ReadLines(filePath).Select(l => l.Split(new[] { ' ' }, 2)).ToList();
var classValues = GetClassValues();
foreach (var classValue in classValues)
{
var directoryPath = Path.Combine(generatedFilesFolderPath, classValue);
if (!Directory.Exists(directoryPath))
Directory.CreateDirectory(directoryPath);
var classFilePath = Path.Combine(directoryPath, fileName);
using (var file = new StreamWriter(classFilePath))
{
foreach (var line in fileLines)
{
var lineFirstPart = line.First();
string newFirstPart = "-1";
var hashset = new HashSet<string>(lineFirstPart.Split(','));
if (hashset.Contains(classValue))
{
newFirstPart = "+1";
}
file.WriteLine("{0} {1}", newFirstPart, line.Last());
}
}
}
}
Console.Read();
}
public static List<string> GetClassValues()
{
// In real life there are 200 class values.
return Enumerable.Range(0, 2).Select(c => c.ToString()).ToList();
}
public class TimeIt : IDisposable
{
private readonly string _name;
private readonly Stopwatch _watch;
public TimeIt(string name)
{
_name = name;
_watch = Stopwatch.StartNew();
}
public void Dispose()
{
_watch.Stop();
Console.WriteLine("{0} took {1}", _name, _watch.Elapsed);
}
}
The output:
Whole process took 00:00:00.1175102
EDIT: I also ran a profiler and it looks like the split method is the hottest spot.
EDIT 2: Simple example:
2,1 1:0.8 2:0.2
3 1:0.4 3:0.6
12 1:0.02 4:0.88 5:0.1
Expected output for class 2:
+1 1:0.8 2:0.2
-1 1:0.4 3:0.6
-1 1:0.02 4:0.88 5:0.1
Expected output for class 3:
-1 1:0.8 2:0.2
+1 1:0.4 3:0.6
-1 1:0.02 4:0.88 5:0.1
Expected output for class 4:
-1 1:0.8 2:0.2
-1 1:0.4 3:0.6
-1 1:0.02 4:0.88 5:0.1
I have eliminated the hottest paths from your code by removing the split and using a bigger buffer on the FileStream.
Instead of Split I now call ToCharArray and then parse the leading chars up to the first space; while doing so, classValue is matched char by char. The boolean found indicates an exact match for a token that ends at a comma or at the first space. The rest of the handling is the same.
var fsw = new FileStream(classFilePath,
FileMode.Create,
FileAccess.Write,
FileShare.None,
64*1024*1024); // use a large buffer
using (var file = new StreamWriter(fsw)) // use the filestream
{
foreach(var line in fileLines) // for( int i = 0;i < fileLines.Length;i++)
{
char[] chars = line.ToCharArray();
int matched = 0;
int parsePos = -1;
bool takeClass = true;
bool found = false;
bool space = false;
// parse until space
while (parsePos<chars.Length && !space )
{
parsePos++;
space = chars[parsePos] == ' '; // end
// tokens
if (chars[parsePos] == ' ' ||
chars[parsePos] == ',')
{
if (takeClass
&& matched == classValue.Length)
{
found = true;
takeClass = false;
}
else
{
// reset matching
takeClass = true;
matched = 0;
}
}
else
{
if (takeClass
&& matched < classValue.Length
&& chars[parsePos] == classValue[matched])
{
matched++; // on the next iteration, match next
}
else
{
takeClass = false; // no match!
}
}
}
chars[parsePos - 1] = '1'; // replace 1 in front of space
var correction = 1;
if (parsePos > 1)
{
// is classValue before the comma (or before space)
if (found)
{
chars[parsePos - 2] = '+';
}
else
{
chars[parsePos - 2] = '-';
}
correction++;
}
else
{
// is classValue before the comma (or before space)
if (found)
{
// not enough space in the array, write a single char
file.Write('+');
}
else
{
file.Write('-');
}
}
file.WriteLine(chars, parsePos - correction, chars.Length - (parsePos - correction));
}
}
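The same "match without Split" idea can be written more compactly with IndexOf and an ordinal comparison over the class prefix. This is only a sketch, not a measured replacement for the code above: the names ClassLabel, HasClass and Relabel are hypothetical, and any speed-up would need to be confirmed with a profiler.

```csharp
using System;

public static class ClassLabel
{
    // True when 'classValue' appears as a whole token in the comma-separated
    // class list that precedes the first space of 'line'.
    public static bool HasClass(string line, string classValue)
    {
        int end = line.IndexOf(' ');
        if (end < 0)
        {
            end = line.Length;
        }

        int start = 0;
        while (true)
        {
            // Find the end of the current token inside the class list only.
            int comma = line.IndexOf(',', start, end - start);
            int tokenEnd = comma < 0 ? end : comma;

            if (tokenEnd - start == classValue.Length
                && string.CompareOrdinal(line, start, classValue, 0, classValue.Length) == 0)
            {
                return true;
            }
            if (comma < 0)
            {
                return false;
            }
            start = comma + 1;
        }
    }

    // Replaces the class list with "+1" or "-1" and keeps the rest of the line.
    public static string Relabel(string line, string classValue)
    {
        int space = line.IndexOf(' ');
        string rest = space < 0 ? "" : line.Substring(space); // includes the leading space
        return (HasClass(line, classValue) ? "+1" : "-1") + rest;
    }
}
```

For the line "34,52 534:0.07349 698:0.141667" and class "34", Relabel returns "+1 534:0.07349 698:0.141667"; a class such as "134" does not match "34" because whole tokens are compared.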
Instead of iterating over the unparsed lines 200 times, how about parsing the lines up front into a data structure and then iterating over that 200 times? This should minimize the number of string manipulation operations.
I am also using StreamReader instead of File.ReadLines, so the entire file is not in memory twice: once as string[] and again as Detail[].
static void Main(string[] args)
{
var details = ReadDetail("data.txt").ToArray();
var classValues = Enumerable.Range(0, 10).ToArray();
foreach (var classValue in classValues)
{
// Create file/directory etc
using (var file = new StreamWriter("out.txt"))
{
foreach (var detail in details)
{
file.WriteLine("{0} {1}", detail.Classes.Contains(classValue) ? "+1" : "-1", detail.Line);
}
}
}
}
static IEnumerable<Detail> ReadDetail(string filePath)
{
using (StreamReader reader = new StreamReader(filePath))
{
while (!reader.EndOfStream)
{
string line = reader.ReadLine();
int separator = line.IndexOf(' ');
Detail detail = new Detail
{
Classes = line.Substring(0, separator).Split(',').Select(c => Int32.Parse(c)).ToArray(),
Line = line.Substring(separator + 1)
};
yield return detail;
}
}
}
public class Detail
{
public int[] Classes { get; set; }
public string Line { get; set; }
}

Reading lines from a text file, converting them, then writing back to new file

I have some basic knowledge of C#, but I am having trouble coding something that seems simple in concept. I want to read a file (.asm) containing values such as
#1
#12
#96
#2
#46
etc.
on multiple consecutive lines. I then want to get rid of the # symbols (if they are present), convert the remaining number values to binary, then write these binary values back to a new file (.hack) on their own lines. There isn't a set limit on the number of lines, which is my biggest issue as I don't know how to check for lines dynamically. So far I can only read and convert lines if I code to look for them, then I can't figure out how to write these values on their own lines in the new file. Sorry if this sounds a bit convoluted, but any help would be appreciated. Thanks!
if (openFileDialog1.ShowDialog() == DialogResult.OK)
{
var line = File.ReadAllText(openFileDialog1.FileName);
using (StreamWriter sw = File.CreateText("testCode.hack"))
{
var str = line;
var charsToRemove = new string[] {"#"};
foreach (var c in charsToRemove)
{
str = str.Replace(c, string.Empty);
}
int value = Convert.ToInt32(str);
string value2 = Convert.ToString(value, 2);
if (value2.Length < 16)
{
int zeroes = 16 - value2.Length;
if(zeroes == 12)
{
sw.WriteLine("000000000000" + value2);
}
}
else
{
sw.WriteLine(value2);
}
}
This code should help you get going real fast:
static void Main(string[] args)
{
string line = string.Empty;
System.IO.StreamReader reader = new System.IO.StreamReader(@"C:\test.txt");
System.IO.StreamWriter writer = new System.IO.StreamWriter(@"C:\test.hack");
while ((line = reader.ReadLine()) != null) // Read until there is nothing more to read
{
if (line.StartsWith("#"))
{
line = line.Remove(0, 1); // Remove '#'
}
int value = -1;
if (Int32.TryParse(line, out value)) // Check if the rest string is an integer
{
// Convert the rest string to its binary representation and write it to the file
writer.WriteLine(intToBinary(value));
}
else
{
// Couldn't convert the string to an integer..
}
}
reader.Close();
writer.Close();
Console.WriteLine("Done!");
Console.Read();
}
//http://www.dotnetperls.com/binary-representation
static string intToBinary(int n)
{
char[] b = new char[32];
int pos = 31;
int i = 0;
while (i < 32)
{
if ((n & (1 << i)) != 0)
{
b[pos] = '1';
}
else
{
b[pos] = '0';
}
pos--;
i++;
}
return new string(b);
}
My suggestion is to create a List<string>. Here are the steps:
1. Read the input (.asm) file into the List<string>.
2. Open a StreamWriter for the output (.hack) file.
3. Loop through the List<string>, modify each string, and write it to the file.
Code Example:
List<string> lstInput = new List<string>();
using (StreamReader reader = new StreamReader(@"input.asm"))
{
string sLine = string.Empty;
//read one line at a time
while ((sLine = reader.ReadLine()) != null)
{
lstInput.Add(sLine);
}
}
using (StreamWriter writer = new StreamWriter(@"output.hack"))
{
foreach(string sFullLine in lstInput)
{
string sNumber = sFullLine;
//remove leading # sign
if(sFullLine.StartsWith("#"))
sNumber = sFullLine.Substring(1);
int iNumber;
if(int.TryParse(sNumber, out iNumber))
{
writer.WriteLine(IntToBinaryString(iNumber));
}
}
}
public string IntToBinaryString(int number)
{
const int mask = 1;
var binary = string.Empty;
while(number > 0)
{
// Logical AND the number and prepend it to the result string
binary = (number & mask) + binary;
number = number >> 1;
}
return binary;
}
Reference: IntToBinaryString method.
NOTE: The int-to-binary-string method in @TheDutchMan's answer is a better choice.
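The question pads each value to 16 digits, which the built-in Convert.ToString(value, 2) together with PadLeft can do directly. Here is a short sketch under assumptions: the names HackConverter, ToBinary16 and ConvertFile are hypothetical, and values that are not numbers or do not fit in 16 bits are simply skipped.

```csharp
using System;
using System.IO;
using System.Linq;

public static class HackConverter
{
    // Strips leading '#' characters, parses the number and returns it as a
    // 16-digit binary string; returns null when the line is not a usable number.
    public static string ToBinary16(string line)
    {
        string digits = line.Trim().TrimStart('#');
        int value;
        if (!int.TryParse(digits, out value) || value < 0 || value > 0xFFFF)
        {
            return null;
        }
        return Convert.ToString(value, 2).PadLeft(16, '0');
    }

    // Converts every line of inputPath and writes one binary value per line to outputPath.
    public static void ConvertFile(string inputPath, string outputPath)
    {
        var converted = File.ReadLines(inputPath)
                            .Select(line => ToBinary16(line))
                            .Where(binary => binary != null);
        File.WriteAllLines(outputPath, converted);
    }
}
```

File.ReadLines enumerates the file lazily, so the number of lines does not need to be known in advance.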

How to insert/remove hyphen to/from a plain string in c#?

I have a string like this;
string text = "6A7FEBFCCC51268FBFF";
And I have one method for which I want to insert the logic for appending the hyphen after 4 characters to 'text' variable. So, the output should be like this;
6A7F-EBFC-CC51-268F-BFF
Appending hyphen to above 'text' variable logic should be inside this method;
public void GetResultsWithHyphen
{
// append hyphen after 4 characters logic goes here
}
And I want also remove the hyphen from a given string such as 6A7F-EBFC-CC51-268F-BFF. So, removing hyphen from a string logic should be inside this method;
public void GetResultsWithOutHyphen
{
// Removing hyphen after 4 characters logic goes here
}
How can I do this in C# (for desktop app)?
What is the best way to do this?
Appreciate everyone's answer in advance.
GetResultsWithOutHyphen is easy (and should return a string instead of void):
public string GetResultsWithOutHyphen(string input)
{
// Removing hyphen after 4 characters logic goes here
return input.Replace("-", "");
}
for GetResultsWithHyphen, there may be slicker ways to do it, but here's one way:
public string GetResultsWithHyphen(string input)
{
// append hyphen after 4 characters logic goes here
string output = "";
int start = 0;
while (start < input.Length)
{
output += input.Substring(start, Math.Min(4,input.Length - start)) + "-";
start += 4;
}
// remove the trailing dash
return output.Trim('-');
}
Use regex:
public String GetResultsWithHyphen(String inputString)
{
return Regex.Replace(inputString, @"(\w{4})(\w{4})(\w{4})(\w{4})(\w{3})",
@"$1-$2-$3-$4-$5");
}
and for removal:
public String GetResultsWithOutHyphen(String inputString)
{
return inputString.Replace("-", "");
}
Here's the shortest regex I could come up with. It will work on strings of any length. Note that the \B token will prevent it from matching at the end of a string, so you don't have to trim off an extra hyphen as with some answers above.
using System;
using System.Text.RegularExpressions;
namespace ConsoleApplication2
{
class Program
{
static void Main(string[] args)
{
string text = "6A7FEBFCCC51268FBFF";
for (int i = 0; i <= text.Length;i++ )
Console.WriteLine(hyphenate(text.Substring(0, i)));
}
static string hyphenate(string s)
{
var re = new Regex(@"(\w{4}\B)");
return re.Replace (s, "$1-");
}
static string dehyphenate (string s)
{
return s.Replace("-", "");
}
}
}
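As a quick check of the \B behaviour described above, the following sketch pairs the same pattern with the removal step so the round trip can be verified. The class and method names are hypothetical; the pattern is the one from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HyphenRoundTrip
{
    // Same pattern as above: four word characters that are NOT followed by a
    // word boundary, so the final group of the string is never matched.
    public static string Hyphenate(string s)
    {
        return Regex.Replace(s, @"(\w{4}\B)", "$1-");
    }

    // Removes every hyphen, wherever it occurs.
    public static string Dehyphenate(string s)
    {
        return s.Replace("-", "");
    }
}
```

Hyphenate("6A7FEBFCCC51268FBFF") gives "6A7F-EBFC-CC51-268F-BFF", and Hyphenate("6A7FEBFC") gives "6A7F-EBFC" with no trailing hyphen even though the length is a multiple of four. Passing either result to Dehyphenate returns the original string. Because \w matches only letters, digits, and underscores, input containing other characters is grouped differently.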
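// Requires: using System.Linq;
var hyphenText = new string(
text
// The indexed SelectMany overload passes (element, index), in that order
.SelectMany((ch, i) => i%4 == 3 && i != text.Length-1 ? new[]{ch, '-'} : new[]{ch})
.ToArray()
);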
something along the lines of:
public string GetResultsWithHyphen(string inText)
{
var counter = 0;
var outString = string.Empty;
while (counter < inText.Length)
{
if (counter > 0 && counter % 4 == 0) // skip position 0 so the result has no leading hyphen
outString = string.Format("{0}-{1}", outString, inText.Substring(counter, 1));
else
outString += inText.Substring(counter, 1);
counter++;
}
return outString;
}
This is rough code and may not be perfectly correct syntactically.
public static string GetResultsWithHyphen(string str) {
return Regex.Replace(str, "(.{4})", "$1-");
//if you don't want trailing -
//return Regex.Replace(str, "(.{4})(?!$)", "$1-");
}
public static string GetResultsWithOutHyphen(string str) {
//if you just want to remove the hyphens:
//return str.Replace("-", "");
//if you REALLY want to remove hyphens only if they occur after 4 places:
return Regex.Replace(str, "(.{4})-", "$1");
}
For removing:
String textHyphenRemoved = text.Replace("-", ""); should remove all of the hyphens.
For adding:
StringBuilder strBuilder = new StringBuilder();
int startPos = 0;
for (int i = 0; i < text.Length / 4; i++)
{
startPos = i * 4;
strBuilder.Append(text.Substring(startPos,4));
//if it isn't the end of the string add a hyphen
if(startPos + 4 < text.Length)
strBuilder.Append("-");
}
//add what is left (the final group of fewer than 4 characters, if any)
strBuilder.Append(text.Substring((text.Length / 4) * 4));
string textWithHyphens = strBuilder.ToString();
Do note that my adding code is untested.
GetResultsWithOutHyphen method
public string GetResultsWithOutHyphen(string input)
{
return input.Replace("-", "");
}
GetResultsWithHyphen method
You could pass a variable instead of four for flexibility.
public string GetResultsWithHyphen(string input)
{
string output = "";
int start = 0;
while (start < input.Length)
{
char bla = input[start];
output += bla;
start += 1;
if (start % 4 == 0 && start < input.Length) // no trailing hyphen when the length is a multiple of 4
{
output += "-";
}
}
return output;
}
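The suggestion to pass a variable instead of four could look roughly like the sketch below. The names InsertHyphens and groupSize are hypothetical, and StringBuilder is swapped in for string concatenation to avoid reallocating the string on every character.

```csharp
using System;
using System.Text;

public static class HyphenFormatter
{
    // Hypothetical helper: inserts a hyphen between groups of groupSize characters.
    public static string InsertHyphens(string input, int groupSize = 4)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        if (groupSize < 1) throw new ArgumentOutOfRangeException(nameof(groupSize));

        var sb = new StringBuilder(input.Length + input.Length / groupSize);
        for (int i = 0; i < input.Length; i++)
        {
            // Add a separator before each new group, but never at the start.
            if (i > 0 && i % groupSize == 0)
                sb.Append('-');
            sb.Append(input[i]);
        }
        return sb.ToString();
    }
}
```

Under these assumptions, HyphenFormatter.InsertHyphens("6A7FEBFCCC51268FBFF") returns "6A7F-EBFC-CC51-268F-BFF" and HyphenFormatter.InsertHyphens("123456", 3) returns "123-456". Placing the separator before each new group, rather than after each completed one, means no trailing hyphen needs to be trimmed.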
This worked for me when I had a value for a social security number (123456789) and needed it to display as (123-45-6789) in a listbox.
ListBox1.Items.Add("SS Number : " & vbTab & Format(SSNArray(i), "###-##-####"))
In this case I had an array of Social Security Numbers. This line of code alters the formatting to put a hyphen in.
Caller
public static void Main()
{
var text = new Text("THISisJUSTanEXAMPLEtext");
var convertText = text.Convert();
Console.WriteLine(convertText);
}
Callee
public class Text
{
private string _text;
private int _jumpNo = 4;
public Text(string text)
{
_text = text;
}
public Text(string text, int jumpNo)
{
_text = text;
_jumpNo = jumpNo < 1 ? _jumpNo : jumpNo;
}
public string Convert()
{
if (string.IsNullOrEmpty(_text))
{
return string.Empty;
}
if (_text.Length < _jumpNo)
{
return _text;
}
var convertText = _text.Substring(0, _jumpNo);
int start = _jumpNo;
while (start < _text.Length)
{
convertText += "-" + _text.Substring(start, Math.Min(_jumpNo, _text.Length - start));
start += _jumpNo;
}
return convertText;
}
}
