I'm working on my own basic syntax-highlighting editor in C#. I've already completed the automatic coloring of keywords, functions, and so on. I don't need any other fancy features such as automatic code indentation.
However, I do wish to have a code minify / maxify button. Nothing fancy. I just want it to automatically put a newline before and after every opening bracket, indenting the code that follows either with tab characters or by changing the SelectionIndent property.
So something like this:
test { test { test } test }
Becomes:
test
{
test
{
test
}
}
And of course the minify button should do the exact opposite, putting everything on 1 line.
I've already tried working with the Regex.Replace method. I didn't quite get it to work, and thinking about that approach, it would cause issues if the opening and closing brackets get mixed up. Anyway, this is what I had until I gave up and decided to ask you guys for some help:
string tabs = "";
private void btnMax_Click(object sender, EventArgs e)
{
var count = codeRichTextBox.Text.Count(x => x == '{');
for(int i=1; i<= count; i++)
{
// The idea was to add \t to tabs here on each iteration
}
string pattern = "{";
string replacement = "\n{\n\t";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(codeRichTextBox.Text, replacement);
codeRichTextBox.Text = result;
}
Obviously that solution is the wrong approach and isn't going to work. So what should I do instead?
Edit: Although it would be nice, it doesn't have to take into account that part of the string already has code indentation. The maxify button only needs to work on a string that's on a single line.
My idea: You'll need to parse the text, counting the current nesting level of { and } .
For each { or } found, decide on the proper whitespace-string-before (prefix) and whitespace-string-after (suffix) based on the current nesting level (for example just \n { \n for the first level).
See if the desired prefix is already there. If not, delete any existing whitespace then add the prefix. Do the same for the suffix.
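The nesting-level idea above can be sketched roughly as follows. This is a minimal Python sketch rather than C# (the logic ports over directly); the names maxify and minify are hypothetical, and it assumes braces never appear inside string literals or comments, which a real editor would have to handle.

```python
import re

def maxify(code: str, indent: str = "\t") -> str:
    """Put every brace on its own line, indented by the current nesting depth."""
    lines = []
    depth = 0
    # The capture group makes re.split keep the braces as separate tokens.
    for token in re.split(r"([{}])", code):
        token = token.strip()
        if not token:
            continue
        if token == "}":
            depth = max(depth - 1, 0)  # tolerate unbalanced input
        lines.append(indent * depth + token)
        if token == "{":
            depth += 1
    return "\n".join(lines)

def minify(code: str) -> str:
    """Collapse every run of whitespace (newlines included) into one space."""
    return " ".join(code.split())
```

Note that minify as written would also merge a // line comment with the code after it, so it is only safe on text without line comments.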
I am currently struggling to get my regex to match both everything between two strings and then match multiple lines inside of this first match.
So I am trying to go from this
if
{
// A comment
foo(c.AAA);
bar(c.AAA);
foobar(c.AAA);
}
else
{
foo(c.AAA);
bar(c.AAA);
foobar(c.AAA);
}
To this
if
{
// A comment
foo(c.BBB);
bar(c.BBB);
foobar(c.BBB);
}
else
{
foo(c.AAA);
bar(c.AAA);
foobar(c.AAA);
}
I am able to match everything between the comment and the word else.
But I then want to be able to match every "c.AAA" inside that first match and replace it with "c.BBB" in bulk.
Any help would be appreciated!
Edit: For clarity I just wanted to add that the code I am specifically using is c# and the find and replace is happening across a large number of files. I didn't mention it earlier as I am still interested in finding if this is possible with regex
Refactor your code
auto arg = c.AAA;
if (xyz) {
arg = c.BBB;
}
foo(arg);
bar(arg);
foobar(arg);
Edit
For C# you could write
var arg = c.AAA;
if (xyz) {
arg = c.BBB;
}
foo(arg);
bar(arg);
foobar(arg);
Edit 2
It can be done with a regex, but I could not get it working in VS Code.
In Notepad++, a negated character set in a regex search also matches newlines.
Notepad++ has Find in Files in the Find/Replace dialog, and it shows the number of replacements made in the files. Select the checkbox Follow current doc and maybe In all sub-folders.
Find: (// A comment[^}]+?)c\.AAA
Replace: \1c.BBB
Search Mode: Regular Expression
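Apply this Find/Replace repeatedly until the number of replacements is 0. Each pass only replaces the first remaining c.AAA after the comment, so a block with three occurrences needs three passes.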
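The same "repeat until nothing is replaced" idea can be sketched in Python with re.subn. The function name replace_in_block is hypothetical, and the sketch assumes the replacement text does not itself contain the text being replaced (otherwise the loop would never finish).

```python
import re

def replace_in_block(text: str, start_marker: str, old: str, new: str) -> str:
    """Replace `old` with `new` between start_marker and the next '}'."""
    # Marker, then lazily any run of non-'}' characters, then the old text.
    pattern = re.compile("(" + re.escape(start_marker) + r"[^}]+?)" + re.escape(old))
    count = 1
    while count:
        # A function replacement avoids backslash-escape surprises in `new`.
        text, count = pattern.subn(lambda m: m.group(1) + new, text)
    return text
```

For the sample above, replace_in_block(text, "// A comment", "c.AAA", "c.BBB") changes only the occurrences inside the if block, because [^}] cannot cross the closing brace.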
I don't know of a method to achieve this with a single regex, but you could get it done with a small Python script.
First, you would split your text into chunks using the two strings as delimiters:
import re
text = """if
{
// A comment
foo(c.AAA);
bar(c.AAA);
foobar(c.AAA);
}
else
{
foo(c.AAA);
bar(c.AAA);
foobar(c.AAA);
}
"""
chunks = re.split(r'(// A comment|else)', text)
Which would give:
['\nif\n{\n ', '// A comment', ' \n foo(c.AAA);\n bar(c.AAA);\n foobar(c.AAA);\n}\n', 'else', ' \n{\n foo(c.AAA);\n bar(c.AAA);\n foobar(c.AAA);\n}\n']
And then you can use a loop to modify the part between the delimiters and produce the final string:
output = chunks[0]
prev = chunks[0]
for c in chunks[1:]:
    # Only the chunk that directly follows the comment delimiter is modified
    if prev == "// A comment":
        c = c.replace("c.AAA", "c.BBB")
    output += c
    prev = c
print(output)
I've been trying to understand how XML and CSV parsing work, without actually writing any code yet. I might have to parse a .csv file in the ongoing project and I'd like to be ready. (I'll have to convert them to .ofx files)
I'm also aware there are probably a thousand XML and CSV parsers out there, so I'm more curious than I am worried. I intend to use the XmlReader that I believe Microsoft provides.
Let's say I have the following .csv file
02/02/2016 ; myfirstname ; mylastname ; somefield ; 321654 ; commentary ; blabla
Sometimes a field will be missing. Which means, for the sake of the example, that the lastname isn't mandatory, and somefield could be right after the first name.
My questions are :
How do I avoid the confusion between somefield and lastname?
I could count the total number of fields, but in my situation two are optional, if there is only one missing, I can't be sure which one it is.
How do I avoid false "tags"? I mean, if the user's comment includes a ;, how can I be sure it's part of the comment and not the start of the following tag?
Again, I could count the remaining fields and find out where I am, but that doesn't solve the optional fields problem.
My questions also apply to XML: what can I do if the user starts writing XML in the form? Whether I decide to export the form as .csv or .xml, there can be trouble.
Right now I'm assuming that the C# XML reader/parser is awesome enough to deal with it; and if it is, I'm really curious as to how.
Assuming the CSV/XML data has been exported properly, none of this will be a problem. Missing fields will be handled by repeated separators:
02/02/2016;myfirstname;;somefield
Semi-colons within a field will normally be handled by quoting:
02/02/2016;"myfirst;name";
Quotes are escaped within a string:
02/02/2016;"my""first""name";
With XML it's even less of an issue since the tags or attributes will all have names.
If your CSV data is NOT well-formed, then you have a much bigger problem, as it may be impossible to distinguish missing fields and non-quoted separators.
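To see the three conventions above (repeated separators, quoting, doubled quotes) in action, here is a small sketch using Python's standard csv module with ; as the delimiter. The helper names parse_rows and write_row are just for illustration; a C# CSV library would be expected to follow the same conventions, but that is an assumption to check against whichever parser you pick.

```python
import csv
import io

def parse_rows(text: str) -> list:
    """Parse ;-separated text, honouring "quoted" fields and "" escapes."""
    return list(csv.reader(io.StringIO(text), delimiter=";", quotechar='"'))

def write_row(fields: list) -> str:
    """Write one row; fields containing ; or " are quoted automatically."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=";").writerow(fields)
    return buf.getvalue()

# A missing field shows up as an empty string between two separators.
rows = parse_rows('02/02/2016;myfirstname;;somefield\n')
```

As long as the file is written by something like write_row, a ; typed by the user ends up inside quotes and never gets mistaken for a separator when the file is read back.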
How do I avoid false "tags"? String values should be quoted if they can contain separator characters. If you create the CSV file yourself, quote all string values when writing and unquote them when reading.
How do I avoid the confusion between somefield and lastname? There is no general solution for this; every case must be handled individually. Can a general algorithm decide whether the first name or the last name is missing? No.
If you know which field(s) can be omitted, you can write "intelligent" handling for them.
Use XML and all of your problems will be solved.
First
How do I avoid the confusion between somefield and lastname?
There is no way to do this without changing the layout of the file. For example, when "mylastname" is empty you could write a "" value (an empty string) or leave the field empty like this: ;;
How do I avoid false "tags"? I mean, if the user first comment includes a ;, how can I be sure it's a part of his comment and not the start of the following tag?
It is simple: you have to structure the file like this:
; - column separator
"" - column delimiter (quotes around a value)
value;value;"value;;;;value";value
To split only on separators ; that are not inside quotes, you can use the following code, which has been compiled and tested:
public static string[] SplitWithDelimeter(this string line, char separator, char checkSeparator, bool eraseCheckSeparator)
{
var separatorsIndexes = new List<int>();
var open = false;
for (var i = 0; i < line.Length; i++)
{
if (line[i] == checkSeparator)
{
open = !open;
}
if (!open && line[i] == separator )
{
separatorsIndexes.Add(i);
}
}
separatorsIndexes.Add(line.Length);
var result = new string[separatorsIndexes.Count];
var first = 0;
for (var j = 0; j < separatorsIndexes.Count; j++)
{
var tempLine = line.Substring(first, separatorsIndexes[j] - first);
result[j] = eraseCheckSeparator ? tempLine.Replace(checkSeparator, ' ').Trim() : tempLine;
first = separatorsIndexes[j] + 1;
}
return result;
}
With eraseCheckSeparator set to false, the result for the line above would be:
value
value
"value;;;;value"
value
I have some data and I want to write them to a specific line in notepad using C#.
For example I have two textboxes and the data inside them are "123 Hello", for textBox1, and "565878 Hello2" for textBox2.
When I press SAVE button, those data will be saved into one file but with different line. I want to save the first data in the first line and the second data in the third line.
How can I do this?
This question is too broad. The simple answer is that you write the two lines to a file, but write a newline (either "\r\n" or Environment.NewLine) between each string. That will put the two strings on different lines. If you want the second string on the third line, then you should write two newlines between each string.
If neither of those are the answer, then you need to be a lot more specific about why not. Is the file empty to start with? What have you tried? Where, specifically, are you getting stuck? What platform?
And I really don't see what this has to do with NotePad.
EDIT:
You have clarified that you are starting with an existing text file and want to replace the content at the specified lines.
This is a more complex thing to do, and may be beyond your skills if you are just starting out. The basic approach is this:
Assuming you can read the entire file into memory, load the file into a string. You will have to parse new lines to find the lines you want to replace. You can then just replace those parts of the string with the new data. When finished, write the file back to disk.
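A minimal sketch of that load, replace, write-back approach, in Python for brevity (in C# the same steps map onto File.ReadAllLines and File.WriteAllLines). The function name replace_lines is hypothetical, and it assumes the file fits in memory and is UTF-8 encoded.

```python
from pathlib import Path

def replace_lines(path, replacements):
    """Overwrite whole lines in a text file.

    `replacements` maps 1-based line numbers to the new text for that line.
    """
    p = Path(path)
    lines = p.read_text(encoding="utf-8").splitlines() if p.exists() else []
    # Pad with empty lines if the file is shorter than the highest target line.
    last = max(replacements, default=0)
    lines.extend([""] * (last - len(lines)))
    for number, text in replacements.items():
        lines[number - 1] = text
    p.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

For the example in the question this would be called as replace_lines("test.txt", {1: "123 Hello", 3: "565878 Hello2"}), leaving line 2 untouched.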
If the file is too big to load into memory, then it becomes much more complex. I'm sorry, but since you've done such a poor job of describing the issue, I'm not going to the trouble of going over the details for this case. And such a task probably falls outside the scope of a Stack Overflow answer anyway.
If your line numbers are not fixed, you can do something like the following:
class Program
{
private static void Main()
{
var data = "";
const string data1 = "Data1";//First Data
const string data2 = "Data2";//Second Data
const int line1 = 1;//First Data Line
const int line2 = 3;//Second Data Line
var maxNoOfLines = Math.Max(line1, line2);
for (var i = 1; i <= maxNoOfLines; i++)
{
if (i == line1)
{
data += data1 + Environment.NewLine;
}
else if (i == line2)
{
data += data2 + Environment.NewLine;
}
else
{
data += Environment.NewLine;
}
}
File.WriteAllText(@"C:\NOBACKUP\test.txt", data);
}
}
Otherwise, if the line numbers are fixed, it is much simpler: you can just remove the loop above and hardcode the values.
Since I have not been able to find a resolution by searching, I believe I may have a unique problem. Essentially I am creating a gene finding/creation application in C# for my wife and am using RichTextBoxes so she can highlight, color, export, etc. the information she needs. I have written several custom methods for it because, as I am sure we all know, RichTextBoxes from Microsoft leave much to the imagination.
Anyway, here is my issue: I need to be able to search for a term across hard returns. The users have strings in 60 letter intervals and they need to search for items that may cross that hard return barrier. For instance let's say I have 2 lines (I will make them short for simplicity):
AAATTTCCCGGG
TTTCCCGGGAAA
If the user runs a search for GGGTTT, I need to be able to pull the result even though there is a line break/hard return in there. For the life of me I cannot think of a good way to do this and still select the result in the RichTextBox. I can always find the result but getting a proper index for the RichTextBox is what eludes me.
If needed, I am not against using richTextBox.SaveFile() and LoadFile() and parsing the RTF text as a string manually. It doesn't have to be pretty in this case; it just has to work.
I appreciate any help/guidance you may give.
Here is a relevant snippet:
//textbox 2 search area (examination area)
private void button5_Click(object sender, EventArgs e)
{
textBox3.Text = textBox3.Text.ToUpper();
if (textBox3.Text.Length > 0)
{
List<string> lines = richTextBox2.Lines.ToList();
string allText = "";
foreach (string line in lines)
allText = allText + line.Replace("\r", "").Replace("\n", "");
if (findMultiLineRTB2(allText, textBox3.Text) != -1)
{
richTextBox2.Select(lastMatchForRTB2, textBox3.Text.Length);
richTextBox2.SelectionColor = System.Drawing.Color.White;
richTextBox2.SelectionBackColor = System.Drawing.Color.Blue;
}//end if
else
MessageBox.Show("Reached the end of the sequence", "Finished Searching");
}//end if
}//end method
private int findMultiLineRTB2(string rtbText, string searchString)
{
lastMatchForRTB2 = rtbText.IndexOf(searchString, lastMatchForRTB2 + 1);
return lastMatchForRTB2;
}
I'll make an assumption: you want to search for a word across all lines, where each line is 60 characters long. The desired result is the index of that word.
You just have to build a string that has no line breaks, for example with string.Join:
string allText = string.Join("", richTextBox.Lines);
int indexOf = allText.IndexOf("GGGTTT"); // 9 in your sample
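The index found this way refers to the text without line breaks, so it still has to be shifted before it can be used to select in the control. Here is a rough Python sketch of that mapping (the function name find_across_lines is hypothetical). It assumes every line break counts as exactly one character in the control's text, which you should verify for your RichTextBox.

```python
from bisect import bisect_right
from itertools import accumulate

def find_across_lines(lines, term, start=0):
    """Find `term` ignoring line breaks.

    Returns (index, length) measured in the text WITH one-character line
    breaks, or None when there is no match.
    """
    joined = "".join(lines)
    hit = joined.find(term, start)
    if not term or hit < 0:
        return None
    # Offset of each line's first character in the break-free text.
    starts = [0] + list(accumulate(len(line) for line in lines))[:-1]

    def with_breaks(pos):
        # Add one character for every line break that precedes `pos`.
        return pos + bisect_right(starts, pos) - 1

    first = with_breaks(hit)
    last = with_breaks(hit + len(term) - 1)
    return first, last - first + 1
```

For the two sample lines and GGGTTT this gives index 9 and length 7 (six letters plus the line break), which are the two values you would pass to richTextBox2.Select.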
I wrote a C# program to read an Excel .xls/.xlsx file and output to CSV and Unicode text. I wrote a separate program to remove blank records. This is accomplished by reading each line with StreamReader.ReadLine(), and then going character by character through the string and not writing the line to output if it contains all commas (for the CSV) or all tabs (for the Unicode text).
The problem occurs when the Excel file contains embedded newlines (\x0A) inside the cells. I changed my XLS to CSV converter to find these new lines (since it goes cell by cell) and write them as \x0A, and normal lines just use StreamWriter.WriteLine().
The problem occurs in the separate program to remove blank records. When I read in with StreamReader.ReadLine(), by definition it only returns the string with the line, not the terminator. Since the embedded newlines show up as two separate lines, I can't tell which is a full record and which is an embedded newline for when I write them to the final file.
I'm not even sure I can read in the \x0A because everything on the input registers as '\n'. I could go character by character, but this destroys my logic to remove blank lines.
I would recommend that you change your architecture to work more like a parser in a compiler.
You want to create a lexer that returns a sequence of tokens, and then a parser that reads the sequence of tokens and does stuff with them.
In your case the tokens would be:
Column data
Comma
End of Line
You would treat '\n' ('\x0a') by itself as an embedded newline, and therefore include it as part of a column data token. A '\r\n' would constitute an End of Line token.
This has the advantages of:
Doing only 1 pass over the data
Only storing a maximum of one line's worth of data
Reusing as much memory as possible (for the string builder and the list)
It's easy to change should your requirements change
Here's a sample of what the Lexer would look like:
Disclaimer: I haven't even compiled, let alone tested, this code, so you'll need to clean it up and make sure it works.
enum TokenType
{
ColumnData,
Comma,
LineTerminator
}
class Token
{
public TokenType Type { get; private set;}
public string Data { get; private set;}
public Token(TokenType type)
{
Type = type;
}
public Token(TokenType type, string data)
{
Type = type;
Data = data;
}
}
private IEnumerable<Token> GetTokens(TextReader s)
{
var builder = new StringBuilder();
while (s.Peek() >= 0)
{
var c = (char)s.Read();
switch (c)
{
case ',':
{
if (builder.Length > 0)
{
yield return new Token(TokenType.ColumnData, ExtractText(builder));
}
yield return new Token(TokenType.Comma);
break;
}
case '\r':
{
var next = s.Peek();
if (next == '\n')
{
s.Read();
}
if (builder.Length > 0)
{
yield return new Token(TokenType.ColumnData, ExtractText(builder));
}
yield return new Token(TokenType.LineTerminator);
break;
}
default:
builder.Append(c);
break;
}
}
s.Read();
if (builder.Length > 0)
{
yield return new Token(TokenType.ColumnData, ExtractText(builder));
}
}
private string ExtractText(StringBuilder b)
{
var ret = b.ToString();
b.Remove(0, b.Length);
return ret;
}
Your "parser" code would then look like this:
public void ConvertXLS(TextReader s)
{
var columnData = new List<string>();
bool lastWasColumnData = false;
bool seenAnyData = false;
foreach (var token in GetTokens(s))
{
switch (token.Type)
{
case TokenType.ColumnData:
{
seenAnyData = true;
if (lastWasColumnData)
{
//TODO: do some error reporting
}
else
{
lastWasColumnData = true;
columnData.Add(token.Data);
}
break;
}
case TokenType.Comma:
{
if (!lastWasColumnData)
{
columnData.Add(null);
}
lastWasColumnData = false;
break;
}
case TokenType.LineTerminator:
{
if (seenAnyData)
{
// Pass the collected columns, not the bool flag
OutputLine(columnData);
}
seenAnyData = false;
lastWasColumnData = false;
columnData.Clear();
break;
}
}
}
if (seenAnyData)
{
OutputLine(columnData);
}
}
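For comparison, here is the same lexer/parser split as a compact Python sketch that can be run directly. The names tokenize and non_blank_records are hypothetical. Like the C# above, it treats a lone '\n' as column data, '\r' (optionally followed by '\n') as the line terminator, and does not handle commas inside quoted fields. The input must be read without newline translation (for example open(path, newline="")), otherwise the '\r\n' pairs are lost before the lexer sees them.

```python
def tokenize(text):
    """Yield ('data', str), ('comma', None) and ('eol', None) tokens."""
    buf = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "," or ch == "\r":
            if buf:
                yield ("data", "".join(buf))
                buf.clear()
            if ch == ",":
                yield ("comma", None)
            else:
                if text[i + 1:i + 2] == "\n":
                    i += 1  # consume the LF of a CRLF pair
                yield ("eol", None)
        else:
            buf.append(ch)  # a lone LF lands here and stays in the field
        i += 1
    if buf:
        yield ("data", "".join(buf))

def non_blank_records(text):
    """Group tokens into rows, skipping rows that contain no data at all."""
    row, seen_data, last_was_data = [], False, False
    for kind, value in tokenize(text):
        if kind == "data":
            row.append(value)
            seen_data = last_was_data = True
        elif kind == "comma":
            if not last_was_data:
                row.append(None)  # empty column
            last_was_data = False
        else:  # end of line
            if seen_data:
                yield row
            row, seen_data, last_was_data = [], False, False
    if seen_data:
        yield row
```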
You can't change StreamReader to return the line terminators, and you can't change what it uses for line termination.
I'm not entirely clear about the problem in terms of what escaping you're doing, particularly in terms of "and write them as \x0A". A sample of the file would probably help.
It sounds like you may need to work character by character, or possibly load the whole file first and do a global replace, e.g.
x.Replace("\r\n", "\u0000") // Or some other unused character
.Replace("\n", "\\x0A") // Or whatever escaping you need
.Replace("\u0000", "\r\n") // Replace the real line breaks
I'm sure you could do that with a regex and it would probably be more efficient, but I find the long way easier to understand :) It's a bit of a hack having to do a global replace though - hopefully with more information we'll come up with a better solution.
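For what it's worth, the regex version of that three-step replace could look like the Python sketch below, using a negative lookbehind so that only a '\n' not preceded by '\r' is escaped. The function name is made up for illustration, and the same caveat applies: the text has to be read without newline translation, or the '\r\n' pairs will already have been collapsed.

```python
import re

def escape_embedded_newlines(text: str, escaped: str = "\\x0A") -> str:
    """Replace every LF that is not part of a CRLF pair with `escaped`."""
    # (?<!\r)\n matches a line feed only when no carriage return precedes it.
    return re.sub(r"(?<!\r)\n", lambda m: escaped, text)
```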
Essentially, a hard-return in Excel (shift+enter or alt+enter, I can't remember) puts a newline that is equivalent to \x0A in the default encoding I use to write my CSV. When I write to CSV, I use StreamWriter.WriteLine(), which outputs the line plus a newline (which I believe is \r\n).
The CSV is fine and comes out exactly how Excel would save it, the problem is when I read it into the blank record remover, I'm using ReadLine() which will treat a record with an embedded newline as a CRLF.
Here's an example of the file after I convert to CSV...
Reference,Name of Individual or Entity,Type,Name Type,Date of Birth,Place of Birth,Citizenship,Address,Additional Information,Listing Information,Control Date,Committees
1050,"Aziz Salih al-Numan
",Individual,Primary Name,1941 or 1945,An Nasiriyah,Iraqi,,Ba’th Party Regional Command Chairman; Former Governor of Karbala and An Najaf Former Minister of Agriculture and Agrarian Reform (1986-1987),Resolution 1483 (2003),6/27/2003,1518 (Iraq)
1050a,???? ???? ???????,Individual,Original script,1941 or 1945,An Nasiriyah,Iraqi,,Ba’th Party Regional Command Chairman; Former Governor of Karbala and An Najaf Former Minister of Agriculture and Agrarian Reform (1986-1987),Resolution 1483 (2003),6/27/2003,1518 (Iraq)
As you can see, the first record has an embedded new-line after al-Numan. When I use ReadLine(), I get '1050,"Aziz Salih al-Numan' and when I write that out, WriteLine() ends that line with a CRLF. I lose the original line terminator. When I use ReadLine() again, I get the line starting with '1050a'.
I could read the entire file in and replace them, but then I'd have to replace them back afterwards. Basically, what I want to do is get the line terminator to determine whether it's \x0A or a CRLF, and then, if it's \x0A, use Write() and insert that terminator.
I know I'm a little late to the game here, but I was having the same problem and my solution was a lot simpler than most given.
If you are able to determine the column count which should be easy to do since the first line is usually the column titles, you can check your column count against the expected column count. If the column count doesn't equal the expected column count, you simply concatenate the current line with the previous unmatched lines. For example:
string sep = "\",\"";
int columnCount = 0;
while ((currentLine = sr.ReadLine()) != null)
{
if (lineCount == 0)
{
lineData = currentLine.Split(new string[] { sep }, StringSplitOptions.None);
columnCount = lineData.Length;
++lineCount;
continue;
}
string thisLine = lastLine + currentLine;
lineData = thisLine.Split(new string[] { sep }, StringSplitOptions.None);
if (lineData.Length < columnCount)
{
lastLine += currentLine;
continue;
}
else
{
lastLine = null;
}
......
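Here is the column-count idea as a small, self-contained Python sketch (the name merge_split_records is hypothetical). It assumes the first line is a complete header and that fields are separated by "," as in the snippet above. Unlike that snippet, it puts a "\n" back between the joined pieces so the embedded newline is preserved; drop that if you would rather lose it.

```python
def merge_split_records(lines, sep='","'):
    """Re-join physical lines that belong to one logical record.

    The first line defines the expected column count; shorter lines are
    accumulated until the count is reached.
    """
    expected = None
    pending = ""
    for line in lines:
        candidate = pending + line
        count = len(candidate.split(sep))
        if expected is None:
            expected = count  # header row
            yield candidate
        elif count < expected:
            pending = candidate + "\n"  # record continues on the next line
        else:
            pending = ""
            yield candidate
    if pending:
        yield pending[:-1]  # incomplete trailing record, returned as-is
```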
Thank you so much! With your code and some others, I came up with the following solution. I have added a link at the bottom to some code I wrote that uses some of the logic from this page. I figured I'd give credit where credit is due. Thanks!
Below is an explanation of what I needed:
Try this. I wrote it because I have some very large '|'-delimited files that contain \r\n inside some of the columns, and I needed to use \r\n as the end-of-line delimiter. I was trying to import the files using SSIS packages, but corrupted data in the files made that impossible. The file was over 5 GB, so it was too large to open and fix manually. After reading lots of forums to understand how streams work, I came up with a solution that reads each character in a file and emits lines based on the definitions I added to it. It is meant to be used as a command-line application, complete with help :). I hope this helps some other people out; I haven't found a solution quite like it anywhere else, although the ideas were inspired by this forum and others.
https://stackoverflow.com/a/12640862/1582188