Text de-justification to get correct substring

Text de-justification to get correct substring - c#

I have a text file which is made up with justify (all tabs aligned - different size).
Therefore I can't get the desired value at a certain column (substring).
Since this is a migration I can't change the format of the file.
How can I "de-justify" the text to spaces while preserving the spacing length, any scripts out there?
I need the upper value. replacing \t with a fixed value doesn't help.
*EDIT: files seems to be formatted with fmt
*EDIT2: Solution found it seems; when I use fmt on windows (coreutils) it stays the same.
However on my mac I get the desired result (maybe something in the win command not setup right).
fmt original_file >> new.txt
Grts

There's probably lots of other ways to do this, but sublime text has built in tab to space conversion.
http://css-tricks.com/changing-spaces-tabs-sublime-text/

Related

Read specific parts of ASCII file in C#

I am trying to make a FXB file previewer (VST preset banks for those who don't know) for Sylenth1 banks. I have encoded the FXB as an ASCII string and had it print to the console. The preset names show up fine. My issue is that the parameters for the oscillators, filters and effects are encoded as random characters (mainly "?" and fairly big spaces).
Underlined in red: file header (?)
Underlined in blue: preset name (which I want to keep)
Underlined in yellow: osc/FX/filter parameters (which I want to discard from the string)
Here's the code I wrote:
byte[] arr = File.ReadAllBytes(Properties.Resources.pointer); /* pointer is a string in resources I
used to point to the external FXB file for testing */
System.Text.ASCIIEncoding enc = new System.Text.ASCIIEncoding();
string fstr = enc.GetString(arr);
Console.Write(fstr);
Console.ReadKey();
I had written a foreach loop to replace every unwanted character with string.Empty, but it also removes parts of the preset names (e.g. the L from "Lead"), leaves the spaces intact and creates new ones, so I deleted it.
My end goal for those that are curious is this:
Preset 1
Preset 2
Preset 3
Preset 4
...
I'm at a total loss. I've tried different solutions from various websites and Stack Overflow posts, but none gave me the desired result.
(I also noticed that the preset names have almost the same space between them (~ 200 chars apart), can I use the difference to exclude the unwanted parts?)

It looks like a binary file not ascii. Some data in the file is easily readable because it is ASCII encoded, but other data, for example numbers, are encoded in their binary format.
Not all binary data can be converted to printable ASCII characters, so when you print it out like this you get the ???? mess.
It is better to read this file using a binary editor. Visual studio has one, there is probably an extension for vs code, other editors have a binary viewer (e.g. sublime). This will show you data in the file as it is encoded, usually using hex with the ascii in a second column.
But that is just so you can accurately see the content. It does not help you for understanding the meaning or the layout. You might be able to make something work by reverse engineering like this, but chances are it will not work for all cases. Using and API is going to be way easier.
I'm not familiar with these files but did you find this? https://new.steinberg.net/developers/ There is a forum there that might help.

I found the answer to this myself. I basically somewhat reverse engineered the FXB in a hex editor, and proceeded to load specific bytes of the file (31 to be exact) in order to encode those in a string and have that print to the console.
I managed to do so by literally counting how many bytes there are from the beginning to the 1st preset name, then from the end of the preset name (31 bytes) to the beginning of the other preset name, and so on.
For those who are interested, I am going to develop a GUI version of it in the future. But it does (and probably will) support only Sylenth1 v2 soundbanks/FXBs.
Also thanks to the people who reached out. They helped in their own way.

Why does the Notepad++ [NULL] character not paste?

I am new to this site, and I don't know if I am providing enough info - I'll do my best =)
If you use Notepad++, then you will know what I am talking about -- When a user loads a .exe into Notepad++, the NUL / \x0 character is replaced by NULL, which has a black background, and white text. I tried pasting it into Visual Studio, hoping to obtain the same output, but it just pasted some spaces...
Does anyone know if this is a certain key-combination, or something? I would like to put the NULL character in replacement of \x0, just like Notepad++ =)

Notepad++ is a rich text editor unlike your regular notepad. It can display custom graphics so common in all modern text editors. While reading a file whenever notepad++ encounters the ASCII code of a null character then instead of displaying nothing it adds the string "NULL" to the UI setting the text background colour to black and text colour to white which is what you are seeing. You can show any custom style in your rich text editor too.
NOTE: This is by no means an efficient solution. I'm clearly traversing a read string 2 times just to take benefit of already present methods. This can be done manually in a single pass. It is just to give a hint about how you can do it. Also I wrote the code carefully but haven't ran it because I don't have the tools at the moment. I apologise for any mistakes let me know I'll update it
Step 1 : Read a text file by line (line ends at '\n') and replace all instances of null character of that line with the string "NUL" using the String.Replace(). Finally append the modified text to your RichTextBox.
Step 2 : Re traverse your read line using String.IndexOf() finding start indexes of each "NUL" word. Using these indexed you select text from RichTextBox and then style that selected text using RichTextBox.SelectionColor and RichTextBox.SelectionBackColor
richTextBoxCursor basically just represents the start index of each line in RichTextBox
StreamReader sr = new StreamReader(#"c:\test.txt" , Encoding.UTF8);
int richTextBoxCursor = 0;
while (!sr.EndOfStream){
richTextBoxCursor = richTextBox.TextLength;
string line = sr.ReadLine();
line = line.Replace(Convert.ToChar(0x0).ToString(), "NUL");
richTextBox.AppendText(line);
i = 0;
while(true){
i = line.IndexOf("NUL", i) ;
if(i == -1) break;
// This specific select function select text start from a certain start index to certain specified character range passed as second parameter
// i is the start index of each found "NUL" word in our read line
// 3 is the character range because "NUL" word has three characters
richTextBox.Select(richTextBoxCursor + i , 3);
richTextBox.SelectionColor = Color.White;
richTextBox.SelectionBackColor = Color.Black;
i++;
}
}

Notepad++ may use custom or special fonts to show these particular characters. This behavior also may not appropriate for all text editors. So, they don't show them.
If you want to write a text editor that visualize these characters, you probably need to implement this behavior programmatically. Seeing notepad++ source can be helpful If you want.

Text editor
As far as I know in order to make Visual Studio display non printable characters you need to install an extension from the marketplace at https://marketplace.visualstudio.com.
One such extension, which I have neither tried nor recomend - I just did a quick search and this is the first result - is
Invisible Character Visualizer.
Having said that, copy-pasting binaries is a risky business.
You may try Edit > Advanced > View White Space first.
Binary editor
To really see what's going on you could use the VS' binary editor: File->Open->(Open with... option)->Binary Editor -> OK

To answer your question.
It's a symbolic representation of 00H double byte.
You're copying and pasting the values. Notepad++ is showing you symbols that replace the representation of those values (because you configured it to do so in that IDE).

Mixture of RTL and LTR in windows file name

I am trying to write a file to disk (in Windows) that contains both RTL (right to left) and LTR (left to right) text.
The filename is composed from different bits of data eg:
{data_part1} - {data_part2} - {data_part3} - {data_part4}.{extension}
Any of the data parts could be RTL or LTR.
What I have noticed is that if data part2 is RTL and data part3 is numeric, data part 2 appears in the position where data part 3 should be and also causes data part 3 to be printed RTL.
If data part 3 is non-numeric (ie a word such as 'hello') this problem doesn't occur.
However, if I copy that file name and paste it into word it appears correctly?
Which implies that Windows explorer is not displaying the text correctly
I have tried using the POP DIRECTIONAL FORMATTING character but that hasn't made a difference.
Has Anyone else had this issue and does anyone have any ideas of getting around it?

string fileName = '\u200E'+ dataPart1+ '\u200E'+ dataPart2+ ... '\u200E'+ dataPart4+ extension

Why String element orders in VS IDE debug mode is different with memory order in C#?

Can anyone say why string character orders is variant with immediate window ?
https://i.stack.imgur.com/sDv0y.jpg

Based on the picture you are talking about showing string with RTL text in it. Correct visual representation of such text (especially with LTR characters like "?" mixed in) is different from in memory order.
The only reliable way I know to see characters in order they are stored is to convert to char or int array and look or print that array. All other ways may try to render string following at least some of RTL rules.

Outputting Programmatically to MSword; sensing end of line

I'm trying to use the MSWord Interop Library to write a C# application that outputs specially formated text (isolated arabic letters) to a file. The problem I'm running into is determining how many characters remain before the text wraps onto a new line. I need the words to be on the same line, without wrapping, which is the default behavior. I'm finding this difficult because when I have the Arabic letters of the word isolated with spaces, they are treated as individual characters and therefore behave differently then connected words.
Any help is appreciated. Thanks.

Add each character to your range and then check the number of lines in the range
LineCount = range.ComputeStatistics(Word.WdStatistic.wdStatisticLines);
When the line count changes, you know it has been wrapped, and can remove the last character or reformat accordingly

Actually I don't know how this behaves today, but I've written something for the MSWork API when I was facing a somewhat weird fact. Actually you can't find that out. In MSWord, text in a document is always in paragraphs.
If you input text to your document, you won't get it in a page only, but this page will at least contain a paragraph for the text you wrote into it.
Unfortunately I can't figure this out again, because I don't have a license for MS Word these day.
Give it a try and look at the problem again in this way.
Hope this helps, and if not, please provide the code that generates the input and the exact version of MSWord.
Greetings,
Kjellski

I'm not sure what "Arabic letters of the word isolated with spaces" means exactly, but I assume that non breaking space is what you need.
Here's more details.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Text de-justification to get correct substring - c#

There's probably lots of other ways to do this, but sublime text has built in tab to space conversion. http://css-tricks.com/changing-spaces-tabs-sublime-text/

Related

Read specific parts of ASCII file in C#

Why does the Notepad++ [NULL] character not paste?

Mixture of RTL and LTR in windows file name

Why String element orders in VS IDE debug mode is different with memory order in C#?

Outputting Programmatically to MSword; sensing end of line

Categories

Resources