I want to write Translit.net but on autohotkey. So I succesfully done with the part where I have only one letter:
:*:a::а
:*:b::б
:*:v::в
:*:g::г
:*:d::д
...
But now I have a problem with the translation of "shh" to "щ" and other 'two to one' char translations. When I start typing shh i get схх back, but I want to get щ. What could I do?
My current idea: When I press a key it should write down the letter and add non translated letter to a 3 element array and check if the array elements create a shh ,ch, sh or any other combination larger than one. Then I could remove the last 3 or 2 typed letter and send a russian letter what I need. Maybe someone know an easier way to do that. I want my script to work exactly like that page I posted. A solution in C or C# instead of AutoHotkey would help me too.
I have the same problem, while using the unicode version of Autohotkey, but only if the file is saved in UTF-8 without BOM format.
Saving the file as UNICODE (UCS-2, must be Little Endian) solves the problem.
It also works with UTF-8 with BOM, so apparently autohotkey has truble determining endianness on its own.
Related
So:
The C# compiler outputs the (line,column) style location.
The Roslyn API expects sequential text location
How to map the former to the latter?
The C# code could be UTF8 with or without the BOM or even UTF16. It could contain all kinds of characters in the form of comments or embedded strings.
Let us assume we know the encoding and have the respective Encoding object handy. I can convert the file bytes to char[]. The problem is that some chars may contribute zero to the final sequential position. I know that the BOM character does. I have no idea if others may too.
Now, if we know for sure that BOM is the only character that contributes 0 to the length, then I can skip it and count the characters and my question becomes trivial. This is what I do today - I just assume that the BOM is the only "bad" player.
But maybe there is a better way? Maybe Roslyn API contains some hidden gem that knows for a change to accept (line,column) and spit the sequential position? Or maybe some of the Microsoft.Build libraries?
EDIT 1
As per the accepted answer the following gives the location:
var srcText = SourceText.From(File.ReadAllText(err.FilePath));
int location = srcText.Lines[err.Line - 1].Start + err.Column - 1;
You have uncovered the reason that the SourceText type exists in the roslyn apis. Its entire purpose is to handle encoding of strings and preform calculations of lines, columns, and spans.
Due to the way .NET handles unicode and depending on which code pages are installed in your OS there could be cases that SourceText does not do what you need. It has generally proven "good enough" for our purposes though.
I am trying to make a FXB file previewer (VST preset banks for those who don't know) for Sylenth1 banks. I have encoded the FXB as an ASCII string and had it print to the console. The preset names show up fine. My issue is that the parameters for the oscillators, filters and effects are encoded as random characters (mainly "?" and fairly big spaces).
Underlined in red: file header (?)
Underlined in blue: preset name (which I want to keep)
Underlined in yellow: osc/FX/filter parameters (which I want to discard from the string)
Here's the code I wrote:
byte[] arr = File.ReadAllBytes(Properties.Resources.pointer); /* pointer is a string in resources I
used to point to the external FXB file for testing */
System.Text.ASCIIEncoding enc = new System.Text.ASCIIEncoding();
string fstr = enc.GetString(arr);
Console.Write(fstr);
Console.ReadKey();
I had written a foreach loop to replace every unwanted character with string.Empty, but it also removes parts of the preset names (e.g. the L from "Lead"), leaves the spaces intact and creates new ones, so I deleted it.
My end goal for those that are curious is this:
Preset 1
Preset 2
Preset 3
Preset 4
...
I'm at a total loss. I've tried different solutions from various websites and Stack Overflow posts, but none gave me the desired result.
(I also noticed that the preset names have almost the same space between them (~ 200 chars apart), can I use the difference to exclude the unwanted parts?)
It looks like a binary file not ascii. Some data in the file is easily readable because it is ASCII encoded, but other data, for example numbers, are encoded in their binary format.
Not all binary data can be converted to printable ASCII characters, so when you print it out like this you get the ???? mess.
It is better to read this file using a binary editor. Visual studio has one, there is probably an extension for vs code, other editors have a binary viewer (e.g. sublime). This will show you data in the file as it is encoded, usually using hex with the ascii in a second column.
But that is just so you can accurately see the content. It does not help you for understanding the meaning or the layout. You might be able to make something work by reverse engineering like this, but chances are it will not work for all cases. Using and API is going to be way easier.
I'm not familiar with these files but did you find this? https://new.steinberg.net/developers/ There is a forum there that might help.
I found the answer to this myself. I basically somewhat reverse engineered the FXB in a hex editor, and proceeded to load specific bytes of the file (31 to be exact) in order to encode those in a string and have that print to the console.
I managed to do so by literally counting how many bytes there are from the beginning to the 1st preset name, then from the end of the preset name (31 bytes) to the beginning of the other preset name, and so on.
For those who are interested, I am going to develop a GUI version of it in the future. But it does (and probably will) support only Sylenth1 v2 soundbanks/FXBs.
Also thanks to the people who reached out. They helped in their own way.
I am currently working on a little dictionary app for Korean in C# (which I am trying to learn). I would like to add a feature where a conjugation chart is given with all basic verb forms for a certain verb. To ensure the verbs are conjugated correctly I have to check wether a verb is irregular. To do this I have to check wether a verb stem ends with a certain character or not.
The problem is, however, that a computer sees an entire syllable of a Korean word as a character, not the individual 2 or 3 letters that form that specific syllable, but I need to compare the final letter of a syllable to do it correctly.
For example the Korean verb 춥다 is an irregular verb and we can tell because the verb stem 춥 has ㅂ as the final letter. Yet 춥 is the char, not ㅂ in the case of the verb stem. So this does not work:
verbStem = "춥";
verbStem.EndsWith("ㅂ");
I am currently a bit puzzled on how to make this work and thus I would be quite happy if I could get some directions.
Using the popular Korean Q&A service 지식IN (link to orginal answer) I was able to find the answer to my question. I am so grateful to.
The first step is to seperate the individual letters by normalizing the string. This is done using Normalize method:
string a = "안녕";
string b = a.Normalize( System.Text.NormalizationForm.FormKD);
When using the Normalize method with the Korean string it will be split into its individual component unicode characters.
However, the extremely helpful answer at 지식IN did not stop there with helping me with directions. It pointed out I needed to be aware that even when it has been split there is a different unicode for characters depending whether it is in the initial possition or not and thus I will have to use the appropriate unicode for it. ('ᄋ' is different from 'ᆼ') The unicodes for these are found at Hangul Jamo (Unicode block).
I am so glad someone managed to answer this question for me, but I felt I ought to write out the answer at Stackoverflow as well since you might never know someone else might want to learn how to do something similar.
First of all I'd like to mention that I'm new to programming and this sight so I'm still an infant in this world, however, I have a problem.
I have to make code that can compare two strings but the second string (from a file) will have unique identifiers within it. For example:
first string:
I have 10 cats and their fur is #000000
Second string from a file:
I have <d> cats and their fur is <h>
Although I probably don't need to explain, 'd' is for numbers or decimal and 'h' for hex. There are also 's' and 'a' associated to ASCII.
What's supposed to happen is that the first string can have any different number which can be of different length and/or Hex when the data comes in but the rest of the message stays the same, E.G.
I have 1500 cats and their fur is #000000
the code will still match the two strings as True matches as it'll effectively ignore anything that is an int and hex. (this identifiers are User defined so they can be anywhere in any string).
The end game is that if it finds a relative match the code will change the colour of the text in the app among other things. it's basically to highlight errors in a log file.
I've searched High an low on Stackflow and looked into Regex and string comparisons. I'm currently going to make a start on the code, however, would like some input/help.
Obviously I'm not asking for something to be written for me, just to be pointed in the right direction so I can learn.
Many thanks in advance! And apologies if there is a similar post out there, but alas I couldn't find it if there is.
If I understand it correctly I think I would solve this by replacing the <d> etc. by a RegEx expression. Then use that RegEx to replace the values by an empty string. That way you can compare them without the values.
Hope that makes sense. I didn't include any code because you asked for just some directions.
I'm trying to use the MSWord Interop Library to write a C# application that outputs specially formated text (isolated arabic letters) to a file. The problem I'm running into is determining how many characters remain before the text wraps onto a new line. I need the words to be on the same line, without wrapping, which is the default behavior. I'm finding this difficult because when I have the Arabic letters of the word isolated with spaces, they are treated as individual characters and therefore behave differently then connected words.
Any help is appreciated. Thanks.
Add each character to your range and then check the number of lines in the range
LineCount = range.ComputeStatistics(Word.WdStatistic.wdStatisticLines);
When the line count changes, you know it has been wrapped, and can remove the last character or reformat accordingly
Actually I don't know how this behaves today, but I've written something for the MSWork API when I was facing a somewhat weird fact. Actually you can't find that out. In MSWord, text in a document is always in paragraphs.
If you input text to your document, you won't get it in a page only, but this page will at least contain a paragraph for the text you wrote into it.
Unfortunately I can't figure this out again, because I don't have a license for MS Word these day.
Give it a try and look at the problem again in this way.
Hope this helps, and if not, please provide the code that generates the input and the exact version of MSWord.
Greetings,
Kjellski
I'm not sure what "Arabic letters of the word isolated with spaces" means exactly, but I assume that non breaking space is what you need.
Here's more details.