How to perform multiple Replace calls at once

I have a slightly odd question. I have a text that's encoded in such a way that each character is replaced by another character, and I'm creating an application that will replace each character with the correct one. But I've come across a problem that I'm having trouble solving. Let me show it with an example:
Original text: This is a line.
Encoded text: (.T#*T#*%*=T50;
Now, as I said, each character represents another character, '(' is 'T', '.' is actually a 'h' and so on.
Now I could just go with
string decoded = encoded.Replace('(','T'); //T.T#*T#*%*=T50;
That solves one problem, but when I reach the character 'T', which is actually the encoded form of 'i', I have to replace every 'T' with 'i'. That means all the previously decoded 'T's (which were once '(') will also change, along with the encoded 'T'.
//T.T#*T#*%*=T50; -> i.i#*i#*%*=i50;
In this situation it's obvious that I should have gone the other way around, first changing 'T' to 'i' and then '(' to 'T', but for the text I'm actually decoding that kind of analysis is not an option.
What's the alternative here that I could do to perform the task correctly?
Thank you!

One possible solution is to not use the string Replace method at all.
Instead, you can create a method that returns the decoded character for each encoded character, then walk through your string as an array of char and call that "decryption" method on every character. The results, appended in order, give you the decoded string.
For example (using StringBuilder to create the new string):
private static char Decode(char source)
{
    if (source == '(')
        return 'T';
    else if (source == '.')
        return 'h';
    //.... and so on
    return source; // no mapping known: leave the character unchanged
}

string source = "ABC";
var builder = new StringBuilder();
foreach (var c in source)
    builder.Append(Decode(c));
var result = builder.ToString();

Using .Replace() probably isn't the way to go in the first place, since as you're finding it covers the whole string every time. And once you've modified the whole string once, the encoding is lost.
Instead, loop over the string one time and replace characters individually.
Create a function that accepts a char and returns the replaced char. For simplicity, I'll just show the signature:
private char Decode(char c);
Then just loop over the string and call that function on each character. LINQ can make short work of that:
var decodedString = new string(encodedString.Select(c => Decode(c)).ToArray());
(This is freehand and untested. The .ToArray() call is needed because the string constructor accepts a char[], not an IEnumerable<char>. But you get the idea.)
If it's easier to read you can also just loop manually over the string and perhaps use a StringBuilder with each successive char to build the final decoded result.
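As a rough illustration of the single-pass idea, here is a minimal sketch that keeps the mapping in a dictionary instead of an if/else chain. The class name SubstitutionDecoder and the table contents are assumptions: the table holds only the pairs that can be read off the question's one example line, and any character without an entry is passed through unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SubstitutionDecoder
{
    // Hypothetical partial table: only the pairs recoverable from the
    // question's example ("This is a line." <-> "(.T#*T#*%*=T50;").
    private static readonly Dictionary<char, char> Map = new Dictionary<char, char>
    {
        ['('] = 'T', ['.'] = 'h', ['T'] = 'i', ['#'] = 's', ['*'] = ' ',
        ['%'] = 'a', ['='] = 'l', ['5'] = 'n', ['0'] = 'e', [';'] = '.'
    };

    public static string Decode(string encoded)
    {
        var builder = new StringBuilder(encoded.Length);
        foreach (char c in encoded)
        {
            // Every character is read from the ORIGINAL text exactly once,
            // so a 'T' produced by decoding '(' is never decoded again.
            builder.Append(Map.TryGetValue(c, out char decoded) ? decoded : c);
        }
        return builder.ToString();
    }
}
```

Calling SubstitutionDecoder.Decode("(.T#*T#*%*=T50;") should give back "This is a line." because the order of the table entries no longer matters.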

Without knowledge of your encryption algorithm, this answer assumes that it's a simple character translation akin to the Caesar Cipher.
Pass in your encrypted string, the method loops over each character, adjusting it by the value of shiftDelta and returns the resulting string.
private string Decrypt(string input)
{
    const int shiftDelta = 10;
    var inputChars = input.ToCharArray();
    var outputChars = new char[inputChars.Length];
    for (var i = 0; i < outputChars.Length; i++)
    {
        // Perform character translation here
        outputChars[i] = (char)(inputChars[i] + shiftDelta);
    }
    // new string(char[]) builds the text; outputChars.ToString() would only return "System.Char[]"
    return new string(outputChars);
}

Related

How to get data from each row of website (DownloadString) by after : char and stop till ; char

I use WebClient and DownloadString to read a raw text file into a string. From each line I would like to get the text before the : character into one string, and the text after the : but before the ; into another string. In this case I want word111 and floor271 in one string, and table123 and fan891 in another.
The text file looks like this:
word111:table123;
floor271:fan891;
I've looked around for days. In my code I currently use the Contains method to check whether a whole line of text matches, for example word111:table123;. If that line exists in the raw text file, the code continues. I looked at Split, IndexOf, and similar methods, but I don't understand whether they can achieve this, because I need the values from every row/line, not just one.
WebClient HWIDReader = new WebClient();
string HWIDList = HWIDReader.DownloadString("/*the link would be here, can't share it*/");
if (HWIDList.Contains(usernameBox.Text + ":" + passwordBox.Text))
{
/* show other form because username and password did exist */
}
I expect Split alone wouldn't work: it can split the string on the : and ; characters, but I don't want the usernames and passwords to end up in the same string. IndexOf could cut off everything before or after a specified character. (Maybe IndexOf can be used in both directions, to take the text from : to ;? Is that possible, and how?) Thank you in advance.

You can split the string on the newline character.
string HWIDList = HWIDReader.DownloadString("/*the link would be here, can't share it*/");
string[] lines = HWIDList.Split('\n');
// NOTE: Should create a hash from the password and compare to saved password hashes
// so that nobody knows what the users' passwords are
string userInput = string.Format("{0}:{1};", usernameBox.Text, passwordBox.Text);
foreach (string pair in lines)
{
    // Trim() drops the trailing '\r' left behind when the file uses Windows (\r\n) line endings
    if (pair.Trim() != userInput)
        continue;
    // found match: do stuff
}
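If the two halves of each line really are needed separately, as the question asks, something along these lines might work. It is only a sketch: the CredentialParser class and its Parse method are made-up names, and it assumes every useful line has the form name:value; with exactly one pair per line.

```csharp
using System;
using System.Collections.Generic;

public static class CredentialParser
{
    // Fills two parallel lists: the text before ':' and the text between ':' and ';'.
    public static void Parse(string raw, List<string> names, List<string> values)
    {
        // Splitting on both '\r' and '\n' copes with Windows and Unix line endings.
        string[] lines = raw.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string line in lines)
        {
            string trimmed = line.Trim().TrimEnd(';');
            int colon = trimmed.IndexOf(':');
            if (colon < 0)
                continue; // skip lines that do not contain a ':' separator

            names.Add(trimmed.Substring(0, colon));
            values.Add(trimmed.Substring(colon + 1));
        }
    }
}
```

For the sample file, names would end up holding word111 and floor271, and values would hold table123 and fan891, with matching indexes.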

Base64 string encoding contains +, / and = instead of A, B, C

I need to apply the following transformation to a string:
convert the string to a byte[]
apply the sha256 function
encode the result in base64
I wrote the following code:
string codeRaw = "C0643778W.EUC06AG978W.EUFWELP2014-11-2153.50000GBP24.00000MWh/h10YCB-EUROPEU--12015-01-012015-01-31";
byte[] utiCodeByteArr = Encoding.UTF8.GetBytes(codeRaw);
byte[] hashByteArr = new SHA256Managed().ComputeHash(utiCodeByteArr);
string hash = Convert.ToBase64String(hashByteArr);
It works, but the result is a little different from what I should get: the string contains the chars '+', '/' and '=' instead of 'A', 'B' and 'C'.
"qWAIh1CgYAuvoRTGcvXKLBHC9UxRunSBRjRXlqhYh6gC" //expected result
"qW+Ih1CgYAuvoRTGcvXKLBHC9UxRunS/RjRXlqhYh6g=" //got result
I've solved it with a replace:
string hash = Convert.ToBase64String(hashByteArr)?.Replace("+", "A")?.Replace("/", "B")?.Replace("=", "C");
Is there a better way to get the right string without using the replaces?
I don't like them.
The manual with the requirements says: "The APIs used are the ones provided by .NET framework", but it doesn't contain the source code. Maybe there is a way to get the A, B and C characters directly, but I'm missing it.
Thanks.

The provider sent me the source code: it contained a replace just like the one I did.
They forgot to write that information in the manual.
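Since the replaces turn out to be part of the provider's format, one option is to hide them in a small helper. This is a sketch only: HashEncoder and HashToCustomBase64 are invented names, and the three substitutions are simply the ones from the question.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class HashEncoder
{
    // Hypothetical helper: UTF-8 bytes -> SHA-256 -> Base64 -> provider-specific substitutions.
    public static string HashToCustomBase64(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        using (SHA256 sha = SHA256.Create())
        {
            string base64 = Convert.ToBase64String(sha.ComputeHash(bytes));
            // Char overloads of Replace; ToBase64String never returns null, so ?. is not needed.
            return base64.Replace('+', 'A').Replace('/', 'B').Replace('=', 'C');
        }
    }
}
```

Note that this mapping cannot be reversed, because A, B and C already occur in the normal Base64 alphabet. That is fine for comparing hashes but not for decoding them.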

grouping adjacent similar substrings

I am writing a program in which I want to group the adjacent substrings, e.g ABCABCBC can be compressed as 2ABC1BC or 1ABCA2BC.
Among all the possible options I want to find the resultant string with the minimum length.
Here is the code I have written so far, but it does not do the job. Kindly help me in this regard.
using System;
using System.Collections.Generic;
using System.Linq;
namespace EightPrgram
{
    class Program
    {
        static void Main(string[] args)
        {
            string input;
            Console.WriteLine("Please enter the set of operations: ");
            input = Console.ReadLine();
            char[] array = input.ToCharArray();
            List<string> list = new List<string>();
            string temp = "";
            string firstTemp = "";
            foreach (var x in array)
            {
                if (temp.Contains(x))
                {
                    firstTemp = temp;
                    if (list.Contains(firstTemp))
                    {
                        list.Add(firstTemp);
                    }
                    temp = "";
                    list.Add(firstTemp);
                }
                else
                {
                    temp += x;
                }
            }
            /*foreach (var item in list)
            {
                Console.WriteLine(item);
            }*/
            Console.ReadLine();
        }
    }
}

You can do this with recursion. I cannot give you a C# solution, since I do not have a C# compiler here, but the general idea together with a python solution should do the trick, too.
So you have an input string ABCABCBC, and you want to transform it into an advanced variant of run-length encoding (let's call it advanced RLE).
My idea consists of a general first idea onto which I then apply recursion:
The overall target is to find the shortest representation of the string using advanced RLE, let's create a function shortest_repr(string).
You can divide the string into a prefix and a suffix and then check if the prefix can be found at the beginning of the suffix. For your input example this would be:
(A, BCABCBC)
(AB, CABCBC)
(ABC, ABCBC)
(ABCA, BCBC)
...
These pairs can be passed to a function shorten_prefix, which checks how often the suffix starts with the prefix. For example, for the prefix ABC and the suffix ABCBC, the prefix occurs only once at the beginning of the suffix, making a total of 2 consecutive ABCs. So we can compact this prefix / suffix combination to the output (2ABC, BC).
This function shorten_prefix will be used on each of the above tuples in a loop.
After using the function shorten_prefix once, there is still a suffix left for most of the string combinations. E.g. in the output (2ABC, BC), the string BC remains as the suffix. So we need to find the shortest representation for this remaining suffix. Conveniently, we already have a function for this, shortest_repr, so let's just call it on the remaining suffix.
This image displays how this recursion works (I only expanded one of the nodes after the 3rd level, but in fact all of the orange circles would go through recursion):
We start at the top with a call of shortest_repr on the string ABABB (I selected a shorter sample for the image). Then we split this string at all possible split positions and get a list of prefix / suffix pairs in the second row. On each element of this list we first call the prefix/suffix optimization (shorten_prefix) and retrieve a shortened prefix/suffix combination, which already has the run-length numbers in the prefix (third row). Now, on each of the suffixes, we call our recursive function shortest_repr.
I did not display the upward-direction of the recursion. When a suffix is the empty string, we pass an empty string into shortest_repr. Of course, the shortest representation of the empty string is the empty string, so we can return the empty string immediately.
When the result of the call to shortest_repr was received inside our loop, we just select the shortest string inside the loop and return this.
This is some quickly hacked code that does the trick (here shorten_prefix is named shorten_beginning and shortest_repr is named find_shortest_repr):
def shorten_beginning(beginning, ending):
    count = 1
    while ending.startswith(beginning):
        count += 1
        ending = ending[len(beginning):]
    return str(count) + beginning, ending

def find_shortest_repr(string):
    possible_variants = []
    if not string:
        return ''
    for i in range(1, len(string) + 1):
        beginning = string[:i]
        ending = string[i:]
        shortened, new_ending = shorten_beginning(beginning, ending)
        shortest_ending = find_shortest_repr(new_ending)
        possible_variants.append(shortened + shortest_ending)
    return min([(len(x), x) for x in possible_variants])[1]

print(find_shortest_repr('ABCABCBC'))
print(find_shortest_repr('ABCABCABCABCBC'))
print(find_shortest_repr('ABCABCBCBCBCBCBC'))
Open issues
I think this approach has the same problem as the recursive Levenshtein distance calculation: it calculates the same suffixes multiple times. So it would be a nice exercise to try to implement this with dynamic programming.
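For what it's worth, the same recursion can be sketched in C# with a cache so that each suffix is only solved once. This is an untested-in-production sketch, not a tuned implementation: the names AdvancedRle, ShortestRepr and Cache are made up, ties between equally short results are resolved by taking the first one found (the Python version picks the alphabetically smallest), and the static cache is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

public static class AdvancedRle
{
    // Memoization cache: suffix -> shortest representation found for it.
    private static readonly Dictionary<string, string> Cache = new Dictionary<string, string>();

    public static string ShortestRepr(string s)
    {
        if (s.Length == 0) return "";
        if (Cache.TryGetValue(s, out string cached)) return cached;

        string best = null;
        for (int i = 1; i <= s.Length; i++)
        {
            string prefix = s.Substring(0, i);
            int pos = i;   // index just past the copies consumed so far
            int count = 1; // the prefix itself counts as the first copy

            // Count how many further copies of the prefix follow immediately.
            while (pos + i <= s.Length && string.CompareOrdinal(s, pos, prefix, 0, i) == 0)
            {
                count++;
                pos += i;
            }

            // Recurse on whatever is left after the repeated prefix.
            string candidate = count + prefix + ShortestRepr(s.Substring(pos));
            if (best == null || candidate.Length < best.Length)
                best = candidate;
        }

        Cache[s] = best;
        return best;
    }
}
```

For the input ABCABCBC this should return 2ABC1BC, matching the first option mentioned in the question.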

If this is not a school assignment or a performance-critical part of the code, a regex might be enough (requires using System.Text.RegularExpressions):
string input = "ABCABCBC";
var re = new Regex(@"(.+)\1+|(.+)", RegexOptions.Compiled); // RegexOptions.Compiled is optional; it only pays off if the regex is used more than once
string output = re.Replace(input,
    m => (m.Length / m.Result("$1$2").Length) + m.Result("$1$2")); // "2ABC1BC" (case sensitive by default)

Casting HexNumber as character to string

I need to process a numeral as a string.
My value is 0x28 and this is the ascii code for '('.
I need to assign this to a string.
The following lines do this.
char c = (char)0x28;
string s = c.ToString();
string s2 = ((char)0x28).ToString();
My use case is a function that only accepts strings.
My call ends up looking cluttered:
someCall( ((char)0x28).ToString() );
Is there a way of simplifying this and making it more readable without writing '('?
The hex number in the code is always paired with a variable that contains that hex value in its name, so "translating" it would destroy that visible connection.
Edit:
A list of tuples is initialised with this, where the first item has the character in its name and the second item results from a call with that character.
One of the answers below is exactly what I am looking for, so I have incorporated it here.
{ existingStaticVar0x28, someCall("\u0028") }
The reader can now instinctively see the connection between item1 and item2 and is less likely to run into a trap when this gets refactored.

You can use a Unicode character escape sequence in place of the hex value to avoid casting. Note that \u must be followed by exactly four hex digits:
string s2 = '\u0028'.ToString();
or
someCall("\u0028");

Supposing that your input is not fixed, you could write an extension method:
namespace MyExtensions
{
    public static class MyStringExtensions
    {
        public static string ConvertFromHex(this string hexData)
        {
            // Convert.ToInt32 with base 16 accepts an optional "0x" prefix
            int c = Convert.ToInt32(hexData, 16);
            return new string(new char[] {(char)c});
        }
    }
}
Now you could call it in your code with
string hexNumber = "0x28"; // or whatever hexcode you need to convert
string result = hexNumber.ConvertFromHex();
A bit of error handling should be added to the above conversion.
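One way that error handling could look is a Try-style variant that reports failure instead of throwing. This is only a sketch under assumptions: HexStringExtensions and TryConvertFromHex are invented names, and the input is assumed to be a single UTF-16 code unit (at most four hex digits) with an optional 0x prefix.

```csharp
using System;
using System.Globalization;

public static class HexStringExtensions
{
    // Returns false (and a null result) when the text is not a valid hex character code.
    public static bool TryConvertFromHex(this string hexData, out string result)
    {
        result = null;
        if (string.IsNullOrWhiteSpace(hexData))
            return false;

        string digits = hexData.Trim();
        // NumberStyles.HexNumber does not accept a "0x" prefix, so strip it first.
        if (digits.StartsWith("0x", StringComparison.OrdinalIgnoreCase))
            digits = digits.Substring(2);

        // ushort limits the value to the range of a single char (0x0000 to 0xFFFF).
        if (!ushort.TryParse(digits, NumberStyles.HexNumber, CultureInfo.InvariantCulture, out ushort code))
            return false;

        result = ((char)code).ToString();
        return true;
    }
}
```

With this, "0x28".TryConvertFromHex(out string s) should return true and set s to "(", while an input such as "zz" returns false rather than raising a FormatException.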

How to prevent conversion of Windows-1252 argument into a Unicode string?

I've written my first COM classes. My unit tests work fine, but my first use of the COM objects has hit a snag.
The COM classes provide methods which accept a string, manipulate it and return a string. The consumer of the COM objects is a dBASE PLUS program.
When the input string contains common keyboard characters (ASCII 127 or lower), the COM methods work fine. However, if the string contains characters beyond the ASCII range, some of them get remapped from Windows-1252 to C#'s Unicode. This table shows the mapping that takes place: http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT
For example, if the dBASE program calls the COM object with:
oMyComObject.MyMethod("It will cost€123") where the € is hex 80,
the C# method receives it as Unicode:
public string MyMethod(string source)
{
// source is Unicode and now the Euro symbol is hex 20AC
...
}
I would like to avoid this remapping because I want the original hex content of the string.
I've tried adding the following to MyMethod to convert the string back to Windows-1252, but the Euro symbol gets lost because it becomes a question mark:
byte[] UnicodeBytes = Encoding.Unicode.GetBytes(source.ToString());
byte[] Win1252Bytes = Encoding.Convert(Encoding.Unicode, Encoding.GetEncoding(1252), UnicodeBytes);
string Win1252 = Encoding.GetEncoding(1252).GetString(Win1252Bytes);
Is there a way to prevent this conversion of the "source" parameter to Unicode? Or, is there a way to convert it 100% from Unicode back to Windows-1252?

Yes, I'm answering my own question. The answer by "Jigsore" put me on the right track, but I want to explain more clearly in case someone else makes the same mistake I made.
I eventually figured out that I had misdiagnosed the problem. dBASE was passing the string fine and C# was receiving it fine. It was how I checked the contents of the string that was in error.
This example builds on Jigsore's answer:
void Main()
{
    string unicodeText = "\u20AC\u0160\u0152\u0161";
    byte[] unicodeBytes = Encoding.Unicode.GetBytes(unicodeText);
    byte[] win1252bytes = Encoding.Convert(Encoding.Unicode, Encoding.GetEncoding(1252), unicodeBytes);
    for (int i = 0; i < win1252bytes.Length; i++)
        Console.Write("0x{0:X2} ", win1252bytes[i]); // output: 0x80 0x8A 0x8C 0x9A
    // win1252String represents the string passed from dBASE to C#
    string win1252String = Encoding.GetEncoding(1252).GetString(win1252bytes);
    Console.WriteLine("\r\nWin1252 string is " + win1252String); // output: Win1252 string is €ŠŒš
    Console.WriteLine("looking at the code of the first character the wrong way: " + (int)win1252String[0]);
    // output: looking at the code of the first character the wrong way: 8364
    byte[] bytes = Encoding.GetEncoding(1252).GetBytes(win1252String[0].ToString());
    Console.WriteLine("looking at the code of the first character the right way: " + bytes[0]);
    // output: looking at the code of the first character the right way: 128
    // Warning: If your input contains character codes which are larger in value than what a byte
    // can hold (ex: multi-byte Chinese characters), then you will need to look at more than just bytes[0].
}
The reason the first method was wrong is that a C# string is always stored as Unicode (UTF-16), so casting (int)win1252String[0] returns the Unicode value of the character (8364 = 0x20AC for the Euro symbol), not its Windows-1252 byte value. The same applies in reverse when casting an integer j to a character with (char)j.
I consider this resolved and would like to thank each person who took the time to comment or answer for their time and trouble. It is appreciated!

Actually you're doing the Unicode to Win-1252 conversion correctly, but you're performing an extra step. The original Win1252 codes are in the Win1252Bytes array!
Check the following code:
string unicodeText = "\u20AC\u0160\u0152\u0161";
byte[] unicodeBytes = Encoding.Unicode.GetBytes(unicodeText);
byte[] win1252bytes = Encoding.Convert(Encoding.Unicode, Encoding.GetEncoding(1252), unicodeBytes);
for (int i = 0; i < win1252bytes.Length; i++)
    Console.Write("0x{0:X2} ", win1252bytes[i]);
The output shows the Win-1252 codes for the unicodeText string, you can check this by looking at the CP1252.TXT table.
