string type conversion C# - c#

Here is my problem:
I have a string that I think is binary:
zv�Q6��.�����E3r
I want to convert this string to something readable. How can I do this in C#?

You can try enumerating all available encodings and picking the one that
produces reasonable text. Unfortunately, this is not a complete solution:
an erroneous conversion may already have lost information.
// Requires: using System; using System.Collections.Generic; using System.Reflection; using System.Text;
public static String GetAllEncodings(String value) {
  List<Encoding> encodings = new List<Encoding>();
  // Ordinary code pages
  foreach (EncodingInfo info in Encoding.GetEncodings())
    encodings.Add(Encoding.GetEncoding(info.CodePage));
  // Special encodings, that could have no code page
  foreach (PropertyInfo pi in typeof(Encoding).GetProperties(BindingFlags.Static | BindingFlags.Public))
    if (pi.CanRead && pi.PropertyType == typeof(Encoding))
      encodings.Add(pi.GetValue(null) as Encoding);
  // The bytes of the erroneously decoded input are the same for every candidate
  Byte[] data = Encoding.UTF8.GetBytes(value);
  // Collects one line of output per encoding
  StringBuilder Sb = new StringBuilder();
  foreach (Encoding encoding in encodings) {
    String test = encoding.GetString(data).Replace('\0', '?');
    if (Sb.Length > 0)
      Sb.AppendLine();
    Sb.Append(encoding.WebName);
    Sb.Append(" (code page = ");
    Sb.Append(encoding.CodePage);
    Sb.Append(")");
    Sb.Append(" -> ");
    Sb.Append(test);
  }
  return Sb.ToString();
}
...
// Test / usage
String St = "Некий русский текст"; // <- Some Russian Text
Byte[] d = Encoding.UTF32.GetBytes(St); // <- Was encoded as UTF 32
St = Encoding.UTF8.GetString(d); // <- And erroneously read as UTF 8
// Let's see all the encodings:
myTextBox.Text = GetAllEncodings(St);
// In the myTextBox.Text you can find the solution:
// ....
// utf-32 (code page = 12000) -> Некий русский текст
// ....

byte[] bytes = System.Text.Encoding.Unicode.GetBytes(text); // text is the string to convert
This gives you the raw bytes of the string, but you have to know the encoding of your string and replace 'Unicode' with it.
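To make the point about encodings concrete, here is a minimal sketch written as a C# 9+ top-level program. The sample text "Hello" is only an assumption for illustration; it shows that the same string produces different bytes under different encodings, and that hex or Base64 gives a readable, lossless view of any byte array.

```csharp
using System;
using System.Text;

// Hypothetical sample text; real data would come from wherever the string originated.
string text = "Hello";

// The same string yields different bytes under different encodings.
byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
byte[] utf16Bytes = Encoding.Unicode.GetBytes(text); // UTF-16, little-endian

// Two readable, lossless representations of arbitrary bytes.
string hex = BitConverter.ToString(utf16Bytes);    // "48-00-65-00-6C-00-6C-00-6F-00"
string base64 = Convert.ToBase64String(utf8Bytes); // "SGVsbG8="

Console.WriteLine(hex);
Console.WriteLine(base64);
```

If the data really is binary rather than mis-decoded text, a hex or Base64 dump like this is usually the most that can be displayed meaningfully.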

Related

How to decode the email quoted printable encoded in C#

I have looked at many solutions here and still have not found a working way to decode quoted-printable email content.
Example input:
*** Hello, World *** =0D=0AURl: http://w=
ww.example.com?id=3D=
27a9dca9-5d61-477c-8e73-a76666b5b1bf=0D=0A=0D=0A
Name: Hello World=0D=0A
Phone: 61234567890=0D=0A
Email: hello@test.com=0D=0A=0D=0A
and the example expected output is:
*** Hello, World ***
URl: http://www.example.com?id=27a9dca9-5d61-477c-8e73-a76666b5b1bf
Name: Hello World
Phone: 61234567890
Email: hello@test.com
The converter at www.webatic.com/quoted-printable-convertor renders it correctly.
Does anybody have an idea how to solve this problem in C#?
Try the snippet below to decode quoted-printable text:
using System;
using System.Text;
using System.Text.RegularExpressions;
class Program
{
public static string DecodeQuotedPrintable(string input, string charSet)
{
Encoding enc;
try
{
enc = Encoding.GetEncoding(charSet);
}
catch
{
enc = new UTF8Encoding();
}
var occurences = new Regex(@"(=[0-9A-Z]{2}){1,}", RegexOptions.Multiline);
var matches = occurences.Matches(input);
foreach (Match match in matches)
{
try
{
byte[] b = new byte[match.Groups[0].Value.Length / 3];
for (int i = 0; i < match.Groups[0].Value.Length / 3; i++)
{
b[i] = byte.Parse(match.Groups[0].Value.Substring(i * 3 + 1, 2), System.Globalization.NumberStyles.AllowHexSpecifier);
}
char[] hexChar = enc.GetChars(b);
input = input.Replace(match.Groups[0].Value, new String(hexChar));
}
catch
{ ;}
}
input = input.Replace("?=", "");
return input;
}
static void Main(string[] args)
{
string sData = @"*** Hello, World *** =0D=0AURl: http://www.example.com?id=3D=27a9dca9-5d61-477c-8e73-a76666b5b1bf=0D=0A=0D=0A
Name: Hello World=0D=0A
Phone: 61234567890=0D=0A
Email: hello@test.com=0D=0A=0D=0A";
Console.WriteLine(DecodeQuotedPrintable(sData,"utf-8"));
Console.ReadLine();
}
}
Running code is available on dotnetfiddle.
The snippet was taken from this link.
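One thing worth noting: the input in the question contains soft line breaks (a trailing "=" at the end of a line, as in "http://w=" followed by "ww.example.com"), which have to be removed before the "=XX" escapes are decoded. The following is a rough sketch of that idea as a C# 9+ top-level program. DecodeQuotedPrintableBody is a hypothetical helper name, not part of the answer above, and the sketch assumes the encoded text itself is plain ASCII.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: decodes a quoted-printable body into a string.
string DecodeQuotedPrintableBody(string input, Encoding encoding)
{
    // A "=" at the very end of a line is a soft line break: drop it together with the newline.
    string joined = Regex.Replace(input, @"=[ \t]*\r?\n", "");

    var bytes = new List<byte>(joined.Length);
    for (int i = 0; i < joined.Length; i++)
    {
        char c = joined[i];
        if (c == '=' && i + 2 < joined.Length + 0 &&
            byte.TryParse(joined.Substring(i + 1, 2), NumberStyles.AllowHexSpecifier,
                          CultureInfo.InvariantCulture, out byte value))
        {
            bytes.Add(value); // "=XX" becomes the byte 0xXX
            i += 2;           // skip the two hex digits
        }
        else
        {
            // Literal character; quoted-printable text is expected to be ASCII here.
            bytes.AddRange(encoding.GetBytes(new[] { c }));
        }
    }
    // Decode all bytes at once so multi-byte sequences are never split.
    return encoding.GetString(bytes.ToArray());
}

// First part of the sample from the question, including its soft line breaks.
string sample =
    "*** Hello, World *** =0D=0AURl: http://w=\r\n" +
    "ww.example.com?id=3D=\r\n" +
    "27a9dca9-5d61-477c-8e73-a76666b5b1bf=0D=0A=0D=0A";

string decoded = DecodeQuotedPrintableBody(sample, Encoding.UTF8);
Console.WriteLine(decoded);
```

Collecting bytes first and decoding once at the end also avoids the repeated string.Replace calls, which can misfire when the same escape sequence occurs in more than one place.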

Searching for a Specific Word in a Text File and Displaying the line its on

I am having trouble attempting to find words in a text file in C#.
I want to find the word that is input into the console then display the entire line that the word was found on in the console.
In my text file I have:
Stephen Haren,December,9,4055551235
Laura Clausing,January,23,4054447788
William Connor,December,13,123456789
Kara Marie,October,23,1593574862
Audrey Carrit,January,16,1684527548
Sebastian Baker,October,23,9184569876
So if I input "December" I want it to display "Stephen Haren,December,9,4055551235" and "William Connor,December,13,123456789" .
I thought about using substrings but I figured there had to be a simpler way.
My Code After Given Answer:
using System;
using System.IO;
class ReadFriendRecords
{
public static void Main()
{
//the path of the file
FileStream inFile = new FileStream(@"H:\C#\Chapter.14\FriendInfo.txt", FileMode.Open, FileAccess.Read);
StreamReader reader = new StreamReader(inFile);
string record;
string input;
Console.Write("Enter Friend's Birth Month >> ");
input = Console.ReadLine();
try
{
//the program reads the record and displays it on the screen
record = reader.ReadLine();
while (record != null)
{
if (record.Contains(input))
{
Console.WriteLine(record);
}
record = reader.ReadLine();
}
}
finally
{
//after all records have been read, the program closes the reader and the file
reader.Close();
inFile.Close();
}
Console.ReadLine();
}
}
Iterate through all the lines (StreamReader, File.ReadAllLines, etc.) and check whether
line.Contains("December") (replace "December" with the user input).
Edit:
I would go with the StreamReader in case you have large files, and use the IndexOf example from @Matias Cicero instead of Contains for a case-insensitive search.
Console.Write("Keyword: ");
var keyword = Console.ReadLine() ?? "";
using (var sr = new StreamReader("")) {
while (!sr.EndOfStream) {
var line = sr.ReadLine();
if (String.IsNullOrEmpty(line)) continue;
if (line.IndexOf(keyword, StringComparison.CurrentCultureIgnoreCase) >= 0) {
Console.WriteLine(line);
}
}
}
As mentioned by @Rinecamo, try this code:
string toSearch = Console.ReadLine().Trim();
This line reads the user's input and stores it in a string. Then iterate over each line of the file:
foreach (string line in System.IO.File.ReadAllLines(FILEPATH))
{
if(line.Contains(toSearch))
Console.WriteLine(line);
}
Replace FILEPATH with the absolute or relative path, e.g. ".\file2Read.txt".
How about something like this:
//We read all the lines from the file
IEnumerable<string> lines = File.ReadAllLines("your_file.txt");
//We read the input from the user
Console.Write("Enter the word to search: ");
string input = Console.ReadLine().Trim();
//We identify the matches. If the input is empty, then we return no matches at all
IEnumerable<string> matches = !String.IsNullOrEmpty(input)
? lines.Where(line => line.IndexOf(input, StringComparison.OrdinalIgnoreCase) >= 0)
: Enumerable.Empty<string>();
//If there are matches, we output them. If there are not, we show an informative message
Console.WriteLine(matches.Any()
? String.Format("Matches:\n> {0}", String.Join("\n> ", matches))
: "There were no matches");
This approach is simple and easy to read. It uses LINQ, and String.IndexOf instead of String.Contains, so the search can be case-insensitive.
To find text in a file you can use the following approach. Put this code inside
static void Main(string[] args)
{
}
Try this one:
if (File.Exists(@"C:\TextFile.txt"))
{
    Console.WriteLine("Enter a word to search");
    string cSearforSomething = Console.ReadLine().Trim();
    string cColl;
    // Dispose the reader as soon as the file has been read
    using (StreamReader oReader = new StreamReader(@"C:\TextFile.txt"))
    {
        cColl = oReader.ReadToEnd();
    }
    // Escape the input so it is matched literally, as a whole word
    string cCriteria = @"\b" + Regex.Escape(cSearforSomething) + @"\b";
    Regex oRegex = new Regex(cCriteria, RegexOptions.IgnoreCase);
    // Note: this prints the number of occurrences, not the matching lines
    int count = oRegex.Matches(cColl).Count;
    Console.WriteLine(count.ToString());
}
Console.ReadLine();

Converting email subject from "?UTF-8?..." to string?

I'm using these techniques to convert =?utf-8?B?...?= to a readable string:
How convert email subject from “?UTF-8?…?=” to readable string?
string encode / decode
It works for simple input, but some of my input contains several =?utf-8?B?...?= sections in a row, for example:
"=?utf-8?B?2KfbjNmGINuM2qkg2YXYqtmGINiz2KfYr9mHINin2LPYqg==?= =?utf-8?B?2KfbjNmGINuM2qkg2YXYqtmGINiz2KfYr9mHINin2LPYqg==?= =?utf-8?B?2YbYr9is?="
I know the part between =?UTF-8?B? and ?= is a base64-encoded string, but in this case I have no idea how to extract the parts.
You can use a regex to extract the string between =?UTF-8?B? and ?= then convert the rest. Here's an example:
string input = "=?utf-8?B?2KfbjNmGINuM2qkg2YXYqtmGINiz2KfYr9mHINin2LPYqg==?= =?utf-8?B?2KfbjNmGINuM2qkg2YXYqtmGINiz2KfYr9mHINin2LPYqg==?= =?utf-8?B?2YbYr9is?=";
Regex regex = new Regex(string.Format("{0}(.*?){1}",Regex.Escape("=?utf-8?B?"), Regex.Escape("?=")));
var matches = regex.Matches(input);
foreach (Match match in matches)
{
Console.WriteLine(
Encoding.UTF8.GetString(Convert.FromBase64String(match.Groups[1].Value))
);
}
This will print:
این یک متن ساده است
این یک متن ساده است
ندج
Don't forget to include these using statements:
using System.Text.RegularExpressions;
using System.Text;
Working example available here.
Try with something like:
string str = "=?utf-8?B?2KfbjNmGINuM2qkg2YXYqtmGINiz2KfYr9mHINin2LPYqg==?= =?utf-8?B?2KfbjNmGINuM2qkg2YXYqtmGINiz2KfYr9mHINin2LPYqg==?= =?utf-8?B?2YbYr9is?=";
const string utf8b = "=?utf-8?B?";
var parts = str.Split(new[] { "?=" }, StringSplitOptions.None);
foreach (var part in parts)
{
string str2 = part.Trim();
if (str2.StartsWith(utf8b, StringComparison.OrdinalIgnoreCase))
{
str2 = str2.Substring(utf8b.Length);
byte[] bytes = Convert.FromBase64String(str2);
string final = Encoding.UTF8.GetString(bytes);
Console.WriteLine(final);
}
else if (str2 == string.Empty)
{
// Nothing to do here
}
else
{
Console.WriteLine("Not recognized {0}", str2);
}
}
Note that, technically, RFC 1342 (since superseded by RFC 2047) is a little more complex: instead of utf-8 you could have any encoding, and instead of B you could have Q (a variant of quoted-printable).
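To illustrate that last note, here is a rough sketch (a C# 9+ top-level program) that handles any charset name and both the B and Q forms. DecodeEncodedWords and DecodeQ are hypothetical helper names, and the sketch makes simplifying assumptions: the charset name must be one that Encoding.GetEncoding recognizes (otherwise it throws), and malformed input is not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: decodes every =?charset?B|Q?payload?= section in a header value.
string DecodeEncodedWords(string header)
{
    // Whitespace that only separates two adjacent encoded-words is not part of the text.
    string compact = Regex.Replace(header, @"(\?=)\s+(=\?)", "$1$2");

    return Regex.Replace(compact, @"=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=", m =>
    {
        Encoding charset = Encoding.GetEncoding(m.Groups[1].Value);
        string payload = m.Groups[3].Value;
        byte[] bytes = m.Groups[2].Value.ToUpperInvariant() == "B"
            ? Convert.FromBase64String(payload)
            : DecodeQ(payload);
        return charset.GetString(bytes);
    });
}

// Hypothetical helper: the Q form uses "_" for a space and "=XX" for an arbitrary byte.
byte[] DecodeQ(string payload)
{
    var bytes = new List<byte>();
    for (int i = 0; i < payload.Length; i++)
    {
        char c = payload[i];
        if (c == '_')
        {
            bytes.Add(0x20);
        }
        else if (c == '=' && i + 2 < payload.Length + 0 &&
                 byte.TryParse(payload.Substring(i + 1, 2), NumberStyles.AllowHexSpecifier,
                               CultureInfo.InvariantCulture, out byte value))
        {
            bytes.Add(value);
            i += 2; // skip the two hex digits
        }
        else
        {
            bytes.Add((byte)c); // remaining characters are plain ASCII
        }
    }
    return bytes.ToArray();
}

// The last section of the question's input, followed by a made-up Q-encoded section.
string subject = DecodeEncodedWords("=?utf-8?B?2YbYr9is?= =?iso-8859-1?Q?Caf=E9_au_lait?=");
Console.WriteLine(subject);
```

Text outside the encoded sections is left untouched, so a header that mixes plain and encoded words should come through with only the encoded parts replaced.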

Using C# to edit text within a binary file

I have a binary file (i.e., it contains bytes with values between 0x00 and 0xFF). There are also ASCII strings in the file (e.g., "Hello World") that I want to find and edit using Regex. I then need to write out the edited file so that it's exactly the same as the old one but with my ASCII edits having been performed. How?
byte[] inbytes = File.ReadAllBytes(wfile);
string instring = utf8.GetString(inbytes);
// use Regex to find/replace some text within instring
byte[] outbytes = utf8.GetBytes(instring);
File.WriteAllBytes(outfile, outbytes);
Even if I don't do any edits, the output file is different from the input file. What's going on, and how can I do what I want?
EDIT: Ok, I'm trying to use the offered suggestion and am having trouble understanding how to actually implement it. Here's my sample code:
string infile = @"C:\temp\in.dat";
string outfile = @"C:\temp\out.dat";
Regex re = new Regex(@"H[a-z]+ W[a-z]+"); // looking for "Hello World"
byte[] inbytes = File.ReadAllBytes(infile);
string instring = new SoapHexBinary(inbytes).ToString();
Match match = re.Match(instring);
if (match.Success)
{
// do work on 'instring'
}
File.WriteAllBytes(outfile, SoapHexBinary.Parse(instring).Value);
Obviously, I know I'll not get a match doing it that way, but if I convert my Regex to a string (or whatever), then I can't use Match, etc. Any ideas? Thanks!
Not all binary strings are valid UTF-8 strings. When you try to interpret the binary as a UTF-8 string, the bytes that can't be thus interpreted are probably getting mangled. Basically, if the whole file is not encoded text, then interpreting it as encoded text will not yield sensible results.
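One way around this, sketched below as a C# 9+ top-level program, is to use an encoding in which every byte value maps to exactly one character and back. ISO-8859-1 (Latin-1) has that property, so decoding and re-encoding returns the original bytes. The sample bytes here are made up for illustration and stand in for File.ReadAllBytes; the sketch also assumes the replacement text contains only characters below U+0100.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// ISO-8859-1 maps each byte 0x00..0xFF to the character with the same code point.
Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");

// Hypothetical stand-in for File.ReadAllBytes(infile): binary data around an ASCII string.
var sampleBytes = new List<byte> { 0x00, 0xFF, 0x80 };
sampleBytes.AddRange(Encoding.ASCII.GetBytes("Hello World"));
sampleBytes.AddRange(new byte[] { 0xC3, 0x28, 0xFE }); // 0xC3 0x28 is not valid UTF-8
byte[] inbytes = sampleBytes.ToArray();

string text = latin1.GetString(inbytes);

// Without any edits, the round trip reproduces the input exactly.
bool lossless = latin1.GetBytes(text).SequenceEqual(inbytes);

// Edit with Regex as usual, then convert back to bytes.
string edited = Regex.Replace(text, @"H[a-z]+ W[a-z]+", "Hi there");
byte[] outbytes = latin1.GetBytes(edited);

Console.WriteLine(lossless);
Console.WriteLine(BitConverter.ToString(outbytes));
```

Keep in mind that character classes such as \w, or case-insensitive matching, may also match characters in the 0x80 to 0xFF range, so patterns restricted to explicit ASCII ranges are the safer choice here.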
An alternative to working on the binary file directly is to convert it to a hex string, work on that (Regex can be used here), and then save it back:
byte[] buf = File.ReadAllBytes(file);
var str = new SoapHexBinary(buf).ToString();
//str=89504E470D0A1A0A0000000D49484452000000C8000000C808030000009A865EAC00000300504C544......
//Do your work
File.WriteAllBytes(file,SoapHexBinary.Parse(str).Value);
PS: SoapHexBinary lives in the System.Runtime.Remoting.Metadata.W3cXsd2001 namespace.
I got it! Check out the code:
string infile = @"C:\temp\in.dat";
string outfile = @"C:\temp\out.dat";
Regex re = new Regex(@"H[a-z]+ W[a-z]+"); // looking for "Hello World"
string repl = @"Hi there";
Encoding ascii = Encoding.ASCII;
byte[] inbytes = File.ReadAllBytes(infile);
string instr = ascii.GetString(inbytes);
Match match = re.Match(instr);
int beg = 0;
bool replaced = false;
List<byte> newbytes = new List<byte>();
while (match.Success)
{
replaced = true;
for (int i = beg; i < match.Index; i++)
newbytes.Add(inbytes[i]);
foreach (char c in repl)
newbytes.Add(Convert.ToByte(c));
Match nmatch = match.NextMatch();
int end = (nmatch.Success) ? nmatch.Index : inbytes.Length;
for (int i = match.Index + match.Length; i < end; i++)
newbytes.Add(inbytes[i]);
beg = end;
match = nmatch;
}
if (replaced)
{
var newarr = newbytes.ToArray();
File.WriteAllBytes(outfile, newarr);
}
else
{
File.WriteAllBytes(outfile, inbytes);
}

Reading a null-terminated string

I am reading strings from a binary file. Each string is null-terminated. Encoding is UTF-8. In python I simply read a byte, check if it's 0, append it to a byte array, and continue reading bytes until I see a 0. Then I convert byte array into a string and move on. All of the strings were read correctly.
How can I read this in C#? I don't think I have the luxury of simply appending bytes to an array since the arrays are fixed size.
The following should get you what you are looking for. All of the text ends up in the myText list.
var data = File.ReadAllBytes("myfile.bin");
List<string> myText = new List<string>();
int lastOffset = 0;
for (int i = 0; i < data.Length; i++)
{
if (data[i] == 0)
{
myText.Add(System.Text.Encoding.UTF8.GetString(data, lastOffset, i - lastOffset));
lastOffset = i + 1;
}
}
I assume you're using a StreamReader instance:
StringBuilder sb = new StringBuilder();
using(StreamReader rdr = OpenReader(...)) {
Int32 nc;
while((nc = rdr.Read()) != -1) {
Char c = (Char)nc;
if( c != '\0' ) sb.Append( c );
}
}
You can either use a List<byte>:
List<byte> list = new List<byte>();
while(reading){ //or whatever your condition is
    list.Add(readByte);
}
string output = Encoding.UTF8.GetString(list.ToArray());
Or you could use a StringBuilder (this only works if every byte is a single ASCII character):
StringBuilder builder = new StringBuilder();
while(reading){
    builder.Append((char)readByte); // cast, otherwise the byte is appended as a number
}
string output = builder.ToString();
If your "binary file" only contains null terminated UTF8 strings, then for .NET it isn't a "binary file" but just a text file because null characters are characters too. So you could just use a StreamReader to read the text and split it on the null characters.
(Six years later "you" would presumably be some new reader and not the OP.)
A one line (ish) solution would be:
using (var rdr = new StreamReader(path))
return rdr.ReadToEnd().Split(new char[] { '\0' });
But that will give you a trailing empty string if the last string in the file was "properly" terminated.
A more verbose solution that might perform differently for very large files, expressed as an extension method on StreamReader (so it must be static and declared inside a static class), would be:
static System.Collections.Generic.List<string> ReadAllNullTerminated(this System.IO.StreamReader rdr)
{
var stringsRead = new System.Collections.Generic.List<string>();
var bldr = new System.Text.StringBuilder();
int nc;
while ((nc = rdr.Read()) != -1)
{
Char c = (Char)nc;
if (c == '\0')
{
stringsRead.Add(bldr.ToString());
bldr.Length = 0;
}
else
bldr.Append(c);
}
// Optionally return any trailing unterminated string
if (bldr.Length != 0)
stringsRead.Add(bldr.ToString());
return stringsRead;
}
Or for reading just one at a time (like ReadLine), again as a static extension method:
static string ReadNullTerminated(this System.IO.StreamReader rdr)
{
var bldr = new System.Text.StringBuilder();
int nc;
while ((nc = rdr.Read()) > 0)
bldr.Append((char)nc);
return bldr.ToString();
}
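If the file also contains non-text data, it may be safer to work on the raw bytes and decode only once a terminator has been found. Below is a rough sketch of that idea as a C# 9+ top-level program. ReadNullTerminatedUtf8 is a hypothetical helper name, and the in-memory sample stands in for a real file stream. A MemoryStream serves as the growable byte buffer, which covers the concern about fixed-size arrays.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: reads bytes up to the next 0 byte and decodes them as UTF-8.
// Returns null when the stream is already at its end.
string ReadNullTerminatedUtf8(Stream stream)
{
    using var buffer = new MemoryStream(); // grows as needed
    int b;
    // ReadByte returns 0 for the terminator and -1 at end of stream; both stop the loop.
    while ((b = stream.ReadByte()) > 0)
        buffer.WriteByte((byte)b);

    if (b == -1 && buffer.Length == 0)
        return null;

    // Decode the whole string at once so multi-byte characters are never split.
    return Encoding.UTF8.GetString(buffer.ToArray());
}

// Made-up sample data: two UTF-8 strings, each followed by a 0 byte.
byte[] data = Encoding.UTF8.GetBytes("héllo\0мир\0");
using var input = new MemoryStream(data);

string first = ReadNullTerminatedUtf8(input);
string second = ReadNullTerminatedUtf8(input);
string third = ReadNullTerminatedUtf8(input); // null: nothing left to read

Console.WriteLine(first);
Console.WriteLine(second);
Console.WriteLine(third == null);
```

An empty string followed by a terminator is returned as "", so it can still be told apart from the end of the stream.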
