I have looked through many of the solutions here and still have not found a working way to decode quoted-printable email text.
Example input:
*** Hello, World *** =0D=0AURl: http://w=
ww.example.com?id=3D=
27a9dca9-5d61-477c-8e73-a76666b5b1bf=0D=0A=0D=0A
Name: Hello World=0D=0A
Phone: 61234567890=0D=0A
Email: hello#test.com=0D=0A=0D=0A
and the example expected output is:
*** Hello, World ***
URl: http://www.example.com?id=27a9dca9-5d61-477c-8e73-a76666b5b1bf
Name: Hello World
Phone: 61234567890
Email: hello#test.com
The converter at www.webatic.com/quoted-printable-convertor renders it correctly.
Does anybody have an idea how to solve this problem in C#?
Try the snippet below to decode quoted-printable encoding:
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

class Program
{
    public static string DecodeQuotedPrintable(string input, string charSet)
    {
        Encoding enc;
        try
        {
            enc = Encoding.GetEncoding(charSet);
        }
        catch (ArgumentException)
        {
            // Unknown charset name: fall back to UTF-8.
            enc = new UTF8Encoding();
        }

        // Remove soft line breaks ("=" at the end of a line) first, so that
        // text following such a break is not mistaken for part of an escape.
        input = Regex.Replace(input, @"=\r?\n", "");

        // Decode each run of =XX escapes as a whole, so that multi-byte
        // characters are handed to the encoding in one piece.
        return Regex.Replace(input, @"(=[0-9A-Fa-f]{2})+", match =>
        {
            byte[] b = new byte[match.Value.Length / 3];
            for (int i = 0; i < b.Length; i++)
            {
                b[i] = byte.Parse(match.Value.Substring(i * 3 + 1, 2), NumberStyles.AllowHexSpecifier);
            }
            return enc.GetString(b);
        });
    }

    static void Main(string[] args)
    {
        // Sample from the question, including its soft line breaks.
        string sData = @"*** Hello, World *** =0D=0AURl: http://w=
ww.example.com?id=3D=
27a9dca9-5d61-477c-8e73-a76666b5b1bf=0D=0A=0D=0A
Name: Hello World=0D=0A
Phone: 61234567890=0D=0A
Email: hello#test.com=0D=0A=0D=0A";
        Console.WriteLine(DecodeQuotedPrintable(sData, "utf-8"));
        Console.ReadLine();
    }
}
Running code is available on dotnetfiddle.
The snippet was taken from this link.
I am trying to create a simple minifier because I am unsatisfied with the tools online. I have made a console application, but the problem is that nothing is removed, even though I split the text and remove \n and \t characters.
I've tried different methods of removing the whitespace.
static string restrictedSymbols = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ,0123456789";
...
static void Compress(string command)
{
string[] commandParts = command.Split(' ');
string text = String.Empty;
try
{
using (StreamReader sr = new StreamReader(commandParts[1]))
{
text = sr.ReadToEnd();
text.Replace("\n", "");
text.Replace("\t", "");
string formattedText = text;
string[] splitText = text.Split(' ');
StringBuilder sb = new StringBuilder();
for (int i = 0; i < splitText.Length - 1; i++)
{
splitText[i].TrimStart();
StringBuilder tSB = new StringBuilder(splitText[i]);
if (splitText[i].Length > 1 && splitText[i + 1].Length > 1)
{
int textLength = splitText[i].Length - 1;
if (restrictedSymbols.Contains(splitText[i + 1][0]) && restrictedSymbols.Contains(splitText[i][textLength]))
{
tSB.Append(" ");
}
}
sb.Append(tSB.ToString());
}
sb.Append(splitText[splitText.Length - 1]);
text = sb.ToString();
Console.WriteLine(text);
}
} catch (IOException e)
{
Console.WriteLine(e.ToString());
}
if (text != String.Empty)
{
try
{
using (StreamWriter stream = File.CreateText(commandParts[2] + commandParts[3]))
{
stream.Write(text);
}
}
catch (IOException e)
{
Console.WriteLine(e.ToString());
}
}
Console.WriteLine("Process Complete...");
GetCommand();
}
It should output a minified file, but it just outputs the exact same file that I put in.
Ignoring any other problems, Replace by itself does nothing. From the documentation:
Returns a new string in which all occurrences of a specified Unicode character or String in the current string are replaced with
another specified Unicode character or String.
So basically you are discarding the change by not keeping the return value.
At a minimum, you will need to do something like this:
text = text.Replace("\n", "");
You're replacing the characters, but then doing nothing with the result.
Your code should be:
text = text.Replace("\n", "");
text = text.Replace("\t", "");
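As a minimal sketch of the same point: every string method returns a new string, so the result has to be assigned or chained. The class and method names below are hypothetical, and stripping "\r" as well is an assumption added for files with Windows line endings.

```csharp
using System;

public static class WhitespaceStripper
{
    // Hypothetical helper: returns a copy of text without line breaks and tabs.
    public static string StripLineBreaksAndTabs(string text)
    {
        // Each Replace returns a new string, so the calls are chained and
        // the final result is returned instead of being discarded.
        return text.Replace("\r", "")
                   .Replace("\n", "")
                   .Replace("\t", "");
    }
}
```

In the question's Compress method this would be used as text = WhitespaceStripper.StripLineBreaksAndTabs(text); the same applies to splitText[i].TrimStart(), whose return value is also dropped.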
I am trying to detect quotes in a loaded text file, but it is not working. I have tried with '"' and '\"' without success. Any suggestions? Thanks.
void read()
{
txt = File.ReadAllText("txt/txttst");
for(int i=0;i<txt.Length;i++)
{
if(txt[i]=='"')
{
Debug.Log("Quotes at "+i);
}
}
}
How about this
string[] lines = File.ReadAllLines(@"txt/txttst");
for (int i = 0; i < lines.Length; i++)
{
    string line = lines[i];
    // ASCII code of the double quote is 34
    byte[] bytes = Encoding.UTF8.GetBytes(line);
    if (bytes.Count(b => b == 34) > 0)
        Console.WriteLine("\"" + " at line " + (i + 1));
}
This is how you can do it; please see the code below. Hope it helps.
using System;
using System.IO;
using System.Text.RegularExpressions;

namespace TestConsoleApp
{
    class Program
    {
        static void Main(string[] args)
        {
            string[] lines = File.ReadAllLines(@"C:\Users\Public\TestFolder\test.txt");
            var reg = new Regex("\"");
            Console.WriteLine("Contents of test.txt are: ");
            foreach (string line in lines)
            {
                Console.WriteLine(line);
                // Report the position of every quote on this line.
                foreach (Match item in reg.Matches(line))
                {
                    Console.WriteLine("Quotes at " + item.Index);
                }
            }
        }
    }
}
Ok I found the problem, my text editor did a subtle auto-correct from " to “ . Cheers.
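Since the cause turned out to be typographic quotes, a check that should tolerate such auto-correction can look for the curly variants as well. The sketch below is one way to do that; the class and method names are hypothetical, and the set of characters (U+0022, U+201C, U+201D) is an assumption about which quotes matter.

```csharp
using System;
using System.Collections.Generic;

public static class QuoteFinder
{
    // Straight double quote plus the left and right typographic double quotes.
    private static readonly char[] QuoteChars = { '"', '\u201C', '\u201D' };

    // Hypothetical helper: returns the index of every double-quote character.
    public static List<int> FindQuoteIndexes(string text)
    {
        var indexes = new List<int>();
        int pos = text.IndexOfAny(QuoteChars);
        while (pos >= 0)
        {
            indexes.Add(pos);
            // Continue the search just after the quote that was found.
            pos = text.IndexOfAny(QuoteChars, pos + 1);
        }
        return indexes;
    }
}
```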
The task:
Write a program, which counts the phrases in a text file. Any sequence of characters could be given as phrase for counting, even sequences containing separators. For instance in the text "I am a student in Sofia" the phrases "s", "stu", "a" and "I am" are found respectively 2, 1, 3 and 1 times.
I know the solutions with string.IndexOf, with LINQ, or with an algorithm like Aho-Corasick. I want to do the same thing with Regex.
This is what I've done so far:
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;
namespace CountThePhrasesInATextFile
{
class Program
{
static void Main(string[] args)
{
string input = ReadInput("file.txt");
input.ToLower();
List<string> phrases = new List<string>();
using (StreamReader reader = new StreamReader("words.txt"))
{
string line = reader.ReadLine();
while (line != null)
{
phrases.Add(line.Trim());
line = reader.ReadLine();
}
}
foreach (string phrase in phrases)
{
Regex regex = new Regex(String.Format(".*" + phrase.ToLower() + ".*"));
int mathes = regex.Matches(input).Count;
Console.WriteLine(phrase + " ----> " + mathes);
}
}
private static string ReadInput(string fileName)
{
string output;
using (StreamReader reader = new StreamReader(fileName))
{
output = reader.ReadToEnd();
}
return output;
}
}
}
I know my regular expression is incorrect but I don't know what to change.
The output:
Word ----> 2
S ----> 2
MissingWord ----> 0
DS ----> 2
aa ----> 0
The correct output:
Word --> 9
S --> 13
MissingWord --> 0
DS --> 2
aa --> 3
file.txt contains:
Word? We have few words: first word, second word, third word.
Some passwords: PASSWORD123, #PaSsWoRd!456, AAaA, !PASSWORD
words.txt contains:
Word
S
MissingWord
DS
aa
You need to post the file.txt contents first, otherwise it's difficult to verify if the regex is working correctly or not.
That being said, check out the Regex answer here:
Finding ALL positions of a substring in a large string in C#
and see if that helps with your code in the meantime.
edit:
So there's a simple solution: wrap each of your phrases in "(?=(" and "))". This is a lookahead assertion in regex; because it consumes no characters, overlapping occurrences are counted too. The following code handles what you want.
foreach (string phrase in phrases) {
string MatchPhrase = "(?=(" + phrase.ToLower() + "))";
int mathes = Regex.Matches(input, MatchPhrase).Count;
Console.WriteLine(phrase + " ----> " + mathes);
}
You also had an issue with
input.ToLower();
which should be instead
input = input.ToLower();
as strings in C# are immutable. In total, your code should be:
static void Main(string[] args) {
string input = ReadInput("file.txt");
input = input.ToLower();
List<string> phrases = new List<string>();
using (StreamReader reader = new StreamReader("words.txt")) {
string line = reader.ReadLine();
while (line != null) {
phrases.Add(line.Trim());
line = reader.ReadLine();
}
}
foreach (string phrase in phrases) {
string MatchPhrase = "(?=(" + phrase.ToLower() + "))";
int mathes = Regex.Matches(input, MatchPhrase).Count;
Console.WriteLine(phrase + " ----> " + mathes);
}
Console.ReadLine(); // keep the console window open
}
private static string ReadInput(string fileName) {
string output;
using (StreamReader reader = new StreamReader(fileName)) {
output = reader.ReadToEnd();
}
return output;
}
Here is what happened. I am going to use "Word" as the example.
The regex you built for "word" is ".*word.*". It tells the regex engine to match any run of characters, then "word", then any run of characters. Because "." does not match a newline, each match swallows the whole line that contains the phrase, so you are really counting lines.
For your input, it matched the whole first line,
Word? We have few words: first word, second word, third word.
with the greedy leading ".*" consuming everything up to the last "word" and the trailing ".*" consuming the final ".".
On the second line, "Some pass" is followed by "word" and then by "s: PASSWORD123, #PaSsWoRd!456, AAaA, !PASSWORD".
So the count is 2.
The regex you want is simple: the string "word" is sufficient.
Update:
For a case-insensitive pattern, try "(?i)word".
And for the multiple matches within AAaA, try "(?i)(?<=a)a".
(?<= ... ) is a zero-width positive lookbehind assertion.
Try this code:
string input = File.ReadAllText("file.txt");
foreach (string word in File.ReadLines("words.txt"))
{
var regex = new Regex(word, RegexOptions.IgnoreCase);
int startat = 0;
int count = 0;
Match match = regex.Match(input, startat);
while (match.Success)
{
count++;
startat = match.Index + 1;
match = regex.Match(input, startat);
}
Console.WriteLine(word + "\t" + count);
}
To correctly find all overlapping substrings such as "aa", I had to use the Match overload that takes a startat parameter.
Note the RegexOptions.IgnoreCase parameter.
A shorter but less clear version:
Match match;
while ((match = regex.Match(input, startat)).Success)
{
count++;
startat = match.Index + 1;
}
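The answers above build the pattern directly from the phrase, which misbehaves if a phrase contains regex metacharacters such as "?" or ".". A small sketch combining the lookahead idea with Regex.Escape is shown below; the class and method names are hypothetical, and treating an empty phrase as zero occurrences is an assumption.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhraseCounter
{
    // Hypothetical helper: counts case-insensitive, possibly overlapping
    // occurrences of phrase in text.
    public static int CountOverlapping(string text, string phrase)
    {
        if (string.IsNullOrEmpty(phrase))
            return 0;
        // Escape the phrase so it is matched literally, and wrap it in a
        // zero-width lookahead so one occurrence does not consume the next.
        string pattern = "(?=" + Regex.Escape(phrase) + ")";
        return Regex.Matches(text, pattern, RegexOptions.IgnoreCase).Count;
    }
}
```

For the sample file.txt from the question, this should give 9, 13, 0, 2 and 3 for "Word", "S", "MissingWord", "DS" and "aa", which is the output the question lists as correct.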
Here is my problem:
I have a string that I think is binary:
zv�Q6��.�����E3r
I want to convert this string to something readable. How can I do this in C#?
You may try enumerating (testing) all available encodings to find the one
that produces reasonable text. Unfortunately, this is not an absolute solution:
information may already have been lost in the erroneous conversion.
// Requires: using System; using System.Collections.Generic;
//           using System.Reflection; using System.Text;
public static String GetAllEncodings(String value) {
    List<Encoding> encodings = new List<Encoding>();
    // Ordinary code pages
    foreach (EncodingInfo info in Encoding.GetEncodings())
        encodings.Add(Encoding.GetEncoding(info.CodePage));
    // Special encodings, that could have no code page
    foreach (PropertyInfo pi in typeof(Encoding).GetProperties(BindingFlags.Static | BindingFlags.Public))
        if (pi.CanRead && pi.PropertyType == typeof(Encoding))
            encodings.Add(pi.GetValue(null) as Encoding);
    // Bytes of the (wrongly decoded) text; the same for every candidate encoding
    Byte[] data = Encoding.UTF8.GetBytes(value);
    // Collects one report line per encoding
    StringBuilder Sb = new StringBuilder();
    foreach (Encoding encoding in encodings) {
        String test = encoding.GetString(data).Replace('\0', '?');
        if (Sb.Length > 0)
            Sb.AppendLine();
        Sb.Append(encoding.WebName);
        Sb.Append(" (code page = ");
        Sb.Append(encoding.CodePage);
        Sb.Append(")");
        Sb.Append(" -> ");
        Sb.Append(test);
    }
    return Sb.ToString();
}
...
// Test / usage
String St = "Некий русский текст"; // <- Some Russian Text
Byte[] d = Encoding.UTF32.GetBytes(St); // <- Was encoded as UTF 32
St = Encoding.UTF8.GetString(d); // <- And erroneously read as UTF 8
// Let's see all the encodings:
myTextBox.Text = GetAllEncodings(St);
// In the myTextBox.Text you can find the solution:
// ....
// utf-32 (code page = 12000) -> Некий русский текст
// ....
byte[] hexbytes = System.Text.Encoding.Unicode.GetBytes(value); // value is your string
This gives you the bytes of the string (which you can then print as hex), but you have to know the encoding of your string and replace 'Unicode' with it.
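To actually see those bytes as hex text, BitConverter.ToString can be applied to the array. The sketch below is only an illustration; the class and method names are hypothetical, and the caller still has to supply the right encoding.

```csharp
using System;
using System.Text;

public static class HexDump
{
    // Hypothetical helper: encodes value with the given encoding and returns
    // the bytes as dash-separated hex pairs.
    public static string ToHex(string value, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(value);
        return BitConverter.ToString(bytes);
    }
}
```

For example, HexDump.ToHex("Hi", Encoding.UTF8) returns "48-69", while the same call with Encoding.Unicode (UTF-16, little endian) returns "48-00-69-00".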
I have a binary file (i.e., it contains bytes with values between 0x00 and 0xFF). There are also ASCII strings in the file (e.g., "Hello World") that I want to find and edit using Regex. I then need to write out the edited file so that it's exactly the same as the old one but with my ASCII edits having been performed. How?
byte[] inbytes = File.ReadAllBytes(wfile);
string instring = utf8.GetString(inbytes);
// use Regex to find/replace some text within instring
byte[] outbytes = utf8.GetBytes(instring);
File.WriteAllBytes(outfile, outbytes);
Even if I don't do any edits, the output file is different from the input file. What's going on, and how can I do what I want?
EDIT: Ok, I'm trying to use the offered suggestion and am having trouble understanding how to actually implement it. Here's my sample code:
string infile = @"C:\temp\in.dat";
string outfile = @"C:\temp\out.dat";
Regex re = new Regex(@"H[a-z]+ W[a-z]+"); // looking for "Hello World"
byte[] inbytes = File.ReadAllBytes(infile);
string instring = new SoapHexBinary(inbytes).ToString();
Match match = re.Match(instring);
if (match.Success)
{
// do work on 'instring'
}
File.WriteAllBytes(outfile, SoapHexBinary.Parse(instring).Value);
Obviously, I know I'll not get a match doing it that way, but if I convert my Regex to a string (or whatever), then I can't use Match, etc. Any ideas? Thanks!
Not all binary strings are valid UTF-8 strings. When you try to interpret the binary as a UTF-8 string, the bytes that can't be thus interpreted are probably getting mangled. Basically, if the whole file is not encoded text, then interpreting it as encoded text will not yield sensible results.
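The effect can be checked with a small round-trip test. The sketch below compares UTF-8 with ISO-8859-1 (Latin-1), which maps every byte value 0x00 to 0xFF to the character with the same code point and therefore decodes and re-encodes arbitrary bytes unchanged. The class and method names are hypothetical, and using Latin-1 as the working encoding is a suggestion, not something from the question.

```csharp
using System;
using System.Linq;
using System.Text;

public static class ByteRoundTrip
{
    // ISO-8859-1 maps each byte to the code point with the same value.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // True if decoding and re-encoding as UTF-8 reproduces the same bytes.
    public static bool SurvivesUtf8(byte[] data)
    {
        byte[] back = Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(data));
        return back.SequenceEqual(data);
    }

    // True if decoding and re-encoding as Latin-1 reproduces the same bytes.
    public static bool SurvivesLatin1(byte[] data)
    {
        byte[] back = Latin1.GetBytes(Latin1.GetString(data));
        return back.SequenceEqual(data);
    }
}
```

With a Latin-1 string there is one char per byte, so a Regex edit on the ASCII parts leaves the other bytes intact, provided the replacement text contains only characters up to U+00FF.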
An alternative to working with the binary file directly: convert it to a hex string, work on that (Regex can be used here), and then save it back.
byte[] buf = File.ReadAllBytes(file);
var str = new SoapHexBinary(buf).ToString();
//str=89504E470D0A1A0A0000000D49484452000000C8000000C808030000009A865EAC00000300504C544......
//Do your work
File.WriteAllBytes(file,SoapHexBinary.Parse(str).Value);
PS: SoapHexBinary is in the namespace System.Runtime.Remoting.Metadata.W3cXsd2001.
I got it! Check out the code:
string infile = @"C:\temp\in.dat";
string outfile = @"C:\temp\out.dat";
Regex re = new Regex(@"H[a-z]+ W[a-z]+"); // looking for "Hello World"
string repl = @"Hi there";
Encoding ascii = Encoding.ASCII;
byte[] inbytes = File.ReadAllBytes(infile);
string instr = ascii.GetString(inbytes);
Match match = re.Match(instr);
int beg = 0;
bool replaced = false;
List<byte> newbytes = new List<byte>();
while (match.Success)
{
replaced = true;
for (int i = beg; i < match.Index; i++)
newbytes.Add(inbytes[i]);
foreach (char c in repl)
newbytes.Add(Convert.ToByte(c));
Match nmatch = match.NextMatch();
int end = (nmatch.Success) ? nmatch.Index : inbytes.Length;
for (int i = match.Index + match.Length; i < end; i++)
newbytes.Add(inbytes[i]);
beg = end;
match = nmatch;
}
if (replaced)
{
var newarr = newbytes.ToArray();
File.WriteAllBytes(outfile, newarr);
}
else
{
File.WriteAllBytes(outfile, inbytes);
}