The following code only prints "Good job!". How can I get the actual URLs out of it? I followed the tutorial on the site given, and I'm still having a bit of trouble wrapping my head around it. Also, I imagine that this isn't the best way to go about it (mixing regex with HTML). Is there a simple way to capture text based on its CSS class?
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Net;
using System.IO;
using System.Text.RegularExpressions;
namespace Scraper
{
class Program
{
static void Main(string[] args)
{
string target = @"http://www.omegacoder.com/?p=58";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(target);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
Regex URL = new Regex("(?:href=)(?<link>.*?)");
string line;
using (Stream responseStream = response.GetResponseStream())
using (StreamReader htmlStream = new StreamReader(responseStream))
while ((line = htmlStream.ReadLine()) != null){
Match m = URL.Match(line);
if (m.Success) {
Console.WriteLine("Good job! " + URL.Match(line) + m.Groups[0].Value + m.Groups[1].Value + m.Groups["link"]);
Console.ReadLine();
} else {
}
}
/* if (Regex.IsMatch(line, "XXXXX"))
Console.WriteLine(line);
} */
Console.ReadLine();
}
}
}
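You should use (?:href=)(?<link>\S*) instead.
\S matches any character that is not whitespace, so \S* captures everything up to the next space. In your pattern, the lazy .*? has nothing after it, so it is satisfied by matching zero characters and the link group is always empty.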
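As a minimal sketch of how the suggested pattern could be used, the snippet below collects every link group found in one line. The class name LinkExtractor, the helper ExtractLinks, and the sample HTML line are assumptions made for illustration; trimming the surrounding quotes is an extra step that is not part of the pattern itself.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LinkExtractor
{
    // The pattern suggested above: "href=" followed by a run of non-whitespace characters.
    private static readonly Regex Href = new Regex(@"(?:href=)(?<link>\S*)");

    // Returns every captured "link" group in the line, with surrounding quotes trimmed.
    public static List<string> ExtractLinks(string line)
    {
        var links = new List<string>();
        foreach (Match m in Href.Matches(line))
        {
            links.Add(m.Groups["link"].Value.Trim('"', '\''));
        }
        return links;
    }

    static void Main()
    {
        // Hypothetical sample line, not taken from the page in the question.
        string line = "<a href=\"http://example.com/a\" class=\"x\">A</a>";
        foreach (string link in ExtractLinks(line))
        {
            Console.WriteLine(link); // http://example.com/a
        }
    }
}
```

Note that \S* runs up to the next whitespace, so when href is the last attribute before the closing > the capture will also include the trailing markup. For anything beyond a quick experiment, an HTML parser is likely to be more robust than a regex.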
Problem Statement - I have a file continuously written by a process line by line. I would like a C# solution which will print line by line as soon as it is written by the process.
Solution - I would like to use Reactive Extensions for C#, subscribing to the stream.
I tried the code below, but how do I print each line?
stream.Subscribe(e => Console.WriteLine(e.//how to print each line));
Here is the full code:
using System;
using System.Collections.Generic;
using System.IO;
using System.Reactive.Concurrency;
using System.Reactive.Linq;
namespace ConsoleApp1
{
class Program
{
static void Main(string[] args)
{
using (FileStream fs = new FileStream(@"C:\Files\test_Copy.txt", FileMode.Open))
{
var stream = ObserveLines(fs);
stream.Subscribe(e => Console.WriteLine(e.//how to print each line));
}
}
public static IObservable<string> ObserveLines(Stream inputStream)
{
return ReadLines(inputStream).ToObservable(Scheduler.ThreadPool);
}
public static IEnumerable<string> ReadLines(Stream stream)
{
using (StreamReader reader = new StreamReader(stream))
{
while (!reader.EndOfStream)
yield return reader.ReadLine();
}
}
}
}
I'm having a bit of a challenge here; I hope some of you clever people out there can guide me to a solution.
The code has been condensed a bit to make it easier to read. Hence, if a program error occurs, it is due to that.
The task of the program is to download a PDF from a URL and save it on an MS SQL Server.
I can save it, but it seems the data changes. E.g. the first byte of the downloaded PDF is 25h (correct), and when I save it in the database it changes to 1Fh (wrong).
I realise it must be the conversion between the download and the save, but unfortunately I can't make it work.
// This is where I suspect that my problem occurs.
byte[] myDataBuffer = client.DownloadData((new Uri(strFileUrlToDownload)));
Please excuse me if I'm not clear in my writing. English isn't my first language.
Thanks in advance.
The script for creating the table and the C# code are below.
Script for creating the table on MS SQL server:
USE [ThePartikularDatabase]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[PDFTable](
[DokumentAID] [int] IDENTITY(1,1) NOT NULL,
[ident] [nchar](10) NOT NULL,
[AB] [nchar](10) NOT NULL,
[BI] [nchar](10) NOT NULL,
[dokumentType] [nchar](10) NOT NULL,
[base64] [varbinary](max) NULL
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
Here comes the code:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Collections.Specialized;
using System.Data;
using System.Data.Sql;
using System.Data.SqlClient;
using System.Diagnostics;
using System.Linq;
using System.Runtime.Serialization;
using System.Text;
using System.Threading.Tasks;
using System.IO;
using System.IO.Compression;
using System.Net;
using System.Web;
using System.Web.Script.Serialization;
using System.Xml;
using System.Xml.Serialization;
using System.Xml.Linq;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using Newtonsoft.Json.Converters;
using System.Web.UI;
using System.Text.RegularExpressions;
namespace test
{
class program
{
public static void DownloadData(string strFileUrlToDownload, WebClient client, string IV_ident, string IV_AB, string IV_BI, string IV_dokumentType)
{
// This function is made with inspiration from:
// http://www.c-sharpcorner.com/UploadFile/013102/save-and-read-pdf-file-using-sql-server-and-C-Sharp/
byte[] myDataBuffer = client.DownloadData((new Uri(strFileUrlToDownload)));
//string encodedText = Convert.ToBase64String(myDataBuffer);
//string decodedText = Encoding.UTF8.GetString(myDataBuffer);
// byte[] lars = Convert.ToByte(encodedText);
using (SqlConnection cn = new SqlConnection(GlobalVar.ConnectionString))
{
cn.Open();
using (SqlCommand cmd = new SqlCommand("INSERT INTO [Database].[dbo].[PDFTable] " + "(Ident, AB, BI, dokumentType, base64) values (@ident, @AB, @BI, @dokumentType, @data);", cn))
{
cmd.Parameters.Add("@data", myDataBuffer);
cmd.Parameters.Add("@Ident", IV_ident);
cmd.Parameters.Add("@AB", IV_AB);
cmd.Parameters.Add("@BI", IV_BI);
cmd.Parameters.Add("@dokumentType", IV_dokumentType);
cmd.ExecuteNonQuery();
}
}
using (SqlConnection cn = new SqlConnection(GlobalVar.ConnectionString))
{
cn.Open();
using (SqlCommand cmd = new SqlCommand("SELECT [base64] FROM [Database].[dbo].[PDFTable] WHERE ident = " + IV_ident + " ;", cn))
{
using (SqlDataReader dr = cmd.ExecuteReader(System.Data.CommandBehavior.Default))
{
if (dr.Read())
{
byte[] fileData = (byte[])dr.GetValue(0);
using (System.IO.FileStream fs = new System.IO.FileStream("c:\\" + IV_ident + ".pdf", System.IO.FileMode.Create, System.IO.FileAccess.ReadWrite))
{
using (System.IO.BinaryWriter bw = new System.IO.BinaryWriter(fs))
{
bw.Write(fileData);
bw.Close();
}
}
}
dr.Close();
}
}
}
}
public static void SaveMemoryStream(MemoryStream ms, string FileName)
{
FileStream outStream = File.OpenWrite(FileName);
ms.WriteTo(outStream);
outStream.Flush();
outStream.Close();
}
static void Main(string[] args)
{
int LI_start = 0;
int LI_slut = 0;
int LI_documentURL = 0;
string LS_ident = "";
string credentials = "Username:Password";
string url = "http://pdfurl.com/find";
CredentialCache mycache = new CredentialCache();
WebClient client = new WebClient();
client.Headers["Authorization"] = "Basic " + Convert.ToBase64String(Encoding.ASCII.GetBytes(credentials));
Regex regex = new Regex("");
try
{
/*
ident is fetched from our database. Code removed for simplicity..
*/
LS_ident = Convert.ToString(theRow["ident"]);
string LS_json = "{\"from\":0,\"size\":1,\"query\":{\"term\":{\"ident\":" + LS_ident + "}}}";
string LS_pdfURL = "";
string reply = client.UploadString(url, "POST", LS_json);
/********************************************************/
/* Regex ************************************************/
/*
URL for the pdf is taken from the reply with Regular expression.
*/
/* Regex ************************************************/
/********************************************************/
LS_pdfURL = "" + dokURL[i].Substring(LI_start, (LI_slut + 1 - LI_start)) + "pdf";
/*
LS_pdfURL now contains the correct path for the pdf.
I can manually download the pdf with the url. It works.
So far, so good.
*/
DownloadData(LS_pdfURL, client, LS_ident, "StartDato", "SlutDato", "Doktype");
/**/
//break;
}
thisConnection.Close();
}
catch (SqlException ex)
{
Console.WriteLine("Der skete en fejl: (get) {0}", ex.ToString());
}
/**/
}
}
}
Your problem is caused by mixing different encodings.
Base64 is meant for storing binary data in string-based containers such as XML or HTML, using a reduced set of characters. The encoded content is a string consisting of "safe" characters only and can be handled like any other string.
A hex string is a chain of bits, packed in groups of 4 and displayed with the characters 0 to 9 and A to F. Written out as characters, it needs more space than base64 (two characters per byte versus roughly four characters per three bytes); stored as the raw bit chain, it needs less.
It is always worth asking: what type is my data, and which column type is appropriate for storing it?
You decided to place your base64 within a column of type VARCHAR(MAX). That seems to be a good choice, but I'd still prefer VARBINARY(MAX) and do the base64 encoding only when I need it.
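To make the difference between the representations concrete, here is a small sketch that shows the same four bytes as a hex string and as base64. The class and method names (EncodingDemo, ToHex, ToBase64) are made up for illustration; the bytes are the ASCII characters "%PDF", whose first byte is the 25h mentioned in the question.

```csharp
using System;

static class EncodingDemo
{
    // Hex view: two characters per byte.
    public static string ToHex(byte[] data)
    {
        return BitConverter.ToString(data).Replace("-", "");
    }

    // Base64 view: four characters per three bytes, padded with '='.
    public static string ToBase64(byte[] data)
    {
        return Convert.ToBase64String(data);
    }

    static void Main()
    {
        // "%PDF" as raw bytes; 0x25 is the first byte of a PDF file.
        byte[] header = { 0x25, 0x50, 0x44, 0x46 };

        Console.WriteLine(ToHex(header));    // 25504446
        Console.WriteLine(ToBase64(header)); // JVBERg==

        // Decoding the base64 text gives back the original bytes unchanged.
        byte[] roundTrip = Convert.FromBase64String(ToBase64(header));
        Console.WriteLine(roundTrip[0].ToString("X2")); // 25
    }
}
```

The raw byte array is the same in every case; only the textual view differs. That is one reason to keep the bytes in a VARBINARY(MAX) column and produce a text encoding only where one is needed.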
I don't understand this error I am getting. I have tried to clean and build my project several times. Can anyone help me?
Program.cs
using System;
using System.Collections.Generic;
using System.Net.NetworkInformation;
using System.Net.Sockets;
using System.IO;
using System.Threading.Tasks;
namespace HTTPrequestApp
{
class Program
{
static void Main(string[] args)
{
var lstWebSites = new List<string>
{
"www.mearstransportation.com",
"www.amazon.com",
"www.ebay.com",
"www.att.com",
"www.verizon.com",
"www.sprint.com",
"www.centurylink.com",
"www.yahoo.com"
};
string filename = @"RequestLog.txt";
{
using (var writer = new StreamWriter(filename, true))
{
foreach (string website in lstWebSites)
{
for (var i = 0; i < 4; i++)
{
MyWebRequest request = new MyWebRequest();
request.Request(website);
}
}
}
}
}
}
}
MyWebRequest.cs - the problem is here: public class MyWebRequest
Error is: "The namespace HttpRequestApp already contains a definition for MyWebRequest"
using HTTPrequestApp.MyWebRequest;
using System;
using System.Collections.Generic;
using System.IO;
using System.Net.NetworkInformation;
using System.Net.Sockets;
using System.Threading.Tasks;
namespace HTTPrequestApp
{
public class MyWebRequest : IWebRequest
{
public void Request(string strWebSite)
{
List<string> lstWebSites = Program.GetList(strWebSite);
using (var client2 = new TcpClient(strWebSite, 80))
{
using (NetworkStream stream = client2.GetStream())
using (StreamWriter writer = new StreamWriter(stream))
using (StreamReader reader2 = new StreamReader(stream))
{
//writer.AutoFlush = true;
writer.WriteLine("GET / HTTP/1.1");
writer.WriteLine("HOST: {0}:80", lstWebSites[1]);
writer.WriteLine("Connection: Close");
writer.WriteLine();
writer.WriteLine();
string theresponse = reader2.ReadToEnd();
Console.WriteLine(theresponse);
}
}
}
}
}
IWebRequest.cs
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
//using System.IO;
using System.Threading.Tasks;
//using MyWebRequest.Lib;
namespace HTTPrequestApp.MyWebRequest
{
public interface IWebRequest
{
Task<List<strWebSite>> GetList();
void Request();
}
}
To give an overview of what I am trying to accomplish: send an HTTP request to get the initial page, get back the HTTP response and check that it has a 200 response code, and time how long it took to retrieve the response.
This is a console app, but I need it not to depend on the console; it needs to be an independent application so I can use it somewhere else.
If anyone has any suggestions on how I can simplify my code, please let me know.
Thanks.
You have both a HTTPrequestApp.MyWebRequest namespace and a HTTPrequestApp.MyWebRequest class: the C# compiler gets confused (and so do humans...).
Consider renaming the namespace to something such as HTTPrequestApp.MyWebRequestNameSpace if you really want a separate namespace.
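A minimal sketch of the renamed layout might look like the following. It puts everything in one file for brevity, and it makes two assumptions that are not in the original code: Request takes the site name and returns a string (so the interface and the class agree on one signature), and the network call is replaced by a placeholder. The entry-point class name NamespaceDemo is also hypothetical.

```csharp
using System;
using HTTPrequestApp.MyWebRequestNameSpace;

// Renamed namespace, so it no longer shares a name with the MyWebRequest class.
namespace HTTPrequestApp.MyWebRequestNameSpace
{
    public interface IWebRequest
    {
        // Assumed signature for this sketch; the original interface differs.
        string Request(string webSite);
    }
}

namespace HTTPrequestApp
{
    // HTTPrequestApp.MyWebRequest now refers only to this class.
    public class MyWebRequest : IWebRequest
    {
        public string Request(string webSite)
        {
            // Placeholder body; the real TcpClient code is omitted here.
            return "GET / HTTP/1.1 for " + webSite;
        }
    }

    static class NamespaceDemo
    {
        static void Main()
        {
            IWebRequest request = new MyWebRequest();
            Console.WriteLine(request.Request("www.example.com"));
        }
    }
}
```

With the namespace renamed, the using directive in MyWebRequest.cs would change to using HTTPrequestApp.MyWebRequestNameSpace; as well.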
I am trying to create a console or forms application where you drag a file onto the .exe.
The program would take that file, hash it, and then set the clipboard text to the generated hash.
This is my code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Security.Cryptography;
using System.Windows.Forms;
using System.IO;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string path = args[0];
StreamReader wer = new StreamReader(path.ToString());
wer.ReadToEnd();
string qwe = wer.ToString();
string ert = Hash(qwe);
string password = "~" + ert + "~";
Clipboard.SetText(password);
}
static public string Hash(string input)
{
MD5 md5 = MD5.Create();
byte[] inputBytes = Encoding.ASCII.GetBytes(input);
byte[] hash = md5.ComputeHash(inputBytes);
StringBuilder sb = new StringBuilder();
for (int i = 0; i < hash.Length; i++)
{
sb.Append(hash[i].ToString("X2"));
}
return sb.ToString();
}
}
}
When I take the single .exe from the release build and drag a file onto it, I get some sort of threading error. I can't provide it because it appears in the console, not in vb2010. Thanks for any help.
The Clipboard API uses OLE internally and thus can only be called on an STA thread. Unlike WinForms applications, console applications don't use STA by default.
Add the [STAThread] attribute to Main:
[STAThread]
static void Main(string[] args)
{
...
Just do what the exception message told you to:
Unhandled Exception: System.Threading.ThreadStateException: Current thread must
be set to single thread apartment (STA) mode before OLE calls can be made. Ensure that your Main function has STAThreadAttribute marked on it.
Cleaning up your program a bit:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Windows.Forms;
namespace HashToClipboard
{
class Program
{
[STAThread]
static void Main(string[] args)
{
string hexHash = Hash(args[0]);
string password = "~" + hexHash + "~";
Clipboard.SetText(password);
}
static public string Hash(string path)
{
using (var stream = File.OpenRead(path))
using (var hasher = MD5.Create())
{
byte[] hash = hasher.ComputeHash(stream);
string hexHash = BitConverter.ToString(hash).Replace("-", "");
return hexHash;
}
}
}
}
This has several advantages over your program:
It doesn't need to load the whole file into RAM at the same time
It returns the correct result if the file contains non-ASCII characters/bytes
It's shorter and cleaner
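The second point can be checked with a short sketch. It hashes text through Encoding.ASCII, as the Hash method in the question does, and shows that two different strings end up with the same hash, because Encoding.ASCII replaces every non-ASCII character with '?'. The class and method names are hypothetical.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class AsciiHashDemo
{
    // Hashes text the way the question's Hash method does: via Encoding.ASCII.
    public static string Md5OfAscii(string text)
    {
        using (var md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.ASCII.GetBytes(text));
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }

    static void Main()
    {
        // "caf\u00e9" contains a non-ASCII character, which is encoded as '?'.
        // The two inputs therefore produce identical bytes and identical hashes.
        Console.WriteLine(Md5OfAscii("caf\u00e9") == Md5OfAscii("caf?")); // True
    }
}
```

Hashing the file stream directly, as in the cleaned-up program, avoids the text round trip entirely, so every byte of the file contributes to the hash.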
I hope someone can help me. I am a beginner at C# and programming in general, and I'm trying to complete this program. Basically it looks in an XML file, grabs all of the occurrences of a specific tag, and is supposed to write the file names plus whatever is between any instances of these two tags. So far I've tried TextWriter, StreamWriter, FileStream and some others, and nothing does what I want. I realise this may be a stupid question, but I'm a super noob and need help for my particular case. My code is as follows.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
using System.IO;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
var files = from file in Directory.GetFiles("W:\\SRC\\hDefMl\\1.0\\Instrument_Files")
orderby file
ascending
select file;
StringBuilder sb_report = new StringBuilder();
string delimiter = ",";
sb_report.AppendLine(string.Join(delimiter, "Module", "Generator(s)"));
foreach (var file in files)
{
string filename = Path.GetFileNameWithoutExtension(file);
Console.WriteLine("The HDefML file for {0} contains these EEPROM Generators:", filename);
XDocument hdefml = XDocument.Load(file);
var GeneratorNames = from b in hdefml.Descendants("Generators")
select new
{
name = (string)b.Element("GeneratorName")
};
string description;
foreach (var generator in GeneratorNames)
{
Console.WriteLine(" GeneratorName is: {0}", generator.name);
sb_report.AppendLine(string.Join(delimiter, filename,
generator.name));
}
}
}
}
}
You should be able to just do something like this, if the string you have built with your string builder is formatted correctly.
static void WriteToCSV(string str, string path)
{
using (Stream stream = File.Create(path))
using (StreamWriter writer = new StreamWriter(stream))
{
writer.WriteLine(str);
}
}
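As a rough sketch of how this could fit the question's code, the snippet below builds the report with a StringBuilder in the same "Module,Generator(s)" layout and then writes it out once, after the loop. The class name, the BuildReport helper, the sample rows, and the output path are all assumptions for illustration. This variant of the helper uses Write rather than WriteLine, since text built with AppendLine already ends with a newline.

```csharp
using System;
using System.IO;
using System.Text;

static class CsvReportDemo
{
    // Variant of the helper above; Write avoids adding a second trailing newline.
    public static void WriteToCSV(string str, string path)
    {
        using (Stream stream = File.Create(path))
        using (StreamWriter writer = new StreamWriter(stream))
        {
            writer.Write(str);
        }
    }

    // Builds the CSV text: a header row, then one comma-separated line per row.
    public static string BuildReport(string[][] rows)
    {
        var sb = new StringBuilder();
        sb.AppendLine(string.Join(",", "Module", "Generator(s)"));
        foreach (string[] row in rows)
        {
            sb.AppendLine(string.Join(",", row));
        }
        return sb.ToString();
    }

    static void Main()
    {
        // Hypothetical module and generator names, and a hypothetical output path.
        string[][] rows =
        {
            new[] { "ModuleA", "Gen1" },
            new[] { "ModuleA", "Gen2" }
        };
        string path = Path.Combine(Path.GetTempPath(), "generators.csv");

        WriteToCSV(BuildReport(rows), path);
        Console.WriteLine(File.ReadAllText(path));
    }
}
```

In the question's program, the equivalent call would go after the outer foreach, passing sb_report.ToString() and the desired file path.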
try{
FileStream FS;
StreamWriter SW;
using (FS = new FileStream("HardCodedFileName.csv", FileMode.Append))
{
using (SW = new StreamWriter(FS))
{
foreach (var generator in GeneratorNames)
{
SW.WriteLine(string.Join(delimiter, filename,
generator.name));
}
}
}
}
catch (Exception e){
Console.WriteLine(e.ToString());
}