I was hoping to get just the word count from a pdf document programmatically.
I've looked at PDFSharp, but it's awfully bulky for what I want to do. I don't have access to the server, so I can't install Acrobat to get to its APIs or anything. I'd be willing to do it in iTextSharp or another tool.
iTextSharp has a wonderful PdfTextExtractor class that will get you all of the text (assuming, as @Rob A pointed out, that it's actually stored as text and not as images or pure vector). Once you've got all of the text, a simple regex will give you the word count.
The code below should do it for you. (Tested on iText 5.1.1.0)
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.IO;
using iTextSharp.text.pdf.parser;
namespace WindowsFormsApplication1
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void Form1_Load(object sender, EventArgs e)
{
string InputFile = System.IO.Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Input.pdf");
//Get all the text
string T = ExtractAllTextFromPdf(InputFile);
//Count the words
int I = GetWordCountFromString(T);
}
public static string ExtractAllTextFromPdf(string inputFile)
{
//Sanity checks
if (string.IsNullOrEmpty(inputFile))
throw new ArgumentNullException("inputFile");
if (!System.IO.File.Exists(inputFile))
throw new System.IO.FileNotFoundException("Cannot find inputFile", inputFile);
//Create a stream reader (not necessary but I like to control locks and permissions)
using (FileStream SR = new FileStream(inputFile, FileMode.Open, FileAccess.Read, FileShare.Read))
{
//Create a reader to read the PDF
iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(SR);
//Create a buffer to store text
StringBuilder Buf = new StringBuilder();
//Use the PdfTextExtractor to get all of the text on a page-by-page basis
for (int i = 1; i <= reader.NumberOfPages; i++)
{
Buf.AppendLine(PdfTextExtractor.GetTextFromPage(reader, i));
}
return Buf.ToString();
}
}
public static int GetWordCountFromString(string text)
{
//Sanity check
if (string.IsNullOrEmpty(text))
return 0;
//Count the words
return System.Text.RegularExpressions.Regex.Matches(text, "\\S+").Count;
}
}
}
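As a rough illustration of the counting step only (no PDF library involved), the sketch below applies the same \S+ rule to a sample string and compares it with a stricter variant. The class name, the method names, and the stricter rule are assumptions made for this example; they are not part of iTextSharp.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordCountSketch
{
    // Counts runs of non-whitespace characters, the same rule as the "\\S+" pattern above.
    public static int CountWhitespaceDelimited(string text)
    {
        return string.IsNullOrEmpty(text) ? 0 : Regex.Matches(text, @"\S+").Count;
    }

    // Hypothetical stricter variant: counts only tokens containing a letter or digit,
    // so a stray "-" or "|" left over from a table layout is not counted as a word.
    public static int CountTokensWithLetters(string text)
    {
        if (string.IsNullOrEmpty(text))
            return 0;

        int count = 0;
        foreach (Match m in Regex.Matches(text, @"\S+"))
        {
            if (Regex.IsMatch(m.Value, @"[\p{L}\p{N}]"))
                count++;
        }
        return count;
    }

    public static void Main()
    {
        string sample = "Total - 42 items\n| page 1 |";
        Console.WriteLine(CountWhitespaceDelimited(sample)); // 8 tokens
        Console.WriteLine(CountTokensWithLetters(sample));   // 5 tokens
    }
}
```

Which rule is "right" depends on what you consider a word; extracted PDF text often contains layout characters, so the two counts can differ noticeably.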
You can use a pdf2text tool and then count the words:
tools pdf2text
I'm new to C# forms and I have an important question.
I want to show a random line of text taken from a list at a URL.
Let me explain: I want to get a random line from a pastebin, set a label's text to the chosen line, and ignore blank lines.
This is the error:
Severity Code Description Project File Line Suppression State
Error CS0103 The name 'randomline' does not exist in the current context SNPCorp C:\Users\GhostStru\Desktop\SNP\Tester\Generator.cs 45 Active
I have this code:
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using System.Net.Sockets;
using System.Net.Security;
using System.Threading;
using System.Net.Http;
using System.Web;
using System.Net.WebSockets;
using System.Net;
using System.IO;
namespace Tester
{
public partial class Generator : Form
{
public Generator()
{
InitializeComponent();
}
private void Generator_Load(object sender, EventArgs e)
{
WebClient web = new WebClient();
string text = web.DownloadString("https://pastebin.com/raw/test");
var lines = text.Split(new string[] { "." }, StringSplitOptions.None);
Random rnd = new Random();
int randomLineIndex = rnd.Next(0, lines.Count());
var randomline = lines[randomLineIndex];
}
private void button3_Click(object sender, EventArgs e)
{
label11 = (randomline);
}
}
}
Some things in your question aren't clear, but I'll give it a try. First, I define a method to get a random line:
private static string GetRandomLine()
{
using (var web = new WebClient())
{
var html = web.DownloadString("https://pastebin.com/raw/test");
var lines = html.Split(new string[] { "." }, StringSplitOptions.None);
var rnd = new Random();
int index = rnd.Next(0, lines.Length);
return lines[index];
}
}
Then you can use that method to set the Text property of your label. You must set the Text property, not the label variable itself:
private void button3_Click(object sender, EventArgs e)
{
label11.Text = GetRandomLine();
}
DownloadString is not working for me. Is that the real URL? In any case, if you call this method a lot, it is worth saving the downloaded text in a variable and reusing it for a while (an hour, say, depending on your needs) as a cache instead of making a request each time. WebClient implements IDisposable, so you must call Dispose after using it or let a using block do it automatically.
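A minimal sketch of that caching idea follows. The class name CachedLinePicker and its members are hypothetical, and the download is passed in as a delegate so the sketch does not depend on any particular URL. Note that it splits on line breaks and skips blank lines, as the question asks, rather than splitting on "." as the code above does; that assumes the pastebin holds one entry per line.

```csharp
using System;
using System.Linq;

public sealed class CachedLinePicker
{
    private readonly Func<string> _download;      // e.g. () => web.DownloadString(url)
    private readonly TimeSpan _maxAge;            // how long the cached text stays valid
    private readonly Random _rnd = new Random();  // one instance, reused for every pick
    private string[] _lines;
    private DateTime _fetchedAtUtc;

    public CachedLinePicker(Func<string> download, TimeSpan maxAge)
    {
        if (download == null)
            throw new ArgumentNullException("download");
        _download = download;
        _maxAge = maxAge;
    }

    public string GetRandomLine()
    {
        // Download again only when nothing is cached yet or the cache is too old.
        if (_lines == null || DateTime.UtcNow - _fetchedAtUtc > _maxAge)
        {
            _lines = _download()
                .Split(new[] { "\r\n", "\n" }, StringSplitOptions.None)
                .Where(l => !string.IsNullOrWhiteSpace(l))   // ignore blank lines
                .ToArray();
            _fetchedAtUtc = DateTime.UtcNow;
        }

        if (_lines.Length == 0)
            return string.Empty;

        // Next(n) returns a value in [0, n), so every index is reachable.
        return _lines[_rnd.Next(_lines.Length)];
    }
}
```

In the form, one instance could be kept in a field and used as label11.Text = picker.GetRandomLine(); with the delegate wrapping the WebClient call inside a using block.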
I'll try to answer your question even though you haven't explained some points clearly.
Create a function to return the random line
In the button3_Click event, call a function that returns a random line, then set the Text property of label11 to the returned value.
public static string GetRanLine()
{
using (var WC = new WebClient())
{
var Response = WC.DownloadString("https://pastebin.com/raw/test");
var AllLines = Response.Split(new string[] { "."}, StringSplitOptions.None);
// The upper bound of Next is exclusive, so pass Length to make the last element reachable
var RanLine = AllLines[new Random().Next(0, AllLines.Length)];
return RanLine;
}
}
private void button3_Click(object sender, EventArgs e)
{
label11.Text = GetRanLine();
}
I haven't really written anything outside of PowerShell in a long time, and I know this is ugly, but I can't seem to figure out why my new PDF is not getting the page numbers. I pulled the example from this iText KB.
I tried to make this basic app so people in the office could add page numbers to PDFs. Here's what I have so far. It creates the new PDF (a duplicate of the original), but it doesn't add the page numbers.
Basically they use button1 to find their PDF via the Windows File Explorer dialog. It just stores the filename in a textbox. The second button is the "save" and should take the src file and make a copy of the src with only adding the page number at the bottom of the file (or anywhere at this point).
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.IO;
using System.Linq;
using System.Security.Cryptography.X509Certificates;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using iText.Kernel.Pdf;
using iText.Layout;
using iText.Layout.Element;
using iText.Layout.Properties;
namespace PDFManipulation
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void button1_Click(object sender, EventArgs e)
{
int size = -1;
DialogResult result = openFileDialog1.ShowDialog(); // Show the dialog.
if (result == DialogResult.OK) // Test result.
{
string file = openFileDialog1.FileName;
try
{
string text = File.ReadAllText(file);
size = text.Length;
textBox1.Text = file;
}
catch (System.IO.IOException)
{
}
}
Console.WriteLine(size); // <-- Shows file size in debugging mode.
Console.WriteLine(result); // <-- For debugging use.
}
private void button2_Click(object sender, EventArgs e)
{
Stream myStream;
//SaveFileDialog saveFileDialog1 = new SaveFileDialog();
if (saveFileDialog1.ShowDialog() == DialogResult.OK)
{
if ((myStream = saveFileDialog1.OpenFile()) != null)
{
// Code to write the stream goes here.
myStream.Close();
string SRC = textBox1.Text;
string DEST = saveFileDialog1.FileName;
FileInfo file = new FileInfo(DEST);
file.Directory.Create();
PdfDocument pdfDoc = new PdfDocument(new PdfReader(SRC), new PdfWriter(DEST));
Document doc = new Document(pdfDoc);
int numberOfPages = pdfDoc.GetNumberOfPages();
for (int i = 1; i <= numberOfPages; i++)
{
// Write aligned text to the specified by parameters point
doc.ShowTextAligned(new Paragraph("page " + i + " of " + numberOfPages),559, 806, i, TextAlignment.CENTER, VerticalAlignment.TOP, 0);
}
doc.Close();
}
}
MessageBox.Show("PDF Page Numbering Added!", "Pages Added",MessageBoxButtons.OK);
Application.Exit();
}
}
}
I'm a dumb dumb. The x,y coordinates were off: the value 812 for the height is off the page.
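One way to avoid hard-coded coordinates is to derive them from the page size. The sketch below is plain arithmetic with hypothetical names (FooterPosition, IsOnPage, BottomCenter) and assumes the page origin is at the bottom-left corner with units in PDF points. In iText 7 the size should be available through pdfDoc.GetPage(i).GetPageSize(), but check that against the version you use.

```csharp
using System;

public static class FooterPosition
{
    // True when the point lies inside a page whose origin (0, 0) is the bottom-left corner.
    public static bool IsOnPage(float x, float y, float pageWidth, float pageHeight)
    {
        return x >= 0 && x <= pageWidth && y >= 0 && y <= pageHeight;
    }

    // Computes a bottom-centred anchor point, bottomMargin points above the bottom edge.
    public static void BottomCenter(float pageWidth, float bottomMargin, out float x, out float y)
    {
        x = pageWidth / 2f;
        y = bottomMargin;
    }

    public static void Main()
    {
        // US Letter is 612 x 792 points, so a y of 812 lies above the top edge.
        Console.WriteLine(IsOnPage(559, 812, 612, 792)); // False

        float x, y;
        BottomCenter(612, 20, out x, out y);
        Console.WriteLine(x + ", " + y);
    }
}
```

The computed x and y would then replace the literal numbers passed to ShowTextAligned, so the text stays on the page whatever size the source PDF uses.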
I'm trying to write code that converts a MemoryStream to a PNG image, but I'm getting an ArgumentException ("parameter is incorrect") on using(Image img = Image.FromStream(ms)). It doesn't say anything more specific, so I don't know why I'm getting the error or what I'm supposed to do about it.
Also, how do I use the Width parameter with img.Save(filename + ".png", ImageFormat.Png);? I know I can add parameters and it recognizes "Width", but I have no idea how it should be formatted so that Visual Studio accepts it.
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using System.IO;
using System.Drawing.Imaging;
namespace WindowsFormsApp1
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
MemoryStream ms = new MemoryStream();
public string filename;
private void button1_Click(object sender, EventArgs e)
{
OpenFile();
}
private void button2_Click(object sender, EventArgs e)
{
ConvertFile();
}
private void OpenFile()
{
OpenFileDialog d = new OpenFileDialog();
if(d.ShowDialog() == DialogResult.OK)
{
filename = d.FileName;
var fs = d.OpenFile();
fs.CopyTo(ms);
}
}
private void ConvertFile()
{
using(Image img = Image.FromStream(ms))
{
img.Save(filename + ".png", ImageFormat.Png);
}
}
}
}
I suspect the problem is with how you're reading the file here:
fs.CopyTo(ms);
You're copying the content of the file into the MemoryStream, but then leaving the MemoryStream positioned at the end of the data rather than the start. You can fix that by adding:
// "Rewind" the memory stream after copying data into it, so it's ready to read.
ms.Position = 0;
You should consider what happens if you click on the buttons multiple times though... and I'd strongly advise you to use a using statement for your FileStream, as currently you're leaving it open.
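Putting those points together, a minimal sketch might look like the following. The class and method names are hypothetical; creating a fresh MemoryStream on every load is one possible way to handle repeated clicks, not the only one.

```csharp
using System;
using System.IO;

public static class StreamRewindSketch
{
    // Copies a file into a new MemoryStream and rewinds it so it is ready to be read.
    public static MemoryStream LoadIntoMemory(string path)
    {
        var ms = new MemoryStream();
        using (FileStream fs = File.OpenRead(path)) // closed even if CopyTo throws
        {
            fs.CopyTo(ms); // leaves ms positioned at the end of the copied data
        }
        ms.Position = 0;   // "rewind" before handing the stream to a reader
        return ms;
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, new byte[] { 1, 2, 3, 4 });
        try
        {
            using (MemoryStream ms = LoadIntoMemory(path))
            {
                Console.WriteLine(ms.Position + " of " + ms.Length); // 0 of 4
                Console.WriteLine(ms.ReadByte());                    // 1
            }
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Because each call returns a new stream, opening a second file does not append its bytes to the first one, which is what would happen with a single shared MemoryStream field.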
I am trying to create a recorder for the audio coming out of the sound card, and this is my progress so far. The problem is that the recorded audio is very large when saved to a file; a single song can reach hundreds of megabytes.
Here's my code:
using NAudio.CoreAudioApi;
using NAudio.Wave;
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
namespace Record_From_Soundcard
{
public partial class frmMain : Form
{
private WaveFileWriter writer;
private WasapiLoopbackCapture waveInSel;
public frmMain()
{
InitializeComponent();
}
private void frmMain_Load(object sender, EventArgs e)
{
MMDeviceEnumerator deviceEnum = new MMDeviceEnumerator();
MMDeviceCollection deviceCol = deviceEnum.EnumerateAudioEndPoints(DataFlow.Render, DeviceState.Active);
cboAudioDrivers.DataSource = deviceCol.ToList();
}
private void btnStopRecord_Click(object sender, EventArgs e)
{
waveInSel.StopRecording();
writer.Close();
}
private void btnStartRecord_Click(object sender, EventArgs e)
{
using (SaveFileDialog _sfd = new SaveFileDialog())
{
_sfd.Filter = "Mp3 File (*.mp3)|*.mp3";
if (_sfd.ShowDialog() == System.Windows.Forms.DialogResult.OK)
{
MMDevice _device = (MMDevice)cboAudioDrivers.SelectedItem;
waveInSel = new WasapiLoopbackCapture(_device);
writer = new WaveFileWriter(_sfd.FileName, waveInSel.WaveFormat);
waveInSel.DataAvailable += (n, m) =>
{
writer.Write(m.Buffer, 0, m.BytesRecorded);
};
waveInSel.StartRecording();
}
}
}
}
}
Can anyone help me compress the audio when saving?
Maybe it should be added in this part:
waveInSel.DataAvailable += (n, m) =>
{
writer.Write(m.Buffer, 0, m.BytesRecorded);
};
Thanks in advance.. ;)
Try this using the NAudio DLL together with NAudio.Lame:
using System;
using System.Windows.Forms;
using NAudio.Wave;
using NAudio.Lame;
private void WaveToMP3(int bitRate = 128)
{
    string waveFileName = Application.StartupPath + @"\Temporal\mix.wav";
    string mp3FileName = Application.StartupPath + @"\Grabaciones\ " + DateTime.Now.ToString("dd-MM-yyyy.HH.mm.ss") + ".mp3";
    // Read the recorded WAV file and encode it to MP3 at the requested bit rate (kbps)
    using (var reader = new AudioFileReader(waveFileName))
    using (var writer = new LameMP3FileWriter(mp3FileName, reader.WaveFormat, bitRate))
    {
        reader.CopyTo(writer);
    }
}
You can't make an MP3 file by saving a WAV file with a .MP3 extension (which is what you are doing here). You will need to select an encoder available on your machine, and pass the audio through that. I go into some detail about how to do this with NAudio in this article.
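To see why the uncompressed file gets so large, the arithmetic can be sketched as below. The capture format used in the example (48 kHz, stereo, 32 bits per sample) is an assumption; it is a common loopback format, but the real values come from waveInSel.WaveFormat on your machine. The class and method names are hypothetical.

```csharp
using System;

public static class AudioSizeEstimate
{
    // Size of raw PCM audio in bytes: samples per second x channels x bytes per sample x duration.
    public static long PcmBytes(int sampleRate, int channels, int bitsPerSample, int seconds)
    {
        return (long)sampleRate * channels * (bitsPerSample / 8) * seconds;
    }

    // Approximate size of a constant-bit-rate MP3 in bytes (kbps means 1000 bits per second).
    public static long Mp3Bytes(int kbps, int seconds)
    {
        return (long)kbps * 1000 / 8 * seconds;
    }

    public static void Main()
    {
        int fourMinutes = 240;
        Console.WriteLine(PcmBytes(48000, 2, 32, fourMinutes)); // 92160000 (about 92 MB)
        Console.WriteLine(Mp3Bytes(128, fourMinutes));          // 3840000 (about 3.8 MB)
    }
}
```

Under those assumptions a four-minute song is roughly 24 times larger as raw audio than as a 128 kbps MP3, which matches the sizes described in the question.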
I'm creating a simple application that captures customer data from text boxes and saves it to a text file on drive C, with the strings delimited by commas. The stored data is then displayed on a second form, which is opened from the first form by a button. The code I've written so far is below; can someone please help me find where I'm going wrong?
The code compiles without errors, but it displays the warning:
variable FILE in form2 is declared but not used... When I start without Debugging it crashes with an error report:
The type initializer for
'InvoiceDataAppGaoria.Form2' threw an exception(Form2 F2=new Form2())
When I assign the array elements to the text boxes, the IDE (Visual Studio 2008) reports an error.
Form2
namespace InvoiceDataAppGaoria
{
public partial class Form2 : Form
{
static string FILE = @"C:\Csharp\coursework1\maucha.txt";
static FileStream outFile = new FileStream("FILE", FileMode.Open, FileAccess.Read);
static StreamReader read = new StreamReader("FILE");
static string line = read.ReadLine();
static string[] values = line.Split(',');
string invoicetxt = values[0];
string lname = values[1];
string fname = values[2];
string AMT = values[3];
public Form2()
{
InitializeComponent();
}
}
}
Form1
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.Collections;
using System.IO;
namespace InvoiceDataAppGaoria
{
public partial class Form1 : Form
{
const string DELIM = ",";
const string FILENAME = @"C:\Csharp\coursework1\maucha.txt";
int invoNum;
string lname,fname;
double AMT;
static FileStream outFile = new FileStream("FILENAME",FileMode.Create,FileAccess.Write);
StreamWriter writer = new StreamWriter(outFile);
public Form1()
{
InitializeComponent();
}
private void button1_Click(object sender, EventArgs e)
{
invoNum = Convert.ToInt32(invoicetxt.Text);
lname = lnameBox.Text;
fname = fnameBox.Text;
AMT = Convert.ToDouble(amtBox.Text);
writer.WriteLine(invoNum+DELIM+lname+DELIM+fname+DELIM+AMT);
}
private void button2_Click(object sender, EventArgs e)
{
Form2 F2 = new Form2();
F2.Show();
}
}
}
When you create the file stream, you use the string "FILE" instead of the variable FILE. It should probably look like this instead:
static string FILE = @"C:\Csharp\coursework1\maucha.txt";
static FileStream outFile = new FileStream(FILE, FileMode.Open, FileAccess.Read);
static StreamReader read = new StreamReader(FILE);
One more thing worth mentioning: you should really name your variable file rather than FILE. Sticking to common naming conventions makes reading other people's code much more pleasant.
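A further option, sketched below under my own assumptions, is to move the file reading out of static field initializers and into a method. An exception thrown by a static initializer surfaces as the "type initializer ... threw an exception" error from the question, whereas a method can check for a missing or empty file first. The names InvoiceFileSketch and ReadFirstRecord are hypothetical.

```csharp
using System;
using System.IO;

public static class InvoiceFileSketch
{
    // Reads the first line of a comma-delimited file and splits it into fields.
    // Returns an empty array when the file is missing or empty instead of throwing.
    public static string[] ReadFirstRecord(string path)
    {
        if (!File.Exists(path))
            return new string[0];

        using (var reader = new StreamReader(path)) // closes the file when done
        {
            string line = reader.ReadLine();
            return line == null ? new string[0] : line.Split(',');
        }
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "7,Smith,Ann,12.5" + Environment.NewLine);
        try
        {
            string[] values = ReadFirstRecord(path);
            Console.WriteLine(values.Length); // 4
            Console.WriteLine(values[1]);     // Smith
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

In Form2 this could be called from the constructor after InitializeComponent(), checking that four fields came back before assigning them to the text boxes.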