itext7.pdfHTML library how to fix it to support multiple languages

I have created a sample project to test the itext7.pdfHTML library. It looks as if the library supports English only, but I find it hard to believe that it supports just one language. Please help me make it support multiple languages. I only need to convert HTML to PDF, and I use itext7.pdfhtml 3.0.4.
Controller
public IActionResult TestPDFHtml()
{
string html = @"<html><head><meta http-equiv=""content-type"" content=""text/html; charset=UTF-8""></head><body>สวัสดี</body></html>";
TestPDF test = new TestPDF();
byte[] vs = test.creatPDFByte(html);
return File(vs, "application/pdf");
}
Code method creatPDFByte
public byte[] creatPDFByte(string pdfHTML)
{
byte[] buffer;
try
{
using (MemoryStream ms = new MemoryStream())
{
using (PdfWriter pw = new PdfWriter(ms))
{
pw.SetCloseStream(true);
using (PdfDocument pdfDoc = new PdfDocument(pw))
{
ConverterProperties cProps = new ConverterProperties();
cProps.SetCharset("UTF-8");
pdfDoc.SetDefaultPageSize(PageSize.A4);
pdfDoc.SetCloseWriter(true);
pdfDoc.SetCloseReader(true);
pdfDoc.SetFlushUnusedObjects(true);
HtmlConverter.ConvertToPdf(pdfHTML, pdfDoc, cProps);
}
}
buffer = ms.ToArray();
}
}
catch ...

Why do you think so? Have you tried specifying a font in CSS? The default font may not support your language.
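As an alternative to declaring the font in CSS, the font can also be registered on the converter itself. The following is only a minimal sketch, assuming a TrueType font that contains Thai glyphs is available on the server; the class name ThaiPdfSketch and the fontPath parameter are hypothetical and not part of the question. Correct shaping of some scripts may additionally require iText's pdfCalligraph add-on.

```csharp
using System.IO;
using iText.Html2pdf;
using iText.Html2pdf.Resolver.Font;
using iText.Layout.Font;

public static class ThaiPdfSketch
{
    // fontPath is a hypothetical path to a .ttf file that covers the target language.
    public static byte[] Convert(string html, string fontPath)
    {
        // Register the standard PDF fonts and the fonts shipped with pdfHTML,
        // but not the system fonts, then add the extra font explicitly.
        FontProvider fontProvider = new DefaultFontProvider(true, true, false);
        fontProvider.AddFont(fontPath);

        ConverterProperties props = new ConverterProperties();
        props.SetCharset("UTF-8");
        props.SetFontProvider(fontProvider);

        using (MemoryStream ms = new MemoryStream())
        {
            HtmlConverter.ConvertToPdf(html, ms, props);
            // MemoryStream.ToArray still works after the converter has closed the stream.
            return ms.ToArray();
        }
    }
}
```

With a provider like this, the HTML can keep its plain body text; the layout engine picks a registered font that has the required glyphs.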

Related

Combination of two word documents using Open XML in ASP.NET Core Web API - Images are missing

My approach is pretty simple. I get two files from the internet (served as .docx files) and read the byte[] for each of them. I then perform an Append() operation on the destination file, appending the cloned Body of the source file. Below is my code:
using Microsoft.AspNetCore.Mvc;
using Newtonsoft.Json;
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using DocumentFormat.OpenXml;
using System.Collections.Generic;
namespace WhatApp.Controllers
{
[Route("api/[controller]")]
[ApiController]
public class DocController : ControllerBase
{
[HttpGet]
public async Task<IActionResult> Get()
{
byte[] file1 = await GetBytes("https://dummyfileserver.io/file/1");
byte[] file2 = await GetBytes("https://dummyfileserver.io/file/2");
byte[] result = MergeFiles(file1, file2);
// To return the file
return File(result, "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
}
private async Task<byte[]> GetBytes(string url)
{
using HttpClient httpClient = new HttpClient();
var res = await httpClient.GetAsync(url);
if (res.IsSuccessStatusCode)
{
using var filestream = await res.Content.ReadAsStreamAsync();
var filebytes = new byte[filestream.Length];
filestream.Read(filebytes, 0, filebytes.Length);
return filebytes;
}
throw new Exception();
}
private byte[] MergeFiles(byte[] dest, byte[] src)
{
using (MemoryStream destMem = new MemoryStream())
{
destMem.Write(dest, 0, (int)dest.Length);
using (WordprocessingDocument mywDoc =
WordprocessingDocument.Open(destMem, true))
{
mywDoc.MainDocumentPart.Document.Body.InsertAt(new PageBreakBefore(), 0);
mywDoc.MainDocumentPart.Document.Body.Append(new Paragraph(new Run(new Break() { Type = BreakValues.Page })));
var srcElements = GetSourceDoc(src);
mywDoc.MainDocumentPart.Document.Body.Append(srcElements);
mywDoc.Close();
}
return destMem.ToArray();
}
}
private OpenXmlElement GetSourceDoc(byte[] src)
{
using (MemoryStream srcMem = new MemoryStream())
{
srcMem.Write(src, 0, (int)src.Length);
using (WordprocessingDocument srcDoc =
WordprocessingDocument.Open(srcMem, true))
{
OpenXmlElement elem = srcDoc.MainDocumentPart.Document.Body.CloneNode(true);
srcDoc.Close();
return elem;
}
}
}
}
}
The result file does not show the images properly in the region where file2 is added (the second part of the response document).
What could be the reason for this problem, and how can I solve it?
Another issue I noticed is that debugging forcefully stops after I save the file to my local machine. What could be the cause of that?

I see you need to combine two Word files in ASP.NET Core. I doubt AltChunks is a good idea here, as your response is a FileContentResult built from a byte[] array. Indeed, OpenXML does not hide the complexity. Instead, I recommend considering OpenXML PowerTools. It is now maintained by Eric White and has a NuGet package for .NET Standard as well. Install the package and modify your MergeFiles() method as below:
private byte[] MergeFiles(byte[] dest, byte[] src)
{
var sources = new List<Source>();
var destMem = new MemoryStream();
destMem.Write(dest, 0, dest.Length);
sources.Add(new Source(new WmlDocument(destMem.Length.ToString(), destMem), true));
var srcMem = new MemoryStream();
srcMem.Write(src, 0, src.Length);
sources.Add(new Source(new WmlDocument(srcMem.Length.ToString(), srcMem), true));
var mergedDoc = DocumentBuilder.BuildDocument(sources);
MemoryStream mergedFileStream = new MemoryStream();
mergedDoc.WriteByteArray(mergedFileStream);
return mergedFileStream.ToArray();
}
Source, DocumentBuilder, and WmlDocument come from the OpenXmlPowerTools namespace. Good luck!

Images are stored separately, so you will need to include them manually as well. You will also need to fix all the relationships within the OpenXML package. Unfortunately, OpenXML is not trivial and the SDK does not hide that complexity.
However, if you know that your Word document will be opened by software (e.g. MS Word) that understands AltChunks, there might be an easy way for you:
I suggest you look at "Merge multiple word documents into one Open Xml".
From my experience: how well this works depends heavily on the complexity of your documents and the intended usage. Opening the result with MS Word is usually fine, but converting it to PDF on the server (with a 3rd party library), for example, might not give the intended results.
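The AltChunk idea mentioned above can be sketched roughly as follows. This is an illustration under assumptions, not the linked answer's code: the class name AltChunkMerge and the chunk id "AltChunkId1" are made up for the example, and the merged content only materializes when a consumer that understands AltChunks (such as MS Word) opens the file.

```csharp
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class AltChunkMerge
{
    public static byte[] Merge(byte[] dest, byte[] src)
    {
        using (var destMem = new MemoryStream())
        {
            // Write into an expandable stream; new MemoryStream(dest) could not grow.
            destMem.Write(dest, 0, dest.Length);
            using (var doc = WordprocessingDocument.Open(destMem, true))
            {
                MainDocumentPart main = doc.MainDocumentPart;
                const string chunkId = "AltChunkId1"; // hypothetical relationship id

                // Embed the whole source document as an alternative format part.
                AlternativeFormatImportPart chunk = main.AddAlternativeFormatImportPart(
                    AlternativeFormatImportPartType.WordprocessingML, chunkId);
                using (var srcMem = new MemoryStream(src))
                {
                    chunk.FeedData(srcMem);
                }

                // Reference the embedded part from the body, before any final section properties.
                Body body = main.Document.Body;
                var altChunk = new AltChunk { Id = chunkId };
                SectionProperties sectPr = body.Elements<SectionProperties>().LastOrDefault();
                if (sectPr != null)
                    body.InsertBefore(altChunk, sectPr);
                else
                    body.AppendChild(altChunk);

                main.Document.Save();
            }
            return destMem.ToArray();
        }
    }
}
```

Because the source file is embedded as a complete package, its images and relationships travel with it, which is why this route avoids the manual relationship fixing described above.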

iText 7 - HTML to PDF write to MemoryStream instead of file

I'm using iText 7, specifically the HtmlConverter.ConvertToDocument method, to convert HTML to PDF. The problem is, I would really rather not create a PDF file on my server, I'd rather do everything in memory and just send it to the users browser so they can download it.
Could anyone show me an example of how to use this library but instead of writing to file write to a MemoryStream so I can send it directly to the browser?
I've been looking for examples and all I can seem to find are those which refer to file output.
I've tried the following, but keep getting an error about cannot access a closed memory stream.
public FileStreamResult pdf() {
using (var workStream = new MemoryStream())
using (var pdfWriter = new PdfWriter(workStream)) {
pdfWriter.SetCloseStream(false);
using (var document = HtmlConverter.ConvertToDocument(html, pdfWriter)) {
//Returns the written-to MemoryStream containing the PDF.
byte[] byteInfo = workStream.ToArray();
workStream.Write(byteInfo, 0, byteInfo.Length);
workStream.Position = 0;
return new FileStreamResult(workStream, "application/pdf");
}
//return new FileStreamResult(workStream, "application/pdf");
}
}

You meddle with the workStream before the document and pdfWriter have finished creating the result in it. Furthermore, the intent of your meddling is unclear, first you retrieve the bytes from the memory stream, then you write them back into it...?
public FileStreamResult pdf()
{
var workStream = new MemoryStream();
using (var pdfWriter = new PdfWriter(workStream))
{
pdfWriter.SetCloseStream(false);
using (var document = HtmlConverter.ConvertToDocument(html, pdfWriter))
{
}
}
workStream.Position = 0;
return new FileStreamResult(workStream, "application/pdf");
}
By the way, as you are essentially doing nothing special with the document returned by HtmlConverter.ConvertToDocument, you probably could use a different HtmlConverter method with less overhead in your code.
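One such lower-overhead variant might look like the sketch below, which lets HtmlConverter.ConvertToPdf write straight into the MemoryStream and returns the bytes. The controller name PdfController and the html constant are placeholders for the example, not taken from the question.

```csharp
using System.IO;
using iText.Html2pdf;
using Microsoft.AspNetCore.Mvc;

public class PdfController : Controller
{
    // Placeholder markup; the real action would supply its own HTML.
    private const string html = "<html><body><p>Hello</p></body></html>";

    public FileContentResult Pdf()
    {
        using (var workStream = new MemoryStream())
        {
            // ConvertToPdf creates, fills, and closes the PDF document in one call.
            HtmlConverter.ConvertToPdf(html, workStream);
            // ToArray is still valid on a closed MemoryStream, so no rewinding is needed.
            return File(workStream.ToArray(), "application/pdf");
        }
    }
}
```

Returning a byte array sidesteps the "closed memory stream" error entirely, at the cost of one extra copy of the PDF in memory.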

Generally this approach works
using (var ms = new MemoryStream())
{
//yourStream.Seek(0, SeekOrigin.Begin)
yourStream.CopyTo(ms);
}
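A slightly fuller version of that snippet, as a sketch that uses only System.IO: the helper name ToRewoundMemoryStream is invented for the example. It rewinds the source when possible, copies it, and rewinds the copy so a consumer such as FileStreamResult starts reading at the first byte.

```csharp
using System.IO;

public static class StreamCopy
{
    // Copies the whole source stream into a new MemoryStream positioned at 0.
    public static MemoryStream ToRewoundMemoryStream(Stream source)
    {
        var ms = new MemoryStream();
        if (source.CanSeek)
        {
            // Start from the beginning, otherwise only the unread tail is copied.
            source.Seek(0, SeekOrigin.Begin);
        }
        source.CopyTo(ms);
        ms.Position = 0;
        return ms;
    }
}
```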

iText 7.0.4.0 - Converting PdfDocument to byte array

I'm attempting to split a PDF file page by page, and get each page file's byte array. However, I'm having trouble converting each page to byte array in iText version 7.0.4 for C#.
Methods referenced in other solutions rely on PdfWriter.GetInstance or PdfCopy, which seem to no longer exist in iText version 7.0.4.
I've gone through iText's sample codes and API documents, but I have not been able to extract any useful information out of them.
using (Stream stream = new MemoryStream(pdfBytes))
using (PdfReader reader = new PdfReader(stream))
using (PdfDocument pdfDocument = new PdfDocument(reader))
{
PdfSplitter splitter = new PdfSplitter(pdfDocument);
// My Attempt #1 - None of the document's functions seem to be of help.
foreach (PdfDocument splitPage in splitter.SplitByPageCount(1))
{
// ??
}
// My Attempt #2 - GetContentBytes != pdf file bytes.
for (int i = 1; i <= pdfDocument.GetNumberOfPages(); i++)
{
PdfPage page = pdfDocument.GetPage(i);
byte[] bytes = page.GetContentBytes();
}
}
Any help would be much appreciated.

Your approach of using PdfSplitter is one of the best ways to approach your task. Maybe not so much is available out of the box, but PdfSplitter is highly customizable and if you take a look at the implementation or simply the API, it becomes clear which are correct points for injecting your own customized behavior.
You should override GetNextPdfWriter to provide any output media you want the documents to be created at. You can also use IDocumentReadyListener to define the action that will be performed once another document is ready.
I am attaching one of the implementations that can achieve your goal:
class ByteArrayPdfSplitter : PdfSplitter {
private MemoryStream currentOutputStream;
public ByteArrayPdfSplitter(PdfDocument pdfDocument) : base(pdfDocument) {
}
protected override PdfWriter GetNextPdfWriter(PageRange documentPageRange) {
currentOutputStream = new MemoryStream();
return new PdfWriter(currentOutputStream);
}
public MemoryStream CurrentMemoryStream {
get { return currentOutputStream; }
}
public class DocumentReadyListender : IDocumentReadyListener {
private ByteArrayPdfSplitter splitter;
public DocumentReadyListender(ByteArrayPdfSplitter splitter) {
this.splitter = splitter;
}
public void DocumentReady(PdfDocument pdfDocument, PageRange pageRange) {
pdfDocument.Close();
byte[] contents = splitter.CurrentMemoryStream.ToArray();
String pageNumber = pageRange.ToString();
}
}
}
The calls would be basically as you did, but with custom document ready event:
PdfDocument docToSplit = new PdfDocument(new PdfReader(path));
ByteArrayPdfSplitter splitter = new ByteArrayPdfSplitter(docToSplit);
splitter.SplitByPageCount(1, new ByteArrayPdfSplitter.DocumentReadyListender(splitter));
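The listener above computes each page's bytes but does not keep them. A variant that collects them into a list could look like this sketch; the class name CollectingPdfSplitter and the method SplitToPages are hypothetical, and the example relies on the listener being invoked once per split document before the next writer is requested.

```csharp
using System.Collections.Generic;
using System.IO;
using iText.Kernel.Pdf;
using iText.Kernel.Utils;

public sealed class CollectingPdfSplitter : PdfSplitter
{
    private readonly List<byte[]> pages = new List<byte[]>();
    private MemoryStream current;

    public CollectingPdfSplitter(PdfDocument pdfDocument) : base(pdfDocument) { }

    // Each split document is written to its own in-memory stream.
    protected override PdfWriter GetNextPdfWriter(PageRange documentPageRange)
    {
        current = new MemoryStream();
        return new PdfWriter(current);
    }

    // Splits the source into one-page documents and returns their bytes in page order.
    public IList<byte[]> SplitToPages()
    {
        pages.Clear();
        SplitByPageCount(1, new Listener(this));
        return pages;
    }

    private sealed class Listener : IDocumentReadyListener
    {
        private readonly CollectingPdfSplitter owner;

        public Listener(CollectingPdfSplitter owner) { this.owner = owner; }

        public void DocumentReady(PdfDocument pdfDocument, PageRange pageRange)
        {
            // Closing the document flushes it into the current stream.
            pdfDocument.Close();
            owner.pages.Add(owner.current.ToArray());
        }
    }
}
```

Usage would be along the lines of `new CollectingPdfSplitter(new PdfDocument(new PdfReader(new MemoryStream(pdfBytes)))).SplitToPages()`, with the source document closed afterwards.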

ITextSharp remove producer metadata with active license on existing pdf not working

I have an existing PDF generated by Telerik Reporting which contains the Application and Producer metadata stating the version of the library. I want these two elements stripped. In the app we use a licensed version of ITextSharp 5.
The Creator (Application) line is properly removed, but the Producer one does not change. During debugging I can see that the code below changes both elements, but when inspecting the resulted PDF the Producer remains.
I have changed the license key to test that it is working and I receive an error leading me to believe that the license is valid.
Throughout Stack Overflow I read that it is only possible to change the Producer line with a valid license. Maybe it refers to PDFs created by ITextSharp and not existing ones?
My code below:
internal static class PdfMetadataStripService
{
internal static byte[] StripMetadata(byte[] pdfBytes)
{
LoadLicenseKey();
PdfReader reader = new PdfReader(pdfBytes);
using (MemoryStream ms = new MemoryStream())
{
using (PdfStamper stamper = new PdfStamper(reader, ms))
{
Dictionary<string, string> info = reader.Info;
info["Creator"] = string.Empty;
info["Producer"] = string.Empty;
stamper.MoreInfo = info;
using (var xmpMs = new MemoryStream())
{
var xmp = new XmpWriter(xmpMs, info);
stamper.XmpMetadata = xmpMs.ToArray();
xmp.Close();
}
}
return ms.ToArray();
}
}
private static void LoadLicenseKey()
{
var path = string.Format("{0}itextkey.xml", HostingEnvironment.ApplicationPhysicalPath);
LicenseKey.LoadLicenseFile(path);
}
}
"iTextSharp" version="5.5.10"
"itextsharp.licensekey" version="1.0.4"

change format from wav to mp3 in memory stream in NAudio

Hi there, I am trying to convert text to speech (WAV) in a MemoryStream, convert it to MP3, and then play it on the user's page. I need help with what to do next.
Here is my asmx code:
[WebMethod]
public byte[] StartSpeak(string Word)
{
MemoryStream ms = new MemoryStream();
using (System.Speech.Synthesis.SpeechSynthesizer synhesizer = new System.Speech.Synthesis.SpeechSynthesizer())
{
synhesizer.SelectVoiceByHints(System.Speech.Synthesis.VoiceGender.NotSet, System.Speech.Synthesis.VoiceAge.NotSet, 0, new System.Globalization.CultureInfo("en-US"));
synhesizer.SetOutputToWaveStream(ms);
synhesizer.Speak(Word);
}
return ms.ToArray();
}
Thanks.

Just wanted to post my example too using NAudio.Lame:
NuGet:
Install-Package NAudio.Lame
Code Snip: Mine obviously returns a byte[] - I have a separate save to disk method b/c I think it makes unit testing easier.
public static byte[] ConvertWavToMp3(byte[] wavFile)
{
using (var retMs = new MemoryStream())
using (var ms = new MemoryStream(wavFile))
using (var rdr = new WaveFileReader(ms))
{
// Dispose the writer before reading retMs so that LAME flushes its final frames.
using (var wtr = new LameMP3FileWriter(retMs, rdr.WaveFormat, 128))
{
rdr.CopyTo(wtr);
}
return retMs.ToArray();
}
}

You need an MP3 compressor library. I use Lame via the Yeti Lame wrapper. You can find code and a sample project here.
Steps to get this working:
Copy the following files from MP3Compressor to your project:
AudioWriters.cs
Lame.cs
Lame_enc.dll
Mp3Writer.cs
Mp3WriterConfig.cs
WaveNative.cs
WriterConfig.cs
In the project properties for Lame_enc.dll set the Copy to Output property to Copy if newer or Copy always.
Edit Lame.cs and replace all instances of:
[DllImport("Lame_enc.dll")]
with:
[DllImport("Lame_enc.dll", CallingConvention = CallingConvention.Cdecl)]
Add the following code to your project:
public static Byte[] WavToMP3(byte[] wavFile)
{
using (MemoryStream source = new MemoryStream(wavFile))
using (NAudio.Wave.WaveFileReader rdr = new NAudio.Wave.WaveFileReader(source))
{
WaveLib.WaveFormat fmt = new WaveLib.WaveFormat(rdr.WaveFormat.SampleRate, rdr.WaveFormat.BitsPerSample, rdr.WaveFormat.Channels);
// convert to MP3 at 96kbit/sec...
Yeti.Lame.BE_CONFIG conf = new Yeti.Lame.BE_CONFIG(fmt, 96);
// Allocate a 1-second buffer
int blen = rdr.WaveFormat.AverageBytesPerSecond;
byte[] buffer = new byte[blen];
// Do conversion
using (MemoryStream output = new MemoryStream())
{
Yeti.MMedia.Mp3.Mp3Writer mp3 = new Yeti.MMedia.Mp3.Mp3Writer(output, fmt, conf);
int readCount;
while ((readCount = rdr.Read(buffer, 0, blen)) > 0)
mp3.Write(buffer, 0, readCount);
mp3.Close();
return output.ToArray();
}
}
}
Either add a reference to System.Windows.Forms to your project (if it's not there already), or edit AudioWriters.cs and WriterConfig.cs to remove the references. Both of these have a using System.Windows.Forms; that you can remove, and WriterConfig.cs has a ConfigControl declaration that needs to be removed or commented out.
Once all of that is done you should have a functional in-memory wave-file to MP3 converter that you can use to convert the WAV file that you are getting from the SpeechSynthesizer into an MP3.

This is a bit old now, but since you haven't accepted the answer I previously provided...
I have recently built an extension for NAudio that encapsulates the LAME library to provide simplified MP3 encoding.
Use the NuGet package manager to find NAudio.Lame. Basic example for using it available here.

Assuming you're trying to convert the output into MP3, you need something that can handle transcoding the audio. There are a number of tools available, but my personal preference is FFmpeg. It's a command line tool so you will need to take that into account, but otherwise it's very easy to use.
There's lots of information online, but you can start by checking out their documentation here.

I had a similar requirement in .net4.0 to convert 8bit 8Khz mono wav and used the following code
public void WavToMp3(string wavPath, string fileId)
{
var tempMp3Path = TempPath + "tempFiles\\" + fileId + ".mp3";
var mp3strm = new FileStream(tempMp3Path, FileMode.Create);
try
{
using (var reader = new WaveFileReader(wavPath))
{
var blen = 65536;
var buffer = new byte[blen];
int rc;
var bit16WaveFormat = new WaveFormat(16000, 16, 1);
using (var conversionStream = new WaveFormatConversionStream(bit16WaveFormat, reader))
{
var targetMp3Format = new WaveLib.WaveFormat(16000, 16, 1);
using (var mp3Wri = new Mp3Writer(mp3strm, new Mp3WriterConfig(targetMp3Format, new BE_CONFIG(targetMp3Format,64))))
{
while ((rc = conversionStream.Read(buffer, 0, blen)) > 0) mp3Wri.Write(buffer, 0, rc);
mp3strm.Flush();
conversionStream.Close();
}
}
reader.Close();
}
File.Move(tempMp3Path, TempPath + fileId + ".mp3");
}
finally
{
mp3strm.Close();
}
}
Prerequisites:
A .NET 4 build of the Yeti library (to obtain it, download this older one (http://www.codeproject.com/KB/audio-video/MP3Compressor/MP3Compressor.zip), convert it to .NET 4.0, then build the solution to get the new DLLs).
The NAudio libraries (as LAME supports 16-bit WAV samples only, I first had to convert from 8-bit to 16-bit WAV).
I used a 64 KB read buffer and a 64 kbps bit rate (my custom requirement).

Have a try (note that this snippet goes in the opposite direction, from MP3 to WAV):
using (WaveStream waveStream = WaveFormatConversionStream.CreatePcmStream(new
Mp3FileReader(inputStream)))
using (WaveFileWriter waveFileWriter = new WaveFileWriter(outputStream, waveStream.WaveFormat))
{
byte[] bytes = new byte[waveStream.Length];
waveStream.Position = 0;
// Read takes an int count, so the long Length must be cast.
waveStream.Read(bytes, 0, (int)waveStream.Length);
waveFileWriter.WriteData(bytes, 0, bytes.Length);
waveFileWriter.Flush();
}
