JavaProspect: Natural Language Processing, Adobe PDF Extract, and Deep PDF Intelligence

Wednesday, November 17, 2021

The Adobe PDF Extract API is a powerful tool for getting information out of your PDFs: layout and styling, tabular data in easy-to-use CSV format, images, and raw text. All things considered, the raw text may seem like the least interesting aspect of the API. One useful possibility is to feed the raw text to a search engine (see "Using PDFs with the Jamstack - Adding Search with Text Extraction"). But another fascinating possibility for working with the text involves natural language processing, or NLP.

Broadly (very broadly; see the Wikipedia article for deeper context), NLP is about understanding the contents of text. Voice assistants are a great real-world example. What makes Alexa and Google Assistant devices so powerful is that they don't just hear what you say; they understand the intent behind it. That intent is something the raw text alone does not give you.
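To make the raw-text idea concrete, here is a minimal Python sketch of pulling plain text out of an Extract result so it can be handed to a search index or an NLP service. It is not the article's own code. It assumes the result JSON has a top-level `elements` list whose entries may carry a `Text` field; the `extract_raw_text` helper and the sample data are hypothetical and purely for illustration.

```python
def extract_raw_text(structured_data):
    """Collect the text of every element that has a non-empty "Text" field.

    Assumption: `structured_data` is the parsed Extract result, shaped like
    {"elements": [{"Path": ..., "Text": ...}, ...]}. Elements without text
    (figures, for example) are skipped.
    """
    parts = []
    for element in structured_data.get("elements", []):
        text = (element.get("Text") or "").strip()
        if text:
            parts.append(text)
    return "\n".join(parts)


# Hypothetical sample data, shaped like the assumed result format.
sample = {
    "elements": [
        {"Path": "//Document/H1", "Text": "Quarterly Report "},
        {"Path": "//Document/P", "Text": "Revenue grew in the third quarter. "},
        {"Path": "//Document/Figure"},  # no "Text" field, so it is skipped
    ]
}

if __name__ == "__main__":
    print(extract_raw_text(sample))
```

The resulting string is what you would pass on to a search engine or an NLP API. Joining with newlines keeps headings and paragraphs separate, which tends to help downstream tools that split text into sentences.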

from DZone.com Feed https://ift.tt/3oCxbev
