We have inherited many Word documents containing many Visio diagrams. Much of the text in the diagrams appears nowhere else in the Word documents. How can we copy (for pasting) all the text in all the diagrams?
Now we are getting somewhere. I promised to try all the suggestions, and here are the results.
The OneNote trick almost works, because it does recover text, and that is great. However, in my test case, it made a few errors. For example, it changed "conv" to "cony", and "INPUT_ACE_" to "INPUThACE_". We can probably cope with most errors like this, but is this sort of error typical?
I also downloaded and tried Screenpresso. The results match what I think Jay said in his second note, namely, Screenpresso performs a screen capture, but does not seem to have the capability to extract the text (but I sent them an e-mail asking for clarification). If that functionality was explained in the tutorial, I missed it.
Finally, I tried the Visio instructions from ha14. Everything worked up to the point where I selected Excel as the report format. The response was the following error message:
"No shapes have the selected properties or satisfy the report selection criteria. Add property values to the shapes or select a different report definition, and try again."
I am not a Visio user, and admit to not knowing what I am doing here. Nevertheless, I followed the instructions for modifying the report format (explicitly selecting the displayed text), and the tool created a vrd output file. After a few complaints, Excel opened the file, but unfortunately the displayed text was not present.
It is not clear to me what steps are required to make this work. Thus, any further suggestions will be greatly appreciated.
As a possible alternative, somebody (not involved in this makeuseof discussion) mentioned something called "COM Interop". Does anybody know anything about it and how it might help?
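On the COM Interop question: "COM Interop" is the .NET name for calling the automation (COM) interfaces that Office applications expose, and the same interfaces can be reached from other languages. Below is a minimal Python sketch of how that might help, assuming Windows, an installed copy of Visio, and the third-party pywin32 package. It also assumes the diagram is available as a standalone Visio drawing file; reaching diagrams embedded in a Word document would need additional Word automation that is not shown. The function names are illustrative only, and the sketch has not been run against the documents discussed here.

```python
def collect_shape_texts(shapes):
    """Recursively gather non-empty text from shapes, descending into groups."""
    texts = []
    for shape in shapes:
        text = (shape.Text or "").strip()
        if text:
            texts.append(text)
        # Group shapes expose their members through a nested Shapes collection.
        sub_shapes = getattr(shape, "Shapes", None)
        if sub_shapes is not None:
            texts.extend(collect_shape_texts(sub_shapes))
    return texts


def extract_visio_text(path):
    """Open a Visio drawing through COM and return the text of every shape.

    Assumption: 'path' is an absolute path to a standalone drawing file.
    """
    import win32com.client  # pywin32; needs Windows with Visio installed

    app = win32com.client.Dispatch("Visio.Application")
    doc = app.Documents.Open(path)
    try:
        texts = []
        for page in doc.Pages:
            texts.extend(collect_shape_texts(page.Shapes))
        return texts
    finally:
        doc.Close()
        app.Quit()
```

Because this reads the text stored in each shape instead of recognizing it from a picture, it should avoid OCR misreadings such as "cony" for "conv".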
It looks like the font used in the diagram is not clear enough for OneNote's text reader.
You can check this in two ways.
If you have control over the Visio diagram,
try the diagram with different fonts.
Or try another tool that has an OCR feature.
You can also post a clear screenshot of the diagram using the image option of the Disqus comment box, so we can see the problem more clearly.
I also thought the screen capture of the diagram might be fuzzy, but in fact it was crystal clear. And then I tried it again using a different screen-capture app, and the results were identical. But please keep the ideas coming.
Other than that, I will try OCR next. As for Google Docs, I have used it in the past during e-meetings, but am not sure how it would help here. Is the idea to upload the Word file and get the pictures processed there?
No, you don't have to upload the Word file; you have to upload the picture file.
Once you sign in,
you will see the file list.
Drag and drop the picture, and there will be an option to extract text from a picture or PDF.
When the upload completes,
it will show the picture with the extracted text below it, which can be copied and pasted.
You may also resize the picture before having OneNote read the text.
Try both increasing and decreasing the size (200% and 50%)
before capturing the text. It's just a guess that a change in size may change the accuracy of the OCR/text extraction.
But if the text is clear, then all the OCR options may give you the same output.
And honestly, I don't know much about Visio. I know that it is used to draw diagrams, flowcharts, etc.,
but I am not sure whether the diagram in the Word file is just a picture or more than that. You can tell me about the format of the diagram in Word and the available options.
But I did know that text could be extracted.
I will update you if I find something new.
I varied the font size in a short text document. For each size (12, 18, and 24) I did a screen capture, inserted the screenshot into OneNote, did the copy operation, and pasted the result into Notepad. Results:
For all three sizes, OneNote changed the string "conv," to "cony,". My guess is that the comma is causing problems.
Given another text string (all upper case), all three sizes led to errors: three different errors.
I have not experimented with different fonts, but since the goal is to avoid having to edit hundreds of Visio diagrams, that would be an unpleasant path anyway.
For now I am going to try your other suggestions. Thanks again.
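For anyone repeating this kind of experiment on more strings, the misreadings can be listed automatically instead of by eye. The sketch below uses Python's standard difflib module to compare a known original string with the text that came back from OCR; the function name is illustrative, and it assumes the original text is available for comparison, which is only true for test cases like the ones above.

```python
import difflib


def ocr_differences(expected, recognized):
    """Return (expected_fragment, recognized_fragment) pairs where the strings differ."""
    matcher = difflib.SequenceMatcher(None, expected, recognized, autojunk=False)
    return [
        (expected[i1:i2], recognized[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# The two misreadings reported earlier in this thread.
print(ocr_differences("conv", "cony"))
print(ocr_differences("INPUT_ACE_", "INPUThACE_"))
```

Running it over a batch of test strings would show whether the same characters (for example "v" or "_") are misread consistently, which helps answer whether such errors are typical.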
Hello, do you have Visio installed on any computer? If you do, you can get the text out of Visio diagrams by using the software in the link below. It is an add-on for Visio. According to the site, the free version gives you 30 uses. Look in the description under shapes. Make sure to read the requirements.
Thank you FIDELIS. I will try your suggestion along with OCR. Walt
Thank you Jay, Fidelis, and ha14. We have tried some of those ideas, but without the details you have provided. We will now try all of your suggestions, carefully following your detailed instructions.
Note: our experience is that saving the Word document as a PDF captures the text in the Visio diagrams only if somebody has done some sort of MS conversion step on each figure (the details escape me at the moment). What if nobody has had the time or foresight to convert the figures? Opening more than 50 Word documents and converting from 10 to 60 figures in each document is a time-consuming, error-prone process.
In addition, the Visio -> PDF conversion sometimes seems to split text strings such as AA_BDB_CC at the underscores, so that FIND fails in the PDF. Does anybody have a solution for that problem? (Ignoring the fact that PDFs sometimes use two different ASCII characters for underscore.)
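One possible workaround for the split-at-underscore problem is to search a cleaned-up copy of the text extracted from the PDF rather than the PDF itself. The Python sketch below is only an illustration under assumptions: it assumes the split shows up as whitespace or a line break next to the underscore, and the set of underscore look-alike characters is a guess that would need to be adjusted to whatever the PDFs actually contain. The function and constant names are hypothetical.

```python
import re

# Assumption: characters that may stand in for "_" in the extracted text.
# Adjust this set after inspecting real output.
UNDERSCORE_LOOKALIKES = "\uFF3F\u2017"  # fullwidth low line, double low line


def normalize_underscores(text, lookalikes=UNDERSCORE_LOOKALIKES):
    """Map look-alike characters to "_" and rejoin strings split at underscores."""
    text = text.translate({ord(ch): "_" for ch in lookalikes})
    # Drop spaces, tabs, or line breaks left on either side of an underscore.
    return re.sub(r"\s*_\s*", "_", text)


extracted = "Signal AA_ BDB\n_CC is routed to the output block."
print("AA_BDB_CC" in normalize_underscores(extracted))
```

Note that this also joins legitimate text that happens to have a space next to an underscore, so it is better suited to building a searchable index than to producing text for publication.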
You can convert to PDF, then use a PDF-to-text extractor.
Alternatively, use Visio's Shape Reports to extract the information into an Excel worksheet, and then copy it to Word.
Go to the Review tab, and select Shape Reports. If you are in a Flowchart, one of the options should be a Flowchart report, which will export several fields of data plus the text in the shapes. Select the Flowchart report, then click Run, and select Excel from the list of export types, and click OK. Excel will open with the information, including a column called "Displayed Text".
Super Utilities and Tools
This software is an add-on for Microsoft Office Visio, providing a toolbox and collection of utilities for anyone who uses or develops with Visio. It also gives you easy access to your 30 most recently used Visio stencils from the Visio File menu. Unlike Visio macros, these utilities are available immediately to all your documents.
The unregistered version is fully functional and gives you 30 free uses of any of the built-in utilities; after 30 uses, a delay gets longer by 1 second every time the software is used. We hope this is long enough for you to try out these utilities.
These instructions are for Visio 2007.
1. Make a new report and select Shapes on All pages or Shapes on the current page.
2. Then select only <Displayed Text> (you may want to Show all properties to make sure nothing else is checked).
3. Give it a Title, e.g. Text Only Report.
4. Give it a Name, e.g. Text_Only_Report.
5. Then select Run and export it to Excel, HTML, Visio Shape, or XML.
Visio 2010 has the same function, but under:
Review -> Shape Reports
Hello, if you have the Office suite, you can use OneNote as explained above.
Or, if you don't have OneNote, you could try the following free software:
If you are running Vista/Windows 7, you could use the Snipping Tool utility included with the operating system. There are several free and paid options for screen capturing. You can even use online tools for it. It is just a matter of trying a few and seeing which ones are easiest to use and meet your requirements.
A screenshot is not the goal; the text needs to be extracted from the diagram, as in a recent question, so you need something like OCR as a second step after capturing the screenshots.
As you correctly suggested, OneNote can capture screenshots, but then you will have to read the text from the image.
First, save the diagrams as image files by taking a screenshot or by any other available method.
Now open Microsoft OneNote from Microsoft Office.
Import the image.
Right-click the image.
Select the option that reads something like
"read text from picture".
You can then paste the copied text anywhere you want.
Alternatively, upload the image to this site.
Get the text in your desired file format.