Digital Archiving – Workflow Set up
Week 2-3: A Finer Level of Detail
So much has happened within the space of one month at the Cork LGBT Archive! The next series of blog posts will be published throughout August, as we catch up with all the excitement that this project has had to offer.
After some initial hurdles, we had some advances in terms of the workflow process. Previous scanning processes in the Cork LGBT Archive were made efficient by Orla and her colleagues by selecting software that was suitable and not too cumbersome. So my job was to look for ways to make this happen with their new scanner.
A series of different image scanning software applications were identified which were available to us for free, but as we will see in this post, it is not an easy task to locate one that is both simple and straightforward! The way that scanner software has evolved in the past few years is simultaneously excellent and flawed – informing us that automation is not always the best way forward. And the Epson Workforce DS-50000 flatbed scanner had its quirks!
setting up the scanner
After we unboxed our new scanner, we found that a hardware component called a “Network Interface Component” was already installed. This caused problems for the installation, as it was not possible to setup a USB connection to the laptop out of the box. With the interface connection the idea is that you are connecting the scanner to a number of computers instead of the method that we have become accustomed to of “plug-and-play” – you plug it in, and it works! Coupled with that, there were no instructions included with the scanner to show what needed to be done when setting up the machine to carry out a simple scan to laptop via USB. So the initial excitement of getting setup right away and firing up a scan were dampened, instead calling for a bit of serious investigative work (some tinkering was needed). After attempting the network connection on a number of different pieces of software (Epson Scan 2, Document Capture Pro, Windows Fax and Scan), I tried to connect to the printer using a Mac.
When this also failed, I had a conversation about it with Orla’s partner, Carol. Zooming out from the situation, we decided that the most obvious option to try was to actually remove the Network Interface component and to swap it for the standard component that had been left beside the printer.
I was a little wary of taking the scanner apart since we had only just bought the product! This initial work with the scanner required that I dis-assemble the Network Interface component with an Allen key (and awkwardly removing screw covers that had somehow been turned the wrong way around inside of the screw slot)! It was a bizarre situation to have to swap this out, as it was awkward and risked breaking something on the scanner. However, once this had been swapped for the standard component, we discovered that the scanner software eventually recognised the scanner and then everything worked without a hitch.
All better in hindsight of course!
The Network Interface Component did sound useful for some situations – it means that a number of devices can be connected to the scanner at one time (phones, laptops, PCs). However, for a job such as this, it was determined that only one person and one computer was needed for the scanning job at any one time, this suited our workflow best. It was decided that it would make more sense for the workflow if one person was positioned at a designated computer, connected by USB to the scanner whilst scanning documents instead of multiple people connected to the machine at once. The Network Interface Component was returned to the original box!
In terms of hardware, our initial sense of the scanner was that it operated quite fast, and the sound of a scan wasn’t too clunky! In general, it promised to make the scanning very efficient, one click of preview or scan button, the scanner moved across the bed quite fast, and you had an image within 4-5 seconds. However, a 600 dpi scan required the move across the bed to slow significantly, so we ended up with 10 seconds per scan.
For software needs, one of the most important areas needed for workflow in the Cork LGBT Archive was the ability to be able to create PDF documents that contained a number of different scanned images bundled together (to add or edit pages after scanning). It became apparent that this was available in both software packages that we tried out, Epson Scan and also Document Capture Pro. With this option available, we could then move on to testing for other features.
There were a number of software packages that allowed us to scan PDF/A documents, which was a major boon as the Digital Repository of Ireland (the DRI) preferred PDF/A format. One of those packages that afforded PDF/A was Epson Scan, which also provided the option of scanning to a searchable PDF for text documents. Epson scan also allowed for rotation and “correct document skew”, both very helpful features. Another plus was that there was the ability to create “Multiple TIFF” images, and in advanced settings to remove background, to enhance text and to change brightness, contrast, and gamma. Another option we had for this was that with the utility installation of this scanner software came an image editor named “Document Capture Pro”.
Overall, the settings available with both of these software packages were quite similar, as the Epson Scan software was used by Document Capture Pro every time a scan was requested. One feature that was not available in Document Caputre Pro was the ability to crop images during the workflow. During the testing of the scanner, it became apparent that the scanner software cropped images automatically, but in many cases not accurately. This meant that the white background surrounding images was in view on some occasions, and not on others. Worse still, sometimes part of documents were cut off.
I then tested other scanner software options to see if manual crop could be available. ArcSoft PhotoImpression was a good option, but it didn’t meet our needs. Another option that allowed us to crop images was Digitech’s PaperVision Plugin for Document Capture Pro. This website was useful in our search – it compared some of the other free versions that are available, plus important information on which ones had auto and manual cropping: https://turbofuture.com/graphic-design-video/Whats-the-Best-Multiple-Photo-Scanning-Software
We found that when using Document Capture Pro that there are two options in the main menu –
1) To do a job scan. This allows you to setup a workflow for a scanning job, which has predefined settings for the scan, a destination for the file once you have finished the scanning.
2) Simple scan. This allows you to scan without taking any actions, and is not a workflow but just a scan that can give you the scanned image and then you can do what you want with it yourself.
The following list of other scanning software options that demonstrate what is available as of summer 2021:
- AutoSplitter – very basic looking interface, has no PDF save format, has crop functionality but requires once-off payment of $30 for 2 years
- VueScan (also an app on android) Once-off payment of €20 – cropping is done based on pre-defined size of image, no crop tool
- PixEdit (Norway) €75 per month
In previous version of Epson scanner software, the manual cropping tool was available. For example, Orla had used a very nifty tool in the Epson Expression 11000XL. With little availability of open-source tools for cropping images, I decided to test to see if the new scanner would work with older versions of the Epson scanner software (Epson Scan). However, this application did not open correctly as the scanner was not recognised (Epson Expression 11000XL). Error message, “Epson Scan cannot be started”.
Eventually, after further searches for “cropping” in scanning software, a search result listed on an interesting website named UtahRails.net, where I found this thread: https://utahrails.net/tech-talk-photos.php … finally, a recommended software called “FastStone Image Viewer” proved to be an excellent find for our job.
One contributor on the UtahRails website mentioned that, “I generally use FastStone Image Viewer for the basic stuff, like adjusting gray levels on black & white photos, or color levels on color photos. It also works very well for straightening and cropping, and simple resizing. It also has a good batch conversion tool to convert TIF to JPG, for uploading to SmugMug.“
This was a watershed moment. After all of our searching and testing, finally we found a suitable piece of open-source software that could produce scans to our liking, which were also of an excellent quality and we had the ability to crop manually. On to the workflow!
Setting up a workflow
As week three started, the excitement was building with the knowledge that we were now set up to scan our first box of documents. We ran some tests to check if the scanner and accompanying software would produce the desired PDFs of documents.
Again, this was not without its problems, as we quickly discovered in some of the previews that the documents were not being imaged on the very outer edge of some of the documents. In some cases parts of documents were cut off when they were placed against the edge of the scanner. The scanner did have a line which demarcated the edge of an A4 scan, but even when following this rule we discovered that it sometimes cut off part of the image.
We needed to come up with a way to push the document further into the scanner. As an interim decision, we placed wooden pegs at the edge of the scanner bed. This allowed documents to be placed up against their edges, facilitating an easier cropping and allowing for the full version of the document to be seen after a scan was made.
Innovative scanning at Cork LGBT Archive today!— Cork LGBT Archive (@CorkLGBThistory) June 15, 2021
Trying to find something straight in @GayProjectIRL to line up Play Safe leaflet for scanning – eventually found colourful lollypop sticks in the Arts & Crafts corner. @MiraDean8 @drpatrickegan @OrlaEgan1
Creative #DigitalArchives pic.twitter.com/6pLZRIKDWZ
As we began to scan documents, we discovered that the workflow was turning out to be perfect for our needs. We were able to scan efficiently (with minimal set up for documents), and the process was uncomplicated (move the documents towards the sticks, crop later).
With the scan process set up correctly (and happy staff working away on the scanning!), I could now turn my attention to metadata description, to getting an overview of items that were already within the DRI and Cork LGBT Archive, on to some coding and “motivational Cork LGBT Spotify Playlists”!
Have you had similar experiences with scanning setup and workflow? We would love to hear from you, please leave your comments below.