Updated: October 2017
Email: canary@partners.org
Web: http://canary.bwh.harvard.edu/
Thank you for using Canary Data Converter. This file includes the complete documentation for using the software. For further assistance or bug reports, please contact us via the email address above.
Starting Canary Data Converter
The Canary Data Converter was created to facilitate conversion of narrative electronic data in different formats to the required format for Canary, a user-friendly information extraction tool designed to extract information from text using user-defined rules.
To install Canary Data Converter, extract the files from the zip folder. Canary Data Converter can be installed (i.e. unzipped) into any directory on the computer. The directory can be moved anywhere else on the computer after Canary Data Converter is unzipped. Canary Data Converter can also work from a flash drive or other portable media.
After installation, instructions for starting the software can be found in a file called “HOW TO RUN” in the installation folder. This file also contains other relevant instructions, such as how to create additional shortcuts.
Canary Data Converter is compatible with Windows 7, Windows 8, Windows 10, and Windows Server 2008/2012/2016. It is compatible with both 32-bit and 64-bit architectures.
The first step in converting your medical records is to select an input format. Currently, the Canary Data Converter supports plain text files in four formats: undelimited plain text, delimited plain text, RPDR format, and Epic Text format. RPDR format can be used to process files created by the Research Patient Data Registry at Partners HealthCare. Epic Text format can be used to process files created by Epic electronic medical record software.
Upon selecting your format, you will need to select the input source. Currently, the Canary Data Converter supports three input types: a single file, all files contained in a folder, or all files contained in a folder and its subfolders. Choose the appropriate option for your files.
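The three input types can be pictured as three ways of gathering file paths. The sketch below is only an illustration and is not the converter's own code; the function name `collect_input_files` and the mode names "file", "folder", and "recursive" are assumptions made for this example.

```python
from pathlib import Path


def collect_input_files(source, mode):
    """Return a sorted list of files for one of three hypothetical modes.

    mode is "file" (a single file), "folder" (files directly inside a
    folder), or "recursive" (files in a folder and all of its subfolders).
    """
    path = Path(source)
    if mode == "file":
        return [path] if path.is_file() else []
    if mode == "folder":
        # Only the files directly inside the folder, not in subfolders.
        return sorted(p for p in path.iterdir() if p.is_file())
    if mode == "recursive":
        # rglob("*") walks the folder and every subfolder beneath it.
        return sorted(p for p in path.rglob("*") if p.is_file())
    raise ValueError(f"unknown mode: {mode}")
```

Comparing the "folder" and "recursive" results for the same directory is a quick way to see whether your notes are nested in subfolders before you pick an input type.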
Each input format has unique, user-customizable properties that the Canary Data Converter uses to read in your input files. If you are unsure about what a particular option means, you can click the Help button in the lower left corner of the program to open the in-app help window, which will explain each of the options for your chosen input format.
After defining your input options, you will be allowed to click the Next button to move on to the Output tab. The purpose of this tab is to set up the settings for the output Canary Data Converter will produce.
First, you will have to select your desired output format. Currently, the Canary Data Converter supports plain text files in three formats: undelimited plain text, delimited plain text, and Canary format.
Upon selecting the output format, you will need to choose your desired output location. Currently, the Canary Data Converter only supports writing to a folder.
After selecting your output location, you must define your output options. Like the input options, each output format has unique, user-customizable properties that the Canary Data Converter uses to write your output files. If you are unsure about what a particular option means, you can click the Help button in the lower left corner of the program to open the in-app help window, which will explain each of the options for your chosen output format.
After defining your output options, you will be able to click the Next button and move on to the Convert tab.
Before converting your files, you can choose how many files the Canary Data Converter should process at a time. By default, the converter will simultaneously process one fewer file than the total number of CPU cores on your machine. This leaves one core available so that your machine does not become sluggish while you use other programs. If you want to run more processes than the number of CPU cores available on your machine, you can type any number of processes into the dropdown box; however, this is not recommended.
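The default described above amounts to simple arithmetic on the core count. The following is a minimal sketch of that rule, not the converter's actual code; the function name `default_process_count` is hypothetical, and the lower bound of one process on a single-core machine is an assumption.

```python
import os


def default_process_count(cpu_cores=None):
    """One fewer than the number of CPU cores, but never fewer than one."""
    if cpu_cores is None:
        # os.cpu_count() can return None when the count cannot be determined.
        cpu_cores = os.cpu_count() or 1
    # Assumption: fall back to a single process on a one-core machine.
    return max(1, cpu_cores - 1)
```

On a machine with 8 cores this rule gives 7 simultaneous files, leaving one core free for other programs.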
After selecting the number of files to process, you are ready to convert your files.
When you are on the Convert tab, you will notice that the Next button now says Convert. To begin converting your files, you must press this button. Once the conversion starts, this button will say Cancel All. You can press it at any time during the conversion to stop converting all files. After cancelling a conversion, however, you cannot resume it.
The Convert tab has five sub-tabs: Conversion, Queue, Completed, Warnings, and Errors.
The Conversion tab is where you can see progress for running conversion jobs, as well as where you can pause, resume or cancel running conversion jobs. Upon pressing the Convert button, you will see this tab populate with an overall progress bar, which reports overall progress for all files, and an individual progress bar for each file that is being converted or waiting to be converted. When a job completes, its progress bar will disappear and it will be logged in the Completed tab.
The Queue tab is where you can find all files that are waiting to be processed. If you click the “Clear Queue” button, all queued files will be cancelled. The “Cancel All” button will clear the queue in addition to cancelling the files that are currently processing.
The Completed tab is where the files that have completed processing are logged. All files will eventually end up in this tab, even if they are cancelled or encounter an error.
The Warnings tab is where the Canary Data Converter logs any warnings that come up when processing your files. Warnings differ from errors in that they do not prevent the Canary Data Converter from converting the rest of your file.
The Errors tab is where the Canary Data Converter logs any errors that come up when processing your files. Errors are problems that prevent the Canary Data Converter from processing a file. To convert the affected files, you will need to resolve these errors and run the Canary Data Converter again on the corrected files. An error only stops processing of the specific file in which it occurred; processing of the rest of the files in the batch will continue.
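The per-file error behavior can be sketched as a loop that records a failure and moves on to the next file. This is an illustration of the idea only, not the converter's implementation; `convert_batch` and `convert_one` are hypothetical names.

```python
def convert_batch(paths, convert_one):
    """Run convert_one on every path; a failure affects only that path.

    Returns (completed, errors): completed lists every file that was
    handled, and errors maps each failed file to its error message.
    """
    completed, errors = [], {}
    for path in paths:
        try:
            convert_one(path)
        except Exception as exc:
            # Record the error and continue with the rest of the batch.
            errors[path] = str(exc)
        # Every file is logged as completed, including the ones that failed.
        completed.append(path)
    return completed, errors
```

After a run, the keys of `errors` are the files that need to be corrected and converted again.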
Canary Data Converter has a number of settings that can be altered in the program’s configuration file. The configuration file is named “canarydc.ini” and can be found in the folder in which Canary Data Converter was installed. If you need to alter the configuration file, you can click the “Open Configuration File” option in the GUI “File” menu, but note that you must restart Canary Data Converter before changes will take effect. If you accidentally delete the configuration file or want to restore it to its default format, select the “Reset Configuration File” option in the GUI “File” menu. Again, you will have to restart before any changes will take effect. Both of these options are also available in the CLI using the --config and --reset-config options.
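If you would like to inspect the settings without opening an editor, a short Python script can list them. This sketch assumes that canarydc.ini follows the standard INI layout of named sections containing key = value pairs and that it is UTF-8 encoded; neither is confirmed by this documentation, and the function name `read_settings` is hypothetical.

```python
import configparser


def read_settings(ini_path):
    """Return {section: {key: value}} for an INI file such as canarydc.ini."""
    parser = configparser.ConfigParser()
    # Assumption: the file is UTF-8 text with [section] headers.
    with open(ini_path, encoding="utf-8") as handle:
        parser.read_file(handle)
    return {section: dict(parser[section]) for section in parser.sections()}
```

The script only reads the file. Changes should still be made through the configuration file itself, followed by a restart of Canary Data Converter.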
Canary Data Converter comes with a cross-platform command line interface. On Windows, if you open the folder in which Canary Data Converter was installed, you can run the command line interface using the CanaryDC-cli.exe executable. To get a full overview of the available options, use the --help flag.
The main options are listed below:
-h, --help
    show this help message and exit
--input-format {}, --in {}, -i {}, --input {}
    Format to convert from
--output-format {}, -o {}, -out {}, -output {}
    Format to convert to
--cpu-processes PROCESSES, --process PROCESSES, --processes PROCESSES
    The number of files to process at a time
--gui, -g
    Launch the graphical user interface
--report-progress, -p, --show-progress, --progress
    Report progress on the command line
--open-config-file, --config, -c
    Open the configuration file
--reset-config-file, --reset-config, --reset, -r
    Reset the configuration file to default
--quiet, -q
    Suppress all output to standard out
--version, -v
    Get Canary Data Converter version number
All other options are format-specific and can be found using the --help or -h flags.
To use Canary Data Converter from the command line on Mac or Linux, you must have Python 3 and pip installed. To install Python 3 and pip, visit https://www.python.org/downloads/ and install the latest version of Python 3. Make sure that the option in the installation dialog for Python to install pip is selected. Once installed, run the following command from the command line:
pip3 install canarydc
After running that command, you should be able to run Canary Data Converter using the canarydc command with the flags listed above. To get a full list of the available options, run:
canarydc --help
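If you want to call the command line interface from a script, one possible approach is to wrap it with Python's standard subprocess module. The sketch below is an assumption about how you might do this, not part of Canary Data Converter; the helper names `canarydc_command` and `run_canarydc` are hypothetical, and only flags listed above are used.

```python
import shutil
import subprocess


def canarydc_command(*flags, executable="canarydc"):
    """Build the argument list for one call to the command line interface."""
    return [executable, *flags]


def run_canarydc(*flags, executable="canarydc"):
    """Run the CLI and return its standard output, or None if it is not found."""
    if shutil.which(executable) is None:
        # The command is not on the PATH, e.g. pip3 install has not been run.
        return None
    result = subprocess.run(
        canarydc_command(*flags, executable=executable),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

For example, `print(run_canarydc("--version"))` prints the version text when canarydc is installed and None when it is not.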
1) What is Canary Data Converter?
Canary Data Converter is a free, open-source, extendable converter intended to help users convert other file formats to the format used by Canary, a user-friendly information extraction tool.
Canary is a free, open-source software program for extraction of information from narrative text. Canary was designed for users without a software engineering or computer science background. Canary uses a graphical user interface to guide users through creation of a language model (a set of rules) that will allow them to extract a concept of interest from electronic text (e.g. whether or not the patient had a mass reported on their brain MRI; what their blood pressure was; whether the patient suffered an adverse reaction to a medication; etc.). Canary was initially designed for use with medical texts, but can be used outside of medicine as well.
2) How do I install the Canary Data Converter?
Canary Data Converter can be installed in any folder or even on a flash drive. You can extract the files from the zip folder into any folder.
3) What operating systems does Canary Data Converter run on?
Canary Data Converter should work on any Windows computer. It has been tested on Windows 7, Windows 8, and Windows 10, as well as Windows Server 2012.
4) Can I run Canary Data Converter on a Mac or Linux?
Canary Data Converter is written entirely in Python and is cross-platform. To install Canary Data Converter on Mac or Linux, you must have Python 3 and pip, the Python package manager, installed on your computer. To do this, go to https://www.python.org/downloads/ and install the latest version of Python. If there is an option to install pip in the Python installation dialog, ensure that it is checked.
Once Python and pip are installed, open the terminal. On a Mac, you can do this by going to the Utilities folder located in the Applications folder and opening the Terminal application. On Linux, search for Terminal and open the application. Once you are in Terminal, run the following command:
pip3 install canarydc
After running that command, you should be all set to run Canary Data Converter from the command line. To use the graphical user interface, run the following command:
canarydc --gui
5) Can I run Canary Data Converter on a cluster?
Canary Data Converter cannot currently make use of distributed computing systems.
6) Can I run Canary Data Converter from the command line?
Canary Data Converter does have a command line interface. On Windows, if you open the folder in which Canary Data Converter was installed, you can run the command line interface using the CanaryDC-cli.exe executable. To get a full overview of the available options, use the --help flag.
The main options, and the steps for using the command line on Mac or Linux, are the same as those given in the description of the command line interface earlier in this document.
7) How much hard drive space does Canary Data Converter need?
Currently the software requires around 20 MB of disk space.
8) How fast is Canary Data Converter?
This depends on the number of threads you are using in parallel to process the text (more threads = faster processing). For example, Canary Data Converter can process 1 gigabyte of text in approximately 1 minute.
9) Does Canary Data Converter require a UMLS or any other license?
No, Canary Data Converter does not require any licenses.
Contact us at canary@partners.org
11) Does Canary Data Converter connect directly to any electronic medical records systems?
Not at this point. You will need to extract EMR data from your system manually and process it with Canary Data Converter.
12) Can Canary Data Converter be used with languages other than English?
Canary Data Converter allows the user to select the input and output encoding, so it does support languages other than English.
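To show what choosing an input and output encoding means in practice, the sketch below reads a text file in one encoding and writes it out in another. It is a general illustration and not the converter's own code; the function name `transcode` is hypothetical.

```python
def transcode(src_path, dst_path, input_encoding, output_encoding):
    """Read text in one encoding and write it back out in another."""
    # newline="" keeps the file's original line endings unchanged.
    with open(src_path, encoding=input_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding=output_encoding, newline="") as dst:
        dst.write(text)
```

For example, `transcode("notes_latin1.txt", "notes_utf8.txt", "latin-1", "utf-8")` would rewrite a Latin-1 file as UTF-8; both file names are placeholders. Choosing the wrong input encoding typically shows up as garbled accented characters in the output.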
13) How much does Canary Data Converter cost?
Canary Data Converter is free software provided at no cost to the user.