Canary Data Converter Help

Updated: October 2017
Email: canary@partners.org
Web: http://canary.bwh.harvard.edu/

Thank you for using Canary Data Converter. This file includes the complete documentation for using the software. For further assistance or bug reports, please contact us via the email address above.

·       What’s new? Canary Data Converter Version History


Table of Contents

Canary Data Converter Help

A) Introduction

B) Starting Canary Data Converter

C) Input Tab

D) Output Tab

E) Convert Tab

F) Converting Your Files

G) Configuring Canary Data Converter

H) The Command Line Interface

Frequently Asked Questions

 


A) Introduction

The Canary Data Converter was created to facilitate conversion of narrative electronic data in different formats to the required format for Canary, a user-friendly information extraction tool designed to extract information from text using user-defined rules.


B) Starting Canary Data Converter

To install Canary Data Converter, extract the files from the zip folder. Canary Data Converter can be installed (i.e. unzipped) into any directory on the computer. The directory can be moved anywhere else on the computer after Canary Data Converter is unzipped. Canary Data Converter can also work from a flash drive or other portable media.

After installation, instructions for starting the software can be found in a file called “HOW TO RUN” in the installation folder. This file also contains other relevant instructions, such as how to create additional shortcuts.

Canary Data Converter is compatible with Windows 7, Windows 8, Windows 10, and Windows Server 2008/2012/2016. It is compatible with both 32-bit and 64-bit architectures.


C) Input Tab

The first step in converting your medical records is to select an input format. Currently, the Canary Data Converter supports plain text files in four formats: undelimited plain text, delimited plain text, RPDR format, and Epic Text format. RPDR format is used to process files created by the Research Patient Data Registry at Partners HealthCare. Epic Text format is used to process files created by Epic electronic medical record software.

Upon selecting your format, you will need to select the input source. Currently, the Canary Data Converter supports three input types: a single file, all files contained in a folder, or all files contained in a folder and its subfolders. Choose the appropriate option for your files.
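
The three input source types can be pictured with a short Python sketch. This is an illustration only, not the converter's actual code: the function name collect_input_files and its parameters are hypothetical, and it simply assumes that "all files" means every regular file found at the chosen location.

```python
from pathlib import Path


def collect_input_files(source, include_subfolders=False):
    """Return the files selected by one of the three input source types.

    Hypothetical helper for illustration; not part of Canary Data Converter.
    """
    path = Path(source)
    if path.is_file():
        # Input type 1: a single file.
        return [path]
    # Input type 2: files directly inside the folder.
    # Input type 3: files inside the folder and all of its subfolders.
    pattern = "**/*" if include_subfolders else "*"
    return sorted(p for p in path.glob(pattern) if p.is_file())
```

Calling collect_input_files(folder) returns only the files directly inside the folder, while collect_input_files(folder, include_subfolders=True) also descends into subfolders.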

Each input format has unique, user-customizable properties that the Canary Data Converter uses to read in your input files. If you are unsure about what a particular option means, you can click the Help button in the lower left corner of the program to open the in-app help window, which will explain each of the options for your chosen input format.

 


D) Output Tab

After defining your input options, you will be able to click the Next button to move on to the Output tab. This tab is where you configure the output that Canary Data Converter will produce.

First, you will have to select your desired output format. Currently, the Canary Data Converter supports plain text files in three formats: undelimited plain text, delimited plain text, and Canary format.

Upon selecting the output format, you will need to choose your desired output location. Currently, the Canary Data Converter only supports writing to a folder.

After selecting your output location, you must define your output options. Like the input options, each output format has unique, user-customizable properties that the Canary Data Converter uses to write your output files. If you are unsure about what a particular option means, you can click the Help button in the lower left corner of the program to open the in-app help window, which will explain each of the options for your chosen output format.


E) Convert Tab

After defining your output options, you will be able to click the Next button and move on to the Convert tab.

Before converting your files, you can choose how many files the Canary Data Converter should process at a time. By default, the converter simultaneously processes one fewer file than the number of CPU cores on your machine. This leaves one core free so that your machine does not become sluggish while you use other programs. You can type any number of processes into the dropdown box, including more than the number of CPU cores available on your machine, but this is not recommended.
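
The default described above can be sketched in Python as follows. The function name default_process_count is hypothetical, and the lower bound of one process on a single-core machine is an assumption; this help file does not say how the converter handles that case.

```python
import os


def default_process_count(cpu_cores=None):
    """One fewer than the number of CPU cores.

    Hypothetical helper for illustration. The floor of one process is an
    assumption, not documented behavior.
    """
    if cpu_cores is None:
        # os.cpu_count() can return None when the count cannot be determined.
        cpu_cores = os.cpu_count() or 1
    return max(1, cpu_cores - 1)
```

On a machine with 8 cores this gives 7 processes, leaving one core free for other programs.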

After selecting the number of files to process, you are ready to convert your files.


F) Converting Your Files

When you are on the Convert tab, you will notice that the Next button now says Convert. Press this button to begin converting your files. Once the conversion starts, the button will say Cancel All. You can press it at any time during the conversion to stop converting all files. After cancelling a conversion, however, you cannot resume it.

The Convert tab has five sub-tabs: Conversion, Queue, Completed, Warnings, and Errors.

Conversion

The Conversion tab is where you can see progress for running conversion jobs, as well as where you can pause, resume or cancel running conversion jobs. Upon pressing the Convert button, you will see this tab populate with an overall progress bar, which reports overall progress for all files, and an individual progress bar for each file that is being converted or waiting to be converted. When a job completes, its progress bar will disappear and it will be logged in the Completed tab.

Queue

The Queue tab is where you can find all files that are waiting to be processed. If you click the “Clear Queue” button, all queued files will be cancelled. The “Cancel All” button will clear the queue in addition to cancelling the files that are currently processing.

Completed

The Completed tab is where the files that have completed processing are logged. All files will eventually end up in this tab, even if they are cancelled or encounter an error.

Warnings

The Warnings tab is where the Canary Data Converter logs any warnings that come up when processing your files. Warnings differ from errors in that they do not prevent the Canary Data Converter from converting the rest of your file.

Errors

The Errors tab is where the Canary Data Converter logs any errors that come up when processing your files. Errors are problems that prevent the Canary Data Converter from processing a file. To convert the affected files, you will need to resolve the errors and run the Canary Data Converter again on the corrected files. An error stops processing only of the file in which it occurred; processing of the remaining files in the batch continues.
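
The per-file error behavior can be illustrated with a minimal Python sketch. It is not the converter's implementation: convert_batch and the convert_one callback are hypothetical names, and the sketch processes files one after another rather than in parallel to keep the idea visible.

```python
def convert_batch(paths, convert_one):
    """Convert each file; an error in one file does not stop the others.

    Hypothetical helper for illustration. convert_one is any callable that
    converts a single file and raises an exception on failure.
    """
    completed, errors = [], []
    for path in paths:
        try:
            convert_one(path)
        except Exception as exc:
            # Record the error against this file only and keep going.
            errors.append((path, str(exc)))
        # Every file is logged as completed, even after an error.
        completed.append(path)
    return completed, errors
```

The errors list plays the role of the Errors tab, and the completed list plays the role of the Completed tab, where every file eventually appears.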


G) Configuring Canary Data Converter

Canary Data Converter has a number of settings that can be altered in the program’s configuration file. The configuration file is named “canarydc.ini” and can be found in the folder in which Canary Data Converter was installed. If you need to alter the configuration file, you can click the “Open Configuration File” option in the GUI “File” menu, but note that you must restart Canary Data Converter before changes will take effect. If you accidentally delete the configuration file or want to restore it to its default format, select the “Reset Configuration File” option in the GUI “File” menu. Again, you will have to restart before any changes will take effect. Both of these options are also available in the CLI using the --config and --reset-config options.
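
If you want to inspect the configuration file from a script, a sketch along the following lines may help. It assumes that canarydc.ini uses standard INI syntax readable by Python's configparser module and is encoded as UTF-8; neither is confirmed by this help file. The function name read_settings is hypothetical, and no section or option names are assumed.

```python
import configparser
from pathlib import Path


def read_settings(install_dir):
    """Load canarydc.ini from the installation folder into nested dicts.

    Hypothetical helper for illustration; assumes standard INI syntax.
    """
    parser = configparser.ConfigParser()
    config_path = Path(install_dir) / "canarydc.ini"
    # read() returns the list of files it parsed; an empty list means
    # the file was not found.
    if not parser.read(config_path, encoding="utf-8"):
        raise FileNotFoundError(config_path)
    return {section: dict(parser[section]) for section in parser.sections()}
```

The result maps each section name to a dictionary of its options, which makes it easy to print the current settings before editing the file by hand.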

 


 

H) The Command Line Interface

Canary Data Converter comes with a cross-platform command line interface. On Windows, if you open the folder in which Canary Data Converter was installed, you can run the command line interface using the CanaryDC-cli.exe executable. To get a full overview of the available options, use the --help flag. The main options are listed below:

  -h, --help
        Show this help message and exit

  --input-format {}, --in {}, -i {}, --input {}
        Format to convert from

  --output-format {}, -o {}, -out {}, -output {}
        Format to convert to

  --cpu-processes PROCESSES, --process PROCESSES, --processes PROCESSES
        The number of files to process at a time

  --gui, -g
        Launch the graphical user interface

  --report-progress, -p, --show-progress, --progress
        Report progress on the command line

  --open-config-file, --config, -c
        Open the configuration file

  --reset-config-file, --reset-config, --reset, -r
        Reset the configuration file to default

  --quiet, -q
        Suppress all output to standard out

  --version, -v
        Get Canary Data Converter version number

All other options are format-specific and can be found using the --help or -h flags.

To use Canary Data Converter from the command line on Mac or Linux, you must have Python 3 and pip installed. To install Python 3 and pip, visit https://www.python.org/downloads/ and install the latest version of Python 3. Make sure that the option in the installation dialog for Python to install pip is selected. Once installed, run the following command from the command line:

               pip3 install canarydc

After running that command, you should be able to run Canary Data Converter using the canarydc command with the flags listed above. To get a full list of the available options, run:

               canarydc --help
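
For scripted use, the documented flags can be assembled from Python. The sketch below is an assumption-laden illustration: build_command and show_help are hypothetical helper names, the format values you pass must be taken from the output of canarydc --help (they are not listed in this file), and format-specific options such as input and output locations are not covered.

```python
import subprocess


def build_command(input_format, output_format, processes=None,
                  executable="canarydc"):
    """Assemble an argument list from the flags documented in this section.

    Hypothetical helper for illustration. Valid format names are not listed
    in this help file; check the --help output for them.
    """
    command = [executable,
               "--input-format", input_format,
               "--output-format", output_format]
    if processes is not None:
        command += ["--cpu-processes", str(processes)]
    return command


def show_help(executable="canarydc"):
    """Run the documented --help flag and return the text it prints."""
    result = subprocess.run([executable, "--help"],
                            capture_output=True, text=True)
    return result.stdout
```

Building the argument list as a Python list, rather than a single string, avoids shell quoting problems when paths or option values contain spaces.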


Frequently Asked Questions

1)      What is Canary Data Converter?

Canary Data Converter is a free, open-source, extensible converter intended to help users convert other file formats to the format used by Canary, a user-friendly information extraction tool.

Canary is a free, open-source software program for extraction of information from narrative text. Canary was designed for users without a software engineering or computer science background. Canary uses a graphical user interface to guide users through creation of a language model (a set of rules) that will allow them to extract a concept of interest from electronic text (e.g. whether or not the patient had a mass reported on their brain MRI; what their blood pressure was; whether the patient suffered an adverse reaction to a medication; etc.). Canary was initially designed for use with medical texts, but can be used outside of medicine as well.

2)      How do I install the Canary Data Converter?

Canary Data Converter can be installed in any folder or even on a flash drive. You can extract the files from the zip folder into any folder.

3)      What operating systems does Canary Data Converter run on?

Canary Data Converter should work on any Windows computer. It has been tested on Windows 7, Windows 8, and Windows 10, as well as Windows Server 2012.

4)      Can I run Canary Data Converter on a Mac or Linux?

Canary Data Converter is written entirely in Python and is cross-platform. To install Canary Data Converter on Mac or Linux, you must have Python 3 and pip, the Python package manager, installed on your computer. To do this, go to https://www.python.org/downloads/ and install the latest version of Python. If there is an option to install pip in the Python installation dialog, ensure that it is checked.

Once Python and pip are installed, open the terminal. On a Mac, you can do this by going to the Utilities folder located in the Applications folder and opening the Terminal application. On Linux, search for Terminal and open the application. Once you are in Terminal, run the following command:

               pip3 install canarydc

After running that command, you should be all set to run Canary Data Converter from the command line. To use the graphical user interface, run the following command:

               canarydcgui

 

5)      Can I run Canary Data Converter on a cluster?

Canary Data Converter cannot currently make use of distributed computing systems.

6)      Can I run Canary Data Converter from the command line?

Yes. Canary Data Converter has a command line interface. On Windows, run the CanaryDC-cli.exe executable from the folder in which Canary Data Converter was installed. On Mac or Linux, install Python 3 and pip, run pip3 install canarydc, and then use the canarydc command. For the list of main options, see section H) The Command Line Interface, or run the program with the --help flag.

7)      How much hard drive space does Canary Data Converter need?

Currently, the software requires around 20 MB of disk space.

8)      How fast is Canary Data Converter?

This depends on the number of threads you use in parallel to process the text (more threads means faster processing). For example, Canary Data Converter can process 1 gigabyte of text in approximately 1 minute.

9)       Does Canary Data Converter require a UMLS or any other licenses?

No, Canary Data Converter does not require any licenses.

10)  How can I get support?

Contact us at canary@partners.org.

11)  Does Canary Data Converter connect directly to any electronic medical records systems?

Not at this point. You will need to extract EMR data from your system manually and process it with Canary Data Converter.

12)  Can Canary Data Converter be used with languages other than English?

Canary Data Converter allows the user to select the input and output encoding, so it does support languages other than English.
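
To see why selectable encodings matter for non-English text, consider this small Python sketch. It is a generic illustration of re-encoding, not the converter's code; the function name reencode is hypothetical.

```python
def reencode(data, input_encoding, output_encoding):
    """Decode bytes with the input encoding and re-encode for the output.

    Hypothetical helper for illustration of input and output encodings.
    """
    text = data.decode(input_encoding)
    return text.encode(output_encoding)
```

For example, reencode(raw_bytes, "latin-1", "utf-8") turns Latin-1 bytes into UTF-8 bytes while keeping accented characters intact, provided the input encoding is chosen correctly.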

13)  How much does Canary Data Converter cost?

Canary Data Converter is free software provided at no cost to the user.