caBIG Model Creation Guide


Introduction


This guide will walk you through the steps to create an information model using model creation and
semantic annotation tools employed in caBIG.

The development of a caGrid data service that enables access to a database managed by a relational
database system can be accomplished in five main steps:

  1. Creation of an information model. The information model consists of two components, an object
    model and a data model. The object model presents an object-oriented view of the backend
    database and is the Grid-level representation for querying and retrieving data through the data
    service. The data model represents the relational schema of the database. The object model is
    mapped to the data model so that object-oriented queries can be translated into relational database
    queries.
  2. Semantic annotation of the object model. This step is needed to facilitate semantic interoperability.
    The object model allows for syntactic interoperability and programmatic access to the data resource
    (the database) via a common, well-defined, published representation layer. The semantic annotation
    of the object model enables a third-party consumer of the data source to understand the meanings
    of the data elements in the model and correctly consume them.
  3. Harmonization and registration of the information model through the caBIG compatibility process.
    This process ensures that the information model (a) is compatible with the common data elements
    and controlled vocabularies employed by the caBIG community, (b) properly reuses already
    registered data elements for greater interoperability, (c) is annotated with proper concepts from
    controlled vocabularies, and (d) is registered in the cancer Data Standards Repository (caDSR) and
    Enterprise Vocabulary Services (EVS) for caBIG-wide availability. Note that the caBIG compatibility
    process involves registration and review steps carried out by experts and may require changes to
    the initial information model and annotations.
  4. Creation of a data-oriented system using the caCORE SDK tools. This step creates a client-server
    application that provides support for querying the backend database through an object-oriented
    view expressed by the information model.
  5. Creation of a caGrid data service using the Introduce toolkit. This step creates a data service
    interface with an underlying runtime environment and client APIs using the data-oriented system
    generated by the caCORE SDK.

This guide will focus on steps 1 and 2. Specifically, the following tutorial is designed to lead the user
through the steps involved in creating an annotated information model that can be ingested by the
caCORE SDK. The steps involved in creating the model include the following:

  1. creating a domain object model (also referred to as a logical model or domain model),
  2. constructing a data model that matches the schema of the backend database,
  3. mapping the logical model to the data model to create a persistent object-relational translation, or
    object-relational mapping, and
  4. semantically annotating the model.

Target Audience


This guide is designed for users who would like to understand the use of various tools employed for
developing annotated information models that can be consumed by the caCORE SDK. The guide assumes
that you are familiar with the caBIG caCORE modeling process. More information about this process can
be found in the caCORE Overview.

Required and Recommended Tools


Several tools will be used to carry out the steps described in the Introduction. These tools are listed in this
section.

  • Enterprise Architect version 6.5
    • Enterprise Architect is a UML modeling software system available from Sparx Systems.
      Enterprise Architect is the recommended tool. However, application developers can utilize
      ArgoUML, an open-source, freely available tool, for developing information models in UML.
  • caAdapter MMS version 4.1
    • caAdapter is a caBIG application that makes it easier to map a logical model to a data model.
      caAdapter MMS 4.1 can be launched via a Java Web Start link.
  • Semantic Integration Workbench (SIW) version 4.0.1.1
    • The Semantic Integration Workbench is a tool that facilitates semantic annotation of your logical
      model. The SIW can be started with Java Web Start from http://cadsrsiw.nci.nih.gov/.

Limitations and Requirements to Note


The current tooling imposes several requirements. An application or model developer should meet these
requirements when creating a new model or revising an existing model in UML for consumption by the
caCORE SDK. Doing so will reduce or eliminate possible sources of error during the implementation of
the model. The following list outlines these requirements:

  • Avoid using Java keywords in the UML model. When generating Java code from the UML model, the
    use of keywords can cause elements or packages to be named in unexpected ways. Please see the
    following item on the NCI GForge for a description of the issue: http://gforge.nci.nih.gov/tracker/index.php?func=detail&aid=20104&group_id=25&atid=2252
  • If you want to use ArgoUML for model creation, you must use version 0.24 or version 0.26. ArgoUML
    version 0.28 does not work with the SIW and caCORE SDK. Support for v0.28 is being integrated
    into the tooling and will be available in a future release of the SIW and caCORE SDK.
  • The caCORE SDK programmer's guide does not explain how to create associations for a model. This
    process is explained to some extent in the SIW and UML Loader v4.0.1 documentation at https://gforge.nci.nih.gov/docman/view.php/16/17244/caCORE_SIW_UMLLoaderGuide_v401v2.pdf. In
    particular, the documentation describes what parts of the association are necessary to include in
    the UML model, such as role names and multiplicity. In the following sections, we will explain how
    associations can be created in a model.
  • Long is a valid caCORE SDK data type, although the caCORE SDK programmer's guide does not
    state this clearly. The "Long" data type is not defined in the ArgoUML template model
    (SDKArgoTemplate.uml) which is part of the caCORE SDK distribution. It is defined in the sample
    model (sdk.uml) in the caCORE SDK distribution. The Long class is defined in the Model/Logical
    View/Logical Model/java/lang package in that model. It is also referenced as a type in the
    'longValue' attribute of the gov.nih.nci.cacoresdk.domain.other.datatype.AllDataType class located
    under the Model/Logical View/Logical Model/gov/nih/nci/cacoresdk/domain/other/datatype package.
    Once you locate the AllDataType class, select the 'Properties' tab. The Long class should be shown
    as the 'Type'.
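
The first requirement in the list above can be checked before any code is generated. The following Python sketch is not part of the caCORE SDK or the SIW; it only compares model names against the reserved words of the Java language. The function name find_keyword_clashes and the two "bad" sample names are hypothetical, and the case-sensitive comparison is an assumption about how the tooling behaves.

```python
# Reserved words of the Java language (keywords plus the literals
# true, false and null).
JAVA_RESERVED = frozenset("""
abstract assert boolean break byte case catch char class const continue
default do double else enum extends final finally float for goto if
implements import instanceof int interface long native new package private
protected public return short static strictfp super switch synchronized this
throw throws transient try void volatile while true false null
""".split())


def find_keyword_clashes(names):
    """Return the names that contain a Java reserved word.

    Dotted names (packages) are checked segment by segment. The comparison
    is case-sensitive, as in Java itself; this is an assumption, since the
    guide does not say how strictly the SDK tooling compares names.
    """
    clashes = []
    for name in names:
        if any(segment in JAVA_RESERVED for segment in name.split(".")):
            clashes.append(name)
    return clashes


if __name__ == "__main__":
    candidates = [
        "gov.nih.nci.training.BootCamp.domain",  # package used in this guide
        "ActiveSite",                            # class used in this guide
        "gov.nih.nci.new.domain",                # hypothetical bad package
        "package",                               # hypothetical bad attribute
    ]
    for bad in find_keyword_clashes(candidates):
        print("Java reserved word in:", bad)
```

Note that "Long" passes this check, because only the lowercase "long" is reserved in Java.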

Creating an Annotated Information Model


Creating a UML Project

The caCORE SDK requires that the UML model be developed in a specific package layout in a UML project.
There are two ways to create a UML project with the required package layout in Enterprise Architect (EA).
The preferred way is to use an EA project template provided in the caCORE SDK distribution. Alternatively,
a new EA project and the required layout can be created manually. There are some advantages to each
approach, which are discussed below.

Using Template

This method is the recommended way to set up a new project in EA. The advantage to this approach is
the minimization of errors in the layout. It also involves less effort and allows the model building process
to proceed quickly.

Start by downloading a project template for EA from an attachment on this page, the EA SDK Template.
Alternatively, the template can be found in the caCORE SDK distribution, which can be downloaded from http://gforge.nci.nih.gov/frs/?group_id=148.

After obtaining the template, start EA and select "File" from the menu, and choose the "Open Project"
entry. Choose the downloaded template from the location where it was saved.

When the template project is loaded, there are a few things to note. The template comes with some of
the needed packages defined in the "Project Browser" pane. The "Logical Model" package will hold the
domain classes for the model, while the "Data Model" package will hold all of the data classes for the
model. The template also defines some of the common Java classes used in models, such as String,
Integer, Long, etc. The following image shows this new project.

Using the template works well if you are creating a new UML model. However, if you
are starting with an existing UML model, the manual approach discussed next may be more
suitable.

Creating Packages Manually

As noted above, manual creation of the UML project and packages may be a viable approach when
starting with an existing model. However, the existing model needs to be imported into the Enterprise
Architect project in a specific way, to make it compatible with caCORE SDK.

To begin, create a new project in Enterprise Architect by using the "File" menu item, and selecting the
"New Project" item (alternatively, use the Ctrl+N keyboard shortcut). Use the file save dialog to save this
new project file in a location of your choosing.

  • With the new project in place, use the "Project Browser" view in EA (this view can be brought up
    using the "View" menu, and choosing the "Project Browser" item).
  • In that pane, right-click on the "Model" node of the tree, and select "New View".
  • Name this new view "Logical View" in the configuration window that pops up, and choose "Class
    View" as the icon style.
  • Click on the "Logical View" node, and select the "Add" item, and choose "Add Package" from the
    options. Create two packages, named "Data Model", and "Logical Model".

The existing UML model hierarchy should be imported into the "Logical Model" package. The "Data Model"
package will hold the data model, which is derived from the logical model. See the image in the "Using
Template" section above to see the completed layout.

Java Classes

The caCORE SDK requires that Java classes be used as the types of attributes in the domain model
classes. The model template provided by the caCORE SDK distribution contains the Java class package as
shown in the figure below.

If the model is being developed without the aid of the template, the Java class package should be created
as shown in the figure.

Creating Logical Model


With the project and the basic package layout created, we can begin to create the domain
model.

First, we need to create a model hierarchy, just like a Java class hierarchy. In this guide, the hierarchy will
be gov.nih.nci.training.BootCamp.domain.

  • To begin, right-click on the "Logical Model" package in the "Project Browser" pane, and select "Add"
    and then "Add Package" from the options.
  • In the pop-up window that is presented, enter "gov" and click the "OK" button. Now, right-click on
    the "gov" package, and create a new sub-package named "nih". Repeat this process to create the
    entire hierarchy.

The image below shows the hierarchy, expanded in the "Project Browser" section of the EA window.
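
As a quick reference, the sketch below lists the packages in the order they are created, together with where each one should end up in the Project Browser. It is an illustration only: the helper name package_paths is hypothetical, and the "Model/Logical View/Logical Model" root is assumed from the package layout described earlier in this guide.

```python
def package_paths(qualified_name, root="Model/Logical View/Logical Model"):
    """List the browser path of each nested package, outermost first.

    The root path is an assumption based on the layout described in this
    guide; adjust it if your project is organized differently.
    """
    paths = []
    current = root
    for segment in qualified_name.split("."):
        current = current + "/" + segment
        paths.append(current)
    return paths


if __name__ == "__main__":
    for path in package_paths("gov.nih.nci.training.BootCamp.domain"):
        print(path)
```

Each printed path corresponds to one "Add Package" step, so the hierarchy used in this guide takes six steps.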

The "domain" package will hold the elements in our domain model.

  • Create a diagram to visually represent all of the elements. Right-click on the "domain" package, and
    select "Add".
  • From the given options, choose "Add Diagram". For the name, enter "Logical Model". The other
    options can be left as the defaults. The following image shows the pop-up window used to configure
    the new diagram.

The created diagram will be shown in the main portion of the EA window.

  • Right-click anywhere on the blank diagram and select "Create Element or Connector".
  • From the options, choose "Class".
  • In the configuration window that pops up, enter "ActiveSite" in the "Name" field. The other options
    can be left as the default.

The following image shows this configuration window.

After adding the "ActiveSite" class, we will add another class, called "BindingSite". Follow the example
above to create the class. After that addition, the diagram should look like the following image.

Now, we will create attributes in a class in the UML model.

NOTE: We will use the Java classes from the Java class package to define the types of the attributes.

  • To begin, create a new class called "ProteinFeature", which will be the parent class for classes
    "BindingSite" and "ActiveSite".

  • Once the class has been created, right-click on it in the diagram, and select "Attributes".
    In the configuration window that pops up, use the following values for the fields: "Name" is "id",
    "Type" is "Long", and "Scope" is "Protected". The "Notes" field is used to describe the class. For
    example, the "ActiveSite" element can be described as follows: "Amino acid(s) involved in the
    activity of an enzyme.".
  • When done, click the "Save" button to add the attribute to the class. The image below shows this
    input form.

Similarly, add the following attributes to the "ProteinFeature" class:

Name         Type     Scope
begin        Integer  Protected
end          Integer  Protected
description  String   Protected

After adding the attributes, the model should look like the image below.

We now define the generalization (inheritance) relationships from "ActiveSite" and "BindingSite" to
"ProteinFeature".

  • Select the "Generalize" tool from the toolbox pane.
  • Click and drag the pointer in the diagram, from "BindingSite" to "ProteinFeature".
  • Repeat the process for "ActiveSite".

The result should look like the following image.
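
To see what the generalization means in practice, the hierarchy can be pictured as ordinary classes. The sketch below is a Python stand-in, not the Java code that the caCORE SDK generates: the "Protected" scope has no direct Python equivalent, Python's int stands in for both Long and Integer, and the helper attribute_names is hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class ProteinFeature:
    """Parent class; carries the attributes added in this section."""
    id: int           # "Long" in the UML model
    begin: int        # "Integer" in the UML model
    end: int          # "Integer" in the UML model
    description: str  # "String" in the UML model


@dataclass
class ActiveSite(ProteinFeature):
    """Amino acid(s) involved in the activity of an enzyme."""


@dataclass
class BindingSite(ProteinFeature):
    """Specialization of ProteinFeature; declares no attributes of its own."""


def attribute_names(model_class):
    """Return declared and inherited attribute names, parent attributes first."""
    return [field.name for field in fields(model_class)]


if __name__ == "__main__":
    site = ActiveSite(id=1, begin=10, end=25, description="example feature")
    print(attribute_names(ActiveSite))
    print(isinstance(site, ProteinFeature))
```

Although "ActiveSite" and "BindingSite" declare nothing themselves, both expose the four attributes of "ProteinFeature", which is the effect of the "Generalize" connector in the diagram.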

Classes in a model can be associated with each other using one-to-one, one-to-many, or many-to-many
relationships. As an example, we can create a many-to-many relationship between the "ProteinFeature"
element, and another element named "Protein".

  • Start by creating a new element per the steps above, and name it "Protein".
  • Choose the "Associate" tool from the toolbox on the left.
  • Click on the "Protein" element with the "Associate" tool selected and drag the association line over
    to the "ProteinFeature" element.
  • To define the properties of the association, double-click on the association line in the diagram. This
    will open a configuration screen.
  • In the first tab, labeled "General", set the "Link Name" field to "(Protein -> ProteinFeature)" and the
    "Direction" field to "Bi-Directional".

The image below shows the tab with relevant information.

There is nothing to fill out on the next tab, "Constraints". Click on the "Source Role" tab, fill in
"proteinCollection" for the "Protein Role" field, and choose "0..*" from the "Multiplicity" drop-down.

The image below displays this "Source Role" tab.

Similarly, in the "Target Role" tab, fill in "proteinFeatureCollection" for the "ProteinFeature Role" field, and
choose "0..*" from the "Multiplicity" drop-down.

The image below shows the diagram after the association has been configured.

Another example of an association is the 1-to-1 association. To model this, we repeat the process above,
with a different value in the "Multiplicity" drop-down for the source and target roles. For our example, we
create an element named "ProteinSequence", and draw an association from the "Protein" element, to the
new "ProteinSequence" element.

  • Double-click on the association and fill in "(Protein -> ProteinSequence)" as the association name,
    and choose "Bi-Directional" as the "Direction" type.
  • In the "Source Role" tab, enter "protein" as the name, and choose "1" in the "Multiplicity" drop-down.
  • In the "Target Role" tab, enter "proteinSequence" as the name, and choose "1" in the "Multiplicity" drop-down.

The image below shows this new association in the diagram.
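
Since role names and multiplicity are the parts of an association that the tooling needs, it can be useful to write the two associations down as data and check that nothing is missing. The sketch below does this in Python. The class names AssociationEnd and Association, the function association_problems, and the list of accepted multiplicities are all assumptions made for illustration; they are not taken from the caCORE SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssociationEnd:
    class_name: str
    role_name: str
    multiplicity: str


@dataclass(frozen=True)
class Association:
    name: str
    source: AssociationEnd
    target: AssociationEnd
    bidirectional: bool = True


# Assumption: a small set of common UML multiplicities, not an SDK-defined list.
ALLOWED_MULTIPLICITIES = {"1", "0..1", "0..*", "1..*"}

# The two associations from this section, with the values entered in EA.
PROTEIN_TO_FEATURE = Association(
    "(Protein -> ProteinFeature)",
    source=AssociationEnd("Protein", "proteinCollection", "0..*"),
    target=AssociationEnd("ProteinFeature", "proteinFeatureCollection", "0..*"),
)
PROTEIN_TO_SEQUENCE = Association(
    "(Protein -> ProteinSequence)",
    source=AssociationEnd("Protein", "protein", "1"),
    target=AssociationEnd("ProteinSequence", "proteinSequence", "1"),
)


def association_problems(assoc):
    """Return a list of messages for missing role names or odd multiplicities."""
    problems = []
    for label, end in (("source", assoc.source), ("target", assoc.target)):
        if not end.role_name:
            problems.append(f"{assoc.name}: {label} role name is missing")
        if end.multiplicity not in ALLOWED_MULTIPLICITIES:
            problems.append(
                f"{assoc.name}: {label} multiplicity "
                f"{end.multiplicity!r} is not recognized"
            )
    return problems


if __name__ == "__main__":
    for assoc in (PROTEIN_TO_FEATURE, PROTEIN_TO_SEQUENCE):
        print(assoc.name, association_problems(assoc) or "looks complete")
```

An association with an empty role name or an unexpected multiplicity produces one message per problem, which makes it easy to spot an end that was left at its default in the EA dialog.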

Using the method described in this section, the rest of the model can be built. Please refer to the
completed Boot Camp model for the rest of the logical model information.

Creating Data Model


In this section, we will add the data model.

The first step is to create a diagram under the "Data Model" element in the "Project Browser" pane of the EA window.

  • Right-click on the "Data Model" element, and select "Add", then the "Add Diagram" option.
  • Name the diagram "Data Model", the same as the top-level folder.
  • In the diagram, right-click and select "Create Element or Connector", and then select "Table".
  • In the form that pops up, fill in "ACTIVE_SITE" for the name, and select "Oracle" as the
    database type.

The image below shows the configuration dialog box.

  • To define a column in the new table which has been created, right-click on the table in the diagram,
    and select "Attributes".
  • In the dialog box that pops up, enter "ACTIVE_SITE_ID" as the "Name", "Number" as the "Data
    Type", "38" as the "Precision", and "0" as the "Scale". This column will be our primary key, so click
    the checkbox next to "Primary Key", which should also automatically check the "Not Null" checkbox.
  • Click "Save" to add the column to the table. The image below shows this completed form.
  • Similarly, add a table for "BINDING_SITE", with a "BINDING_SITE_ID" column.
  • Create a table for "PROTEIN_FEATURE", and add the primary key "PROTEIN_FEATURE_ID".

In the "PROTEIN_FEATURE" table, we also need to add columns for the other attributes in our logical model.
The following table has the appropriate values for those columns.

Name         Type      Access     Additional
BEGIN        Number    Protected  Precision=8, Scale=2
END          Number    Protected  Precision=8, Scale=2
DESCRIPTION  VARCHAR2  Protected  Length=250

After the elements have been added, the diagram should look like the image below.
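
The table and key names in this section follow a visible pattern: the class name in upper case with underscores between words, and "_ID" appended for the primary key. The sketch below captures that pattern in Python. It is inferred from the example names only; neither the caCORE SDK nor caAdapter is known to require it, and the function names table_name and primary_key_name are hypothetical.

```python
import re


def table_name(class_name):
    """Turn a class name such as 'ActiveSite' into 'ACTIVE_SITE'."""
    # Insert "_" wherever a lowercase letter or digit is followed by a capital.
    with_underscores = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", class_name)
    return with_underscores.upper()


def primary_key_name(class_name):
    """Name the primary key column after the table, as in this guide."""
    return table_name(class_name) + "_ID"


if __name__ == "__main__":
    for name in ("ActiveSite", "BindingSite", "ProteinFeature"):
        print(name, "->", table_name(name), "/", primary_key_name(name))
```

Keeping the names predictable in this way makes the later mapping step easier, since each class can be matched to its table by eye.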

Mapping Object Model to Data Model


In this section, we will set up the mapping between the data model and the logical model. We assume
that by this step you have completed the entire Boot Camp data and logical models.

To map the data model to the logical model, we will use the caAdapter Model Mapping Service tool. You can download the tool from http://ncicb.nci.nih.gov/download/. On the download page, click on the caAdapter download link. After agreeing to the license terms,
the next page will link to the caAdapter Model Mapping Service application zip file. Extract the zip file
contents to a location of your choice.

The caAdapter MMS application cannot consume models in the .EAP file format that Enterprise Architect
uses, so we must export the models in XMI format. Right-click on the "Logical View" entry in the
Project Browser pane of the EA window, and select "Export Model to XMI".

Note: It is important to export from the "Logical View" node rather than the top-level package;
otherwise, incompatibilities will cause errors in caAdapter.

EA will pop up a dialog box to configure the export. The options can be left as the defaults.
Simply choose the location of the exported XMI file.

To take full advantage of the caAdapter MMS application, we need to download a library
manually and add it to caAdapter MMS. The library is JGraph, an open source graphics library.
To download JGraph, visit http://www.jgraph.com. Click on the "Downloads" link, and then click on
the JGraph link (Note: do not use the new JGraph X package, as it is incompatible). The downloaded JAR
file is an installer that can be used to install JGraph into a location of your choice.

  • Run the installer by opening a command shell, and typing
    "java -jar <DOWNLOAD_LOCATION>/jgraph-latest-lgpl-src.jar".
  • Select a temporary location to install the application, as it is only needed until we can get the
    necessary library.
  • Copy jgraph.jar from <JGRAPH_INSTALL_DIR>/lib/jgraph.jar to
    <CAADAPTER_DIR>/lib/jgraph.jar.

Now we are ready to use caAdapter MMS to map the data model to the domain model. caAdapter shows
the domain model in the left-hand pane, and the data model on the right side. To map two entities, you
first need to create a link between the entities. For example, click on 'ActiveSite' in the domain model,
drag the mouse over to 'ACTIVE_SITE' in the data model, and release the mouse button. This will
create a top-level association. Click on the 'id' field under ActiveSite in the domain model, and drag the
pointer over to 'ACTIVE_SITE_ID' in the data model. This will link the two attributes to each other.

In the image below, you will see how the three entities that were created earlier in the tutorial are
mapped. Notice that because of the generalization relationship from ActiveSite and BindingSite to
ProteinFeature, caAdapter displays "(A - derived)" next to the derived features in ActiveSite and
BindingSite. These attributes do not have to be mapped. They are instead mapped in ProteinFeature, the
parent of the generalization relationship.

After associating all of the entities, save the XMI file in caAdapter MMS. We will import this modified
XMI file into the Semantic Integration Workbench application to annotate the model with semantic
information. This step is discussed in the next section.
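
Before passing an XMI file from one tool to the next, it can help to confirm that the expected classes are actually in it. The Python sketch below lists every element whose tag is "Class" and that has a "name" attribute. The embedded sample is a hand-written, hypothetical fragment and not real Enterprise Architect or caAdapter output; the actual file layout, namespace, and element names may differ, so treat the matching rule as an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written fragment used only to exercise the function.
SAMPLE_XMI = """<?xml version="1.0"?>
<XMI xmi.version="1.1" xmlns:UML="omg.org/UML1.3">
  <XMI.content>
    <UML:Model name="EA Model">
      <UML:Namespace.ownedElement>
        <UML:Class name="ActiveSite"/>
        <UML:Class name="BindingSite"/>
        <UML:Class name="ProteinFeature"/>
      </UML:Namespace.ownedElement>
    </UML:Model>
  </XMI.content>
</XMI>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def class_names(xmi_text):
    """Return the sorted names of all 'Class' elements in an XMI document."""
    root = ET.fromstring(xmi_text)
    return sorted(
        element.get("name")
        for element in root.iter()
        if local_name(element.tag) == "Class" and element.get("name")
    )


if __name__ == "__main__":
    print(class_names(SAMPLE_XMI))
```

For a file on disk, ET.parse(path).getroot() can be used in place of ET.fromstring. Depending on how the exporting tool serializes tables, the tables of the data model may appear in the list alongside the domain classes.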

Semantic Annotation of Model


This section will use the Semantic Integration Workbench (SIW) application to mark up the model with
semantic information.

Start the SIW using Java Web Start by visiting http://cadsrsiw.nci.nih.gov/. When the application starts, a
list of possible steps will be presented. We begin with step 1, "Review Unannotated XMI File".
The startup window is shown in the image below.

After selecting "Step 1 (Review Unannotated XMI File)", you will be presented with a file selection dialog.
Select the XMI file that was written out by Enterprise Architect during the export of the UML model. Leave
the "Choose Classes and Packages" option unchecked. The image below shows the file selection dialog
window.

When the application is finished parsing the file, the left pane of the window will display the elements
declared in the UML file, the bottom right pane will display any informational or error
messages related to the model, and the top right pane will display any semantic information related to
the selected element. If there are any errors (e.g., missing "description" tags), the model should be edited
in EA to fix the errors, and re-exported. Repeat until the model does not have any errors. The image
below shows an unannotated ActiveSite element.

To annotate the model, re-run the SIW and select step 3, "Run Semantic Connector". When the second
screen is presented, select the XMI file that has been verified by SIW's step 1. Clicking the "Next" button
will begin processing the XMI file. The SIW will connect to the cancer Data Standards Repository (caDSR),
try to match each element in the model to a possible Common Data Element (CDE), and produce a file
named "FirstPass_<your_file_name>.xmi". This file will contain the model, with any possible annotations
that were found by SIW.

To see how the SIW updated the model, start SIW once again, and choose step 5, "Review Annotated XMI
File". The "ActiveSite" class should now be properly annotated, using NCI Thesaurus concepts.
The image below shows the annotated element.

Now, let us suppose that SIW cannot find a match for the "ActiveSite" element while running the Semantic
Connector step. The following steps describe how to add the semantic information by manually searching
the Enterprise Vocabulary Services (EVS) for the concepts. To add semantic information to the model, click
the "Add" button. This will bring up a search interface for EVS. The image
below displays the search interface.

Clicking on the "Search EVS" button will bring up a new window, which allows you to search for a
semantic concept for your element via several means. For example, you may enter "active site" in the
search field, and search by "Synonyms". The image below shows the results for this type of search.

Double-clicking on a result in the SIW search interface will select that concept as the semantic
information to use for the selected element. The "Concept Code", "Concept Preferred Name", "Concept
Definition", and "Concept Definition Source" fields should be filled out. Check the image above that shows
the completed annotation to verify that the results are similar.

After all of the elements have been marked up, save the XMI file from SIW. The XMI file is now annotated
and ready for the caCORE SDK to use in generating services.

Next Steps

This tutorial used classes, tables, and semantic annotations as defined in the caBIG bootcamp model used in Tutorial 3b: caBIG Developer Boot Camp for caCORE SDK 4.1.1. You will find the finished model provided in the caBIG developer bootcamp tutorial.

Last edited by Clayton Clark (4 days ago), ...