Enterprise Content Management Integration – ECM

Many BPM solutions work with document and file content such as Microsoft Word documents, PDFs, Excel spreadsheets, image files and other pre-existing information items. Instances of these documents may be needed while handling a process instance. Consider an insurance claim for a car accident. This may involve an initial claim form, a police report, pictures of the accident and faxes from garages with quotes for repair. To process the claim, a clerk may work through a process instance which must make these documents available. The documents have to be stored somewhere within an IT system, and this is where the category of software product called an Enterprise Content Management (ECM) system comes into play. An ECM is a software product that explicitly manages the existence, organization, search and retrieval of instances of documents. A variety of vendors offer such ECMs. These include:


In addition, the BPM product itself provides an in-built document repository.

In this section we will discuss the support provided by IBM BPM for integrating with ECMs such that content from documents can be managed as part of the process solution.

See also:

Content Management Interoperability Services – CMIS

If we consider what we want from an arbitrary ECM we will find a common set of required functions. Capabilities that we obviously want are:


Historically, each provider of an ECM supplied its own set of APIs for interacting with it. Each API was well suited to a specific brand of ECM; however, if we wished to change ECM providers, we would be stuck with the job of re-coding our applications to use the different APIs. What we need is an industry-standard API that every ECM vendor can expose. With such an API, we need only write our application once against an abstract ECM and can then choose an ECM provider based on other qualities such as performance, capacity, price or other features.

Content Management Interoperability Services (CMIS) is an open standard from the standards body OASIS. CMIS provides a vendor-neutral API for interacting with ECMs. CMIS is what IBM BPM uses to interact with back-end ECM systems, thus insulating BPM from any single provider. A number of ECMs support this standard, including:


CMIS provides the following functions and here we indicate which ones are supported by IBM BPM:


See also:


Document Types

When we think of a file on the file system of an operating system, we think of it as containing content and having a name. There are other attributes of files such as their owner and the date/time they were last modified. However, that is about the limit of the metadata of a file. Documents held within an ECM are different from a simple file. When a document is created within the ECM, we tell the ECM what "type" of document it is. By doing this, we have implicitly associated a type template with that document, and this is very valuable. By associating the template with the document, we can work with it in a richer manner.

For example, suppose we create a document which is my tax return for 2015. If we viewed it, we might see a PDF of a 1099ez (a US tax form). However, if we explicitly state when creating the document within the repository that it is of "type" 1099ez, we may have defined a model of what it contains: for example, my social security number, my salary, my dependents and other meta information related to the document. With that in place, one can retrieve such documents using richer search criteria. For example, to retrieve all my tax filings for the last few years, one could formulate a query such as:

select all documents of type 1099ez where the socialSecurityNumber="123-45-6789"
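The idea can be pictured with a small in-memory sketch. The `typeId` and `metadata` field names, the file names and the sample values below are illustrative assumptions and not the storage model of any particular ECM. The point is only that a typed document carries queryable metadata alongside its content, so the query above amounts to a filter on type and property value.

```javascript
// Hypothetical in-memory stand-in for a repository of typed documents.
var repository = [
  { name: "taxes-2014.pdf", typeId: "1099ez",
    metadata: { socialSecurityNumber: "123-45-6789" } },
  { name: "taxes-2015.pdf", typeId: "1099ez",
    metadata: { socialSecurityNumber: "123-45-6789" } },
  { name: "taxes-2015-other.pdf", typeId: "1099ez",
    metadata: { socialSecurityNumber: "987-65-4321" } },
  { name: "holiday.jpg", typeId: "photo", metadata: {} }
];

// Return the documents of the given type whose metadata property equals the value.
function findByTypeAndProperty(docs, typeId, propertyName, value) {
  return docs.filter(function (doc) {
    return doc.typeId === typeId && doc.metadata[propertyName] === value;
  });
}

var filings = findByTypeAndProperty(
  repository, "1099ez", "socialSecurityNumber", "123-45-6789");
var filingNames = filings.map(function (doc) { return doc.name; });
```

A plain file system could only offer the name and timestamps for such a search; the type definition is what makes the `socialSecurityNumber` property available as a search criterion.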

When using FileNet or IBM BPM Content Store, one can use the ACCE tool to examine the document types, to see which are available and to validate that a document type you created is where it is expected to be.

See also:

Configuring IBM BPM for an external ECM

In order to integrate with an external ECM system using CMIS, you must define a connection to it within your Process Application. Within Process Designer, within the Process App Settings, you will find a tab called Servers. We create a new definition here ensuring the "Type" is "Enterprise Content Management Server".

A definition of this type has a number of attributes associated with it:

Hostname – The host-name or TCP/IP address on which the ECM server can be found.

Port – The TCP/IP port number on the host-name where the ECM server is listening for requests it should perform on behalf of the BPM system.

Context Path – The relative URL part of the context path on which the ECM server is willing to accept Web Service requests for ECM work.

Secure Server – Whether or not the HTTPS protocol should be used for communications. HTTPS provides transport level security.

Repository – The name of the repository.

Userid – The userid used to connect to the ECM server.

Password – The password for the user used to connect to the ECM server.
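To see how the first four attributes relate to each other, the following sketch combines them into a base URL. This is only an illustration: how IBM BPM itself assembles the address is not described here, and the function name, host, port and context path in the example call are all hypothetical.

```javascript
// Combine server definition attributes into a base URL.
// The attribute names on the "server" object are illustrative assumptions.
function buildEcmBaseUrl(server) {
  // "Secure Server" selects HTTPS instead of HTTP.
  var scheme = server.secure ? "https" : "http";
  // Normalize the context path so that it has exactly one leading slash.
  var path = "/" + String(server.contextPath || "").replace(/^\/+/, "");
  return scheme + "://" + server.hostname + ":" + server.port + path;
}

var exampleUrl = buildEcmBaseUrl({
  hostname: "ecm.example.com", // hypothetical host
  port: 9443,                  // hypothetical port
  contextPath: "fncmis",       // hypothetical context path
  secure: true
});
```

Repository, Userid and Password do not appear in the address; they identify which repository to open on that server and the credentials used for the connection.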

The "Test Connection" button may be used to validate the information entered. If all is well, the following dialog will be shown:

Here is a sample Alfresco definition:

Here is a sample for FileNet:

In order to use ECM functions within a BPM solution, we must add an IBM-supplied toolkit as a dependency of our application. Add the toolkit called "Content Management", whose acronym is "SYSCM":

The use of the same user for multiple ECM connections was considered a security vulnerability and has been augmented with a new service that is described in this technote. The vulnerability has been logged as CVE-2015-1904.

The Document List Coach View

After having defined a BPM process app with knowledge of how to connect to an ECM, we would next turn our attention to thinking about how a user would see managed documents from our solution. IBM has supplied a number of Coach Views that perform such services.

To show documents to the user from a Coach, you can add a "Document List" Coach View to the Coach. From the palette of controls, this looks as follows:

When added to the Coach, it looks like:

It has quite a few configuration parameters:

Open in new window – If selected and if a request to show the content of the document is made, then the content is shown in its own browser window. If we don't check this option and wish to show the content of the document then we will have to use a Document Viewer.

Refresh Trigger – A Boolean-valued variable which, if it becomes true, causes the content of the list to refresh, after which the variable is set back to false.

Number of results to show – The number of results for a document search to show in a page. The default is 10.

Show all content – If not selected (default) then the result will show a fixed number of rows on the page and paging will be enabled. If selected, then the fixed number of rows will still show but paging will not be enabled.

Configure for use with – Select the type of document provider. There are two choices:

BPM document options – When the type of repository is the BPM Document Store, these additional options apply:

Display options – This section acts as a filter to be used when displaying items in the list

Associated with process instance – By default, all documents are shown. Selecting this option causes only documents created within the current process instance to be shown.

Display match rule – A selection of how matches are made. There are three options:


Display properties – A list of name and value pairs where the name is that of a property of a document and value is its corresponding value. These display properties are used in conjunction with the "Display match rule" option to filter the set of documents shown to the user.

Upload options

Default upload name – Default name for an uploaded document. If supplied, the "User editable" property defines whether or not the user can alter this value.

User editable – Defines whether or not the user can edit the name of the document to be created.

Add properties – Should additional properties be added during an upload. If yes, those properties are defined in the "Upload properties" list.

Upload properties

Hide in Portal – If selected, the document will not be visible in the document list.

A BPM data type called BPMDocumentOptions is defined that models the settings just described. This data type contains:

displayOptions (BPMDocumentDisplayOptions)


uploadOptions (BPMDocumentUploadOptions)


A JavaScript initializer for this data type might be:

{
  displayOptions: {
    associatedWithProcessInstance: false,
    displayMatchRule: "NONE", // Other choices are "ANY" and "ALL"
    displayProperties: []
  },
  uploadOptions: {
    defaultUploadName: "",
    userEditable: false,
    addProperties: false,
    uploadProperties: [],
    hideInPortal: false
  }
}
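When several Coaches need slightly different settings, it can help to start from these defaults and override only what differs. The following is a minimal sketch using plain JavaScript objects so that it runs anywhere; inside Process Designer the value would normally be a BPMDocumentOptions business object. The function names and the sample upload name are hypothetical.

```javascript
// Defaults mirroring the initializer shown above.
function defaultDocumentOptions() {
  return {
    displayOptions: {
      associatedWithProcessInstance: false,
      displayMatchRule: "NONE",
      displayProperties: []
    },
    uploadOptions: {
      defaultUploadName: "",
      userEditable: false,
      addProperties: false,
      uploadProperties: [],
      hideInPortal: false
    }
  };
}

// Copy the given overrides onto fresh defaults, one section at a time.
function makeDocumentOptions(displayOverrides, uploadOverrides) {
  var options = defaultDocumentOptions();
  Object.keys(displayOverrides || {}).forEach(function (key) {
    options.displayOptions[key] = displayOverrides[key];
  });
  Object.keys(uploadOverrides || {}).forEach(function (key) {
    options.uploadOptions[key] = uploadOverrides[key];
  });
  return options;
}

// Show only documents of the current process instance and suggest an upload name.
var claimOptions = makeDocumentOptions(
  { associatedWithProcessInstance: true },
  { defaultUploadName: "Claim form", userEditable: true }
);
```

Because `defaultDocumentOptions` builds a new object on every call, overriding one Coach's settings cannot leak into another's.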

ECM document options – An instance of CMISDocumentOptions. This data structure has two properties:

Search (user implementation required for ECM documents) – A service which returns the documents to be shown to the user. The default is the IBM supplied service called "Default ECM Search Service". See Document List – Search service.

Get all document versions – A script which gets the versions of each document retrieved

Get document – A script which retrieves an individual document

Get type definition – A script which retrieves the types of all the documents retrieved.

Get type descendants – A script which retrieves the types of documents.

But what does this Coach View look like when it is shown within a Coach? The following is an example screen shot:

What is displayed is a table containing details of the documents that are found within the ECM. If there are more documents than can be displayed in the maximum allowable size, the table will provide paging (if desired). As mentioned, the selection of columns included in the table can be customized.

In the "Actions" column are click-able icons that can provide functions to be performed against the document that a row represents. The magnifying glass icon allows us to view the document's content in an associated content viewer widget.

The pencil icon allows us to edit the properties of the document. When clicked, we can change the properties or upload a new version of the content. This option is only available if the "Allow update" configuration option is checked.

The final icon in the Actions column allows us to view the previous revisions of the document. The revision list is shown in the same view:

Also in the view are two further buttons associated with the view as a whole. The first is called "Refresh" and simply refreshes the view from the content managed by the ECM. If new documents have been added, or old ones deleted the view will reflect those changes after a refresh. The second button is called "Create Document".

Here we can provide a number of parts including the type of document, the name we wish to give to the document and a local file whose content will be uploaded and associated with the new document. The newly created document will be created in the ECM folder defined by the "Parent folder path" configuration attribute. The ability to create new documents is only available if the "Allow creates" configuration option is selected.

The Coach View can be bound to a variable of data type "List of ECMDocumentInfo". This list will be populated with information on the retrieved items. The selected item in the list will change when the user selects a document to view.

See also:

Document List – Search service

When the Document List Coach View loads, it has to get the list of documents from the ECM that it will display to the user. It performs this task by delegating it to a Service defined by the Search configuration parameter. The Search service returns the list of documents to be shown to the end user. For external ECM providers, the Search service must be implemented; there is no default. The Search service can be easily created by clicking the "New" button in the Document List configuration parameters.

However, when using Client Side Human Services, the "New" button is not present.

That isn't good. The solution is to create a temporary Heritage Human Service and add the Document List Coach View into a Coach. From the Heritage Human Service, you will continue to find the "New" button which can be used to create the correct Ajax service. Once done, you can go back to the Client Side Human Service and pick the one you just created. The temporary Heritage Human Service can then be deleted. An alternative would be to copy the service called "Default ECM Search Service" from the "Content Management" toolkit into your local process app and rename/modify that copy. This service can be found in the "User Interface" category as the service is an Ajax service.

The new Ajax service with the correct signature will look as follows:

It is your responsibility as the designer of this service to return a searchResult object from your execution. To achieve this, add a Content Integration component into the service as shown:

In its implementation settings, select your server and the "Search" operation:

Map its required parameters from the signature of the service you are building:

In order to enable the Actions within the Action column of the rows shown in the Document List, the search must include "cmis:objectId" as one of the returned items. This is used as the key to perform the action. It is added by default unless you have chosen to build your own CMIS query. Failure to return this property will result in the actions being grayed out, with a helpful tool-tip hint shown here:
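When you do build your own CMIS query, a small guard can add the missing column before the query is passed to the Search operation. The sketch below assumes a query of the simple form "SELECT columns FROM ..." without column aliases; the function name is hypothetical and any other query shape is returned unchanged.

```javascript
// Return the query with cmis:objectId added to the SELECT list if it is absent.
// Assumes the simple shape "SELECT <columns> FROM ..."; aliases are not handled.
function ensureObjectIdSelected(query) {
  var match = /^(\s*SELECT\s+)([\s\S]+?)(\s+FROM\s)/i.exec(query);
  if (!match) {
    return query; // Not a shape this sketch understands; leave it untouched.
  }
  var columns = match[2].split(",").map(function (column) {
    return column.trim();
  });
  var hasObjectId = columns.some(function (column) {
    // "SELECT *" already returns every property, including the object id.
    return column === "*" || column.toLowerCase() === "cmis:objectid";
  });
  if (hasObjectId) {
    return query;
  }
  // Insert the extra column at the end of the existing SELECT list.
  var insertAt = match[1].length + match[2].length;
  return query.slice(0, insertAt) + ", cmis:objectId" + query.slice(insertAt);
}

var guardedQuery = ensureObjectIdSelected("SELECT cmis:name FROM cmis:document");
```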

See also:

Document List – Search Service – Properties

The properties of a document can be found in the "Properties" attribute of CMIS. It seems that the individual properties don't appear by themselves as top level properties but are rather encoded within this single property.

Document List – Search Service – The provided cmisQuery parameter

You will have noticed that one of the parameters passed into the service is called "cmisQuery". This is a string which contains the CMIS query that reflects the settings made within Process Designer. If we ignore this cmisQuery, we ignore the settings that we made. It is odd to note that the raw settings (an instance of BPMDocumentOptions) are not supplied to the service. Since they are not supplied, we don't have the information needed to factor these settings into our own search. One workaround (read "hack") is to understand the structure of the generated cmisQuery and hope that it remains constant from release to release. If this is the case, we can use string manipulation on the supplied cmisQuery to add our own properties and columns.
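As an illustration of that workaround, the sketch below appends an extra condition to a supplied query. It assumes the generated query has the shape "SELECT ... FROM ... [WHERE ...] [ORDER BY ...]" and that the keywords WHERE and ORDER BY do not occur inside string literals; the actual structure of the query generated by IBM BPM is not shown here and should be checked for your release. The function name and the sample queries are hypothetical.

```javascript
// Add "AND (condition)" to a CMIS query, or a new WHERE clause if none exists.
// Assumes WHERE / ORDER BY never appear inside quoted literals in the query.
function addWhereCondition(query, condition) {
  // Split off a trailing ORDER BY clause so the condition is placed before it.
  var orderMatch = /\sORDER\s+BY\s[\s\S]*$/i.exec(query);
  var orderBy = orderMatch ? orderMatch[0] : "";
  var head = orderMatch ? query.slice(0, orderMatch.index) : query;

  var whereMatch = /\sWHERE\s/i.exec(head);
  if (whereMatch) {
    var whereEnd = whereMatch.index + whereMatch[0].length;
    // Parenthesize the existing condition so operator precedence is preserved.
    return head.slice(0, whereEnd) + "(" + head.slice(whereEnd).trim() + ")" +
      " AND (" + condition + ")" + orderBy;
  }
  return head.replace(/\s+$/, "") + " WHERE " + condition + orderBy;
}

var extended = addWhereCondition(
  "SELECT cmis:name, cmis:objectId FROM cmis:document " +
    "WHERE cmis:name LIKE 'A%' ORDER BY cmis:name ASC",
  "cmis:createdBy = 'bob'"
);
```

Keeping the manipulation in one helper at least confines the damage if a later release changes the shape of the generated query.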

The Document Viewer Coach View

A companion Coach View to the Document List is the Document Viewer. This Coach View's purpose is to act as a container for showing the content of a selected document. From the palette, this Coach View looks like:

When dropped on a Coach canvas, it looks like:

The Document Viewer Coach View has no configuration options. However, it should be bound to a variable of type "ECMDocumentInfo" that is also bound to a Document List Coach View. The recommended way is to bind to the "listSelected" element in the list bound to the Document List Coach View. When the user selects a document in the Document List Coach View, the selected entry in the list changes; the Document Viewer Coach View notices this change and shows the new content.

The height of the Document Viewer is not likely to be the size you want. This can be fixed by changing the style:

.classDocViewFrame { height: 500px; }

See also:

The Document Explorer Coach View

With the arrival of IBM BPM 8.5.5, a new addition was made to the set of Coach Views that are provided to interact with the IBM BPM content management system. The new Coach View is called the "Document Explorer". It provides a new visualization for documents and extends the concept to include folders. This Coach View is only available with the Basic Case Management feature installed.

The palette entry for the Document Explorer looks as follows:

It is only available when the Content Management toolkit has been added to the project. When dragged and dropped onto the canvas, it looks as follows:

It has a rich set of configuration options. These options are illustrated below:


Content Integration node

We should now have a general idea that interaction with an ECM is achieved using a protocol called CMIS and that there are Coach Views that can be used to present the documents to the user. We have also seen that some of these Coach Views call Services in order to obtain data. What has not yet been explained is how these Services arrange for the CMIS calls to the ECM provider to be made. IBM BPM provides a node that can be included in a service called the Content Integration node. This is the core of the story.

The Content Integration node provides CMIS access to the ECM provider. It can be added to a variety of Service implementations. When seen in the palette, it looks as follows:

When added to the service implementation canvas, it looks like:

Once added to the canvas, the implementation details of the node must be defined. There are two primary attributes. The first is the server against which the CMIS command is to be performed. This server definition must be created in the Servers tab of the Process App Settings. The second is called "Operation Name" and defines which CMIS operation is to be performed against the server.

Part of the goal of IBM BPM is to try and shield the developer from the worst details of CMIS usage.

The operations that can be performed on a Content Integration node are:


The entries in red are not available to the IBM BPM Document Store.

Each of these commands will be covered in turn.

Depending on which command is selected, the input and output parameters of the Content Integration node will change dramatically.

IBM provides some new Business Object data type definitions for some of those parameters. These new data types are described next.

Description of a Document (ECMDocument)

A document is described by the data type called "ECMDocument":


The list of properties associated with the document will be a mixture of ECM specific and application injected.

See also:

Description of a Search Result (ECMSearchResult)

When a search is executed, an object of type ECMSearchResult is returned.

This object contains:


To see how this fits together, think of resultSet as being a list of row properties and each row being a list of column properties.

See also:


Description of ECMSearchResultSet

When a search is performed, we get back an ECMSearchResult. This has a property called "resultSet" which is an instance of ECMSearchResultSet. This data type looks like:

Which as we see is really no more than a list of rows. Each row is an instance of ECMSearchResultRow.


Description of ECMSearchResultRow

This data structure represents a single document instance returned from a search. The instance is a set of properties called columns where each column corresponds to a particular property of the document instance. The order of the columns corresponds to the order of the propertyMetaData in the ECMSearchResult object.
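Because columns are positional, code that consumes a search result usually pairs each column with the metadata entry at the same index. The sketch below shows that pairing on plain JavaScript arrays, which stand in for BPM lists. The field names `propertyMetadata`, `queryName`, `row` and `column` are assumptions about the shape of these business objects and should be checked against the definitions in the Content Management toolkit; the sample values are invented.

```javascript
// Convert a search result into an array of objects keyed by property name.
// Field names (propertyMetadata, queryName, resultSet.row, column) are assumed.
function rowsToObjects(searchResult) {
  var names = searchResult.propertyMetadata.map(function (meta) {
    return meta.queryName;
  });
  return searchResult.resultSet.row.map(function (row) {
    var record = {};
    // Column i of every row belongs to metadata entry i.
    row.column.forEach(function (value, index) {
      record[names[index]] = value;
    });
    return record;
  });
}

// Hypothetical result with two properties and two rows.
var sampleResult = {
  propertyMetadata: [{ queryName: "cmis:name" }, { queryName: "cmis:objectId" }],
  resultSet: {
    row: [
      { column: ["claim.pdf", "id-1"] },
      { column: ["photo.jpg", "id-2"] }
    ]
  }
};
var records = rowsToObjects(sampleResult);
```

After the conversion, a value can be read by name, for example `records[0]["cmis:objectId"]`, instead of by column position.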

See also:


Description of an ECMID

Description of ECMPropertyMetadata

Property metadata is returned as part of an ECMSearchResult and describes the properties returned to the caller. An instance of this data type contains:

See also:

Description of Document Info (ECMDocumentInfo)

The ECMDocumentInfo contains a single property called "contentURL" which is the Web URL used to retrieve the content of the document. This data type can be seen as a list of variables bound to a Document List Coach View.

Note: It is odd to the author that the ECMDocumentInfo contains only the contentURL. It would seem to have been useful for this data structure to also contain the objectId of the selected documents.

See also:

Description of a folder/document property (ECMProperty)

Both folders and documents may have a list of properties associated with them. These properties are basically name/value pairs.

Description of a folder (ECMFolder)

The details of a folder can be retrieved with "Get folder" or "Get folder by path". This data structure represents the details returned.


See also:


Description of a Content Stream (ECMContentStream)

When one uses the Get Document Content call, we get back an ECMContentStream. If we are creating a document, we must also pass in an instance of this structure. This looks as follows (8.5.5):

The description of the data is:


See also:

Description of content event (ECMContentEvent)


See also:

Description of an object type definition (ECMObjectTypeDefinition)

Every document and folder within the ECM has an associated type. This type consists of attributes. An object of type ECMObjectTypeDefinition describes these properties.


For IBM BPM, the property type definitions are:

|Name|Type|Updateable|Queryable|
|---|---|---|---|
|IsReserved|boolean|No|Yes|
|IsVersioningEnabled|boolean|No|Yes|
|MajorVersionNumber|Integer|No|Yes|
|MinorVersionNumber|Integer|No|Yes|
|VersionStatus|Integer|No|Yes|
|ReservationType|Integer|No|Yes|
|DateCheckedIn|Date|No|Yes|
|CmIsMarkerForDeletion|Boolean|No|No|
|DateContentLastAccessed|Date|No|Yes|
|ContentRetentionDate|Date|No|Yes|
|CurrentState|String|No|Yes|
|IsInExceptionState|Boolean|No|Yes|
|ClassificationStatus|Integer|No|Yes|
|IndexationId|id|No|Yes|
|CmIndexingFailureCode|Integer|No|Yes|
|DocumentTitle|String|Yes|Yes|
|IBM_BPM_Document_DocumentId|Integer|When Checked Out|Yes|
|IBM_BPM_Document_ParentDocumentId|Integer|On Create|Yes|
|IBM_BPM_Document_UserId|Integer|When Checked Out|Yes|
|IBM_BPM_Document_FileType|String|When Checked Out|Yes|
|IBM_BPM_Document_HideInPortal|Boolean|On Create|Yes|
|IBM_BPM_Document_ProcessInstanceId|Integer|On Create|Yes|
|IBM_BPM_Document_Properties|String[]|When Checked Out|Yes|
|IBM_BPM_Document_FileNameURL|String|When Checked Out|Yes|
|IBM_BPM_Document_ContentMigrated|Boolean|When Checked Out|Yes|
|cmis:name||||
|cmis:objectId||||
|cmis:baseTypeId||||
|cmis:objectTypeId||||
|cmis:createdBy||||
|cmis:creationDate||||
|cmis:lastModifiedBy||||
|cmis:lastModificationDate||||
|cmis:changeToken||||
|cmis:isImmutable||||
|cmis:isLatestVersion||||
|cmis:isMajorVersion||||
|cmis:isLatestMajorVersion||||
|cmis:versionLabel||||
|cmis:versionSeriesId||||
|cmis:isVersionSeriesCheckedOut||||
|cmis:versionSeriesCheckedOutBy||||
|cmis:versionSeriesCheckedOutId||||
|cmis:checkinComment||||
|cmis:contentStreamLength||||
|cmis:contentStreamMimeType||||
|cmis:contentStreamFileName||||
|cmis:contentStreamId||||

Content Integration node – Add document to folder

This function adds a Document already uploaded to an existing folder. The input parameters are the DocumentId of the document to be added to the folder plus the FolderId into which the document will be added.

Content Integration node – Cancel check-out document

Cancel a check-out of a previously checked-out document.

See also:

Content Integration node – Check-in document

Check in a checked-out document.

See also:

Content Integration node – Check-out document

Check out a document. The input parameter is the DocumentId of the document to be checked-out.

When we check out a document, we are given a temporary DocumentId that we can then use to work with it. It is possible to "lose" that DocumentId before we check the document back in, for example as the result of a coding error or system crash. In this case, the original document will remain locked and we will be prevented from accessing it further. To resolve that issue, we need a mechanism that will allow us to retrieve the temporary document id so that we can unlock it again.
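One possible recovery route, sketched here under several assumptions, relies on the standard CMIS property cmis:versionSeriesCheckedOutId, which CMIS defines as the id of the checked-out working copy. The sketch assumes that a "Get document" call on the original document returns this property in the document's property list, and that each property entry exposes its identifier in a field named `objectTypeId` and its value in `value`. Neither assumption is confirmed by this text, and the function names and sample ids are hypothetical.

```javascript
// Look up a property value in a list of ECMProperty-like entries.
// The field names "objectTypeId" and "value" are assumptions.
function getPropertyValue(properties, propertyId) {
  for (var i = 0; i < properties.length; i++) {
    if (properties[i].objectTypeId === propertyId) {
      return properties[i].value;
    }
  }
  return null;
}

// Return the id of the checked-out working copy, or null if none is recorded.
function findWorkingCopyId(ecmDocument) {
  var id = getPropertyValue(
    ecmDocument.properties, "cmis:versionSeriesCheckedOutId");
  return id ? id : null;
}

// Hypothetical document as it might be returned for a locked original.
var lockedDocument = {
  properties: [
    { objectTypeId: "cmis:name", value: "claim.pdf" },
    { objectTypeId: "cmis:versionSeriesCheckedOutId", value: "working-copy-7" }
  ]
};
var workingCopyId = findWorkingCopyId(lockedDocument);
```

If the repository does populate the property, the recovered id could then be supplied to the "Cancel check-out document" operation to release the lock.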

See also:

Content Integration node – Copy document

Make a copy of a document.

Content Integration node – Create document

Create a new instance of a document. Each document has a type associated with it. We must supply the id of the type. A document also lives within a folder. We must supply the id of the folder in which it will be created. The input data for the document can also be supplied.

Inputs:


Outputs:

See also:


Content Integration node – Create folder

Create a new folder as a child of an existing folder. The parameters to this command are:


It returns a "Folder ID" that uniquely represents the new folder.

The Parent Folder ID can be retrieved by a prior call using the "Get folder by path" operation. This will return an ECMFolder object. Contained within it is a property called "objectId" which is the ID of the parent folder.

An error will be thrown if the folder already exists.
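When the starting point is the full path of the folder to be created, it first has to be split into the parent path (for "Get folder by path") and the new folder's name (for "Create folder"). A minimal sketch of that split follows; the function name and the sample paths are hypothetical.

```javascript
// Split an absolute folder path into its parent path and final folder name.
function splitFolderPath(path) {
  // Drop trailing slashes; the root folder "/" itself becomes empty here.
  var trimmed = path.replace(/\/+$/, "");
  var lastSlash = trimmed.lastIndexOf("/");
  if (trimmed === "" || lastSlash < 0) {
    throw new Error("Expected an absolute path below the root folder: " + path);
  }
  return {
    // A folder directly below the root has "/" as its parent.
    parentPath: lastSlash === 0 ? "/" : trimmed.slice(0, lastSlash),
    name: trimmed.slice(lastSlash + 1)
  };
}

var parts = splitFolderPath("/Defects/Case1234");
// parts.parentPath is "/Defects" and parts.name is "Case1234"
```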

See also:

Content Integration node – Delete document

Delete a document. The document to be deleted is supplied by a "Document ID". A Boolean called "All Versions" can be used to delete all versions or just some.

Content Integration node – Delete folder

Delete a folder. The folder is identified by its folder ID. If the folder contains content, it cannot be deleted. One solution to deleting folders which contain content is to first delete the content and then delete the folder. This can be achieved with the existing primitives supplied as part of the Content Integration node suite of operations. The algorithm is to get a list of all documents in the folder and then, looping over each item, delete it. Once all the items have been deleted, we can then delete the folder.
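The algorithm can be sketched as follows. The three functions on the `ecm` object are hypothetical stand-ins for the "Get documents in folder", "Delete document" and "Delete folder" operations, and the `objectId` field on each document is an assumption; in a real service each call would be a Content Integration node. A small in-memory fake is included so that the sketch can be run. Sub-folders are not handled.

```javascript
// Delete every document in a folder, then the folder itself.
// Returns the number of documents that were deleted.
function deleteFolderWithDocuments(folderId, ecm) {
  var documents = ecm.getDocumentsInFolder(folderId);
  for (var i = 0; i < documents.length; i++) {
    ecm.deleteDocument(documents[i].objectId, true); // true = all versions
  }
  ecm.deleteFolder(folderId);
  return documents.length;
}

// In-memory fake of the three operations, for demonstration only.
function createFakeEcm(folders) {
  return {
    getDocumentsInFolder: function (folderId) {
      return folders[folderId].slice(); // copy, so deletes do not disturb the loop
    },
    deleteDocument: function (documentId, allVersions) {
      // The fake keeps no version history, so allVersions is ignored.
      Object.keys(folders).forEach(function (id) {
        folders[id] = folders[id].filter(function (doc) {
          return doc.objectId !== documentId;
        });
      });
    },
    deleteFolder: function (folderId) {
      if (folders[folderId].length > 0) {
        throw new Error("Folder is not empty: " + folderId);
      }
      delete folders[folderId];
    }
  };
}

var fakeFolders = { "folder-1": [{ objectId: "doc-1" }, { objectId: "doc-2" }] };
var deletedCount = deleteFolderWithDocuments("folder-1", createFakeEcm(fakeFolders));
```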

CMIS provides a useful function called "deleteTree" which deletes the folder and all descendant objects. Unfortunately, that operation is not mapped to the Content Integration node and can't be used.

Content Integration node – Get all document versions

Retrieve all the versions of a document.

Content Integration node – Get documents in folder

This method takes a folder ID and returns a list of ECMDocument objects where each document corresponds to a document found in the folder defined by the ID.

Content Integration node – Get document

Retrieve the details of a specific document.

Inputs:

Outputs:

See also:

Content Integration node – Get document content

Retrieve the content of a specific document. The input parameter to this operation is a "Document ID" and we get a "Content Stream" (ECMContentStream) back.

Content Integration node – Get folder

Retrieve the details of a specific folder.

See also:


Content Integration node – Get folder by path

This method takes the path to a folder as a String and returns an ECMFolder object which is populated with the details of that folder. Importantly, this includes the ObjectID that uniquely identifies the folder.

Inputs:

Outputs:

See also:


Content Integration node – Get folder tree

Content Integration node – Get type definition

Retrieve the definition for a type.

Inputs:

Outputs:

See also:

Content Integration node – Get type descendants

Given a specific type definition, we can get the types that inherit from this type. Root types include "cmis:document" for documents and "cmis:folder" for folders.

Inputs:


Outputs:

See also:

Content Integration node – Move document

Content Integration node – Move folder

Move a folder from one part of the document tree to another.

See also:


Content Integration node – Remove document from folder

Remove a document from a specific folder.

See also:


Content Integration node – Search

The search mechanism is based on the CMIS Query language. Process Designer provides development tooling and guidance on creating such a search. If a service contains a Content Integration node and the node is flagged as a search request, a new tab can be found at the top of the service definition. This tab is called "Content Filters" and is used to build a CMIS query against the ECM. The query to be built has a number of properties and provides a number of ways in which it can be constructed. Here is a screen shot of a basic setup.

The first thing we will discuss is the entry titled "Method of creating search". This has two possible values:

By using Data Mapping, you have full low-level control over the query executed but in order to use that, you will have to understand and test CMIS queries. The Content Filters setting allows you to build the query "visually". Let us concentrate on how to build queries in that manner.

The "Object Type" area allows us to define which types of objects we wish to search upon. At the highest level, there are two types of object we can list. These are:

We can think of both Documents and Folders as base "types" of more specific types of object. For example, consider a business which has multiple types of documents. Maybe it has "receipts", "orders", "tax returns" and more. Each of these is indeed a "Document" but the ECM system can define specializations of such documents against which constraints and other rules may be applied. The types of documents (all based on the base "Document" type) are defined at the ECM server but can be dynamically retrieved by IBM BPM.

Once the choice is made in the visual wizard builder as to whether we want to return Documents or Folders, we then have the option to further refine the type of Objects to return by selecting a specialization using the Select button. In the following screen shot we have selected that we want Documents (one of the two allowable types) and from there we clicked Select. A query was then made to the ECM provider to retrieve a list of the other types of Documents available. From here we can select the root Document type or a more specialized variant. When the query is finally made, only documents of that specific type (or which inherit from that type) are returned:

Once we have defined the types of Object to be returned by the search, the next aspect is to define which properties of the Object we wish to return and/or display. In the middle of the editor a table is shown. This table should be used as a mental aid to consider the data returned. Each column in the table is an indication of the property of data from an object that will be shown. The Add and Remove buttons can be used to add additional columns or remove existing columns.

Finally, we can define filtering information in the "Search Criteria" area. Here we specify expressions which are applied to the data; only those objects for which the expressions evaluate to true are returned to the user.

The Content Integration searching can be filtered by a variety of parameters including:


The return data from a Content Integration search is an object of type ECMSearchResult which contains:


The result of building this query visually will be Data Mappings that are bound for us:

By changing the method of search from "Content Filters" to "Data Mapping", we now have full control over the searching:

The main part of the Content Filters editor is now grayed out to indicate that it is no longer editable, while the CMIS query data mapping is now modifiable. Here we can supply a valid CMIS query, which will be used as the query against the ECM provider.

The properties available for Documents are:

|Property|Queryable|
|---|---|
|cmis:name|●|
|cmis:objectId|●|
|cmis:baseTypeId|●|
|cmis:objectTypeId|●|
|cmis:createdBy|●|
|cmis:creationDate|●|
|cmis:lastModifiedBy|●|
|cmis:lastModificationDate|●|
|cmis:changeToken||
|cmis:isImmutable||
|cmis:isLatestVersion||
|cmis:isMajorVersion||
|cmis:isLatestMajorVersion||
|cmis:versionLabel||
|cmis:versionSeriesId||
|cmis:isVersionSeriesCheckedOut||
|cmis:versionSeriesCheckedOutBy||
|cmis:versionSeriesCheckedOutId||
|cmis:checkinComment||
|cmis:contentStreamLength|●|
|cmis:contentStreamMimeType|●|
|cmis:contentStreamFileName|●|
|cmis:contentStreamId||

The properties available for Folders are:

|Property|Queryable|
|cmis:name|●|
|cmis:objectId|●|
|cmis:baseTypeId||
|cmis:objectTypeId|●|
|cmis:createdBy|●|
|cmis:creationDate|●|
|cmis:lastModifiedBy|●|
|cmis:lastModificationDate|●|
|cmis:changeToken||
|cmis:parentId|●|
|cmis:path||
|cmis:allowedChildObjectTypeIds||

Here are some sample queries:

SELECT cmis:name, cmis:creationDate, cmis:objectId FROM cmis:document ORDER BY cmis:name ASC

See also:


Content Integration node – Set document content

Set the content of an existing document.

Content Integration node – Update document properties

This operation is used to update the properties of an existing document. The parameters to it are:


See also:

Examples of Content Integration nodes

Here are some examples of using Content Integration nodes.

Finding documents in a specific folder

Imagine we wish to find documents located in a specific folder. We know the path name to the folder.

We create a service that looks as follows:

The first Content Integration node performs a "Get folder by path" command using a path variable as input. This returns an ECMFolder object. Contained within that object is a property called objectId which is the ID of the folder corresponding to the path. In the Search Content Integration node, we build a specific CMIS query that looks as follows:

"SELECT cmis:name, cmis:creationDate, cmis:objectId FROM cmis:document WHERE IN_FOLDER('" + tw.local.folder.objectId + "') ORDER BY cmis:name ASC"

This uses the CMIS function called "IN_FOLDER" which constrains the search to be documents within a folder given by its ID. We now have that ID from the previous step and use it.
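The concatenation above places the folder ID directly into the query text. As a minimal sketch of the same construction, written in Python purely for illustration (inside a BPM service this would be server-side JavaScript), the helper below builds the identical query and escapes quotes in the ID. The function names are hypothetical, and the backslash escaping assumes the provider follows the CMIS query grammar for string literals.

```python
def cmis_quote(value: str) -> str:
    """Return value as a CMIS string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def in_folder_query(folder_id: str) -> str:
    """Build the query from the text for documents held in the given folder."""
    return (
        "SELECT cmis:name, cmis:creationDate, cmis:objectId "
        "FROM cmis:document "
        "WHERE IN_FOLDER(" + cmis_quote(folder_id) + ") "
        "ORDER BY cmis:name ASC"
    )


if __name__ == "__main__":
    # The folder ID below is a made-up placeholder, not a real object ID.
    print(in_folder_query("idx_12345"))
```

Keeping the quoting in one place means a folder ID containing a quote character cannot break the query text.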

Creating folder for a case

BPM provides for the creation of a folder for BPM cases but there are times when we want to manage our own case folders. Let us assume that we are creating a "Defect Recording" solution. When a customer calls in, we create a case for the defect they are reporting. When a case is created, we will store content associated with that case in a folder located at:

The parent folders are "/", "/Tests", "/Tests/Reports" and finally "/Tests/Reports/<Case ID>".

We repeat this for another folder called "Reports" that should exist beneath "Tests". The result will be:

Finally, the security permissions on the new folder must be modified to allow BPM to create new sub-folders within.

Creating a document from a managed file

There may be times where we want to create a document from a managed file. This is typical where we have the managed file acting as a template for the document to be worked upon. One solution for performing this task is illustrated next:

The algorithm in play here is that first we get a temporary directory on the file system. From there we augment that with a unique file name. This file will act as our temporary data store. Next we extract the content of the managed file into the file system at the location pointed to by the temporary file. Next we read the file from the file system converting it into base64. We are now finished with the file so we can delete it from the file system. We now prepare the data structure for an ECM document create and call the ECM to create the document.
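The file handling steps of this algorithm can be sketched as follows. This is an illustration in Python rather than the BPM service itself: the `content` argument stands in for the bytes extracted from the managed file, the function name is hypothetical, and the returned base64 text is what would then be placed into the data structure for the ECM document create.

```python
import base64
import os
import tempfile
import uuid


def managed_file_to_base64(content: bytes) -> str:
    """Pass bytes through a uniquely named temporary file and return base64 text."""
    temp_dir = tempfile.gettempdir()                      # 1. temporary directory
    temp_path = os.path.join(temp_dir, uuid.uuid4().hex)  # 2. unique file name
    try:
        with open(temp_path, "wb") as out:                # 3. extract content into the file
            out.write(content)
        with open(temp_path, "rb") as src:                # 4. read it back as base64
            encoded = base64.b64encode(src.read()).decode("ascii")
    finally:
        if os.path.exists(temp_path):                     # 5. delete the temporary file
            os.remove(temp_path)
    return encoded


if __name__ == "__main__":
    print(managed_file_to_base64(b"hello"))
```

The `finally` clause ensures the temporary file is removed even if reading or encoding fails.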

Examples of CMIS queries

As we gain experience, we see many similar raw CMIS queries being formulated. Here is a selection of the most common or interesting:

All documents in a folder

CMIS has a special predicate called "IN_FOLDER(<folder ID>)" which is true for entities that are direct children of that folder. Note that the argument is the object ID of the folder, not its path. To search for documents in a specific folder, we can use:

SELECT * FROM cmis:document WHERE IN_FOLDER('<folder ID>')

Inbound ECM events

In addition to being able to cause work to be performed in an external ECM system, IBM BPM can also be notified when external changes happen. This means that changes in the ECM can result in events being sent to IBM BPM that inform it of interesting updates. An indication of a change in an ECM is termed a "content event". It is the responsibility of the external ECM to generate and publish such content events. This means that logic/code needs to be built and injected into the ECM. An instance of such code is called an ECM event handler.

The content event story has some key concepts associated with it. The first is the "Event source ID". This specifies which ECM provided the incoming event because there may potentially be many such providers. For FileNet this will be the Object Store.

When an incoming event arrives, it may be delivered to one or more interested parties within BPM. The selection of who is interested in what is made by a filter upon:


The following lists the events supported by BPM and what types of ECM object they apply to:

|Event Type|Allowable Object|
|CheckedIn|Document|
|CheckedOut|Document|
|CheckOutCanceled|Document|
|ClassChanged|Folder or Document|
|ClassifyCompleted|Document|
|Created|Folder or Document|
|Deleted|Folder or Document|
|Frozen|Document|
|Locked|Folder or Document|
|PublishCompleted|Document|
|PublishRequested|Document|
|SecurityUpdated|Folder or Document|
|StateChanged|Document|
|Unlocked|Folder or Document|
|Updated|Folder or Document|
|VersionDemoted|Document|
|VersionPromoted|Document|

An Event Handler interacts with the BPM run-time by sending REST requests to the server.

The format of the request is:

POST /rest/bpm/wle/v1/event?eventSourceId=?&objectTypeId=?&eventType=?&objectId=?
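To make the shape of this request concrete, here is a minimal sketch in Python that assembles the URL an event handler would POST to. It only builds the URL; sending it, and the credentials that would require, are left out. The function name and the example argument values are assumptions for illustration, while the event type names are those listed in the table above.

```python
from urllib.parse import urlencode

EVENT_PATH = "/rest/bpm/wle/v1/event"

# Event type names taken from the table of supported events.
EVENT_TYPES = {
    "CheckedIn", "CheckedOut", "CheckOutCanceled", "ClassChanged",
    "ClassifyCompleted", "Created", "Deleted", "Frozen", "Locked",
    "PublishCompleted", "PublishRequested", "SecurityUpdated",
    "StateChanged", "Unlocked", "Updated", "VersionDemoted", "VersionPromoted",
}


def content_event_url(base_uri, event_source_id, object_type_id, event_type, object_id):
    """Return the URL for a content event POST, with query values URL-encoded."""
    if event_type not in EVENT_TYPES:
        raise ValueError("unsupported event type: " + event_type)
    query = urlencode({
        "eventSourceId": event_source_id,
        "objectTypeId": object_type_id,
        "eventType": event_type,
        "objectId": object_id,
    })
    return base_uri.rstrip("/") + EVENT_PATH + "?" + query


if __name__ == "__main__":
    # Host, event source and IDs are placeholder values.
    print(content_event_url("http://localhost:9081", "OS1", "Document", "Created", "{12DC}"))
```

Encoding the values matters because FileNet object IDs are wrapped in braces, which are not safe to place in a URL unescaped.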

IBM supplies an event handler for the IBM FileNet product. This is supplied in source form in a file found at:

C:\IBM\WebSphere\AppServer\BPM\EventHandlers\ECM\FileNet\filenet-bpm-event-handler-51-src.jar

To receive incoming events, we must define an Event Subscription.

The details of an event subscription look as follows:

First we must select the ECM Server against which we are listening. This is an ECM server that will have been defined in the Process Application Settings.

We next define the Event Type for which this particular subscription is listening.

Finally, we define a service that will be called when a subscription event is detected. We can do this by clicking the "New..." button.

The options here allow for two distinct implementation bodies. The first which is named "Create a service that directly invokes a Content UCA" creates an implementation that looks like:

This has the added side effect of creating a same named UCA definition. I am not convinced that this is the correct semantics however it is easily corrected by deleting the newly created UCA and mapping the call to your own UCA.

See also:

Debugging ECM calls

Since IBM BPM is making CMIS calls using the Web Services protocol, we can intercept those calls and view the requests and the responses. To achieve this, Apache TCPMon can be used. In the ECM Server definitions, specify the port number for TCPMon and have TCPMon forward those requests to the final destination for the ECM provider.

Note: 2012-11-25 – It seems that the introduction of TCPMon is causing more problems than it solves. TCPMon has been seen to truncate data and only pass a subset onwards... ouch!!

As an alternative to TCPMon, the IID TCP/IP Monitor view may be used. It seems unable to parse the content as XML but a copy/paste of text into an XML editor seems to do the trick.

CMIS Tools

When learning or debugging interactions with ECMs through CMIS we can use a rich selection of existing CMIS based tools. This section provides some notes on using these.

CMIS Workbench

The CMIS Workbench tool can be used to interact with an ECM server directly using the CMIS protocol. CMIS Workbench is part of the Apache Chemistry project and can be found here:

http://chemistry.apache.org/java/developing/tools/dev-tools-workbench.html

If you are going to be working extensively with IBM BPM and ECM, it is suggested that you become familiar with this tool as it will reveal a wealth of information.

To access Alfresco, use:

The URLs are:

Alfresco

To access FileNet (including IBM BPM Content Store) use:

API Programming for ECM

The primary way for interacting with the ECM is via CMIS. However, there are other opportunities to work with the ECM outside of this area.

Uploading a new document

The core appears to be an HTTP request to:

/portal/jsp/ecmDocument?operation=ajax_createDocument

This needs important HTTP header data including:

The properties sent by the browser seem to include:


Directly retrieving document content via URL

Although not documented as being supported, and hence of high risk to use, there appears to be a mechanism where we can retrieve the content of a document directly via a URL. This is achievable as a REST request with the following syntax:

GET /portal/jsp/ecmDocument?operation=ajax_getDocumentContent&snapshotId=<id>&ecmServerConfigurationName=<Name>&documentId=<documentId>

snapshotId – A snapshot ID. It is likely that this is paired with the ecmServerConfigurationName property so that the server can be resolved.

ecmServerConfigurationName – The identity of the ECM server that is to be contacted. Examples seen of this include:

documentId – The id of the document whose content is to be retrieved.
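As an illustration of how the three parameters fit together, the following Python sketch assembles the GET URL. It does not perform the request, and since the mechanism is undocumented the whole thing should be treated as an assumption; the function name and the sample values are made up.

```python
from urllib.parse import urlencode


def document_content_url(base_uri, snapshot_id, server_name, document_id):
    """Return the (undocumented) URL for fetching a document's content."""
    query = urlencode({
        "operation": "ajax_getDocumentContent",
        "snapshotId": snapshot_id,
        "ecmServerConfigurationName": server_name,
        "documentId": document_id,
    })
    return base_uri.rstrip("/") + "/portal/jsp/ecmDocument?" + query


if __name__ == "__main__":
    # All argument values here are placeholders.
    print(document_content_url("https://host:9443", "2064.abc", "My Server", "idd_1"))
```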

ECM Providers

IBM BPM Document Store

Starting with version 8.5 of IBM BPM, a component called the "BPM Document Store" was supplied with the IBM BPM product. This is an implementation of a CMIS-aware document repository that simply "exists" within the BPM product. For some purposes, this will provide all that is needed to work with documents without having to have an external document management system available to you.

It is important to note that the BPM Document Store does not support folder based operations. To leverage those functions, a full function CMIS compliant ECM provider should be used.

A document managed by the BPM Document Store has a data structure associated with it that is called "IBM_BPM_Document". Its properties are shown in the following table:

|Property|Type|Description|
|IBM_BPM_Document_DocumentId|Integer|A unique Id for this document instance|
|IBM_BPM_Document_ParentDocumentId|Integer|A unique Id for the parent of this document instance|
|IBM_BPM_Document_FileType|String|What kind of document is this?|
|IBM_BPM_Document_FileNameURL|String|The original file name of the document|
|IBM_BPM_Document_HideInPortal|Boolean|??|
|IBM_BPM_Document_ProcessInstanceId|Integer|??|
|IBM_BPM_Document_Properties|String[]|Additional properties of the document. In a search result, the data will contain an array of properties. Each item will be of the format "<name>,<value>". See the notes below the table.|
|IBM_BPM_Document_UserId|Integer|Creator or last modifier of the document|

Notes on IBM_BPM_Document_Properties: if a searchResult is returned, the following can be used to access a list of Strings for each property:

searchResult.resultSet.row[i].column[j].value[k]

Experience also seems to show that if there is only ONE property, a single String is returned instead of an array of Strings. We need to accommodate this in the logic.
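The single-String-versus-array behavior of IBM_BPM_Document_Properties can be absorbed by normalizing the value before use. The sketch below shows the idea in Python for illustration only (a BPM service would do the equivalent in JavaScript); the function name is hypothetical and it assumes each item really is of the form "<name>,<value>".

```python
def parse_document_properties(value):
    """Normalize IBM_BPM_Document_Properties into a dict of name -> value.

    Accepts None, a single "<name>,<value>" string, or a list of such strings.
    """
    if value is None:
        return {}
    items = [value] if isinstance(value, str) else list(value)
    properties = {}
    for item in items:
        # Split on the first comma only, so values may themselves contain commas.
        name, _, prop_value = item.partition(",")
        properties[name] = prop_value
    return properties


if __name__ == "__main__":
    print(parse_document_properties("color,red"))
    print(parse_document_properties(["color,red", "tags,a,b"]))
```

Wrapping the lone string in a list first means the rest of the logic only ever deals with one shape of data.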

When building a CMIS query by hand, it seems that the query must execute against "IBM_BPM_Document" as opposed to "cmis:document" which would be the normal CMIS target. For example, the following is a valid CMIS query against the IBM BPM Document Store:

SELECT cmis:name, cmis:objectId, cmis:versionSeriesId, IBM_BPM_Document_FileNameURL, IBM_BPM_Document_FileType, IBM_BPM_Document_Properties FROM IBM_BPM_Document ORDER BY cmis:name ASC

IBM BPM Content Store

If you have IBM BPM Advanced plus Basic Case Management installed, then you have access to an additional IBM BPM repository called the "IBM BPM Content Store". This is an embedded instance of FileNet licensed for use solely in conjunction with IBM BPM. Unlike the IBM BPM Document Store, the IBM BPM Content Store also provides folder support and comes with content administration tools such as "acce".

BPM security access to the BPM Content Store

As IBM BPM operates, it needs to connect to the BPM Content Store. Even though BPM "owns" the BPM Content Store, it still needs security credentials in order to interact with it. If one needs to change these credentials (i.e. change the user or password), there are wsadmin commands available to do that.

To add a new technical user which has the authority to connect:

AdminTask.maintainDocumentStoreAuthorization('[-deName <deName> -add <userName>]')

To remove a technical user from the list of users authorized to connect:

AdminTask.maintainDocumentStoreAuthorization('[-deName <deName> -delete <userName>]')

To list the current set of technical users that are authorized to connect:

AdminTask.maintainDocumentStoreAuthorization('[-deName <deName> -list]')

Here is an example of that command being run:

wsadmin>print AdminTask.maintainDocumentStoreAuthorization('[-deName ProcessCenter -list]')
Authorization on the domain for the IBM BPM document store
CWTDS2034I: Access is granted to the IBM BPM document store domain 'uid=dadmin,o=defaultWIMFileBasedRealm' with access mask '459,267'.
Authorization on the object store for the IBM BPM document store
CWTDS2035I: Access is granted to the IBM BPM document store object store 'uid=dadmin,o=defaultWIMFileBasedRealm' with access mask '838,205,440'.
wsadmin>

Using Acce

The IBM BPM Content Store supplies a browser based content administration tool called "acce". This is NOT meant for end users but is a powerful administration tool. It can be accessed from a browser through the following URL:

https://host:port-https/acce/

Here is an example screen shot:

One of the features of acce is the ability to look at classes (also known as document and folder types). From the docs → Data Design → Classes one can see the different document and folder types available:

FileNet

IBM's production enterprise content management system is called FileNet.

FileNet Installation

On a few occasions I found that I needed to install FileNet on various systems to illustrate IBM BPM interaction with it. FileNet is a rich and powerful product in its own right and has a world of installation options. Unless one trains as a FileNet person, it is unlikely one can sit down with the software and simply "get it running" without some form of a cheat sheet. As I learned to install FileNet, I made notes on what I did to achieve this task. These notes follow.

In the following story there were many downloads involved. These included:

|Part|Description|
|FN_CE_5.2_WINDOWS_ML|Content Engine|
|5.2.0.3-P8CPE-WIN-FP003a|Content Engine FixPack. It may be that this can be installed as an alternative to the above.|
|FN_CEC_5.2_WINDOWS_EN|Content Engine Client|
|5.2.0.3-P8CPE-CLIENT-WIN-FP003a|Content Engine Client FixPack. It may be that this can be installed as an alternative to the above.|
|CMIS – CZP1BML|Replaced by maintenance|
|1.0.0.2-FNCMIS-FP002-Windows|CMIS maintenance (replaces CMIS – CZP1BML)|
|WAS 7.0 – C1G0TML|WAS 7.0 base|
|7.0.0.33-WS-UPDI-WinAMD64|WAS maintenance installer tool|
|7.0.0-WS-WASSDK-WinX64-FP000000033.pak|WAS SDK maintenance|
|7.0.0-WS-WAS-WinX64-FP000000033.pak|WAS base maintenance|

The eAssembly for FileNet 5.2 is CRP0SML.

Once extracted, we can begin the binary installation procedures. This will lay down the binaries necessary for FileNet configuration and subsequent execution.

Next we have the obligatory software license agreement.

Now we can choose which components to install.

Now we choose where on the filesystem we wish the FileNet binaries to be placed.

This next entry is interesting. I left it "alone".

We have now installed the IBM FileNet Content Platform Engine binaries. This does not mean that we have a FileNet that is "ready to go"; we still need to configure an instance. However, the binaries are now in place.

Given that IBM BPM currently uses WAS 8.5.5 it would be great if we could use that WAS version as the basis for deploying FileNet, however there is a problem. The CMIS component that we wish to run only executes on WAS 7.0. This means that we need at least one WAS 7.0 server in our environment. In my recipe, I installed a WAS 7.0 profile for both FileNet and CMIS usage. Note that we must use WAS 7.0.0.9 or better. In my tests I used WAS 7.0.0.33.

After installation of FileNet, we can now create a WAS 7.0 profile. Using the PMT command is good for this.

Start the WAS 7.0 Server and wait for it to complete. Now we can launch the FileNet Configuration Manager tool. It starts by asking us some basic details including which flavor of WAS we intend to use.

Next we are asked which components we wish to configure. I'll state right here that I don't know what these all mean … however, note that I explicitly excluded LDAP configuration as I was hoping to not use LDAP at all. This succeeded just fine.

The WAS 7.0 profile needs to know where to find DB2. I had to update the DB2 environment variables in WAS through the WAS admin console.

Next I created a database in DB2 called FNGCD.

The first major task in FileNet configuration is shown next. To complete this we need to know the server hosting DB2, the port number it is listening upon and the database name (which we just created).

Create a second Database called FNOS with a non-default page size of 32k. Once done, we can now perform the next step in our configuration of FileNet.

The next configuration step is a mystery to me but I ran it anyway with no negative impact.

The next step is still much of a mystery to me but I believe it builds the EAR necessary for deployment.

We next explicitly enable the "Deploy Application" entry. When run, this deploys the application which "is" FileNet into the WAS 7.0 profile.

It may not strictly be necessary, but I stopped and restarted the WAS server at this point. Once the WAS server had been restarted, my belief was that FileNet had been configured. There were a few additional one-time steps needed to be performed. We start by attaching the FileNet ACCE console:

http://<host>:<port>/acce

We are next prompted to define a domain:

As we follow through the wizard, we are asked to name the security system that FileNet should use. Since I was not using LDAP, I supplied IBM Virtual Member Manager (VMM) as my answer.

Next we must define an Object Store.

Install FileNet CMIS support

By default, FileNet does not come with CMIS support baked in. Instead, one must install an additional part in order to use this feature. For FileNet 5.2, this is part number CZP1BML. Bizarrely, this component will ONLY run on IBM WAS 7.0 (C1G0TML).

After extracting the ZIP, run the "install.bat" script.

Once CMIS is installed, the FileNet Client files must be installed into the same installation directory as CMIS. The FileNet Content Platform Engine Client for 5.2 is part number CIV1JEN.

This is an application called 5.2.0-P8CE-CLIENT-WIN-FP003.

Once the client is installed, CMIS must be configured. In the CMIS install directory there is a folder called "config" and within there, there is a "config.bat" script.

See also:

Installation of FileNet Inbound Event support

The creation, modification or deletion of a document or folder within the FileNet ECM generates a FileNet event. We wish to be able to map those FileNet events to the corresponding events within IBM BPM. To achieve that, we need to configure FileNet so that when a FileNet event occurs, the corresponding BPM event is triggered. Fortunately, IBM provides a sample application to achieve just that. The application is contained in a JAR that can be found at:

<BPM>/BPM/EventHandlers/ECM/FileNet/filenet-bpm-event-handler-51.jar

Within this JAR is the implementation code for a FileNet event handler. The source code for this sample is also supplied in the file called:

<BPM>/BPM/EventHandlers/ECM/FileNet/filenet-bpm-event-handler-51-src.jar

To start the configuration, we need to create a text file which we call "bpmserver1.properties". It does not appear necessary that the file have a special name but we will stick with this choice. Within this file we code the userid/password that FileNet will use to connect to IBM BPM via REST as well as the prefix URL used to connect to the BPM server. For example:

bpm.server.username=kolban
bpm.server.password=password
bpm.server.uri=http\://localhost\:9081
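Before uploading the file, it can be worth checking that it contains the three keys shown above and that the URI reads correctly once the backslash escapes are removed. The following Python sketch does that with a deliberately simplified reader for this style of file; it is an assumption-laden illustration (it ignores line continuations and other features of the full Java properties format), and the function names are hypothetical.

```python
import re

REQUIRED_KEYS = ("bpm.server.username", "bpm.server.password", "bpm.server.uri")


def parse_properties(text):
    """Parse simple key=value lines, removing backslash escapes from values."""
    result = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        # "\:" becomes ":", "\\" becomes "\", and so on.
        result[key.strip()] = re.sub(r"\\(.)", r"\1", value.strip())
    return result


def missing_keys(properties):
    """Return the required keys that are absent from the parsed properties."""
    return [key for key in REQUIRED_KEYS if key not in properties]


if __name__ == "__main__":
    sample = (
        "bpm.server.username=kolban\n"
        "bpm.server.password=password\n"
        "bpm.server.uri=http\\://localhost\\:9081\n"
    )
    props = parse_properties(sample)
    print(props["bpm.server.uri"], missing_keys(props))
```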

Using a FileNet admin tool such as acce, create a new FileNet document that contains this text file as content. The document should be in the same Object Store as the documents you are going to watch. It does not matter what the document is called. In our example, we will call it "bpmserver1".

Once the document containing the properties has been uploaded, obtain the Object ID for that document. This will be passed as a parameter later on to the Java class that will be driven when a document is changed. In the following screen shot we see "acce" being used to examine the uploaded document and we note its document ID which is the string:

{12DC657A-BA31-4792-B313-ED85428B076C}

See the following:

Our next step is the definition of an "Event Action". This is the definition of "what" to do when an event is produced. Again in acce, under "Events, Actions and Processes > Event Actions", we define a new entry:

In the Event Action definition, we are going to declare that when an event happens, we want a Java class to be invoked. The name of the Java class is:

com.ibm.bpm.integration.filenet.BPMEventHandler

Since this Java class is supplied by IBM BPM and not FileNet, we also need to check the box that we are going to configure a "code module" which is how FileNet finds external Java classes.

The content for our new Code Module will come from the file called "filenet-bpm-event-handler-51.jar".

We end up with a summary screen at the end:

Now it is time to create the Subscription definition against a document type. The document types are found in "Data Design > Classes".

This starts the wizard for the subscription definition:

Once the subscription definition has been made, one further item is required. The subscription can pass a parameter to the Event Action definition. Since our Event Action is the IBM BPM supplied Java class which wishes to call IBM BPM, we need to tell that class about our BPM server. The way that is done is to pass in a FileNet document ID that itself contains the properties that define the connection. We built such a document in a previous step and recorded the document ID. This is now supplied in the "User String" property of the subscription definition:

See also:

Alfresco

Alfresco is an open source document management system. Details about this project can be found at:

http://www.alfresco.com/

Alfresco Installation

Here is an installation walk-through for Alfresco. Ensure PostgreSQL is installed, as it will be used as the database for Alfresco's own operations.

Once launched, we are asked which language we wish to use during installation:

We are shown a welcome screen:

We are given the choice of what type of installation we wish to perform. The choices are Easy or Advanced. I chose Advanced:

We select which components we wish to add:

We are asked where on the local file system we wish the product to be installed:

Next we are asked what port number should be used for the database:

Since Alfresco runs on Tomcat, we are asked to provide properties of the Tomcat environment:

We are asked which port for FTP should be used:

Now we are asked for the port for remote commands:

Now we supply an administrator password. I recommend admin/admin for testing purposes.

Next we are asked whether Alfresco should start at system start-up time:

We have now supplied all the information necessary for the installation and configuration to proceed:

When the installation completes, we are presented with a next-steps screen:

We can now login to the Alfresco system dashboard. This can be accessed from a browser at:

http://localhost:8080/share/

The first start of an Alfresco server may take a little while as the databases appear to be created and populated.

When documents are added to Alfresco, those documents are not immediately seen in a search. This means that if we create a document and then perform a query, the new document may not show up. This has serious implications for IBM BPM. The Document List Coach View allows us to upload documents and then immediately refreshes its view by performing a search. If the search does not show the document, the end user may not realize that the document has already been uploaded.

Digging into this story, we find that the search engine used by default in Alfresco is called "solr". This search engine indexes its data periodically (by default once every 15 seconds), so there is a latency between a new document being added and it showing up in a search. If we cannot accept this, we can ask Alfresco to use an alternative search engine called "lucene" that keeps its index constantly up to date, so that newly uploaded documents show up in searches immediately. Here is a recipe to enable the "lucene" engine.

index.subsystem.name=lucene

index.recovery.mode=FULL

/alfresco/alf_data/solr /tomcat/conf/Catalina/localhost/solr.xml /tomcat/webapps/solr

index.recovery.mode=AUTO
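Where changing the search engine is not an option, custom code that uploads and then searches could instead tolerate the indexing delay by repeating the search until the new document appears. The Python sketch below illustrates that idea only; it is not something the Document List Coach View does. The `search` argument is a hypothetical stand-in for whatever function runs the CMIS query and returns the object IDs found.

```python
import time


def wait_until_indexed(search, object_id, timeout=30.0, interval=1.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Repeat search() until object_id is in its results or timeout elapses.

    Returns True if the object was found, False if the timeout was reached.
    """
    deadline = clock() + timeout
    while True:
        if object_id in search():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # Simulated search results: the document shows up on the third attempt.
    attempts = iter([[], [], ["doc1"]])
    print(wait_until_indexed(lambda: next(attempts), "doc1", timeout=5, interval=0))
```

With the default indexing period of 15 seconds mentioned above, a timeout comfortably longer than that period would be a sensible starting point.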

Alfresco debugging

When calls are made to Alfresco and those calls fail, additional log information may be found in

<AlfrescoRoot>/tomcat/logs

specifically, the file called:

alfrescotomcat-stdout.<date>

seems to contain a wealth of goodness.

Daeja ViewONE

The steps needed to get Daeja ViewONE installed and running have been commented upon as being less than obvious. What follows is a cheat-sheet for getting the product operational.

After downloading the installation package which is a file called "DJ_VO_VRTL_4.1_WIN_ML.zip", we should extract its content. Once extracted, we will find a file called "DAEJA_VIEWONE_VIRTUAL_4.1.0_WIN_ML.exe".

Run that application and the installation will proceed.

The first screen shown to us is the introduction screen. No data need be entered or chosen here.

Next comes the obligatory license agreement. Read it carefully and ensure that you agree with all the terms and conditions. If you don't, cancel out immediately and no further action is required.

Next we choose the variant of the install. I selected both IBM Daeja ViewONE Professional and IBM Daeja ViewONE Virtual.

We are asked if we wish to install a production or non-production environment. To be honest, I have no idea what the difference may be. In my testing I did NOT select a non-production environment (does that mean I selected a production environment?).

Next we can select which install options we need. I selected them all.

We are now asked where on our file system we want to install the product. Personally, I install IBM products under "C:\IBM" but you can choose your own location.

We are next presented with a summary of what will be installed.

The installation proceeds and we will be notified when the installation completes.

We now have the product installed however we can't use it yet. If we were to consult the documentation on the product, it would now ask us to create a WAR file from some of the JARs of the product and deploy that to our Java EE server. Unfortunately, that leaves a lot to the imagination. Here is the recipe I followed to achieve that task.

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE web-app PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN" "http://java.sun.com/dtd/web-app_2_3.dtd">
<web-app>
  <servlet>
    <servlet-name>viewone</servlet-name>
    <display-name>ViewONE Servlet</display-name>
    <description>Servlet for returning image tiles</description>
    <servlet-class>com.ibm.dv.server.Platform</servlet-class>
  </servlet>
  <servlet-mapping>
    <servlet-name>viewone</servlet-name>
    <url-pattern>/v1files/*</url-pattern>
  </servlet-mapping>
</web-app>

logfile=WEB-INF/viewone.log
modifiedDocumentCheckMethod=uniqueid
finalImagePath=burnt
tiffSaveColor=true
tiffSaveMonoG4=true
responseType=statusAndImage

http://<host>:<port>/viewone/v1files/viewone.js

An IBM BPM Coach View sample has been provided that integrates Daeja with BPM. This can be found on the IBM BPM samples repository here:

https://hub.jazz.net/project/SaschaSchwarze/bpm-ui-Daeja Document Viewer/overview
