You are here

Talend Data Integration with NuoDB

Error message

Note: This blog was published over a year ago. Content may be out of date.

In this TechBlog, we will demonstrate how to use Talend Data Integration to create and execute a Talend Data Integration job with NuoDB using Talend Open Studio. Talend’s data integration provides an extensible, highly-performant, open source set of tools to access, transform and integrate data from any business system in real time or batch to meet both operational and analytical data integration needs. JDBC is used to connect to the underlying NuoDB database.

This example loads a NuoDB table with contact data from a CSV file using Talend Open Studio. Talend Open Studio is an Eclipse-based open source integration tool that handles many different formats and provides a rich library of objects for transformation and scripting. 

Start with a clean Ubuntu 12.04 machine.     

Download and Install Java


 
$ sudo apt-get update
$ sudo apt-get install openjdk-7-jre

Make sure you have Java 7 installed on your Ubuntu Machine. Check this with:


 
$ javac -version 
​$ java -version 

Java 7 is necessary for Talend DI. Talend DI cannot run on Java 6.

Download and Install NuoDB:

Install and Configure NuoDB:

Download the NuoDB Ubuntu/Debian package and install it using the following command (please check the below path for any errors while installing):


 
sudo dpkg -i $HOME/Downloads/nuodb_2.n.n.n_amd64.deb

Edit the NuoDB default.properties file to provide a domain password (we will use the password "bird") and then start the NuoDB services:


 
sudo sed -i 's/#domainPassword =/domainPassword = bird/g' /opt/nuodb/etc/default.properties
sudo service nuoagent start
sudo service nuorestsvc start

For any additional details and discrepancies, you can access our Linux Installation guide here.

After successfully installing, run QuickStart. Enter "domain" for the User field and the "bird" value you set above, for the Password field.

For any installation errors, please refer below:

a. Incorrect Credential Error:

Please edit /opt/nuodb/etc/default.properties file using your favorite editor. Uncomment the property domainPassword and assign a value

domainPassword=bird

b. Disable Transparent Huge Pages Error:

Ubuntu by default disables Transparent Huge Pages. If it's not disabled, log in as a root user and turn off THP's. Please refer for any THP's details http://dev.nuodb.com/techblog/linux-transparent-huge-pages-jemalloc-and-...


 
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag

Download and Install Talend Open Studio (5.6)

Download Talend Open Studio for Data Integration and unzip.

Launching Talend DI Open Studio

1. Launch Talend. Change directories to the one in which you unzipped the Talend download. Run:

<span style="color: #666666;">$ </span>.<span style="color: #000000; font-weight: bold;">/</span>TOS_DI-linux-gtk-x86.sh

The first time you run this script, there will be some extra steps. Accept the license if prompted to do so. You will be prompted to Create a New Project. Accept the defaults and click Create... You will be prompted for a Project Name. You may accept the defaults and select Finish.

2. The next dialog will be Talend Open Studio for Data Intergration. Select Open.

3. The Connect to TalendForge dialog will be displayed. You may select Skip. 

Again, if this is the first time you are running Talend Open Studio, you will see some extra steps here. You will be prompted to install Additional Talend Packages. Select Do not show this again and Finish. You will be asked to Review Licenses. Accept the terms by selecting Accept All. Installing new Talend features will take a while. During this time you will be asked to review licenses again. Again, select Accept All.

Talend Open Studio will open.

Install additional packages to support third party libraries:

You may need to install additional packages. Select the Help menu at the top of the screen and then Install Additional Packages. If any packages are listed, install them.

For more information, visit the Talend Help Center.

Creating the NuoDB CONTACT Table

Create the sample table CONTACT with no data as shown in the DDL below: 


 
CREATE TABLE
 
CONTACT
(
CONTACT_ID INTEGER,
FIRST_NAME VARCHAR(50),
LAST_NAME VARCHAR(50),
EMAIL VARCHAR(50),
PHONE VARCHAR(50),
STREET_ADDR1 VARCHAR(50),
STREET_ADDR2 VARCHAR(50),
CITY VARCHAR(50),
STATE VARCHAR(50),
POSTAL_CODE VARCHAR(50)
);

 
$ /opt/nuodb/bin/nuosql test --user dba --password goalie
 
SQL> CREATE TABLE CONTACT (CONTACT_ID INTEGER, FIRST_NAME VARCHAR(50), LAST_NAME VARCHAR(50), EMAIL VARCHAR(50), PHONE VARCHAR(50), STREET_ADDR1 VARCHAR(50), STREET_ADDR2 VARCHAR(50), CITY VARCHAR(50), STATE VARCHAR(50), POSTAL_CODE VARCHAR(50) );

Create the Input CSV File

Copy the text here into a CSV file called Contacts.csv (copy it somewhere and save it with .csv) to be used by Talend DI. Note the columns don't match the DDL above, but don't worry about that (that's the point!).

Contacts.csv

First Name,Last Name,Prefix,Suffix,Email,Phone,Street Address 1,Street Address 2,City,State,Country,Zip David,King,Mr.,,dking3251@gmail.com,(747) 555-5555,,,Los Angeles,CA,US,13415 Sybil,Bedford,Ms.,,sybil91@msn.com,(301) 555-5555,,,Rockville,MD,US,91710 Jocelyn,Brooke,Ms.,,brooke@someconsultants.com,(212) 555-5555,,,New York,NY,US,10941 Stephen,Spender,Mr.,,spender@virginia.edu,(703) 555-5555,,,Arlington,VA,US,55403 Lee,Bartlet,Mr.,Jr.,lee.bartlet@summerhotels.com,(305) 555-5555,,,Jacksonville,FL,US,98508 Mary,Amberly,Ms.,,mary_aberly@gmail.com,(404) 555-5555,,,Atlanta,GA,US,59201 Helen,Ledwidge,Ms.,,helen_ledwidget@yahoo.com,(747) 555-5555,,,Los Angeles,CA,US,91401 Mary,Champernowne,Ms.,,champma@ghly.com,(602) 555-5555,,,Phoenix,AZ,US,37177 Mark,Straithes,Mr.,,mark.straithes@dun.com,(206) 555-5555,,,Seattle,WA,US,41234 Brian,Fox,Mr.,,fox1415@gmail.com,(212) 555-5555,,,New York,NY,US,98134 Louie,Piper,Mr.,,louie.piper@news.com,(360) 555-5555,,,Seattle,WA,US,13411 Bernie,Oberland,Mr.,,bernie98@yahoo.com,(301) 555-5555,,,Germantown,MD,US,63735 Edward,Thomas,Mr.,M.D.,dredwardthomas@gmail.com,(408) 555-5555,,,Sunnyvale,CA,US,34563 Joan,Thursley,Ms.,,joanthrsley@gmail.com,(410) 555-5555,,,Baltimore,MD,US,38876 Gerry,Wachett,Mr.,,gerry.wachett@security.com,(203) 555-5555,,,Bridgeport,CT,US,66555 Thomas,Caldwell,Mr.,,thomas.caldwell@davisprogrammers.com,(518) 555-5555,,,Troy,NY,US,87914 Dan,Croylon,Mr.,,dan.croylon@mmii.com,(561) 555-5555,,,West Palm Beach,FL,US,55552 Mike,Miller,Mr.,,miller@aol.com,(919) 555-5555,,,Chapel Hill,NC,US,41341 David,Evans-Wentz,Mr.,,david3221@gmail.com,(562) 555-5555,,,Paramount,CA,US,55647 Tracy,Weeks,Ms.,,tracy.weeks@bostonhealingllc.com,(602) 555-5555,,,Phoenix,AZ,US,26737 Ekki,Giesebrecht,Mr.,,ekki@dtnn.de,07431451,,,Stuttgart,,DE,73454 Joe,Stoyte,Mr.,,joe.stoyte@cryptosecurity.com,(212) 555-5555,,,New York,NY,US,66545 Jeremy,Pordage,Mr.,,jp4536@gmail.com,(919) 555-5555,,,Chapel Hill,NC,US,23456 Harold,Lloyd,Mr.,,lloyd.harold@airsoft.co.uk,1138761733,,,Leeds,,UK,27886 Joe,Propter,Mr.,,joe.propter@summerhotels.com,(626) 555-5555,,,Pasadena,CA,US,45676 Hans,Roebuck,Mr.,,roebuch@tennisplayers.com,(843) 555-5555,,,Myrtle Beach,SC,US,47764 Rina,Hauberk,Ms.,,rina.hauberk@franklin-motors.com,(212) 555-5555,,,New York,NY,US,66545 Luis,Obispo,Mr.,,obispol@techstuff.com,(302) 555-5555,,,Willmington,DE,US,35633 Pete,Boone,Mr.,,boone@aol.com,(202) 555-5555,,,Washington,DC,US,34563

Configuring Your Talend Job 

Specifically, I’m going to go through the steps:

Step 1: Create File delimited
Step 2: Create the job design
Step 3: Create the Database Connection
Step 4: Design the job
Step 5: Connecting the components
Step 6: Configure the tMap component
Step 7: Fix up CONTACT schema in mapping
Step 8: Execute the job

Step 1: Create File delimited

Open the Talend Open Studio for Data Integration and complete the following configurations

In the Repository tab on the left of the Talend Open Studio main screen:

1. Expand the Metadata node.

2. Right-click on File delimited and select Create file delimited

In the Name field, enter NuoDB (or any other name). Click Next to continue.

Click Browse and select the Contact.csv file. See the screenshot below for reference.

From the Format list, select your computer OS and click Next.

In the next dialog, set Field Separator to Comma, set Row Separator to Custom String ("\n"), and select CSV under Escape Char Settings. In the Preview area at the bottom, select the Set heading row as column names box and click Refresh Preview button to update the structure and data preview as shown in the screenshot below:

 

Select Next. In the Define the Schema dialog, set Name to CSVInput and ensure that the Type for each Column is String.

 

 

In particular, ensure the column Zip gets changed from Integer to String type.

 

Finally, select Finish.

Step 2: Create the Job Design

In the Repository tab on the left of the Talend Open Studio main screen, right click on Job Designs and select Create Job to open the New Job wizard.

Enter CSVNuoDB for Name and click Finish to close the wizard and create the job.

Step 3: Create the Database Connection:

In the Repository tab on the left of the Talend Open Studio main screen, right click on the Db Connections and select Create connection to create the NuoDB database connection. Enter NuoDB for the Name field and select Next. Set the fields as shown below:

                  DBType: General JDBC

                  JDBC Url: jdbc:com.nuodb://localhost/test

                  Driver jar: /opt/nuodb/jar/nuodbjdbc.jar

                  Class name: com.nuodb.jdbc.Driver

                  User name: dba

                  Password: goalie

In the end, check the connection by clicking the Check button.

Step 4: Design the Job 

Following are the steps to design a job

1. Define a source schema based on a delimited text file (CSV),

2. Define a target schema based on a relational table (NuoDB), and

3. Use tMap component to map the source fields to the target fields.

In the Repository tab on the left of the Talend Open Studio main screen, expand Metadata | File delimited | NuoDB 0.1 and drag CSVInput onto the created Job CSVNuoDB 0.1 screen. On the popup screen, select tfileInputDelimited as shown below and click Ok.

In the Repository tab on the left of the Talend Open Studio main screen, expand Metadata | Db Connections. Right click on the newly created NuoDB 0.1 and select Retrieve Schema.

In the Filter dialog, accept the defaults, and select Next. In the Add Schema dialog, expand USER and select CONTACT and Next.

The next dialog should look as follows. Select Finish.

In the Repository tab on the left of the Talend Open Studio main screen, expand Db Connections | NUODB 0.1 | Table schemas, select CONTACT and drag onto the Job CSVNuoDB 0.1 screen. In the Components popup dialog, select tJDBCOutput and select OK:

In the Palette panel to the right of the Talend Open Studio main screen, expand Processing. Select tMap and drag onto the Job CSVNuoDB 0.1 screen in between CSVInput and CONTACT.

Step 5: Connect Your Components

Right-click on CSVInput, select Row and then Main. You will now see a line drawn from the CSVInput component to the cursor. Move the cursor to the tMap_1 component and select it.

Now, right click on tMap_1 and select Row and *New Output* (Main). You will now see a line drawn from the tMap_1 component to the cursor. Move the cursor to the CONTACT component and select it. You will be asked for a new output name; enter nuodb. You will then be asked if you want to get the schema for the target component; select Yes. Your job screen should now look like:

After connecting the components save the job: File | Save

Step 6: Configure the tMap_1

On the Job screen, double click on the tMap_1 component to open the Map Editor. It should look like this:

First, make CONTACT_ID in the NuoDB schema a primary key by selecting the checkbox under Key in the bottom right panel.

Now, match each column from left to right by selecting it on the left and dragging it to the Expression column in the right panel. For example, select First_Name on the left and drag it to the Expression column next to FIRST_NAME on the right.

You could also select AutoMap to map as many columns automatically as possible.

The columns Street_Address_1Street_Address_2, and Zip still need to be mapped manually:

Once the mapping is complete, select OK to finish.

Step 7: Fix Up CONTACT Schema in Mapping.

We need to access the CONTACT table in the USER schema (which is the default). In the Job panel, double-click on CONTACT.  In the Component panel below, change the table name from "CONTACT" to "USER.CONTACT".  Again, save the job: File | Save.

Step 8: Execute the Job

Press F6 to run the job. Alternatively, in the Run (Job CSVNuoDB) panel, click the Run button.  A successful run will result in the following output in Job panel:

The CONTACT table is now populated in NuoDB and can be accessed via nuosql: 


 
$ /opt/nuodb/bin/nuosql test --user dba --password goalie
 
SQL> select * from CONTACT;
 
 CONTACT_ID  FIRST_NAME  LAST_NAME    EMAIL                                PHONE          STREET_ADDR1  STREET_ADDR2  CITY            STATE  POSTAL_CODE 
 ----------- ----------- ------------ ------------------------------------ -------------- ------------- ------------- --------------- ------ ------------
       David       King         dking3251@gmail.com                  (747) 555-5555                             Los Angeles     CA     13415 
       Sybil       Bedford      sybil91@msn.com                      (301) 555-5555                             Rockville       MD     91710 
       Jocelyn     Brooke       brooke@someconsultants.com           (212) 555-5555                             New York        NY     10941 
... 

NuoDB Automation Web Console:

http://localhost:8080 is the default URL for logging into NuoDB web console. On NuoDB automation console, enter username/password which is “domain/domain_password”.

Links for Further Reference:

TalendForge: How to install required modules in Talend
Talend Help Center: How to install external modules in the Talend Products

Add new comment