Open-Source Big Data ETL Tools

BigData ETL Tools

DataTorrent (Apex)

Run ./datatorrent-rts-community-3.7.0.bin --help to print the help options.

$ sudo -u admin ./datatorrent-rts-community-3.7.0.bin \
    -B /usr/install/datatorrent-rts -g 9094 \
    -E DT_LOG_DIR=/home/admin/datatorrent \
    -E DT_RUN_DIR=/home/admin/run/datatorrent

Verifying archive integrity... All good.
Uncompressing DataTorrent Distribution 100%

DataTorrent Platform 3.7.0 will be installed under /usr/install/datatorrent-rts/releases/3.7.0

dtGateway can be managed with: /usr/install/datatorrent-rts/releases/3.7.0/bin/dtgateway [start|stop|status]
DTGateway is running as pid 24571 and listening on 0.0.0.0:9094

Please finish the remaining installation steps via DataTorrent Console at: http://dp0653:9094/
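The installer output above mentions a dtgateway control script that takes start, stop, or status. A minimal sketch of checking the gateway after installation might look like the following. The base directory and version are copied from the log; the helper name gateway_script is made up for illustration, and whether the script needs to run as the admin user is not confirmed by the output.

```shell
#!/bin/sh
# Values copied from the installer output above.
DT_BASE=/usr/install/datatorrent-rts
DT_VERSION=3.7.0

# Hypothetical helper: build the path to the dtgateway control script
# for a given base directory and release version.
gateway_script() {
    printf '%s/releases/%s/bin/dtgateway\n' "$1" "$2"
}

script=$(gateway_script "$DT_BASE" "$DT_VERSION")

# Only query the gateway on a host where the script is actually installed.
if [ -x "$script" ]; then
    "$script" status
else
    echo "dtgateway not found at $script" >&2
fi
```

On a host without the installation, the sketch only prints a message to stderr, so it is safe to try anywhere.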

Create an Apex project and package it

# Project name and Apex archetype version
name=salesapp
version=3.5.0

# Generate the project skeleton non-interactively (-B = batch mode)
mvn -B archetype:generate \
  -DarchetypeGroupId=org.apache.apex \
  -DarchetypeArtifactId=apex-app-archetype \
  -DarchetypeVersion=$version \
  -DgroupId=com.example \
  -Dpackage=com.example.$name \
  -DartifactId=$name \
  -Dversion=1.0-SNAPSHOT
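The archetype command only generates the project; the packaging step is not shown. A rough sketch of it, assuming the default layout produced by apex-app-archetype (where a Maven build is expected to leave an .apa application package under target/), could be the following. The helper names apa_path and build_package are hypothetical, and the artifact file name is an assumption derived from the artifactId and version used above.

```shell
#!/bin/sh
# Same values as passed to the archetype: -DartifactId and -Dversion.
name=salesapp
app_version=1.0-SNAPSHOT

# Hypothetical helper: where the application package is expected to land
# (assumption: <artifactId>/target/<artifactId>-<version>.apa).
apa_path() {
    printf '%s/target/%s-%s.apa\n' "$1" "$1" "$2"
}

# Hypothetical helper: build the project inside a subshell so the
# caller's working directory is left unchanged.
build_package() {
    # -DskipTests speeds up a first build; drop it to run the unit tests.
    (cd "$1" && mvn clean package -DskipTests)
}

# Build only if the generated project exists and Maven is on the PATH.
if [ -d "$name" ] && command -v mvn >/dev/null 2>&1; then
    build_package "$name" && ls -l "$(apa_path "$name" "$app_version")"
fi
```

If the build succeeds, the listed .apa file is the artifact to upload in the next step.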

Upload the package to the DataTorrent platform

StreamSets (https://github.com/streamsets/datacollector)

StreamFlow (https://github.com/lmco/streamflow)

CDAP (https://github.com/caskdata/cdap)