DevTech101


BDA

Oracle BDA iPython, Notebook and Jupyter Configuration

How to install iPython/Notebook on an Oracle BDA

Install the packages below:

    pip install backports.ssl_match_hostname-3.5.0.1.tar.gz ipython-1.2.1.tar.gz pyzmq-15.2.0.zip tornado-3.2.1.tar.gz

Add the parcels in the CDH GUI. Under Remote Parcel Repository URLs, add the parcel address:

    https://repo.continuum.io/pkgs/misc/parcels/

Then download, distribute, and activate the parcel.

System startup script: /usr/local/notebook/bin/start_notebook.sh

    #!/bin/bash
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/rh/python27/root/usr/lib64
    export PATH=/opt/rh/python27/root/usr/bin:$PATH
    # Notebook
    export PYSPARK_DRIVER_PYTHON=ipython
    export PYSPARK_DRIVER_PYTHON_OPTS="notebook --NotebookApp.open_browser=False --NotebookApp.ip='*' --NotebookApp.port=8880"
    # Jupyter
    #export PYSPARK_DRIVER_PYTHON=/opt/cloudera/parcels/Anaconda/bin/jupyter …


To increase Spark kryoserializer.buffer.max

To address errors like the one below:

    6 WARN scheduler.TaskSetManager: Lost task 0.3 in stage 2.0 (TID 16, n06.domain.com): org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 3. To avoid this, increase spark.kryoserializer.buffer.max value.

In CDH under Spark, look for spark-defaults.conf and add the setting below. One of the two values below should …
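As a rough illustration of the spark-defaults.conf change, the sketch below sets or updates a single property in a whitespace-separated properties file. The helper name `set_spark_conf`, the scratch file, and the `512m` value are assumptions made for the example, not the values from the original post; Spark requires this setting to stay below 2048m. On a cluster managed by Cloudera Manager the same line would normally be entered through the configuration UI rather than written to the file by hand.

```shell
#!/bin/bash
# Sketch: set (or update) one property in a spark-defaults.conf style file.
# Assumes "key value" lines separated by whitespace.
# set_spark_conf is a hypothetical helper; 512m is an illustrative value only.

set_spark_conf() {
    local file="$1" key="$2" value="$3"
    touch "$file"
    # Keep every line whose first field is not this key, then append the new value.
    awk -v k="$key" '$1 != k' "$file" > "${file}.tmp"
    printf '%s %s\n' "$key" "$value" >> "${file}.tmp"
    mv "${file}.tmp" "$file"
}

# Demo against a scratch file, not the live cluster configuration.
conf_file="$(mktemp)"
set_spark_conf "$conf_file" spark.kryoserializer.buffer.max 256m
set_spark_conf "$conf_file" spark.kryoserializer.buffer.max 512m
cat "$conf_file"
```

Running the helper twice leaves a single line for the key, so the script can be re-run safely when the value needs to be raised again.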


ELK and Kafka Zookeeper Configuration on Oracle BDA

First, let's configure Kafka.

Note: In Kafka 0.8.1 (and lower) you cannot delete a topic; make sure to upgrade to version 0.8.2 or later.

Modify the Kafka settings below:

    message.max.bytes: 5000000
    replica.fetch.max.bytes: 5
    max.connections.per.ip: 500

To make Kafka listen on all interfaces, add the following to the Advanced Snippet text field:

    listeners=PLAINTEXT://0.0.0.0:9092,SSL://127.0.0.1:9093
    #Note: …


How to configure Beeline for Hive on a remote client

Download the unlimited-strength JCE policy files from http://www.oracle.com/technetwork/java/javase/downloads/jce-7-download-432124.html and install them:

    cd /usr/java/latest/jre/lib/security/
    unzip UnlimitedJCEPolicyJDK7.zip
    mv UnlimitedJCEPolicy/* .

Download and unpack Hive:

    cd /opt
    wget http://archive.cloudera.com/cdh5/cdh/5/hive-0.12.0-cdh5.0.0.tar.gz
    tar zxf hive-0.12.0-cdh5.0.0.tar.gz
    mv hive-0.12.0-cdh5.0.0 hive
    export HADOOP_HOME=/opt/hadoop-2.6.0-cdh5.4.0

Fetch the Hive client configuration from Cloudera Manager:

    wget -O hive-conf.zip http://host.domain.com:7180/cmf/services/5/client-config
    unzip hive-conf.zip

Set up the environment:

    export HIVE_HOME=/export/home/cognos/hive-0.12.0-cdh5.0.0
    export HIVE_CONF_DIR=/export/home/cognos/hive-conf
    alias hive=$HIVE_HOME/bin/hive
    export HADOOP_USER_CLASSPATH_FIRST=true
    export PATH=/usr/gnu/bin:$PATH:/export/home/cognos/hive-0.12.0-cdh5.0.0/bin:/opt/hadoop-2.6.0-cdh5.4.0/bin

Source: https://docs.oracle.com/cd/E63064_01/doc.42/e63062/users.htm#BIGUG328
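Once the client environment is in place, connecting with Beeline comes down to passing a HiveServer2 JDBC URL to `beeline -u`. The sketch below only assembles and prints such a command. The helper name `hive_jdbc_url`, the port 10000, the `default` database, and the Kerberos principal are placeholder assumptions, not values taken from the steps above.

```shell
#!/bin/bash
# Sketch: build a HiveServer2 JDBC URL for beeline.
# hive_jdbc_url is a hypothetical helper; host, port, database and
# principal below are placeholders to replace with your own values.

hive_jdbc_url() {
    local host="$1" port="${2:-10000}" db="${3:-default}" principal="${4:-}"
    local url="jdbc:hive2://${host}:${port}/${db}"
    # On a Kerberos-secured cluster the HiveServer2 service principal is appended.
    if [ -n "$principal" ]; then
        url="${url};principal=${principal}"
    fi
    printf '%s\n' "$url"
}

url="$(hive_jdbc_url host.domain.com 10000 default 'hive/host.domain.com@EXAMPLE.COM')"
# Print the command instead of running it, so the sketch works without a cluster.
echo "beeline -u \"${url}\""
```

Quoting the URL matters because the `;principal=` part would otherwise be read by the shell as a command separator.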

How to load/add a jar file in Hive – Installing JSONSerDe in a CDH environment

How to install and configure JSON capability in the CDH distribution

The Cloudera (CDH) distribution does not ship with JSON capability; to use it, you need to add/install your own.

There are 3 options to add/load a jar file if you use the Cloudera (CDH) distribution:

    add yourjar.jar
    Create a .hiverc, an example in the link …
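For the .hiverc option, the Hive CLI runs the statements in `$HOME/.hiverc` at startup, so an `ADD JAR` line there loads the jar for every session. The sketch below appends such a line only if it is not already present. The helper name `add_jar_to_hiverc` and the jar path are hypothetical, and the demo writes to a scratch file rather than a real `.hiverc`.

```shell
#!/bin/bash
# Sketch: register a jar in a .hiverc file so the Hive CLI loads it at startup.
# add_jar_to_hiverc is a hypothetical helper; the jar path is a placeholder.

add_jar_to_hiverc() {
    local hiverc="$1" jar="$2"
    local line="ADD JAR ${jar};"
    touch "$hiverc"
    # Append only if the exact line is not already in the file.
    grep -qxF -- "$line" "$hiverc" || printf '%s\n' "$line" >> "$hiverc"
}

# Demo on a scratch file; point it at "$HOME/.hiverc" for real use.
hiverc="$(mktemp)"
add_jar_to_hiverc "$hiverc" /opt/jars/json-serde.jar
add_jar_to_hiverc "$hiverc" /opt/jars/json-serde.jar
cat "$hiverc"
```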


Hadoop HDFS (BDA) setting extended ACLs

How to set extended ACLs:

    hdfs dfs -setfacl -R -m default:user:hive:rwx /data/report_data

Check the extended ACLs:

    hadoop fs -getfacl /data/report_data

Set the mask to get the full default inherited access:

    hadoop fs -setfacl [-R] -m mask::rwx /data/report_data

Reference: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html
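A default ACL entry is inherited by files and directories created afterwards; it does not grant access to what already exists. One possible way to cover both cases is to set an access entry and a default entry in the same call, as in the sketch below. The `run` and `grant_hdfs_acl` helpers and the `DRY_RUN` switch are assumptions for illustration. With `DRY_RUN=1` (the default here) the commands are only printed, so nothing is changed on the cluster.

```shell
#!/bin/bash
# Sketch: print (dry run) or execute the ACL commands for one user and path.
# run, grant_hdfs_acl and DRY_RUN are hypothetical names for this example.

DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

grant_hdfs_acl() {
    local user="$1" perms="$2" path="$3"
    # The access entry applies to existing content; the default entry is
    # inherited by children created later.
    run hdfs dfs -setfacl -R -m "user:${user}:${perms},default:user:${user}:${perms}" "$path"
    # Show the resulting ACL for a quick check.
    run hdfs dfs -getfacl "$path"
}

grant_hdfs_acl hive rwx /data/report_data
```

Set `DRY_RUN=0` in the environment to execute the commands instead of printing them.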