Contract Intelligence

A Streamlit app with a BERT implementation for contract analysis and clause extraction.

  • Tools used: Streamlit, Python
  • Category: DS, NLP, Transformers
  • Year: May 2022

Software Engineering for Data Science

Continuous Integration and Continuous Delivery applied to a Data Science project.

  • Tools used: GitHub Actions
  • Category: DS, CI/CD
  • Year: March 2021 (In progress)

Command Line Data Science

An introduction to Bash scripting for performing Data Science tasks.

  • Tools used: Bash, Command Line
  • Category: DS, EDA
  • Year: Jan 2021 (In progress)

NLP Sentiment Analysis Handbook

A step-by-step approach to understanding TextBlob, NLTK, scikit-learn, and LSTM networks applied to sentiment analysis.

  • Tools used: TextBlob, NLTK, scikit-learn, LSTM on Interactive Python
  • Category: NLP, Interactive Python
  • Year: March 2021 (In progress)
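TextBlob and NLTK's VADER are lexicon-based tools: each word carries a polarity weight, and a text's sentiment is an aggregate of those weights. The sketch below illustrates that idea with a tiny hand-made word list and a single negation rule. The `LEXICON` weights and the `score_sentiment` and `label` helpers are hypothetical simplifications. They are not the APIs of TextBlob, NLTK, or scikit-learn.

```python
import re

# Hypothetical toy lexicon: word -> polarity weight in [-1, 1].
LEXICON = {
    "good": 0.7, "great": 0.9, "love": 0.8, "excellent": 1.0,
    "bad": -0.7, "poor": -0.6, "hate": -0.9, "terrible": -1.0,
}
NEGATIONS = {"not", "no", "never"}


def score_sentiment(text: str) -> float:
    """Average polarity of the lexicon words in `text`; 0.0 if none match."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = []
    for i, token in enumerate(tokens):
        if token in LEXICON:
            weight = LEXICON[token]
            # Flip the polarity when the previous token is a negation word.
            if i > 0 and tokens[i - 1] in NEGATIONS:
                weight = -weight
            scores.append(weight)
    return sum(scores) / len(scores) if scores else 0.0


def label(score: float, threshold: float = 0.05) -> str:
    """Map a polarity score to a coarse sentiment label."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


for sentence in ["I love this phone, the camera is great",
                 "The service was not good",
                 "It arrived on Tuesday"]:
    s = score_sentiment(sentence)
    print(f"{s:+.2f} {label(s)}  {sentence}")
```

The handbook's scikit-learn and LSTM approaches replace the fixed lexicon with weights that are learned from labelled examples.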

NLP as a [micro]service

This project provides NLP as a service. It includes a code base for both the front-end GUI (Streamlit) and the back-end server (FastAPI), and it demonstrates the use of Transformer models on various downstream NLP tasks.

  • Tools used: Docker Compose, Streamlit, spaCy, Transformers (T5, RoBERTa and BERT)
  • Category: WebApp, NLP, Microservices
  • Year: Oct 2020

NLP as a Service

I used the spaCy package to identify the entities in a body of text. The app provides tokens and lemmas, Named Entity Recognition, sentiment analysis, and summarization.

  • Tools used: Python, spaCy, Streamlit, NLTK
  • Category: WebApp, NLP
  • Year: Sep 2020

HPAC Hot-Path-Analytic-Container

I used Docker containers to develop an IoT analytics solution that runs close to the source of the data. I used a Lambda architecture to separate hot (real-time) data from historical data. The container includes built-in visualizations and dashboarding software, which allow exploration of past data and visualization of incoming data. It also includes Node-RED, which handles ETL (Extraction, Transformation and Load), and an MQTT broker.

  • Tools used: Node-RED, Grafana, InfluxDB, MQTT, Docker
  • Category: Visualization, Edge Computing, IIoT
  • Year: Jun 2019
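A Lambda architecture has three layers:

- a batch layer that recomputes views from the full history of raw events;
- a speed layer that updates a view as each new (hot) event arrives;
- a serving layer that merges both views at query time.

The in-memory sketch below only illustrates that split. The `LambdaStore` class and its methods are hypothetical, and a real deployment would back the layers with persistent storage and a message broker rather than Python dictionaries.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class LambdaStore:
    """Toy Lambda architecture: batch view + speed view merged at query time."""
    master_log: list = field(default_factory=list)   # immutable raw events
    batch_view: dict = field(default_factory=dict)   # precomputed from history
    speed_view: dict = field(default_factory=lambda: defaultdict(float))

    def ingest(self, sensor: str, value: float) -> None:
        # Every event is appended to the master log (input of the batch layer)
        # and immediately applied to the speed layer (the hot path).
        self.master_log.append((sensor, value))
        self.speed_view[sensor] += value

    def run_batch(self) -> None:
        # Batch layer: recompute totals from the full history, then reset the
        # speed layer because its events are now covered by the batch view.
        totals = defaultdict(float)
        for sensor, value in self.master_log:
            totals[sensor] += value
        self.batch_view = dict(totals)
        self.speed_view = defaultdict(float)

    def query(self, sensor: str) -> float:
        # Serving layer: combine historical (batch) and hot (speed) results.
        return self.batch_view.get(sensor, 0.0) + self.speed_view.get(sensor, 0.0)


store = LambdaStore()
store.ingest("temp", 20.0)
store.ingest("temp", 21.5)
store.run_batch()           # historical view now holds 41.5
store.ingest("temp", 22.0)  # hot event, only in the speed view so far
print(store.query("temp"))  # 63.5
```

Sums are used here because they merge trivially. Aggregates such as averages would need the batch and speed layers to keep both a sum and a count.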

CRISP-DM and Rossmann sales data

An end-to-end exploration of the data mining process, using CRISP-DM to analyze and reveal insights from sales transactions. CRISP-DM stands for Cross-Industry Standard Process for Data Mining, a methodology that provides a structured approach to planning a data mining project. A container with the data and the Jupyter notebook is included.

  • Tools used: Jupyter notebook, Python, Docker
  • Category: CRISP-DM, EDA, Data Mining, Time Series Forecasting
  • Year: March 2020

SAP HANA-Express Workshops

A series of snippets and learning material developed to accompany the workshops presented to Datagroup clients. The snippets contain the SQL statements that call the different algorithms available in SAP HANA. The analytical engine of SAP HANA can perform text processing and text analytics in 32 languages.

  • Tools used: SQL, SAP HANA Express, VirtualBox
  • Category: Text processing, SAP HANA, Algorithms
  • Year: Feb 2020

COVID-19 Dataset Analysis

This notebook focuses on Named Entity Recognition (NER), an application of Natural Language Processing (NLP) that processes and understands large amounts of unstructured human language. NER is also known as entity identification, entity chunking, and entity extraction.

  • Tools used: Jupyter notebook, Python, spaCy package
  • Category: NER - Named Entity Recognition, NLP
  • Year: Mar 2020
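spaCy's NER uses a statistical model. The task itself can be illustrated with a much simpler technique: a dictionary (gazetteer) lookup that finds spans of text and assigns each one an entity label. The sketch below uses that rule-based technique instead of spaCy's API. The `GAZETTEER` entries, the `DISEASE` label, and the `extract_entities` helper are hypothetical.

```python
import re

# Hypothetical gazetteer: surface form -> entity label.
GAZETTEER = {
    "World Health Organization": "ORG",
    "WHO": "ORG",
    "Wuhan": "GPE",
    "COVID-19": "DISEASE",
}


def extract_entities(text: str) -> list[tuple[str, str, int, int]]:
    """Return (text, label, start, end) for each non-overlapping gazetteer match."""
    # Longer names come first in the alternation, so at any position the
    # longest surface form wins over a shorter one starting at the same place.
    names = sorted(GAZETTEER, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b")
    return [(m.group(), GAZETTEER[m.group()], m.start(), m.end())
            for m in pattern.finditer(text)]


text = ("The World Health Organization (WHO) named the disease "
        "COVID-19 after cases in Wuhan.")
for entity in extract_entities(text):
    print(entity)
```

A gazetteer only finds names it already knows. A trained model like spaCy's can also recognize unseen entities from context.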

Sentiment Analysis using the Twitter API and the Elastic Stack

A student project I led with Ulm University and Datagroup GmbH. The solution is a container with tools and a dashboard for performing sentiment analysis on Twitter data. The typical use case is brand sentiment on social networks.

  • Tools used: Node.js, Nginx, ELK Stack, API, Docker
  • Category: Text Processing, Dashboarding
  • Year: July 2020

Predictive maintenance use case

This repository contains a Jupyter notebook with exploratory data analysis (EDA) and a predictive maintenance model. It is especially useful in IIoT scenarios.

  • Tools used: Python, Jupyter notebook
  • Category: Predictive maintenance, EDA
  • Year: Aug 2020

Plotly for Data Science

A repository with visualization examples using the Plotly library. Plotly is well suited to interactive dashboards and integrates well with Flask in web apps.

  • Tools used: Python Plotly package, Jupyter Notebook
  • Category: Visualization, EDA
  • Year: May 2019

Photo credits: Unsplash and Pixabay