1. Description Anomaly detection, also known as outlier detection, is a group of problems whose purpose is to find the samples that behave differently from the majority. It is applied in many domains: fraud detection in finance and insurance, default detection in...
Beyond Data
Invivoo's staff travel across different fields of expertise in pursuit of new knowledge and experiences to share.
How can Big Data contribute to reducing banks’ churn rate?
According to a study conducted by Efma, one in two customers is willing to change banks in the next six months. The reason? A lack of personalized products and services. At a time when competition between banks is raging, it is essential that they change their...
Explore and analyze your data with Apache Zeppelin – part 2
Welcome back to the second part of our series on Apache Zeppelin. In our previous post, ‘EXPLORE & ANALYSE YOUR DATA WITH APACHE ZEPPELIN - Part 1’, we introduced Apache Zeppelin as one of the best Big Data tools for your Data Analytics use cases and shared details about...
Creating your first Apache Airflow DAG
Over the past few years, Apache Airflow has established itself as the go-to data workflow management tool within any modern tech ecosystem. One of the main reasons Airflow became popular so quickly is its simplicity and how easy it is to get it up...
Apache Airflow: What is it and why you should start using it
In this data-driven era, the number of open-source Big Data technologies has risen exponentially in a matter of a few years. This multitude of options has introduced a vast range of patterns and architectures to store, process, and visualize...
Structured Streaming in Spark
Stream processing is a set of techniques used to extract information from unbounded data (a type of dataset that is theoretically infinite in size). Some examples of streaming are device monitoring, fault detection, billing...
Discovering recommendation systems
WHAT ARE RECOMMENDATION SYSTEMS? We all wonder how Amazon or Netflix came to such "power" and success. How can Netflix know about our movie preferences? How did Amazon know that I am an unconditional Game of Thrones fan, and that I love The North Face and geography?...
Recommendation engine: from collective to personalized
The recommendation engine is at the heart of the business strategy of all e-commerce giants. For example, 35 percent of Amazon's e-commerce revenue is generated by its recommendation engine, according to a McKinsey study. Every day we see the carousels of products that we...
Kafka: the Big Data streaming platform
In modern information systems, we are confronted with ever-increasing volumes of data that need to be processed in real time. However, the commonly used point-to-point connections do not scale easily with load. Data-producing services have a strong link with...
Paris Big Data conference: Couchbase, that other NoSQL database, deserves your attention
During this year’s edition of the Paris Big Data conference, amid an infinite set of booths filled with flashy promises of performance and scalability, one company stood out from the rest. Couchbase, the document-oriented NoSQL database, came to the conference armed...
Why is Spark Fast? And how to make it run faster? Part I: the Spark ABC
This is the first article in a series that discusses the mechanisms behind Apache Spark and how this data-processing framework disrupted the Big Data ecosystem, while giving you key recommendations to fine-tune your Spark jobs. Spark does things fast. That has always...
Why is Spark Fast? And how to make it run faster? Part III. Getting Spark to the next level
This is the third and last article of the Spark-centered series. Reading the first and second parts is highly recommended before going through this one, in which we’ll discuss how you could optimize Spark jobs from your end of the spectrum. Throughout the past article...
Why is Spark Fast? And how to make it run faster? Part II: The Spark Magic
This is the second article in the "Why is Spark Fast? And how to make it run faster?" series. The series discusses the mechanisms behind Apache Spark and how this data-processing framework disrupted the Big Data ecosystem. Reading the first part beforehand is...
Notebooks are The Missing Piece of the Big Data Revolution
More than a decade ago, what is now commonly known as the Big Data era started with the emergence of Hadoop. Since then, a multitude of technologies were introduced to fulfill multiple tasks within the Hadoop ecosystem, with capabilities ranging from processing data...
Monitoring and detection of anomalies with ELK
MEASURING PERFORMANCE INDICATORS WITH ELK Monitoring and measuring IT applications’ performance indicators is a major challenge for companies. The evolution of technologies around the qualification, storage and processing of big data, as well as machine learning, has made it...
5 pre-requisites before launching a Big Data project
A BIG DATA PROJECT IS FULL OF PITFALLS Many pitfalls! And not only on the IT side! During my visit to the Big Data Salon, held in Paris on March 6 and 7, 2017, I was able to attend 14 feedback sessions, both informative and operational. I propose a summary of this visit...