
Machine Learning Engineer and Evangelist @Flyte.

A highly performant way to shard and process large datasets in parallel

Photo by Parrish Freeman on Unsplash

MapReduce is a prominent term in the Big Data vocabulary, owing to the ease with which its “map” and “reduce” operations handle large datasets. A conventional map-reduce job consists of two phases: the map phase filters and sorts the data chunks, and the reduce phase collects and processes the outputs the map phase generates.
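To make the two phases concrete, here is a minimal, single-process sketch of a MapReduce-style word count in plain Python. The function names (`map_phase`, `shuffle`, `reduce_phase`) are illustrative, not part of any framework; a real engine would run the map calls in parallel across shards.

```python
from collections import defaultdict
from functools import reduce

def map_phase(chunk):
    # Emit a (word, 1) pair for every word in the chunk.
    return [(word, 1) for word in chunk.split()]

def shuffle(pairs):
    # Group emitted values by key, as the framework does between the two phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Aggregate each group of values into a single count per key.
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}

chunks = ["big data is big", "data is everywhere"]
pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(pairs))
print(counts)  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```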

Consider a scenario wherein you want to perform hyperparameter optimization. In such a case, you want to train your model on multiple combinations of hyperparameters to finalize the combination that outperforms the others. …
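As a minimal sketch of why this maps so well onto map-style parallelism: each hyperparameter combination is an independent trial, so the trials can be fanned out in parallel and the results reduced to the best combination. The `train_and_score` function below is a hypothetical stand-in for a real training job, and the toy objective is made up.

```python
from itertools import product

def train_and_score(lr, batch_size):
    # Hypothetical stand-in for a real training run; returns a validation score.
    return 1.0 - abs(lr - 0.01) * 10 - abs(batch_size - 32) / 1000

learning_rates = [0.1, 0.01, 0.001]
batch_sizes = [16, 32, 64]

# Every combination is independent, so a map-style operator can run them in parallel;
# picking the best score is the reduce step.
trials = {(lr, bs): train_and_score(lr, bs) for lr, bs in product(learning_rates, batch_sizes)}
best = max(trials, key=trials.get)
print(best)  # (0.01, 32) scores highest under this toy objective
```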

Here’s why Striveworks chose Flyte as its workflow engine

Photo by Brenna Huff on Unsplash

By: Jake Neyer (Software Engineer at Striveworks)

At Striveworks, we are building a one-of-a-kind data science platform by integrating the building blocks that most data science teams already have in their tool belts and adding a few new ones to streamline the process to get from a business question to a production solution.

Photo by Julian Hochgesang on Unsplash

Run-time dependency is an important consideration when building machine learning or data processing pipelines. Consider a case where you want to query your database x times, where x can be resolved only at run-time. Here, looping is the natural solution (writing each query out by hand is unfeasible!). When there’s a loop in the picture, a machine learning or data processing job needs to pick up an unknown variable at run-time and then build a pipeline based on the variable’s value. This process isn’t static; it has to happen on the fly at run-time.
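A plain-Python sketch of the idea (deliberately not Flyte’s actual dynamic-workflow API): the number of query tasks is known only after a run-time lookup, so the fan-out is constructed on the fly. `count_rows` and `fetch_page` are hypothetical helpers.

```python
def count_rows():
    # Hypothetical run-time lookup; in practice this would query the database.
    return 2500

def fetch_page(offset, limit):
    # Hypothetical task that fetches one page of rows.
    return f"SELECT * FROM events LIMIT {limit} OFFSET {offset}"

def dynamic_pipeline(page_size=1000):
    # x (the number of query tasks) is resolved only once count_rows() runs,
    # so the fan-out is built at run-time rather than declared statically.
    total = count_rows()
    return [fetch_page(offset, page_size) for offset in range(0, total, page_size)]

queries = dynamic_pipeline()
print(len(queries))  # 3 pages for 2500 rows at 1000 rows per page
```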

Dynamic workflows in Flyte intend to…

Photo by Hu Chen on Unsplash

Machine Learning isn’t cheap. Especially when one massive model or several models are to be built, it eats up your time, money, and effort. Imagine wanting to revert to a previous model artifact because the current model isn’t performing as effectively as the previous one. If you don’t have access to the previous model’s results, you’d have to go all the way through modifying your parameters, training your models, and estimating the performance again. Not an easy job, indeed!

This use case is just the tip of the iceberg. There are endless cases where you might want to version…

The ultimate guide to understanding cron jobs

Image by Alexey Lin from Unsplash.

You may often need to spin up your tasks (jobs) automatically without any manual intervention (e.g. when backing up your database). You might not be available to spin the tasks up or maybe you want the machine to pick them up and run them. Either way, you need them to run in the background at a specified time.

Cron jobs help in automating such tasks — more specifically, those repetitive tasks.

This guide will help you set up cron jobs on your Linux instance. …
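The core of any cron job is its schedule expression. As a hedged example (the script path is illustrative, not from the guide), a crontab entry consists of five time fields followed by the command to run:

```
# ┌──────── minute (0-59)
# │ ┌────── hour (0-23)
# │ │ ┌──── day of month (1-31)
# │ │ │ ┌── month (1-12)
# │ │ │ │ ┌ day of week (0-6, Sunday = 0)
# │ │ │ │ │
  0 2 * * * /path/to/backup.sh
```

This entry runs the (illustrative) backup script every day at 2:00 AM; `crontab -e` opens the current user’s crontab for editing.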

An eBook for all those who’d like to know the intricacies of the tech industry

I’m Samhita Alla — a Tools Developer and Tester at Oracle. I like developing applications and writing blogs about Python and Machine Learning.

After learning the two most important concepts required for not flunking an interview (data structures and system design), and bombing interviews despite trying to understand them, I told myself that data structures weren’t going to rule my career. Data structures and I are sort of adversaries, and I had plans of getting into top-notch companies without mastering them. So, bit by bit, I started looking into other technical areas that piqued my interest.


By Samhita Alla and Neil Conway

Photo by SOCIAL.CUT from Unsplash

Object Detection is an important task in computer vision. Using deep learning for object detection can result in highly accurate models, but developers can also run into several challenges. First, deep learning models are very expensive to train — even using GPUs, modern object detection models can take many hours of computation to train from scratch. Managing those GPUs and running workloads on many GPUs in parallel gets complicated fast. Second, object detection models typically have many hyperparameters. While algorithms exist to tune hyperparameters automatically, applying those algorithms in practice requires running hundreds or…

The Data Preparation Process

Photo by FOODISM360 from Unsplash

In the first part of this series, you’ve implemented various data loading techniques. You’ve seen how to load images, text, CSV files, and NumPy arrays into your Keras workspace.

Now, to enable the model to make proper use of your data, you have to convert it into a format that a deep learning algorithm can interpret. To do this, you have to preprocess your data.

Importing Keras

First, import the Keras library into your workspace.

from tensorflow import keras

Now you’ll see how to preprocess various kinds of data. So here’s the challenge. Every section has the task…
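As a small taste of what preprocessing involves, here is a stdlib-only sketch of feature standardization (rescaling values to zero mean and unit variance), the same idea behind Keras’s normalization utilities; the numbers are made up for illustration.

```python
from statistics import mean, pstdev

def standardize(values):
    # Rescale to zero mean and unit variance so the model sees
    # comparable feature scales during training.
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

pixel_like = [10.0, 20.0, 30.0, 40.0]
scaled = standardize(pixel_like)
print(scaled)
```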

Data is vast, and so are the data loading techniques

Photo by Nahil Naseer from Unsplash

Keras is a Deep Learning API of TensorFlow 2.0 used for easy and fast experimentation. It is simple to understand, flexible to extend and deploy, and powerful enough to build any neural network.

With the increasing use of deep learning to solve real-world problems, it has become necessary to shorten the time it takes to build robust machine learning models, i.e., the time from designing an algorithm to putting it into practice to generate the desired model has to be minimal.

Keras has been designed for this very purpose. It is a high-level deep learning API…

What is MLOps? Understand how it helps you build an end-to-end Machine Learning Pipeline

Background Photo from Unsplash

Machine Learning (ML) has forayed into almost all spheres of our lives, be it healthcare, finance, or education; it’s practically everywhere! There are numerous machine learning engineers and data scientists out there who are well versed in modeling a machine learning algorithm. Then comes the challenge of deploying a machine learning model in production. Coding a machine learning algorithm is just the tip of the iceberg. For a machine learning model to be deployable, configuration, automation, server infrastructure, testing, and process management have to be taken care of. In conventional software engineering, DevOps handles the engineering and operations. It bridges development…

