Running Ollama, Llava-Phi3 on a small AWS EC2 Instance for Image Analysis

Dhaval Nagar / CEO

Ollama is a great way to run models locally, with better privacy, performance, and cost utilization. In this post we use a small EC2 instance to do simple image analysis with the custom Llava-Phi3 model.

Eventually, small language models will become ubiquitous and easy to integrate into regular applications - they will be efficient in terms of privacy, performance, and cost.

We processed a batch of images captured at the recent AWS Summit 2024 Bengaluru with a custom fine-tuned llava-phi3 model running on an AWS EC2 t4g.large instance with just 8 GB of RAM. Yes, this is not an ideal setup for production-grade applications, but the model could still run on this low configuration, without any GPU attached, and still gave sufficiently useful output.

Llava-Phi3 Model

We used the recent Ollama llava-phi3 model for the analysis. Llava models are great for simple image analysis. We are planning to use Llava models for pre-analysis in one of our pattern recognition use cases.
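As a minimal sketch of how a single image can be analyzed (assuming Ollama is running locally on its default port 11434 and `llava-phi3` has already been pulled), the image is sent base64-encoded in the `images` field of Ollama's `/api/generate` endpoint:

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_payload(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint.

    Vision models such as llava-phi3 accept base64-encoded images in the
    "images" list alongside the text prompt.
    """
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # wait for the full response instead of streaming tokens
    }


def describe_image(path: str, prompt: str = "Describe this image.") -> str:
    """Send one local image file to llava-phi3 and return the description."""
    with open(path, "rb") as f:
        payload = build_payload("llava-phi3", prompt, f.read())
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Example (requires a running Ollama instance; the filename is hypothetical):
#   print(describe_image("slide-01.jpg"))
```

The prompt can be tuned per use case, e.g. asking specifically for text visible on a presentation slide rather than a general description.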

Amazon EC2 Configuration

We decided to try the smallest possible instance for experimentation. Using Amazon Linux on a t4g.large instance (2 vCPUs, 8 GB RAM), we were able to run Ollama, load the model, and process the images, which were stored in S3.
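Since the images live in S3, one simple approach is to mirror them onto the instance once and process them locally. A sketch, assuming the AWS CLI is installed and configured on the instance (bucket and prefix names are hypothetical):

```python
import subprocess
from pathlib import Path


def s3_sync_cmd(bucket: str, prefix: str, dest: str) -> list[str]:
    """Build the `aws s3 sync` command that mirrors a bucket prefix locally."""
    return ["aws", "s3", "sync", f"s3://{bucket}/{prefix}", dest]


def fetch_images(bucket: str, prefix: str, dest: str = "./images") -> list[Path]:
    """Download the images once, so repeated runs read from local disk, not S3."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    subprocess.run(s3_sync_cmd(bucket, prefix, dest), check=True)
    return sorted(Path(dest).glob("*.jpg"))


# Example (bucket and prefix are hypothetical):
#   images = fetch_images("summit-photos", "bengaluru-2024")
```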

imgEC2Config
This costs around $50 per month

Because of this minimal configuration, model performance is very poor: it takes a couple of minutes to process each image, as all the images are larger than 1024x1024 pixels. At the time of writing, Ollama does not yet support parallel request processing, but for this use case we are fine with sequential processing.
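The batch can therefore be submitted one image at a time. A self-contained sketch (again assuming a local Ollama instance with `llava-phi3` pulled; the folder path is hypothetical) that also records per-image latency:

```python
import base64
import json
import time
import urllib.request
from pathlib import Path

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def list_images(folder: str) -> list[Path]:
    """Collect the image files to process, in a stable order."""
    exts = {".jpg", ".jpeg", ".png"}
    return sorted(p for p in Path(folder).iterdir() if p.suffix.lower() in exts)


def analyze(path: Path, prompt: str = "Describe this image.") -> str:
    """Send one image to the local llava-phi3 model and return its description."""
    body = {
        "model": "llava-phi3",
        "prompt": prompt,
        "images": [base64.b64encode(path.read_bytes()).decode("ascii")],
        "stream": False,
    }
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


def process_batch(folder: str) -> None:
    """Process images strictly one at a time -- sequential submission keeps
    memory use predictable on an 8 GB instance."""
    for img in list_images(folder):
        start = time.monotonic()
        description = analyze(img)
        print(f"{img.name} ({time.monotonic() - start:.0f}s): {description}")


# Example (requires a running Ollama instance):
#   process_batch("./images")
```

On this instance size, resizing images below 1024x1024 before sending them would likely cut the per-image latency noticeably.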

Output

These images are from one of the recent presentations I gave at the AWS Summit event.

imgDemo1

...

imgDemo2

Summary

It's amazing how quickly newer models become available and accessible on a wide variety of infrastructure. This experiment was simply to understand whether we can utilize smaller models on constrained infrastructure for simple use cases.

A lot has changed in the past year, and maybe a year from now a lot more will change, and models will run alongside our regular applications.
