Running Ollama, Llava-Phi3 on a small AWS EC2 Instance for Image Analysis

Dhaval Nagar / CEO

Ollama is a great way to run models locally, with better privacy, performance, and cost utilization. In this post we use a small EC2 instance to do simple image analysis with the custom Llava-Phi3 model.

Eventually, small language models will become ubiquitous and easy to integrate into regular applications - they will be efficient in terms of privacy, performance, and cost.

We processed a batch of images captured at the recent AWS Summit 2024 Bengaluru with a custom fine-tuned llava-phi3 model running on an AWS EC2 t4g.large instance with just 8 GB of RAM. Yes, this is not an ideal setup for production-grade applications, but the model still ran on this low configuration, without any GPU attached, and gave sufficiently useful output.

Llava-Phi3 Model

We used the recent Ollama llava-phi3 model for the analysis. Llava models are great for simple image analysis, and we are planning to use them for pre-analysis in one of our pattern recognition use cases.
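As a sketch of how a single image can be analyzed, Ollama's HTTP API accepts images base64-encoded in the request body. The prompt and file name below are placeholders, not the exact ones we used:

```shell
#!/bin/sh
# Build a /api/generate request for llava-phi3 with one base64-encoded image.
llava_request() {
  img="$1"
  prompt="$2"
  # base64-encode the image and strip the line wrapping coreutils adds
  b64=$(base64 < "$img" | tr -d '\n')
  printf '{"model":"llava-phi3","prompt":"%s","stream":false,"images":["%s"]}' \
    "$prompt" "$b64"
}

# Send it to a locally running Ollama (default port 11434):
# llava_request photo.jpg "Describe this image." | \
#   curl -s http://localhost:11434/api/generate -d @-
```

With `"stream": false`, Ollama returns the full description as a single JSON response, which is easier to store per image than the default streamed chunks.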

Amazon EC2 Configuration

We decided to try the smallest practical instance for this experiment. Using Amazon Linux on a t4g.large (2 vCPUs and 8 GB RAM), we were able to run Ollama and the model and process the images. The images were stored in S3.
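A rough provisioning sketch for such an instance looks like this; the bucket name and prefix are placeholders:

```shell
# On Amazon Linux (arm64, t4g.large): install Ollama, pull the model,
# and copy the images down from S3.
curl -fsSL https://ollama.com/install.sh | sh   # installs and starts the service
ollama pull llava-phi3
aws s3 sync s3://your-image-bucket/summit-2024/ ./images/
```

The instance needs an IAM role (or credentials) with read access to the bucket for the `aws s3 sync` step.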

This setup costs around $50 per month.

Because of this minimal configuration, model performance is poor: it takes a couple of minutes to process each image, given that all the images are larger than 1024x1024 pixels. At the time of writing, Ollama does not yet support parallel request processing, but for this use case we are fine with sequential processing.
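Sequential processing then reduces to a plain loop. This sketch writes one request file per image (the directory layout and prompt are assumptions), with the actual call to the local Ollama server left commented:

```shell
#!/bin/sh
# Process images one at a time: Ollama handles requests sequentially
# anyway, so a plain loop loses nothing.
process_images() {
  indir="$1"
  outdir="$2"
  mkdir -p "$outdir"
  for img in "$indir"/*.jpg; do
    [ -f "$img" ] || continue
    b64=$(base64 < "$img" | tr -d '\n')
    name=$(basename "$img" .jpg)
    # Write the request body so each run is inspectable and repeatable.
    printf '{"model":"llava-phi3","prompt":"Describe this image.","stream":false,"images":["%s"]}' \
      "$b64" > "$outdir/$name.request.json"
    # Uncomment to send each request to the local Ollama server:
    # curl -s http://localhost:11434/api/generate \
    #   -d @"$outdir/$name.request.json" > "$outdir/$name.response.json"
  done
}

# Usage: process_images ./images ./out
```

Keeping the request files on disk also makes it easy to re-run only the images whose responses failed or were unconvincing.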


These images are from a recent presentation that I gave at the AWS Summit event.



It's amazing how quickly newer models become available and accessible on a wide variety of infrastructure. This experiment was simply to understand whether we can use smaller models on constrained infrastructure for simple use cases.

A lot has changed in the past year, and maybe a year from now a lot more will have changed, with models running alongside our regular applications.
