Running Ollama and Llava-Phi3 on a Small AWS EC2 Instance for Image Analysis
Dhaval Nagar / CEO
Ollama is a great way to run models locally, with better privacy, performance, and cost efficiency. In this post we use a small EC2 instance to do simple image analysis with the custom Llava-Phi3 model.
Eventually, small language models will become ubiquitous and easy to integrate into regular applications - they will be efficient in terms of privacy, performance, and cost.
We processed a batch of images captured at the recent AWS Summit 2024 Bengaluru with a custom fine-tuned llava-phi3 model running on an AWS EC2 t4g.large instance with just 8GB of RAM. Yes, this is not an ideal setup for production-grade applications, but the model could still run on this low configuration, without any GPU attached, and still gave sufficiently useful output.
Llava-Phi3 Model
We used the recent Ollama llava-phi3 model for the analysis. Llava models are great for simple image analysis. We are planning to use Llava models for the pre-analysis step of one of our pattern recognition use cases.
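As a rough illustration, here is a minimal Python sketch of sending a single image to a locally running Ollama server that hosts llava-phi3. It assumes Ollama is listening on its default port 11434 and that the model has already been pulled (e.g. with `ollama pull llava-phi3`); the prompt text and image path are placeholders.

```python
import base64
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint


def describe_image(image_path: str, prompt: str = "Describe this image in detail.") -> str:
    # Ollama's generate API expects images as base64-encoded strings
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    response = requests.post(
        OLLAMA_URL,
        json={
            "model": "llava-phi3",
            "prompt": prompt,
            "images": [image_b64],
            "stream": False,  # return a single JSON response instead of a stream
        },
        timeout=600,  # generous timeout; CPU-only inference is slow
    )
    response.raise_for_status()
    return response.json()["response"]


if __name__ == "__main__":
    print(describe_image("sample.jpg"))
```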
Amazon EC2 Configuration
We decided to try the smallest possible instance for experimentation. Using Amazon Linux on a t4g.large instance (2 vCPUs and 8 GB RAM), we were able to run Ollama, load the model, and process the images. The images were stored in S3.
With such a low-end configuration, model performance is very poor. It takes a couple of minutes to process each image, given that all the images are larger than 1024x1024 pixels. At the time of writing, Ollama still does not support parallel request processing, but for this use case we are fine with sequential processing.
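Below is a hedged sketch of the sequential batch flow, assuming the images live under an S3 prefix (the bucket name and prefix here are illustrative only) and reusing a describe_image helper like the one shown earlier.

```python
import os
import boto3

# Illustrative names only; the actual bucket and prefix will differ
BUCKET = "example-summit-photos"
PREFIX = "aws-summit-2024/"

s3 = boto3.client("s3")


def process_bucket(local_dir: str = "/tmp/images") -> None:
    os.makedirs(local_dir, exist_ok=True)
    paginator = s3.get_paginator("list_objects_v2")

    for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if not key.lower().endswith((".jpg", ".jpeg", ".png")):
                continue
            local_path = os.path.join(local_dir, os.path.basename(key))
            s3.download_file(BUCKET, key, local_path)
            # Ollama handles one request at a time, so we simply loop sequentially
            print(key, "->", describe_image(local_path))


if __name__ == "__main__":
    process_bucket()
```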
Output
These images are from a recent presentation that I gave at the AWS Summit event.
...
Summary
It's amazing how quickly newer models become available and accessible on a wide variety of infrastructure. This experiment was simply to understand whether we can use smaller models on constrained infrastructure for simple use cases.
A lot has changed in the past year, and maybe a year from now a lot more will have changed, with models running alongside our regular applications.