Mini Projects

Why Small Projects > Large Projects

Introduction

Your ultimate goal as an aspiring data scientist is to land a job in the industry. What is the fastest way to get there? Should you focus on big, impressive portfolio projects or small, high quality projects? In my opinion, small, high quality projects far surpass big, impressive projects for two main reasons: bigger projects face decreasing marginal returns on many fronts, and bigger projects are inherently riskier. Additionally, you gain more skills by taking on several high quality small projects.

How big is big?

Consider a project of up to 1 week (40 hours) to be a small portfolio project, a 1 - 3 week project to be a medium sized project, and anything that takes more than 3 weeks of full time work to be a big portfolio project.

Resume and GitHub Portfolio

As the size of a project increases, the impact of each additional day of work decreases. In other words, the difference in resume value between a 1 day and a 2 day project is extremely large, while the difference between a 1 week and a 2 week project is relatively small. Additionally, a concise, focused project can demonstrate your skills just as effectively as a large, sprawling one. To make matters worse, employers may not look deeply into your portfolio project at all. Some employers place most or all of their emphasis on the technical interview to vet a candidate's technical skillset. Unfortunately, the neural network you built from scratch may be less beneficial than you initially expected.

The same is true for your GitHub profile. It only really serves one purpose: to help you showcase your skills and land an interview. Some may argue that larger, more impressive projects will boost both your resume and your GitHub portfolio, increasing the odds that a recruiter or hiring manager will move you forward in the process. However, hiring managers and recruiters sift through tons of resumes every day and only really get a chance to scan yours. How your project outcomes are communicated will matter far more in this phase of the job search process.

But what about a technical interview in which you have to present a personal project? In my experience, your ability to communicate the project's goals and outcomes is more important than how large the project was. Additionally, if you take on a 1 week project instead of a 4 week project, that leaves you 3 additional weeks to prepare for interviews and improve how you communicate in them. The marginal benefits of that preparation will far exceed the benefits of a larger project.

Learning Tradeoff

Building a portfolio project is not solely about landing a job. How much you learn is certainly an important factor when picking a project. Knowledge gained from a large project can almost always be replicated in a smaller one. Let's say your project is to build an ETL pipeline that extracts all of the weather data from a weather API, transforms the data into your desired schema, and stores the data in a SQL database. Your goal is to learn and apply your knowledge of database design, async data fetching in Python with asyncio, and, of course, how to build a production ready ETL pipeline. In my opinion, you should scope your project to take only 1 week. Design a database schema that is relatively small. Use only a few of the endpoints available in the API. Focus on being able to clearly explain how the pipeline works; perhaps it runs every day from a cron job. Finally, be able to explain clearly how the data collected daily saves your ML team valuable time that could be better spent optimizing the weather predictions, which will lead to a competitive advantage.
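To give a sense of how small such a pipeline can be, here is a minimal sketch of the extract, transform, and load steps described above. Everything specific in it is an assumption made for illustration: `fetch_observation` is a hypothetical stand-in for a real API call (it returns a canned payload instead of touching the network), the payload fields and the `observations` table are invented, and SQLite stands in for whatever SQL database you choose.

```python
import asyncio
import sqlite3


async def fetch_observation(city: str) -> dict:
    """Hypothetical stand-in for one weather API request.

    A real project would call the API with an async HTTP client here.
    The payload shape below is an assumption, not any real API's format.
    """
    await asyncio.sleep(0)  # simulates waiting on the network
    return {"city": city, "temp_f": 68.0, "observed_at": "2024-01-01T00:00:00Z"}


def transform(raw: dict) -> tuple:
    """Map a raw payload onto the target schema, converting Fahrenheit to Celsius."""
    temp_c = round((raw["temp_f"] - 32) * 5 / 9, 1)
    return (raw["city"], raw["observed_at"], temp_c)


def load(conn: sqlite3.Connection, rows: list) -> None:
    """Create the (deliberately small) table if needed and store the rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS observations ("
        "city TEXT NOT NULL, observed_at TEXT NOT NULL, temp_c REAL NOT NULL, "
        "PRIMARY KEY (city, observed_at))"
    )
    # INSERT OR REPLACE means re-running the job does not duplicate rows.
    conn.executemany("INSERT OR REPLACE INTO observations VALUES (?, ?, ?)", rows)
    conn.commit()


async def run_pipeline(conn: sqlite3.Connection, cities: list, fetch=fetch_observation) -> int:
    """Fetch all cities concurrently, then transform and load the results."""
    raw = await asyncio.gather(*(fetch(city) for city in cities))
    rows = [transform(payload) for payload in raw]
    load(conn, rows)
    return len(rows)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    loaded = asyncio.run(run_pipeline(connection, ["Austin", "Boston"]))
    print(f"loaded {loaded} rows")
```

Passing the fetch function in as a parameter keeps the pipeline testable without network access, and the composite primary key is one simple way to make a daily scheduled run safe to repeat. Both are design choices of this sketch, not requirements of the project.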

How to Pick Data Science Portfolio Projects?

I hope by now I have convinced you that you should prefer smaller projects over larger ones. If not, you can still use some of these tips to determine which projects to take on.

  1. Pick a domain of interest. Don't worry about not having any prior knowledge within this domain, as you will gain the knowledge through your projects. It simply needs to be exciting to you.

  2. Look for sources of data within that domain. This can be APIs, websites, files, videos, images, you name it. Look at the big companies within the industry and what developer APIs or publicly available data they may offer for good inspiration.

  3. Once you have found a source of data that is a good candidate for a project, ask yourself: why would anyone care about this data? Could it save them time or money?

If you are able to reasonably answer that question, then it's time to prove it by exploring the data and telling a story with it, automating a process with the data, or forecasting a feature within the data. Scope your project to be about 40 - 80 hours. Make sure that you time box the project and take at most 80 hours. If you hit your time limit without any tangible results, ask yourself what went wrong. Do a retrospective on the project and try again with a new dataset. Learn and grow with every experience.

Conclusion

Data science is about solving problems, and your portfolio should reflect your ability to do so efficiently and effectively. Mini projects allow you to showcase a wide range of skills, learn faster, and build a GitHub profile that’s both diverse and approachable. Focus on completing small, interesting projects over large sprawling ones. Pick a domain of expertise and stay up to date within that industry. Learn one project at a time, and if a project fails, be sure to reflect. By following this approach, you’ll not only maximize your chances of landing your dream role, but you will be a seasoned data scientist in no time!
