5 free resources every data scientist should start using today

A collection of free resources to grow your data science/engineering career

Story by Yitaek Hwang

Subscribe to data newsletters

Before jumping into popular MOOCs or purchasing recommended books on Amazon, I started by subscribing to various data science and data engineering newsletters. At first, I was reading every single article and taking notes, but over time I learned to recognize the important links shared across multiple newsletters and focus on a few. Newsletters are a great way to stay up to date with new tools, academic research, and popular blog posts shared by internet giants (e.g. Google, Netflix, Spotify, Airbnb, Uber).

Here are some of my favorite newsletters:

I also subscribe to Data Machina, The Analytics Dispatch, and AI Weekly.

Craft your own data curriculum

Next, depending on your focus, you need to craft your data science, data engineering, or data analyst curriculum. This may include learning how to program in Python or R if you are switching careers from a non-programming role. If budget is not a concern, joining a bootcamp or taking courses from Udacity and Dataquest may be a great option to get online mentorship from industry experts. However, if you are price-conscious like I was, you can opt to follow open-source guides to create a free curriculum:

One caveat here is that simply taking these courses is not enough. I generally found most courses and tutorials online to focus on either the basic knowledge (e.g. math, statistics, theories) or simplified guides that walk through a trivial example. This is especially true in big data, since tutorials tend to use a smaller subset of the data to run locally instead of walking through a full production setup on the cloud.

To supplement the theory with realistic scenarios, I recommend joining Kaggle and using Google’s free tools such as Colab to practice working with large datasets. You can also search for GitHub repos from Udacity students to see what a capstone project might look like.

Network with experts for free

Any career guide would tell you that networking is important. But how does one go about finding industry experts willing to mentor or simply answer some questions? Prior to the pandemic, one option was to attend meetups, but that opportunity was largely limited to people in major tech hubs like the Bay Area, New York, or Seattle (at least in the US). The other option was to attend conferences or workshops focused on data science, machine learning, or data engineering. However, the tickets for these events were very expensive, making it impractical for individuals to attend without company sponsorships.

As a startup engineer living in Baltimore, my solution was to network online by first watching free videos of sessions held by industry partners at tech conferences (e.g. AWS re:Invent, Microsoft Ignite, or Google Cloud Next) and connecting with the speakers on LinkedIn. Aside from the keynotes and the sessions on new cloud product releases, there are tons of sessions on best practices and architecture discussions where a product manager or a lead developer from an industry partner (e.g. Lyft, Capital One, Comcast) would present with a solutions architect at AWS/Azure/GCP on solving a real problem at scale. I would take notes on the session and then reach out to all the speakers on LinkedIn with a question about their product or an architectural decision mentioned in the talk. Surprisingly, almost all the speakers were willing to respond and continue to chat with me, even though I was just a recent grad working at an unknown startup at the time.

Over time, I steadily grew my network this way and had the added benefit of staying up to date with new products and industry trends across all the major cloud providers. Considering the current situation with COVID-19 and the continued shift toward virtual events, this may become the new norm in networking instead of attending conferences to meet other stakeholders in person.

Get certified

While cloud certifications are by no means validation of competence or data knowledge, I still think there’s value in investing in certifications. This is especially true if you are aiming to be a data engineer, as cloud knowledge is critical for running production workloads. Even for data scientists, getting familiar with cloud products enables you to fully focus on analyzing the data instead of struggling to load and clean data at scale.

Another underrated aspect of getting certified is the network it opens up. There are very active members on LinkedIn, particularly in tech consulting, posting about new opportunities in cloud data positions. Some recruiters post directly in LinkedIn groups for certification holders only. Certification alone won’t lead to a new job or position, but having those badges makes it easier to start a conversation with others or recruiters. Personally, I landed a few small consulting projects after getting the certifications.

Solve real problems

Finally, as with any engineering discipline, you will only improve with practice. If you are already working as a data scientist or data engineer, getting real-world experience should not be an issue. For others looking to transition, many will recommend building a portfolio. But where do you start? Working with the classic Titanic dataset for survival classification or clustering on the iris dataset is more likely to hurt your portfolio than help it.

Instead, try to use public GitHub projects as inspiration. Based on the network you built on LinkedIn via tech sessions and certifications, look at what others are building. Feel free to use examples from Udacity or Coursera projects on GitHub. Then mix in real datasets from Google Research or Kaggle, or search for an interesting dataset, and start building solutions for real problems.

If you are interested in a sector or a specific company, try to search for public datasets and build a sample project. For example, if you are interested in fintech, try using Lending Club’s public loan data to build a loan approval algorithm. The biggest takeaway from working with real datasets is that they are very messy and noisy compared to the ones provided in academic settings.
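
To make the point about messy data concrete, here is a minimal sketch in Python of what a first pass at such a project might look like. Everything in it is an assumption for illustration: the sample rows, the column names (annual_income, dti, int_rate, defaulted), and the approval thresholds are invented and are not taken from Lending Club's actual files. The rule-based approve function is only a toy baseline, not a recommended model.

```python
import csv
import io

# Hypothetical sample rows; column names and values are invented for
# illustration and are not taken from Lending Club's actual data files.
RAW = """annual_income,dti,int_rate,defaulted
"85,000",12.5,7.9%,0
,31.0,18.2%,1
42000,n/a,13.1%,0
 60000 ,45.2,22.4%,1
"""


def to_float(text):
    """Parse a messy numeric string, returning None when it cannot be read."""
    cleaned = text.strip().replace(",", "").rstrip("%")
    try:
        return float(cleaned)
    except ValueError:
        return None


def clean_rows(raw_csv):
    """Yield rows as dicts of floats, skipping rows with any unreadable field."""
    for row in csv.DictReader(io.StringIO(raw_csv)):
        parsed = {key: to_float(value or "") for key, value in row.items()}
        if None not in parsed.values():
            yield parsed


def approve(row, max_dti=35.0, min_income=30000.0):
    """Toy baseline: approve when debt-to-income and income pass thresholds."""
    return row["dti"] <= max_dti and row["annual_income"] >= min_income


rows = list(clean_rows(RAW))
decisions = [approve(r) for r in rows]
print(f"kept {len(rows)} of 4 rows, decisions: {decisions}")
```

Half of the sample rows are dropped before any decision is made, which is the kind of loss that tidy classroom datasets hide. In a real project you would likely decide per column whether to drop, impute, or flag missing values instead of discarding the whole row.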

Published August 8, 2020, 09:00 UTC
