Abstract (2-3 lines)
Python data science tools like pandas, NumPy, and scikit-learn are excellent. However, they typically use only one of the many cores in modern processors and are limited by your computer's RAM. In this tutorial, you'll learn to scale your data science workflows to larger datasets and models using Dask, leveraging the full potential of your laptop while staying in the PyData ecosystem. You will learn the fundamentals of parallel and distributed computing, when (and when not) to consider scaling, and work through some hands-on examples.
Brief Description and Contents to be covered
Dask is an open source library for parallel and distributed computing in Python. This tutorial is an introduction to this broad and powerful library. We will:
Build vocabulary: What is parallel and distributed computing? What are clusters? What do we mean by "scaling to the cloud"?
Introduce Dask: What is Dask? How does it work? Where is it used?
Learn the Dask DataFrame API, which mimics the pandas API -- how are the two APIs similar, and where do they differ? (see the first sketch after this list)
Talk about Dask's distributed scheduler and explore Dask's (very cool) diagnostic dashboards (second sketch after this list)
Briefly cover the low-level Dask Delayed API, which can parallelize general Python code (third sketch after this list)
Conclude with some best practices and discuss resources for learning more
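To give a flavor of the DataFrame material, here is a minimal sketch of how Dask mirrors pandas; the file path and column names are hypothetical placeholders:

```python
import dask.dataframe as dd

# Lazily read many CSV files as one dataframe; dd.read_csv mirrors pd.read_csv
# (the path and the "name"/"amount" columns below are made up for illustration)
df = dd.read_csv("data/2021-*.csv")

# Familiar pandas syntax -- this only builds a task graph, nothing runs yet
mean_amount = df.groupby("name").amount.mean()

# .compute() triggers the parallel computation and returns a pandas object
print(mean_amount.compute())
```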
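Similarly, the distributed scheduler section starts from a local cluster; this sketch assumes the `dask.distributed` package is installed:

```python
from dask.distributed import Client

# With no arguments, Client() starts a local cluster of worker processes
client = Client()

# The diagnostic dashboard is served at this URL while the client is running
print(client.dashboard_link)
```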
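And for the Delayed section, a classic toy example of parallelizing plain Python functions; `inc` and `add` here are stand-ins for real work:

```python
from dask import delayed

@delayed
def inc(x):
    return x + 1

@delayed
def add(x, y):
    return x + y

# These calls are lazy; a and b are independent, so they can run in parallel
a = inc(1)
b = inc(2)
total = add(a, b)

# .compute() executes the task graph
print(total.compute())  # 5
```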
Pre-requisites for the talk
Programming fundamentals in Python (e.g., variables, data structures, for loops)
Some familiarity with NumPy, pandas, and scikit-learn
Time required for the talk
1 hr
Link to slides
https://github.com/pavithraes/dask-mini-tutorial/blob/main/slides.pdf
Will you be doing hands-on demo as well?
Yes
Link to ipython notebook (if any)
https://github.com/pavithraes/dask-mini-tutorial
About yourself
My name is Pavithra Eswaramorthy. I currently work as a Community Engagement Manager at Coiled, where I help support Dask users and contributors. I also contribute to the Bokeh project, and in the past I helped administer the Wikimedia Foundation's open source outreach programs. In my spare time, I enjoy a good book and hot coffee. :)
Are you comfortable if the talk is recorded and uploaded to PyData Delhi's YouTube channel?
Yes