As you might already know, I’ve been making Python and R cheat sheets specifically for those who are just starting out with data science, or for those who need extra help when working on data science problems.
Now you can find all of them in one place on the DataCamp Community: you can browse all the cheat sheets here.
To recap, these are the data science cheat sheets that we have made and shared with the community so far:
Basics
- Python Basics Cheat Sheet
- SciPy Linear Algebra Cheat Sheet
Data Manipulation
- NumPy Basics Cheat Sheet
- Pandas Basics Cheat Sheet
- Pandas Data Wrangling Cheat Sheet
- xts Cheat Sheet
- data.table Cheat Sheet [updated!]
- Tidyverse Cheat Sheet
Machine Learning, Deep Learning, Big Data
- Scikit-Learn Cheat Sheet
- Keras Cheat Sheet
- PySpark RDD Cheat Sheet
- PySpark SparkSQL Cheat Sheet
Data Visualization
- Matplotlib Cheat Sheet
- Seaborn Cheat Sheet
- Bokeh Cheat Sheet [updated!]
IDE
- Jupyter Notebook Cheat Sheet
Enjoy and feel free to share!
P.S. Did you see another data science cheat sheet that you’d like to recommend? Let us know here!
```python
# 2. Import libraries and modules
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_squared_error, r2_score
import joblib

# 3. Load red wine data
dataset_url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv'
data = pd.read_csv(dataset_url, sep=';')

# 4. Split data into training and test sets
y = data.quality
X = data.drop('quality', axis=1)
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    random_state=123,
                                                    stratify=y)

# 5. Declare data preprocessing steps
pipeline = make_pipeline(preprocessing.StandardScaler(),
                         RandomForestRegressor(n_estimators=100,
                                               random_state=123))

# 6. Declare hyperparameters to tune
# (note: 'auto' was removed from max_features in newer scikit-learn;
#  None means "use all features", which is the old 'auto' behavior)
hyperparameters = {'randomforestregressor__max_features': ['sqrt', 'log2', None],
                   'randomforestregressor__max_depth': [None, 5, 3, 1]}

# 7. Tune model using a cross-validation pipeline
clf = GridSearchCV(pipeline, hyperparameters, cv=10)
clf.fit(X_train, y_train)

# 8. Refit on the entire training set
# No additional code needed if clf.refit == True (the default)

# 9. Evaluate model pipeline on test data
pred = clf.predict(X_test)
print(r2_score(y_test, pred))
print(mean_squared_error(y_test, pred))

# 10. Save model for future use
joblib.dump(clf, 'rf_regressor.pkl')
# To load: clf2 = joblib.load('rf_regressor.pkl')
```
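If you want to try the same pipeline-plus-grid-search pattern without downloading the wine data, here is a minimal, runnable sketch using scikit-learn's built-in diabetes dataset. The dataset choice and the small hyperparameter grid are illustrative assumptions, not part of the snippet above; the structure (scale, fit a random forest, tune via `GridSearchCV`, score on a held-out set) is the same.

```python
# Minimal offline sketch of the pipeline + GridSearchCV workflow.
# Assumptions: built-in diabetes dataset and a deliberately small
# grid / forest so it runs in seconds.
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.metrics import r2_score

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123)

pipeline = make_pipeline(StandardScaler(),
                         RandomForestRegressor(n_estimators=25,
                                               random_state=123))

# Grid keys are prefixed with the lowercase step name,
# exactly as in the wine-quality snippet above.
grid = {'randomforestregressor__max_depth': [3, 5, None]}

clf = GridSearchCV(pipeline, grid, cv=3)
clf.fit(X_train, y_train)

pred = clf.predict(X_test)
print(round(r2_score(y_test, pred), 3))
```

The `randomforestregressor__` prefix is how `make_pipeline` names the step: the class name lowercased, joined to the parameter with a double underscore.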