Python data science cheat sheet

As you might already know, I’ve been making Python and R cheat sheets for those who are just starting out with data science and for those who need extra help when working on data science problems.


Now they’re all collected in one place on the DataCamp Community.

You can find all the cheat sheets here.

To recap, these are the data science cheat sheets that we have made and shared with the community so far:

Basics

  • Python Basics Cheat Sheet
  • SciPy Linear Algebra Cheat Sheet

Data Manipulation

  • NumPy Basics Cheat Sheet
  • Pandas Basics Cheat Sheet
  • Pandas Data Wrangling Cheat Sheet
  • xts Cheat Sheet
  • data.table Cheat Sheet (updated!)
  • Tidyverse Cheat Sheet
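
To give a taste of what the data manipulation cheat sheets condense, here is a minimal Pandas sketch; the file name and column names ('sales.csv', 'amount', 'region') are made up for illustration:

import pandas as pd

# Load a CSV into a DataFrame (hypothetical file)
df = pd.read_csv('sales.csv')

# Inspect the first rows and summary statistics
print(df.head())
print(df.describe())

# Filter rows, then aggregate by group
high = df[df['amount'] > 100]
print(high.groupby('region')['amount'].sum())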

Machine Learning, Deep Learning, Big Data

  • Scikit-Learn Cheat Sheet
  • Keras Cheat Sheet
  • PySpark RDD Cheat Sheet
  • PySpark SparkSQL Cheat Sheet

Data Visualization

  • Matplotlib Cheat Sheet
  • Seaborn Cheat Sheet
  • Bokeh Cheat Sheet (updated!)
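
Similarly, here is a minimal Matplotlib sketch of the kind these visualization cheat sheets summarize (the data is generated just for illustration):

import numpy as np
import matplotlib.pyplot as plt

# Plot a sine curve with labeled axes and a legend (illustrative data)
x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x), label='sin(x)')
plt.xlabel('x')
plt.ylabel('sin(x)')
plt.legend()
plt.show()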

IDE

  • Jupyter Notebook Cheat Sheet

Enjoy and feel free to share!

PS: Have you seen another data science cheat sheet that you’d like to recommend? Let us know here!

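Bonus: as an end-to-end example of the kind of workflow these cheat sheets condense, the script below loads the UCI red wine quality dataset, tunes a random forest regressor with cross-validation in a scikit-learn pipeline, evaluates it on held-out data, and saves the fitted model: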

# 2. Import libraries and modules
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn import preprocessing
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error, r2_score
import joblib

# 3. Load red wine data
dataset_url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv'
data = pd.read_csv(dataset_url, sep=';')

# 4. Split data into training and test sets
y = data.quality
X = data.drop('quality', axis=1)
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    random_state=123,
                                                    stratify=y)

# 5. Declare data preprocessing steps
pipeline = make_pipeline(preprocessing.StandardScaler(),
                         RandomForestRegressor(n_estimators=100,
                                               random_state=123))

# 6. Declare hyperparameters to tune
# (make_pipeline names each step after its lowercased class name,
#  hence the 'randomforestregressor__' prefix; 'auto' was removed in
#  newer scikit-learn versions, and 1.0 is its equivalent for a regressor)
hyperparameters = {'randomforestregressor__max_features': [1.0, 'sqrt', 'log2'],
                   'randomforestregressor__max_depth': [None, 5, 3, 1]}

# 7. Tune model using cross-validation pipeline
clf = GridSearchCV(pipeline, hyperparameters, cv=10)
clf.fit(X_train, y_train)

# 8. Refit on the entire training set
# No additional code needed if clf.refit == True (default is True)

# 9. Evaluate model pipeline on test data
pred = clf.predict(X_test)
print(r2_score(y_test, pred))
print(mean_squared_error(y_test, pred))

# 10. Save model for future use
joblib.dump(clf, 'rf_regressor.pkl')
# To load: clf2 = joblib.load('rf_regressor.pkl')
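
If you want to see which hyperparameter combination won, a fitted GridSearchCV exposes it directly; this short follow-up uses standard scikit-learn attributes:

# Inspect the best hyperparameters and their mean cross-validated score
print(clf.best_params_)
print(clf.best_score_)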