As you might already know, I’ve been making Python and R cheat sheets specifically for those who are just starting out with data science, or for those who need extra help when working on data science problems.
Now you can find all of them in one place on the DataCamp Community: you can browse all the cheat sheets here.
To recap, these are the data science cheat sheets that we have made and shared with the community so far:
Basics
- Python Basics Cheat Sheet
- SciPy Linear Algebra Cheat Sheet
Data Manipulation
- NumPy Basics Cheat Sheet
- Pandas Basics Cheat Sheet
- Pandas Data Wrangling Cheat Sheet
- xts Cheat Sheet
- data.table Cheat Sheet [updated!]
- Tidyverse Cheat Sheet
Machine Learning, Deep Learning, Big Data
- Scikit-Learn Cheat Sheet
- Keras Cheat Sheet
- PySpark RDD Cheat Sheet
- PySpark SparkSQL Cheat Sheet
Data Visualization
- Matplotlib Cheat Sheet
- Seaborn Cheat Sheet
- Bokeh Cheat Sheet [updated!]
IDE
- Jupyter Notebook Cheat Sheet
Enjoy and feel free to share!
P.S. Did you see another data science cheat sheet that you’d like to recommend? Let us know here!
```python
# 2. Import libraries and modules
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_squared_error, r2_score
import joblib

# 3. Load red wine data
dataset_url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv'
data = pd.read_csv(dataset_url, sep=';')

# 4. Split data into training and test sets
y = data.quality
X = data.drop('quality', axis=1)
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    random_state=123,
                                                    stratify=y)

# 5. Declare data preprocessing steps
pipeline = make_pipeline(preprocessing.StandardScaler(),
                         RandomForestRegressor(n_estimators=100,
                                               random_state=123))

# 6. Declare hyperparameters to tune
# (note: 'auto' was removed from max_features in newer scikit-learn;
#  None means "use all features", which is the old 'auto' behavior)
hyperparameters = {'randomforestregressor__max_features': ['sqrt', 'log2', None],
                   'randomforestregressor__max_depth': [None, 5, 3, 1]}

# 7. Tune model using a cross-validation pipeline
clf = GridSearchCV(pipeline, hyperparameters, cv=10)
clf.fit(X_train, y_train)

# 8. Refit on the entire training set
# No additional code needed if clf.refit == True (the default)

# 9. Evaluate model pipeline on test data
pred = clf.predict(X_test)
print(r2_score(y_test, pred))
print(mean_squared_error(y_test, pred))

# 10. Save model for future use
joblib.dump(clf, 'rf_regressor.pkl')
# To load: clf2 = joblib.load('rf_regressor.pkl')
```
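If you want to try the same pipeline-plus-grid-search pattern without downloading the wine data, here is a minimal, runnable sketch using scikit-learn's built-in diabetes dataset. The dataset choice and the small hyperparameter grid are illustrative assumptions, not part of the snippet above; the structure (scale, fit a random forest, tune via `GridSearchCV`, score on a held-out set) is the same.

```python
# Minimal offline sketch of the pipeline + GridSearchCV workflow.
# Assumptions: built-in diabetes dataset and a deliberately small
# grid / forest so it runs in seconds.
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.metrics import r2_score

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123)

pipeline = make_pipeline(StandardScaler(),
                         RandomForestRegressor(n_estimators=25,
                                               random_state=123))

# Grid keys are prefixed with the lowercase step name,
# exactly as in the wine-quality snippet above.
grid = {'randomforestregressor__max_depth': [3, 5, None]}

clf = GridSearchCV(pipeline, grid, cv=3)
clf.fit(X_train, y_train)

pred = clf.predict(X_test)
print(round(r2_score(y_test, pred), 3))
```

The `randomforestregressor__` prefix is how `make_pipeline` names the step: the class name lowercased, joined to the parameter with a double underscore.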