startproject
Description
startproject creates a data science project directory. A consistent layout makes team collaboration, rapid prototyping, reproducibility, and fast iteration easier. The directory structure is by no means a globally recognized standard; it was inspired by the folder structure created by the Azure team (https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/overview).
Project Structure
data: Stores data used for the experiments, including raw and intermediate processed data.
  processed: Stores all processed data files after cleaning, analysis, feature creation, etc.
  raw: Stores all raw data obtained from databases, file storage, etc.
outputs: Stores all output files from an experiment.
  models: Stores trained binary model files. These are models saved after training and evaluation for later use.
src: Stores all source code, including scripts and notebook experiments.
  scripts: Stores all code scripts, usually in Python or R. This code is usually refactored from the notebooks.
    modeling: Stores all scripts and code relating to model building, evaluation, and saving.
    preparation: Stores all scripts used for data preparation and cleaning.
    ingest: Stores all scripts used for reading in data from different sources such as databases, the web, or file storage.
  notebooks: Stores all Jupyter notebooks used for experimentation.
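To make the layout concrete, here is a minimal sketch that builds the same folder tree with Python's standard library. It is not datasist's own implementation: the function name create_project_skeleton and the PROJECT_DIRS list are hypothetical, and the nesting (for example, models inside outputs and ingest inside src/scripts) is an assumption inferred from the order of the list above.

```python
from pathlib import Path

# Assumed nesting, inferred from the project structure list; not taken
# from datasist's source code.
PROJECT_DIRS = [
    "data/processed",
    "data/raw",
    "outputs/models",
    "src/scripts/modeling",
    "src/scripts/preparation",
    "src/scripts/ingest",
    "src/notebooks",
]


def create_project_skeleton(root):
    """Create the project folders under `root` and return their paths."""
    root = Path(root)
    created = []
    for rel in PROJECT_DIRS:
        path = root / rel
        # parents=True also creates data/, outputs/, src/ and src/scripts/;
        # exist_ok=True makes the call safe to repeat on an existing project.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling create_project_skeleton("my_project") would create the folders under my_project in the current working directory. Because existing folders are left untouched, running it again on the same project does no harm.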
Examples
Start project from terminal.
If you have datasist installed, you can start a new project from the terminal as shown below:
Start project with Editor
To help us improve this documentation, visit the datasist-doc repository