Contributing to datasist
Thanks for considering contributing to Datasist!
All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome here.
Have something in mind but not sure where to start? Chat with us here.
First-time contributors can find or raise issues on our GitHub “issues” page. Once you’ve found an interesting issue and have an improvement in mind, the next step is to set up your development environment.
Working with the code
Now that you have an issue you want to fix, enhancement to add, or documentation to improve, you need to learn how to work with GitHub and the datasist code base.
Version control, Git, and GitHub
The datasist code is hosted on GitHub. To contribute, you will need to sign up for a free GitHub account. We use Git for version control, which lets contributors collaborate on the project. If you're new to Git and GitHub, the official GitHub page is a great learning resource.
Forking the Datasist Repository
You will need your own fork to work on the code. Go to the datasist project page and hit the Fork button. After forking the repo, click on the Code drop-down and copy the link; you will use it later.
Create a development environment
It is advisable to create a development environment to test all code changes. This helps isolate datasist settings from other environments on your computer.
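You can create a new virtual environment with conda. First, make sure you have either Anaconda or Miniconda installed.
Confirm that conda is available, for example by running conda --version.
Create a new virtual environment with Python 3.5 or a later version, using conda create with an environment name of your choice and the Python version you want.
Then activate the environment with conda activate followed by that name.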
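Once the environment is active, a quick check that its interpreter meets the Python 3.5 minimum can be sketched as follows. The helper name meets_minimum is hypothetical and only serves this illustration.

```python
import sys

MINIMUM_PYTHON = (3, 5)  # minimum version stated in this guide


def meets_minimum(version_info=None, minimum=MINIMUM_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    if version_info is None:
        # Default to the interpreter that is running this script
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    status = "OK" if meets_minimum() else "too old"
    print("Python", sys.version.split()[0], status)
```

Running this inside the activated environment should report the interpreter that conda installed there, not the system-wide one.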
Next, clone your forked repository to your local machine. Type git clone in your terminal, paste the URL you copied earlier after it, and press Enter.
This creates the directory datasist and connects your repository to the upstream (main project) repository.
Next, change into the datasist directory (cd datasist), then build and install the package.
Test that datasist was installed successfully by starting a Python REPL and running import datasist.
If you encounter no error after importing datasist, you're ready to start contributing. Now you can launch your favorite IDE and start implementing your changes.
However, if you encounter a ModuleNotFoundError while importing the datasist package, take note of the missing package name, exit the Python interactive prompt, and install that package from your command line (for example, with pip install followed by the package name).
Proceed to import datasist after the missing package is successfully installed.
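The same check can also be scripted. The sketch below uses a hypothetical helper, missing_dependency, which tries the import and reports which module could not be found. It catches ImportError, the parent class of ModuleNotFoundError, so that it also works on Python 3.5.

```python
import importlib


def missing_dependency(module_name):
    """Return the name of the module that failed to import, or None on success."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        # exc.name is the module that could not be found; it may be a
        # dependency of module_name rather than module_name itself.
        return exc.name or module_name
    return None


if __name__ == "__main__":
    missing = missing_dependency("datasist")
    if missing is None:
        print("datasist imported successfully")
    else:
        print("Missing package:", missing)
```

If a package name is printed, install it and run the check again until the import succeeds.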
Docstrings Guidelines
Docstrings are an important part of coding, and we encourage you to write clear and concise docstrings for your functions, methods, and classes. Docstrings written for your code are automatically used to generate the datasist documentation.
Our guidelines for writing docstrings are:
Define what the function does.
Define all parameter types and what they do.
State the return values.
Use the correct spacing and indentation as this affects the documentation generated automatically.
Add example usage.
Below is a sample docstring for a function that adds two pandas DataFrames together:
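A minimal sketch of such a docstring, assuming a hypothetical function named add_dataframes and a Parameters/Returns/Example layout (the exact section style that the datasist documentation generator expects may differ):

```python
import pandas as pd


def add_dataframes(df1, df2):
    """
    Add two pandas DataFrames element-wise.

    Parameters
    ----------
    df1 : pandas.DataFrame
        The first DataFrame.
    df2 : pandas.DataFrame
        The second DataFrame. It should share its index and columns with df1.

    Returns
    -------
    pandas.DataFrame
        The element-wise sum of df1 and df2.

    Example
    -------
    >>> a = pd.DataFrame({"x": [1, 2]})
    >>> b = pd.DataFrame({"x": [10, 20]})
    >>> add_dataframes(a, b)["x"].tolist()
    [11, 22]
    """
    # DataFrame.add aligns on index and columns before summing
    return df1.add(df2)
```

The docstring covers each guideline in turn: what the function does, the type and role of every parameter, the return value, and a short usage example.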
Writing tests
We strongly encourage contributors to write tests for their code. Like many other Python packages, datasist uses pytest to test its code.
All tests should go into the tests sub-directory and be placed in the corresponding test script. The tests folder contains some current examples of tests, and we suggest looking through these for inspiration.
The easiest way to verify that your code is correct is to explicitly construct the result you expect (expected), then compare it to the actual result (output).
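The expected-versus-output pattern can be sketched as follows. The function double_values is hypothetical and exists only for this illustration; assert_frame_equal comes from pandas.testing and raises an AssertionError when the two DataFrames differ.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def double_values(df):
    """Hypothetical function under test: multiply every value by two."""
    return df * 2


def test_double_values():
    data = pd.DataFrame({"x": [1, 2, 3]})
    # Construct the result we expect by hand...
    expected = pd.DataFrame({"x": [2, 4, 6]})
    # ...then compare it with what the function actually returns
    output = double_values(data)
    assert_frame_equal(expected, output)
```

Building expected by hand keeps the test independent of the code it checks.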
Using pytest
As an example, consider a test case written for the drop_redundant function in the feature_engineering module. Its test is placed in the test_feature_engineering.py file inside the tests folder.
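A minimal sketch of what such a function-and-test pair can look like. drop_redundant_sketch is a simplified stand-in written for this illustration, not the datasist implementation, and it assumes that "redundant" means a column holding a single repeated value.

```python
import pandas as pd


def drop_redundant_sketch(data):
    """Return a copy of data without columns that hold one repeated value.

    Simplified stand-in for illustration; not the datasist implementation.
    """
    # nunique() counts the distinct non-null values in each column
    constant_cols = [col for col in data.columns if data[col].nunique() <= 1]
    return data.drop(columns=constant_cols)


def test_drop_redundant_sketch():
    df = pd.DataFrame({
        "a": [1, 1, 1],        # constant, should be dropped
        "b": [1, 2, 3],        # varies, should be kept
        "c": ["x", "x", "x"],  # constant, should be dropped
    })
    expected = ["b"]
    output = list(drop_redundant_sketch(df).columns)
    assert expected == output
```

pytest collects any function whose name starts with test_ from files named test_*.py, so a test written this way is picked up without extra registration.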
Running the test case
To run the test case, navigate to the tests/ subfolder and run pytest followed by the name of the test file, for example pytest test_feature_engineering.py.
Learn more about pytest in its official documentation.
Adding your changes to Datasist and Committing your code
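Once you’ve made and saved your changes, review them by running git status in your terminal.
Next, stage the changes you've made with git add, followed by the paths of the files you changed.
Then commit your changes with git commit, supplying a short message that describes them (for example, via the -m option).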
Pushing your changes
To make your changes appear publicly on your GitHub page, push them to your forked repo with git push.
Review your code and finally make the pull request
If everything looks good, you are ready to make a pull request. A pull request is how code from a local repository becomes available to the Datasist community and can be reviewed and eventually merged into the master version. To submit a pull request:
Navigate to your updated repository of datasist on GitHub.
Click on the Pull Request button.
Write a description of your changes in the Preview Discussion tab
Click Send Pull Request.
This request will be reviewed by the datasist maintainers and, if it meets the requirements, it will be merged into the master branch.
Hooray! You're now a contributor to datasist. Now go bask in the euphoria!