This is my personal summary of this blog post on Medium. I have also rearranged the points into the order in which I think you should work on them.
Why production-level code?
The ability to write production-level code is one of the most sought-after skills for a data scientist role, whether or not the job posting says so explicitly.
Production-level code has several features:
- Modular code: Decompose the code into low-level, medium-level and high-level functions. Larger functions are harder to debug.
- Version control: use Git with branches.
- Readability: comments, docstrings, and self-explanatory function and variable names.
- Unit testing: the unittest module in Python.
- Logging: record only actionable information, such as critical failures at runtime, and structured data, such as intermediate results that the code itself will use later.
- Code optimization: better space and time complexity.
- Compatibility with the ecosystem: the code has to work with the other code and tools around it.
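To make the first few points concrete, here is a minimal sketch of modular code with docstrings and logging. The function names (`clean_values`, `summarize`, `run_pipeline`) and the toy task are my own assumptions for illustration, not something taken from the original post.

```python
import logging
from statistics import mean

logger = logging.getLogger(__name__)


def clean_values(values):
    """Low-level: drop missing entries (None) from a list of numbers."""
    return [v for v in values if v is not None]


def summarize(values):
    """Medium-level: compute simple summary statistics for cleaned values."""
    if not values:
        raise ValueError("no values to summarize")
    return {"count": len(values), "mean": mean(values)}


def run_pipeline(raw_values):
    """High-level: clean the input, summarize it, and log failures."""
    try:
        return summarize(clean_values(raw_values))
    except ValueError:
        # Log only the actionable failure, then let the caller decide what to do.
        logger.critical("pipeline failed: input had no usable values")
        raise
```

Each function does one thing, so a failure can be traced to a small piece of code, and the only log record is for a critical failure that someone has to act on.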
If you are a new graduate without much experience, you should definitely split your code into smaller functions, make it more readable for future maintenance (add docstrings and more comments), add unit tests for your functions and classes, and put it on GitHub.
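As a rough sketch of what a unit test for one small function could look like with the standard-library unittest module (the `normalize` function here is a hypothetical example, assumed only for illustration):

```python
import unittest


def normalize(values):
    """Scale numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("cannot normalize constant input")
    return [(v - lo) / (hi - lo) for v in values]


class NormalizeTest(unittest.TestCase):
    def test_scales_to_unit_range(self):
        self.assertEqual(normalize([2, 4, 6]), [0.0, 0.5, 1.0])

    def test_rejects_constant_input(self):
        # The edge case gets its own test so a regression is easy to spot.
        with self.assertRaises(ValueError):
            normalize([3, 3, 3])
```

Saved in a file whose name starts with `test_`, this can be run with `python -m unittest` from the project directory.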
The best way for me to practice all of that was to rewrite one or more of my pet projects as a small Python package, a web application (Flask in Python), or a RESTful API hosted on AWS.
Doing so exercises your coding skills and gives you better visibility when applying for jobs.
Here is another Python guide aimed at data scientists, which I think is very practical and awesome.