Explicit understanding of Python package building (structuring) - part 2

nipun deelaka
Published in Analytics Vidhya
8 min read · Jan 18, 2021

This is the second part of the article series on building Python packages. In the previous article, we discussed how to version-control your codebase and the concept of documentation. In this article, we are going to talk about how to convert that code into an actual package that works like any regular Python library you use.

There are many ways to convert your code into a package. Building the package and creating the package structure are completely different things: you use a packaging tool to build the package, but for the build to work correctly, the codebase must first be in a shape that is compatible with that tool.

First of all, why are there several packaging architectures? Isn't there a universal one? Actually, no. The reason is that the architecture depends on the extent of the artifact you are building and on how the package will be used. So, there are a few general layout patterns:

command-line application layout

This is the type of application that a user can download and install, and then either import as an external module or run directly in the terminal.

  • python module layout
  • python package layout
  • python library layout

P.S.: The layout/architecture names above are not standard, but they are commonly used terms. In particular, the setuptools library makes it easy to create the module, package, and library layouts.

package_folder/

├── .gitignore
├── src/
├── logs/
├── docs/
├── bin/
├── data/
├── LICENSE
├── README.md
├── requirements.txt
├── test/
└── setup.py

This is the bare-minimum layout for any Python package. After you build the package, there will be additional folders such as dist/ and build/.

.gitignore: As we discussed in the previous article, our version control system is Git and we use GitHub to host the remote repository. There are some files and folders we don't want to upload to the remote repository, such as logs, bin, data, dist, and build. We use the .gitignore file to stop Git from tracking them. Templates for this file exist for most languages, and you can add or delete lines according to your preference. These files also have their own pattern syntax and rules.

src/: This is the folder that contains your Python package code. The structure of this folder varies with the application layout you have chosen.

logs/: Log files are saved in this folder.

LICENSE: This file states the license of the package, that is, the permissions and restrictions the publisher gives the user. It's good practice to use a well-known license rather than a self-written one, such as a GNU license, the Apache License, the MIT License, or a Creative Commons license.

README.md: Usually, this is a Markdown file with the .md extension, because Markdown is easy to edit and widely supported. Other plain-text markup formats can also be used, such as LaTeX with the .tex extension. This is the docs file we discussed in the previous article.

requirements.txt: Your package dependencies go here. If you are familiar with Python virtual environments (pip, conda), you will remember a single CLI command that generates this file: pip freeze > requirements.txt. However, that is not good practice in this context, because the command also adds secondary (transitive) dependencies to the requirements file. The usual approach is to go over the files in the src/ directory and manually add the third-party libraries they import. Developer-only tools such as tox must not be included in this file either, and unittest ships with Python, so it never belongs here.

test/: If you have written unit tests for your package, those files go under this folder.

setup.py: This is the most crucial file in the project folder, because it tells the package builder almost all the details about the package. There are several accepted developer practices for creating this file, and some parts of it change according to your application layout.

If the package is a complex one, you will create the following directories as well.

docs/: As we discussed in the previous article, this folder is for comprehensive external documentation of the library. Generally, it contains .md, .html, and .css files. It is the source for the pages you host on GitHub Pages, MkDocs, or Read the Docs.

bin/: All the executable files used in the package implementation go here. If your package is pure Python, there is nothing to put here. But if you have used some C or C++ code, the compiled executables must be saved here.

data/: If you use text files to store variables, parameters, or data, those files go under this folder.
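
The manual pass over src/ described under requirements.txt above can be partly automated. The following is only a sketch, not part of the workflow described in this article: it uses the standard-library ast module to list the top-level module names imported by the files in a source directory. The function name top_level_imports and the demo file content are hypothetical. You would still have to remove standard-library and local modules by hand and map import names to their PyPI distribution names.

```python
import ast
import tempfile
from pathlib import Path


def top_level_imports(src_dir):
    """Return the sorted top-level module names imported under src_dir."""
    found = set()
    for path in Path(src_dir).rglob("*.py"):
        tree = ast.parse(path.read_text(encoding="utf-8"))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                # "import a.b" contributes "a"
                found.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
                # "from a.b import c" contributes "a"; relative imports are skipped
                found.add(node.module.split(".")[0])
    return sorted(found)


# Demo on a throwaway directory standing in for src/ (hypothetical file content).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "module_1.py").write_text(
        "import os\n"
        "import numpy as np\n"
        "from requests.adapters import HTTPAdapter\n"
        "from . import module_2\n",
        encoding="utf-8",
    )
    demo_imports = top_level_imports(tmp)

print(demo_imports)
```

The files are only parsed, never executed, so the listed libraries do not need to be installed for the scan to work.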

python module layout

In Python, when you import code written in an external file, that file becomes a "module" in the current namespace. The word namespace is a bit special. Namespaces play a major role in languages like Python because they use lexical scoping and treat functions as first-class objects. In summary, when you type from file_name import function, the name function is bound directly in your current namespace, so you can call it without going through file_name.

Either way, our concern here is to create the Python package as a set of modules.
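
A small sketch of what the two import forms do to the namespace, using the standard-library math module as a stand-in for file_name. Both forms reach the same function object; they differ only in which name ends up in the current namespace.

```python
import math            # binds the name "math" in the current namespace
from math import sqrt  # binds the name "sqrt" directly

# Both names lead to the very same function object; nothing is copied.
same_object = sqrt is math.sqrt

print(same_object)
print(sqrt(16.0), math.sqrt(16.0))
```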

pkg_name/

├── src/
│ ├── module_1.py
│ └── module_2.py

├── tests/
│ ├── module_1_test.py
│ └── module_2_test.py

├── .gitignore
├── LICENSE
├── README.md
├── requirements.txt
└── setup.py

As said at the start, the parts that mainly change are src/ and setup.py.

The files in the src/ directory hold the package implementation. It's good practice to put any file-local operations in files like module_1.py under an if __name__ == "__main__": block. Code under this block runs only when the file itself is executed directly, so it prevents that code from running when the file is imported.
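
As a minimal sketch, a file such as module_1.py could look like the following. The function add is a hypothetical placeholder for your own code.

```python
"""module_1.py - a hypothetical module living in src/."""


def add(a, b):
    """Return the sum of two numbers; this is what importers use."""
    return a + b


if __name__ == "__main__":
    # Runs only for "python module_1.py", never for "import module_1".
    print(add(2, 3))
```

Anyone who imports module_1 gets add without triggering the print call.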

However, how the package behaves also depends on the setup.py configuration.

package_dir={'': 'src'},
py_modules=["module_1", "module_2"],

This is the simplest way to do that when you are using setuptools as the packaging tool.
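
To show where those two arguments sit, here is a sketch of a complete setup.py for the module layout. The name, version, description, and python_requires values are hypothetical placeholders, not values from this article. The call to setup() is left as a comment so the snippet can be run without starting a build.

```python
# setup.py (sketch) - the metadata values below are hypothetical placeholders.
SETUP_KWARGS = dict(
    name="pkg_name",
    version="0.1.0",
    description="Example of the module layout",
    package_dir={"": "src"},              # look for modules inside src/
    py_modules=["module_1", "module_2"],  # src/module_1.py and src/module_2.py
    python_requires=">=3.6",
)

# In a real setup.py you would finish with:
#     from setuptools import setup
#     setup(**SETUP_KWARGS)
print(SETUP_KWARGS["py_modules"])
```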

python package layout

Packages differ a little from Python modules, both in structure and in behavior. You can just write import numpy at the top of a file and then access any function anywhere in that file with dot notation, such as numpy.array(). The function names stay inside the package's namespace instead of being bound one by one into yours, as they are with from ... import.

If you are familiar with OOP concepts, this is the same idea as packaging in OOP; the only difference is that this package is portable.

If you are using an IDE like PyCharm or Eclipse with PyDev, you can create the folder structure for a package with a few clicks. However, doing it by hand is simple too.

project_name/

├── pkg_name/
│ ├── __init__.py
│ ├── file_1.py
│ └── internal_pck/
│   ├── __init__.py
│   ├── int_file_1.py
│   └── int_file_2.py

├── tests/
│ └── pkg_name_test
│   ├── file_1_tests.py
│   └── internalpkg_name
│     ├── int_file_1_tests.py
│     └── int_file_2_tests.py

├── .gitignore
├── LICENSE
├── README.md
├── requirements.txt
└── setup.py

The main difference is that there is no src/ folder; instead there is a folder named after your package that always contains an __init__.py file. So, why is that? Here you are creating a machine with an interface that interacts with the external world. You define what the machine does, and how, by writing the other files in the pkg_name/ directory. The __init__.py file defines the interface of the package: it says which functions an external file can access when it uses this package.

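import logging

# Re-export the public functions so callers can write pkg_name.function_1().
from .file_1 import function_1
# internal_pck/__init__.py must itself expose function_2 for this line to work.
from .internal_pck import function_2

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

# Names exported by "from pkg_name import *".
__all__ = [
    'function_1',
    'function_2',
]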

This is a slightly complex example because there is a package inside the package. It is crucial that the package interface encapsulates the internal package; otherwise this example would fall under our next topic.
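
To see the interface at work, the sketch below writes a tiny version of this layout into a temporary folder and imports it. The file contents are hypothetical stand-ins (the function bodies just return a label), and the directory names follow the tree above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical file contents, following the layout shown above.
FILES = {
    "pkg_name/__init__.py": (
        "from .file_1 import function_1\n"
        "from .internal_pck import function_2\n"
        "__all__ = ['function_1', 'function_2']\n"
    ),
    "pkg_name/file_1.py": "def function_1():\n    return 'from file_1'\n",
    "pkg_name/internal_pck/__init__.py": "from .int_file_1 import function_2\n",
    "pkg_name/internal_pck/int_file_1.py": (
        "def function_2():\n    return 'from int_file_1'\n"
    ),
}

with tempfile.TemporaryDirectory() as root:
    for rel_path, source in FILES.items():
        target = Path(root, rel_path)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(source, encoding="utf-8")

    sys.path.insert(0, root)  # make the temporary project importable
    importlib.invalidate_caches()
    try:
        pkg_name = importlib.import_module("pkg_name")
        result_1 = pkg_name.function_1()
        result_2 = pkg_name.function_2()  # reached through the package interface
        exported = list(pkg_name.__all__)
    finally:
        sys.path.remove(root)

print(result_1, result_2, exported)
```

The caller never needs to know that function_2 lives in internal_pck; the top-level __init__.py hides that detail.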

Not only does the source folder change; setup.py must change accordingly as well.

packages=['pkg_name', 'pkg_name.internal_pck'],

While building the wheel or the egg file, setuptools collects all the files of the listed packages and includes them in the final build. If there are no internal packages, defining just the package name is enough. If there are internal packages and the main package uses functions they provide, listing those internal packages is essential.
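
Listing every internal package by hand gets tedious, and setuptools offers find_packages() for that job. The sketch below is not that function. It is a simplified standard-library illustration, with the hypothetical name discover_packages, of the underlying idea: every directory that contains an __init__.py is reported under its dotted package name.

```python
import os
import tempfile


def discover_packages(root):
    """Return dotted names of every directory under root that has __init__.py."""
    packages = []
    for current, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        if current != root and "__init__.py" in filenames:
            relative = os.path.relpath(current, root)
            packages.append(relative.replace(os.sep, "."))
    return packages


# Demo: recreate the package directories from the layout above in a temp folder.
with tempfile.TemporaryDirectory() as root:
    for package_dir in ("pkg_name", os.path.join("pkg_name", "internal_pck")):
        os.makedirs(os.path.join(root, package_dir))
        open(os.path.join(root, package_dir, "__init__.py"), "w").close()
    os.makedirs(os.path.join(root, "tests"))  # no __init__.py, so not a package
    found = discover_packages(root)

print(found)
```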

python library layout

This is an extended version of Python packaging: here the Python library contains more than one Python package, and those packages work (mostly) independently of each other and have their own interfaces to the outside world. A great example of such a library is scikit-learn, because it contains separate packages for areas such as regression, classification, preprocessing, and ensembles (for example sklearn.preprocessing and sklearn.ensemble).

There are two main ways to define the folder structure, but the setup.py configuration is the same as above.

project_name/

├── lib_name/
│ ├── __init__.py
│ ├── file_1.py
│ ├── pkg_1/
│ │ ├── __init__.py
│ │ ├── file_1.py
│ │ └── file_2.py
│ │
│ └── pkg_2/
│   ├── __init__.py
│   ├── file_3.py
│   └── file_4.py

├── tests/

├── .gitignore
├── LICENSE
├── README.md
├── requirements.txt
└── setup.py

What makes it different from the previous example is that here you refer to the internal packages' interfaces (__init__.py) from the main library interface. So, at run time, the user can access the internal functions using dot notation. However, defining tests/ is more complicated than in the previous structures.
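
A sketch of that dot access, again built in a temporary folder. The functions fit and transform are hypothetical placeholders, and only the files needed for the demo are created.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical contents for the first library layout; names are illustrative.
FILES = {
    "lib_name/__init__.py": "from . import pkg_1, pkg_2\n",
    "lib_name/pkg_1/__init__.py": "from .file_1 import fit\n",
    "lib_name/pkg_1/file_1.py": "def fit():\n    return 'pkg_1.fit'\n",
    "lib_name/pkg_2/__init__.py": "from .file_3 import transform\n",
    "lib_name/pkg_2/file_3.py": "def transform():\n    return 'pkg_2.transform'\n",
}

with tempfile.TemporaryDirectory() as root:
    for rel_path, source in FILES.items():
        target = Path(root, rel_path)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(source, encoding="utf-8")

    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        lib_name = importlib.import_module("lib_name")
        # One import of the library, then dot access into each internal package.
        fit_result = lib_name.pkg_1.fit()
        transform_result = lib_name.pkg_2.transform()
    finally:
        sys.path.remove(root)

print(fit_result, transform_result)
```

The library's __init__.py only pulls in the internal packages; each internal package still decides for itself what it exposes.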

project_name/

├── pkg_1/
│ ├── __init__.py
│ ├── module_1.py
│ └── module_2.py
├── pkg_2/
│ ├── __init__.py
│ ├── module_1.py
│ └── module_2.py

├── tests/

├── .gitignore
├── LICENSE
├── README.md
├── requirements.txt
└── setup.py

This approach provides many packages under one name. The user sees separate packages, but testing the packages is easier for the developer than with the previous method. However, the most formal way to build a Python library is the first one defined above.

links:

PyPA guide for packaging

difference between namespace package & regular packages

PEP 420 — Namespace packaging guide

We have just covered one of the topics I listed in the previous article. So there is a whole list of topics ahead of us:

  • documentations and version controlling
  • packaging architectures
  • python decorators
  • python generators
  • python context managers
  • object-oriented design pattern usages
  • package testing — unit test — without mocking / mocking
  • exception handling and debugging
  • CI/CD pipeline building
  • automate CI/CD pipeline
  • future compatibility

See you soon with another Pythonic article!

Thank you.
