# Python packages

# Favorite #Python command-line packages

Extract CSV data from Excel files:

```shell
pipx install xlsx2csv
```

However, this fails after a few uses. It is better to use headless LibreOffice:

```shell
/Applications/LibreOffice.app/Contents/MacOS/soffice --headless --convert-to csv "$file"
```

For assertionquery.py (Cascade):

```shell
pip install --user --upgrade beautifulsoup4
```

Essential developer packages:

```shell
poetry add --group dev pytest ty ruff black pip-audit
```

Essential web developer packages:

```shell
brew install httpie
```

Or, on Linux:

```shell
apt install httpie
```

- HTTPie desktop app: https://httpie.io/desktop

Essential global Python development packages (without Poetry, or for publishing to PyPI):

```shell
pip install --upgrade twine build pipreqs ipython
```

## Recommendations

See [[PyEnv usage]], [[Conda usage]] and [[Poetry usage]] to set up packages. Never use the system python/pip directly; the commands above are the only *possible* exceptions!

The [Project Boilerplate](Tech/Python/Project%20Boilerplate/An%20Overview.md) explains how to set up a state-of-the-art baseline for developing in Python.

And [Jupyter environment](Jupyter%20environment.md) explains how to set up a Data Analytics and Machine Learning environment (typically on top of that baseline).
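
To see whether a shell is about to use the system interpreter, a small check like the following may help. It is a minimal sketch using only the standard library; the function names `in_virtual_environment` and `describe_interpreter` are made up for this note, and the check only recognises venv/virtualenv environments (such as the ones Poetry creates), not Conda environments or PyEnv-managed interpreters.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a venv/virtualenv."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the interpreter it was created from.
    return sys.prefix != sys.base_prefix


def describe_interpreter() -> str:
    """Return the interpreter path with a short label (hypothetical helper)."""
    kind = "virtual environment" if in_virtual_environment() else "base interpreter"
    return f"{sys.executable} ({kind})"


if __name__ == "__main__":
    print(describe_interpreter())
```

If this reports a base interpreter under `/usr/bin` or similar, `pip install` would most likely touch the system Python.
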

### Advanced Python

- `numpy cython` # high-performance packages that can bring numeric, performance-critical Python code close to C speed
- `classes returns` # *dry-python*: `classes` provides [type classes](https://classes.readthedocs.io/en/latest/pages/why.html) for [ad-hoc polymorphism with late binding](https://en.wikipedia.org/wiki/Ad_hoc_polymorphism); `returns` provides [railway-oriented programming](https://returns.readthedocs.io/en/latest/pages/railway.html) to handle [None](https://returns.readthedocs.io/en/latest/pages/maybe.html) & [exceptions](https://returns.readthedocs.io/en/latest/pages/result.html), [context-based dependency injection](https://returns.readthedocs.io/en/latest/pages/context.html), [data pipelines](https://returns.readthedocs.io/en/latest/pages/pipeline.html), futures, and serialization, and lets you emulate [higher kinded types](https://returns.readthedocs.io/en/latest/pages/hkt.html) on top of its [container types](https://returns.readthedocs.io/en/latest/pages/container.html).
  - **Do not use this unless you agree that your project & team will use the functional style and you are willing to RTFM.** (You were warned not to make a mess of your shiny new Python project.)
- `tox tox-pyenv tox-venv behave hypothesis` # advanced testing libraries
  - The [tox](https://tox.wiki) ecosystem allows you to test libraries or frameworks that must support multiple versions of Python
  - [behave](https://behave.readthedocs.io/en/latest/) enables [Gherkin-style](https://behave.readthedocs.io/en/latest/philosophy/?highlight=gherkin#the-gherkin-language) behavior-driven development
  - [hypothesis](https://hypothesis.readthedocs.io/en/latest/index.html) supports [advanced testing techniques](https://hypothesis.readthedocs.io/en/latest/manifesto.html), such as property-based testing or fuzzing
- `pipreqs` # requirements.txt generator that lists only the packages your project actually imports (unlike `pip freeze`, which lists *all* packages installed in the current environment)

### Data Science

- `pandas openpyxl statsmodels scipy` # statistics and data analytics
- `matplotlib seaborn` # data visualization
- `dvc[s3,azure,ssh]` # data versioning
- `streamlit` # data science UI
- `jupyterlab ipywidgets widgetsnbextension jupyter_contrib_nbextensions` # Jupyter
- `jupytext` # handle Jupyter notebooks as regular plain-text files
- `nbdime` # make Jupyter notebooks git[-diff/merge]-friendly

### Developer tooling

- `black ruff ty` # formatting, linting, and type checking: clean development 101
- `pytest` # testing
- `py-spy line_profiler pyre-check` # profiling (`py-spy`, `line_profiler`) and static type checking (`pyre-check`)

### Machine Learning

- `torch torchvision pytorch-ignite tensorboard transformers` # deep learning stack
- `dvc[s3,azure,ssh] wandb mlflow` # experiment, model, and data tracking
- `jupyterlab jupytext ipywidgets widgetsnbextension jupyter_contrib_nbextensions` # Jupyter
- `scikit-learn sklearn-pandas tensorflow` # machine learning
- `gensim spacy sentence-transformers transformers` # NLP

For **PyTorch** (`torch` on PyPI, `pytorch` on conda), check your CUDA installation and [go to the PyTorch website](https://pytorch.org/get-started/locally/) to find the install command that matches it.
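
Once PyTorch is installed, a quick sanity check can confirm whether the chosen build actually sees the GPU. The following is a minimal sketch, not an official procedure; `cuda_status` is a hypothetical helper name, and it assumes that `torch.cuda.is_available()` and `torch.version.cuda` are enough for a first diagnosis.

```python
import importlib.util


def cuda_status() -> str:
    """Summarise whether PyTorch is installed and can see a CUDA device."""
    # find_spec lets us detect a missing torch without triggering ImportError.
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch

    if torch.cuda.is_available():
        # torch.version.cuda is the CUDA version this build was compiled for.
        return f"CUDA available (built for CUDA {torch.version.cuda})"
    return "CPU only"


if __name__ == "__main__":
    print(cuda_status())
```

A "CPU only" result on a machine with an NVIDIA GPU usually means the installed build does not match the local CUDA setup.
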

For **Jupyter Extended** (`jupyter_contrib_nbextensions`), call `jupyter contrib nbextension install` (optionally with `--user`).

### Web development

- `fastapi` # asynchronous REST API framework
  - [Real Python tutorial](https://realpython.com/courses/python-rest-apis-with-fastapi/)
- `django` # solid, but opinionated web framework (recommended for newcomers)
- `flask connexion[swagger-ui] SQLAlchemy` # advanced web framework backend
  - [Real Python tutorial](https://realpython.com/flask-connexion-rest-api/)
- `pydantic` # data validation and serialization
- `beautifulsoup4` # HTML and XML parsing

If you are planning to build [Hypermedia-Driven Applications (HDAs)](https://htmx.org/essays/hypermedia-driven-applications/) with [HTMX](https://htmx.org/), also take a look at [PyHAT](https://github.com/PyHAT-stack/awesome-python-htmx).

### Utilities

- `ipython` # a better REPL than the default interactive interpreter
- `graphtage` # diffing tree-like data (HTML, YAML, JSON, etc.)
- `beautifulsoup4 textract goose3` # text extraction

To install **Graphviz**, use either of the following, depending on whether you are using pip or conda:

- `brew install graphviz` followed by `pip install graphviz`
- `conda install graphviz` followed by `conda install python-graphviz`
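
Both Graphviz routes install two things: the Graphviz binaries and the Python binding. A small check like the one below can tell which half is missing. This is a sketch under the assumption that the binaries put a `dot` executable on the `PATH` and that the binding is importable as `graphviz`; `graphviz_status` is a name invented for this note.

```python
import importlib.util
import shutil


def graphviz_status() -> dict[str, bool]:
    """Report which halves of a Graphviz setup are present."""
    return {
        # The `dot` executable comes with the system-level Graphviz install.
        "dot_executable": shutil.which("dot") is not None,
        # The importable `graphviz` module comes from the Python binding package.
        "python_module": importlib.util.find_spec("graphviz") is not None,
    }


if __name__ == "__main__":
    for part, present in graphviz_status().items():
        print(f"{part}: {'ok' if present else 'missing'}")
```
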