Python Docstrings Crawler

"""
This short Python script will open all files in a directory and all of its subdirectories, search through each file, pull out all lines that begin with # (comments, referred to here as "docstrings"), and write them to a JSON file along with additional information: folder level, folder name, file name, and the #-docstring itself.

Present the results in a JSON reader, properly formatted as a tree diagram.

To run the script, place the .py file in the target folder and run it like this:

python app.py

If no folder name is specified, start in the folder where this file is located.
Do not include this file in the crawl.

Save the JSON file in the same directory.
"""
To achieve the described functionality, we can create a Python script named `app.py`. Below is the implementation of the script:

```python
import os
import json
import sys

def get_docstrings(file_path):
    """Extract all lines starting with '#' (the "docstrings") from a file."""
    with open(file_path, 'r', encoding='utf-8') as file:
        docstrings = [line.strip() for line in file if line.strip().startswith('#')]
    return docstrings

def crawl_directory(root_folder):
    """Recursively crawl through the directory and its subdirectories."""
    data = []
    for foldername, _, filenames in os.walk(root_folder):
        # Folder level relative to the starting folder (0 for the root itself).
        rel = os.path.relpath(foldername, root_folder)
        folder_level = 0 if rel == '.' else rel.count(os.sep) + 1
        for filename in filenames:
            # Only look at .py files, and skip this script itself.
            if filename.endswith('.py') and filename != 'app.py':
                file_path = os.path.join(foldername, filename)
                docstrings = get_docstrings(file_path)
                if docstrings:
                    data.append({
                        'folder_level': folder_level,
                        'folder_name': os.path.basename(foldername),
                        'file_name': filename,
                        'docstrings': docstrings,
                    })
    return data

def save_json(data):
    """Save the extracted data to a JSON file."""
    with open('docstrings.json', 'w', encoding='utf-8') as json_file:
        json.dump(data, json_file, indent=2)

def main():
    folder_to_crawl = sys.argv[1] if len(sys.argv) > 1 else '.'
    extracted_data = crawl_directory(folder_to_crawl)
    save_json(extracted_data)

if __name__ == '__main__':
    main()
```
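As a quick sanity check, here is a small, self-contained sketch of the same extraction logic run against a temporary file (the sample file contents are made up for illustration):

```python
import os
import tempfile

def get_docstrings(file_path):
    """Extract all lines starting with '#' from a file, as in app.py above."""
    with open(file_path, 'r', encoding='utf-8') as file:
        return [line.strip() for line in file if line.strip().startswith('#')]

# Write a small sample .py file to extract from (hypothetical contents).
with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as tmp:
    tmp.write("# first comment\nx = 1\n    # indented comment\n")
    sample_path = tmp.name

result = get_docstrings(sample_path)
print(result)  # ['# first comment', '# indented comment']
os.remove(sample_path)
```

Note that indented comment lines are picked up too, since each line is stripped before the `#` check.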

Place this `app.py` file in the top-level folder, or in the folder you want to start crawling from. Then, run the script as follows:

```
python app.py
```

If you don't pass a folder name, the script will start in the folder where `app.py` is located. You can also pass a starting folder as the first argument, e.g. `python app.py my_project`.

The script will search for `.py` files (excluding `app.py`) in the specified directory and its subdirectories, extract the docstrings starting with `#`, and save the results in a JSON file named `docstrings.json`. The JSON file will contain the folder level, folder name, file name, and associated docstrings.
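For reference, each entry in `docstrings.json` will have this shape (the folder and file names here are hypothetical):

```json
[
  {
    "folder_level": 1,
    "folder_name": "utils",
    "file_name": "helpers.py",
    "docstrings": [
      "# Helper functions for the main application",
      "# TODO: refactor"
    ]
  }
]
```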

To view the JSON file in a tree diagram format, you can use various JSON viewer tools available online or JSON visualization libraries in Python. One popular Python library is `treelib`, which can help visualize the JSON data as a tree. You can use it as follows:

```python
import json
from treelib import Tree

# Load the JSON data produced by app.py
with open('docstrings.json', 'r', encoding='utf-8') as json_file:
    data = json.load(json_file)

# Create a tree and add one node per file under a single root
tree = Tree()
tree.create_node(tag='Root', identifier='root')
for index, item in enumerate(data):
    folder_name = item['folder_name']
    file_name = item['file_name']
    # Include the index so identifiers stay unique even if names repeat
    node_id = f"{index}:{folder_name}/{file_name}"
    tree.create_node(tag=f"{folder_name}/{file_name}", identifier=node_id,
                     parent='root', data=item['docstrings'])

# Display the tree
tree.show()
```

Make sure to install the `treelib` library with `pip install treelib` before running the visualization code. This prints a tree-diagram representation of the JSON data to the console.
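If you would rather avoid the extra dependency, a minimal dependency-free alternative is to indent each entry by its `folder_level`. This sketch uses a small inline sample (hypothetical entries) in place of the real `docstrings.json`:

```python
import json

# Inline sample standing in for docstrings.json (hypothetical entries).
data = json.loads("""
[
  {"folder_level": 0, "folder_name": "project", "file_name": "utils.py",
   "docstrings": ["# helper functions"]},
  {"folder_level": 1, "folder_name": "sub", "file_name": "core.py",
   "docstrings": ["# core logic"]}
]
""")

# Build indented output lines: one per file, then one per docstring.
lines = []
for item in data:
    indent = '  ' * item['folder_level']
    lines.append(f"{indent}{item['folder_name']}/{item['file_name']}")
    for docstring in item['docstrings']:
        lines.append(f"{indent}  {docstring}")

print('\n'.join(lines))
```

To use it with the real output, replace the inline sample with `data = json.load(open('docstrings.json'))`.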