Reads a text file that has a varying number of header rows (e.g., when the skiprows value is not known in advance) and columns, and returns a pandas DataFrame object.
semistructuredtxt2df
This Python package reads a text file with an unknown number of header rows (e.g., when skiprows can vary) and returns a pandas DataFrame object. With this package, you can "skip first couple of lines while reading lines in Python" (https://stackoverflow.com/questions/9578580/skip-first-couple-of-lines-while-reading-lines-in-python-file).
License
Installation
Install semistructuredtxt2df with pip
pip install semistructuredtxt2df
Usage/Examples
For example, this package can read the sample file below as a pandas DataFrame, skipping a block of header lines whose length can vary.
Arguments, data types, and default values if any
- filepath_or_buffer: str
- column_names: str, list, tuple, or set
- The first row in the text file that contains all of the elements in column_names is recognized as the row of DataFrame column names, so you do not have to specify every column unless you want to. The order of the column names does not matter.
- max_rows_to_try: int = None
- separator: str = ","
- encode: str = "utf-8"
- is_commented: bool = False
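The column_names rule described above can be illustrated with plain pandas. The following is a minimal sketch of the idea, not the package's actual implementation: `find_header_row` is a hypothetical helper, the sample text is a shortened version of the sample file, and the sketch assumes no quoted field spans more than one line (so that row indices match line numbers).

```python
import csv
import io

import pandas as pd

# Shortened, in-memory stand-in for the sample file (assumption for illustration).
SAMPLE = '''"Group Is E"
"This is a random comment."
"Country, is also included in this row, but this row will be skipped."
"Country","MP","Pts"
"Spain",1,4
"Japan",2,3
'''


def find_header_row(text, column_names, separator=","):
    """Hypothetical helper: index of the first row whose cells include every requested name."""
    if isinstance(column_names, str):
        column_names = [column_names]
    wanted = set(column_names)
    reader = csv.reader(io.StringIO(text), delimiter=separator)
    for index, row in enumerate(reader):
        # A whole cell must equal the name, so a comment that merely
        # mentions "Country" inside a longer quoted string does not match.
        if wanted.issubset(cell.strip() for cell in row):
            return index
    raise ValueError(f"No row contains all of: {sorted(wanted)}")


header_index = find_header_row(SAMPLE, ["Country", "Pts"])

# Skip everything before the detected header row; pandas then treats
# the first remaining line as the column names.
df = pd.read_csv(io.StringIO(SAMPLE), skiprows=header_index)
print(df)
```

Because the match is on whole cells, the order of the names passed in does not matter, and passing "Country" alone finds the same row as passing several names.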
Sample file (sample_data/sample_text_file.csv)
"Game Is FIFA World Cup"
"Year Is 2022"
"Group Is E"
"Timestamp: 1669866143"
"This is a random comment."
"This is also a random comment."
"Country, is also included in this row, but this row will be skipped."
"Country","MP","W","D","L","GF","GA","GD","Pts"
"Spain",1,1,1,0,8,1,7,4
"Japan",2,1,0,1,2,2,0,3
"Costa Rica",2,1,0,1,1,7,-6,3
"Germany",2,0,1,1,2,3,-1,1
Sample code
# Import the package
from semistructuredtxt2df import read_txt
# Try reading sample files with different column_names
df1 = read_txt(r"sample_text_file.csv", "Country")
df2 = read_txt(r"sample_text_file.csv", ["Country"])
df3 = read_txt(r"sample_text_file.csv", ["Country", "Pts"])
# Check if df1, df2, and df3 are the same
print(df1.equals(df2))
print(df1.equals(df3))
print(df1)
Output
True
True
      Country  MP  W  D  L  GF  GA  GD  Pts
0       Spain   1  1  1  0   8   1   7    4
1       Japan   2  1  0  1   2   2   0    3
2  Costa Rica   2  1  0  1   1   7  -6    3
3     Germany   2  0  1  1   2   3  -1    1