Reads a text file with a varying number of header rows and columns (e.g., when the right skiprows value is not known in advance) and returns a pandas DataFrame object.

semistructuredtxt2df

This Python package reads a text file with an unknown number of header rows (e.g., when the required skiprows value varies from file to file) and returns a pandas DataFrame object. It addresses the need described in the Stack Overflow question "skip first couple of lines while reading lines in Python file" (https://stackoverflow.com/questions/9578580/skip-first-couple-of-lines-while-reading-lines-in-python-file).

Installation

Install semistructuredtxt2df with pip

pip install semistructuredtxt2df

Usage/Examples

For example, this package can read the sample file shown further down as a pandas DataFrame, skipping a block of header lines whose length can vary.

Arguments, data types, and default values if any

  • filepath_or_buffer: str
  • column_names: str, list, tuple, or set
    • The first row in the text file that contains every name given here is treated as the row of DataFrame column names, so you only need to pass enough names to identify that row, not all of them. The order of the names does not matter.
  • max_rows_to_try: int = None
  • separator: str = ","
  • encode: str = "utf-8"
  • is_commented: bool = False
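
The header detection described for column_names can be illustrated with a minimal standard-library sketch. This is not the package's actual implementation: the function name find_header_row and its max_rows parameter are hypothetical, and the assumption that a row "contains" a name when one of its parsed cells equals that name is a guess based on the description above.

```python
import csv
import io


def find_header_row(text, column_names, separator=",", max_rows=None):
    """Return the 0-based index of the first row that contains every
    requested column name, or None if no such row is found.

    Hypothetical helper for illustration only; not part of the package.
    """
    # Accept a single name or any collection of names; order is irrelevant.
    wanted = {column_names} if isinstance(column_names, str) else set(column_names)
    reader = csv.reader(io.StringIO(text), delimiter=separator)
    for index, row in enumerate(reader):
        # Optionally stop scanning after max_rows rows (assumed semantics).
        if max_rows is not None and index >= max_rows:
            break
        # Compare whole cells, so a quoted comment that merely mentions
        # "Country" inside a longer sentence does not count as a match.
        cells = {cell.strip() for cell in row}
        if wanted <= cells:
            return index
    return None


sample = "\n".join([
    '"Game Is FIFA World Cup"',
    '"Year Is 2022"',
    '',
    '"Country, is also included in this row, but this row will be skipped."',
    '',
    '"Country","MP","W"',
    '"Spain",1,1',
])

header_index = find_header_row(sample, ["W", "Country"])
print(header_index)
```

An index found this way could then be passed to pandas.read_csv as its skiprows argument so that parsing starts at the detected header row. Whether the package does this internally is not stated here.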

Sample file (sample_data/sample_text_file.csv)

"Game Is FIFA World Cup"
"Year Is 2022"
"Group Is E"
"Timestamp: 1669866143"

"This is a random comment."
"This is also a random comment."
"Country, is also included in this row, but this row will be skipped."

"Country","MP","W","D","L","GF","GA","GD","Pts"
"Spain",1,1,1,0,8,1,7,4
"Japan",2,1,0,1,2,2,0,3
"Costa Rica",2,1,0,1,1,7,-6,3
"Germany",2,0,1,1,2,3,-1,1

Sample code

# Import the package
from semistructuredtxt2df import read_txt

# Try reading the sample file with different column_names values
df1 = read_txt(r"sample_text_file.csv", "Country")
df2 = read_txt(r"sample_text_file.csv", ["Country"])
df3 = read_txt(r"sample_text_file.csv", ["Country", "Pts"])

# Check if df1, df2, and df3 are the same
print(df1.equals(df2))
print(df1.equals(df3))

print(df1)

Output

True
True
      Country  MP  W  D  L  GF  GA  GD  Pts
0       Spain   1  1  1  0   8   1   7    4
1       Japan   2  1  0  1   2   2   0    3
2  Costa Rica   2  1  0  1   1   7  -6    3
3     Germany   2  0  1  1   2   3  -1    1
