lp.read_csv

Reads a CSV file and returns a LazyFrame.

Parameters:

Name Type Description Default
path_or_buffer str | StringIO | TextIOBase

Path to the CSV file or a buffer-like object.

required
header bool | int | None

Indicates whether the CSV file has a header row. Can be a boolean or the row number of the header. Defaults to None.

None
compression str | None

Compression type of the file. Options are 'none', 'gzip', or 'zstd'. Defaults to None.

None
sep str | None

Character that separates columns. Alias for 'delimiter'. Defaults to None.

None
delimiter str | None

Character that separates columns. Defaults to None.

None
dtype dict[str, str] | list[str] | None

Specifies column data types. Can be a dictionary with column names and types, or a list of types. Defaults to None.

None
na_values str | list[str] | None

Values to interpret as NA/NaN. Defaults to None.

None
skip_rows int | None

Number of lines to skip at the start of the file. Defaults to None.

None
quote_char str | None

Character used for quoting. Defaults to None.

None
escape_char str | None

Character used for escaping. Defaults to None.

None
encoding str | None

File encoding. Defaults to None.

None
parallel bool | None

Enables or disables parallel reading. Defaults to None.

None
date_format str | None

Format to use when parsing dates. Defaults to None.

None
timestamp_format str | None

Format to use when parsing timestamps. Defaults to None.

None
sample_size int | None

Number of rows to sample for type inference. Defaults to None.

None
all_varchar bool | None

If True, assumes all columns are of type VARCHAR, skipping type inference. Defaults to None.

None
normalize_names bool | None

Normalizes column names to lowercase and replaces spaces with underscores. Defaults to None.

None
null_padding bool | None

If True, pads rows that have fewer columns than expected with NULL values. Defaults to None.

None
names list[str] | None

List of column names to use. Defaults to None.

None
line_terminator str | None

Character that indicates the end of a line. Defaults to None.

None
columns dict[str, str] | None

Dictionary specifying column names and types in the CSV file. Defaults to None.

None
auto_type_candidates list[str] | None

List of types for the parser to consider during type inference. Defaults to None.

None
max_line_size int | None

Maximum size of a line in the CSV file. Defaults to None.

None
ignore_errors bool | None

If True, ignores errors during CSV reading. Defaults to None.

None
store_rejects bool | None

If True, stores rejected lines during reading. Defaults to None.

None
rejects_table str | None

Name of the table to store rejected lines. Defaults to None.

None
rejects_scan str | None

Path to store the scan of rejected lines. Defaults to None.

None
rejects_limit int | None

Limit of rejected lines before stopping the read. Defaults to None.

None
force_not_null list[str] | None

List of columns whose values are never interpreted as NULL. Defaults to None.

None
buffer_size int | None

Size of the read buffer. Defaults to None.

None
decimal str | None

Decimal separator for numbers. Defaults to None.

None
allow_quoted_nulls bool | None

If True, allows conversion of quoted values to NULL. Defaults to None.

None
include_filename bool | str | None

If True or a string, adds a column containing the source filename to the output. Defaults to None.

None
hive_partitioning bool | None

Enables Hive partitioning. Defaults to None.

None
union_by_name bool | None

If True, combines multiple files by matching column names rather than column positions. Defaults to None.

None
hive_types dict[str, str] | None

Dictionary specifying Hive types for columns. Defaults to None.

None
hive_types_autocast bool | None

If True, automatically casts Hive types. Defaults to None.

None
parse_dates list[str] | None

List of column names to parse as dates. Defaults to None.

None

Returns:

Name Type Description
LazyFrame LazyFrame

A LazyFrame containing the data from the CSV file.

Example:

import lazy_pandas as lp

# Read data.csv, treating the first row as the header and declaring column types explicitly.
df = lp.read_csv('data.csv', header=True, sep=',', dtype={'column1': 'INTEGER', 'column2': 'VARCHAR'})
df.head()
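
Several of the parameters listed above are typically combined when a file is not a clean comma-separated export. The sketch below is illustrative only: the file contents, the column names (`id`, `name`, `age`), and the helper names `write_sample_csv` and `load_people` are hypothetical, and it assumes that `skip_rows`, `na_values`, `sep`, and `dtype` behave as described in the parameter table. No output is shown because it has not been verified against the library.

```python
import os
import tempfile

# Hypothetical sample data: two preamble lines, then a header and three rows.
# "N/A" stands in for a missing value.
SAMPLE_LINES = [
    "# demo export",
    "# not real data",
    "id;name;age",
    "1;alice;34",
    "2;bob;N/A",
    "3;carol;29",
]

# Keyword arguments chosen from the parameter table above.
READ_OPTIONS = {
    "header": True,          # the first line after the skipped preamble is the header
    "sep": ";",              # semicolon-separated columns
    "skip_rows": 2,          # skip the two preamble lines
    "na_values": ["N/A"],    # treat "N/A" as a missing value
    "dtype": {"id": "INTEGER", "name": "VARCHAR", "age": "INTEGER"},
}


def write_sample_csv(path):
    """Write the hypothetical sample file to `path`."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        fh.write("\n".join(SAMPLE_LINES) + "\n")


def load_people(path):
    """Return a LazyFrame for the sample file using READ_OPTIONS."""
    # Imported here so the helpers above can be used without the package installed.
    import lazy_pandas as lp

    return lp.read_csv(path, **READ_OPTIONS)


def demo():
    """Write the sample file to a temporary directory and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "people.csv")
        write_sample_csv(path)
        return load_people(path).head()
```

Keeping the options in a single dictionary makes it easy to reuse the same settings for several files with the same layout; calling `demo()` requires `lazy_pandas` to be installed.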